
July 01, 2010

Comments

Dejan Ilic

Nice of you to give us a teaser of upcoming tech in ONTAP. As a customer, I feel that things have been moving forward a bit too slowly lately for my taste.

Now, regarding the mentioned compression, can you reveal what kind of data is the target group? We use ONTAP deduplication on almost all our data except Exchange (due to MS not supporting it), but I see fairly low numbers on our end-user file data (10-25%). That kind of data usually compresses well and is really static, so the savings should be really high.

Still, it depends on how you apply the compression. The smaller the data set being compressed (i.e., only within the 4K block), the smaller the gain I would expect.
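
A quick back-of-the-envelope sketch of what I mean, using plain zlib in Python rather than anything ONTAP-specific: compressing the same data in independent 4K chunks gives up the redundancy that spans chunk boundaries, so the gain is smaller than compressing one larger stream.

import zlib

# Synthetic, repetitive "home directory" text; purely illustrative.
data = b"Quarterly report, region EMEA, figures pending review.\n" * 2000

# Per-4K-chunk compression: each chunk gets its own compression context,
# so redundancy across chunk boundaries cannot be exploited.
chunked = sum(
    len(zlib.compress(data[i:i + 4096])) for i in range(0, len(data), 4096)
)

# Whole-stream compression: one context for all of the data.
whole = len(zlib.compress(data))

print("original:  ", len(data), "bytes")
print("4K chunks: ", chunked, "bytes")
print("one stream:", whole, "bytes")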

Speaking of 4K blocks: deduplication keeps the "natural" 4K blocks in WAFL unchanged. How do you solve that once compression is involved? Do you merge compressed blocks on disk? You would suddenly have sub-4K blocks involved. Or have I misunderstood something?

Vaughn Stewart

@Dejan - Thanks for the comments and for being a customer.

As for deduplicating a data set like Exchange, the array-based storage savings technology is invisible to the application. It is my understanding that Microsoft has not published any statements declaring a lack of support for array-based storage savings technologies.

Home directories tend to vary in the amount of file and sub-file redundancy, which in turn impacts the storage savings from dedupe. We commonly hear customers stating 30% savings, +/- 5%. Data compression is the ideal complement for this data set, as the contents of home directories tend to compress rather well.

As for block size, all forms of data compression store the data in a non-native state. With WAFL, fewer blocks will be read from disk to serve a file, and the data will then be expanded in array cache. Should the file be edited and saved, additional work will be required by the array to recompress the content.
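
To make that read/write behavior concrete, here is a minimal sketch in Python with zlib. It is not WAFL internals, just an illustration of the trade-off: a compressed file occupies fewer physical 4K blocks, a read pulls those fewer blocks and expands them into cache, and saving an edit costs a recompression pass.

import zlib

BLOCK = 4096

class CompressedStore:
    def __init__(self):
        self.disk = {}   # name -> list of physical 4K blocks (compressed payload)
        self.cache = {}  # name -> uncompressed bytes, standing in for array cache

    def write(self, name, data):
        payload = zlib.compress(data)
        # Spread the compressed payload across as many 4K blocks as needed.
        self.disk[name] = [payload[i:i + BLOCK] for i in range(0, len(payload), BLOCK)]
        self.cache.pop(name, None)

    def read(self, name):
        if name not in self.cache:
            payload = b"".join(self.disk[name])          # fewer blocks read from disk
            self.cache[name] = zlib.decompress(payload)  # expanded in cache
        return self.cache[name]

    def edit(self, name, data):
        # An edited file must be recompressed before it goes back to disk.
        self.write(name, data)

store = CompressedStore()
store.write("report.txt", b"lorem ipsum " * 10000)
print("physical 4K blocks used:", len(store.disk["report.txt"]))
print("logical size in cache:  ", len(store.read("report.txt")))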

For additional information on the differences between dedupe and compression please see this post:

http://blogs.netapp.com/virtualstorageguy/2010/06/data-compression-deduplication-single-instance-storage.html

Andy Leonard

Excited to see compression arriving for ONTAP - we have lots of data that compresses well but doesn't dedupe (think millions of ASCII data files).
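
A toy illustration of the kind of data I mean (synthetic files, not our actual workload): each file is unique, so across files there are essentially no identical 4K blocks for dedupe to collapse, yet every file compresses nicely on its own.

import hashlib
import zlib

BLOCK = 4096

def ascii_file(i):
    # Every file is distinct (the index appears on every line), so no two
    # 4K blocks are byte-for-byte identical across files.
    lines = (f"sample={i} reading={j} value={i * j % 997}" for j in range(200))
    return "\n".join(lines).encode()

files = [ascii_file(i) for i in range(500)]

unique = set()
raw = comp = 0
for data in files:
    raw += len(data)
    comp += len(zlib.compress(data))  # per-file compression
    for off in range(0, len(data), BLOCK):
        unique.add(hashlib.sha256(data[off:off + BLOCK]).hexdigest())

total = sum(-(-len(d) // BLOCK) for d in files)  # ceiling division per file
print("dedupe:", total, "blocks,", len(unique), "unique - little to collapse")
print("compression:", raw, "->", comp, "bytes")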

Are there any improvements coming in upping maxfiles and increasing performance with thousands of files per directory?

Mike Ivanov

Vaughn:

It's nice to see NetApp adding compression functionality to ONTAP. We completely agree with your analysis of how compression combined with deduplication gives you the ultimate in data optimization (in fact, Tom Cook recently wrote about it on his blog too). NetApp's approach of having it completely embedded is the correct (and safest) way to provide this functionality within primary storage, since it is a read-path operation. Compression combined with scalable and high-performing dedupe is the right track.

Mike Ivanov - Permabit

www.facebook.com/profile.php?id=658313066

I asked for a license to try compression on our 7.3.3 cluster but was told it's not actually available until 8.0.1?
Can someone please clarify which version of ONTAP will actually run compression routines?

thanks

Vaughn Stewart

@Andy - your data is why we have been doing the engineering work!

@Mike - Glad to see other storage vendors adopting storage efficiency technologies for production use cases.

@Fletcher - Compression officially releases with DOT 8.0.1. At this time we are not providing early access (or a PVR) for this functionality in 7.3.x.

