This is a hard blog post for me to write, as I like to give people the benefit of the doubt. Sometimes individuals make mistakes or misread numbers and come up with inaccurate results. However, Christopher Kusek’s latest blog post on EMC’s 20% guarantee program is so full of misrepresented information, cooked numbers, and seriously reckless configurations that I can’t help but believe it was an intentional attempt to mislead customers.
I really do hope that I’m mistaken about his motives and that Chris, after reviewing this information, can post an update clarifying his mistakes and setting the record straight.
Chris claimed in his post that a configuration requiring 135TB of usable capacity would require over 350TB of RAW storage on NetApp whereas EMC would only require between 184TB and 206TB.
These numbers are so outlandish that anyone who has spent time comparing NetApp and EMC solutions, especially someone who has worked at both EMC and NetApp, should quickly realize that they are clearly not right and need further investigation.
Chris’s first mistake was the choice of comparison tools. To gather the NetApp RAW storage required, he used the NetApp Storage Efficiency Calculator. For the EMC numbers, he used the EMC Capacity Calculator.
The NetApp Storage Efficiency Calculator is a tool we recommend customers use to identify how various NetApp technologies can save storage in a real-world model environment. To model these savings accurately, customer waste in the form of over-provisioning has to be taken into account. We call this customer-generated waste “Overprovisioned Storage” in the tool, and it can be easily identified by hovering over the dark grey section of the column. We separate this storage out so that customers can see the effects of enabling thin provisioning to combat over-provisioning.
The EMC Capacity Calculator tool is really just a visual raid-group builder, nothing more, and doesn’t take into account customer behaviors such as over-provisioning. Chris did not take this into consideration.
To get a better idea of the RAW storage requirements needed by NetApp, the Overprovisioned Storage should have been subtracted from the model as the EMC Capacity Calculator doesn’t factor this in. Leaving all other settings the same and only properly subtracting the over-provisioned storage, the new results are clear.
| Workload | Configuration | NetApp RAW (TB) | NetApp Usable (TB) | NetApp Usable/RAW | EMC RAW (TB) | EMC Usable (TB) | EMC Usable/RAW | Difference (EMC minus NetApp) |
|---|---|---|---|---|---|---|---|---|
| FILE+DB | Default Checkboxes | 139.2 | 135 | 97% | 206 | 135 | 66% | -31% |
| FILE+DB | Uncheck Thin/Dedup | 184.5 | 135 | 73% | 206 | 135 | 66% | -8% |
| FILE+DB | Uncheck Snaps | 157.5 | 135 | 86% | 206 | 150 | 73% | -13% |
| EMAIL/Collab | Default Checkboxes | 144.2 | 135 | 94% | 184 | 137 | 74% | -19% |
| EMAIL/Collab | Uncheck Thin/Dedup | 184.5 | 135 | 73% | 184 | 137 | 74% | 1% |
| EMAIL/Collab | Uncheck RAID6/Snaps | 154.5 | 135 | 87% | 184 | 152 | 83% | -5% |
Even without addressing the issues in the reckless EMC configs (which I will do below), NetApp is still roughly equal to or better than EMC in default storage utilization. Once you enable deduplication, the model shows an advantage of 19% to 31% in NetApp’s favor with these data types. Savings in virtual environments are much higher, but aren't being compared here.
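As a rough sanity check, the percentage columns can be recomputed from the RAW and usable figures in the table. The sketch below is illustrative Python only and is not part of either vendor's calculator; the names `ROWS`, `usable_pct`, and `compare` are made up for this example, and the difference is assumed to be the EMC percentage minus the NetApp percentage, rounded after subtracting.

```python
# Figures copied from the table above, all in TB:
# (workload, configuration, NetApp RAW, NetApp usable, EMC RAW, EMC usable)
ROWS = [
    ("FILE+DB", "Default Checkboxes", 139.2, 135, 206, 135),
    ("FILE+DB", "Uncheck Thin/Dedup", 184.5, 135, 206, 135),
    ("FILE+DB", "Uncheck Snaps", 157.5, 135, 206, 150),
    ("EMAIL/Collab", "Default Checkboxes", 144.2, 135, 184, 137),
    ("EMAIL/Collab", "Uncheck Thin/Dedup", 184.5, 135, 184, 137),
    ("EMAIL/Collab", "Uncheck RAID6/Snaps", 154.5, 135, 184, 152),
]


def usable_pct(raw_tb, usable_tb):
    """Usable capacity as a percentage of RAW capacity."""
    return 100.0 * usable_tb / raw_tb


def compare(rows):
    """Return (workload, config, NetApp %, EMC %, EMC minus NetApp) per row."""
    results = []
    for workload, config, n_raw, n_use, e_raw, e_use in rows:
        netapp = usable_pct(n_raw, n_use)
        emc = usable_pct(e_raw, e_use)
        # Subtract the unrounded percentages first, then round for display.
        results.append((workload, config, round(netapp), round(emc), round(emc - netapp)))
    return results


if __name__ == "__main__":
    for workload, config, netapp, emc, diff in compare(ROWS):
        print(f"{workload:13} {config:20} NetApp {netapp}%  EMC {emc}%  diff {diff}%")
```

Rounding after the subtraction is why the second FILE+DB row shows -8% even though the rounded columns (66% and 73%) differ by 7.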
Now, although it is very unlikely, I suppose the above oversight could have been made in haste. I’m sure Chris can clear up his confusion on the numbers after doing some more research.
What I found most disturbing about the post, though, was the reckless configurations used in an attempt to drive up the storage efficiency numbers of the EMC storage. Chris not only recommended 14+1 RAID5 raid group sizes, something unheard of in production environments, but he also recommended that extreme raid group size with 600GB FC drives. This type of configuration, when deployed in a real-world environment, exposes the customer to extreme and unnecessary risk of data loss due to double disk failures. NetApp would NEVER recommend this type of configuration.
When attempting to reconcile the inadequate raid protection of the EMC numbers with the superior, high-performance, double disk failure protection that is both the default and a best practice for NetApp solutions, Chris opted to downgrade the NetApp protection to RAID4 rather than bring the EMC protection up to an adequate level. It appears this was an effort to avoid the additional efficiency penalty that EMC RAID6 incurs when chosen.
Again, I want to give Chris the benefit of the doubt, but the facts don’t look good here. I do hope Chris will come back with an informed, real-world, production-ready configuration that we can compare without all these shenanigans.
I’ll continue this discussion in Part 2, where I will share what I believe to be EMC’s true motives for this guarantee program which, as you can see above, has no technical merit when it comes to actually beating NetApp’s storage efficiency.
EMC doesn't need 14+1 R5 numbers to hit the 20% savings number. It works with all sorts of raid configs and does not rely on 14+1s.
Posted by: timeotevolve | May 20, 2010 at 10:17 PM
Mike,
I don't quite understand how you can say that if I have 135 TB of useable capacity requirements, you can accomplish this with 139.2 TB of raw spinning disk.
When I use the NetApp storage efficiency calculator, it shows that 135 TB of usable capacity would require 204 TB of RAW. I did not change any values; I simply entered 135 TB as my company's capacity requirement. That is more along the lines of 67% raw to useable capacity.
As for the EMC tool, and so that we don't start a debate about best practices or about how a real EMC customer would deploy this, I used a very conservative all 4+1 R5 configuration. In reality, most customers will deploy a mixture of RAID group sizes. In a configuration of roughly 35 TB NAS and 100 TB SAN, I end up with 62% raw to useable.
Now here is the interesting part. These EMC numbers are BEFORE I account for Virtual Provisioning and dedupe. The NetApp numbers already take all of this into account. Once I add things like FAST v2 and FAST Cache, I can get even more efficient.
Posted by: Pete Richardson | May 21, 2010 at 05:39 AM
Hey Pete,
I wonder if we are related :). Thanks for the comment.
The 135TB on 139TB of physical disk is achieved by block deduplication reducing the amount of physical storage the 135TB of data would consume.
The NetApp efficiency calculator is a top-down look at storage efficiency, whereas the EMC calculator is a bottom-up look at RAW to usable capacity. The tools aren't directly comparable.
All LUNs have some wasted space in them. Typical customers provision nearly 60% more storage than they need. For example, a filesystem on a 1TB LUN may only be consuming 300GB of capacity. The additional 700GB is allocated, but not holding data. This is over-provisioned storage. The EMC calculator is just looking for the usable capacity that will be provisioned to the host (e.g. the 1TB LUN). The NetApp calculator is looking for the actual "consumed" capacity (e.g. the 300GB within the LUN) as the input. By having you put in the estimated consumed capacity and allowing the tool to estimate typical "wasted/Overprovisioned" space, we can get a rough idea of how much thin provisioning may save you.
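To make the difference in inputs concrete, here is a minimal sketch of the 1TB LUN example in Python. It is only an illustration of the arithmetic, not how either calculator is implemented; the function name and the dictionary keys are hypothetical, and 1TB is treated as 1000GB to match the 300GB/700GB split above.

```python
def overprovisioned_gb(provisioned_gb, consumed_gb):
    """Capacity that is allocated to the host but not holding data."""
    if consumed_gb > provisioned_gb:
        raise ValueError("consumed capacity cannot exceed provisioned capacity")
    return provisioned_gb - consumed_gb


# The example LUN from the text: 1TB provisioned, 300GB actually consumed.
lun = {"provisioned_gb": 1000, "consumed_gb": 300}

wasted = overprovisioned_gb(lun["provisioned_gb"], lun["consumed_gb"])

# A bottom-up RAW-to-usable tool starts from the provisioned figure,
# while a top-down efficiency tool starts from the consumed figure.
bottom_up_input_gb = lun["provisioned_gb"]
top_down_input_gb = lun["consumed_gb"]

print(f"over-provisioned: {wasted} GB")
print(f"bottom-up input: {bottom_up_input_gb} GB, top-down input: {top_down_input_gb} GB")
```

Feeding the same number into both kinds of tool therefore describes two different amounts of data, which is why the raw outputs do not line up.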
If you just want a RAW to usable comparison with no additional efficiency features, NetApp uses another tool for that which is almost identical to EMC's calculator. If you think about it, the RAW to usable numbers (without other features) should almost always be similar, as both NetApp and EMC provision raid groups into usable capacity.
NetApp does have an additional 10% overhead for our virtualization layer that EMC doesn't have, but NetApp also recommends much larger raid group sizes (14+2 through 20+2) for all production workloads regardless of performance needs, which improves our efficiency. When considering these together, the RAW to usable numbers should be a near wash. I'll demonstrate that in my next post.
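A back-of-the-envelope sketch of that trade-off, in Python, looks like this. It assumes the 10% overhead applies multiplicatively on top of the parity overhead, and it ignores hot spares, drive right-sizing, checksums, and snapshot reserve, so treat the output as illustrative only. The function names are made up for this example.

```python
def data_fraction(data_disks, parity_disks):
    """Share of a raid group's disks that hold data rather than parity."""
    return data_disks / (data_disks + parity_disks)


def usable_fraction(data_disks, parity_disks, layer_overhead=0.0):
    """Data share after an additional fractional overhead (assumed multiplicative)."""
    return data_fraction(data_disks, parity_disks) * (1.0 - layer_overhead)


if __name__ == "__main__":
    configs = [
        ("4+1, no extra overhead", 4, 1, 0.0),
        ("14+1, no extra overhead", 14, 1, 0.0),
        ("14+2 with 10% overhead", 14, 2, 0.10),
        ("20+2 with 10% overhead", 20, 2, 0.10),
    ]
    for label, data, parity, overhead in configs:
        print(f"{label:26} {usable_fraction(data, parity, overhead):.1%}")
```

Under these assumptions a 4+1 group lands at 80%, while 14+2 and 20+2 with the 10% overhead land at roughly 79% and 82%, which is the "near wash" described above.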
The discussion here SHOULD be on how we can reduce usable capacity after the raidgroups are built, as that is where the differences lie.
I hope this helps. The calculator can seem a bit confusing and I'll touch base with some contacts internally to see if we can have the wording updated to prevent misunderstandings like this. Thanks again for the post!
Posted by: Mike Richardson | May 21, 2010 at 04:04 PM
Guys, the secalc.com site has a bug (that Kusek exploited) when calculating netapp storage with all efficiencies turned off.
226 base 10 TB (or about 200 base2) are needed to provide over 150TB usable without any of the space efficiency features turned on.
I used the internal Synergy calculator to figure that out.
So, it looks to me EMC will have to start giving away a bunch of storage.
D
Posted by: Dikrek | May 22, 2010 at 11:16 AM
Dimitri, I honestly have to say I think it is absolutely adorable and sweet that you are claiming I "exploited a bug on the secalc" whereas, as I stated in my post and as I'll state here, had I KNOWN there was a clear bug, I would have not published what I did as I had found it. This is actually the first time I had ever used the NetApp Space Efficiency calculator in the several years since it was published - Can't blame me for assuming that the data that it would report to me (while I was at NetApp, and now that I am no longer) would be anything but accurate, considering the longevity it has been out there.
I do appreciate all of the attention you guys have been paying to this matter, and please do let me know when the calculator has indeed been corrected so I can run through the figures again. Considering that on both of our sides these are public customer calculators, and that they are intended to reflect an accurate picture of a customer's storage needs, I'd like that message to continue to be honest, forthright and true, and not let things like bugs misrepresent the data.
Thanks and I look forward to putting this matter to bed! Until that time, I won't have a whole lot to share on this subject!
Christopher!
Posted by: Cxi | May 23, 2010 at 06:44 PM
What happens when you use 2TB SATA drives under ONTAP 7? Isn't there still a 16TB limit under ONTAP 7? Wouldn't the aggr max out at 8 or 10 drives (6+2 or is it 8+2)? Please recalculate the numbers with 2TB SATA drives with ONTAP 7 (what most of your customers use) to put the calculation to the test, even with your misleading use of raid group sizes. Wow, 20+2s!! BTW, I've never seen any NetApp customer run 20+2s, ever, and I've come across 100s of NetApp environments.
Posted by: timetoevolve | May 25, 2010 at 06:33 PM
So again, I'm trying to compare real-world configs, not the configs EMC would like to compare. I would not recommend 2TB drives on ONTAP 7 because of the aggr size limits. 64-bit aggrs were designed for these types of large capacity drives. There is more to this than capacity alone, as 64-bit aggrs provide much better performance as well. I'm not even sure 2TB drives are supported in 7.3. However, when running similar configs in 8.0 with 16-disk raid groups and 2TB drives, I do show that, from a pure RAW to usable comparison, the EMC config would require about 16% less RAW. The primary reason why SATA is less capacity efficient for us than FC is the block checksums we use. We've opted to use a checksum layout that requires 11% overhead on SATA disks, but significantly enhances the performance we can get out of the drive.
Mike Riley blogged on this a while back. http://blogs.netapp.com/efficiency/2009/08/measuring-storage-efficiency-part-ii-taxes.html
As Mike states, the checksums make sense, especially when you start running additional space savings technologies such as deduplication and compression on top of the drives, and they are part of the reason why our dedup performance is so high, even on SATA.
As far as the 20+2 comment, the raid group sizes are based on the drives used and how large the aggr can grow. 20+2 is perfect for 450GB drives. Since 450GB drives are newer, that may explain why you haven't seen the config. I don't see how using 20+2s is misleading or risky, especially since they have many times the failure protection of a RAID 4+1. Are you also discouraging the use of 4+1?
Posted by: Mike Richardson | May 25, 2010 at 08:08 PM
While you guys are fixing your calculator, perhaps you could take a look at the RAID 6 numbers? Turn off all features EXCEPT RAID 6 and the NetApp calculator claims storage savings (huh? for RAID6??)
Posted by: ex-auspexian | July 23, 2010 at 01:24 PM