This is the blog archive of Mike Richardson, to be used for reference purposes only

July 21, 2010

Does Your Data Wait in Line?

I think many people misunderstand write caching and how it plays a role in storage performance.  I believe this is mostly due to competitors trying to compete with the success of NetApp Flash Cache.

First, A Flash Cache Detour:

I’ve been able to see the great results of Flash Cache myself at local customers.  Below are some real charts from a customer POC that show the benefits of Flash Cache in their environment.  Yes, they went with NetApp after seeing the results.

The first chart represents a NetApp controller with real-world test I/O patterns generated by their application.  Flash Cache was disabled on this controller until near the end of the run, where you can see the disk usage drop dramatically.  Disk utilization prior to that point was nearly 50% on 2TB SATA drives. The green line represents network throughput, while the blue line represents disk throughput.

Customer POC 1 of 2


The below chart is another controller with NetApp Flash Cache enabled throughout the test.  You can see disk throughput is significantly less (0-4%) as over 90% of the I/O is being handled straight out of Flash Cache.  Those are impressive results that are allowing this customer to get some serious distance out of these 2TB SATA drives.

Customer POC 2 of 2

With so many NetApp customers seeing the benefits of Flash Cache (over 1PB sold), it is easy to understand why our competitors are trying to differentiate themselves. One argument that comes up often concerns write caching.  Our competitors frequently argue that we have somehow overlooked write performance in our architecture by only heavily caching reads.

Onto the Details:

We use write cache differently than our competitors do.  Our competitors store writes in cache to act as a buffer during I/O bursts.  The write cache allows I/O acknowledgments to be returned to the host nearly instantaneously, with the data written to disk later. This works well for a time, but eventually the laws of physics catch up: the data stored in cache has to be written to disk, so the disks have to be fast enough to receive the volume of I/O and also agile enough to process the pattern of I/O (random vs. sequential) before the cache fills up.  If the cache fills up, the true performance characteristics of the drives emerge to the clients in the form of latency and retries.
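To make the failure mode concrete, here is a toy Python sketch of a write-back cache acting as a burst buffer in front of disks that can’t keep pace.  All the sizes and rates are hypothetical, not any vendor’s numbers:

```python
# Toy model of a write-back cache used purely as a burst buffer
# (hypothetical sizes and rates, not any vendor's numbers).
# Hosts write faster than the disks can destage; the buffer only
# postpones the moment the drives' true speed is exposed.

def simulate(cache_mb, ingest_mb_s, drain_mb_s, seconds):
    """Return the second at which the cache first fills, or None."""
    used = 0.0
    for t in range(seconds):
        used += ingest_mb_s             # new host writes land in cache
        used -= min(used, drain_mb_s)   # disks destage what they can
        if used >= cache_mb:
            return t                    # cache full: latency hits the hosts
    return None

# 4GB cache, 500 MB/s of host writes, disks absorbing only 300 MB/s
# of this random pattern: a 200 MB/s deficit fills the cache fast.
print(simulate(cache_mb=4096, ingest_mb_s=500, drain_mb_s=300, seconds=600))
```

At a sustained 200 MB/s deficit, the 4GB buffer fills in about 20 seconds; from then on, every host write sees the drives’ real latency.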

The following diagram represents this unfortunate, yet common, scenario with write caching.  The different colored dashes represent I/O from various hosts each writing to their dedicated location on disk.  Because traditional arrays cannot dynamically choose data placement, disk heads are forced to seek all over the platter to serve various client write requests.  Slow writes mean full cache and full cache means client latency.

Legacy I/O Contention

This type of caching only lets your data wait in line before getting to disk.  If you compare it to an amusement park, large write caches mean bigger lines—not faster rides.  NetApp’s approach is to keep the line short and focus on fast rides.

Instead of using a large write cache as an I/O buffer, we use a fraction of the amount of cache our competitors do.  Buffering is not the primary purpose of our write cache; data analysis is.  By storing new writes in cache, our arrays gain the ability to analyze that data, compare it to the current disk structure and dynamically determine the most efficient place to put it on disk.  See, in nearly all environments disks are easily able to handle the throughput requirements of application I/O.  What drives lack is the agility to respond to different I/O patterns (sequential vs. random) with the same level of performance.

For demonstration, I configured a single 3-disk NetApp aggregate (2 parity, 1 data) to show how much random write I/O I could get out of a single 1TB 7200 RPM SATA drive. I used a fully random write workload of 4KB blocks and measured the maximum sustained I/O I could achieve.  A 1TB data disk would normally max out at around 80 IOPS or less with random I/O, but NetApp can dynamically transform the random I/O pattern into a sequential workload, something a 1TB drive can handle far better!  The result is over 4600 random write IOPS with an average response time of 0.4ms.  The best news is that as drives become larger and their areal density increases, more and more sequential IOPS are possible.  So, while our competitors will struggle more and more with the random I/O capabilities of large SATA disks, NetApp customers will benefit from the sequential I/O enhancements!
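The ~80 IOPS ceiling falls straight out of drive mechanics, and a quick back-of-envelope calculation shows why sequentializing helps so much.  The seek time and transfer rate below are typical published SATA figures I’m assuming for illustration, not measurements from this test:

```python
# Back-of-envelope disk math with assumed (typical, not measured)
# 7.2K RPM SATA figures: random IOPS are capped by seek + rotational
# latency, sequential IOPS by the media transfer rate.

seek_ms = 8.5                         # assumed average seek time
rotational_ms = 60_000 / 7200 / 2     # half a revolution at 7200 RPM, ~4.2ms
random_iops = 1000 / (seek_ms + rotational_ms)

transfer_mb_s = 100                   # assumed sustained sequential rate
block_kb = 4
sequential_iops = transfer_mb_s * 1024 / block_kb

print(round(random_iops), round(sequential_iops))  # ~79 vs 25600
```

The measured 4600 IOPS lands between the two bounds, since a real workload never converts perfectly to sequential.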

1TB disk layout

1TB Disk 4600 IOPS 

The below diagram is an example of how we can cache I/O for a short period of time, analyze the best placement on disk and then place that data as sequentially as possible on disk.  This minimizes head seek time and allows us to get orders of magnitude more write IOPS out of SATA disk than our competitors can even get out of 15K RPM FC drives.

NetApp I/O Optimization
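The placement idea can be sketched as a tiny log-structured allocator.  This is an illustration of the general technique only, not WAFL’s actual implementation; every name here is made up:

```python
# Minimal log-structured write allocator: buffer random block writes,
# then destage them as one contiguous run at the current write frontier,
# remapping logical block -> physical block. Illustrative only.

class LogStructuredWriter:
    def __init__(self):
        self.frontier = 0      # next free physical block
        self.block_map = {}    # logical block -> physical block
        self.pending = []      # buffered (logical_block, data) writes

    def write(self, logical_block, data):
        self.pending.append((logical_block, data))  # cheap: just buffered

    def destage(self):
        """Lay all buffered writes down sequentially in one pass."""
        run = []
        for logical, data in self.pending:
            self.block_map[logical] = self.frontier
            run.append((self.frontier, data))
            self.frontier += 1
        self.pending.clear()
        return run             # one contiguous run: essentially one seek

w = LogStructuredWriter()
for lba in (9001, 17, 52340, 3):          # scattered logical addresses
    w.write(lba, b"x")
physical = [pba for pba, _ in w.destage()]
print(physical)  # [0, 1, 2, 3]: contiguous despite the random targets
```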

What Does This Mean To You:

As datacenters become extremely dense through virtualization technologies and consolidation practices, storage arrays will need to accommodate not just the additional throughput requirements of thousands of virtual host systems, but also to respond to hundreds of differing I/O patterns.  The scalability of NetApp Flash Cache, along with the flexibility of data placement inherent in Data ONTAP, allows NetApp customers to get far better performance and resource utilization with low-cost SATA disks than with more costly, competitive solutions.

So, which do you prefer: Long lines or fast rides?

June 25, 2010

Everyone Has a Tandy - The Human Mind and the Evolution of Information Technology

I think sometimes we get so caught up in how fast technology is changing and evolving that we lose track of the bigger picture of life as humans and how things fit together in our world.

There are important parallels in our world between how humans are designed for maximum efficiency and how technology is, or can be, in a similar fashion.  I think it is necessary to understand these parallels to appropriately assess current technology design and accurately architect improvements.

First, I think it’s important to realize that both technology and the human body must ultimately be designed to work with the human mind.  When you consider everything the human body does to ensure proper operation, it is beyond amazing.  Many systems within the body operate with limited conscious thought or none at all—controlling heart rate, cellular food processing and waste removal, the flow and oxygenation of the blood and even the storing and processing of information.  We are limited by our conscious thought.  There are only so many conscious “threads” we can process at one time, and so, to prevent our conscious mind from getting overwhelmed, nearly all of the most important functions of our body must be delegated to automated systems.  Some of us can’t even pat our head and rub our stomach at the same time…imagine if you had to consciously control your heart rate or it would fail.  Most of us wouldn’t survive a day, and we wouldn’t be much use to anyone around us during that time, as keeping our heart beating would demand nearly all of our attention.

Now let’s shift gears and consider how far we have come with information technology.  Consider one of the most primitive, yet advanced for its time, forms of long distance information technology—the telegraph.  The most widely known Morse telegraph was invented in 1837 and used Morse code to transmit and receive information.  Even the fastest telegraph operators still transmitted at well under 100WPM.  What if the internet still required telegraph operators to transfer information?  It would be impossible.  Processing this type of “RAW” information quickly is just too much for the capabilities of our consciousness.  It’s a human limitation.  In a similar fashion, imagine if you had to do the parity calculations of your RAID groups yourself before they could be written to disk. SSDs would not help you there!

The truly amazing thing about our technological change over the last hundred years or so is not how much technology has advanced, but what hasn’t changed.  The human mind’s processing capability is largely the same as it was thousands of years ago. Yet we are able to use technology—almost as an extension of our bodies—to work around the limitations of our minds and process, share and store (literally) mind-boggling amounts of information at incredible speeds.

The mind is the ultimate “Application Programming Interface (API),” the one against which all technology must ultimately be programmed.  As time advances, we don’t have the luxury of “upgrading” our mental processing capability or the information storage capacity of our minds.  Imagine if all the software we use today still had to be written to accommodate the hardware limitations of the Tandy 1000 or the Commodore 64.  Software would be designed a lot differently!  Yet, this is how we have to approach software design as it interacts with the human mind.

I think there are three main parallels we can draw between the human body and Information Technology.

Information Storage

This is the most important, as in order for any kind of decision to be made, information is required.  Occasionally, I’m so zoned into watching TV that I am completely oblivious to my wife speaking to me.  Afterwards, I have a fleeting thought that she may have said something, but with no recollection of what it was, I have to guess at a response.  This “guess” usually lands me in hot water.  This is how life would always be if we couldn’t store information. 

How long should information be stored?  It varies widely depending on the information.  Generally, though, it should be stored for as long as it is needed to make decisions.  Your tax statements may be needed for years, anniversary dates for a lifetime.  However, the information on the trajectory of a baseball flying at you is only needed for so long as to decide where to place your glove—a second or two at most—then it can be discarded.

While the human mind can store a large amount of information, the vast majority of what we store is no longer retrievable after about 18 seconds—the typical duration of short-term memory.  I recently had to remember where I put our passports after a trip we took nearly 2 years ago.  Needless to say, the task was impossible and, although I did find them, it was due more to searching than recollection.

CAPACITY: To improve the storage capacity of our minds, we have to offload that capacity to technology.  Technology, when properly implemented, can store near-limitless amounts of information in a way that can almost guarantee no data-loss.  Storage technology acts as an extension of the mental storage capacity of our minds.  Even better, though, the massive capacity of storage technology allows information from many human minds to be stored collectively, in one location.  Just like the BORG.

STORAGE: The trouble with information is that it is bound by time.  Information exists only in the present unless recorded to be accessed in the future.  So, the speed at which we can store information is crucial.  We could meet the capacity requirements of information by writing everything with pen and paper—and did so for centuries—however, information is now generated so quickly that it could not be recorded fast enough through past means.  This is why much of world history is lost forever.  Imagine writing out the TB or PB of information you store in your datacenter on paper.  It would take more than your lifetime.  If we cannot record information in the small instant in time it exists, it will forever be lost.  Storage technology allows the human mind to reliably store GB, TB, even PB of information in real time on storage media.

RETRIEVAL:  Loss of information can have serious consequences.  Imagine forgetting where you live, your wife’s name or how many kids you have.  I’m reminded of the Seinfeld episode where a car rental company doesn’t honor Jerry’s car reservation.  The classic line is something like “anyone can just take the reservation, the important thing is holding the reservation!”

Likewise, anyone can store information; the important thing is retrieving it. Thank goodness our minds are incredible enough to store loads of important information in long-term memory.  However, even long-term memory is not very reliable—leading us to forget things often—and, even worse, information in our memory dies with us unless recorded elsewhere.  Speed of retrieval is just as crucial.  What good is having information stored if you can’t access it by the time you need it?  Information is only valuable before making the decision that requires it. Not being able to remember your wife’s birthday until the day after has the same consequences as forgetting it altogether.  Storage technology allows us to not only reliably store centuries’ worth of information in a way that maintains data integrity far better than our minds can, but it also gives us the capability to retrieve large amounts of that data at the moment it is needed, greatly improving the accuracy of the decisions we make.

Information Presentation

Having information available quickly does no good if you can’t interpret it in time.  Our consciousness has limited processing capability and so, to compensate, our body presents information in a way our consciousness can handle.  Likewise, storage technologies must manage the benefits of being able to store and retrieve large amounts of information by presenting it in a way our limited minds can still understand.

FILTRATION:  To aid in the management of information, the mind filters out information that is less important.  Consider your sense of touch.  In a restful state, you are largely oblivious to the sense of touch on your skin.  For example, look at a hair on your arm.  Can you feel the presence of that hair while nothing is touching it?  Most likely not, because its condition is irrelevant and therefore, the “status” of that hair is filtered from your consciousness.  Now tug on that same hair and your mind immediately makes you aware of it.  Being aware of the constant condition of all the skin sensors in your body at all times would be overwhelming.  In contrast, rare conditions that leave a person without the ability to recognize pain are extremely dangerous to one’s safety. The body aids the mind by suppressing irrelevant information to prevent overload, while ensuring crucial information, such as pain signals, gets top priority and can even interrupt conscious thought.  Similarly, information technologies allow us to quickly filter through PBs of irrelevant information to provide us the important information we need, while interrupting us when crucial information becomes available that we must be made aware of and react to immediately.

FORMATTING:  Sometimes even after information is filtered there is still too much for our minds to easily process.  Our eyes, along with our mind, take the raw information of shapes and colors and present our consciousness with clear pictures of our surroundings that we can use to make quick decisions.  Today I received a 7MB perfmon output from a colleague looking for assistance decoding the several-hundred-column spreadsheet.  I couldn’t make out any comprehensible conclusion just looking at the raw text, until I formatted it into a few graphs. Likewise, information technology is able to format raw data into forms such as charts, graphs and images that allow our minds to much more easily process this “organized” information. Without the impressive formatting capability of information technology, our minds would choke on the details long before we ever saw the big picture.

Decision Delegation

The final and most brilliant aspect of the human body is found in decision delegation. Even after filtering and formatting pertinent information, the human body is still faced with far too many decisions for our conscious minds to respond to.  Through the process of delegation, we allow our bodies to process information and make decisions in real time, without requiring input from our consciousness.  Take reflexes, for example.  Reflexes allow our muscles to involuntarily move our bodies out of danger in a fraction of the time it would take our minds to acknowledge, process and respond to the same stimuli.  Similarly, the body has dozens of regulatory systems that we have little awareness of.  The circulatory, lymphatic, endocrine and digestive systems are all examples.  I believe that these near-perfectly automated systems are intentionally walled off from our less-than-reliable consciousness for our own safety.  I can’t even reliably take out the trash; I’m glad I’m not responsible for my own digestive system!

With information technology we must mimic the body’s model and strive for the same ultimate realization.  We have far surpassed the body’s ability to store, share and present information to our conscious minds.  Yet we are left with so many decisions just to manage what we have created that we are at a saturation point.  Only the delegation of decisions to automated systems, as the body does, can ease the tension on our limited minds and allow us to move forward.  Like our body’s predictable behavior, technology can be automated in a way that improves quality by removing the variations that come with human interaction. The good news is that the automation has already begun.  Technologies that detect data loss and automatically repair it from parity, or monitor and adjust performance needs on the fly, or identify loss of data availability and switch to redundant infrastructure are already in place. The more complex a system becomes, the greater the need for automation.  Through information technology we are able to manage the automation with far less effort than would be required to manage each system.

Unlike our bodies, we do have the ability to establish policies in automation.  This gives us incredible control in interpreting and reacting to information with very little thought required.  Through automation policies, we can program how we would respond to stimuli into the automated systems and then allow the systems to decide for us in the future.
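As a sketch of what such policy-driven automation looks like in code (every policy name, metric and action here is hypothetical), the idea is simply to encode the response to a condition once and let the system decide from then on:

```python
# Toy policy engine (all policy names, metrics and actions are made up):
# encode the response to a condition once, then let the system decide.

policies = [
    # (policy name, condition over metrics, action to trigger)
    ("rebuild-from-parity", lambda m: m["bad_blocks"] > 0,  "repair"),
    ("rebalance-load",      lambda m: m["latency_ms"] > 20, "migrate-hot-data"),
    ("failover",            lambda m: not m["path_alive"],  "switch-path"),
]

def evaluate(metrics):
    """Return every action whose policy condition currently holds."""
    return [action for _name, cond, action in policies if cond(metrics)]

# Bad blocks detected, but latency is fine and paths are alive:
print(evaluate({"bad_blocks": 2, "latency_ms": 5, "path_alive": True}))
# ['repair']
```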

I believe policy-driven automation will play a significant role in our future for both IT organizations and human life.  We must continue to offload information processing and decision making from our limited consciousness into the limitless resources of our technology infrastructures.  I challenge you to take a look at your infrastructure after reading this; I’m sure you will be able to draw many parallels between how your body works and the elements found in your datacenter.  I hope this blog will challenge the way you think and help you approach technology from now on with a more automated mindset.

May 25, 2010

Playing to Lose, Hoping to Win: EMC’s Latest Guarantee (Part 3)

With this post I’m wrapping up my analysis of EMC’s latest guarantee.  I spend nearly every day delivering unbelievable storage efficiency savings to new customers, most of which—at least in my neck of the woods—are EMC converts. The post-sales feedback I receive from customers is generally very positive, which is also consistent with the industry surveys I’ve seen—so it is not just my territory.

I have to admit, when I first heard of the guarantee, I thought it couldn’t be true.  EMC has traditionally evaded challenges based on real figures and instead has chosen to play in the fantasy land of marketing.  EMC still refuses to submit industry standard benchmarks and I’ve never heard of them doing a right-of-return.  To see a guarantee that must be backed by real numbers, in the real world, caught me by surprise.  I’m a numbers guy and I know the numbers don’t add up.

When numbers don’t add up I have to find out why.  At first, I was hoping that there were some additional details to their very vague announcement—something that would give me some real numbers. No dice.  I was starting to become suspicious until Christopher Kusek’s original post on the guarantee, where he seemed to “compare” NetApp and EMC’s savings.  If I had any doubt this offering was targeted at NetApp, Christopher’s blog erased it.  As much as I’m starting to like Christopher, his numbers were way off in the original post and have since gotten only slightly better.  I used both the EMC calculator and the NetApp internal calculator in Part 2 of this series to accurately compare two real-world configs and demonstrate that the RAW-to-usable conversion is a wash and the real comparison should be in the virtualization layer.  As I expected, once the real numbers were out, EMC quieted down.

So, if the numbers don’t add up, how can they offer such a guarantee?  This is where I realized I was thinking about it in the wrong way.  See, I now suspect EMC knows they cannot win in the numbers with this guarantee.  But, they also realize they are losing to NetApp without it.  The strategy?  Offer a marginal guarantee to compete directly with NetApp’s efficiency and the NetApp 50% and 35% Efficiency Guarantee.

Let’s look at a few reasons why I believe this is the case:

  • Details are Very Limited: EMC touts the lack of detail in their guarantee as a statement of their confidence in the numbers.  Perhaps.  But, business decisions must be calculated; legal makes sure this happens to reduce risk.  If you open yourself to true potential liability, legal wants to make sure the bases are covered in the details.  If you approach legal with a 20% discount program disguised as a guarantee, the calculated loss is already there.  I don’t think EMC needs the details, because the details wouldn’t help their case in a dispute.  I think they are planning to give up the storage when challenged regardless.
  • SAN and NAS Required: The only real details/requirement listed in the announcement is that 20% of SAN and NAS are required.  Why would both be required?  Maybe both are required to meet the efficiency numbers? Or, perhaps requiring both protocols adds to the initial cost and helps protect the profits when they have to give up some storage later? Furthermore, why not offer the guarantee on all platforms, including the CX, the V-MAX and even the V-Plex?
  • Storage Growth is Explosive: With storage growing at 30-60% year over year for many companies, winning the initial business (even if you have to give away some storage) still lets you win big in the long run.  When competing with NetApp, EMC already has to give away storage in the form of steep discounts to protect future business.  This could be just a new and more sophisticated way of doing it.

When you think about it, EMC is still a great marketing company and this announcement reinforces that.  They can now say “We have a guarantee too!” when competing against NetApp, and I’m sure they will use it to their full advantage.  They’ve done a good job of limiting the guarantee to 20% on one platform and ensuring customers buy both NAS and SAN to protect their profits.  It is a good, calculated business decision.

With all that said, I could be wrong and just creating my own chaos theories. However, if I’m right (and I do see the numbers every day), you could be misled into storage strategies that—while looking good initially—cost you significantly more than you planned in the future.  EMC won’t keep giving you storage for long.  How do you protect yourself and make sure that EMC’s guarantee is more than just a marketing gimmick?  You may consider asking them to commit to a right-of-return for your specific configuration and let you return the gear if they don’t meet your criteria.  From my experience, NetApp will do whatever it takes to make you comfortable with our solutions and our claims in your environment before, during and after the sale.

May 24, 2010

Playing to Lose, Hoping to Win: EMC’s Latest Guarantee (Part 2)

In this post, I’m continuing my analysis of EMC’s latest guarantee and Christopher Kusek’s blog with its deceptively inaccurate information.

First, some kudos to Christopher.  Christopher has finally updated his blog to remove the 14+1 Raidgroup config and note that the EMC Capacity Calculator and the NetApp Storage Efficiency Calculator can’t currently be compared to each other.  I applaud these steps in the right direction and hereby restore some of Christopher’s credibility :).

More work still needs to be done as the NetApp config he created with his own recollection isn’t quite accurate (it is not horribly wrong though) and the (2) 12+2 raidgroup config he compared it to on the EMC NS doesn’t seem to be possible in the real world.  I’m a realistic guy though, and despite all my efforts here, I don’t expect Christopher to come right out and say EMC is wrong about their claims in the guarantee.  So, I’ll have to take his silence after the facts are presented as the closest possible alternative to agreement.

First, a little guarantee of my own.  You can expect to find 50% less SPIN on this blog than on the comparable competitor’s blog :)!

My objective here is to compare real-world NetApp and EMC configs to clearly show the RAW-to-usable conversion.  I’ve been at NetApp for nearly 5 years and configured NetApp storage prior to joining.  I’ve worked with dozens of customers on hundreds of projects with several PB of total storage configured.  I’m solidly qualified to discuss NetApp best practices from a field perspective and what I recommend to customers every day.

With all that said, my opinion, here, is that this post in particular is largely a waste of time for the following reasons.

  • Storage efficiency does not happen at the raidgroup level; it happens at the virtualization level.  The process of building raidgroups is largely unchanged since even the DASD days.  I’m not sure why EMC is directing people to a raidgroup calculator to demonstrate storage efficiency savings…is their efficiency story that bad?  What are customers supposed to say after they create a raidgroup…“wow, that is so efficient!”?  Customers need more, and this discussion they started is a diversion from that.
  • NetApp and EMC both use raidgroups to create the containers for our storage.  The RAW-to-usable capacity can’t be that much different.  Sure, there is a 10% overhead for NetApp virtualization, but we make up for that with larger raidgroup sizes that EMC doesn’t yet recommend.  And, with dedup alone delivering savings of 70%, 80% and even 90+%, does a 10% overhead really matter?
  • EMC’s guarantee isn’t about actual savings or efficiency.  It is a marketing mechanic I’ll explain in the 3rd part of this series.
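The arithmetic behind that second bullet is worth spelling out.  Using the round numbers from the text (10% overhead, 70% dedup savings) on a hypothetical 100TB of raw capacity:

```python
# The bullet's arithmetic with round numbers: a 10% virtualization
# overhead versus dedup savings at the low end of the cited range.
# The 100TB raw starting point is hypothetical.

raw_tb = 100
virt_overhead = 0.10     # virtualization layer overhead
dedup_savings = 0.70     # conservative end of the 70-90% cited

usable_tb = round(raw_tb * (1 - virt_overhead), 1)        # addressable space
effective_tb = round(usable_tb / (1 - dedup_savings), 1)  # logical data stored

print(usable_tb, effective_tb)  # 90.0 300.0
```

Even at the conservative end of the dedup range, the 10% overhead is paid back many times over.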

Onto the details…

Ok, so I’ve attempted to configure 2 separate NS-480 configs with a 50/50 split between NAS and FCP.  I then built like configs in the NetApp world based off our best practices and what I recommend to my customers every day.  The NetApp config is the same, as we mostly recommend the same general aggr best practices regardless of disk failure protection or performance.

This is an EMC promotion with NO caveats.  So, they must compete with real-world customer configurations, as they will when they actually market their configs against us.  EMC didn’t guarantee 20% savings over the EMC-proposed NetApp config, but over NetApp real-world configs.

Besides, they even went so far as to say “Even though our competitors try to match it by altering their systems, turning off options, changing defaults or tweaking configurations—no amount of adjustments can counter the EMC unified storage advantage.”  So theoretically, despite all my efforts, I should not be able to match EMC’s RAW-to-usable numbers.  So beware: if I succeed in this impossible task, I may destabilize the universe as we know it!

EMC Configuration A: 4+1 RAID 5

EMC configs vary widely.  Many customers still use RAID 10, and tell me EMC recommends it for many workloads.  Obviously comparing RAID 10 to NetApp RAID-DP, while true in the real world, would generate too much ranting from the critics.  Instead, while working with an EMC customer this week (they are migrating a portion of their data to NetApp), I asked them what their best practice RAID group configuration was.  This customer uses all 4+1 RAID 5 raidgroups, which I consider to be probably the most typical configuration.  I even found several EMC reference architectures that use 4+1.

I used 450GB FC drives and pretty much filled up one cabinet of the calculator with 4+1s and adequate spares.  I had enough space left over for one final 3+1.  Here is the raidgroup layout.  I’m not an expert at configuring EMC, so please feel free to comment if this is a bad 4+1 config.

I also left off snapshots, as configuring some of the usable space as snapshot reserve has the same effect on both EMC and NetApp.  One thing I noticed was that you have to configure NAS and FCP storage separately.  I wonder if you can’t repurpose NAS to FCP and vice versa on the fly as you can with NetApp.  You will notice on the NetApp config that we just configure capacity, not NAS capacity or SAN capacity, as it can be used for both.


Here are the results from the EMC capacity calculator.  I started with 165 total disks, configured EMC’s best practice of 6 hot spares as the calculator required, and was left with 24.86TB of NAS space and 24.77TB of SAN space.  The grand total of usable capacity for this configuration was 49.63TB.  Again, please comment if this is not a typical config.

NetApp Configuration A: 20+2 RAID DP

Remember that we are comparing real-world configs here, not the config EMC would like to compare against.  I regularly recommend configurations like the below in production.  However, there are differences between the configs and I’ll get to those after I present the config. 

First, the NetApp calculator doesn’t allow me to specify the exact number of disks; I have to configure whole shelves.  To get at least 165 disks in the calculator, I had to configure (7) 24-disk shelves for a total of 168 disks.  NetApp normally recommends 2 spares for the first 84 disks on each controller and 1 spare for each 84 disks after that.  168 disks between 2 controllers comes to a total of 4 spares.  However, to make up for the fact that we have an additional 3 disks (168 compared to 165), I configured an additional 3 disks as spares so they wouldn’t be counted towards the usable capacity.  So, 4 spares plus 3 un-needed spares comes to a total of 7 spares.

The tool I’m using is an internal capacity calculator available to NetApp employees and partners.  It is the same tool we use to configure usable capacity in the field.

The configuration I recommend is to the left.  With 450GB FC drives, the maximum drive count you can have in a 32-bit aggr is 44.  This divides evenly into 2 raidgroups of 20+2.  I am usually comfortable recommending raidgroup sizes between 16 and 22, although NetApp supports FC raidgroup sizes up to 28 disks.  Starting with the same number of total disks (168 – 3 un-needed spares), the remaining disks are split into 8 RAID-DP raidgroups. After subtracting an additional 138GB for the root volumes, the total usable capacity for either NAS or SAN is just under 52TB.
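To double-check the bookkeeping in both Config A writeups, here is the disk-count math.  The EMC raidgroup count is approximated as 31 full 4+1 groups (the text notes the last group is really a 3+1, so the true data-disk figure is one lower); the per-disk usable capacity itself depends on right-sizing, so only the disk counts are compared:

```python
# Disk-count bookkeeping for the two Config A layouts described above.
# EMC's raidgroup count is approximated as 31 full 4+1 groups; the last
# group is really a 3+1, so the true data count is one lower.

def data_disks(total, spares, raidgroups, parity_per_rg):
    """Disks left to hold data after spares and parity are set aside."""
    return total - spares - raidgroups * parity_per_rg

# EMC Config A: 165 disks, 6 hot spares, ~31 raidgroups of 4+1 RAID 5
emc = data_disks(total=165, spares=6, raidgroups=31, parity_per_rg=1)

# NetApp Config A: 168 disks, 7 spares, 8 raidgroups of RAID-DP (2 parity)
netapp = data_disks(total=168, spares=7, raidgroups=8, parity_per_rg=2)

print(emc, netapp)  # 128 145: larger raidgroups amortize parity better,
                    # even with double parity on every group
```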

Option A Results

EMC 50TB usable

NetApp 52TB usable (2TB or 4% more)

Ok, so the examples above differ in a few ways.  If you are an EMC employee, you are probably saying they are not fair because we compared RAID 5 to much larger RAID 6 raidgroups.  This lends a capacity advantage to NetApp.  True, but these are both real-world configs.  I would never recommend smaller raidgroups or a lesser RAID type (RAID 4) for performance or availability reasons.  EMC, however, doesn’t recommend RAID 6 for all configurations, especially for random write workloads.

If you are a NetApp employee, you are probably saying they are not fair because the NetApp config offers a higher degree of data protection, through double disk failure resiliency, than the EMC config.  True, but again, the 4+1 config is commonly what EMC recommends, and we are talking about storage efficiency, not resiliency.

Alas, to satisfy both arguments, we can combine both requirements into the config.  Larger RGs are allowed to help the capacity numbers for EMC (regardless of performance needs), and only capacity protected against double disk failure can be considered usable.

EMC Configuration B: 12+2 RAID 6


For this config, I used the largest RG size I could to reduce the number of parity drives.  The weirdness of the config, though, left 1 empty drive slot in the top 6 trays.  We aren’t considering wasted real estate here, only RAW to usable space, so the empty slots don’t come into play.  This dropped the total drive count to 159, with 6 spares.  Since the first 5 disks (1.27TB of usable capacity) must be used for vault, and, as far as I could configure them, they must be RAID 5, they won’t count toward the usable space.  The remaining 6+2 on the first tray is RAID 6 and will be counted.  My customer tells me that EMC doesn’t really recommend you put real data on the vault drives anyway.  As before, I configured the appropriate number of hot spares the tool recommended.


To the right is the capacity breakout.  The total usable capacity is only slightly more, but that is because the total number of drives decreased as well.  When we subtract the 1.27TB of usable capacity in the vault raidgroup, which is not protected against double disk failure, the total usable capacity comes to 49.54TB.

NetApp Configuration B: 20+2 RAID DP

The NetApp configuration for option B is largely the same; the only thing I changed was to allocate a total of 9 unneeded spares to make the RAW drive count equal.  This brought the total spare count to 13 (4 + 9 unneeded spares).

The configuration is listed to the left.  After subtracting the root volume space of 138GB, we are left with a total usable capacity for SAN or NAS of about 49.7TB.

Option B Results

EMC 49.54TB usable

NetApp 49.7TB usable (about 160GB or 0.3% more)

You see, either way, the usable-to-RAW comparison comes out to be a wash.  I could make some very strong arguments about the performance and data protection benefits of using only NetApp RAID DP.  There is some efficiency gained there, because it is an innovative approach that is still superior to RAID 6 in many ways, even today.  But that is not the purpose of this post.

I’m sure there will be some questions from Christopher and others, but I hope this helps move the discussion on from the same RAW to usable comparison that we were competing on a decade ago and start talking about what customers are really interested in.   If you want to compete with us on our level, stop sending your customers to build their own raidgroups with your raidgroup efficiency calculator and start talking about the big picture, the complete solutions and how your innovations address business needs.  That is what NetApp does, and our customers love it!

May 20, 2010

Playing to Lose, Hoping to Win: EMC’s Latest Guarantee (Part 1)

This is a hard blog for me to write, as I like to try to give people the benefit of the doubt.  Sometimes individuals make mistakes or misread numbers and come up with inaccurate results.  However, Christopher Kusek’s latest blog on EMC’s 20% guarantee program is so full of misrepresented information, cooked numbers, and seriously reckless configurations that I can’t help but believe it was an intentional attempt to mislead customers.

I do really hope that I’m mistaken about his motives and that perhaps Chris, after reviewing this information, can post an update clarifying his mistakes and setting the record straight.

Chris claimed in his post that a configuration requiring 135TB of usable capacity would require over 350TB of RAW storage on NetApp whereas EMC would only require between 184TB and 206TB. 

These numbers are so outlandish that anyone who has spent time comparing NetApp vs. EMC solutions, especially someone who has worked at both EMC and NetApp, should quickly come to the realization that these numbers are clearly not right and need further investigation.

Chris’s first mistake was the choice of comparison tools.  To gather the NetApp RAW storage required, he used the NetApp Storage Efficiency Calculator.  For the EMC numbers, he used the EMC Capacity Calculator.

The NetApp Storage Efficiency Calculator is a tool we recommend customers use to identify how various NetApp technologies can save storage in a real-world model environment.  In order to most accurately model these savings, customer waste in the form of over-provisioning has to be taken into account.  We call this customer-generated waste “Overprovisioned Storage” in the tool, and it can be easily identified by hovering over the dark grey section of the column.  We separate this storage out so that customers can see the effects of enabling thin provisioning to combat overprovisioning.

The EMC Capacity Calculator tool is really just a visual raid-group builder, nothing more, and doesn’t take into account customer behaviors such as over-provisioning.  Chris did not take this into consideration.

To get a better idea of the RAW storage requirements needed by NetApp, the Overprovisioned Storage should have been subtracted from the model as the EMC Capacity Calculator doesn’t factor this in. Leaving all other settings the same and only properly subtracting the over-provisioned storage, the new results are clear.

Configuration         NetApp RAW   NetApp Usable   RAW v Usable %   EMC RAW   EMC Usable   RAW v Usable %   Difference
Default Checkboxes    139.2        135             97%              206       135          66%              -31%
Uncheck Thin/Dedup    184.5        135             73%              206       135          66%              -8%
Uncheck Snaps         157.5        135             86%              206       150          73%              -13%
Default Checkboxes    144.2        135             94%              184       137          74%              -19%
Uncheck Thin/Dedup    184.5        135             73%              184       137          74%              1%
Uncheck RAID6/Snaps   154.5        135             87%              184       152          83%              -5%

Even without addressing the issues in the reckless EMC configs (which I will do below), NetApp is still at least equal to or better than EMC in default storage utilization.  Once you enable deduplication, the model shows a swing of 19% to 31% in NetApp’s favor with these data types.  Savings in virtual environments are much higher, but aren't being compared here.
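The RAW v Usable percentages in the table are just simple ratios.  A quick spot check of the first row:

```python
def raw_to_usable_pct(usable_tb, raw_tb):
    """Usable capacity as a whole-number percentage of RAW."""
    return round(100 * usable_tb / raw_tb)

# First table row: NetApp 135TB usable from 139.2TB RAW,
# EMC 135TB usable from 206TB RAW
netapp_pct = raw_to_usable_pct(135, 139.2)   # 97
emc_pct = raw_to_usable_pct(135, 206)        # 66
difference = emc_pct - netapp_pct            # -31, matching the table
```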

Now, although it is very unlikely, I suppose the above oversight could have been made in haste. I’m sure Chris can clear up his confusion on the numbers after doing some more research.

What I found to be most disturbing about the post, though, was the reckless configurations used in an attempt to drive up the storage efficiency numbers of the EMC storage.  Chris not only recommended using 14+1 RAID 5 raidgroup sizes, something unheard of in production environments, but he also recommended that extreme raidgroup size with 600GB FC drives.  This type of configuration, when deployed in a real-world environment, exposes the customer to extreme and unnecessary risk of data loss due to double disk failures.  NetApp would NEVER recommend this type of configuration.

When attempting to reconcile the inadequate RAID protection of the EMC numbers with the superior, high-performance double disk failure protection that is both the default and a best practice for NetApp solutions, Chris opted to downgrade the NetApp protection to RAID 4, rather than bringing the EMC protection up to an adequate level.  It appears this was in an effort to avoid the additional efficiency penalty that EMC RAID 6 incurs when chosen.

Again, I want to give Chris the benefit of the doubt, but the facts don’t look good here. I do hope Chris will come back with an informed, real-world, production ready configuration that we can compare without all these shenanigans.

I’ll continue this discussion in Part 2, where I will share what I believe to be EMC’s true motives for this guarantee program which, as you can see above, has no technical merit to actually beat NetApp’s storage efficiency.

May 01, 2010

Large Scale Data Protection

A recent blog post by Curtis Preston on backing up a 300TB DB has sparked a flurry of responses and tweets.  Curtis has a customer with an EMC V-Max that is unable to back up a 300TB database.  He asked for help to determine if it can be done.  The answer, Curtis, is a resounding “Yes, with NetApp!”

I blogged on this type of problem and the solution back in 2008 and most NetApp customers listened—even many EMC customers listened and switched to NetApp :).  But alas, EMC still hasn’t gotten the message. Chuck is still hung up on making copies for backups.  I suppose you can’t blame Chuck.  EMC has a veritable potpourri of “backup” products they need to sell.  Encouraging customers to go to a much simpler and more scalable in-place data protection strategy would obsolete much of their portfolio. And, it seems even with their latest technology, V-Max still has trouble taking snapshots. However, this blog isn’t (just) about exposing holes in EMC’s portfolio, it’s about helping customers with real problems.

The same snapshot technology that customers use in all of our arrays to protect all types of data on all storage protocols can be implemented on the smallest dataset and scale to the largest multi-PB datasets.  It is easily managed as a simple solution, with a wide range of application integration (SnapManager) products that bring added business value to an in-place snapshot backup strategy, and with centralized management and automation through NetApp Protection Manager.

As I stated in my blog in 2008, NetApp Snapshots protect data in place and can quickly and easily be restored in place, eliminating the need for old-fashioned copy-based backup technologies (sorry, EMC).

It is the best of all worlds: improving RTO and RPO, eliminating host overhead and minimizing infrastructure bandwidth.  It also stores backups in a native format that can easily be read for granular, record/tablespace-level restores or used for reporting.  The native format and NetApp FlexClones also make disaster recovery tests a breeze.  No more shipping tapes and waiting days for data to be copied back to a primary tier just to validate it.

Now to the details:

The specific solution I designed and implemented was to protect a 300+TB Oracle database with 1-2TB of daily change and to allow for future growth to over 1PB.  The database has since nearly doubled in size and the data protection (snapshot time) still takes around 10 seconds.  After seeing how this scales in a customer environment up close, and knowing in-depth how our snapshot and consistency group technology works, I’m extremely confident this type of solution can scale to multi-PB and beyond very easily.  There are many other similar and larger solutions other NetApp consultants have worked on.  My customers are extremely happy with the NetApp snapshot solution they have, no matter the scale.

I can’t give any details about specific customer designs, but I can speak in generalities about what makes this type of solution scale well and some things I’ve found that help when designing large scale data-protection solutions with Oracle.

  • Typically, customers are going to choose RAC for resiliency and scalability.
  • Snapshots work the same regardless of protocol (NFS, FC, iSCSI, FCOE).
  • ASM is not required, but it has several benefits.  First, large DBs most likely will not fit on a single array; they will be spread across a farm of arrays.  ASM works extremely well in distributing I/O load across all controllers.  Consequently, it also distributes change rate well across all controllers and makes LUN additions, removals and migrations very easy.  Rebalances can cause short periods of increased change rate and I/O activity, so be aware and plan for this.
  • Archive log mode and hot backups are preferred.  Both Oracle and NetApp recommend using archive logging and performing hot backups to make snapshots cleaner and recovery easier.  In some cases, these large DBs have so much log activity that database performance suffers too greatly with archive logging, so it is disabled.  In this case, cold (offline) snapshots are preferred, but, if necessary, we can also work with you to develop a solution for online, clean, crash-consistent snapshots.  These snapshots are taken when not in backup mode with no archive logging, yet can be cleanly recovered as point-in-time images using Oracle crash and instance recovery mechanics.  Again, hot or cold backups are preferred, if possible.
  • Consistency group snapshots are key.  NetApp arrays can create efficient, coordinated snapshots across multiple volumes and arrays for applications, like databases, that honor dependent writes.  Consistency group snapshots briefly fence I/O while snapshots are occurring to ensure consistency.  NetApp arrays also process snapshots along atomic block boundaries up to 64K, so we prevent fracturing an Oracle block during snapshots, ensuring database consistency.
  • Asynchronous or synchronous replication copies the incremental block changes and latest snapshot images from the primary storage tier to a secondary tier.  The secondary tier can be used for FlexClone DR tests, granular restores or reporting.  FlexClones don’t use any additional space for read activity, but they consume space as changes are made to them.
  • Larger controllers such as FAS6080s will most likely be recommended because of their ability to scale in both capacity and performance, but the solution can work just as easily across a larger number of smaller controllers.  Consistency group snapshots are created in parallel, across all controllers, so there is no single bottleneck in performing the snapshots that require more horsepower.
  • RAID-DP (high-performance RAID 6) is preferred.  Even when protected with a secondary tier, a double disk failure on the primary storage would be an unpleasant event.  With RAID-DP, you are protected against double disk failure without performance impact.
  • The V-Series solution works the same way.  So you can reuse that storage you already purchased without having to buy new spindles.  Or, you can use a combination of V-series and NetApp disk, even on the same V-Series controller.  Drive performance characteristics need to be similar.
  • NetApp Professional Services can help you get there.  We can not only design a solution that meets your needs, but we can also help you migrate to it.  Please use our expertise if you go down this path.
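To illustrate the consistency group idea from the list above, here is a toy model in Python.  This is only a sketch of the concept, not NetApp’s actual implementation: I/O is fenced on every volume first, all images are captured, then I/O resumes, so each volume’s snapshot reflects the same point in time.

```python
import copy

class Volume:
    def __init__(self):
        self.blocks = {}      # live data: block address -> contents
        self.snapshots = []   # point-in-time images
        self.fenced = False

    def write(self, addr, data):
        if self.fenced:
            raise RuntimeError("I/O fenced during consistency group snapshot")
        self.blocks[addr] = data

def cg_snapshot(volumes):
    """Toy consistency group snapshot across multiple volumes."""
    for v in volumes:              # fence I/O on every volume first...
        v.fenced = True
    try:
        for v in volumes:          # ...so all images share one point in time
            v.snapshots.append(copy.deepcopy(v.blocks))
    finally:
        for v in volumes:          # resume I/O
            v.fenced = False

# Two volumes holding parts of one database
a, b = Volume(), Volume()
a.write(0, "txn-1")
b.write(0, "txn-1")
cg_snapshot([a, b])
a.write(1, "txn-2")   # arrives after the snapshot; not in the image
```

Dependent writes stay consistent because nothing can land on one volume but miss another while the group is fenced.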

I’ll be willing to answer any questions, with the exception of customer specifics.  If you are interested in this type of solution in a deployed customer environment, ask your NetApp sales rep and we might be able to get a reference call.

Edit:  Decided to add a reference architecture picture to better describe a typical solution.  

Large Scale Data Protection


April 16, 2010

mbralign on Steroids

Many of the customer engagements I work on involve migrating VMware guests from other vendors’ storage onto NetApp.  Because most other vendors are ignorant of, or deceptively quiet about, the importance of proper alignment on VMware, one of the most important things we do during a migration is fix VMware guest alignment.

You can get more information about guest alignment and mbralign HERE and HERE. That is not the purpose of this blog.

Unfortunately, since mbralign by default uses the ESX console, a workaround is necessary to align guests in an ESXi environment.  The workaround involves using a Linux guest to mount the NFS datastore and perform the mbrscan and mbralign tasks from that guest.  What is lesser known is that the same workaround can seriously boost the performance of mbralign jobs even in non-ESXi environments.

VMware limits the resources available to the ESX console to improve guest performance.  This means mbralign jobs will also be limited in performance when run on the ESX console.  If you have dozens or hundreds of VMs to align and are under short maintenance windows, or just don’t want to be up all night doing alignments, try running mbralign from a Linux guest.

I had a customer this week who saw mbralign performance improve from 10MB/s to 80MB/s.  Definitely worth it for the hundreds of VMs to be aligned as they move them to NetApp.

Sorry, this approach doesn’t work for VMFS datastores; in that case, try increasing the mbralign block size (--bs) or consider increasing the resource limits for the ESX console.
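For reference, the alignment problem mbrscan detects boils down to simple arithmetic: a partition is aligned when its starting byte offset falls on a 4KB WAFL block boundary.  A minimal check in that spirit (my own sketch, not mbrscan’s actual code):

```python
SECTOR_BYTES = 512      # bytes per disk sector
WAFL_BLOCK = 4096       # WAFL block size in bytes

def is_aligned(start_sector):
    """True if a partition starting at this sector sits on a 4KB boundary."""
    return (start_sector * SECTOR_BYTES) % WAFL_BLOCK == 0

# The classic misaligned case: older Windows guests start the first
# partition at sector 63 (a 32,256-byte offset), straddling WAFL blocks.
legacy = is_aligned(63)   # False
fixed = is_aligned(64)    # True -- a 32KB starting offset is aligned
```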

Happy Aligning!

April 15, 2010

Goofus and Gallant

One of the best parts of my job is working with a variety of business and IT departments.  I get to see numerous examples of how IT can be run well and how IT can be run very poorly.  Over the years I’ve determined there are a few main differentiators between well-run IT organizations and the ones that are struggling to survive.

So what are the defining aspects that make up a successful IT organization?

Business Aligned: IT goals are directly aligned with business goals.  IT understands how it impacts the business and works hard to make sure its internal customers are satisfied with the services it provides.  IT changes direction based on the desires of its internal customers.

Clear Strategy, Architecture and Process: Successful IT organizations carefully translate their customers’ desires into an IT strategy that brings those desires to life.  Architectures are developed specifically to meet customer requirements.  Methodologies and solutions are carefully selected to seamlessly integrate into the architecture and accommodate broad variations with the fewest possible moving parts.  They are extremely proactive, anticipating customer needs and potential problems early in the design phase.  Architectures, methodologies and processes are continuously improved to better serve their customers’ changing requirements.

Strong Leadership: IT leaders effectively communicate business needs as IT goals.  Strong leaders discourage silo-building and technology bias while rewarding open-minded conversation and new, innovative ideas.  They avoid vendor FUD and validate their direction with industry proven approaches and Proof of Concept testing.   Effective leadership extends throughout the organizational tree. Vision and direction are effectively communicated and committed to across all departments.

Effective Cooperation: Teams regularly cooperate across the organization to build comprehensive solutions that meet a variety of business needs. Respect, candidness and collaboration are pillars of the organization.  Individuals put the desires of their customers and fellow teams ahead of their personal preferences.

In contrast, what makes up the struggling IT organization I’ve seen?

Departmentally Aligned: IT and Departmental goals are developed introspectively, focusing on the current pain points of technology and cost.  IT direction is rigid and frequently forces the business to adapt to IT capabilities.

Segmented Strategy, Architecture and Processes: Struggling IT organizations have disjointed strategies.  Individual departments identify and implement technologies that alleviate departmental problems without regard for the impact on adjacent departments.  These organizations are extremely reactive, continually shifting resources to plug holes in current methodologies and technologies that cause organizational pain.  Architectures aren't planned, but rather emerge from a conglomeration of various overlapping departmental technologies wound together in complex, undocumented meshes.  Process improvement is too complex and politically volatile to attempt.

Weak or Segmented Leadership: IT leaders don’t clearly articulate vision, instead focusing almost entirely on tactical efforts.  In response to business pressures, leaders approve shortcuts and one-off, siloed solutions, and frequently bypass Proof of Concept testing to quickly meet urgent needs.  They fall victim to vendor FUD regularly, discouraging innovation and embracing the status quo in an effort to play it safe.  Communication of IT vision and direction is segmented and twisted by middle management that fails to agree with, or understand, the greater vision.

Lack of Cooperation: Individuals routinely hoard information and sabotage the efforts of others to advance their own interests.  The culture is poisonous.  Respect and candidness between departments are rare, while back-room conversation is venomous.  Internal customers’ preferences are seen as hindrances to departmental progress.

So, where does your organization fall?

March 11, 2010

Ranting Without Tiers

I’m really surprised at how much discussion is going on right now with the tiering fad.  Let’s be real.  Automated tiering is a gimmick.  It is something vendors really want to convince customers they must have in the future.  This sales pitch about “the right data to the right place at the right time” is the same pitch we’ve heard about ILM in the past.

Tiering is positioned as some kind of ILM-in-a-box solution.  We all know it is far from ILM, and I don’t think the engineers developing things like AO and FAST really intended it to be sold as ILM.  Yet sales has its own ideas of what customers will buy… as usual.

The foundation of the ILM discussions in the past was all about tiers: “The right data to the right place at the right time.”  Today, when the people who actually understand the technology talk about automated tiering, the discussion is all about improving cache misses.  EMC clarified this in a recent discussion about FAST v2.

“What FAST adds to this is the simple ability to get into Flash those regularly accessed (read) blocks that otherwise would be a read cache miss and require a trip to slow disk.”…”So, for Symmetrix at least, FAST and FLash is all about reducing response times of Read Misses” – Barry Burke

So FAST v2 is about getting into Flash regularly accessed read blocks that would otherwise require a slow trip to disk.  Uh…isn’t that exactly what the NetApp Performance Acceleration Module does TODAY? What we do today, without all the Johnny Mnemonic Autonomic Subsonic data migration mumbo-jumbo?

See, what all of this really boils down to is meeting the customer’s requirements for Capacity, Reliability and Performance at the best possible cost.  Tiering, much like cache has done traditionally for decades, improves the cost/performance ratio of storage.  Using Flash instead of RAM as cache takes the cost/performance metric to a whole new level.  Now you can have multi-TB caches that allow you to get by with far fewer and/or larger capacity drives.  The best part is that it’s easy!  Just turn it on.  If your architecture prevents you from using Flash as cache and you develop a similar approach by moving data all over the backend disks to try and compete near the same cost/performance ratio, so be it.  Glad you found a way.  But let’s not get hasty in declaring it the only way…as though it should be a requirement on every RFP.  Again, the real requirement is cost/performance.
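The cost/performance effect of a large Flash cache is easy to see with a weighted-average latency model.  The latency figures here are illustrative assumptions on my part, not measured numbers:

```python
def effective_read_latency(hit_rate, cache_ms, disk_ms):
    """Average read latency given a cache hit rate between 0 and 1."""
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

# Assumed ~1ms reads from Flash, ~10ms from spinning disk.
# At a 90% hit rate (like the Flash Cache POC charts above):
with_cache = effective_read_latency(0.9, 1.0, 10.0)   # 1.9ms average
without = effective_read_latency(0.0, 1.0, 10.0)      # 10.0ms average
```

A multi-TB Flash cache that serves most reads lets the remaining spindles be fewer and slower, which is the whole cost/performance argument.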

For as much fervor as this topic has garnered in the last few weeks, it is really an irrelevant discussion.  My customers aren’t even talking to me about tiers.  They are happy with the cost/performance they are getting and are ready to move to the next level.  Maybe EMC’s FAST is a response to customer feedback about their platforms being too rigid… I don’t know.

What is the next level?  Functionality.  It’s about taking something that has long been considered a commodity resource and turning it into something that can add significant value to the business… not just suck dollars away from it.  Intelligent storage arrays that provide integrated multi-protocol access, data protection, data reduction technologies, secure multi-tenancy and advanced application integration will be the benchmark of the future.

February 25, 2010

Don't Cry for Me: Why NetApp is not Dependent on Tiers

A lot of strange speculation and chaos theories have been floated lately in regards to NetApp's tiering strategy.  Maybe I can help clear up why NetApp is not as dependent on tiers as our competitors.


First, we fundamentally solve customer problems differently.  NetApp has always tried to improve efficiencies at a software level to drive additional efficiency in regards to both capacity utilization and performance utilization of hardware.  Our competition seems to have always focused on the next fastest hardware to overcome current limitations.


The present-day effects of these approaches are now very apparent.  NetApp has a highly unified and efficient storage operating system that can maximize the capacity and performance specifications of the underlying hardware.  The result is more functionality and performance at lower cost.  Our competition has been driven into managing multiple "purpose-built" silos, each with varying hardware specifications.  The result is limited functionality within each silo (to gain functionality, additional silos must be implemented) and rigid performance scaling, because little has been done in software to improve the utilization of the underlying hardware.


Allow me to explain further.  A typical 15k RPM FC/SAS drive can provide around 220 random IOPs at 20ms.  A typical 7.2k RPM SATA drive can provide around 40 random IOPs at the same latency.  That is a huge limitation, and it is the reason why SATA cannot easily support highly transactional workloads.  For sequential workloads, however, SATA is very close to FC, pushing hundreds or even thousands of sequential IOPs.  This is because the data density helps make up for the slower rotational speed on sequential I/O.  So, SATA is still a very good candidate for archive, backup and other sequential workloads.  As SATA drives get larger and denser, their sequential performance will increase as well.
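Those per-drive numbers translate directly into spindle counts.  A quick sizing sketch using the figures above (typical assumptions, not vendor specs):

```python
import math

def spindles_needed(workload_iops, iops_per_drive):
    """Drives required to serve a random workload at ~20ms latency."""
    return math.ceil(workload_iops / iops_per_drive)

FC_15K_IOPS = 220     # typical 15k RPM FC/SAS drive
SATA_72K_IOPS = 40    # typical 7.2k RPM SATA drive

# A 10,000 IOPS random workload:
fc_drives = spindles_needed(10_000, FC_15K_IOPS)      # 46 drives
sata_drives = spindles_needed(10_000, SATA_72K_IOPS)  # 250 drives
```

That roughly 5x spindle gap is exactly why random I/O, not capacity, so often dictates the drive count on SATA.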


Here is where NetApp innovation really shines.  Unlike traditional arrays processing transactional workloads, we can group random writes into sequential stripes.  This gives sequential performance to random write workloads and eliminates the random write penalty for SATA.  It is part of the reason why we have never needed a very large write cache: since we write to the drives sequentially, we don't have the back-pressure issues that traditional arrays have when trying to flush random I/O from cache to disk.
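A toy model shows why grouping writes matters.  This is a conceptual sketch, not WAFL itself: random writes accumulate in memory and reach disk as a single sequential stripe write instead of many random seeks.

```python
class StripeWriter:
    """Toy write coalescer: buffers random writes, flushes one stripe."""
    def __init__(self, stripe_blocks=64):
        self.stripe_blocks = stripe_blocks
        self.pending = []     # buffered (address, data) writes
        self.disk_ops = 0     # sequential stripe writes issued to disk

    def write(self, addr, data):
        self.pending.append((addr, data))
        if len(self.pending) >= self.stripe_blocks:
            self.flush()

    def flush(self):
        if self.pending:
            self.disk_ops += 1    # one sequential stripe, not N random seeks
            self.pending.clear()

w = StripeWriter(stripe_blocks=64)
for i in range(128):
    w.write(i * 7919, b"x")   # 128 scattered, "random" addresses
# The disk saw only 2 sequential stripe writes instead of 128 random ones.
```

On SATA, where sequential throughput is strong but random IOPs are scarce, this trade is what makes the drives viable for transactional work.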


Our competition can't do this, so they need another solution to eliminate back-pressure on the spindles.  Sticking to what they know, they focus on hardware as the solution: selling additional, faster hardware (lots of SSDs) and attempting to move the busy data between hardware tiers to maintain SLAs.  I suppose it works (or will work eventually), but it will always require more hardware and management.  They should have maximized the efficiency of the more affordable drives before immediately beefing up hardware.  It's like the DBA that demands additional spindles, processors and memory rather than improving their database structures.


I think Microsoft also understands the idea of working smarter, not harder.  In Exchange 2010, they gained significant performance increases and stretched storage hardware much further by moving away from small, highly random writes and, instead, performing large sequential I/Os.  This, in addition to caching improvements that reduce random reads, lets them now recommend using SATA for Exchange, something that was once unimaginable.  They even went so far as to state that Flash is best utilized in Exchange Server 2010 when used as a cache in the storage stack.  Are you starting to see the parallels here?


What amazes me is that our competition has suddenly been shocked and surprised to learn we aren’t banking our future on tiers like they are.  I wonder if they thought they were actually winning up until now?


Are we anti-tiering?  I don't think so.  Will we continue to use and support hardware with differing performance characteristics  in the future?  I'm sure.  But I think we will continue to watch the market and focus our solutions on addressing the largest number of customer challenges at the lowest possible cost.  I'm confident, with our present direction, we can continue to do that better than our competitors.