
May 21, 2009


Val Bercovici

Great post Vaughn.

Native / integrated failover support with high performance load-balancing via automatic queue depth management is exactly what MPIO should be.

Kudos to the vSphere team!

Andrew Miller

Thanks as well -- very helpful.

A similar breakdown focused on NFS would be wonderful as well.

Vaughn Stewart

Chad Sakac and I are wrapping up a joint post from EMC & NetApp on NFS. Stay tuned!

Chad Sakac

Vaughn - thanks, there are a few things that need correction. There are major differences between the SATP (Active)+PSP (RR) combination and PP/VE.

Here are a couple:

1) RR is literally round robin - one I/O down one path, then one I/O down the next. It makes no allowance for varying degrees of congestion, which can arise for many reasons. PP/VE will dynamically rebalance paths on an I/O-by-I/O basis. It's analogous to pinning guest CPUs to pCPUs vs. letting the vmkernel scheduler sort it out.
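The distinction can be sketched in a few lines of Python (illustrative only - the path names and queue depths are made up, and the real PP/VE logic is far more involved than this):

```python
from itertools import cycle

# Two hypothetical paths to the same LUN.
paths = ["vmhba1:C0:T0:L0", "vmhba2:C0:T1:L0"]

# Round robin: strictly alternate, one I/O per path, blind to congestion.
rr = cycle(paths)
def round_robin_pick():
    return next(rr)

# Adaptive (PP/VE-style in spirit only): weight the choice by the current
# outstanding queue depth, so a congested path receives fewer I/Os.
queue_depth = {"vmhba1:C0:T0:L0": 12, "vmhba2:C0:T1:L0": 3}
def adaptive_pick():
    return min(queue_depth, key=queue_depth.get)

print(round_robin_pick(), round_robin_pick())  # alternates regardless of load
print(adaptive_pick())                         # always the least-busy path
```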

2) On port congestion, you're right that it's usually the array ports that are the source of congestion - but it's not generally due to the port queue depth or LUN queue depth in the array - despite the CLARiiON slam :-) but rather the service time. The service time is a function of how busy the spindles and the storage processors/filer heads are - not the ports themselves. I'm guessing most customers have 14+2 aggregates - so the service time is a function of how busy the aggregate is (like on the CX, it's a function of the RAID group configuration). Really deep queues with bad service times = really bad latency. But - when the port queues do start to fill - (and they do - with all arrays amigo :-) - that's exactly why PP/VE (and not RR) is so important. Backing off the array port that is particularly busy can improve the entire IO path efficiency - from the ESX host to the array - and efficiency is what it's all about.

3) ALUA - Most of the major mid-range array players support ALUA (and MPIO and NMP) - it's certainly true of EMC, HP, LSI Logic, NetApp - open is good. The only reason we add PP/VE is if we can demonstrate "better than free" - after all, PP/VE (which uses the MPIO framework) isn't free, so every customer needs to evaluate its value relative to what NMP/MPIO deliver on their own. In the ALUA case, it's particularly important - as the first "A" stands for Asymmetric. In ALUA, there is always a "good" path and a "bad" path - where an internal bus is being used to traverse between storage controllers (aka Storage Processors/FAS heads). These internal buses are generally used for other things (CX case = CMI bus for write cache mirroring; NetApp case - I believe - please correct me if I'm wrong - it's used for partitioning and mirroring the NVRAM). These buses are not designed to handle the entire workload of the aggregate of all the front-end ports. Hence, Asymmetric Logical Unit Access (ALUA). Round Robin treats all paths equally. For ALUA to really make sense, you certainly don't want all paths treated equally - as loading the bad path = degraded overall array performance.

I'm not saying NMP sucks, or PP/VE rocks - only that this is a topic where there is a lot of "there there".

Thanks for the post - now let's finish that NFS post :-)

Vaughn Stewart


Thanks for the dialog and feedback. I believe it adds to the quality of the conversation.

You are correct, poor service times (or hot disks) do make for poor I/O performance. No multipathing technology in the world resolves this issue.

Let us not conflate this design concern with needing to balance I/O to individual object queues versus I/O which is sent to global pooled queues.

Scaling queue depth globally seems to be a more advanced architecture. The storage array will see significant I/O load, as it must manage the aggregated workload of ESX/ESXi clusters. Eliminating LUN queue-full events means one need not worry about a hot LUN adding latency to a storage path and, in turn, negatively affecting the I/O of other VMs.
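The per-LUN vs. pooled queue contrast can be illustrated with made-up numbers (neither depth reflects any particular array):

```python
# Four LUNs with a per-LUN queue depth of 32, versus one shared pool of
# the same total size (128). A burst of 100 I/Os aimed at a single hot
# LUN overflows the per-LUN queue but fits easily in the shared pool.
PER_LUN_DEPTH = 32
NUM_LUNS = 4
POOL_DEPTH = PER_LUN_DEPTH * NUM_LUNS  # 128

burst_to_hot_lun = 100
per_lun_overflow = max(0, burst_to_hot_lun - PER_LUN_DEPTH)  # 68 I/Os rejected
pooled_overflow = max(0, burst_to_hot_lun - POOL_DEPTH)      # 0 I/Os rejected
print(per_lun_overflow, pooled_overflow)
```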

IMO - purchasing advanced path management software is a requirement to solve an issue which has its roots in physical, not virtual, architectures.

I will blog more on this topic next week.

Aaron Chaisson

Vaughn, unfortunately you seem to be missing the benefits that PowerPath provides as virtualization and consolidation increase. PowerPath/VE is actually MORE critical for virtual environments, where maximizing the utilization of a given investment while minimizing the effect on performance is a primary design criterion. As Chad said, PowerPath determines the best path on an IO-by-IO basis based on expected service time (service time is the time an IO spends in the queue plus the response time for that IO once it is sent). Your focus on queue depth is only part of the equation, and a minor one at that, considering that the vast majority of customers don't queue enough IO. In fact, there are probably at least 99 customers who aren't even filling their queues for every 1 who is queuing too much.

The thing is that service time is not entirely dependent upon LUN, target, or port load; it is often affected by traffic patterns over shared network paths, especially as consolidation ratios increase. This means that something as simple as a batch process on a VM running on another ESX server in the cluster could result in increased latencies for ALL IOs over those shared network resources (even with almost no queuing at all). This is because increased load (i.e. increased utilization) results in increased response times for ALL IO on that shared path.

This is proven using simple and widely accepted queuing theory: Response Time = Service Time / (1 - Utilization). As utilization increases, any incremental bump in utilization has a disproportionately larger effect on response time. This means that higher utilization rates are more sensitive to utilization fluctuations, resulting in unpredictable performance the more you consolidate - and this is true regardless of what protocol you are using. Hell, it doesn't even matter what you are queuing. Understanding this principle will change how you select a line at a toll booth or supermarket ;)
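The relation cited above is easy to check numerically (strictly speaking it is the M/M/1 response-time formula, R = S / (1 − U), rather than Little's Law, which is L = λW; the service time here is a made-up value):

```python
def response_time(service_time, utilization):
    """M/M/1 response time: R = S / (1 - U). Utilization must be below 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

s = 1.0  # hypothetical 1 ms service time
for u in (0.50, 0.80, 0.90, 0.95):
    print(f"U={u:.2f}  R={response_time(s, u):.1f} ms")
# The same +0.05 bump in utilization costs far more at high U:
# going 0.90 -> 0.95 doubles response time (10 ms -> 20 ms).
```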

Now, isn't the effective increase in utilization of assets a fundamental tenet of virtualization and a cornerstone of its value prop? If so, wouldn't you want to look at a tool that could help you drive up asset utilization?

In reality, what PowerPath does is recognize network and storage fluctuations (by monitoring response times) and then factor in queue depths to determine the expected service time for any given IO. Based on this, PowerPath rebalances loads accordingly, guaranteeing each IO the fastest service time possible over ALL available paths. This is why we typically see 2x-3x lower response times and 2x-4x more throughput over a given set of resources.

Finally, as 10:1 or 20:1 consolidation becomes 100:1 or 200:1 in the near future, converged fabrics will be a requirement. In these 10Gb or faster networks, iSCSI, NFS, FCoE, and production, backup and management networks will all traverse the same physical cables. This means that fluctuations in any type of communication could affect storage performance. As you said, PowerPath is the only product of its kind on the market that can be used to properly navigate these fluctuations.

So, what this shows is that PowerPath/VE is actually MORE critical for virtual environments, where maximizing the utilization of a given investment while minimizing the effect on performance is a primary design criterion. PowerPath, in effect, allows us to more efficiently leverage the physical assets that we have deployed.

Vaughn Stewart


Thanks for the reply. It appears that NetApp has some work to do educating the world on our storage architecture and how it avoids these issues.

I am not challenging the capabilities of PP VE. Rather I am pointing out that the need for this technology is directly linked to the technology inherent in the storage array.

In your reply you repeated Chad's point that a hot LUN can have the same negative effect on performance as a full LUN queue.

On this point I agree with you, so let's discuss how we avoid this scenario.

With NetApp FAS, V-Series, and IBM N-Series arrays, our LUNs aren't bound to specific disk drives. Instead they are virtual and are serviced by a pool of drives referred to as an aggregate.

This info should not be new to you. Chuck has been publicly knocking our LUN virtualization implementation for years.

This design reduces hot spots with LUNs and LUN queues. As such, our customers are not required to purchase additional multipathing technologies.

Now don't get me wrong here. NetApp arrays still require utilization monitoring, just like an EMC array. With a NetApp array, one monitors and addresses global or pool utilization levels, not individual LUNs.

If this model seems familiar it should as it is very similar to how one monitors resource utilization levels within an ESX/ESXi cluster.

I'd like to extend an invitation to demonstrate our technology for you.

Thanks again for the follow up.

Tom Giuliano

Further to Aaron's post, there are several key benefits to using PowerPath/VE in VMware vSphere environments:

• Automation: PowerPath/VE lets you automate optimal server, storage, and path utilization in a dynamic virtual environment to provide predictable and consistent information access—just set it and forget it. While native multipathing products require constant monitoring and manual rebalancing, PowerPath/VE automatically rebalances the I/O to maximize performance and availability.

• Standardization: With PowerPath/VE, PowerPath now unifies the management of major server, operating system, and virtual platforms across virtual and physical environments. Rather than deploy and support a variety of point products, customers can use PowerPath/VE to standardize on a single multipathing solution across entire environments.

• Protection: PowerPath/VE lets you insulate the virtual machine guest operating system and applications from path management, and automatically detects and reroutes I/O following a path failure to ensure that applications stay online. Since PowerPath Multipathing capabilities are added to the VMware kernel, the virtual machine guest operating system has no knowledge of PowerPath, so applications running on the guest virtual machine are shielded from potential disruption.

• Simplified management: PowerPath/VE eliminates the need to monitor and rebalance the dynamic environment. It also reduces the number of multipathing software variants to manage across heterogeneous physical and virtual environments.

• Improved performance: PowerPath/VE increases the number of always-active initiators, resulting in increased performance with multiple data streams per host. I've received reports from beta customers of a 2X performance improvement.

Additionally, it's important to clarify that PowerPath/VE for vSphere is an MPP (Multi Path Plugin) similar to what NMP is, but more advanced and intelligent. It is not a 3rd party/vendor created SATP (Storage Array Type Plugin) or PSP (Path Selection Policy) as initially indicated.

Vaughn Stewart


Thanks for jumping into the discussion; however, what is missing from this conversation is the root cause: traditional storage architectures, which include individual LUN queues.

Let's not forget this problem is exacerbated by architectures with shallow queue depths.

I don't believe we are in disagreement. If a customer has an EMC array they really need to consider PP VE.

Alternatively, customers can consider storage arrays which virtualize storage access and management. This design provides block-level data deduplication, integrated snapshot-based backups whose storage can also be used with SRM, zero-cost and I/O-offloaded VM clones, and more - all while avoiding said root cause.

Why purchase PP VE to fix an issue associated with physical architectures when you can implement a virtual one instead?

Aaron Chaisson

;) Thanks for the invite. Next time I'm down in Raleigh, I'd love to see your technology in the hands of an expert. That being said, the problem with this post is that you are trying to disparage EMC arrays indirectly by discussing a tool that you clearly don't actually understand. You keep talking about "traditional" vs. "virtual" storage as the center point of this discussion (terms that alone could be debated... with abject futility over this medium). What we are telling you is that you are starting with a false premise. PowerPath is not now, nor ever was, a tool designed to solve a "storage problem" inherent in our or anyone else's array, and repeating it doesn't make it true. PowerPath is a tool designed to optimize performance against anything that could affect PATH efficiency, and the fact is that as consolidation rates increase, as with server virtualization, efficiency becomes more important, not less.

Val Bercovici

Very revealing discussion you've spawned here, Vaughn. I have always been a fan of PowerPath from the early days of SANs, before host operating systems became SAN-aware.

But in this decade, from an overall configuration management perspective in a highly virtualized environment at scale (i.e. such as one would find in Cloud Infrastructures) - removing as many 3rd-party layers as possible is key to overall simplicity.

Therefore, I still stand by my original statement:

"Native / integrated failover support with high performance load-balancing via automatic (global) queue depth management is exactly what MPIO should be."

Many of EMC's articulate arguments here are very storage admin-centric. I contend that while intelligent storage adds tremendous value to a server virtualization environment, it should do so as transparently as possible to application and server admins.

After all, storage is still the tail to Server Virtualization's dog :)

Chad Sakac

Val - I agree :-)

From one Canadian to another with respect....

Configuration of multipathing (even in vSphere) is complex, manual, and involves a lot of knowledge of the array internals. I get the sense that you haven't tried this on a vSphere environment of even small scale. Aaron and I have.

BTW - this is true of SATP + RR (which Vaughn discusses) with block devices, and also of NFS datastores. It's complex at moderate scale, hard at enterprise scale, and updating when changes happen is extra hard (in spite of being MUCH MUCH better than ESX 3.x).

Conversely - you install PP/VE via the vSphere Host Utilities and that's it - DONE.

Again - this isn't because we don't believe in the native NMP. For a bit of fun - issue this command against a vSphere ESX 4 host: "esxcli nmp satp list". How often does EMC appear? That, my respected colleague, represents engineering work - plain and simple.

EVERY EMC array was supported on the VMware vSphere HCL **DAY 1**. No other VMware partner pulled that off. That, my respected colleague, represents engineering work - plain and simple.

Likewise our goal is to make storage in the Virtual Datacenter INVISIBLE.

We're doing that in the current vSphere generation via vCenter plugins that are freely available (I know the NetApp ESX Host Utilities are available, not sure on cost) and via making the arrays themselves (for NFS and VMFS use cases) VM-object aware - a post is going up on that soon.

Our management suite integrates with the vCenter APIs and ESX CIM APIs and was VMware Ready a full year before Onaro and VMinsight.

In future vSphere generations this will only increase via the vStorage VAAI integration (vStorage APIs for Array Integration - formerly known as vStorage APIs, formerly known as VMware Aware Storage - thank goodness VMware is better at making products than sticking with names). Personally, I like the first name - VMware Aware Storage describes what the goal is.

Now why do I say this all? Only to point out that NetApp is a respected competitor - but let there be no implication that EMC isn't laser focused on this and executing.

We are each working to make our solutions as rich, as simple, as clean, as available, as flexible, as invisible, and as VMware-integrated as possible.

This is a view I believe NetApp and EMC share (most other vendors don't even show up for the VAAI calls) - and each customer needs to evaluate how well we are executing against that if they agree with that direction.

But this is, indeed a technical thread, not a marketing one.

"Native / integrated failover support with high performance load-balancing via automatic (global) queue depth management is exactly what MPIO should be."

Right-o. NMP (SATP+PSP) doesn't do that. PP/VE does.

I can tell you that I enjoy having NetApp as a competitor - I spring out of bed in the morning with a smile on my face and work all day to try to delight customers and work with our engineers to innovate, and a respected competitor is an important part of that equation.

Our focus on winning here on our merits and execution is unwavering, and overwhelming.

We have solutions that sell to process-admins, service-desk owners, server/VM configuration owners, desktop/client owners, SLA owners, Exchange/SQL Server/SharePoint/Oracle/SAP owners, security owners, backup-admins, VMware-admins, and yes, storage-admins - and that's not a bad thing - that's a good thing.

This post is Storage Centric, sure - but it's discussing a storage nit. If you think that characterizes EMC - well - please keep thinking that! :-)

Closing the same way I opened - with agreement - storage is the tail to Server Virtualization's dog! :-)

Cheers - and looking forward to seeing you on the battlefield!

Michael Lozano

I'm a little confused on something. Vaughn, you said:

"With a NetApp FAS, V-Series, and IBM N-Series arrays our LUNs aren't bound to specific disk drives. Instead they are virtual and are serviced by a pool of drives referred to as an aggregate.


This design reduces hot spots with LUNs and LUN queues. As such our customers do not required to purchase additional multipathing technologies."

To me, LUN I/O performance and PATH I/O performance/failover are separate issues. Having a LUN that performs well does not eliminate the need for multipathing. One would assume multiple LUNs would be presented down the same path, and as that path reaches saturation, additional paths would need to be used to maintain throughput to the LUNs. It is a basic multipathing concept that multiple paths can load balance the aggregate throughput to multiple LUNs, thereby eliminating the issue of static paths saturating without a way to provide more throughput. If the path were to saturate, it wouldn't matter how fast the LUN was, performance would still be degraded. Multipathing solves this issue.
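The saturation argument above can be sketched with made-up numbers (none of these capacities come from the thread):

```python
# Three LUNs that each perform well individually can still saturate one
# static path; spreading the aggregate over multiple paths avoids that.
path_capacity_mbps = 400             # roughly one 4 Gb FC path, illustrative
lun_demand_mbps = [120, 150, 180]    # hypothetical per-LUN throughput demands

total = sum(lun_demand_mbps)                    # 450 MB/s aggregate
paths_needed = -(-total // path_capacity_mbps)  # ceiling division -> 2
print(f"aggregate demand {total} MB/s needs {paths_needed} path(s)")
```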

Did I miss something?

Vaughn Stewart


You asked, "Did I miss something?"

I believe you have.

In my post it doesn't state or imply there is no need for multipathing.

I clearly stated there is no need for the sophistication (and expense) of PPVE if your storage array doesn't have the issues associated with per LUN I/O queues.

The need for PPVE is exacerbated with arrays which implement shallow LUN I/O queues.

Remember VCE - Virtualization Changes Everything (as I am known to say)

I'd like to extend an offer for you to stop by a NetApp office, and I'll coordinate a demonstration of our virtualized storage architecture for you.

I'll have more on this topic next week - so stay tuned.

BTW - is this the Michael Lozano who is a Microsoft Solutions Consultant at EMC? If so, welcome to the blog.

Chad Sakac

Vaughn - in normal Chad/Virtual Geek style, there was too much to simply put in comments responses. There was also stuff that I believe was incorrect, and materially incorrect.

So - there's a (as per normal - completely long-winded and detailed) post up now. I tried hard to keep the tenor right, because after all, this is engineering, not competitive positioning.


Also, some corrections I thought weren't material enough for the post on my site:

"I should have mentioned it earlier in this post, but NetApp has provided the EHU and will provide the VSC at no cost. Maybe EMC will consider doing the same with PP VE as it seems to be required in order to ensure the availability and performance with a Clariion array. "

Umm - EMC provides our vCenter Plugins all completely free.

1) EMC Storage Viewer: is similar to the EHU - though I would argue the EMC one is better, and was available before the vCenter plugin of EHU.
2) VDI Deployment Tool which is similar to RCU - though to be fair the RCU is better (working on it!), and was available before the EMC plugin
3) SRM failback plugin - which I would argue NetApp has no equivalent (correct me if I'm wrong)

Of course - with "plugin explosion" happening, we're integrating into a next-gen single framework - great minds think alike. This is the EMC version of the VSC (which, BTW, we were working on before we saw your and Nick's post on the topic).

The fact that PP/VE is an additional cost item isn't because:

a) we don't support NMP, RR and ALUA (we do, of course)

b) that it's needed in any way to get past "legacy array" issues. Ha - had me laughing on that one! The core NetApp architecture has been consistent since the very early 90s, similar to the CLARiiON - they are both "virtualized" and also both "legacy" at the same time. Let's be honest, only something designed in the last decade can be called "non-legacy" (and yes, we've both made many, many incremental innovations throughout, but consistently based on an architectural model that is fairly old). There is a nice way to say "old, legacy architecture" - that is "mature" :-)

So why is it a chargeable item? It's because it doesn't simply provide a platform optimization or an alternate view/management model.

PP/VE offers customers (and not just EMC folks - HP, HDS, and IBM as well) something that is BETTER than NMP RR (adaptive and predictive queue management and path management), with an automated setup (ergo it scales better to large enterprises) versus the very manual setup that remains with NMP RR.

Every customer is given the choice to determine if they think the value is worth the cost. On that note, LOL... as if NetApp software isn't generally a chargeable item at some of the industry's highest margins - a reflection of good customer value - that's not a bad thing.

If/when NetApp decides to truly integrate with the Pluggable Storage Architecture (hint, this involves writing a SATP, PSP or MPP, not just saying "hey we support what's out of the box") and offer that for free, then you can cast a stone here. For example, it looks like Dell/EqualLogic has written a third party PSP (which VMware has recently asked everyone to start calling a MEM - Multipathing Extension Module). Not sure what it does, and whether it's free, but THEY can critique pricing if it's free - and then customers determine value between the two (if it is a PSP, that's very different than an MPP)

Please avoid painting that "evil" picture which is consistent in the post (and many, many NetApp posts).

We're not evil. We're good people. Like any company, we're not monolithic. I don't think NetApp is evil or malicious.

We're working to innovate, and help customers. We're not perfect (and when we make an error, we will work to correct). Every customer chooses value and fit for their environment, and choose their partners.

Looking forward to seeing you at VMworld in San Fran, and the first cocktail is on me!

Vaughn Stewart

Chad, buddy, come on. Evil? I never said or implied such. A legacy, monolithic, outdated architecture which requires host-based software to make it perform better... absolutely!

This isn't personal, so please don't take it as such.

Our array architecture is more apropos for server virtualization than a traditional legacy storage array.

Don't believe me? I work for a vendor, so check out the market share growth between the 2008 Goldman Sachs IT Spending Survey and the 2009 Forrester Storage Choices for Virtual Server Environments. You know what I'm referring to.

Which storage vendor gained 18% share?

Now ask why... (hint: storage virtualization matters)

Erick Moore

Ugh, reading this he-said-she-said stuff gets very old and annoying, especially when it is EMC vs. NetApp. As a happy current NetApp customer and former EMC customer (I was also happy with their arrays, so maybe I'm easy to please), I would like to see something like this from NetApp. Yes, I know we can do RR, but a little more logic would be nice.

Would I pay per CPU on my ESX hosts, no chance in hell. Would I license it per controller head, as NetApp does most of their licensing, absolutely.


gballou

Are you proposing RR Mpath for active/active clusters? Are there potential issues with partner path problems as we have seen in ESX 3.x using esxcfg-mpath? Does this now mean no more running config_mpath when I add new LUNs?

Chad Sakac

@gballou - so long as the array can present a LUN on all ports (owning and non-owning processors on what VMware calls "active/passive" arrays such as NetApp and CLARiiON CX4 configured in ALUA mode supporting SCSI-3 reservations & what VMware calls "active/active" models such as HDS and Symmetrix), then YES.

I can't/won't speak for NetApp, but speaking for EMC, we absolutely support NMP RR in both those CX4 and Symmetrix configurations, and on our Celerra iSCSI target. That means no more config_mpath.

PowerPath/VE completely updates the whole vSphere multipathing stack - for example not requiring any path discovery via rescans, testing path state (potentially resolving problems before failed I/Os), and also changes the path selection for an IO from a simple round robin to adaptive (using ESX queue depth) with EMC and other 3rd party arrays (check with Elab for support matrix), and if using EMC arrays also factors in the target port queue depths (which are future predictors of host queuing behaviour).

Jeremy Page

There's going to be a time when vendor specific stuff starts dying like flies. Remember the Eddie Murphy skit?

What have you done for me lately?

Both NetApp and EMC are being challenged by commodity hardware + FOSS in the mid range. VMware+ZFS is a compelling story but not mature enough for my organization. I also expect that to change.

But neither company is a sitting target. VMware made bank on the fact that people were so focused on clock speeds they forgot there were other parts to their computers.

NetApp is the core of my current data center and has allowed us to do a better job, cheaper, from an infrastructure point of view than our former storage vendors. They figured out the game ahead of the pack; it will be interesting to see how far ahead of the pack 8 is. We all know they took their time, and Dave et al. are smart folks. They know they still have an edge good enough to keep customers from moving to FOSS.

Vaughn Stewart


High value features in any market always become commoditized in time. Do you recall when power windows were a high end feature in an automobile? What about seat-belts?

We live in an age where we all must innovate or die.

I trust the leaders at the helm @ NetApp will allow us to continue to move forward.

Thank you for being a customer, and if there's anything I can do to assist you please don't hesitate to ask.

Richard Gray

Hi guys, interesting blog. I've always been frustrated by earlier versions of ESX and configuring the right path selection etc. It is nice to see there has been progress since I have mostly left FC for the easy life of NFS.

Chad, your original comments about using RR with ALUA being less than ideal because it will still send data via 'bad ports' - I share this concern, but from my understanding ALUA actually declares path states, and therefore the ALUA host will select only paths declared active-optimized and not paths declared active-unoptimized.
However, it will be interesting to test this out, as after selecting RR for my ALUA-provisioned LUNs and then looking on the ESX host, all 4 paths show as active... you guys have any thoughts on this and how best to go about testing it? I don't like to always accept things as gospel without seeing them first hand.


Vaughn Stewart

@Rich - Your understanding is correct. Can you clarify your testing a bit?

- Are you testing with an array from EMC or NetApp?
- When you state all four links are active, is this correct behavior or incorrect? (Note you don't say whether the non-primary paths are active.)


Richard Gray

Hi Vaughn,

I have tested this now and it works exactly as it should (and without the need for additional products!).

I'm using a FAS3140 and ESX 4 hosts. 2 LUNs provisioned, 1 on FilerA, the other on FilerB. Mapped to ALUA-enabled igroups. On the ESX host side, RR enabled for the PSP (followed by a reboot of the host - this is crucial, and why in my above post the paths were not showing right).

The above setup now shows each LUN with 4 Active paths, 2 of which are Active I/O paths. These 2 Active I/O paths are the 2 correct paths which belong to the Filer owning the LUN - no vtic/bad-port traffic here :)

Being crude, and as a test, I remove 1 cable to FilerA. My Active I/O paths drop to 1, still on the correct owning Filer. This is perfect and how it should be, as at this point I still don't want traffic across the vtic.

Next, I remove the last cable to FilerA; now the 2 Active paths are 'promoted' to Active I/O paths and are obviously now routing through FilerB. Again perfect, because obviously in the rare event both paths to FilerA fail I still need my LUNs available.
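The failover sequence described above matches a simple model of ALUA path selection (a toy sketch with hypothetical path names - not how ESX actually implements the PSP):

```python
def active_io_paths(paths):
    """Return the paths that should carry I/O: optimized paths while any
    survive; otherwise the surviving unoptimized paths are promoted."""
    opt_up = [p for p, (state, up) in paths.items() if state == "optimized" and up]
    if opt_up:
        return opt_up
    return [p for p, (state, up) in paths.items() if state == "unoptimized" and up]

# Two optimized paths to the owning filer, two unoptimized partner paths.
paths = {
    "A1": ("optimized", True), "A2": ("optimized", True),
    "B1": ("unoptimized", True), "B2": ("unoptimized", True),
}
print(active_io_paths(paths))   # both optimized paths carry I/O
paths["A1"] = ("optimized", False)
print(active_io_paths(paths))   # drops to the one surviving optimized path
paths["A2"] = ("optimized", False)
print(active_io_paths(paths))   # unoptimized paths promoted
```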

Hope that makes sense, but it's now exactly the sort of solution it should be: easy to set up and essentially self-managing.
Combined with VSC, I'm confident in the solution, and it takes very little time to do right.
One thing I would like to see in VSC (I know it's early days for it) is the option to choose which ESX hosts/clusters to scan for NetApp filers. I have a few clusters in small remote locations connected to MSAs, and it's an annoyance that they get bothered and as a result the rescan task takes considerably longer to finish.


Richard Gray

Forgot to say, when you restore the links the reverse happens - restore only 1 link to the owning Filer and it becomes the one and only Active I/O path; the 'un-optimized' links then both go back to being just Active in state. Restore the last link and it's back to the beginning again: 4 paths, 2 I/O to the right Filer :)
