With today’s release of vSphere, VMware has introduced numerous new features and technologies, all designed to advance the virtual evolution of our data centers. With this release VMware and their storage alliance partners have delivered Plug-n-Play SAN storage connectivity. As you begin downloading the final bits, refreshing your labs, and planning your upgrade path to vSphere, I’d like to dig into the considerations of how one can deploy a robust, reliable, and greatly simplified P-n-P SAN architecture with vSphere and NetApp.
The Pluggable Storage Architecture (PSA)
There are numerous posts covering VMware’s PSA; however, in order for us to have an in-depth conversation regarding the technical details covered later in this post, I’d like to briefly cover this new architecture.
In ESX/ESXi the PSA is a VMkernel layer responsible for managing storage paths. It also provides an open modular framework that coordinates the operation of multipathing plugins (MPPs), which include the VMware Native Multipathing Plugin (NMP).
The NMP supports all storage arrays listed on the VMware storage HCL and provides a path selection algorithm based on the array type. The NMP associates a set of physical paths with a specific storage device, or LUN. The specific details of handling path failover for a given storage array are delegated to a Storage Array Type Plugin (SATP). The specific details for determining which physical path is used to issue an I/O request to a storage device are handled by a Path Selection Plugin (PSP).
ESX/ESXi offers a SATP for every type of array that VMware supports. In addition, the PSA provides support for third party SATPs & PSPs. When a storage scan is initiated in ESX or ESXi the storage array is identified and assigned a SATP. The NMP associates the SATP with physical paths to the storage array and the SATP implements the tasks that include the following:
• Monitors health of each physical path.
• Reports changes in the state of each physical path to the NMP.
• Performs array-specific actions necessary for storage fail-over.
PSPs are responsible for choosing the physical path for I/O requests. The NMP assigns a default PSP to every logical device based on the SATP associated with the physical paths for that device. The default PSP-to-SATP assignment can be modified. The NMP provides the following PSPs (a quick way to list them on a host is sketched after the list):
• Fixed — Uses the first working path discovered at system boot time unless manually configured. If the host cannot use the preferred path, it selects an alternative available path. The host automatically reverts back to the original preferred path when it becomes available.
• Most Recently Used (MRU) — Uses the first working path discovered at system boot time unless manually configured. If the host cannot use the preferred path, it selects an alternative available path. The host remains on the new path even if the original path has become available.
• Round Robin (RR) – Uses an automatic path selection rotating through all available paths and enabling I/O load balancing across all of the paths.
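If you want to see these plugins on your own host, here is a minimal sketch from the ESX 4.0 service console. The esxcli namespaces shown are my recollection of the vSphere 4.0 syntax, so treat the exact form as an assumption and check the command's built-in help on your build:

# List the Path Selection Plugins available on this host
esxcli nmp psp list

# List the SATPs on this host along with the default PSP assigned to each
esxcli nmp satp list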
Support for Asymmetric logical unit access (ALUA)
Asymmetric logical unit access is an industry standard protocol that enables the communication of storage paths and path characteristics between an initiator port and a target port. If your storage array is ALUA compliant (as NetApp arrays are) then you’re in luck because the vSphere NMP is, by default, ALUA aware. In summary ALUA eliminates the need to manually identify preferred I/O paths between ESX/ESXi hosts and storage controllers.
Plug-n-Play FC HBAs & FCoE CNAs
That’s right, more Plug-n-Play goodness. Unlike VI3, vSphere on NetApp FC and FCoE architectures does not require storage adapter configuration. Now I wish it weren’t the case, but customers with iSCSI HBA implementations will still be required to configure their hardware adapters. As with VI3, these customers have the option of doing so manually or can leverage NetApp’s ESX Host Utilities (EHU) to automate this process.
I would add that the majority of the iSCSI customer base leverages the iSCSI software initiator, which does not require any configuration changes with either VI3 or vSphere.
Prepare the array and put it together
OK, so we’ve addressed our storage adapters; next let’s take a look at enabling ALUA on our NetApp storage controller. This is a one-time configuration setting that must be completed from the console of the FAS array. This setting is not a global option; it is enabled on a per initiator group (or igroup) basis. If you have followed our best practices you should have one igroup per ESX/ESXi cluster to configure.
Customers upgrading from VI3 to vSphere can enable ALUA on their existing igroups without impacting connections made by ESX/ESXi 3.x hosts.
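For reference, here is a minimal sketch of what this looks like from the Data ONTAP console. The igroup name 'esx_cluster1' is just an example, and the exact syntax may vary by Data ONTAP release, so verify against your documentation:

# Enable ALUA on the igroup used by the ESX/ESXi cluster
igroup set esx_cluster1 alua yes

# Verify the ALUA setting on the igroup
igroup show -v esx_cluster1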
Voila! At this point we have achieved our Plug-n-Play SAN architecture, as our ESX/ESXi hosts automatically understand the preferred storage paths and are configured with a fault-tolerant multipathing policy. With ALUA enabled, our ESX/ESXi hosts automatically identify our storage array and assign a SATP and its associated PSP, which for ALUA-enabled NetApp arrays is MRU.
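If you’d like to verify this on a host, a quick sketch (again assuming the vSphere 4.0 esxcli syntax) is to list the claimed devices and check the Storage Array Type and Path Selection Policy reported for each NetApp LUN:

# Each NetApp LUN should report VMW_SATP_ALUA and, by default, VMW_PSP_MRU
esxcli nmp device list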
Is there more?
Like any solid out-of-the-box solution there is always room for improvement, and this case is no exception. While ALUA & MRU do provide a production-worthy storage connectivity architecture, this design still requires manual path load balancing. I don’t know the reasons why, but with ESX/ESXi 4.0 the SATPs for all storage vendors are configured with either the Fixed or MRU PSP.
An ESX/ESXi host configured with the Round Robin PSP receives a fault tolerant storage connectivity architecture that will send I/O down every available path. From an I/O perspective each host will receive the aggregated bandwidth available to each LUN. This is especially good news for customers who connect via iSCSI on 1GbE networks as they can combine the new support for multiple TCP sessions with RR to provide aggregated storage bandwidth.
I’ll post a blog on multiple TCP sessions & RR next week for our iSCSI customers.
It is fairly straightforward to configure VMW_SATP_ALUA (the SATP to which NetApp arrays are assigned) to use VMW_PSP_RR. This process can be accomplished in one of two ways. For customers who automate ESX/ESXi deployments and upgrades via scripted installs (such as PXE booting) this change can be made by editing your config files.
As an example, I have done so on the command line of my ESX 4.0 server.
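Here is a sketch of the change. The flags are from memory, so treat them as assumptions and confirm with the command’s built-in help; also note that devices already claimed by the SATP may need a per-device policy change or a host reboot before they pick up the new default:

# Make Round Robin the default PSP for the ALUA SATP used by NetApp arrays
esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR

# Confirm the new default assignment
esxcli nmp satp list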
Alternatively, NetApp provides an automated means to make this change via our Virtual Storage Console. The VSC is the evolution of our EHU into a vCenter plugin which will serve as the foundation for our various plugins like the Rapid Cloning Utility and VMIsight. By default the VSC will identify ESX/ESXi hosts connected to NetApp arrays, and for hosts connected to ALUA-enabled igroups it can automatically modify their SATP to use VMware’s Round Robin PSP instead of MRU.
What about 3rd party SATPs & PSPs?
As you may recall, the vSphere PSA provides support for third party SATPs & PSPs. At present the only available third party PSA plugin is PowerPath VE from EMC. The goal of PP VE is to provide capabilities beyond what is natively available within ESX/ESXi. These capabilities are touted by EMC as ‘automating path utilization to dynamically optimize performance and availability.’
Now, before you jump up and say ‘I need that’ and run off to purchase PP VE (note it is licensed per CPU), I believe it is critical to discuss ‘why’ you may need PP VE.
Traditional legacy storage arrays have a limit in the number of simultaneous commands that can be processed by a LUN. This limit is called a queue depth. A high volume of simultaneous requests is usually desirable, and results in good performance; however, if the queue becomes full the storage array will respond with a queue-full flow control command.
An ESX/ESXi host’s response to a queue-full command is HBA dependent, but it typically results in I/O being suspended for more than one second.
PowerPath provides multipathing intelligence to ensure that I/O is redistributed across all links in an attempt to avoid a queue-full situation. While this level of intelligence is not available in VMware’s Round Robin Path Selection Plugin, you still need to understand more before purchasing.
Queue depth impacts VM performance and datastore scaling
VMware has been educating our joint customers on LUN queue depth and how it impacts performance and scaling. Below is a chart from the VMware Scalable Storage Performance whitepaper. As you can see from this chart, queue depth in combination with the average number of SCSI commands per VM are the two metrics one should consider when deciding how far to scale a datastore.
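To make that concrete with purely illustrative numbers (these are my own, not taken from the whitepaper): if a LUN has a queue depth of 64 and each VM averages 4 outstanding SCSI commands, then roughly 64 / 4 = 16 concurrently busy VMs can share that datastore before I/O begins to queue at the LUN. A deeper queue, or fewer outstanding commands per VM, raises that ceiling proportionally.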
So does my array need PowerPath VE?
It depends: what is the queue depth of the LUNs on your array?
NetApp storage arrays virtualize storage and as a result we do not have LUN queue depths, so we avoid LUN queue-full conditions. Please note that NetApp arrays do have a queue depth of 2,000 per FC and FCoE target port. As a reference, each FC/FCoE target port is capable of greater than 30,000 IOPS.
By contrast a quick reference of the EMC Clariion Performance and Availability Best Practices with Flare 28.5 document states that a Clariion array has a LUN queue depth that is calculated based on the size of the LUN and the type of data protection in use. In addition, Clariion arrays have a target port queue depth of 1,600.
From page 26 of this document the calculation for a RAID 5 protected LUN is:
(14 * (# of data drives) + 32)
This configuration results in a RAID 5 LUN comprised of five disks (4+1) having a queue depth of (14 * 4) + 32 = 88!
If your storage array has a queue depth of 88, I believe we have identified why it is in your best interest to purchase PowerPath VE, as you will need automated path utilization to dynamically optimize performance and availability. If you don’t and instead use the vSphere Round Robin PSP, you risk having poor performance across all of your LUNs should a single LUN issue a queue-full command.
It sure seems that the need for PP VE is a direct result of a less than VMware friendly design within the storage controller.
Maybe a better decision would be to look at a NetApp virtualized storage array for your vSphere deployments. By design our arrays don’t have LUN queues. Instead they have target port queues, which are global, very deep, and when combined with RR the queues are aggregated. As I stated earlier, each port has a queue of 2,000, so a single dual-port target adapter has a queue of 4,000. This design allows NetApp arrays to avoid the issues that can arise with such shallow queues.
The virtualized architecture of a NetApp array is the ideal design for use with VMware’s Round Robin PSP as we don’t have the challenges associated with traditional legacy arrays.
I should have mentioned it earlier in this post, but NetApp has provided the EHU and will provide the VSC at no cost. Maybe EMC will consider doing the same with PP VE as it seems to be required in order to ensure the availability and performance with a Clariion array.
Wrapping it up
Congratulations VMware on the vSphere release. I believe I speak for all of us at NetApp when I say we love the technology and how it is evolving our data centers. The Plug-n-Play SAN architecture is finally here! It is robust, resilient, available at no cost, and with modern storage arrays it is awesome!
Great post Vaughn.
Native / integrated failover support with high performance load-balancing via automatic queue depth management is exactly what MPIO should be.
Kudos to the vSphere team!
Posted by: Val Bercovici | May 21, 2009 at 12:40 AM
Thanks as well -- very helpful.
A similar breakdown focused on NFS would be wonderful as well.
Posted by: Andrew Miller | May 21, 2009 at 10:07 AM
Chad Sakac and I are wrapping up a joint post from EMC & NetApp on NFS. Stay tuned!
Posted by: Vaughn Stewart | May 21, 2009 at 12:23 PM
Vaughn - thanks, there are a few things that need correction. There are major difference between the SATP (Active)+PSP (RR) combination and PP/VE.
Here are a couple:
1) RR is literally round robin - one I/O down one path, then one I/O down another path. There is no treatment for varying degrees of congestion. This can happen for many reasons. PP/VE will dynamically rebalance paths on an I/O by I/O basis. It's analogous to pinning guest CPUs to pCPU vs. letting the vmkernel scheduler sort it out.
2) On port congestion, you're right that it's usually the array ports that are the source of congestion - but it's not generally due to the port queue depth or LUN queue depth in the array - despite the CLARiiON slam :-) but rather the service time. The service time is a function of how busy the spindles and the storage processors/filer heads are - not the ports themselves. I'm guessing most customers have 14+2 aggregates - so the service time is a function of how busy the aggregate is (like on the CX, it's a function of the RAID group configuration). Really deep queues with bad service times = really bad latency. But - when the port queues do start to fill - (and they do - with all arrays amigo :-) - that's exactly why PP/VE (and not RR) is so important. Backing off the array port that is particularly busy can improve the entire IO path efficiency - from the ESX host to the array - and efficiency is what it's all about.
3) ALUA - Most of the major mid-range array players support ALUA (and MPIO and NMP) - it's certainly true of EMC, HP, LSI Logic, NetApp - open is good. The only reason we add PP/VE is if we can demonstrate "better than free" - after all PP/VE (which uses the MPIO framework) isn't free, so every customer needs to evaluate its value relative to what NMP/MPIO deliver on their own. In the ALUA case, it's particularly important - as the first "A" stands for Asymmetric. In ALUA, there is always a "good" path and a "bad" path - where an internal bus is being used to traverse between storage controllers (aka Storage Processors/FAS heads). These internal busses are generally used for other things (CX case = CMI bus for write cache mirroring; NetApp case - I believe - please correct me if I'm wrong - it's used for partitioning and mirroring the NVRAM). These buses are not designed to be able to handle the entire workload of the aggregate of all the front-end ports. Hence, Asymmetric Logical Unit Access (ALUA). Round Robin treats all paths equally. For ALUA to really make sense, you certainly don't want all paths treated equally - as loading the bad path = degraded overall array performance.
I'm not saying NMP sucks, or PP/VE rocks - only that this is a topic where there is a lot of "there there".
Thanks for the post - now let's finish that NFS post :-)
Posted by: Chad Sakac | May 26, 2009 at 01:07 PM
Chad,
Thanks for the dialog and feedback. I believe it adds to the quality of the conversation.
You are correct, poor service times (or hot disks) do make for poor I/O performance. No multipathing technology in the world resolves this issue.
Let us not conflate this design concern with needing to balance I/O to individual object queues versus I/O which is sent to global pooled queues.
Scaling queue depth globally seems to be a more advanced architecture. The storage array will have significant I/O load as it must manage the aggregated workload of ESX/ESXi clusters. Eliminating LUN queue-full events means one does not have to worry about issues where a hot LUN adds latency to a storage path and in turn negatively affects the I/O of other VMs.
IMO - purchasing advanced path management software is a requirement to solve an issue which has its roots in physical, not virtual, architectures.
I will blog more on this topic next week.
Posted by: Vaughn Stewart | May 26, 2009 at 02:04 PM
Vaughn, unfortunately you seem to be missing the benefits that PowerPath provides as virtualization and consolidation increases. PowerPath/VE is actually MORE critical for virtual environments where maximizing the utilization of a given investment while minimizing the effect on performance is a primary design criteria. As Chad said, PowerPath determines the best path on an IO by IO basis based on expected service time (service time is the time an IO spends in the queue + the response time for that IO once it is sent). Your focus on queue depth is only part of the equation and a minor one at that considering that the vast majority of customers don't queue enough IO. In fact there are probably at least 99 customers who aren't even filling their queues for every 1 who is queuing too much. The thing is that service time is not entirely dependent upon LUN, target or port load, but is instead often affected by traffic patterns over shared network paths especially as consolidation ratios increase. This means that something as simple as a batch process on a VM running on another ESX server in the cluster could result in increased latencies for ALL IOs over those shared network resources (even if there is almost no queuing at all). This is because increased load (ie. increased utilization) results in increased response times for ALL IO on that shared path. This is proven using simple and widely accepted queuing theory. Little's Law (Response Time = Service Time / (1 - Utilization)) shows that as utilization increases, any incremental bump in utilization has an exponentially larger effect on response time. This means that higher utilization rates are more sensitive to utilization fluctuations resulting in unpredictable performance the greater you consolidate and this is true regardless of what protocol you are using. Hell, it doesn't even matter what you are queuing. Understanding this principle will change how you select a line at a toll booth or supermarket ;)
Now, isn’t the effective increase in utilization of assets a fundamental tenet of virtualization and a cornerstone to its value prop? if so, wouldn't you want to look at a tool that could help you drive up asset utilization.
In reality, what PowerPath does is that it recognizes network and storage fluctuations (by monitoring response times) then factors in queue depths to determine the expected service time for any given IO. Based on this PowerPath rebalances loads accordingly, guaranteeing each IO the fastest service time possible over ALL available paths. This is why we typically experience 2x-3x lower response times and 2x-4x more throughput over a given set of resources.
Finally, as 10:1 or 20:1 consolidation becomes 100:1 or 200:1 in the near future, converged fabrics will be a requirement. In these 10Gb or faster networks, iSCSI, NFS, FCoE, and production, backup and management networks will all traverse the same physical cables. This means that fluctuations in any type of communication could affect storage performance. As you said, PowerPath is the only product of its kind on the market that can be used to properly navigate these fluctuations.
So, what this shows is that PowerPath/VE is actually MORE critical for virtual environments where maximizing the utilization of a given investment while minimizing the effect on performance is a primary design criteria. PowerPath, in effect, allows us to more efficiently leverage the physical assets that we have deployed.
Posted by: Aaron Chaisson | May 26, 2009 at 07:21 PM
Aaron,
Thanks for the reply. It appears that NetApp has some work to do educating the world on our storage architecture and how it avoids these issues.
I am not challenging the capabilities of PP VE. Rather I am pointing out that the need for this technology is directly linked to the technology inherent in the storage array.
In your reply you repeated Chad's point that a hot LUN can present the same negative effect on performance as a full LUN queue.
On this point I agree with you, so let's discuss how we avoid this scenario.
With a NetApp FAS, V-Series, and IBM N-Series arrays our LUNs aren't bound to specific disk drives. Instead they are virtual and are serviced by a pool of drives referred to as an aggregate.
This info should not be new to you. Chuck had been publicly knocking our LUN virtualization implementation for years.
This design reduces hot spots with LUNs and LUN queues. As such our customers are not required to purchase additional multipathing technologies.
Now don't get me wrong here. NetApp arrays still require utilization monitoring like an EMC array. With a NetApp array one monitors and addresses global or pool utilization levels, not individual LUNs.
If this model seems familiar it should as it is very similar to how one monitors resource utilization levels within an ESX/ESXi cluster.
I'd like to extend an invitation to demonstrate our technology for you.
Thanks again for the follow up.
Posted by: Vaughn Stewart | May 27, 2009 at 06:09 AM
Further to Aaron's post, there are several key benefits to using PowerPath/VE in VMware vSphere environments:
• Automation: PowerPath/VE lets you automate optimal server, storage, and path utilization in a dynamic virtual environment to provide predictable and consistent information access—just set it and forget it. While native multipathing products require constant monitoring and manual rebalancing, PowerPath/VE automatically rebalances the I/O to maximize performance and availability.
• Standardization: With PowerPath/VE, PowerPath now unifies the management of major server, operating system, and virtual platforms across virtual and physical environments. Rather than deploy and support a variety of point products, customers can use PowerPath/VE to standardize on a single multipathing solution across entire environments.
• Protection: PowerPath/VE lets you insulate the virtual machine guest operating system and applications from path management, and automatically detects and reroutes I/O following a path failure to ensure that applications stay online. Since PowerPath Multipathing capabilities are added to the VMware kernel, the virtual machine guest operating system has no knowledge of PowerPath, so applications running on the guest virtual machine are shielded from potential disruption.
• Simplified management: PowerPath/VE eliminates the need to monitor and rebalance the dynamic environment. It also reduces the number of multipathing software variants to manage across heterogeneous physical and virtual environments.
• Improved performance: PowerPath/VE increases the number of always-active initiators, resulting in increased performance with multiple data streams per host. I've received reports from beta customers of a 2X performance improvement.
Additionally, it's important to clarify that PowerPath/VE for vSphere is a MPP (Multi Path Plugin) similar to what NMP is, but more advanced and intelligent. It is not a 3rd party/vendor created SATP (Storage Array Type Plugin) or PSP (Path Selection Policy) as initially indicated.
Posted by: Tom Giuliano | May 27, 2009 at 08:00 AM
Tom,
Thanks for jumping into the discussion; however, what is missing from this conversation is the root cause: traditional storage architectures which include individual LUN queues.
Let's not forget this problem is exacerbated by architectures with shallow queue depths.
I don't believe we are in disagreement. If a customer has an EMC array they really need to consider PP VE.
Alternatively customers can consider storage arrays which virtualize storage access and management. This design provides block level data deduplication, integrated snapshot based backups whose storage can also be used with SRM, zero cost and I/O offloading VM clones, and more, all while avoiding said root cause.
Why purchase PP VE to fix an issue associated with physical architectures when you can implement a virtual one instead?
Posted by: Vaughn Stewart | May 27, 2009 at 10:49 AM
;) Thanks for the invite. Next time I'm down in Raleigh, I'd love to see your technology in the hands of an expert. That being said, the problem with this post is that you are trying to disparage EMC arrays indirectly by discussing a tool that you clearly don't actually understand. You keep talking about "traditional" vs. "virtual" storage as the center point of this discussion (terms that alone could be debated ... with abject futility over this medium). What we are telling you is that you are starting with a false premise. Our point is that PowerPath is not now, nor ever was, a tool designed to solve a "storage problem" inherent in our or anyone else's array and no matter how many times you keep trying to say it, it doesn't make it true. PowerPath is a tool designed to optimize performance resulting from anything that could affect PATH efficiency and the fact is that as consolidation rates increase, such as the case with server virtualization, efficiency becomes more important, not less.
Posted by: Aaron Chaisson | May 27, 2009 at 08:56 PM
Very revealing discussion you've spawned here Vaughn. I have always been a fan of PowerPath in the early days of SANs, before host OS support became SAN-aware.
But in this decade, from an overall configuration management perspective in a highly virtualized environment at scale (i.e. such as one would find in Cloud Infrastructures) - removing as many 3rd-party layers as possible is key to overall simplicity.
Therefore, I still stand by my original statement:
"Native / integrated failover support with high performance load-balancing via automatic (global) queue depth management is exactly what MPIO should be."
Many of EMC's articulate arguments here are very storage admin-centric. I contend that while intelligent storage adds tremendous value to a server virtualization environment - it should do so as transparently as possible to application and server admins.
After all, storage is still the tail to Server Virtualization's dog :)
Posted by: Val Bercovici | May 28, 2009 at 08:49 PM
Val - I agree :-)
From one Canadian to another with respect....
Configuration of multipathing (even in vSphere) is complex, manual, and involves a lot of knowledge of the array internals. I get the sense that you haven't tried this on any vSphere environment of any small scale. Aaron and I have.
BTW - this is true of SATP + RR (which Vaughn discusses) with block devices, and also of NFS datastores. It's complex at moderate scale, hard at enterprise scale, and updating when changes happen is extra hard (in spite of being MUCH MUCH better than ESX 3.x)
Conversely - you install PP/VE via the vSphere Host Utilities and that's it - DONE.
Again - this isn't because we don't believe in the native NMP. For a bit of fun - issue this command against a vSphere ESX 4 host: "esxcli nmp satp list". How often does EMC appear? That, my respected colleague, represents engineering work - plain and simple.
EVERY EMC array was supported on the VMware vSphere HCL **DAY 1**. No other VMware partner pulled that off. That, my respected colleague, represents engineering work - plain and simple.
Likewise our goal is to make storage in the Virtual Datacenter INVISIBLE.
We're doing that in the current vSphere generation via vCenter plugins that are freely available (I know the NetApp ESX Host Utilities are available, not sure on cost) and via making the arrays themselves (for NFS and VMFS use cases) VM-object aware - post going up on that soon.
Our management suite integrated with the vCenter APIs and ESX CIM APIs and was VMware Ready a full year before Onaro and VMinsight.
In future vSphere generations - this will only increase via the vStorage VAAI integration (vStorage APIs for Array Integration - formerly known as vStorage APIs, formerly known as VMware Aware Storage). Thank goodness VMware is better at making products than sticking with names (personally, I like the first - VMware Aware Storage describes what the goal is).
Now why do I say this all? Only to point out that NetApp is a respected competitor - but let there be no implication that EMC isn't laser focused on this and executing.
We each are working to make our solutions as rich, as simple, as clean, as available, as flexible, as invisible, and as VMware-integrated as possible.
This is a view I believe NetApp and EMC share (most other vendors don't even show up for the VAAI calls) - and each customer needs to evaluate how well we are executing against that if they agree with that direction.
But this is, indeed a technical thread, not a marketing one.
"Native / integrated failover support with high performance load-balancing via automatic (global) queue depth management is exactly what MPIO should be."
Right-o. NMP (SATP+PSP) doesn't do that. PP/VE does.
I can tell you that I enjoy having NetApp as a competitor - I spring out of bed in the morning with a smile on my face and work all day to try to delight customers and work with our engineers to innovate, and a respected competitor is an important part of that equation.
Our focus on winning here on our merits and execution is unwavering, and overwhelming.
We have solutions that sell to process-admins, service-desk owners, server/VM configuration owners, desktop/client owners, SLA owners, Exchange/SQL Server/Sharepoint/Oracle/SAP owners, security owners, backup-admins, VMware-admins, and yes, Storage-admins - and that's not a bad thing - that's a good thing.
This post is Storage Centric, sure - but it's discussing a storage nit. If you think that characterizes EMC - well - please keep thinking that! :-)
Closing the same way I opened - with agreement - storage is the tail to Server Virtualization's dog! :-)
Cheers - and looking forward to seeing you on the battlefield!
Posted by: Chad Sakac | May 29, 2009 at 05:17 PM
I'm a little confused on something. Vaughn, you said:
"With a NetApp FAS, V-Series, and IBM N-Series arrays our LUNs aren't bound to specific disk drives. Instead they are virtual and are serviced by a pool of drives referred to as an aggregate.
*cut*
This design reduces hot spots with LUNs and LUN queues. As such our customers do not required to purchase additional multipathing technologies."
To me, LUN I/O performance and PATH I/O performance/failover are separate issues. Having a LUN that performs well does not eliminate the need for multipathing. One would assume multiple LUNs would be presented down the same path, and as that path reaches saturation, additional paths would need to be used to maintain throughput to the LUNs. It is a basic multipathing concept that multiple paths can load balance the aggregate throughput to multiple LUNs, thereby eliminating the issue of static paths saturating without a way to provide more throughput. If the path were to saturate, it wouldn't matter how fast the LUN was, performance would still be degraded. Multipathing solves this issue.
Did I miss something?
Posted by: Michael Lozano | June 04, 2009 at 11:56 AM
Michael,
You asked, "Did I miss something?"
I believe you have.
In my post it doesn't state or imply there is no need for multipathing.
I clearly stated there is no need for the sophistication (and expense) of PPVE if your storage array doesn't have the issues associated with per LUN I/O queues.
The need for PPVE is exacerbated with arrays which implement shallow LUN I/O queues.
Remember VCE - Virtualization Changes Everything (as I am known to say)
I'd like to extend an offer for you to stop by a NetApp office and I'll coordinate a demonstration of our virtualized storage architecture for you.
I'll have more on this topic next week - so stay tuned.
BTW - is this the Michael Lozano who is a Microsoft Solutions Consultant at EMC? If so, welcome to the blog.
Posted by: Vaughn Stewart | June 04, 2009 at 11:31 PM
Vaughn - in normal Chad/Virtual Geek style, there was too much to simply put in comments responses. There was also stuff that I believe was incorrect, and materially incorrect.
So - there's a (as per normal - completely long-winded and detailed) post up now. I tried hard to keep the tenor right, because after all, this is engineering, not competitive positioning.
http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html
Also, some corrections I thought weren't material enough for the post on my site:
"I should have mentioned it earlier in this post, but NetApp has provided the EHU and will provide the VSC at no cost. Maybe EMC will consider doing the same with PP VE as it seems to be required in order to ensure the availability and performance with a Clariion array. "
Umm - EMC provides our vCenter Plugins all completely free.
1) EMC Storage Viewer: is similar to the EHU - though I would argue the EMC one is better, and was available before the vCenter plugin of EHU.
2) VDI Deployment Tool which is similar to RCU - though to be fair the RCU is better (working on it!), and was available before the EMC plugin
3) SRM failback plugin - which I would argue NetApp has no equivalent (correct me if I'm wrong)
Of course - with "plugin explosion" happening, we're integrating into a next-gen single framework - great minds think alike. This is the EMC version of the VSC (which BTW, we were working on before we saw your and Nick's post on the topic).
The fact that PP/VE is an additional cost item isn't because:
a) we don't support NMP, RR and ALUA (we do, of course)
b) that it's needed in any way to get past "legacy array" issues. Ha - had me laughing on that one! the core NetApp architecture has been consistent through to the very early 90s, similar to the CLARiiON - they are both "virtualized" and also both "legacy" at the same time. Let's be honest, only something designed in the last decade can be called "non-legacy" (and yes, we've both made many, many incremental innovations throughout, but consistently based on an architectural model that is fairly old. There is a nice way to say "old, legacy architecture" - that is "mature" :-)
So why is it a chargeable item? It's because it doesn't simply provide a platform optimization, or alternate view/management model.
PP/VE offers customers (and not just EMC folks - HP, HDS, and IBM as well) something that is BETTER (adaptive and predictive queue management and path management) than NMP RR, and with an automated setup (ergo it scales better to large enterprises) versus the very manual setup that remains with NMP RR.
Every customer is given the choice to determine if they think the value is worth the cost. On that note LOL.... As if NetApp software isn't generally a chargeable item at some of the industry's highest margins - a reflection of good customer value - that's not a bad thing.
If/when NetApp decides to truly integrate with the Pluggable Storage Architecture (hint, this involves writing a SATP, PSP or MPP, not just saying "hey we support what's out of the box") and offer that for free, then you can cast a stone here. For example, it looks like Dell/EqualLogic has written a third party PSP (which VMware has recently asked everyone to start calling a MEM - Multipathing Extension Module). Not sure what it does, and whether it's free, but THEY can critique pricing if it's free - and then customers determine value between the two (if it is a PSP, that's very different than an MPP)
Please avoid painting that "evil" picture which is consistent in the post (and many, many NetApp posts).
We're not evil. We're good people. Like any company, we're not monolithic. I don't think NetApp is evil or malicious.
We're working to innovate, and help customers. We're not perfect (and when we make an error, we will work to correct). Every customer chooses value and fit for their environment, and choose their partners.
Looking forward to seeing you at VMworld in San Fran, and the first cocktail is on me!
Posted by: Chad Sakac | June 20, 2009 at 04:24 PM
Chad, buddy, come on. Evil? I never said or implied such. A legacy, monolithic, out-dated architecture which requires host based software to make it perform better... absolutely!
This isn't personal, so please don't take it as such.
Our array architecture is more apropos for server virtualization than a traditional legacy storage array.
Don't believe me because I work for a vendor? Check out the market share growth between the 2008 Goldman Sachs IT Spending Survey and the 2009 Forrester Storage Choices for Virtual Server Environments. You know what I'm referring to.
Which storage vendor gained 18% share?
Now ask why... (hint: storage virtualization matters)
Posted by: Vaughn Stewart | July 07, 2009 at 02:37 PM
Ugh, reading this he said she said stuff gets very old and annoying, especially when it is EMC vs. NetApp. As a happy current NetApp customer, and former EMC customer (I was also happy with their arrays so maybe I'm easy to please) I would like to see something like this from NetApp. Yes, I know we can do RR, but a little more logic would be nice.
Would I pay per CPU on my ESX hosts, no chance in hell. Would I license it per controller head, as NetApp does most of their licensing, absolutely.
Posted by: Erick Moore | July 13, 2009 at 02:04 PM
Are you proposing RR Mpath for active active clusters? Are there potential issues with partner path problems as we have seen in ESX 3.x using esxcfg-mpath? Does this now mean no more running config_mpath when I add new luns?
Posted by: gballou | August 31, 2009 at 08:05 AM
@gballou - so long as the array can present a LUN on all ports (owning and non-owning processors on what VMware calls "active/passive" arrays such as NetApp and CLARiiON CX4 configured in ALUA mode supporting SCSI-3 reservations & what VMware calls "active/active" models such as HDS and Symmetrix), then YES.
I can't/won't speak for NetApp, but speaking for EMC, we absolutely support NMP RR in both those CX4 and Symmetrix configurations, and on our Celerra iSCSI target. That means no more config_mpath.
PowerPath/VE completely updates the whole vSphere multipathing stack - for example not requiring any path discovery via rescans, testing path state (potentially resolving problems before failed I/Os), and also changes the path selection for an IO from a simple round robin to adaptive (using ESX queue depth) with EMC and other 3rd party arrays (check with Elab for support matrix), and if using EMC arrays also factors in the target port queue depths (which are future predictors of host queuing behaviour).
Posted by: Chad Sakac | November 17, 2009 at 03:37 PM
There's going to be a time when vendor specific stuff starts dying like flies. Remember the Eddie Murphy skit?
What have you done for me lately?
Both NetApp and EMC are being challenged by commodity hardware + FOSS in the mid range. VMware+ZFS is a compelling story but not mature enough for my organization. I also expect that to change.
But neither company is a sitting target. VMware made bank on the fact that people were so focused on clock speeds they forgot there were other parts to their computers.
NetApp is the core of my current data center and has allowed us to do a better job cheaper from an infrastructure point of view than our former storage vendors. They figured out the game ahead of the pack, it will be interesting to see how far 8 is ahead of the pack. We all know they took their time and Dave et al. are smart folks. They know they still have an edge good enough to keep customers from moving to FOSS.
Posted by: Jeremy Page | January 05, 2010 at 08:12 PM
Jeremy,
High value features in any market always become commoditized in time. Do you recall when power windows were a high end feature in an automobile? What about seat-belts?
We live in an age where we all must innovate or die.
I trust the leaders at the helm @ NetApp will allow us to continue to move forward.
Thank you for being a customer, and if there's anything I can do to assist you please don't hesitate to ask.
Posted by: Vaughn Stewart | January 06, 2010 at 02:28 PM
Hi guys, interesting blog. I've always been frustrated by earlier versions of ESX and configuring the right path selection etc. It is nice to see there has been progress since I have mostly left FC for the easy life of NFS.
Chad, your original comments about using RR with ALUA as being less than ideal because it will still send data via 'bad ports' - I share this concern but from my understanding ALUA actually declares path states and therefore the ALUA host will select only states declared as active-optimized and not paths declared as active-unoptimized.
However it will be interesting to test this out as after selecting RR for my ALUA provisioned LUNs and then looking on the ESX host - all 4 paths show as active... do you guys have any thoughts on this and how best to go about testing it? I don't like to always accept things as gospel without seeing it first hand.
Rich.
Posted by: Richard Gray | February 08, 2010 at 03:53 AM
@Rich - Your understanding is correct. Can you clarify your testing a bit.
-Are you testing with an array from EMC or NetApp?
-When you state all four links active, is this correct behavior or incorrect (note you don't state whether non primary paths are active).
Thanks
Posted by: Vaughn Stewart | February 08, 2010 at 11:42 AM
Hi Vaughn,
I have tested this now and it works exactly as it should (and without the need for additional products!).
I'm using a FAS3140 and ESX 4 hosts. 2 LUNs provisioned, 1 on FilerA the other on FilerB. Mapped to ALUA enabled iGroups. On the ESX host side, RR enabled for the PSP (followed by a reboot of the host - this is crucial and why in my above post the paths were not showing right).
The above setup now shows each LUN with 4 Active paths, 2 of which are Active I/O paths. These 2 Active I/O paths are the 2 correct paths which belong to the Filer owning the LUN - no vtic/bad port traffic here :)
Being crude and as a test, I remove 1 cable to FilerA. My Active I/O paths drop to 1, still on the correct owning Filer. This is perfect and how it should be, as at this point I still don't want traffic across the vtic.
Next, I remove the last cable to FilerA; now the 2 Active paths are 'promoted' to Active I/O paths and they are obviously now routing through FilerB. Again perfect, because obviously in the rare event both paths to FilerA fail I still need my LUNs available.
Hope that makes sense but it's now exactly the sort of solution it should be. Easy to setup and essentially self managing.
Combined with VSC, I'm confident in the solution and it takes very little time to do right.
1 thing I would like to see in VSC (I know it's early days for it) is the option to choose which ESX hosts/cluster to scan for NetApp filers. I have a few clusters in small remote locations connected to MSAs and it's an annoyance that they get bothered and as a result the rescan task takes considerably longer to finish.
Rich.
Posted by: Richard Gray | February 09, 2010 at 04:07 AM
Forgot to say, when you restore the links the reverse happens - restore only 1 link to the owning Filer and it becomes the 1 and only Active I/O path; the 'un-optimized' links then both go back to being just Active in state. Restore the last link and it's back to the beginning again, 4 paths, 2 I/O to the right Filer :)
Posted by: Richard Gray | February 09, 2010 at 04:13 AM