DRS Host Affinity in vSphere 4.1
Posted by Larry Touchette
If you are one of the many NetApp customers running VMware on a NetApp MetroCluster solution, you'll be interested in a new feature of vSphere 4.1 called DRS Host Affinity.
Briefly, MetroCluster is a solution that combines synchronous mirroring of aggregate storage containers with separation of cluster nodes into different locations or sites.
There are two flavors of MetroCluster. There is Stretch MetroCluster, in which the cluster is cabled much like a typical NetApp HA system and the nodes can be separated by as much as 500 meters.
And there is Fabric MetroCluster, which uses FC switches and long haul fiber to increase the distance to 100 kilometers.
If you want more details about VMware and NetApp MetroCluster solutions, we recently updated tech report TR-3788, which covers them.
Some customers have opted not to enable VMware DRS in MetroCluster (MC) environments where the VMware HA cluster spans the MetroCluster sites. The concern is that as DRS performs vMotion migrations to balance ESX load, it will move VMs to hosts that are not located in the same site as the storage for those VMs. The new DRS host affinity feature allows you to set a preference to keep VMs on certain ESX hosts. If set up properly, this feature can keep VMs running at a certain site.
Here’s a quick walk-through of setting up DRS host affinity in vSphere 4.1 in a way that supports a NetApp MC environment.
This is the basic environment: one DRS-enabled VMware cluster called vCloud-Cluster-B containing four ESX hosts. Assume hosts .119 and .121 are at MetroCluster SiteA and that hosts .120 and .122 are at SiteB. Some of the VMs in this environment have storage at SiteA and some have storage at SiteB.
In the cluster settings dialog create a VM DRS group in the DRS Groups Manager.
I called this group SiteA_VMs and added all the VMs that have storage at SiteA in the box on the right. You can select multiple VMs at a time by holding Ctrl, or a range of them with Shift-click. Using the “Name contains:” drop-down box can help you filter large lists of VMs.
Next add a Host DRS group. I called this group SiteA_Hosts and added all the ESX hosts physically located at SiteA.
Do the same for SiteB, creating a VM DRS group and a Host DRS group for the SiteB VMs and hosts.
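The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything vSphere runs: the VM names are hypothetical, the `build_groups` helper is made up for this example, and the host labels simply reuse the .119 to .122 shorthand from the environment described earlier.

```python
# Hypothetical inventory: VM name -> site that holds its storage.
# These VM names are invented for illustration only.
vm_storage_site = {
    "vm-web01": "SiteA",
    "vm-db01": "SiteA",
    "vm-web02": "SiteB",
    "vm-db02": "SiteB",
}

# Host -> physical site, following the layout assumed in this post.
host_site = {
    ".119": "SiteA",
    ".121": "SiteA",
    ".120": "SiteB",
    ".122": "SiteB",
}


def build_groups(mapping, suffix):
    """Bucket members by site and name each bucket '<site>_<suffix>'."""
    groups = {}
    for name, site in mapping.items():
        groups.setdefault(f"{site}_{suffix}", []).append(name)
    # Sort members so the result is stable regardless of input order.
    return {group: sorted(members) for group, members in groups.items()}


vm_groups = build_groups(vm_storage_site, "VMs")
host_groups = build_groups(host_site, "Hosts")

for group, members in sorted({**vm_groups, **host_groups}.items()):
    print(group, members)
```

The point of the sketch is that VM group membership follows where the storage lives, while host group membership follows where the host physically sits.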
All groups made…
In DRS Rules, build a rule associating the correct groups. I called this rule SiteA_Affinity and set the type to “Virtual Machines to Hosts”, which is the new rule type provided in vSphere 4.1. I selected the SiteA_VMs group and the SiteA_Hosts group I created earlier. Then I selected “Should run on hosts in this group” as the type of rule. This is important, because it creates what is called a “preferential” rule. This means that if the VMware HA service needs to, it can violate this rule and start VMs on ESX hosts at the other site. To support MC you definitely want HA to be able to do this. If you select the “Must run on hosts in this group” type, then a “required” rule is created and the HA service will not violate the rule to restart VMs if these hosts are not available. So for MC support, select the “should” rule.
The rule details explain exactly what this rule will do: keep VMs from the SiteA VM group running on ESX hosts in the SiteA host group.
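To make the difference between the two rule types concrete, here is a toy Python model of the behavior described above. It is a simplified sketch under my own assumptions, not VMware's placement logic: the `restart_candidates` function, the rule dictionaries, and the VM name are all hypothetical, and real HA considers many more factors when it picks a host.

```python
def restart_candidates(vm, rule, available_hosts):
    """Return the hosts where a failed VM may restart in this toy model.

    rule is a dict with keys 'vms', 'hosts' and 'kind' ('should' or 'must').
    """
    if vm not in rule["vms"]:
        # The rule does not apply to this VM, so any available host will do.
        return sorted(available_hosts)
    preferred = sorted(h for h in available_hosts if h in rule["hosts"])
    if preferred:
        return preferred
    if rule["kind"] == "should":
        # Preferential rule: it may be violated when no preferred host is left.
        return sorted(available_hosts)
    # Required rule: never violated, so there is nowhere to restart the VM.
    return []


site_a_vms = {"vm-db01"}            # hypothetical VM with storage at SiteA
site_a_hosts = {".119", ".121"}     # SiteA hosts from this walk-through
should_rule = {"vms": site_a_vms, "hosts": site_a_hosts, "kind": "should"}
must_rule = {"vms": site_a_vms, "hosts": site_a_hosts, "kind": "must"}

# Simulate losing SiteA: only the SiteB hosts remain available.
survivors = {".120", ".122"}
print(restart_candidates("vm-db01", should_rule, survivors))
print(restart_candidates("vm-db01", must_rule, survivors))
```

With the SiteA hosts gone, the “should” rule still leaves the SiteB hosts as restart targets, while the “must” rule leaves none, which is why the preferential rule is the one to pick for MetroCluster.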
After doing the same for VMs that should remain in SiteB…
There it is, VMware Site Affinity in a NetApp MetroCluster environment!
There are several rules governing affinity and how affinity rules are applied in vSphere. Be sure to review the vSphere Resource Management Guide and the vSphere Availability Guide so you'll understand how your rules will be applied when there are conflicts between multiple rules in the environment.
I suspect that as vSphere 4.1 becomes more mainstream this new feature will be of value to many NetApp customers running VMware on MetroCluster.
Hi Larry,
That's a very useful addition to DRS. Most of our customers don't want DRS enabled because of the problem you described. With this feature this problem is past. Thanks for pointing it out.
Cheers, Rene
Posted by: Rene | July 28, 2010 at 03:26 AM
This feature will be very useful to us in another manner. We have a performance sensitive application and hosts that offer various levels of performance (Xeon vs Nehalem). With this rule, I can turn DRS back on for the sensitive VM and know that it will remain on the faster hardware.
Posted by: David Nixon | August 11, 2010 at 12:08 PM
Thanks for the comment David. I believe setups like yours will be one of the more common reasons for enabling this feature.
Posted by: LarryT | August 13, 2010 at 05:43 AM