Today my good friend Duncan Epping surprised me with an IM letting me know that he was posting on the impact of partition misalignment and asking if I’d care to comment. Fortunately, I had time to do just that, and this post is an expanded version of what I shared in the comments section at YellowBricks.
I’d suggest that anyone interested begin at YellowBricks before proceeding.
Aligning the data in a virtual infrastructure with the storage array is critical to performance, scaling, hardware life cycle, and storage efficiency. A lack of alignment results in the array retrieving more data than the VM requested. That inefficiency means more storage hardware resources are required to serve a given workload.
Did You Know...
Misalignment can be found with VMFS datastores and inside of VMs?
There was an old issue where VMFS file systems created in Virtual Center 1.x did not align the VMFS partition to a 128KB offset. I wish I had the KB # at hand. A misaligned datastore does not get corrected by a VMFS upgrade. If you have these old datastores, the best course of action is to migrate the VMs off and destroy the datastore.
Misalignment within Virtual Machines
To my knowledge the only storage vendors to publish content around the importance of alignment have been EMC & NetApp. In fact, EMC has even updated their incorrect data around the lack of needing to align on NFS (kudos to Chad Sakac of EMC for getting this erroneous data corrected).
I would suggest that anyone interested in understanding more on this issue read the NetApp technical report TR-3747. The content in this document was reviewed and approved by VMware, Microsoft, Citrix, and NetApp.
As for the VI3 document that Duncan referenced in his post, I have some concerns. It only recommends aligning the virtual disks other than the VM’s system drive. The reasoning given is that the system drive does not have a high I/O requirement. While I can agree on the merits of this point, I disagree with the recommendation. Aligning system drives is hard to accomplish once the system is deployed, and I believe this may be closer to the actual reason for the recommendation.
Furthermore, if one does not align the system partition, the array still has to work harder. Imagine the impact of misaligned data when one reboots a number of VMs at once, say with SRM or View. Misalignment also has a negative impact on data deduplication, which shows up as a reduction in storage savings over time. The reason is that misalignment causes the data stored in each VM to not be identical on the array, and as such the dedupe savings are reduced. One example is deploying a service pack to many VMs: the same files land in every VM, yet they do not deduplicate as well as they would if the VMs were aligned.
I believe it is fair to say that the premise of misalignment impacting NetApp more than other arrays is oversimplified. Allow me to elaborate.
What is GOS Type?
I know Windows rather well, and as such I will speak to this OS family.
First modern GOS types like Windows 7, Vista, 2008 implement GPT versus MBR and as such have a 1MB starting partition offset (versus the traditional 32,256 byte with MBR). The 1MB offset is aligned and optimized for every storage array vendor, protocol, and platform. I would like to thank Microsoft for listening to their storage partners’ input when they began engineering GPT.
If your VMs run Windows NT through 2003, they are most likely misaligned due to the default starting offset of 32,256 bytes found with MBR partitions. Also, if you upgrade a VM from one of these versions to Windows 7 or 2008, the starting partition offset will remain unchanged.
See TR-3747 for more on this point. It is also covered in TR-3428 (VI3) & TR-3749 (vSphere).
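The offsets discussed in this section can be checked with simple arithmetic. The sketch below is only an illustration, not part of any NetApp or VMware tool: the function name `is_aligned` is hypothetical, and the 4KB and 64KB block sizes are example values taken from the discussion in this post. It tests whether a partition's starting offset falls on an array block boundary.

```python
SECTOR_BYTES = 512  # traditional sector size, assumed for this example


def is_aligned(partition_offset_bytes: int, array_block_bytes: int) -> bool:
    """Return True when the partition starts on an array block boundary."""
    return partition_offset_bytes % array_block_bytes == 0


LEGACY_MBR_OFFSET = 63 * SECTOR_BYTES  # 32,256 bytes, the old MBR default
ONE_MB_OFFSET = 1024 * 1024            # 1,048,576 bytes, the newer default

# Example array block sizes in KB (illustrative values only)
for block_kb in (4, 64):
    block_bytes = block_kb * 1024
    print(
        f"{block_kb}KB blocks: "
        f"32,256-byte offset aligned={is_aligned(LEGACY_MBR_OFFSET, block_bytes)}, "
        f"1MB offset aligned={is_aligned(ONE_MB_OFFSET, block_bytes)}"
    )
```

The 32,256-byte offset leaves a remainder against both block sizes, while the 1MB offset divides evenly by each.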
So Why Bother to Align Existing VMs
It’s simple: consider GOS partition alignment a standard for clouds and virtual infrastructures. When you align, you ensure the best performance for VMs on any storage platform and over any storage protocol, whether it is an internal cloud or an external cloud provider. Isn’t one of the goals of server virtualization hardware independence?
I have discussed this issue in a post titled: I/O Efficiency & Alignment - the Cloud Demands Standards
What Array and Storage File System are You Storing VMs on?
A NetApp array stores data in 4KB blocks whether the data is served via SAN or NAS. So if the GOS partition is misaligned and a VM makes a 4KB read request, we will read two 4KB blocks (8KB). Most data reads aren’t that small and as such don’t have a 100% read overhead (4KB / 8KB). Say the VM makes a 1MB read request; then we would retrieve 1MB plus an additional 4KB block. In this case the overhead on the array is less than 1%. I would ‘guesstimate’ that most non-busy VMs make requests more in the range of 32KB to 128KB, and as such the overhead for misalignment would be around 7% to 10%.
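To make the block counting above concrete, here is a minimal sketch. It assumes that a guest request lands on the array at a byte address equal to the partition's starting offset plus a 4KB-aligned offset inside the guest file system. The function name `blocks_touched` and the constants are hypothetical, chosen only for illustration.

```python
def blocks_touched(start_byte: int, length_bytes: int, block_bytes: int) -> int:
    """Count the array blocks that a read of length_bytes at start_byte spans."""
    first_block = start_byte // block_bytes
    last_block = (start_byte + length_bytes - 1) // block_bytes
    return last_block - first_block + 1


BLOCK_BYTES = 4 * 1024          # 4KB array block, as described above
MISALIGNED_START = 32256        # legacy MBR partition offset in bytes
ALIGNED_START = 1024 * 1024     # 1MB partition offset in bytes

# Request sizes in bytes: a 4KB read and a 1MB read
for length in (4 * 1024, 1024 * 1024):
    print(
        f"{length // 1024}KB read: "
        f"misaligned={blocks_touched(MISALIGNED_START, length, BLOCK_BYTES)} blocks, "
        f"aligned={blocks_touched(ALIGNED_START, length, BLOCK_BYTES)} blocks"
    )
```

With these assumptions the misaligned 4KB read spans two blocks instead of one, and the misaligned 1MB read spans 257 blocks instead of 256, which is the "one extra block" effect described above.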
Storage arrays from other vendors store data in other block, or chunk, sizes. Say your array stores data in a 64KB block (maybe EMC can confirm whether this is the size of the storage chunk used in a Symmetrix with LUNs). In this configuration, if the GOS partition is misaligned and a VM makes a 4KB read request, we will read a 64KB block. As I’ve stated before, most data reads aren’t that small, so let’s consider a 1MB read request. In this case the array would retrieve 1MB plus an additional 64KB block, so the overhead on the array is around 6%. If we consider my premise that many non-busy VMs make requests in the 32KB to 128KB range, the overhead with a misaligned 64KB block would be between 50% and 200%.
While ESX/ESXi does aggregate its read requests, they are ‘decoupled’ when they hit the array and as such do actually experience these read inefficiencies.
Please align the GOS partitions in your VMs. I’d suggest starting by correcting your templates and your P2V migration process. NetApp provides a free tool, MBRAlign, which can align the partitions for most GOS types. We also provide MBRScan, which can audit your currently deployed VMs and give you feedback on the state of your current deployment. These tools are only supported on the service console on ESX hosts (but some may have found a way to run them on ESXi).
Alternatively, one can check out tools like vOptimizer Pro from Vizioncore, which provides a more robust interface with scheduling and reporting mechanisms.
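The overhead comparison can be sketched with a deliberately simple model: a misaligned request costs one extra array block, so the overhead is the block size divided by the request size. This is an assumption made for illustration only. Real arrays cache, prefetch, and coalesce I/O, and the function name `misalignment_overhead_pct` is hypothetical.

```python
def misalignment_overhead_pct(request_kb: float, block_kb: float) -> float:
    """Extra data read, as a percentage of the request, when a misaligned
    request pulls in one additional array block (simplified model)."""
    return 100.0 * block_kb / request_kb


# Block sizes and request sizes (in KB) drawn from the examples in this post
for block_kb in (4, 64):
    for request_kb in (32, 128, 1024):
        pct = misalignment_overhead_pct(request_kb, block_kb)
        print(f"{block_kb}KB block, {request_kb}KB request: {pct:.2f}% overhead")
```

The model ignores everything except the one extra block, so treat its output as a rough way to compare block sizes and not as a measurement.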
Thanks to Duncan for raising awareness on this topic and to Chad for correcting the recommendations for Celerra NFS datastores.
Do the NetApp tools work in multi-vendor environments? ie: Can I use the mbrscan and mbralign on both NetApp and Clariion LUNs, even though base chunk size is different?
Posted by: Matt | April 08, 2010 at 10:13 AM
@Matt - Yes they do; however, we do not provide support for them when aligning data that is not accessed through a NetApp FAS or vSeries controller.
Posted by: Vaughn Stewart | April 08, 2010 at 11:56 AM
@Vaughn - Thanks. I wasn't sure if aligning was the same for 4K and 64K blocks. I appreciate the info.
Posted by: Matt | April 08, 2010 at 12:19 PM
I've been aware of this issue for a while -
the real question for the customer is why can't vmware and netapp put their APIs & engineers together (as they have done very successfully in the past) to solve it with zero downtime for VMs?
http://communities.vmware.com/message/1273452
eg:
If I can storage vMotion the live VM to another datastore why can that process not lay down the destination vmdk's in alignment
Posted by: www.facebook.com/profile.php?id=658313066 | April 08, 2010 at 11:15 PM
Re: [NetApp - The Virtual Storage Guy] www.facebook.com/profile.php?id=658313066 submitted a comment to Raising Awareness Around the Misalignment of Data
@Fletcher
Regarding joint engineering efforts to solve this issue... It appears that VMs cannot be aligned without disruption, as the GOS caches SCSI blocks in memory and the realignment process results in files residing at an adjusted SCSI address which does not match what may be in the GOS cache.
So today we have a process where one powers down a VM, aligns the VMDK, and restarts the VM. I hope to share some ways to significantly speed up this process.
NetApp and VMware engineering are working on a joint solution, but for now, that's all I can share - damn NDAs ;) In the end these efforts will benefit all deployments as this issue impacts every array.
Posted by: Vaughn Stewart | April 09, 2010 at 06:30 AM
Vaughn,
the link to TR-3747 is broken. You missed the "ht" in "http"
Thanks to you and Duncan for reminding those of us that have aligned to go back and check new systems to be aligned!
Posted by: Nick Howell | April 09, 2010 at 11:07 AM
@Nick Thanks, correcting now
Posted by: Vaughn Stewart | April 09, 2010 at 11:13 AM
@Vaughn,
Thanks for the post--very helpful. One question regarding this statement:
"First modern GOS types like Windows 7, Vista, 2008 implement GPT versus MBR and as such have a 1MB starting partition offset (versus the traditional 32,256 byte with MBR)."
I see from my win2k8r2 x64 vm that the offset is correct by default on the system volume as you reference above, but I also see that the disk is identified as having a "Partition Style" of "Master Boot Record (MBR)" in the Disk Management app in the OS. It wasn't until I cracked open msinfo32 that I confirmed the proper offset and verified your info. Is Disk Mangler misreporting the partition type?
Thanks,
Scott
Posted by: Scott Howard | April 09, 2010 at 12:06 PM
@Vaughn, Can you discuss why RDMs don't need alignment, assuming they are properly presented with the appropriate LUN type? I find I am asked to explain it quite often because someone will look at a volume within a guest and say "MSINFO32 says the starting offset for this 200 gig E drive is 32,256 so it's not aligned"
I get it.. but how do I explain it
Posted by: Greg | April 09, 2010 at 12:44 PM
@Greg To recap, when one uses an RDM, one is giving a VM direct access to a LUN. As such, the LUN type should match the GOS type of the VM. When one selects a Windows LUN type, as in your case, the array implements a means underneath the LUN which adjusts where the partition starts on the physical disks without any knowledge or interaction from the GOS.
The result is Windows 2000 and 2003 servers reporting a starting offset of 32,256 (the default value) while being aligned on the array.
Does this make sense?
Posted by: Vaughn Stewart | April 09, 2010 at 03:28 PM
@Vaughn - You mentioned NA is working on some ways to better/faster perform the cluster alignment. We currently have a large number of VMs that are misaligned; can you expound on when your solution might be ready for release? I'm trying to avoid a very LONG project.
Posted by: Joe | June 18, 2010 at 01:42 PM
Hi All
Can anyone tell me how to create a misaligned VM on an ESX server?
This would be helpful for my testing.
Thanks
Anee
Posted by: Anee | September 22, 2010 at 06:17 AM