
07/15/2011

NetApp & Storage DRS

 

...and what you need to consider when using it with shared storage.

 


As we've gone through the vSphere 5 beta program, Storage DRS is the one piece that has both fascinated and scared us. As you all know, I'm a huge fan of transparency, and I want to be crystal clear with this post.

I won't speak for EMC, HP, and Dell/EQL, but if you're running your vSphere environment on a shared storage array from any of us, you're likely not going to have a pleasant experience with Storage DRS, unless you just don't care about taking advantage of our features. I'll assume, for this post, that you do. :)

Over the past few years, NetApp has brought to light storage efficiencies such as thin provisioning, deduplication, and snapshot-based backups. These constructs rely heavily on the storage array being able to manage the storage, and these efficiencies, at a granular level, usually invisibly to the end user.

So what happens when you move a .vmdk from one volume to another?  What's all the fuss about?


Let's talk about deduplication first. As we've all known for some time, a Best Practice of ours is to stuff as many VMs as you can into a single datastore [volume] in order to get the highest returns on deduplication. Users can see 80%+ of the space on their VMs returned (I know; I was one of them) by collectively placing them in the same container, with no performance penalty for doing so. Moving data [i.e., VMDKs] between volumes will "un-dedupe" the data being moved, and you will have to re-run the deduplication scan on the new volume in order to recoup those savings after the move. While this is not the end of the world, it is a nuisance, and you should be aware of it as a caveat to Storage DRS, or to moving VMDKs between datastores in general.
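
To make the caveat concrete, here's a minimal sketch in plain Python — a toy block model, not any real ONTAP or vSphere interface — of why a moved VMDK lands fully rehydrated on the destination and only recoups its savings once the dedupe scan re-runs there:

```python
import os

BLOCK = 4096

class Volume:
    """Toy model of a post-process deduplicating volume: writes land
    fully hydrated, and a scheduled dedupe scan later collapses the
    duplicate blocks down to one physical copy each."""
    def __init__(self, name):
        self.name = name
        self.blocks = []                       # physical blocks as written

    def write(self, data):
        self.blocks += [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

    def used_blocks(self):
        return len(self.blocks)

    def dedupe(self):
        """The scheduled scan: keep one copy of each unique block."""
        self.blocks = list(dict.fromkeys(self.blocks))

# Ten VMs cloned from one 100-block OS image share almost everything.
os_image = b"".join(os.urandom(BLOCK) for _ in range(100))
ds1 = Volume("Datastore1")
for _ in range(10):
    ds1.write(os_image)
ds1.dedupe()
print(ds1.used_blocks())    # 100 physical blocks for 1000 logical: ~90% saved

# SDRS moves three of those VMs to a new volume: they arrive fully
# rehydrated, and the savings are gone until that volume's scan re-runs.
ds2 = Volume("Datastore2")
for _ in range(3):
    ds2.write(os_image)
print(ds2.used_blocks())    # 300 blocks: 0% saved right after the move
ds2.dedupe()                # re-run the dedupe scan on the new volume...
print(ds2.used_blocks())    # ...and it recoups the savings: 100 blocks
```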


Thin provisioning. Nothing really "breaks" with thin provisioning, but you're allowing an outside construct to control placement of data on what it thinks is a 1TB volume that you've thinly provisioned, when in actuality there is only 100GB truly available to write to. What happens when Storage DRS moves something there? Boom. I think most of you know what happens when a thin-provisioned LUN exceeds its available space.

Side note: this is another brilliant feature of NetApp that often goes unnoticed, or at least rarely gets talked about. We have certain settings you can turn on that automatically grow both volumes and the LUNs inside them (Autogrow).
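
Here's that failure mode as a toy sketch (hypothetical names, plain Python, no real array API): SDRS only sees the advertised free space, but writes are only as good as the aggregate behind them.

```python
class ThinLUN:
    """Toy model: the LUN advertises its logical size to vSphere, but a
    write only succeeds while the backing aggregate has real space left."""
    def __init__(self, logical_gb, aggregate):
        self.logical_gb = logical_gb
        self.used_gb = 0
        self.aggregate = aggregate             # shared pool of real disk

    def free_advertised_gb(self):
        return self.logical_gb - self.used_gb  # all SDRS gets to see

    def write(self, gb):
        if gb > self.aggregate["free_gb"]:
            # This is where Autogrow would step in -- but only if the
            # aggregate itself has headroom; here the pool is exhausted.
            raise IOError("out of space on the backing aggregate")
        self.aggregate["free_gb"] -= gb
        self.used_gb += gb

aggr = {"free_gb": 100}                        # 100 GB truly writable
lun = ThinLUN(logical_gb=1024, aggregate=aggr)

print(lun.free_advertised_gb())                # 1024 GB: looks wide open
lun.write(60)                                  # first 60 GB VMDK lands fine
try:
    lun.write(60)                              # second move...
except IOError as err:
    print(err)                                 # ...Boom.
```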



Snapshots. This one is pretty straightforward. When we snapshot volumes, or the data inside them, we store the initial tier of snaps within that same volume. What happens when you [or some outside construct] begin to move items inside the volume to another volume? Those snaps (i.e., your backups) become invalid.
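
A toy sketch of the problem (plain Python, hypothetical names; real Snapshot copies are block-level, not file copies): the snaps stay behind in the source volume while the VM moves on, which is exactly why the rebaseline I recommend later in this post is needed.

```python
class Volume:
    """Toy model: array snapshots live inside the volume they protect."""
    def __init__(self, name):
        self.name = name
        self.files = {}                  # live filesystem: vmdk -> blocks
        self.snapshots = []              # point-in-time copies of files

    def snapshot(self):
        self.snapshots.append(dict(self.files))

    def restore_points(self, vmdk):
        """Backups this volume can actually restore for a given VMDK."""
        return sum(1 for snap in self.snapshots if vmdk in snap)

ds1, ds2 = Volume("Datastore1"), Volume("Datastore2")
ds1.files["web01.vmdk"] = "blocks-v1"
ds1.snapshot()                           # two nights of backups
ds1.snapshot()

# An outside construct moves the live VMDK to another volume...
ds2.files["web01.vmdk"] = ds1.files.pop("web01.vmdk")

# ...the old snaps still hold data, but they're stranded where the VM
# no longer lives, and the new home has no protection at all yet.
print(ds1.restore_points("web01.vmdk"))  # 2 -- stranded in Datastore1
print(ds2.restore_points("web01.vmdk"))  # 0 -- until you rebaseline
ds2.snapshot()                           # rebaseline on the destination
print(ds2.restore_points("web01.vmdk"))  # 1
```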

Bottom line: the promise of SDRS is a good one… mitigate capacity and performance issues in storage. But it's not yet sophisticated enough to know about all the backend, array-specific value-add and take it into account in its recommendation algorithm, so you can end up causing further issues that may be even more impactful than the one you were trying to address in the first place.

 

We'll call this segment Unicorns & Rainbows.

In a perfect world, VAAI would come back into play with some additional primitives. vCenter needs to be less of the doer and more of a traffic director in the middle. From what I understand from VMware, this is the next evolutionary step for Storage DRS, as well as other things.


I envision the conversation going something like this:

vCenter: "Hey NetApp!"

NetApp: "Hey vCenter! What's up?"

vCenter: "You know that VM you've got in Datastore1?  It seems to be growing pretty rapidly and is chewing up a lot of disk I/O.  Think you could provision a new datastore and move this VM to it?"

NetApp:  "No problem!"

  • The NetApp VSC plugin provisions a new datastore and mounts it on all hosts in the cluster.
  • DataMotion snapshots and moves the VM's files over to the new volume via array-based copy offload.

NetApp:  "OK, vCenter, all done!"

vCenter:  "Wow that was super fast!  Thanks!  Looking much better now!"
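
Rendered as code, that handshake might look something like the sketch below. To be clear: every name in it is hypothetical — no such vCenter, VSC, or DataMotion API exists today, which is the whole point of the Unicorns & Rainbows label.

```python
# Entirely hypothetical -- a sketch of what a VAAI-brokered Storage DRS
# move could look like if vCenter directed traffic and the array did
# the actual work. None of these calls exist today.

def handle_sdrs_recommendation(vcenter, array, vm, source_ds):
    """vCenter asks the array to act, instead of copying blocks itself."""
    # 1. The array's VSC plugin provisions a new datastore and mounts
    #    it on every host in the cluster (hypothetical call).
    new_ds = array.vsc_provision_datastore(
        size_gb=source_ds.size_gb,
        hosts=vcenter.cluster_hosts(source_ds.cluster),
    )

    # 2. DataMotion snapshots and moves the VM's files via array-based
    #    copy offload, so dedupe, thin provisioning, and snapshot
    #    relationships stay under the array's control (hypothetical).
    array.datamotion_move(vm.files, src=source_ds, dst=new_ds)

    # 3. vCenter's only remaining job: re-point the VM at its new home.
    vcenter.reregister(vm, datastore=new_ds)
    return new_ds
```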

 

Technically, these types of "conversations" are the essence of VAAI, and there's no reason we shouldn't see Storage DRS become a big part of it in the future. I'm not privy to any special information, and I'm not saying that it WILL be done; I'm just an ex-admin, like a lot of you still are, and I'm really hoping it's one of those things that gets developed.

 

I'll wrap up this post with what we're likely going to publish as our Best Practice... a Good/Better/Best sort of configuration scenario. None of this is set in stone yet, but given the volume of questions I've received, I didn't want to leave them unaddressed.

Basically, it all comes down to the DRS Recommendations, and how you set the slider bar to throttle the application of those recommendations.

  1. Good: manual Storage vMotion is a good solution for migrating data.
  2. Better: Storage DRS is a better solution than having to move things manually.
  3. Best: At NetApp, we are going to recommend that you forgo the use of Storage vMotion and instead utilize our DataMotion (remember, I like to call it "nMotion") to move your volumes around on the backend storage. While this does not address individual VM-level moves, it does address performance bottlenecks, offloads the act of the move to the storage controller, and maintains all of the storage efficiencies we talked about above.

I'll leave you with a few final recommendations to keep in mind...

1) Set SDRS to manual mode and review the recommendations before accepting them.

2) All datastores in the cluster should use the same type of storage (SAS, SATA, etc.) and have the same replication and protection settings (see the quick sketch after this list).

3) Understand that SDRS will move VMDKs between datastores, and any space savings from NetApp cloning or deduplication will be lost when a VMDK is moved. Customers can re-run deduplication to regain these savings.

4) After SDRS moves VMDKs, it is recommended to rebaseline Snapshots on the destination datastore.

5) It is highly recommended not to use SDRS on thinly provisioned VMFS datastores due to the risk of reaching an out-of-space situation.
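
If you want to sanity-check recommendation 2 before enabling SDRS, something like this quick sketch works. The inventory data here is made up — in practice you'd pull it from vCenter and the array — but the consistency check itself is the point.

```python
# Hypothetical inventory -- in a real environment you'd populate this
# from vCenter and the array rather than hard-coding it.
datastores = [
    {"name": "ds_sas_01", "disk": "SAS",  "replicated": True},
    {"name": "ds_sas_02", "disk": "SAS",  "replicated": True},
    {"name": "ds_sata_9", "disk": "SATA", "replicated": False},
]

def audit_sdrs_cluster(datastores):
    """Flag datastores whose disk type or protection settings differ
    from the first one in the cluster (recommendation 2 above)."""
    keys = ("disk", "replicated")
    baseline = {k: datastores[0][k] for k in keys}
    return [ds["name"] for ds in datastores
            if {k: ds[k] for k in keys} != baseline]

print(audit_sdrs_cluster(datastores))   # ['ds_sata_9'] -> pull it out
```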

 

If you have any further questions, reach out to me on Twitter, or leave a comment below and we can further the discussion.

-Nick

Comments

Duncan

Hey Nick,

I wrote an article explaining the impact of using SDRS in combination with some of these array capabilities. One thing to keep in mind though is that with vSphere 5.0 multiple enhancements have been introduced to VAAI and to the way vSphere deals with thin provisioned datastores. The picture you've painted is only half of it.

1) UNMAP (part of VAAI) allows for the reclamation of dead space, basically returning blocks to the pool of unused storage and allowing other volumes to reuse them.
2) Part of the Thin Provisioning enhancement of VAAI is the handling of out-of-space conditions. Did you know that SDRS will not make a recommendation when the back-end is out of disk space?!
3) "BOOM"? Actually, only the virtual machine claiming additional blocks is affected, and we stun that virtual machine until free space is allocated to the disk. The other virtual machines, which don't need space, continue to run, and as mentioned in 2, recommendations should not be made when there is no disk space available.
4) With regards to deduplication, I would look at it this way: would you rather have constant >15 ms latency or a volume running out of disk space, or a slightly less efficient deduplication process? I think most of my customers, and yours, will probably go for the latter.

We are working on official recommendations / best practices around these storage capabilities and SDRS. We agree that care needs to be taken, but fortunately it is not as black and white as stated here.

Duncan
My take: http://www.yellow-bricks.com/2011/07/15/storage-drs-interoperability/

Nhowell

@Duncan,

First of all, thanks for the thorough comment, and for the books you and Frank have done on the topic. I thoroughly enjoyed reading them and they greatly improved my knowledge of the topic.

As far as this post goes, I was just trying to warn people about the pitfalls of turning on SDRS haphazardly, because they DO exist, and if you're not careful, you run the risk of making a huge mess. We can sit here and break down each individual component all day and have semantic arguments about why or why not, but at the end of the day, it IS about the customer, and I'm a firm believer in protecting them from the unknown.

1) UNMAP is awesome! It's unfortunate that it's VMFS only (to my knowledge). There is a lot of movement on our side on Space Reclamation enhancements as well.

2) Yes, but you only know what the datastore knows, correct? You do not know that the logical 1TB volume only has 100GB of free space remaining in the underlying aggregate. That was the point I was trying to make.

3) Thin-provisioned LUNs that run out of disk space blow up and in most cases have to be restored from snapshots. This is why so much care has to be taken when thin provisioning, and our best practices go into great detail about the Autogrow and Snapshot Autodelete functionality that was added in recent years to make this easy.

4) I can't answer that, only the customer can, and every customer's answer will be different. Having the ability to make this decision, and having options in both directions is what's important, not one being better than the other.

Duncan

1) correct
2) No, there is a Thin Provisioning addition to VAAI which will tell us if the LUN is out of disk space. So, back-end reporting.
3) see 2

Sudhir Brahma

A bigger threat is probably Mirror Mode in Storage vMotion (vSphere 5). The I/Os are mirrored between the storage arrays directly. That could enable low-cost storage arrays to be used instead of filers, since replication via snapshots may no longer be necessary.

Duncan

good one Sudhir :)

Duncan

A couple of comments I want to make on top of what has already been discussed. The main differentiator between Storage DRS and any technique the storage vendors offer is initial placement. Storage DRS will place the virtual machine, or disk, on a datastore based on I/O load and space utilization. This by itself is huge, as it reduces the need to manually monitor your environment. Where customers used to need to figure out which LUN was least loaded from an I/O and disk space perspective, they now have SDRS doing it for them...

On top of that: affinity rules. These allow you to keep VMs or disks together, or to keep them separate. Either way, it is another huge benefit that no one else provides these days.
