In part one of this series I introduced you to Transparent Storage Cache Sharing and walked through the use case of serving virtual machines and their binaries. The post received some great feedback and made me realize that I wasn’t as clear as I had hoped to be in beginning this conversation.
Before we continue the conversation by looking at additional use cases, I think I should clarify a few details around traditional storage caches in order to effectively communicate the advantages of TSCS enabled storage arrays.
It’s fair to say that storage caches come in different sizes and offer various predictive caching algorithms or read-ahead mechanisms to increase total storage performance. In operation, the cache stores data for use by a single VM in the hope that subsequent requests from that same VM can be satisfied from cache before accessing disk. Requests from any other VM cannot use the data cached for the first VM.
To put it another way, traditional array caching mechanisms lack transparency. They lack the ability to share a single reference between discrete objects such as two VMs.
TSCS provides such transparency, and as a result discrete client requests can be served from a single instance of data in the array’s cache. Serving multiple requests from a single copy of data significantly increases array performance, because more requests are satisfied from cache and far fewer reach disk.
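To make the difference concrete, here is a minimal Python sketch of the two caching models. It is an illustration only, not how any array implements its cache: the function names, the VM names, and the physical block label p100 are all hypothetical. The traditional model keys cache entries by client, while the shared model keys them by the physical block the request resolves to.

```python
def disk_reads_per_client_cache(requests):
    """Traditional model (simplified): cached data is private to each client."""
    cached = set()
    disk_reads = 0
    for client, logical_block in requests:
        key = (client, logical_block)
        if key not in cached:
            disk_reads += 1      # cache miss: go to disk
            cached.add(key)
    return disk_reads


def disk_reads_shared_cache(requests, block_map):
    """Shared model (simplified): cache is keyed by the physical block."""
    cached = set()
    disk_reads = 0
    for client, logical_block in requests:
        physical = block_map[(client, logical_block)]
        if physical not in cached:
            disk_reads += 1      # first request for this physical block
            cached.add(physical)
    return disk_reads


# Hypothetical layout: block 0 of three VMs resolves to one physical block.
block_map = {("vm1", 0): "p100", ("vm2", 0): "p100", ("vm3", 0): "p100"}
requests = [("vm1", 0), ("vm2", 0), ("vm3", 0), ("vm1", 0)]

print(disk_reads_per_client_cache(requests))         # 3
print(disk_reads_shared_cache(requests, block_map))  # 1
```

In this toy run the same four requests cost three disk reads with per-client caching and one with a shared cache.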
This capability is available with any storage protocol, be it SAN (FCoE, FC, iSCSI) or NAS (NFS, CIFS).
For today’s post I’ve added I/O cache ratios to my images. Each ratio depicts the number of discrete items that are served by a single cached item. I am hoping this addition will allow me to better depict the I/O gains provided by TSCS.
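If it helps, the ratio can be thought of as a simple count. The following Python sketch assumes a hypothetical mapping from each client's logical block to a physical block (the names vm1, p100, and so on are made up for illustration) and counts how many discrete items point at each physical block.

```python
from collections import defaultdict


def sharing_ratios(block_map):
    """Count how many discrete (client, logical block) items share each physical block."""
    references = defaultdict(set)
    for item, physical in block_map.items():
        references[physical].add(item)
    return {physical: len(items) for physical, items in references.items()}


# Hypothetical layout: three VMs share p100, and vm1 also owns a unique block p200.
block_map = {
    ("vm1", 0): "p100",
    ("vm2", 0): "p100",
    ("vm3", 0): "p100",
    ("vm1", 1): "p200",
}

ratios = sharing_ratios(block_map)
for physical, count in sorted(ratios.items()):
    print(f"{physical} serves {count}:1")
```

A block that only one item references has a ratio of 1:1, which is the traditional caching case.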
I’d like to point out that Linked Clones deliver a similar type of transparency via software. For the sake of this post I’d like to table the technical comparison of IO sharing via software and storage arrays for a later post. Fair enough?
Traditional Use Case #2 – Oracle Databases
In this example we begin with a classic Oracle database environment, one that includes a production database along with replicas of the production database, which are used for development and quality assurance.
These replicas can be simple copies of the original database files or they can be array-based copies such as business continuance volumes or snap clones. With all of these duplication technologies the array processes the I/O requests from each host as an isolated request.
In this example you will notice that each I/O request made by the database and/or the logs is served by an I/O request in both the disk and cache. The ratio of host I/O requests to storage I/O processed is 1:1.
(click to view image at full size)
TSCS Use Case #2 – Oracle Databases
By moving the Oracle production, development, and QA instances to NetApp, the environment can provide instant copies or updates to the dev & QA systems by leveraging NetApp’s FlexClone technology. More importantly, because these distinct database instances share a common source, any page request or table lookup made by one instance pre-fetches the data into cache for the same request made by any of the other instances.
Each instance of Oracle receives a performance gain while simultaneously reducing the total disk IO requests. This is only possible by achieving shared cache transparency.
(click to view image at full size)
You probably noticed that the log files for each instance are not taking advantage of TSCS. The reason is that these files do not share common blocks of data, so requests for them are not requests for shared blocks, and they are cached in the traditional manner.
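The split between shared data files and unshared logs can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the instance names, block counts, and the physical_block helper are hypothetical, and it ignores any blocks a clone has modified since it was created, which would no longer be shared.

```python
def physical_block(instance, file_kind, block_no):
    """Hypothetical mapping: cloned data files share blocks, log files do not."""
    if file_kind == "data":
        return ("shared-data", block_no)
    return (instance, "log", block_no)


def simulate(requests):
    """Replay requests against a cache keyed by physical block."""
    cache = set()
    stats = {
        "data": {"requests": 0, "disk_reads": 0},
        "log": {"requests": 0, "disk_reads": 0},
    }
    for instance, file_kind, block_no in requests:
        stats[file_kind]["requests"] += 1
        physical = physical_block(instance, file_kind, block_no)
        if physical not in cache:
            stats[file_kind]["disk_reads"] += 1
            cache.add(physical)
    return stats


instances = ["prod", "dev", "qa"]
requests = [(i, "data", b) for i in instances for b in range(4)]
requests += [(i, "log", b) for i in instances for b in range(2)]

stats = simulate(requests)
print(stats["data"])  # {'requests': 12, 'disk_reads': 4}
print(stats["log"])   # {'requests': 6, 'disk_reads': 6}
```

In this toy model the data files reach a 3:1 ratio of host requests to disk reads, while the logs stay at 1:1.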
TSCS Use Case #3 – Exchange Mailbox Servers
Not every dataset or application requires multiple copies to serve the needs of other departments like in our Oracle example. Let us look at Microsoft Exchange Server. For this example I’ll refer to Exchange 2010.
Please note that the terms mailstore and database are synonymous. Both refer to the ESE databases in which mail messages are stored. An Exchange server commonly contains multiple databases.
Did you know that when you send an email with an attachment to multiple individuals a copy of each attachment is stored for each recipient? That’s right, that document with the meeting minutes you just sent to your team is being stored multiple times in the same database, in multiple databases, across multiple Exchange mailbox servers.
Note that with Exchange Server 2000-2007 a single copy of a message attachment is stored per database; however, the multiplication effect of multiple databases per host and multiple hosts still exists. The redundancy leads to attachments being stored multiple times within one’s Exchange environment.
When Exchange is stored on NetApp we store each database; however, we can deduplicate the redundant messages stored in each database as well as across multiple databases. The result is not only significantly reduced storage capacity: the first request for a document also pre-fetches it into cache, where it can serve all subsequent requests, even if those requests are for different users accessing different Exchange servers!
(click to view image at full size)
Did you notice the TSCS sharing ratio? With Exchange the sharing ratios can be almost too much to accept, yet we live in an age where we send and receive emails and attachments to multiple individuals every minute of every working day.
As I have in some of the past examples, I am simplifying this solution. In fairness I will cover more on this solution including data layout and additional components like DAGs in a future post. For today’s post I want to focus on the value and unique benefits of TSCS.
TSCS Use Case #4 – User Data
Users tend to access two common data sources during the business day. The first is their personal data, be it in home directories or user data disks, and the second is the local copy cached by our email clients, such as the Outlook offline storage file or OST.
(click to view image at full size)
With TSCS both of these use cases are similar. Whether an entire file is cached or only the blocks common to two versions of a file (such as fiscal report v1.0 and v1.1), the initial cached copy will serve all subsequent requests, and this transparency provides tremendous I/O scaling.
If you need more information on our block level dedupe and how it applies to two files that share common blocks yet are unique, please see this post.
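As a rough illustration of block level commonality between two versions of a file, the Python sketch below splits each version into fixed-size blocks and compares fingerprints. The 4096-byte block size, the SHA-256 fingerprint, and the sample data are assumptions chosen for the example; this is not a description of the array's actual dedupe implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def block_fingerprints(data, block_size=BLOCK_SIZE):
    """Fingerprint each fixed-size block of a byte string."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_count(a, b):
    """Number of distinct blocks that appear in both byte strings."""
    return len(set(block_fingerprints(a)) & set(block_fingerprints(b)))


# Hypothetical v1.0 and v1.1 of a report: only the last block differs.
v1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
v2 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"D" * BLOCK_SIZE

print(shared_block_count(v1, v2))  # 2
```

Two of the three blocks are common to both versions, so once v1.0 has been read those two blocks are already available in cache for a reader of v1.1. Note that this simple comparison assumes the edit did not shift the alignment of the unchanged blocks.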
Transparency is Virtualization
I opened this post discussing Transparent Page Sharing and how it shares cache content between multiple, discrete VMs, scaling the performance of the VMs beyond the physical limits of the hypervisor's physical platform.
I believe we all agree that TPS is cache virtualization.
By continuing our conversation, reviewing additional use case scenarios, and introducing concepts like cache sharing ratios, I hope that I have been able to help you better understand Transparent Storage Cache Sharing, particularly as it compares to traditional caching mechanisms.
TSCS is array cache virtualization, which extends performance capabilities beyond physical capacities. It is also not bound or restricted to any particular use case scenario. Whether SAN, NAS, user data, or high-performance database environments, the more effectively the data stored in cache is utilized, the greater the performance one will receive from the storage platform.
The Question To Ask
If you are in the planning or design phases of upgrading the storage arrays in your datacenter, ask your storage vendors if the technology you are considering can address the four use cases cited in these posts.
We are truly seeing a tremendous amount of über successful cloud deployments leveraging TSCS. So what are you waiting for? It’s time to virtualize your storage!