07/21/2011

RAID-DP for SQL Server

RAID-DP is a high-performance implementation of RAID 6 that provides double parity across the disk subsystem and protects against the failure of up to two disks per RAID group. Calculations have shown that double-parity RAID offers over 160 times more protection against data loss than RAID 10, and almost 4,000 times more protection than RAID 5.

Compared with the better-known RAID 10, RAID 5, and generic RAID 6, RAID-DP has the following advantages:

  • RAID-DP can provide performance similar to that of RAID 10. Thanks to its innovative design and implementation, RAID-DP does not suffer from the “write penalty” typically associated with RAID 5 and RAID 6.
  • RAID-DP has space efficiency similar to RAID 5. For instance, a typical RAID 5 group may be constructed with 4+1 disks, where one disk is for parity. A RAID-DP group usually uses 14+2 disks, with two disks holding the double parity. At the same time, RAID-DP provides much better protection against data loss than RAID 5 (a quick usable-capacity comparison follows this list).
  • RAID-DP provides data protection similar to generic RAID 6 (because of the dual parity), yet without the poor performance of generic RAID 6.
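
To make the space-efficiency comparison concrete, here is a quick back-of-the-envelope calculation (a Python sketch; the group sizes are just the examples above, and RAID 10 is shown as 8+8 mirrored pairs for illustration):

# Usable-capacity fraction for the example RAID group sizes above.
# The layouts are illustrative examples, not recommendations.

def usable_fraction(data_disks, redundancy_disks):
    """Fraction of raw capacity available for data."""
    return data_disks / (data_disks + redundancy_disks)

layouts = {
    "RAID 5  (4+1)":  usable_fraction(4, 1),    # 1 parity disk
    "RAID-DP (14+2)": usable_fraction(14, 2),   # 2 parity disks
    "RAID 10 (8+8)":  usable_fraction(8, 8),    # mirrored pairs
}

for name, frac in layouts.items():
    print(f"{name}: {frac:.1%} usable")
# RAID 5  (4+1):  80.0% usable
# RAID-DP (14+2): 87.5% usable
# RAID 10 (8+8):  50.0% usable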

With these advantages, why not use RAID-DP for SQL Server?

Good question. In fact, Microsoft and NetApp have published several joint papers on SQL Server 2008-related topics.

For all the studies described in these papers, SQL Server databases as well as log files were placed on RAID-DP. Therefore, these papers show pretty convincingly that RAID-DP is well suited for Microsoft SQL Server.

Thanks for reading.

 

06/29/2011

Read_realloc

Read_realloc is a Data ONTAP feature introduced in version 7.3.1. It was designed to improve sequential read performance for workloads with a mixture of large sequential reads and random writes.

How does it work? As its name suggests, read_realloc is initiated by an application’s read operation (and the subsequent read-ahead operation). Data ONTAP then analyzes the IO pattern and the associated data blocks. If it is a sequential read and the associated data blocks are not largely contiguous, Data ONTAP optimizes the data layout by relocating those blocks to a new, more contiguous location on disk. In theory, sequential read performance should improve thereafter, since the data layout has been optimized.

How do I use this feature? Read_realloc is a FlexVol volume-level option. You can enable it on a per-volume basis. Use the following Data ONTAP command to enable it (default: off).

>vol options <myvol> read_realloc on

How do we know if the theory is correct? The best way is to test it. Here is one way to test it, using Iometer to simulate a workload with a mixture of random reads, random writes, and sequential reads (the access specifications are also summarized in the sketch after the steps below).

  • Step 1: Create a 50GB test file using Iometer on a LUN from a NetApp controller (Note: the test file size should be larger than the controller cache size).
  • Step 2: Run the simulated workload with the following IO pattern for 20 minutes:
    • 32KB IO size
    • 50% read, 50% write
    • 70% sequential, 30% random
  • Step 3: Run the baseline sequential read test (i.e. read_realloc off) with the following IO pattern for 10 minutes:
    • 64KB IO size
    • 100% read
    • 100% sequential
  • Step 4: Run the same test as in Step 3, except with read_realloc enabled
  • Step 5: Repeat Step 4 (if the theory is correct, we should see some improvement in sequential read performance in this second pass)
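
For reference, here is a compact way to record the Iometer parameters used in Steps 2 through 5 (a hypothetical Python summary of the settings, not an actual Iometer .icf configuration file):

# Hypothetical summary of the Iometer access specifications above.
# Iometer itself is configured through its GUI or an .icf file; this
# dictionary just records the parameters for reference.

access_specs = {
    "step2_mixed_workload": {
        "io_size_kb": 32, "read_pct": 50, "sequential_pct": 70,
        "duration_min": 20,
    },
    "steps3_to_5_sequential_read": {
        "io_size_kb": 64, "read_pct": 100, "sequential_pct": 100,
        "duration_min": 10,
    },
}

for name, spec in access_specs.items():
    print(name, spec)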

The table below summarizes the test results. The result from the 1st trial after enabling read_realloc is very close to that of the baseline (when read_realloc was off). This may indicate that in our test, read_realloc had very little overhead. The sequential read throughput from the 2nd trial is 6% better than the 1st trial, indicating that read_realloc actually does what it is supposed to do.

Test                          Sequential Read Throughput (MB/s)
Baseline (read_realloc off)   159
1st Trial (read_realloc on)   161
2nd Trial (read_realloc on)   170
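
The 6% figure mentioned above falls straight out of these numbers (a trivial Python check):

baseline, trial1, trial2 = 159, 161, 170  # MB/s, from the table above

print(f"1st trial vs. baseline:  {trial1 / baseline - 1:.1%}")  # ~1.3%
print(f"2nd trial vs. 1st trial: {trial2 / trial1 - 1:.1%}")    # ~5.6%, i.e. ~6%
print(f"2nd trial vs. baseline:  {trial2 / baseline - 1:.1%}")  # ~6.9%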


Figure 1 below shows the same results in a bar chart.


Figure 1. read_realloc and sequential read performance

You may think to yourself, so what? It’s only a 6% improvement. Well, a small but measurable improvement is probably what we want to see, as it is an indication that Data ONTAP already does a pretty good job of data layout in general.

Thanks for reading.

 

05/31/2011

Scaling Down Exchange 2010

When people talk about scaling, large scale, scale-up, or scale-out is usually what comes to mind. Exchange 2010 is designed to scale up and/or scale out to tens of thousands or even millions of mailboxes. However, for every large enterprise requiring that many mailboxes, there are many more small enterprises that need only a couple hundred mailboxes. Does Exchange 2010 scale down well? The answer is yes; Exchange 2010 provides several options to allow small enterprises or branch offices to consolidate servers or server roles in order to reduce the number of physical servers.

NetApp’s storage solutions for Exchange 2010 can also scale down efficiently and allow the solution to grow as the business grows. A good example is the recently published FAS2040 iSCSI 500-mailbox ESRP.

This ESRP configuration is designed to be an entry-level solution. Below is a brief summary of the test configuration and the results.

The Exchange 2010 user profile and DAG (database availability group) configuration:

  • 500 mailboxes
  • 2GB per mailbox
  • 0.12 IOPS per mailbox
  • 1 DAG with 2 database copies

The FAS2040 storage system configuration (a quick sizing sketch follows this list):

  • Data ONTAP 8.0.1 (7-mode)
  • 1TB SATA drives (4 disks)
  • RAID-DP
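
From the user profile above, the target workload and the size of one database copy are easy to estimate (a rough Python sketch; it ignores RAID-DP parity, spare disks, and other capacity overhead):

# Rough sizing estimate for the 500-mailbox ESRP configuration above.
# Ignores RAID-DP parity, spare disks, snapshots, and other overhead.

mailboxes = 500
iops_per_mailbox = 0.12
mailbox_size_gb = 2

target_db_iops = mailboxes * iops_per_mailbox           # ESRP target
data_per_copy_tb = mailboxes * mailbox_size_gb / 1024   # one DAG copy

print(f"Target database IOPS:   {target_db_iops:.0f}")         # 60
print(f"Achieved (32% higher):  {target_db_iops * 1.32:.0f}")  # ~79
print(f"Mailbox data, one copy: {data_per_copy_tb:.2f} TB")    # ~0.98 TB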

The storage protocol used was 1GbE iSCSI and the server was an IBM x3650 system. The test topology is shown in Figure 1 below.

Figure 1. Test topology and DAG architecture

Figure 2 shows the expected vs. achieved total database IOPS (including both reads and writes). The achieved IOPS is 32% higher than the target.

Figure 2. Achieved database IOPS is 32% higher than the expected number (higher is better)

Figures 3 and 4 show the measured read and write latencies, respectively, in comparison to the 20ms acceptable latency limit (set by Microsoft). As you can see, both read and write latencies are excellent and well below the 20ms limit.

 Figure 3. Database read latency in comparison to the 20ms limit (lower is better) 

Figure 4. Database write latency in comparison to the 20ms limit (lower is better) 

It is worth pointing out that this entry-level solution has a very small footprint: it needs only 4 SATA drives. Yet, thanks to RAID-DP, it still provides excellent protection against double disk failure.

Note also that the example provided here is a sample scenario; it may not match your real life scenario exactly. Therefore, it is very important to test the actual scenario before deployment.

Thanks for reading.

 

05/22/2011

Large Sequential Read

Large sequential read is one of the common data access patterns. For instance, Decision Support System (DSS) and Business Intelligence (BI) workloads rely heavily on this access pattern to extract useful information in a timely manner and to generate reports by scanning and analyzing multi-terabyte databases.

With the arrival of Big Data, this access pattern is likely to become even more important, and Big Data will likely require storage systems to optimize the performance of large sequential reads.

On NetApp Fabric-Attached Storage (FAS) systems, there is a tuning you can apply to enhance large sequential read performance. TR-3760 recommends setting one nonstandard flag:

setflag wafl_max_write_alloc_blocks 256

The default value of the flag is 64. You may wonder why tuning this flag would have anything to do with large sequential read performance. The fact of the matter is that good sequential read performance starts with good, contiguous data layout on disk. TR-3760 points this out correctly: “this flag optimized the WAFL® on-disk data layout.”

Of course, it goes without saying that Big Data also requires a Big Pipe. With 10GbE (FCoE or iSCSI) now becoming mainstream, and 40GbE and even 100GbE on the horizon, the pipe is definitely getting bigger and bigger.

Thanks for reading.

 

04/28/2011

Hyper-V and Jetstress

One of the benefits of Hyper-V is server consolidation. A lot of server resources are required to run large scale Jetstress tests. Is it feasible to use Hyper-V with Jetstress in order to achieve server consolidation? The answer is yes. Below is an example of how you can run Jetstress on Hyper-V virtual machines hosted on a single physical server.   

The physical machine in this example is an IBM x3550 M2 server, with fast Intel Xeon processors and 64GB of RAM. Two virtual machines were created: HVM1 and HVM2. Each VM was configured with 4GB of RAM (see Figure 1). Since Jetstress has a limit of 256MB of memory per database, 4GB RAM per VM is enough in our example.

Figure 1. Hyper-V virtual machine setup.

Next, Jetstress 2010 was installed on both HVM1 and HVM2; and storage was provisioned for both VMs. A NetApp FAS3070 was used to provide the storage for the Jetstress test. Figure 2 shows the test topology.


Figure 2. Hyper-V Jetstress test topology.

The Jetstress mailbox profile and configuration are summarized here:

  • 8000 mailboxes per VM (16000 total)
  • 0.1 IOPS per mailbox
  • 100MB per mailbox
  • 1 database per VM
  • 2 copies per database
  • background maintenance enabled

A small mailbox size (100MB) was used to shorten the time needed to create the databases.
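
Given this profile, the per-VM targets are easy to estimate (a rough Python sketch; the ~30 reads/s background maintenance figure comes from the "Virtualizing Jetstress" post below):

# Rough per-VM Jetstress targets for the profile above.

mailboxes_per_vm = 8000
iops_per_mailbox = 0.1
mailbox_size_mb = 100
databases_per_vm = 1
bg_maintenance_reads_per_db = 30  # ~30 reads/s per database (see the
                                  # "Virtualizing Jetstress" post below)

target_iops = mailboxes_per_vm * iops_per_mailbox
total_iops = target_iops + databases_per_vm * bg_maintenance_reads_per_db
db_size_gb = mailboxes_per_vm * mailbox_size_mb / 1024

print(f"Target transactional IOPS per VM: {target_iops:.0f}")    # 800
print(f"Plus background maintenance:      {total_iops:.0f}")     # ~830
print(f"Database size per VM:             {db_size_gb:.0f} GB")  # ~781 GB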

Figure 3 is a screenshot of HVM1 and HVM2 while a Jetstress test was running.  The test results from HVM1 and HVM2 are also shown in Figure 4 and Figure 5 below.


Figure 3. Jetstress running on HVM1 and HVM2. 


Figure 4. HVM1 Jetstress test result.

Figure 5. HVM2 Jetstress test result.

This example demonstrates a good use case of Hyper-V. In reality, more than two VMs can be created and more Jetstress mailboxes supported on the same physical machine.

Thanks for reading.

 

04/11/2011

Virtualizing Jetstress

Virtualization technology has been widely used in deployment as well as in lab testing and development, to consolidate servers and improve server utilization.

Jetstress is an essential tool to simulate Exchange server IOs and to test and validate the performance and reliability of the storage system for Exchange server deployment.

Now, what about putting the two together, virtualizing Jetstress?  For whatever reason, this has not been a routine practice. However, there are several factors suggesting that virtualizing Jetstress is a very good idea.

  • Maximum number of active mailboxes per server. In real-world Exchange 2010 deployments, it is not recommended to have more than 10,000 active mailboxes per server. Since Jetstress is a simulation tool for Exchange deployment, it should limit the mailbox count per server to 10,000 or less. This is why you won’t find any published Exchange 2010 ESRP supporting more than 10,000 active mailboxes per server. This also means that if you want Jetstress to simulate more than 10,000 mailboxes, you need two or more servers.
  • Memory limitation per database. To accurately test the disk IOs of Exchange 2010 to and from the storage system, Jetstress limits the amount of server RAM per Exchange database to 256MB. If you have a decent server with lots of RAM and 4 Exchange databases, only ~1GB of RAM would be used by Jetstress. In other words, server memory utilization will be very low. But what about simply creating many databases in your Jetstress testing? See the next point.
  • Background database maintenance overhead. Exchange 2010 introduced the Background Database Maintenance feature, which performs database integrity checking in the background and adds an IO overhead of roughly 30 reads per second per database. Thus, the more databases, the more IO overhead. More databases would also likely lead to more random IOs and negatively impact Exchange read and write latency.

Taking these factors into consideration, plus the fact that servers keep getting more powerful and less expensive, the idea of virtualizing Jetstress becomes very attractive and logical.

Figure 1 below illustrates a hypothetical scenario for Jetstress virtualization. The physical server has 4 CPUs and 32GB of RAM. If we create 3 virtual machines (VMs), with 4-8GB of RAM per VM, then each VM can comfortably support 8,000 mailboxes.

Figure 1. A scenario of Jetstress virtualization.

So, what does this scenario tell us? Without virtualizing Jetstress, the physical server is limited to 10,000 mailboxes and under-utilized. On the other hand, by applying virtualization to Jetstress, 24,000 mailboxes can be supported on the same physical server. That’s 2.4 times as many mailboxes! Not bad at all.
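
Here is a quick back-of-the-envelope check of this scenario (a Python sketch based on the constraints listed above; the VM memory range is the 4-8GB suggested in the scenario):

# Back-of-the-envelope check of the Jetstress virtualization scenario above.

max_active_mailboxes_per_server = 10_000  # Exchange 2010 guideline
host_ram_gb = 32
vms = 3
mailboxes_per_vm = 8_000
ram_per_vm_gb = (4, 8)                    # suggested range per VM

physical_only = max_active_mailboxes_per_server
virtualized = vms * mailboxes_per_vm      # 24,000 mailboxes

print(f"Physical server alone: {physical_only:,} mailboxes")
print(f"With {vms} VMs: {virtualized:,} mailboxes "
      f"({virtualized / physical_only:.1f}x as many)")           # 2.4x
print(f"VM RAM footprint: {vms * ram_per_vm_gb[0]}-{vms * ram_per_vm_gb[1]} GB "
      f"of {host_ram_gb} GB host RAM")                           # 12-24 GB of 32 GB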

Thanks for reading.

 

03/31/2011

Approaching SharePoint Performance

SharePoint farms are complex systems. Typical farms consist of web front ends (WFEs), application servers, SQL servers, and storage. For each of these servers and the storage, there are subcomponents such as CPU, memory, network, and disk.

Many factors can cause SharePoint performance problems. When performance issues arise, where do you look for causes? Indeed, how do you approach a SharePoint performance issue? Granted, there are different approaches or methods. The approach I would like to propose is described below.

Step 1. Break down the complexity with this simple SharePoint Performance Analysis Matrix (SPAM), as shown in Figure 1.

Fig. 1. The SPAM matrix.

There are 4 x 4 = 16 cells in the matrix. These are the places you need to keep an eye on. A bottleneck in any of these 16 boxes could very well be a cause of your SharePoint farm’s poor performance.

Step 2. Suppose your SharePoint farm is experiencing a performance issue. How do you know which one of the 16 boxes is actually causing it?

First, collect performance stats on the servers (perfmon logs) as well as the storage (e.g., perfstat on NetApp storage). Second, analyze all the performance stats. For example, you may use a top-down approach: look into the perfmon stats collected on all WFEs first and see if the CPUs are saturated, or if there is excessive paging, a network bottleneck, or high disk latency. If the WFEs look fine, then repeat the process for the application servers, SQL servers, and finally the storage controllers. Fig. 2 illustrates the performance analysis process.
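
To make the flow concrete, here is a toy sketch of that top-down pass (the tier names, counters, and thresholds are hypothetical placeholders; a real analysis would read the values from the perfmon and perfstat data):

# Toy illustration of the top-down SPAM pass described above.
# Tier names, counters, and thresholds are hypothetical placeholders;
# a real analysis would read the values from perfmon/perfstat output.

TIERS = ["WFE", "Application server", "SQL Server", "Storage controller"]
THRESHOLDS = {
    "cpu_pct": 80,           # sustained CPU utilization
    "paging_per_sec": 500,   # excessive paging
    "network_util_pct": 70,  # network bottleneck
    "disk_latency_ms": 20,   # high disk latency
}

def find_bottlenecks(stats):
    """stats: {tier: {counter: measured_value}}, collected per tier."""
    for tier in TIERS:  # top-down: WFEs first, storage controllers last
        for counter, limit in THRESHOLDS.items():
            value = stats[tier][counter]
            if value > limit:
                yield tier, counter, value

# Example: a SQL server with 35ms read latency would be flagged as
# ("SQL Server", "disk_latency_ms", 35), pointing the next, deeper round
# of analysis at the database and storage layers.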


Fig. 2. SharePoint performance analysis process, the top-down approach. Orange-colored boxes indicate performance issues.

Step 3. Let’s say you find that one of the 16 boxes is a bottleneck. Now what? It depends. If one of the WFEs is network bandwidth limited, you might lighten the load on this particular WFE, or add more network bandwidth to the WFE. On the other hand, if you find that the SQL server’s disk has poor read latency, then more in-depth analysis of the storage is required to pinpoint the root cause.

Step 4. What if, after this tedious exercise, no apparent bottleneck is found? Realize that software or driver internal parameters could be throttling the data pipe. In this case, some performance tuning is in order. A classic example is the HBA queue depth setting. Quite often, a slow disk response time can be resolved by setting the HBA queue depth to a value higher than the default.

In summary, to solve the complex performance problems of SharePoint farms, the proposed SPAM could be a useful tool to help narrow down the potential root causes. The performance analysis process described above is also easy to follow.

 Thanks for reading.

 

03/22/2011

Where Should I Put Exchange 2010 Logs?

There are databases, and there are transaction logs. The conventional wisdom (CW) has been to put databases and transaction logs on separate LUNs and separate hard disk drives, for the following reasons:

  • Performance – database IOs and transaction log IOs often have different IO patterns. It is a good idea to separate them in order to achieve good performance.
  • Reliability – more importantly, transaction logs are used to recover a damaged database after a disk (or other type of) failure. In the event of a disk failure, data loss may occur if both the database and its logs were placed on the same disks, since both may be lost at the same time. Therefore, in order to prevent data loss, it is crucial to put the database and its transaction logs on different physical disks.

This applies in general to database applications as well as earlier versions of Microsoft Exchange.

However, with Exchange 2010, things work a little differently. The Exchange 2010 high availability (HA) feature requires and relies on multiple copies (>= 2) of databases to protect against data loss. Different copies of a database must be placed on different LUNs and on different hard disk drives. If the active copy of a database is corrupted or damaged (due to disk failure, for instance), Exchange 2010 simply switches to the second copy of the database, so there is no data loss.

If the Exchange 2010 HA feature is used, databases and transaction logs can be placed on the same set of hard disk drives. It is still a good idea to put databases and transaction logs on separate LUNs to allow data management flexibility (e.g., to apply different policies to database LUNs and transaction log LUNs).

Figure 1 illustrates one implementation of an Exchange 2010 HA scenario where two copies of databases are used. This example is based on the NetApp FAS3210 6,000-mailbox Exchange 2010 ESRP.

Figure 1. One implementation of a two-copy Database Availability Group (DAG), where Exchange 2010 databases and logs share the same sets of physical disks. Each pool of physical disks (shown as an oval) forms a RAID-DP group, with the two disks in green being the parity drives.

What about performance? Well, Exchange 2010 made significant changes to the database IO characteristics. For example, IO reduction has been dramatic, and at the same time the database transactional IO size has increased from 8KB to 32KB. These changes result in fewer, larger, and more sequential database IOs. Therefore, performance-wise, putting databases and logs on the same disks is no longer an issue.

What if I don’t want to use the Exchange 2010 HA feature? The HA feature does have a cost associated with it (more servers and more storage are required). If you decide not to use the Exchange 2010 HA configuration, you can use the Exchange 2010 stand-alone configuration, where only one database copy is required. In this case, the old, time-tested CW still applies – you should place Exchange databases and transaction logs on different physical disks.

So far, we have been assuming RAID-based storage. What if you decide to go JBOD? Then you need to do the following: 1) use Exchange 2010 HA; 2) have at least 3 database copies; and 3) divide each single disk into two volumes: one for the database and one for its logs. In other words, with JBOD the database and its transaction logs must reside on the same physical disk.

For more details on Exchange 2010 storage configuration, please review the Microsoft TechNet article here.

Thanks for reading.

 

02/28/2011

64K or 512K?

The Decision Support System (DSS) workload has some unique I/O characteristics. For instance:

  • Mostly reads – it’s not uncommon for DSS queries to have a read-to-write ratio of 9:1 or even higher
  • Mostly sequential – because queries scan ranges of database tables and indexes

These characteristics demand that in order to have good query performance, a Database Management System (DBMS) and its I/O subsystem must handle large sequential reads efficiently.

Microsoft SQL Server 2008 is a popular database application. Does it support large read sizes? The short answer is yes.

Here is how. SQL Server 2008 has a startup option, -E, which enables a larger number of contiguous extents in each file to be allocated together to a database table during data loading or indexing, or as the table grows. The effect of the -E option is fewer but larger reads from disk, resulting in better sequential read performance.

The -E option can be set by using the SQL Server Configuration Manager. Figure 1 below illustrates how this is done.

Figure 1. Setting the -E option inside the SQL Server Configuration Manager.

Note that the -E option must be set before loading data into the database table, and it is not supported on 32-bit versions of SQL Server.

So, what were the results?

Well, without the -E option, the observed read sizes while running the test queries were 64KB or smaller. In contrast, with the -E option, the read sizes increased to as much as 512KB.
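
To put those read sizes in SQL Server terms: a page is 8KB and an extent is 8 contiguous pages (64KB), so a 64KB read covers one extent while a 512KB read covers 8 contiguous extents (a quick Python check):

PAGE_KB = 8               # SQL Server page size
EXTENT_KB = 8 * PAGE_KB   # an extent is 8 contiguous pages = 64KB

for read_kb in (64, 512):
    print(f"{read_kb}KB read = {read_kb // EXTENT_KB} extent(s), "
          f"{read_kb // PAGE_KB} pages")
# 64KB read  = 1 extent(s), 8 pages
# 512KB read = 8 extent(s), 64 pages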

In summary, SQL Server 2008 supports large read sizes with the -E startup option. When used properly, it can improve a DSS workload’s sequential read performance.

Thanks for reading.

 

02/12/2011

Unified Storage in Action

NetApp 10Gb Ethernet-based storage unifies multiple protocols such as FCoE, iSCSI, NFS, and CIFS in a single storage system. Unified storage brings several benefits to customers, such as simplicity, flexibility, and high performance.

Figure 1 shows a test topology that has taken advantage of some of these benefits. It was a real testbed configured in a Microsoft SQL Server Performance Lab, for a joint project by NetApp, IBM, Emulex and Microsoft.

Figure 1. Topology for the 10GbE FCoE and iSCSI tests.

In Figure 1, two Emulex OCe10102 10GbE CNAs (Converged Network Adapters) were installed in the IBM x3850 X5 server. Three (out of 4) host ports were connected to a Cisco Nexus 5010 switch via fiber optical cables (shown as blue lines). Eight NetApp FAS3070 controllers were connected to the Nexus switch using NetApp 10GbE UTAs (Unified Target Adapters) via either optical (shown in blue) or direct-attach twinax copper (shown in orange) cables. One port per controller was used, for a total of eight unified storage ports.

This test topology demonstrates an end-to-end 10Gb Ethernet storage network that supports both FCoE and iSCSI protocols. In fact, switching from FCoE to iSCSI, or vice versa, does not require any re-cabling or reconfiguration of RAID or disks.

What about performance?

We used Microsoft SQL Server 2008, a real-world enterprise application, to evaluate the performance of both FCoE and iSCSI. The workload used was TPC-H-like queries that are I/O intensive and bandwidth intensive. The total on-disk size of the test database (including tables, indexes, and backup) was 2.7TB. The execution of these queries required intensive scans of the database and indexes. Therefore, the end-to-end FCoE or iSCSI pipes, as well as the hard disk drives, were heavily used. More details can be found in the NetApp technical report TR-3853.

So, what were the results?

Figure 2 below illustrates the peak read throughput achieved with both FCoE and iSCSI while running one of the queries.

Figure 2.  Query peak read throughput.

With three 10Gbps host ports, the peak read throughput of 3231MB/s for FCoE was at wire speed. The peak read throughput achieved by iSCSI was only 7% lower, at 3006MB/s, and still near wire speed.
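
The per-port numbers behind that comparison are a quick calculation (a Python check of the figures above):

# Per-port throughput behind the FCoE vs. iSCSI comparison above.
fcoe_mb_s, iscsi_mb_s, ports = 3231, 3006, 3

print(f"FCoE per port:  {fcoe_mb_s / ports:.0f} MB/s")            # ~1077 MB/s
print(f"iSCSI per port: {iscsi_mb_s / ports:.0f} MB/s")           # ~1002 MB/s
print(f"iSCSI vs FCoE:  {1 - iscsi_mb_s / fcoe_mb_s:.0%} lower")  # ~7%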

In summary, NetApp unified storage is easy to use, flexible and high-performance. Furthermore, 10GbE iSCSI is enterprise ready.

Thanks for reading.

 
