This article covers tips that can help you optimize and troubleshoot storage performance on Hyper-V hosts.
If you would like to read the first part in this article series please go to Hyper-V Optimization Tips (Part 1).
In the previous article in this series we examined disk caching settings and how they should be configured both on Hyper-V hosts and on the virtual machines running on those hosts. In this article we will examine how Hyper-V performance depends on the underlying storage subsystem of the hosts. In particular, we will look at clustered Hyper-V hosts and how to optimize and troubleshoot performance on those systems through judicious choice of storage hardware and careful tuning of the hosts' storage subsystem. Since this broad topic depends heavily on your choice of storage vendor, we will examine only a few key aspects of the subject here.
Identifying storage bottlenecks
For the scenario we are going to examine, let's assume we have a four-node Windows Server 2012 R2 Hyper-V host cluster with CSV storage that is hosting a dozen virtual machines running as front-end web servers for a large-scale web application. Let's also assume that these virtual machines are using the Virtual Fibre Channel feature of Windows Server 2012 R2 Hyper-V which lets you connect to Fibre Channel SAN storage from within a virtual machine via Fibre Channel host bus adapters (HBAs) on the host cluster nodes.
Users of your web application have been complaining that its performance is often "slow" from their perspective. But "slow" from the end-user's perspective is subjective, so what would be a more accurate way to measure application performance? One metric you can look at is the disk response time, that is, the average response time of the CSV volumes of your host cluster. The following table, which maps average disk response times to application performance levels and compares them with the raw performance of dedicated disks, was shared with me by a colleague who works in the field with customers that have large Hyper-V host clusters deployed.
| Performance | Average disk response time | Discussion |
| --- | --- | --- |
| Very good | Less than 5 msec | Similar to the performance of a dedicated SAS disk |
| Good | Between 5 and 10 msec | Similar to the performance of a dedicated SATA disk |
| Satisfactory | Between 10 and 20 msec | Generally not acceptable for I/O-intensive workloads such as databases |
| Poor, needs attention | Between 20 and 50 msec | May cause users to remark that the application sometimes "feels slow" |
| A serious bottleneck | More than 50 msec | Will generally cause users to complain |
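The bands in the table above can be expressed as a small classifier. The following is just a sketch that encodes the table's thresholds; the function name and structure are illustrative, not part of any tool:

```python
# Encodes the response-time bands from the table above.
# Thresholds (5, 10, 20, 50 msec) come straight from the table.

def classify_disk_response(avg_msec: float) -> str:
    """Map an average disk response time in milliseconds to a
    performance level, per the table above."""
    if avg_msec < 5:
        return "Very good"
    elif avg_msec < 10:
        return "Good"
    elif avg_msec < 20:
        return "Satisfactory"
    elif avg_msec < 50:
        return "Poor, needs attention"
    else:
        return "A serious bottleneck"

if __name__ == "__main__":
    for sample_ms in (3.2, 7.5, 15.0, 35.0, 80.0):
        print(f"{sample_ms:5.1f} ms -> {classify_disk_response(sample_ms)}")
```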
If the average disk response time is more than 20 msec, you should do some performance monitoring of your system to determine the cause of the problem. The performance counters you should usually start collecting to monitor disk performance on your Hyper-V hosts are these:
\LogicalDisk(*)\Avg. Disk sec/Read
\LogicalDisk(*)\Avg. Disk sec/Write
It's usually best to focus on logical disk counters rather than physical disk counters because applications and services running on Windows Server use logical drives represented as drive letters, whereas the "physical" disk (LUN) presented to the operating system may itself be composed of multiple physical disk drives arranged in a disk array.
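Note that these counters report latency in seconds, so their values must be multiplied by 1000 before being compared against the millisecond thresholds discussed above. The sketch below illustrates this conversion by parsing sample CSV output in the format produced by the Windows `typeperf` tool; the host name and drive-letter instances in the sample are illustrative assumptions:

```python
# Sketch: flag logical disks whose average read latency exceeds 20 ms.
# The LogicalDisk counters report latency in *seconds*; convert to ms
# before comparing against the thresholds in the table above.
import csv
import io

# Example typeperf-style CSV: a timestamp column, then one column per
# counter. Host name "HOST1" and instances "C:"/"D:" are made up.
SAMPLE = io.StringIO(
    '"(PDH-CSV 4.0)","\\\\HOST1\\LogicalDisk(C:)\\Avg. Disk sec/Read",'
    '"\\\\HOST1\\LogicalDisk(D:)\\Avg. Disk sec/Read"\n'
    '"03/14/2024 10:00:00.000","0.004","0.035"\n'
)

THRESHOLD_MS = 20.0  # per the table: above this, start investigating

def flag_slow_disks(csv_file) -> list[tuple[str, float]]:
    """Return (counter_path, latency_ms) pairs exceeding the threshold."""
    reader = csv.reader(csv_file)
    header = next(reader)
    slow = []
    for row in reader:
        for path, value in zip(header[1:], row[1:]):
            latency_ms = float(value) * 1000.0  # seconds -> milliseconds
            if latency_ms > THRESHOLD_MS:
                slow.append((path, latency_ms))
    return slow

if __name__ == "__main__":
    print(flag_slow_disks(SAMPLE))
```

In this sample only the D: volume (35 ms) would be flagged for investigation.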
Resolving disk bottleneck problems
Once you've identified that your clustered Hyper-V hosts are experiencing performance problems because of a storage bottleneck, there are a number of steps you can take to resolve or mitigate the problem. The steps described in this section are not exhaustive, but they can often help in such "the application feels slow" scenarios.
Follow your storage vendor's best practices
The first thing you should probably do once you've identified storage as a performance bottleneck for your Hyper-V cluster is check whether your storage vendor has a "best practices" document that covers different Hyper-V scenarios. Storage vendors often create such documentation based on well-understood I/O patterns for different kinds of workloads. If your vendor's documentation closely matches the kind of workload your application generates on your clustered Hyper-V hosts, make sure you're adhering to the recommendations your storage vendor makes for that kind of workload.
By following your storage vendor's recommendations you may find that you have resolved or at least have mitigated your performance problem. On the other hand, you might find little or no improvement by following your storage vendor's advice. That's because storage profiling in "lab" environments sometimes doesn't translate well to the "real" world where users sometimes behave in unpredictable ways and multi-tier applications can be more complex in their behavior than is typically seen with "sample" applications.
Use faster disks in your storage array
Using your storage vendor's software, monitor the load on your storage array to see whether the average load is unacceptably high. If it is, one obvious step you can take is to replace slower disks with faster ones, for example 15k SAS disks. In general, if you want to ensure optimal performance of your storage array, prefer SAS disks (whether 10k or 15k) over SATA disks of any speed.
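A back-of-the-envelope calculation shows part of why spindle speed matters: average rotational latency is the time for half a revolution, i.e. 60 / (2 × RPM) seconds. The sketch below computes this for common spindle speeds; the drive labels are typical examples, not measurements of specific products:

```python
# Average rotational latency = time for half a revolution.
# This is only one component of total access time (seek time and
# transfer time also contribute), but it scales directly with RPM.

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency in milliseconds (half a revolution)."""
    return 60.0 / (2 * rpm) * 1000.0

if __name__ == "__main__":
    for label, rpm in (("7.2k SATA", 7200), ("10k SAS", 10000), ("15k SAS", 15000)):
        print(f"{label}: ~{avg_rotational_latency_ms(rpm):.1f} ms rotational latency")
```

A 15k disk spends about 2 ms per I/O on rotation versus roughly 4.2 ms for a 7.2k disk, which compounds quickly under the random I/O typical of virtualized workloads.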
Use RAID 10 instead of RAID 5
Traditionally, RAID 5 (striping with parity) has been the most popular RAID level used for servers. RAID 10, by contrast, stripes data across a set of mirrored disk pairs. RAID 10 provides the best read and write performance of any standard RAID level, but at the expense of needing twice as many disks for a given amount of usable storage.
So if you can afford to dedicate the extra storage resources to your host cluster, use RAID 10 for the storage your virtual machines access via Virtual Fibre Channel. In general, you should avoid both RAID 5 and RAID 6 (double-parity RAID) for storage used by virtualized workloads: parity RAID incurs a significant write penalty, and virtualized workloads tend to generate heavily random write I/O. There may be exceptions to this rule, but the only way to properly identify them is to measure the read/write performance of different RAID levels under your application's workload so you can select the most appropriate RAID level for your particular scenario based on evidence.
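The write-penalty argument above can be sketched numerically. The per-write I/O penalties used here (2 for RAID 10, 4 for RAID 5, 6 for RAID 6) are the commonly cited figures for small random writes; the disk count and per-disk IOPS figure are illustrative assumptions, not vendor data:

```python
# Rough comparison of random-write capacity across RAID levels.
# Write penalty = backend I/Os generated per host write:
#   RAID 10: 2 (write both mirrors)
#   RAID 5:  4 (read data, read parity, write data, write parity)
#   RAID 6:  6 (as RAID 5 but with two parity blocks)
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def effective_write_iops(disks: int, iops_per_disk: int, level: str) -> float:
    """Raw array IOPS divided by the RAID level's per-write penalty."""
    return disks * iops_per_disk / WRITE_PENALTY[level]

if __name__ == "__main__":
    # Assume 8 x 15k SAS disks at ~175 random IOPS each (illustrative).
    for level in WRITE_PENALTY:
        print(f"{level}: ~{effective_write_iops(8, 175, level):.0f} random write IOPS")
```

With these assumptions the same eight disks deliver roughly twice the random-write capacity under RAID 10 as under RAID 5, which is the heart of the recommendation above.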
Also ensure that you have as many RAID sets as you have nodes in your Hyper-V host cluster. In other words, a four-node host cluster should have four RAID sets configured on your storage array, i.e. one RAID set per host.
Check your storage controller configuration
Make sure the firmware of your storage controller has been updated to the latest revision from your storage vendor to ensure optimal performance of your storage array. If your storage controller is experiencing high CPU load, the disks in your storage array are probably too slow and should be upgraded to faster disks (and if possible to SAS, as mentioned earlier).
Also, if you haven't enabled write caching on your storage controller, consider doing so: it can increase I/O capacity by 20% or more, depending on your workload and the RAID level you have implemented. There are, of course, other considerations involved in write caching; see Part 1 of this series for more on this.
We'll examine some other tips for improving storage performance in Hyper-V environments in future articles in this series.