Hyper-V Optimization Tips (Part 1)

The first article in this series examines disk cache settings for virtual disks attached to virtual machines running on Hyper-V hosts.

If you would like to read the next part in this article series please go to Hyper-V Optimization Tips (Part 2).



Disk write caching is a performance feature introduced with Windows Server 2003 and Windows XP that lets the operating system and applications run faster because they do not have to wait for data write requests to be committed to disk.

But while these "delayed writes" can help Windows run faster, they also have a risk associated with them. That's because a sudden hardware failure, software crash or power outage could cause the cached data to be lost. 

The result can be that Windows thinks certain data was written to disk when in fact the writes were never committed. File system corruption and/or data loss may also occur. Having a backup power source such as a UPS can help mitigate such risks.

For scenarios where data integrity matters more than performance, disk write caching should be disabled. One example of such a scenario is Active Directory domain controllers, where disk write caching should always be disabled to prevent corruption of the directory database and/or loss of important security information for the domain.

In fact when you promote a Windows Server system to the role of domain controller, Windows automatically disables its write cache function. On the other hand there are also some applications where disk write caching always needs to be enabled. 

An example of this is Microsoft Exchange Server which uses the Windows write cache function for its own transactional logging function. This is one reason why it's generally not a good idea to deploy Exchange Server on a domain controller as explained here.

Understanding disk write caching

Disk write caching can be enabled or disabled on a per-disk basis in the Windows operating system. To find the settings, open Computer Management, select Disk Management, right-click a disk, select Properties, and then select the Policies tab, as shown in Figure 1 below:

                              Figure 1: Settings for configuring disk write caching

To begin with, it's important to understand the difference between the two settings shown in the above figure. The first setting, "Enable write caching on the device," which is enabled by default, tells your storage hardware to signal to Windows that a write request has been completed even though the data to be written has not yet been flushed from the intermediate hardware cache (volatile storage, i.e. memory) to the final storage location (non-volatile storage, i.e. the disk).

Data from write requests is usually held only briefly, as the storage hardware usually flushes its cache automatically when the hardware is idle. Certain operating system commands involving NTFS can also force cached data to be flushed to disk. If the power to the system fails while data still remains in the cache, data loss or corruption may cause applications to fail or the operating system to crash.
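To make the idea of an explicit flush concrete, here is a minimal sketch of how an application can ask the operating system to flush a file's cached data. It uses Python's standard os.fsync call, which on Windows maps to the C runtime's _commit function. The durable_write helper and the journal.bin file name are my own illustrative choices, not anything Windows or Hyper-V defines, and whether the data really reaches non-volatile storage still depends on the caching settings and hardware discussed in this article.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path, then request a flush to the storage device.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        written = f.write(data)
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cached data for this file
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "journal.bin")
        count = durable_write(target, b"commit record\n")
        print(count, os.path.getsize(target))  # prints: 14 14
```

The flush request is only a request: if the storage stack has been told to ignore flushes, the call still returns successfully.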

The second configuration setting, "Turn off Windows write-cache buffer flushing on the device," has to do with write requests that the operating system has marked as "write-through" by tagging them with a ForceUnitAccess flag. When a write request has been tagged as write-through, the storage hardware is supposed to guarantee that the data has been written to non-volatile (disk) storage and has not been temporarily stored in some intermediate cache on the storage hardware.

Enterprise storage hardware can accomplish this in several ways. One of the more common approaches is a battery-backed intermediate cache on the storage hardware, which enables dirty (cached) writes to be completed (flushed to disk in proper sequential order) even when the server system itself experiences a power failure or operating system crash. If you enable the "Turn off Windows write-cache buffer flushing on the device" setting, however, the ForceUnitAccess flag is removed from any write requests that are tagged with this flag.

This results in greater use of the cache and thus better write performance, but this option should only be enabled if a UPS is present that backs up the power for hardware along the entire I/O path (or if the machine is a laptop with a working battery in it). Because of the potential added risk of data loss that may occur, this second write caching setting is not enabled by default on Windows Server systems. For more information and usage guidance on these two disk write caching settings, see the Performance Tuning Guidelines for Windows Server 2012 R2 on MSDN.
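The interaction between the two settings can be sketched with a toy model. This is only a simplified illustration of the behavior described above, not how the Windows storage stack is actually implemented. The ToyDisk class and all of its field and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Toy model of a disk with a volatile write cache (not a real driver API)."""
    write_caching: bool = True          # models "Enable write caching on the device"
    flushing_turned_off: bool = False   # models "Turn off Windows write-cache buffer flushing on the device"
    cache: list = field(default_factory=list)  # volatile storage
    media: list = field(default_factory=list)  # non-volatile storage

    def write(self, block: str, force_unit_access: bool = False) -> None:
        if self.flushing_turned_off:
            force_unit_access = False  # the write-through flag is stripped
        if self.write_caching and not force_unit_access:
            self.cache.append(block)   # reported complete before it is on media
        else:
            self.media.append(block)   # committed before completion is reported

    def power_failure(self) -> list:
        """Drop whatever is still in the volatile cache and return it."""
        lost = list(self.cache)
        self.cache.clear()
        return lost


if __name__ == "__main__":
    for label, disk in [
        ("defaults", ToyDisk()),
        ("flushing turned off", ToyDisk(flushing_turned_off=True)),
        ("write caching disabled", ToyDisk(write_caching=False)),
    ]:
        disk.write("log-1")
        disk.write("log-2", force_unit_access=True)
        lost = disk.power_failure()
        print(f"{label}: on media={disk.media}, lost={lost}")
    # defaults: on media=['log-2'], lost=['log-1']
    # flushing turned off: on media=[], lost=['log-1', 'log-2']
    # write caching disabled: on media=['log-1', 'log-2'], lost=[]
```

In this model the write-through request survives the simulated power failure only when the second setting is left at its default, which is why that setting calls for protected power along the entire I/O path.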

Scenarios for modifying the default write cache settings

Concerning the first write caching setting, I've already mentioned an example of a scenario where write caching is automatically disabled: on domain controllers. But really for any application where data integrity is paramount, you may want to consider disabling the disk cache to ensure that all writes are committed to disk storage before the storage subsystem reports success for the write request. If this is the case however then be sure to disable write caching both in Windows and in the firmware of your storage controller.

And for some fascinating history concerning the second setting, be sure to see the post titled Dangerous setting is dangerous: This is why you shouldn't turn off write cache buffer flushing on Raymond Chen's blog The Old New Thing. Be sure to read the comments to this post too as they give some additional useful insights--for example the comment that says "Lots of hard drives cheat and do write caching internally--even when the protocol doesn't allow for it" which is kind of scary. Raymond's basic point in his post seems to be that the second setting should never be selected and instead should be eliminated from the Windows UI. 

However as Emmanuel Bergerat points out in his MSDN post titled The checkbox that saves you hours, there are actually some real world scenarios where selecting the second checkbox makes sense. Finally, note that some server systems have firmware (BIOS or UEFI) settings you can use to configure intermediate caching for the storage subsystem, so it's not just the Windows settings that you need to be aware of--to configure write caching you must do so both in the operating system and on the storage controller.

Disk write caching on virtual machines

The question we want to focus on for the remainder of this article is the impact of using these settings for virtual machines running on Hyper-V hosts. Figure 2 shows a virtual machine named SERVER03 running Windows Server 2012 R2 that is opened in Virtual Machine Connection on a Hyper-V host named HOST40, which is also running Windows Server 2012 R2. When we try to clear the "Enable write caching on the device" check box to disable write caching on the virtual hard disk (VHD) for this virtual machine, the error dialog shown appears, informing us that this action is not allowed:

    Figure 2: Attempting to disable write caching for a virtual hard disk.

If you click OK in the above dialog box, the default write caching settings are restored and a warning message is displayed:

   Figure 3: You cannot disable write caching on the virtual hard disk.

Let's think about this more carefully. First, it makes total sense that Hyper-V doesn't allow you to change the write cache settings for virtual hard disks attached to virtual machines. After all, a virtual hard disk isn't really a storage device at all, it's simply a file (.vhd or .vhdx) stored on the file system of the storage device used by the host. Since it's just a file, a virtual hard disk doesn't have any form of disk cache associated with it. 

So what really matters in this kind of scenario is whether you need to enable or disable disk write caching on the underlying physical storage on which the virtual machine's VHD or VHDX is stored. The type of write caching used by the host's physical storage obviously depends on the type of storage device the host is using, which might be internal or directly-attached HDD or SSD storage, hardware RAID, an HBA connected to a Fibre Channel SAN, and so on.

But if you cannot disable write caching for this virtual hard disk then why was it possible in earlier versions of Hyper-V to disable write caching in a virtual machine using the above Policies property sheet? The answer (as I've been told by a Hyper-V expert at Microsoft) is simply that there was a bug in the Windows ataport and Hyper-V storage stack in earlier versions of Hyper-V that allowed you to change the disk write caching setting of the system drive of a virtual machine if that system drive was backed by a virtual hard disk that used virtual IDE (vIDE). 

This bug gave users the impression they could disable write caching to improve data integrity for write operations to the virtual hard disk, but in reality all it was doing was creating the potential for data loss and corruption of the virtual hard disk should the underlying Hyper-V host experience a power outage or an unplanned restart (see KB2853952 for details). Microsoft released a fix for this issue as described in that KB article, but the point is that write caching isn't configurable for virtual hard disks on virtual machines--nor should it be.

But while Hyper-V won't allow you to disable write caching on a virtual hard disk by clearing the "Enable write caching on the device" setting in the guest operating system of a virtual machine, Hyper-V does allow you to select the second setting "Turn off Windows write-cache buffer flushing on the device" as demonstrated in the next screenshot:

    Figure 4: You can however turn off write-cache buffer flushing on this virtual disk.

Why is that, and what's the point of doing this? First, remember that it's the first setting that controls whether write caching is enabled on the disk or not. And since a virtual hard disk isn't really a disk at all, that setting has no meaning as far as virtual disks are concerned. But the second setting is different and does have meaning, because it controls whether cache flush requests are honored for the disk. When you select the second setting, cache flushes will essentially pretend to succeed--at least at the level of the software stack.

These flushes can be costly for certain kinds of physical storage devices like hard disk drives and SATA SSDs, since the command queue is drained when the dirty data in the cache is flushed. Flushes also usually have their own built-in cost associated with how the storage controller processes them. So when you select this setting in the guest OS for a virtual hard disk in a virtual machine, you might see some performance improvement for applications running in the virtual machine. But always remember that it's the host's disk cache settings that are the important ones as far as data integrity is concerned.
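If you want a rough feel for what flushes cost on a particular system, a sketch like the following compares writing a batch of small records with and without a flush after each one. The timed_writes function, the 4 KiB record size, the record count and the file names are all arbitrary choices of mine, and the numbers you get will vary widely with the storage hardware, the host's cache settings and the guest's settings, so treat this as a quick probe and not a benchmark.

```python
import os
import tempfile
import time


def timed_writes(path: str, count: int, flush_each: bool) -> float:
    """Write count 4 KiB records, optionally flushing after each one.

    Hypothetical helper for illustration; returns elapsed seconds.
    """
    record = b"x" * 4096
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(record)
            if flush_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # request a flush to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = timed_writes(os.path.join(tmp, "buffered.bin"), 200, False)
        flushed = timed_writes(os.path.join(tmp, "flushed.bin"), 200, True)
        print(f"no explicit flush: {buffered:.4f} s")
        print(f"flush per record:  {flushed:.4f} s")
```

Running the same script inside a virtual machine before and after selecting the second setting is one way to see whether the change makes a measurable difference for your workload.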

Additional resources

For more information on how caching works in the Hyper-V virtual storage stack, see KB2801713 Hyper-V storage: Caching layers and implications for data consistency. Remember also that disk caching is only one of many issues associated with storage on Hyper-V hosts. 
