The main reason you would want to set up a cluster in vSphere is to take advantage of the High Availability (HA) and Distributed Resource Scheduler (DRS) features. In this post, I’ll cover the benefits and functionality of VMware DRS and will briefly touch on how to set it up.
What is DRS?
Let’s begin by highlighting the benefits of using DRS:
Load Balancing – In a nutshell, DRS monitors resource utilization and availability on the constituent hosts of a cluster. Depending on how it’s configured, DRS will recommend or automatically migrate virtual machines from a host running low on resources to any that can sustain the additional load and supplement the resources required by the vms. The main function of DRS is thus to ensure that a vm is allocated the required compute resources for it to run optimally.
Power Management – VMware Distributed Power Management (DPM) is a sub-component of DRS which places one or more ESXi hosts in standby mode if the remaining hosts are found to provide sufficient excess capacity. When resources start running low, DPM powers hosts back on to keep capacity at an optimum level.
Virtual Machine Placement – Using DRS groups, affinity and anti-affinity rules, you can specify which virtual machines will reside on which hosts. You can also lock the placement of mutually dependent vms to a specific host for improved performance.
Resource Pools – While resource pools are not exclusive to DRS, since they can be created on any ESXi host, it is only after you enable DRS that you are able to create resource pools on those hosts which are members of a cluster.
Storage DRS – This feature is independent of enabling DRS on an ESXi cluster but nevertheless I thought it’s best to give it a mention even though I won’t cover it in any detail. Put simply, if you have several datastores, you can group these under a datastore cluster for which you can optionally enable Storage DRS as shown in Figure 1. From there on, sDRS takes care of load balancing the disk space and I/O requirements for the virtual machines residing within that datastore cluster.
What are the requirements?
Basic – You will need at least two ESXi hosts participating in a cluster managed by vCenter Server. Every ESXi host must be configured for vMotion. Each host should preferably be allocated a 1Gbit link on a private network reserved solely for vMotion traffic.
Storage – A SAN or NAS based shared-storage solution providing iSCSI or NFS based datastores mounted on every ESXi host included in the cluster. Datastore naming should be consistent across all hosts.
Processors – Preferably, all hosts should have the same type of processor(s) so that vMotion can transfer a vm and resume its state correctly. Once the vm is transferred, the processor(s) on the destination host should present the same instruction set and pick up execution from where the source host processor(s) stopped. Enhanced vMotion Compatibility (EVC) should be enabled wherever dissimilar processors are used.
Any gotchas?
Software companies like Oracle and Microsoft require you to purchase a license for every host on which you plan to run products such as Microsoft SQL Server or Oracle Database. If you have large clusters, the price tag will quickly inflate. As you’ll see further on, you could use VM-Host affinity rules to make sure that such vms are “preferably” placed only on those hosts for which you acquired licenses. You could also opt to disable DRS altogether for the specific vms. While I’m not covering HA here, note that the same issue will arise when a host fails since the vms that were running on it are optionally restarted on another host, one that has not necessarily been licensed.
Licensing is generally a complex and often obfuscated topic, so make sure to understand the requirements and repercussions before enabling DRS (and HA) for products burdened by restrictive licensing schemes. This will ensure that, come audit season, you’re not caught violating licensing agreements.
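One way to stay ahead of an audit is to periodically compare where license-bound vms are actually running against the hosts you have paid for. The following is a minimal Python sketch of that check. The function name, host names and vm names are all hypothetical, and the placement dictionary stands in for whatever inventory export you have at hand; it is not a vSphere API call.

```python
def unlicensed_placements(placement, licensed_hosts, licensed_vms):
    """Return sorted (vm, host) pairs where a license-bound vm runs on an unlicensed host.

    placement      -- dict mapping vm name -> name of the host it currently runs on
    licensed_hosts -- set of hosts covered by the product license
    licensed_vms   -- set of vms running the licensed product
    """
    return sorted(
        (vm, host)
        for vm, host in placement.items()
        if vm in licensed_vms and host not in licensed_hosts
    )


# Hypothetical inventory: sql02 has drifted onto esx03, which is not licensed.
placement = {"sql01": "esx01", "sql02": "esx03", "web01": "esx03"}
violations = unlicensed_placements(
    placement,
    licensed_hosts={"esx01", "esx02"},
    licensed_vms={"sql01", "sql02"},
)
print(violations)
```

Here only sql02 is reported, since web01 does not run the licensed product and sql01 sits on a licensed host.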
How do I set it up?
Enabling DRS couldn’t be simpler. Just right-click on the cluster name, select “Edit Settings” and turn it on (Figure 2). I’m using the traditional (C#) vSphere client (my bad) but you’ll be better off using the vSphere Web client as, in general, it exposes more settings for the particular feature you are enabling or managing.
Figure 2 – Enabling DRS on a cluster (using the C# vSphere Client)
Simply turning on DRS will suffice for most environments. However, you need to be aware that the default automation level is set to “Fully Automated”. What this means is that DRS will automatically move vms across hosts whenever it deems it necessary. There are in fact 3 levels of automation. These are:
Manual – when this mode is selected, DRS will suggest whether vms are to be migrated or not if resources are running low. All subsequent actions require user intervention. As can be seen in Figure 3, DRS keeps on prompting until you tell it which host you want your vm powered up on.
Partially Automated – in this mode, DRS will automatically place a vm that’s just been powered up on a host with optimal capacity. During the course of normal operations, DRS will make suggestions about those vms that need migrating. To view them, click on the cluster name and select the DRS tab while in “Hosts and Clusters” view when using the vSphere client. Clicking the “Apply Recommendations” button actuates these recommendations with the respective vms being migrated to the DRS chosen hosts. Suggestions are also made when running DRS in manual mode. DRS does a check every 5 minutes but you can force it to run by clicking on “Run DRS” as shown in Figure 4 highlighted in red.
Figure 4 – Manually running DRS
Fully Automated – as the name implies, DRS will take care of automatically moving vms whenever the need arises.
This automation level is shown in Figure 5. Be careful with the “Migration Threshold” setting at the bottom which, if set too aggressively, may result in an inordinate number of migrations, especially in large environments. This in turn may cause performance issues, specifically on the storage and network fronts, due to an increase in the demand for IOPS and bandwidth.
The automation level can also be set individually for each vm. Doing so overrides the cluster setting for that vm.
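To make the migration threshold trade-off a little more concrete, here is a toy Python model. It is emphatically not VMware's algorithm: I am assuming a single load figure per vm, using the standard deviation of host load as the imbalance metric, and treating the threshold as a plain number rather than the slider positions in the client. All names are hypothetical. The point is only to show that the lower the tolerated imbalance, the more migrations get triggered.

```python
from statistics import pstdev


def imbalance(hosts):
    """Population standard deviation of per-host load (0.0 means perfectly even)."""
    return pstdev([sum(vms.values()) for vms in hosts.values()])


def rebalance(hosts, threshold):
    """Greedily migrate vms while the imbalance exceeds the threshold.

    hosts maps host name -> {vm name: load}. The dict is modified in place.
    Returns the list of migrations as (vm, source host, destination host).
    """
    moves = []
    while imbalance(hosts) > threshold:
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        src = max(load, key=load.get)   # busiest host
        dst = min(load, key=load.get)   # least busy host
        gap = load[src] - load[dst]
        # Only a vm smaller than the gap reduces the imbalance when moved.
        candidates = {vm: d for vm, d in hosts[src].items() if 0 < d < gap}
        if not candidates:
            break
        # Prefer the vm whose load is closest to half the gap.
        vm = min(candidates, key=lambda v: abs(candidates[v] - gap / 2))
        hosts[dst][vm] = hosts[src].pop(vm)
        moves.append((vm, src, dst))
    return moves


def sample_cluster():
    """Hypothetical three-host cluster with one overloaded host."""
    return {
        "esx1": {"a": 40, "b": 30, "c": 20},
        "esx2": {"d": 10},
        "esx3": {"e": 20},
    }


for threshold in (40, 20, 5):
    moves = rebalance(sample_cluster(), threshold)
    print(threshold, len(moves), moves)
```

With this sample data, a tolerant threshold of 40 triggers no migrations, 20 triggers one and 5 triggers three, which is the same pattern you should expect, on a much larger scale, when moving the slider towards the aggressive end.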
DRS Groups and Rules
There are instances where you’d want a particular set of virtual machines to run on the same host or group of hosts. There will be other times where you definitely want two or more vms running on separate hosts to minimize performance issues, perhaps to isolate a heavily used database vm from an equally utilized mail server. DRS provides for this as follows:
VM-VM Affinity Rules
Keep vms together (Affinity) – use these to have a group of 2 or more vms run on the same host
Separate vms (Anti-Affinity) – use these to have a group of 2 or more vms run on separate hosts
Note: If any two rules conflict, the older one is left enabled while the most recent is disabled. You can, however, select which rule to enable. In the following example I set up two rules. The first specifies that vm a and vm b should be kept together. The second, on the contrary, specifies that the two vms should be kept apart, thus conflicting with the first rule. A red icon next to the rule will alert you of existing conflicts (Figure 7).
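The conflict handling described in the note can be sketched in a few lines of Python. This is only a model of the behavior as I have described it, not code taken from vCenter; the class, its fields and the rule names are hypothetical. Two rules are treated as conflicting when one keeps together a pair of vms that the other keeps apart, and the newer rule of a conflicting pair is the one that gets disabled.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class VmVmRule:
    name: str
    kind: str          # "affinity" or "anti-affinity"
    vms: frozenset     # vms the rule applies to
    created: int       # creation order, lower means older
    enabled: bool = True


def conflicts(a, b):
    """True when the rules are of opposite kinds and share at least two vms."""
    return a.kind != b.kind and len(a.vms & b.vms) >= 2


def resolve(rules):
    """Disable the newer rule of every conflicting pair of enabled rules."""
    ordered = sorted(rules, key=lambda r: r.created)
    for older, newer in combinations(ordered, 2):
        if older.enabled and newer.enabled and conflicts(older, newer):
            newer.enabled = False
    return rules


together = VmVmRule("keep-together", "affinity", frozenset({"vm-a", "vm-b"}), created=1)
apart = VmVmRule("keep-apart", "anti-affinity", frozenset({"vm-a", "vm-b"}), created=2)
resolve([apart, together])
print(together.enabled, apart.enabled)
```

The older “keep-together” rule stays enabled while the newer “keep-apart” rule is switched off, regardless of the order in which the rules are passed in.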
VM-Host Affinity Rules
Virtual machines to hosts – Bind one or more vms to a pre-defined DRS group of hosts
Note: No rule checking is performed for VM-Host affinity rules so you may end up with conflicting rules. Again, the older rule takes precedence with the new one being automatically disabled. Care should also be exercised when creating this type of rule since any action violating a required affinity rule may prevent:
- DRS from evacuating vms when a host is placed in maintenance mode.
- DRS from placing virtual machines at power-on or load balancing virtual machines.
- HA from performing failovers.
- DPM from placing hosts into standby mode.
DRS Groups and Rules can be set from the Cluster settings shown below. You only need to create groups when setting up “Virtual machines to hosts rules” since the option is not available when creating affinity and anti-affinity rules (See Figures 8 and 9).
Pay particular attention when creating “Virtual Machines to Hosts” rules. You are given four options (see Figure 10 – options boxed in green) to choose from and although similarly worded, the behavior is anything but similar.
Be wary of using rules starting with “must” as this implies strict imposition. In practical terms, let’s say you create a “must run on hosts in group” rule for a particular vm. If for any reason the hosts in the referenced group are offline, the vm will not migrate and/or is prevented from powering up, unless of course you disable or delete the rule. This can also lead to host affinity rule violations as a result of any of the unwanted scenarios previously mentioned. If this happens, disable the offending rule and manually run DRS (or wait for it to do so automatically, the interval being every 5 minutes). Any stuck process, such as placing a host in maintenance mode, should resume normally after a short while.
Unless absolutely necessary, avoid using “must” and opt instead for “should”. This simply sets a preference with regard to which host to use. If none are available, DRS selects the next best option.
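The difference between the two flavors boils down to what happens when no host in the group is available. Here is a deliberately simplified Python sketch of that decision. The function, its parameters and the host names are hypothetical, and real DRS weighs far more than host availability when picking a host.

```python
def place(rule, group_hosts, online_hosts):
    """Pick a host for a vm bound to a host group, or None if it cannot be placed.

    rule         -- "must" (required) or "should" (preferred)
    group_hosts  -- ordered list of hosts in the DRS host group
    online_hosts -- set of hosts currently available
    """
    preferred = [h for h in group_hosts if h in online_hosts]
    if preferred:
        return preferred[0]
    if rule == "must":
        # Required rule: no fallback, the vm stays where it is or stays off.
        return None
    # Preferred rule: fall back to any other available host.
    fallback = sorted(online_hosts)
    return fallback[0] if fallback else None


group = ["esx1", "esx2"]
print(place("must", group, {"esx3"}))
print(place("should", group, {"esx3"}))
print(place("must", group, {"esx2", "esx3"}))
```

With both group hosts offline, the “must” rule leaves the vm with nowhere to go while the “should” rule settles for esx3. Once esx2 is back online, either rule places the vm there.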
Monitoring DRS
If you switch to the Summary tab, you should see a “vSphere DRS” information pane on the upper-right part of the screen (Figure 11). Here you are presented with DRS related information including the automation level set, the number of outstanding recommendations and the degree to which the cluster is load balanced. There’s also a link to the “Resource Distribution Chart” which, when clicked, opens a window showing the load distribution across the cluster using CPU and Memory utilization per host as a metric (Figure 12).
Figure 11 – DRS status window
Disabling DRS
If for any reason you find yourself needing to disable DRS, you will need to keep a couple of things in mind. The first is that you WILL LOSE any existing Resource Pool hierarchy, including vm membership (I learned this the hard way!).
On a more positive note, the vSphere Web client allows you to save the resource pool tree for future import, which is perhaps one more reason why you should consider ditching the old client. However, note that while this process will restore your original resource pool structure, IT WILL NOT restore vm membership, meaning you’re left with a bunch of empty resource pools. In addition, you will not be able to re-import the resource pool tree if you created new resource pools after you re-enabled DRS. To be honest I don’t see this as being of any use but, as they say, we should always be thankful for small mercies.
The second is that any rules set up prior to disabling DRS will still apply. In fact, if you re-enable DRS, any previously set rules will magically reappear. According to this article, one should be wary of “should” rules when DRS is disabled, which apparently do not work as expected. I tried replicating this on a vCenter Server 6.0 / ESXi 6.0 environment (2 nested node cluster) and the behavior of “should” rules remained the same irrespective of DRS being enabled or not.