Archive for HA

HA in VMware vSphere 5.x – What actually happens?

The HA Question?

We were asked what actually happens to the hosts and VMs in vSphere 5.5 if an isolation event is triggered and we completely lose our host Management Network (which I have seen happen in the past!). I have written several blog posts about HA in the HA category, so I am not going to go back over those. I am just going to focus on this question using our settings, which are shown below.

It is important to note that whether VMware HA restarts virtual machines on other hosts in the cluster in the event of a host isolation or host failure depends on the “Host Monitoring” setting. If Host Monitoring is disabled, the restart of virtual machines on other hosts following a host failure or isolation is also disabled.

On both our Non-Production Cluster and our Production Cluster we have HA enabled, Enable Host Monitoring turned on, and Leave Powered On as our default isolation response.

The vSphere HA architecture consists of Master and Slave HA agents. Except during network partitions, there is one Master per cluster. The Master agent is responsible for monitoring the health of virtual machines and restarting any that fail. The Slaves are responsible for sending information to the Master and restarting virtual machines as instructed by the Master.

When an HA cluster is created, it begins by electing a Master, which tries to gain ownership of all the datastores it can access directly, or by proxying requests to one of the Slaves over the management network. It does this by locking a file called protectedlist that is stored on the datastores in an existing cluster. The Master will also try to take ownership of any datastores it discovers along the way and will periodically retry any datastores it could not access previously.

The Master uses the protectedlist file to store the inventory and keep track of the virtual machines protected by HA. It then distributes this inventory across all the datastores.

There is also a file called poweron, located on a shared datastore, which contains a list of powered-on virtual machines. This file is also used by Slaves to inform the Master that they are isolated: the top line of the file contains a 0 or a 1, with 1 meaning isolated.

Datastore Heartbeating

In vSphere versions prior to 5.x, virtual machine restarts were always attempted, even if only the Management network went down and the rest of the VM networks were running fine. This was not a desirable situation. VMware has introduced the concept of Datastore Heartbeating, which adds much more resiliency and helps prevent the false positives that previously resulted in VMs being restarted unnecessarily.

Datastore Heartbeating is used when a Master has lost network connectivity with a Slave. The Datastore Heartbeating mechanism is then used to determine whether the host has failed or is merely isolated/network partitioned, which is validated through the poweron file as mentioned previously. By default HA picks two heartbeat datastores. To see which datastores have been selected, open the Cluster Status dialog for the cluster.

Isolation and Network Partitioning

A host is considered to be either isolated or network partitioned when it loses network access to a master but has not completely failed.

Isolation

  • A host is not receiving any heartbeats from the master
  • A host is not receiving any election traffic
  • A host cannot ping the isolation address
  • Virtual machines may be restarted depending on the isolation response
  • A VM will only be shut down or powered off when the isolated host knows there is a master out there that has taken ownership of the VM, or when the isolated host loses access to the home datastore of the VM

Network Partitioning

  • A host is not receiving any heartbeats from the master
  • A host is receiving election traffic
  • An election process will take place, the state will be reported to vCenter, and virtual machines may be restarted depending on the isolation response

What happens if? 

  • The Master fails

If the Slaves have not received any network heartbeats from the Master, the Slaves will try to elect a new Master. The new Master will gather the required information and restart the VMs. The datastore lock will expire, and the newly elected Master will relock the file if it has access to the datastore.

  • A Slave fails

The Master, along with monitoring the Slave hosts, also receives heartbeats from the Slaves every second. If a Slave fails or becomes isolated, the Master will check for connectivity for 15 seconds, then it will check whether the host is still heartbeating to the datastore. Next it will try to ping the management gateway. If both the datastore heartbeat and the management gateway checks prove negative, the host will be declared failed; the Master will then determine which VMs need to be restarted and will try to distribute them fairly across the remaining hosts.

  • Power Outage

If there is a power outage and all hosts power down suddenly, then as soon as power is restored an election process will be kicked off and a Master will be elected. The Master reads the protectedlist file, which contains all VMs protected by HA, and then initiates restarts for those VMs which are listed as protected but are not running.

  • Complete Management Network failure

First of all, it is a very rare scenario for the Management Network to become unavailable at the same time on all the running hosts in the cluster. VMware recommends configuring redundant vmnics for each host, with each management VMkernel vmnic going into a different management switch for full redundancy.

If all the ESXi hosts lose the Management Network, the Master and the Slaves will remain in their current roles, as no new election can happen because the FDM agents communicate through the Management Network. The VMs remain accessible on the datastores, and by reading the protectedlist file and the poweron file on those datastores the Master can tell whether it is dealing with a complete failure of the Management Network, a failure of itself or a Slave, or an isolation/network partition event. Each host will fail to ping the isolation address and will declare itself isolated. It will then trigger the isolation response, which in our case is to leave the VMs powered on.

A host remains isolated until it observes HA network traffic (for instance, election messages) or it starts getting a response from an isolation address. This means that as long as the host is in an isolated state, it will continue to validate its isolation by pinging the isolation address. As soon as the isolation address responds, it will initiate or join an election process and the cluster will return to a normal state.

Useful Link

Thanks to Iwan Rahabok 🙂

http://virtual-red-dot.blogspot.co.uk/2012/02/vsphere-ha-isolation-partition-and.html

Configure customised isolation response settings

HA Isolation Responses

When HA detects a failure on one of the hosts, a response is triggered to deal with the virtual machines on that host.

Host Isolation Responses

First of all, we need to look at Host Monitoring, which is a selectable checkbox within the HA settings.

Host Monitoring

Whether VMware HA restarts virtual machines on other hosts in the cluster in the event of a host isolation or host failure depends on the “Host Monitoring” setting. If Host Monitoring is disabled, the restart of virtual machines on other hosts following a host failure or isolation is also disabled. Disabling Host Monitoring also impacts VMware Fault Tolerance, because it controls whether HA will restart a Fault Tolerance (FT) secondary virtual machine after an event. Essentially, a host will always perform the configured host isolation response when it determines it is isolated; the Host Monitoring setting determines whether virtual machines will be restarted elsewhere following this event.

Isolation Responses

When an isolation response is triggered, the isolated host must determine whether it needs to take any action, based upon the configured isolation response for each virtual machine that is powered on. The isolation response setting provides a means to dictate the action desired for the powered-on virtual machines maintained by a host when that host is declared isolated. There are three possible isolation response values, which can be configured and applied to a cluster or individually to a specific virtual machine. These are:

  • Leave Powered On
  • Power Off
  • Shut Down

Leave Powered On

With this option, virtual machines hosted on an isolated host are left powered on. In situations where a host loses all management network access, it might still have the ability to access the storage subsystem and the virtual machine network. Selecting this option enables the virtual machine to continue to function if this were to occur. This is now the default isolation response setting in vSphere High Availability 5.0.

Power Off

When this isolation response option is used, the virtual machines on the isolated host are immediately stopped. This is similar to removing the power from a physical host and can induce inconsistency in the file system of the guest OS in the virtual machine. The advantage of this action is that VMware HA will attempt to restart the virtual machine more quickly than when using the Shut Down option.

Shut Down

Through the use of the VMware Tools package installed within the guest operating system of a virtual machine, this option attempts to gracefully shut down the operating system within the virtual machine before powering off the virtual machine. This is more desirable than using the Power Off option because it gives the OS time to commit any outstanding I/O activity to disk. HA will wait for a default of 300 seconds (5 minutes) for this graceful shutdown to occur. If the OS has not gracefully shut down by this time, HA will initiate a power-off of the virtual machine. Changing the das.isolationshutdowntimeout attribute will modify this timeout if it is determined that more time is required to gracefully shut down an OS. The Shut Down option requires that the VMware Tools package be installed in the guest OS; otherwise, it is equivalent to the Power Off setting.
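For those who prefer to script this rather than click through the client, below is a minimal pyVmomi (Python) sketch, purely illustrative, that enables Host Monitoring and sets the cluster default isolation response. The vCenter address, credentials and cluster name are placeholders, and the property names should be checked against the vSphere API reference for your version.

```python
# Minimal, illustrative pyVmomi sketch: enable Host Monitoring and set the cluster
# default isolation response. The vCenter address, credentials and the cluster name
# "Production" are placeholders - adjust for your own environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab use only; skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="VMware1!",
                  sslContext=context)
try:
    content = si.RetrieveContent()
    # Find the cluster object by name (assumes a cluster called "Production")
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Production")

    das = vim.cluster.DasConfigInfo(
        enabled=True,                        # vSphere HA on
        hostMonitoring="enabled",            # the Host Monitoring checkbox
        defaultVmSettings=vim.cluster.DasVmSettings(
            isolationResponse="shutdown"))   # or "powerOff" / "none" (leave powered on)

    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    cluster.ReconfigureComputeResource_Task(spec, modify=True)  # returns a Task to monitor
finally:
    Disconnect(si)
```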

Best Practices

From a best practices perspective, Leave Powered On is the recommended isolation response setting for the majority of environments. Isolated hosts are a rare event in a properly architected environment, given the redundancy built in. In environments that use network-based storage protocols, such as iSCSI and NFS, the recommended isolation response is Power Off. With these environments, it is highly likely that a network outage that causes a host to become isolated will also affect the host’s ability to communicate to the datastores.
An isolated host will initiate the configured isolation response for a running virtual machine if either of the following is true:

  • The host lost access to the datastore containing the configuration (.vmx) file for the virtual machine
  • The host still has access to the datastore and it determined that a master is responsible for the virtual machine.

To determine this, the isolated host checks for the accessibility of the “home datastore” for each virtual machine and whether the virtual machines on that datastore are “owned” by a master, which is indicated by a master’s having exclusively locked a key file that HA maintains on the datastore. After declaring itself as being isolated, the isolated host releases any locks it might have held on any datastores. It then checks periodically to see whether a master has obtained a lock on the datastore. After a lock is observed on the datastore by the isolated host, the HA agent on the isolated host applies the configured isolation response. Ensuring that a virtual machine is under continuous protection by a master provides an additional layer of protection. Because only one master can lock a datastore at a given time, this significantly reduces chances of “split-brain” scenarios. This also protects against situations where a complete loss of the management networks without a complete loss of access to storage would make all the hosts in a cluster determine they were isolated.
In certain environments, it is possible for a loss of the management network to also affect access to the heartbeat datastores. This is the case when the heartbeat datastores are hosted via NFS that is tied to the management network in some manner. In the event of a complete loss of connectivity to the management network and the heartbeat datastores, the isolation response activity resembles that observed in vSphere 4.x.
In this configuration, the isolation response should be set to Power Off so another virtual machine with access to the network can attempt to power on the virtual machine.
There is a situation where the isolation response will likely take an extended period of time to transpire. This occurs when all paths to storage are disconnected, referred to as an all-paths-down (APD) state, and the APD condition does not impact all of the datastores mounted on the host. This is due to the fact that there might be outstanding write requests to the storage subsystem that must time out. Establishing redundant paths to the storage subsystem will help prevent an APD situation and this issue.

Can you have VMware hosts with different amounts of RAM?

Yes you can have hosts with different amounts of RAM in a cluster.

HA will work fine as long as no host has less RAM than the largest reservation any single VM needs.

For example: say you have three hosts, two with 64 GB of RAM and one with 32 GB of RAM, and you then have a large VM with a 36 GB memory reservation. Each host needs to be able to power on the largest VM in the event of a failover, and in this case the host with 32 GB of RAM would not be able to power it on.

HA Advanced Settings

Below are some of the Advanced HA Settings you can find on vSphere 5 and prior

Please note that each bullet details the version which supports this advanced setting:

  • das.maskCleanShutdownEnabled – 5.0 only
    Whether the clean shutdown flag will default to false for an inaccessible and poweredOff VM. Enabling this option will trigger VM failover if the VM’s home datastore isn’t accessible when it dies or is intentionally powered off.
  • das.ignoreInsufficientHbDatastore – 5.0 only
    Suppress the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is “false”. Can be configured as “true” or “false”.
  • das.heartbeatDsPerHost – 5.0 only
    The number of required heartbeat datastores per host. The default value is 2; value should be between 2 and 5.
  • das.failuredetectiontime – 4.1 and prior
    Number of milliseconds, timeout time, for isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address a failuredetection time of 15000 is recommended.
  • das.isolationaddress[x] – 5.0 and prior
    IP address the ESX host uses to check for isolation when no heartbeats are received, where [x] = 0 – 9. VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes. Start at das.isolationaddress1 when adding a second gateway
  • das.usedefaultisolationaddress – 5.0 and prior
    Value can be “true” or “false” and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set the “das.isolationaddress0” to a pingable address and disable the usage of the default gateway by setting this to “false”.
  • das.isolationShutdownTimeout – 5.0 and prior
    Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.
  • das.allowNetwork[x] – 5.0 and prior
    Enables the use of port group names to control the networks used for VMware HA, where [x] = 0 – ?. You can set the value to be “Service Console 2” or “Management Network” to use (only) the networks associated with those port group names in the networking configuration.
  • das.bypassNetCompatCheck – 4.1 and prior
    Disable the “compatible network” check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is “false”; setting it to “true” disables the check.
  • das.ignoreRedundantNetWarning – 5.0 and prior
    Remove the error icon/message from your vCenter when you don’t have a redundant Service Console connection. Default value is “false”, setting it to “true” will disable the warning. HA must be reconfigured after setting the option.
  • das.vmMemoryMinMB – 5.0 and prior
    The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotMemInMB”.
  • das.slotMemInMB – 5.0 and prior
    Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
  • das.vmCpuMinMHz – 5.0 and prior
    The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotCpuInMHz”.
  • das.slotCpuInMHz – 5.0 and prior
    Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
  • das.sensorPollingFreq – 4.1 and prior
    Set the time interval for HA status updates. As of vSphere 4.1, the default value of this setting is 10. It can be configured between 1 and 30, but it is not recommended to decrease this value as it might lead to less scalability due to the overhead of the status updates.
  • das.perHostConcurrentFailoversLimit – 5.0 and prior
    By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover as it adds more stress on the hosts and storage.
  • das.config.log.maxFileNum – 5.0 only
    Desired number of log rotations.
  • das.config.log.maxFileSize – 5.0 only
    Maximum file size in bytes of the log file.
  • das.config.log.directory – 5.0 only
    Full directory path used to store log files.
  • das.maxFtVmsPerHost – 5.0 and prior
    The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.
  • das.includeFTcomplianceChecks – 5.0 and prior
    Controls whether vSphere Fault Tolerance compliance checks should be run as part of the cluster compliance checks. Set this option to false to avoid cluster compliance failures when Fault Tolerance is not being used in a cluster.
  • das.iostatsinterval (VM Monitoring) – 5.0 and prior
    The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.
  • das.failureInterval (VM Monitoring) – 5.0 and prior
    The polling interval for failures. Default value is 30 seconds.
  • das.minUptime (VM Monitoring) – 5.0 and prior
    The minimum uptime in seconds before VM Monitoring starts polling. The default value is 120 seconds.
  • das.maxFailures (VM Monitoring) – 5.0 and prior
    Maximum number of virtual machine failures within the specified “das.maxFailureWindow”. If this number is reached, VM Monitoring doesn’t restart the virtual machine automatically. Default value is 3.
  • das.maxFailureWindow (VM Monitoring) – 5.0 and prior
    Minimum number of seconds between failures. Default value is 3600 seconds. If a virtual machine fails more than “das.maxFailures” within 3600 seconds, VM Monitoring doesn’t restart the machine.
  • das.vmFailoverEnabled (VM Monitoring) – 5.0 and prior
    If set to “true”, VM Monitoring is enabled. When it is set to “false”, VM Monitoring is disabled.
  • das.config.fdm.deadIcmpPingInterval – 5.0 only
    Default value is 10. ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This parameter controls the interval (expressed in seconds) between pings.
  • das.config.fdm.icmpPingTimeout – 5.0 only
    Default value is 5. Defines the time to wait in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible.
  • das.config.fdm.hostTimeout – 5.0 only
    Default is 10. Controls how long a master FDM waits in seconds for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned.
  • das.config.fdm.stateLogInterval – 5.0 only
    Default is 600. Frequency in seconds to log cluster state.
  • das.config.fdm.ft.cleanupTimeout – 5.0 only
    Default is 900. When a vSphere Fault Tolerance VM is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power on of the secondary VM to succeed. If the power on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary VM.
  • das.config.fdm.storageVmotionCleanupTimeout – 5.0 only
    Default is 900. When a Storage vMotion is done in a HA enabled cluster using pre 5.0 hosts and the home datastore of the VM is being moved, HA may interpret the completion of the storage vmotion as a failure, and may attempt to restart the source VM. To avoid this issue, the HA master agent waits the specified number of seconds for a storage vmotion to complete. When the storage vmotion completes or the timer expires, the master will assess whether a failure occurred.
  • das.config.fdm.policy.unknownStateMonitorPeriod – 5.0 only
    Defines the number of seconds the HA master agent waits after it detects that a VM has failed before it attempts to restart the VM.
  • das.config.fdm.event.maxMasterEvents – 5.0 only
    Default is 1000. Defines the maximum number of events cached by the master
  • das.config.fdm.event.maxSlaveEvents – 5.0 only
    Default is 600. Defines the maximum number of events cached by a slave.

Basic design principle: Avoid using advanced settings as much as possible, as they lead to increased complexity.

Always disable and then re-enable HA to activate any changes.
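If you want to push these das.* options with a script instead of the client, the sketch below shows one way it could be done with pyVmomi. It assumes `cluster` is a `vim.ClusterComputeResource` object already retrieved from vCenter (as in the connection sketch earlier on this page), and the option names and values are only examples.

```python
# Illustrative helper: apply HA advanced options (das.*) to a cluster with pyVmomi.
# Assumes `cluster` is a vim.ClusterComputeResource already looked up in vCenter.
from pyVmomi import vim

def set_ha_advanced_options(cluster, options):
    """options: a dict such as {"das.isolationaddress1": "10.0.0.254"}."""
    das = vim.cluster.DasConfigInfo(
        option=[vim.OptionValue(key=k, value=str(v)) for k, v in options.items()])
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    # modify=True merges these values into the existing cluster configuration
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

# Example: add a second isolation address and stop HA using the default gateway
task = set_ha_advanced_options(cluster, {
    "das.isolationaddress1": "10.0.0.254",
    "das.usedefaultisolationaddress": "false",
})
# As noted above, disable and re-enable HA afterwards so the change takes effect.
```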

Useful KB Links

Advanced Configuration options for VMware High Availability for pre-5.0

Setting Multiple Isolation Response Addresses for VMware High Availability

Understanding vSphere 5 High Availability

On the outside, the functionality of vSphere HA in vSphere 5 is very similar to the functionality of vSphere HA in vSphere 4. Under the covers, though, HA now uses a new VMware-developed agent called FDM (Fault Domain Manager). This is a replacement for AAM (Automated Availability Manager).

Limitations of AAM

  • Strong dependence on name resolution
  • Scalability Limits

Advantages of FDM over AAM

  • FDM uses a Master/Slave architecture that does not rely on Primary/secondary host designations
  • As of 5.0 HA is no longer dependent on DNS, as it works with IP addresses only.
  • FDM uses both the management network and the storage devices for communication
  • FDM introduces support for IPv6
  • FDM addresses the issues of both network partition and network isolation
  • Faster install of HA once configured

FDM Agents

FDM uses the concept of an agent that runs on each ESXi host. This agent is separate from the vCenter Management Agent (VPXA) that vCenter uses to communicate with the ESXi hosts.

The FDM agent is installed into the ESXi hosts in /opt/vmware/fdm and stores its configuration files at /etc/opt/vmware/fdm

How FDM works

  • When vSphere HA is enabled, the vSphere HA agents participate in an election to pick a vSphere HA Master. The vSphere HA Master is responsible for a number of key tasks within a vSphere HA enabled cluster
  • You must now have at least two shared datastores between all hosts in the HA cluster.
  • The Master monitors Slave hosts and will restart VMs in the event of a slave host failure
  • The vSphere HA Master monitors the power state of all protected VMs. If a protected VM fails, it will restart the VM
  • The Master manages the tasks of adding and removing hosts from the cluster
  • The Master manages the list of protected VMs.
  • The Master caches the cluster configuration and notifies the slaves of any changes to the cluster configuration
  • The Master sends heartbeat messages to the Slave Hosts so they know that the Master is alive
  • The Master reports state information to vCenter Server.
  • If the existing Master fails, a new HA Master is automatically elected. If the Master went down and a Slave was promoted, when the original Master comes back up, does it become the Master again?  The answer is no.

Enhancements to the User Interface

3 tabs in the Cluster Status

Cluster Settings showing the new Datastores for heartbeating

How does it work in the event of a problem?

In previous versions, virtual machine restarts were always initiated, even if only the management network of the host was isolated and the virtual machines were still running. This added an unnecessary level of stress to the host. This has been mitigated by the introduction of the datastore heartbeating mechanism. Datastore heartbeating adds a new level of resiliency and allows HA to make a distinction between a failed host and an isolated/partitioned host. You must now have at least two shared datastores between all hosts in the HA cluster.

Network Partitioning

The term used to describe a situation where one or more Slave Hosts cannot communicate with the Master even though they still have network connectivity. In this case HA is able to check the heartbeat datastores to detect whether the hosts are live and whether action needs to be taken

Network Isolation

This situation involves one or more Slave hosts losing all management connectivity. Isolated hosts can neither communicate with the vSphere HA Master nor communicate with other ESXi hosts. In this case the Slave host uses the heartbeat datastores to notify the Master that it is isolated. The Slave host uses a special binary file, the host-X-poweron file, to notify the Master. The vSphere HA Master can then take appropriate action to ensure the VMs are protected.

  • In the event that a Master cannot communicate with a Slave across the management network, or a Slave cannot communicate with a Master, the first thing the Slave will try to do is contact the isolation address, which by default is the gateway on the Management Network
  • If it cannot reach the gateway, it considers itself isolated
  • At this point, an ESXi host that has determined it is network isolated will modify a special bit in the binary host-X-poweron file, which is found on all datastores selected for datastore heartbeating
  • The Master sees this bit, which is used to denote isolation, and is therefore notified that the Slave host has been isolated
  • The Master then locks another file used by HA on the heartbeat datastore
  • When the isolated node sees that this file has been locked by a Master, it knows that the Master is responsible for restarting the VMs
  • The isolated host is then free to carry out the configured isolation response, which only happens once the isolated Slave has confirmed via the datastore heartbeating infrastructure that the Master has assumed responsibility for restarting the VMs

Isolation Responses

I’m not going to go into those here but they are bulleted below

  • Shut Down
  • Power Off
  • Leave Powered On

Should you change the default Host Isolation Response?

It is highly dependent on the virtual and physical networks in place.

If you have multiple uplinks, vSwitches and physical switches, the likelihood is that only one part of the network will go down at once. In this case use the Leave Powered On setting, as it is unlikely that a network isolation event would also leave the VMs on the host inaccessible.

Customising the Isolation response address

It is possible to customise the isolation response address in 3 different ways

  • Connect to vCenter
  • Right click the cluster and select Edit Settings
  • Click the vSphere HA Node
  • Click Advanced
  • Enter one of the 3 options below
  • das.isolationaddress1, which adds a first additional isolation address (for example a second gateway) to try
  • das.isolationaddress2, which adds a second additional isolation address to try
  • das.allowNetwork[x], which allows a different port group to be used for HA traffic
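If you would rather script this than click through the client, the same options can be pushed with a helper like the `set_ha_advanced_options` sketch shown earlier in the HA Advanced Settings section. The addresses and port group name below are examples only:

```python
# Hypothetical usage of the set_ha_advanced_options() helper sketched earlier;
# the isolation addresses and port group name are examples only.
task = set_ha_advanced_options(cluster, {
    "das.isolationaddress1": "192.168.10.1",    # first additional gateway to try
    "das.isolationaddress2": "192.168.20.1",    # second gateway
    "das.allowNetwork0": "Management Network",  # port group HA is allowed to use
})
```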

Enabling Host Monitoring in HA Clusters

VMware HA clusters enable a collection of ESX/ESXi hosts to work together so that, as a group, they provide higher levels of availability for virtual machines than each ESX/ESXi host could provide individually. When you plan the creation and usage of a new VMware HA cluster, the options you select affect the way that cluster responds to failures of hosts or virtual machines.
Before creating a VMware HA cluster, you should be aware of how VMware HA identifies host failures and isolation and responds to these situations. You also should know how admission control works so that you can choose the policy that best fits your failover needs. After a cluster has been established, you can customize its behavior with advanced attributes and optimize its performance by following recommended best practices.

How VMware HA works

VMware HA provides high availability for virtual machines by pooling them and the hosts they reside on into a cluster. Hosts in the cluster are monitored and in the event of a failure, the virtual machines on a failed host are restarted on alternate hosts.

Primary and Secondary Hosts in a VMware HA Cluster

When you add a host to a VMware HA cluster, an agent is uploaded to the host and configured to communicate with other agents in the cluster. The first five hosts added to the cluster are designated as primary hosts, and all subsequent hosts are designated as secondary hosts. The primary hosts maintain and replicate all cluster state and are used to initiate failover actions. If a primary host is removed from the cluster, VMware HA promotes another host to primary status.
Any host that joins the cluster must communicate with an existing primary host to complete its configuration (except when you are adding the first host to the cluster). At least one primary host must be functional for VMware HA to operate correctly. If all primary hosts are unavailable (not responding), no hosts can be successfully configured for VMware HA.

Failure Detection and Host Network Isolation

Agents communicate with each other and monitor the liveness of the hosts in the cluster. This is done through the exchange of heartbeats, by default, every second. If a 15-second period elapses without the receipt of heartbeats from a host, and the host cannot be pinged, it is declared as failed. In the event of a host failure, the virtual machines running on that host are failed over, that is, restarted on the alternate hosts with the most available unreserved capacity (CPU and memory.)

Note: In the event of a host failure, VMware HA does not fail over any virtual machines to a host that is in maintenance mode, because such a host is not considered when VMware HA computes the current failover level. When a host exits maintenance mode, the VMware HA service is re-enabled on that host, so it becomes available for failover again.

Host network isolation occurs when a host is still running, but it can no longer communicate with other hosts in the cluster. With default settings, if a host stops receiving heartbeats from all other hosts in the cluster for more than 12 seconds, it attempts to ping its isolation addresses. If this also fails, the host declares itself as isolated from the network.
When the isolated host’s network connection is not restored for 15 seconds or longer, the other hosts in the cluster treat it as failed and attempt to fail over its virtual machines. However, when an isolated host retains access to the shared storage it also retains the disk lock on virtual machine files. To avoid potential data corruption, VMFS disk locking prevents simultaneous write operations to the virtual machine disk files and attempts to fail over the isolated host’s virtual machines fail. By default, the isolated host shuts down its virtual machines, but you can change the host isolation response to Leave powered on or Power off

Redundancy and Reducing Isolation

If you ensure that your network infrastructure is sufficiently redundant and that at least one network path is available at all times, host network isolation should be a rare occurrence.

Which setting should I use?

  • Shutdown

It depends. Some people prefer “Shut down” because they do not want to use a deprecated host and it will shut down your VMs in a clean manner.

The isolation response is a setting that needs to be taken into account when you create your design. For instance, when using an iSCSI array or NFS, choosing “Leave powered on” as your default isolation response might lead to a split-brain situation, depending on the version of ESX used. The reason for this is that the disk lock times out if the iSCSI network is also unavailable. In that case the VM is restarted on a different host while it has not been powered off on the original host. In a normal situation this should not lead to problems, as the VM is restarted and the host on which it runs owns the lock on the VMDK, but when disaster strikes you may well end up in an exceptional rather than a normal situation.

  • Leave Powered On

Many people prefer to use “Leave powered on” because it reduces the chances of a false positive. A false positive in this case is an isolated heartbeat network but a non-isolated VM network and a non-isolated iSCSI / NFS network.

How does HA know whether the host is isolated or completely unavailable when you have selected “leave powered on”?

HA actually does not know the difference. HA will try to restart the affected VMs in both cases. When the host has failed, a restart will take place, but if a host is merely isolated, the non-isolated hosts will not be able to restart the affected VMs. This is because of the VMDK file lock; no other host can boot a VM while its files are locked. When a host fails, this lock expires and a restart can occur.

Isolation Response Considerations

The default value for isolation/failure detection is 15 seconds. In other words the failed or isolated host will be declared dead by the other hosts in the HA cluster on the fifteenth second and a restart will be initiated by one of the primary hosts.

For now let’s assume the isolation response is “Power off”. The “Power off” isolation response will be initiated by the isolated host one second before the das.failuredetectiontime expires. A “Power off” will be initiated on the fourteenth second and a restart will be initiated on the fifteenth second.

Does this mean that you can end up with your VMs being down and HA not restarting them?
Yes, when the heartbeat returns between the 14th and 15th second the “power off” could already have been initiated. The restart however will not be initiated because the heartbeat indicates that the host is not isolated anymore.

How can you avoid this?

Pick “Leave VM powered on” as an isolation response. Increasing the das.failuredetectiontime will also decrease the chances of running into issues like these.

Basic design principle: Increase “das.failuredetectiontime” to 30 seconds (30000) to decrease the likelihood of a false positive.

Please see the below link for further information on this

http://rickardnobel.se/vmware-ha-das-failuredetectiontime/

Calculate Available Resources and VMware HA (High Availability) Slots

Admission Control Settings

Within a cluster we use Admission Control to ensure that sufficient resources exist to provide failover protection. Admission Control is also used to ensure that virtual machine resource reservations are protected.

Admission Control Policies

  • Host Failures the Cluster tolerates
  • Percentage of Cluster Resources reserved as failover spare capacity
  • Specify Failover Hosts

Host Failures the Cluster tolerates

What is a Slot?

A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster

In vCenter Server 4.0, the slot size is now shown in vSphere Client on the Summary tab of the cluster

How is the Slot calculated?

  • VMware HA determines how many slots are available in each ESX/ESXi host based on the host’s CPU and memory capacity.
  • It then determines how many ESX/ESXi hosts can fail in the cluster with at least as many slots as powered on virtual machines.

Default Reservation Values

Slot size comprises two components, CPU and memory.

VMware HA calculates the memory component by obtaining the memory reservation (if set) plus the memory overhead of each powered-on virtual machine and selecting the largest value. There is no default value for the memory reservation.

If a virtual machine does not have reservations, meaning that the reservation is 0, default values are used as listed below

  • 0 MB of RAM and 256 MHz CPU speed are used for vSphere 4 and Prior
  • 0 MB of RAM and 32MHz for CPU for vSphere 5.0 and above
  • When no memory reservation is specified for a virtual machine, the largest memory overhead for any virtual machine in the cluster will be used as the default slot size value for memory

Advanced Settings for CPU and Memory Slot Size

  • das.vmMemoryMinMB <value>

This options/value pair overrides the default memory slot size value used for admission control for VMware HA where <value> is the amount of RAM in MB to be used for the calculation if there are no larger memory reservations. By default this value is set to 256MB. This is the minimum amount of memory in MB sufficient for any VM in the cluster to be usable

  • das.vmCPUMinMHz <value>

This options/value pair overrides the default CPU slot size value used for admission control for VMware HA, where <value> is the amount of CPU in MHz to be used for the calculation if there are no larger CPU reservations. By default this value is set to 256MHz

Maximum Upper Bound Advanced Settings for Slot Sizing

If your cluster contains any virtual machines that have much larger reservations than the others, they will distort the slot size calculation. To avoid this, you can specify an upper bound for the CPU or memory component of the slot size by using the das.slotcpuinmhz or das.slotmeminmb advanced attributes, respectively.

Keep in mind that when you are low on resources this could mean that you are not able to power-on this high reservation VM as resources are fragmented throughout the cluster instead of located on a single host.

  • das.slotmeminmb <value>

This option defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster.

  • das.slotcpuinmhz <value>

This option defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster.
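To make the slot size rules above a little more concrete, here is a small plain-Python sketch of the calculation as described in this post: take the largest CPU reservation and the largest memory reservation plus overhead (falling back to the defaults), then cap each component if das.slotCpuInMHz / das.slotMemInMB style upper bounds are set. It is an illustration of the described logic, not VMware's actual implementation.

```python
# Illustrative slot-size calculation following the rules described above.
# This is a simplification of the logic, not VMware's actual implementation.

def slot_size(vms, cpu_min_mhz=32, mem_min_mb=0,
              slot_cpu_cap_mhz=None, slot_mem_cap_mb=None):
    """vms: list of dicts with 'cpu_reservation_mhz', 'mem_reservation_mb'
    and 'mem_overhead_mb' for every powered-on VM."""
    # CPU component: largest CPU reservation, or the default minimum (32 MHz on 5.0)
    cpu = max([vm["cpu_reservation_mhz"] for vm in vms] + [cpu_min_mhz])
    # Memory component: largest reservation (or default 0 MB) plus memory overhead
    mem = max(max(vm["mem_reservation_mb"], mem_min_mb) + vm["mem_overhead_mb"]
              for vm in vms)
    # Optional upper bounds (das.slotCpuInMHz / das.slotMemInMB style behaviour)
    if slot_cpu_cap_mhz is not None:
        cpu = min(cpu, slot_cpu_cap_mhz)
    if slot_mem_cap_mb is not None:
        mem = min(mem, slot_mem_cap_mb)
    return cpu, mem

# Example: one VM with a large 8 GB reservation skewing the slot size,
# then capped back down with a das.slotMemInMB-style bound of 1024 MB.
vms = [{"cpu_reservation_mhz": 0,   "mem_reservation_mb": 0,    "mem_overhead_mb": 100},
       {"cpu_reservation_mhz": 500, "mem_reservation_mb": 8192, "mem_overhead_mb": 250}]
print(slot_size(vms))                        # -> (500, 8442)
print(slot_size(vms, slot_mem_cap_mb=1024))  # -> (500, 1024)
```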

HA Failover Capacity

There are lots of questions surrounding VMware HA (High Availability), especially when users see a message stating there are “Insufficient resources to satisfy HA failover.” It is worth making the effort to understand the capacity calculations. In current and earlier versions of ESX(i), the following calculation applies for failover capacity.

Failover Capacity is determined using a slot size value that is calculated on the cluster. Slots are calculated by a combination of the total CPU and Memory that are in the physical hosts. The calculation for failover capacity works as follows:

Let’s say you have 4 ESX servers in your VMware HA cluster and Configured Failover capacity on the cluster is set to 1.

Physical memory in the hosts is as follows:

ESX1 = 16 GB
ESX2 = 24 GB
ESX3 = 32 GB
ESX4 = 32 GB

In the cluster you have 24 VMs configured and running. Of the 24 VMs running, determine the VM which has the highest configured memory. For this example let’s say this is 2 GB; all other VMs are configured with 2 GB or less.

With this information we can now do the calculation:

1. Pick the ESX host which has the least amount of RAM. In this case it is ESX1 and the minimum amount of RAM is = 16 GB

2. Divide the value found in step 1 by the maximum amount of RAM configured for a VM. In this example that gives us 8 (16 divided by 2). This means we have 8 slots available per ESX host in the cluster.

3. Since we have 4 hosts and the configured failover capacity for the cluster is 1, we are left with 3 hosts in a failure situation. Hence the total number of VMs that can be powered on these 3 servers is 24 VMs. (i.e. 8 multiplied by 3 = 24)

4. If the total number of VMs in the cluster exceeds 24, we will get “Insufficient resources to satisfy HA failover” and the current failover capacity will be shown as 0. If the number is 24 or fewer, we should not get this message.

Note: If you are still seeing the message and you have fewer VMs running than the calculation allows for, check the CPU and memory reservations on both VMs and resource pools, as these can skew the calculation. You should avoid unnecessary memory or CPU reservations on VMs, as they can cause this type of error to occur, because HA has to ensure that the reserved resource is available.
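The arithmetic from the worked example above can also be expressed as a short Python sketch. This is a simplified illustration of the legacy calculation described here, not VMware's code:

```python
# Simplified illustration of the legacy HA failover-capacity calculation
# walked through above - not VMware's actual implementation.

def failover_check(host_ram_gb, largest_vm_gb, running_vms, host_failures=1):
    slots_per_host = min(host_ram_gb) // largest_vm_gb   # steps 1 and 2
    usable_hosts = len(host_ram_gb) - host_failures      # step 3
    capacity = slots_per_host * usable_hosts
    return capacity, running_vms <= capacity              # step 4

# The example from this post: ESX1-4 with 16/24/32/32 GB, largest VM 2 GB, 24 VMs
capacity, ok = failover_check([16, 24, 32, 32], 2, running_vms=24)
print(capacity, ok)   # -> 24 True  (25 or more VMs would trip the HA warning)
```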

Host Failures?

What happens if you set the number of allowed host failures to 1?
The host with the most slots will be taken out of the equation. If you have 8 hosts with 90 slots in total, but 7 hosts each have 10 slots and one host has 20, this single host will not be taken into account. Worst case scenario! In other words, the 7 hosts should be able to provide enough resources for the cluster when a failure of the “20 slot” host occurs.

And of course if you set it to 2, the next host to be taken out of the equation is the host with the second most slots, and so on.
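The same worst-case rule can be sketched in a few lines of Python: sort the per-host slot counts, drop the hosts contributing the most slots (one per tolerated host failure), and see how many slots remain. Again, this is only an illustration of the logic described above:

```python
# Illustration of the "worst case" rule described above: with N host failures
# tolerated, the N hosts contributing the most slots are removed from the total.

def remaining_slots(slots_per_host, host_failures_tolerated):
    # Drop the hosts with the most slots (the worst case), keep the rest
    survivors = sorted(slots_per_host, reverse=True)[host_failures_tolerated:]
    return sum(survivors)

# 8 hosts with 90 slots in total: seven hosts of 10 slots and one of 20
slots = [10, 10, 10, 10, 10, 10, 10, 20]
print(remaining_slots(slots, 1))   # -> 70 slots left if the "20 slot" host fails
print(remaining_slots(slots, 2))   # -> 60 slots left with two host failures tolerated
```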

How can we get round distorted Slot Sizes causing HA errors?

There are multiple ways to fix, or get around this calculation. The most common are as follows:

  • Select the “Disable – Power on VMs that violate availability constraints” admission control option in the configuration of the cluster. In this case HA ignores the above calculation and will try to power on as many VMs as possible in the case of an HA failover. If this is the option chosen, you can also set restart priority in the ‘Virtual Machine Options’ section of the cluster configuration. This way any high-priority VMs are powered on first, and then the lower-priority ones, up to the point where no further VMs can be powered on

  • If you have one VM which is configured with a very high amount of memory, you can either lower its configured memory, or take it out of the cluster and run it on any other standalone ESX host. This will increase the number of slots available with the current hardware
  • Increase the amount of RAM on servers so that there are more slots available with the current RAM reservations.
  • Remove any CPU reservations on any VM(s) that are greater than the max speed of the processors in the hosts.
  • With vSphere this is something that’s configurable. If you have just one VM with a really high reservation, you can set the following advanced settings to lower the slot size being used during these calculations: das.slotCpuInMHz or das.slotMemInMB. So that the VM with the large reservation can still be powered on, it will simply consume multiple slots. Keep in mind that when you are low on resources this could mean that you are not able to power on this high-reservation VM, as resources may be fragmented throughout the cluster instead of located on a single host.

What if you don’t want to…

  • Disable strict admission control
  • Mess around with setting advanced settings for Minimum Memory and CPU Slot size
  • Lower the VM Memory reservation

There is also the option of

  • Creating a memory reservation on a Resource Pool and putting the VM in here

Why?

High Availability ignores resource pool reservation settings when calculating the slot size, so if a single VM is placed in a resource pool with a memory reservation configured, it will have the same effect on resource allocation as a per-VM memory reservation, but it does not affect the HA slot size.

By creating a resource pool with a substantial memory setting you can avoid decreasing the consolidation ratio of the cluster and still guarantee the virtual machine its resources. You need to be careful though: creating a resource pool for each VM would be a catastrophic way of managing multiple VMs with high memory configurations, and this approach should probably only be used when you have one or two VMs with this type of configuration.

Percentage of Cluster Resources Reserved as Failover

With the Percentage of Cluster Resources reserved for Failover Spare Capacity, vSphere HA ensures that a specified percentage of aggregate CPU and memory is reserved for Failover

vSphere HA uses the CPU and memory reservations of each VM if they have been set; if not, it uses a default value of 0MB of memory and 256MHz of CPU.

With this policy HA does the following

  • Calculates the total resource requirement for all powered on machines in the cluster
  • Calculates the total host resources available for the virtual machines
  • Calculates the current CPU and Memory failover capacity for the cluster
  • Determines whether the current CPU failover capacity or the current memory failover capacity is less than the corresponding configured failover capacity
  • If so, Admission Control disallows the operation

Example
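As a rough illustration of the steps listed above, here is a plain-Python sketch of the percentage-based admission control check. The numbers are made up, and the code is a simplification of the described logic rather than VMware's implementation:

```python
# Simplified illustration of the "Percentage of Cluster Resources" check described
# above. Numbers are made up; this is not VMware's actual implementation.

def percentage_check(vm_reqs, host_capacity, reserved_pct,
                     default_cpu_mhz=256, default_mem_mb=0):
    # 1. Total resource requirements of all powered-on VMs (reservation or default)
    cpu_req = sum(max(vm["cpu_mhz"], default_cpu_mhz) for vm in vm_reqs)
    mem_req = sum(max(vm["mem_mb"], default_mem_mb) + vm["overhead_mb"] for vm in vm_reqs)
    # 2. Total host resources available for the virtual machines
    cpu_cap, mem_cap = host_capacity
    # 3. Current CPU and memory failover capacity, as a percentage
    cpu_free_pct = (cpu_cap - cpu_req) / cpu_cap * 100
    mem_free_pct = (mem_cap - mem_req) / mem_cap * 100
    # 4/5. Disallow the operation if either drops below the reserved percentage
    admitted = cpu_free_pct >= reserved_pct and mem_free_pct >= reserved_pct
    return cpu_free_pct, mem_free_pct, admitted

vms = [{"cpu_mhz": 1000, "mem_mb": 1024, "overhead_mb": 100},
       {"cpu_mhz": 0,    "mem_mb": 0,    "overhead_mb": 150}]
# Cluster with 20 GHz of CPU and 32 GB of memory available, 25% reserved for failover
print(percentage_check(vms, host_capacity=(20000, 32768), reserved_pct=25))
# -> roughly (93.7, 96.1, True): plenty of spare capacity, so the power-on is allowed
```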

Specify Failover Hosts

If you choose this option, be aware that you will set one whole host aside purely as failover capacity.

HA Slot sizes in the vSphere 5 Web Client

You now have the ability to set slot size for “Host failures tolerated” through the vSphere Web Client

More Information

There are great articles on the below webpages regarding HA Slot sizing and calculation

http://www.vmwarewolf.com/ha-failover-capacity/#more

and this article walking you through an example

http://www.vladan.fr/ha-slot-sizes/

HA Slot sizes in the vSphere 5 Web Client

http://www.yellow-bricks.com/2012/09/12/whats-new-vsphere-5-1-high-availability/