Archive for April 2012

What is the vSphere Web Client?

  • This is a funky new feature in vSphere 5
  • A web-based alternative to the vSphere Client for managing vCenter Server or a VMware ESXi host
  • Supports Firefox and IE Browsers on multiple O/S platforms (Windows and Linux)
  • Customisable Interface
  • Advanced Search Functionality
  • Partners and Users can add features and capabilities
  • Requires Adobe Flash Player

What can it be used for?

  • Managing VMs
  • Creating VMs
  • Performing VM operations
  • Configuring VM Resources
  • Viewing all vSphere objects
  • Performing basic health monitoring
  • Supplying a remote console
  • Managing vSphere apps through the web

Where do you install it from?

The only difference between installing this and installing the vSphere Client is selecting an HTTP and an HTTPS port. By default the HTTP and HTTPS ports are 9090 and 9443, although be careful using 9443 as VMware also uses this as a storage I/O port, so there is a possible conflict.

Installation

  • Before you can connect to a vCenter Server instance, you must register it.
  • Select Start > Programs > VMware > VMware vSphere Web Client > vSphere Administration Application and click Register vCenter Server

  • Use this tool to register one or more vCenter Server instances. The tool cannot be run remotely, and the user must have admin rights on the vCenter Server instance.

  • Ignore the Client Certificate warning
  • The final screen confirms that you are configured.

  • Open a web browser window and type in the URL https://vCenterServer:9443/vsphere-client
  • Put in Server, Username and Password

The Web Client depends on a service called vCenter Inventory Service – always make sure this is running or you may get an error when you connect.
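
If you want to check this quickly from the command line on the vCenter Server, something like the following works (this assumes the default Windows service name for the Inventory Service, vimQueryService – verify the name in services.msc first):

sc query vimQueryService

net start vimQueryService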

Unable to add new LUNs on VMware 4.1 U2

Problem

This week we upgraded our hosts to VMware ESXi 4.1.0 build 582267. Our storage guy gave us 2 x 2TB LUNs, but I was unable to add them, receiving the error below. Previously he has created 2TB LUNs and these have been fine.

Unable to read partition information from disk

Solution

It seems Update 2 enforces the maximum LUN size, which is 2TB minus 512 bytes with vSphere 4.x. Depending on the storage system, 2TB could be either 2,000 GB (marketing size) or 2,048 GB (technical size). The maximum mentioned above relates to the technical size, so with the storage system you have, you may need to configure a maximum of 2,047 GB.
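
To put numbers on this: the technical 2TB is 2,048 GB = 2,199,023,255,552 bytes, the vSphere 4.x maximum is 512 bytes less than that (2,199,023,255,040 bytes), and a 2,047 GB LUN works out at 2,197,949,513,728 bytes, comfortably under the limit.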

See Also

http://virtualgeek.typepad.com/virtual_geek/2009/06/vsphere-and-2tb-luns-changes-from-vi3x.html

External I/O workload detected on shared datastore running Storage I/O Control (SIOC)

This alarm may appear in the vCenter vSphere Client. A warning message similar to one of these may also appear in the vCenter vSphere Client:

  • Non-VI Workload detected on the datastore
  • An external I/O workload is detected on datastore XYZABC

This informational event alerts the user of a potential misconfiguration or I/O performance issue caused by a non-ESX workload. It is triggered when Storage I/O Control (SIOC) detects that a workload that is not managed by SIOC is contributing to I/O congestion on a datastore that is managed by SIOC. (Congestion is defined as a datastore’s response time being above the SIOC threshold.) Specific situations that can trigger this event include:

  • The host is running in an unsupported configuration.
  • The storage array is performing a system operation such as replication or RAID reconstruction.
  • VMware Consolidated Backup or vStorage APIs for Data Protection are accessing a snapshot on the datastore for backup purposes.
  • The storage media (spindles, SSD) on which this datastore is located is shared with volumes used by non-vSphere workloads

SIOC continues to work during these situations. This event can be ignored in many cases and you can disable the associated alarm once you have verified that none of the potential misconfigurations or serious performance issues are present in your environment. As explained in detail below, SIOC ensures that the ESX workloads it manages are able to compete for I/O resources on equal footing with external workloads. This event notifies the user of what is happening, provides the user with the opportunity to better understand what is going on, and highlights a potential opportunity to correct or optimize the infrastructure configuration.

NOTE: At this time, SIOC is not supported with NFS storage, with Raw Device Mapping (RDM) virtual disks (including RDMs used for MSCS (Microsoft Cluster Server)), or with datastores that have multiple extents. This alarm can occur if any of these storage objects are configured.

Example Scenario 1:

A /vmfs/volumes/shared-LUN datastore is accessible across multiple hosts. Some hosts are running ESX version 4.1 or later and others are either running an older version or are outside the control domain of vCenter Server.

Example Scenario 2:

The array being used for vSphere is also being used for non-vSphere workloads. The non-vSphere workloads are accessing a storage volume that is on the same disk spindles as the affected datastore.

Impact

When SIOC detects that datastore response time has exceeded the threshold, it typically throttles the ESX workloads accessing the datastore to ensure that the workloads with the highest shares get preference for I/O access to the datastore and lower I/O response time. However, such throttling is not appropriate when workloads not managed by SIOC are accessing the same storage media. Throttling in this case would result in the external workload getting more and more bandwidth, while the vSphere workloads get less and less. Therefore, SIOC detects the presence of such external workloads, and as long as they are present while the threshold is being exceeded, SIOC competes with the interfering workload by curtailing its usual throttling activity.

SIOC automatically detects when the interference goes away and resumes its normal behavior. In this way, SIOC is able to operate correctly even in the presence of interference. The vCenter Server event is notifying the user that SIOC has noticed and handled the interference from external workloads.

Note: When an external workload is acting to drive the datastore response time above the SIOC threshold, the external workload might cause I/O performance issues for vSphere workloads. In most cases, SIOC can automatically and safely manage this situation. However, there may be an opportunity to improve performance by changing some aspects of your configuration. The next section provides guidance on this.

These unsupported configurations can result in the event:

  • One or more hosts accessing the datastore are running an ESX version older than 4.1.
  • One or more hosts accessing the datastore are not managed by vCenter Server.
  • Not all of the hosts accessing the datastore are managed by the same vCenter Server.
  • The storage media (spindles, SSD) where this datastore is located are shared with other datastores that are not SIOC enabled.
  • Datastores in the configuration have multiple extents.

Ensure that you are running a supported configuration:

  • Can you disable and successfully re-enable congestion management for the affected datastore?

Disable and attempt to re-enable congestion management for the affected datastore. If the event occurred because the configuration includes hosts that are running an older version of ESX and the hosts are managed by the same vCenter Server, vCenter Server detects the problem and does not allow you to re-enable congestion management. When the older hosts are updated to ESX 4.1 or later, or the hosts are disconnected from the affected datastore, you can enable congestion management.

  • Are hosts that are not managed by this vCenter Server accessing the affected datastore?

If disabling and re-enabling congestion management for the affected datastore does not solve the problem, other hosts that are not managed by this vCenter Server might be accessing the datastore.

Verify whether the datastore is shared across hosts that are managed by different vCenter Server systems, or across hosts that are not managed at all. If so, correct this so that all hosts accessing the datastore are managed by the same vCenter Server, then check the following:

  • Do all datastores in the configuration that share the same physical storage media (spindles, SSD) have the same SIOC configuration?
    All datastores that share physical storage media must share the same SIOC configuration — all enabled or all disabled. In addition, if you have modified the default congestion threshold setting, all datastores that share storage media must have the same setting.
  • Are any SIOC-enabled datastores in the configuration backed up by multiple extents?
    SIOC-enabled datastores must not be backed up by multiple extents.

If none of the above scenarios apply to your configuration and you have determined that you are running a supported configuration, but are still seeing this event, investigate possible I/O throttling by the storage array.

If an environment is known to have shared access to datastores or performance constraints, it may be preferable to disable the Alarm in vCenter Server. For more information, see Working with Alarms in the vSphere 4.1 Datacenter Administration Guide.

Flowchart for Troubleshooting

Using ESXi with iSCSI SANs

What is iSCSI?

iSCSI SANs use Ethernet connections between computer systems, or host servers, and high performance storage subsystems. The SAN components include iSCSI host bus adapters (HBAs) or Network Interface Cards (NICs) in the host servers, switches and routers that transport the storage traffic, cables, storage processors (SPs), and storage disk systems.
An iSCSI SAN uses a client-server architecture. The client, called the iSCSI initiator, operates on your host. It initiates iSCSI sessions by issuing SCSI commands and transmitting them, encapsulated in the iSCSI protocol, to a server.
The server is known as an iSCSI target. The iSCSI target represents a physical storage system on the network. It can also be provided by a virtual iSCSI SAN, for example, an iSCSI target emulator running in a virtual machine. The iSCSI target responds to the initiator’s commands by transmitting required iSCSI data.

Ports

A single discoverable entity on the iSCSI SAN, such as an initiator or a target, represents an iSCSI node. Each node has one or more ports that connect it to the SAN.
iSCSI ports are end-points of an iSCSI session. Each node can be identified in a number of ways.

IP Address

Each iSCSI node can have an IP address associated with it so that routing and switching equipment on your network can establish the connection between the server and storage. This address is just like the IP address that you assign to your computer to get access to your company’s network or the Internet.

iSCSI Name

A worldwide unique name for identifying the node. iSCSI uses the iSCSI Qualified Name (IQN) and Extended Unique Identifier (EUI). By default, ESXi generates unique iSCSI names for your iSCSI initiators, for example, iqn.1998-01.com.vmware:iscsitestox-68158ef2. Usually, you do not have to change the default value, but if you do, make sure that the new iSCSI name you enter is worldwide unique.
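
As a quick check, you can list your host’s iSCSI adapters and their IQNs from the command line. On ESXi 5.x the command below will do it (the esxcli namespaces differ on 4.1):

esxcli iscsi adapter list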

iSCSI Alias

A more manageable name for an iSCSI device or port, used instead of the iSCSI name. iSCSI aliases are not unique and are intended to be just a friendly name to associate with a port.

iSCSI Initiators

To access iSCSI targets, your host uses iSCSI initiators. The initiators transport SCSI requests and responses, encapsulated into the iSCSI protocol, between the host and the iSCSI target.
Your host supports different types of initiators.

Software Initiator

A software iSCSI adapter is VMware code built into the VMkernel. It allows your host to connect to the iSCSI storage device through standard network adapters. The software iSCSI adapter handles iSCSI processing while communicating with the network adapter. With the software iSCSI adapter, you can use iSCSI technology without purchasing specialized hardware.

This requires VMkernel networking
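
As a rough sketch, on ESXi 5.x the software iSCSI adapter can be enabled and verified from the command line (ESXi 4.1 uses the older esxcli swiscsi namespace instead):

esxcli iscsi software set --enabled=true

esxcli iscsi software get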

Hardware Initiator

A hardware iSCSI adapter is a third-party adapter that offloads iSCSI and network processing from your host.
Hardware iSCSI adapters are divided into two categories:

  • Dependent Hardware iSCSI Adapter. This requires VMkernel networking

This type of adapter can be a card that presents a standard network adapter and iSCSI offload functionality for the same port. The iSCSI offload functionality depends on the host’s network configuration to obtain the IP, MAC, and other parameters used for iSCSI sessions. An example of a dependent adapter is the iSCSI licensed Broadcom 5709 NIC.

  • Independent Hardware iSCSI Adapter. No VMkernel networking needed

This type of adapter implements its own networking and iSCSI configuration and management interfaces. An example of an independent hardware iSCSI adapter is a card that presents either iSCSI offload functionality only, or iSCSI offload functionality combined with standard NIC functionality. The iSCSI offload functionality has independent configuration management that assigns the IP, MAC, and other parameters used for the iSCSI sessions. An example of an independent adapter is the QLogic QLA4052 adapter. Hardware adapters may need to be licensed or they will not appear in the vSphere Client or vCLI.

CHAP

iSCSI storage systems authenticate an initiator by a name and key pair. ESXi supports the CHAP protocol, which VMware recommends for your SAN implementation. To use CHAP authentication, the ESXi host and the iSCSI storage system must have CHAP enabled and have common credentials.

Because the IP networks that the iSCSI technology uses to connect to remote targets do not protect the data they transport, you must ensure security of the connection. One of the protocols that iSCSI implements is the Challenge Handshake Authentication Protocol (CHAP), which verifies the legitimacy of initiators that access targets on the network.
CHAP uses a three-way handshake algorithm to verify the identity of your host and, if applicable, of the iSCSI target when the host and target establish a connection. The verification is based on a predefined private value, or CHAP secret, that the initiator and target share. ESXi supports CHAP authentication at the adapter level. In this case, all targets receive the same CHAP name and secret from the iSCSI initiator. For software and dependent hardware iSCSI adapters, ESXi also supports per-target CHAP authentication, which allows you to configure different credentials for each target to achieve a greater level of security.

ESXi supports the following CHAP authentication methods:

  • One-way CHAP

In one-way CHAP authentication, also called unidirectional, the target authenticates the initiator, but the initiator does not authenticate the target.

  • Mutual CHAP

In mutual CHAP authentication, also called bidirectional, an additional level of security enables the initiator to authenticate the target. VMware supports this method for software and dependent hardware iSCSI adapters only.

For software and dependent hardware iSCSI adapters, you can set one-way CHAP and mutual CHAP for each initiator or at the target level. Independent hardware iSCSI supports CHAP only at the initiator level.

Security Levels

When you set the CHAP parameters, specify a security level for CHAP.

  • Do not use CHAP (Software/Dependent/Independent)
  • Do not use CHAP unless required by target (Software/Dependent)
  • Do not use CHAP unless prohibited by target (Software/Dependent/Independent)
  • Use CHAP (Software/Dependent)
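
As an illustration, one-way CHAP can be configured on a software iSCSI adapter with esxcli on ESXi 5.x; the adapter name, username and secret below are placeholders only:

esxcli iscsi adapter auth chap set --adapter=vmhba33 --direction=uni --level=required --authname=chapuser --secret=chapsecret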

USB Devices on ESXi 4.1 Upwards

We experienced an issue with an Aladdin USB dongle on our VMware 4.1 U1 host where it suddenly stopped working for no reason, and we were subsequently unable to re-add it to the VM it previously resided on. We followed the steps below to try and resolve it, which may help someone else experiencing the same problems.

  1. First of all check that your device is supported by VMware
  2. Check that the vendor supports the VMware version and server hardware, and whether they have done any in-house testing
  3. USB device passthrough requires the VM to be on hardware version 7 or later
  4. USB Passthrough requires a USB Controller which is a virtual hardware device which you add to the VM
  5. USB Passthrough requires a USB Device which is a virtual hardware device which you add to the VM
  6. VMware ESXi 4 only supports USB1 or USB2 devices. USB3 devices are supported on VMware ESXi 5
  7. Check other USB Ports on your server – Are they turned on in the BIOS
  8. Only one USB controller of each type can be added to a virtual machine
  9. The USB arbitrator (a VMware service) can monitor a maximum of 15 USB controllers. If your system includes additional controllers and you connect USB devices to them, the devices are not available to be passed through to a virtual machine

Different types of USB Passthrough

Host Connected USB Passthrough

Limitations

  1. VMware ESXi 4 only supports USB1 and USB2 devices
  2. USB3 devices are not supported until VMware ESXi 5, so it is not possible to use USB3

Client Connected USB Passthrough

The vSphere Client 5 allows client USB passthrough if you are connecting via vCenter, not to the host directly. Both the EHCI+UHCI controller and the xHCI controller can be chosen and are compatible with Client Connected Passthrough.

Limitations

  1. Passthrough of a USB device using the xHCI controller with hardware version 8 requires the guest O/S to have a functioning xHCI driver; without one you cannot use a USB3 device
  2. At the time of writing there is no xHCI driver available
  3. Closing the vSphere Client session which initiated the USB connection will disconnect the USB device

Further Troubleshooting

We checked all of the above and then contacted VMware, and we tried the following:

  1. Detach the USB device from the host.
  2. Reboot the host.
  3. Once the host comes up, attach the USB back to it.
  4. Run the lsusb -t command and check that your device is recognised

All of this checked out ok

The End Result

VMware Support then told us that USB1/USB2 devices are much better supported by VMware ESXi 4.1 Update 2 than by Update 1.

The support engineer did actually get this fixed by doing the following:

Checking the hostd logs we could see errors similar to those below:

USBGL: Error connecting to arbitrator socket: No such file or directory
USBGL: Giving up on connecting to USB arbitrator

This indicates that the hostd process is trying to access the usbarbitrator process before it is fully initialised and as such fails. Restarting the host and management services will not resolve the issue, as hostd will still try to access the service before it is ready. This is a known issue and should be resolved in U2.

In the meantime the workaround is to restart the hostd service itself once the usbarbitrator service is up and running. To do this we used:

/etc/init.d/usbarbitrator stop

/etc/init.d/usbarbitrator start

/etc/init.d/hostd restart

Robocopy

Robocopy, or “Robust File Copy”, is a command-line directory replication command. It has been available as part of the Windows Resource Kit starting with Windows NT 4.0, and was introduced as a standard feature of Windows Vista, Windows 7 and Windows Server 2008. The command is robocopy.

Capabilities

Robocopy is notable for capabilities above and beyond the built-in Windows copy and xcopy commands, including the following:

  • Ability to tolerate network interruptions and resume copying (incomplete files are marked with a date stamp of 1980-01-01 and contain a recovery record so Robocopy knows where to continue from)
  • Ability to skip junction points, which can cause copying to fail in an infinite loop (/XJ)
  • Ability to copy file data and attributes correctly, and to preserve original timestamps, as well as NTFS ACLs, owner information, and audit information, using the /COPYALL or /COPY command-line switches. Copying folder timestamps is also possible in later versions (/DCOPY:T)
  • Ability to assert the Windows NT “backup right” (/B) so an administrator may copy an entire directory, including files denied readability to the administrator.
  • Persistence by default, with a programmable number of automatic retries if a file cannot be opened.
  • A “mirror” mode, which keeps trees in sync by optionally deleting files out of the destination that are no longer present in the source.
  • Ability to skip files that already appear in the destination folder with identical size and timestamp.
  • A continuously-updated command-line progress indicator.
  • Ability to copy file and folder names exceeding 256 characters — up to a theoretical limit of 32,000 characters — without errors.
  • Multi-threaded copying. (Windows 7 only)
  • Return code on program termination for batch file usage.

Example Process

  • Decide which is your source folder
  • Decide which is your destination folder
  • The Syntax is then as follows

ROBOCOPY Source_folder Destination_folder [files_to_copy] [options]
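
For example, a nightly mirror of a data share might look like this (the paths, retry values and log file are placeholders to adapt):

ROBOCOPY C:\Data \\Server01\Backup\Data /MIR /COPYALL /Z /R:3 /W:10 /LOG+:C:\Logs\robocopy.log

Because /MIR deletes destination files that no longer exist in the source, it is worth doing a trial run with /L first.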

Robocopy Source Options

  • /S : Copy Subfolders
  • /E : Copy Subfolders, including Empty Subfolders
  • /COPY:copyflag[s] : What to COPY (default is /COPY:DAT) (copyflags: D=Data, A=Attributes, T=Timestamps, S=Security (NTFS ACLs), O=Owner info, U=aUditing info)
  • /SEC : Copy files with Security (equivalent to /COPY:DATS)
  • /DCOPY:T : Copy Directory Timestamps. ##
  • /COPYALL : Copy ALL file info (equivalent to /COPY:DATSOU)
  • /NOCOPY : Copy NO file info (useful with /PURGE)
  • /A : Copy only files with the Archive attribute set
  • /M : like /A, but remove Archive attribute from source files
  • /LEV:n : Only copy the top n LEVels of the source tree
  • /MAXAGE:n : MAXimum file AGE – exclude files older than n days/date
  • /MINAGE:n : MINimum file AGE – exclude files newer than n days/date. (If n < 1900 then n = no of days, else n = YYYYMMDD date)
  • /FFT : Assume FAT File Times (2-second date/time granularity)
  • /256 : Turn off very long path (> 256 characters) support

Copy Options

  • /L : List only – don’t copy, timestamp or delete any files
  • /MOV : MOVe files (delete from source after copying)
  • /MOVE : Move files and dirs (delete from source after copying)
  • /Z : Copy files in restartable mode (survive network glitch)
  • /B : Copy files in Backup mode
  • /ZB : Use restartable mode; if access denied use Backup mode
  • /IPG:n : Inter-Packet Gap (ms), to free bandwidth on slow lines
  • /R:n : Number of Retries on failed copies – default is 1 million
  • /W:n : Wait time between retries – default is 30 seconds
  • /REG : Save /R:n and /W:n in the Registry as default settings
  • /TBD : Wait for sharenames To Be Defined (retry error 67)

Destination options

  • /A+:[RASHCNET] : Set file Attribute(s) on destination files + add
  • /A-:[RASHCNET] : UnSet file Attribute(s) on destination files – remove
  • /FAT : Create destination files using 8.3 FAT file names only
  • /CREATE : CREATE directory tree structure + zero-length files only
  • /DST : Compensate for one-hour DST time differences ##
  • /PURGE : Delete dest files/folders that no longer exist in source
  • /MIR : MIRror a directory tree – equivalent to /PURGE plus all subfolders (/E)

Logging options

  • /L : List only – don’t copy, timestamp or delete any files
  • /NP : No Progress – don’t display % copied
  • /LOG:file : Output status to LOG file (overwrite existing log)
  • /UNILOG:file : Output status to Unicode Log file (overwrite) ##
  • /LOG+:file : Output status to LOG file (append to existing log)
  • /UNILOG+:file : Output status to Unicode Log file (append) ##
  • /TS : Include Source file Time Stamps in the output
  • /FP : Include Full Pathname of files in the output
  • /NS : No Size – don’t log file sizes
  • /NC : No Class – don’t log file classes
  • /NFL : No File List – don’t log file names
  • /NDL : No Directory List – don’t log directory names
  • /TEE : Output to console window, as well as the log file
  • /NJH : No Job Header
  • /NJS : No Job Summary

Repeated Copy Options

  • /MON:n : MONitor source; run again when more than n changes seen
  • /MOT:m : MOnitor source; run again in m minutes Time, if changed
  • /RH:hhmm-hhmm : Run Hours – times when new copies may be started
  • /PF : Check run hours on a Per File (not per pass) basis

Job Options

  • /JOB:jobname : Take parameters from the named JOB file
  • /SAVE:jobname : SAVE parameters to the named job file
  • /QUIT : QUIT after processing command line (to view parameters)
  • /NOSD : NO Source Directory is specified
  • /NODD : NO Destination Directory is specified
  • /IF : Include the following Files
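
Putting the job options together, you can save a set of parameters once and re-run them later, for example (the job name is a placeholder):

ROBOCOPY C:\Data D:\Backup /MIR /SAVE:nightly /QUIT

ROBOCOPY /JOB:nightly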

GUI Version (Freeware)

http://tribblesoft.com/easyrobocopy.aspx

Restarting VMware agents

Restarting the Management Agents

Caution: Restarting the management agents may impact any tasks running on the ESX or ESXi host at the time of the restart.

To restart the management agents on ESXi:

  1. Connect to the console of your ESXi host.
  2. Press F2 to customize the system.
  3. Login as root
  4. Use the Up/Down arrows to navigate to Restart Management Agents. Note: In ESXi 4.1 and ESXi 5.0, this option is available under Troubleshooting Options.
  5. Press Enter.
  6. Press F11 to restart the services.
  7. When the service has been restarted, press Enter.
  8. Press Esc to log out of the system.

Restarting the Management Network

To restart the management network on ESXi:

  1. Connect to the console of your ESXi host.
  2. Press F2 to customize the system.
  3. Login as root
  4. Use the Up/Down arrows to navigate to Restart Management Network

To restart the management agents on an ESX host:

  1. Log in to your ESX host as root from either an SSH session or directly from the console.
  2. Run this command

service mgmt-vmware restart

To restart the management agents on an ESXi host:

  1. Log in to your ESXi host as root from either an SSH session or directly from the console.
  2. Run this command

/sbin/services.sh restart

To restart hostd on an ESXi host:

  1. Log in to your ESXi host as root from either an SSH session or directly from the console.
  2. Run this command

/etc/init.d/hostd restart

vSphere: Storage vMotion Fails with an Operation Timed Out Error

You may experience these symptoms

  1. Storage vMotion fails
  2. The Storage vMotion operation fails with a timeout between 5-10% or 90-95% complete
  3. On ESX 4.1 you may see the errors:

Hostd Log

vix: [7196 foundryVM.c:10177]: Error VIX_E_INVALID_ARG in VixVM_CancelOps(): One of the parameters was invalid ‘vm:/vmfs/volumes/4e417019-4a3c4130-ed96-a4badb51cd0a/Mail02/Mail02.vmx’ opID=9BED9F06-000002BE-9d] Failed to unset VM medatadata: FileIO error: Could not find file : /vmfs/volumes/4e417019-4a3c4130-ed96-a4badb51cd0a/Mail02/Mail02-aux.xml.tmp.

vmkernel: 114:03:25:51.489 cpu0:4100)WARNING: FSR: 690: 1313159068180024 S: Maximum switchover time (100 seconds) reached. Failing migration; VM should resume on source.
vmkernel: 114:03:25:51.489 cpu2:10561)WARNING: FSR: 3281: 1313159068180024 D: The migration exceeded the maximum switchover time of 100 second(s). ESX has preemptively failed the migration to allow the VM to continue running on the source host.
vmkernel: 114:03:25:51.489 cpu2:10561)WARNING: Migrate: 296: 1313159068180024 D: Failed: Maximum switchover time for migration exceeded(0xbad0109) @0x41800f61cee2

vCenter Log

[yyyy-mm-dd hh:mm:ss.nnn tttt error ‘App’] [MIGRATE] (migrateidentifier) vMotion failed: vmodl.fault.SystemError
[yyyy-mm-dd hh:mm:ss.nnn tttt verbose ‘App’] [VpxVmomi] Throw vmodl.fault.SystemError with:
(vmodl.fault.SystemError) {
dynamicType = ,
reason = “Source detected that destination failed to resume.”,
msg = “A general system error occurred: Source detected that destination failed to resume.”
}

Resolution

Note: A virtual machine with many virtual disks might be unable to complete a migration with Storage vMotion. The Storage vMotion process requires time to open, close, and process disks during the final copy phase. Storage vMotion migration of virtual machines with many disks might timeout because of this per-disk overhead.

This timeout occurs when the maximum amount of time for switchover to the destination is exceeded. This may occur if there are a large number of provisioning, migration, or power operations occurring on the same datastore as the Storage vMotion. The virtual machine’s disk files are reopened during this time, so disk performance issues or large numbers of disks may lead to timeouts.

The default timeout is 100 seconds, and can be modified by changing the fsr.maxSwitchoverSeconds option in the virtual machine configuration to a larger value. This change must be done with the virtual machine powered down.

To modify the fsr.maxSwitchoverSeconds option using the vSphere Client:

  1. Open vSphere Client and connect to the ESX/ESXi host or to vCenter Server.
  2. Locate the virtual machine in the inventory.
  3. Power off the virtual machine.
  4. Right-click the virtual machine and click Edit Settings.
  5. Click the Options tab.
  6. Select the Advanced: General section.
  7. Click the Configuration Parameters button.
  8. From the Configuration Parameters window, click Add Row
  9. In the Name field, enter the parameter name: fsr.maxSwitchoverSeconds
  10. In the Value field, enter the new timeout value in seconds (for example: 300)
  11. Click the OK buttons twice to save the configuration change.
  12. Power on the VM

To modify the fsr.maxSwitchoverSeconds option by editing the .vmx file manually:

The virtual machine’s configuration file can be manually edited to add or modify the option. Add the option on its own line: fsr.maxSwitchoverSeconds = "300"

Note: To edit a virtual machine’s configuration file you need to power off the virtual machine, remove it from inventory, make the changes to the .vmx file, add the virtual machine back to inventory, and power the virtual machine on again.

Why suspend a VM?

Suspending a virtual machine is similar to putting a real computer into sleep mode. When you suspend a virtual machine, you save its current state (including the state of all applications and processes running in the virtual machine) to a special file on disk. When the suspended virtual machine is resumed, it continues operating from the point it had reached at the time it was suspended.

Suspending your virtual machine can be useful if you need to restart your host but do not want to:

  1. Quit the applications running in the virtual machine
  2. Spend time shutting the guest operating system down and then starting it again

What is Pluggable Storage Architecture (PSA) and Native Multipathing (NMP)?

Pluggable Storage Architecture

To manage storage multipathing, ESX/ESXi uses a special VMkernel layer, the Pluggable Storage Architecture (PSA), which sits in the SCSI middle layer of the VMkernel I/O stack. The PSA is an open modular framework that coordinates the simultaneous operation of multiple multipathing plugins (MPPs). PSA is a collection of VMkernel APIs that allow third-party hardware vendors to insert code directly into the ESX storage I/O path. This allows third-party software developers to design their own load balancing techniques and failover mechanisms for a particular storage array. The PSA coordinates the operation of the NMP and any additional third-party MPPs.

What does PSA do?

  • Load and unload multipathing plug-ins
  • Uses predefined claim rules to assign each device to an MPP (One claim rule per device)
  • Handle physical path discovery and removal (through scanning)
  • Route I/O requests for a specific logical device to an appropriate multipathing plug-in
  • Handle I/O queuing to the physical storage HBAs and to the logical devices
  • Implement logical device bandwidth sharing between virtual machines
  • Provide logical device and physical path I/O statistics

Native Multipathing Plugin

The VMkernel multipathing plugin that ESX/ESXi provides by default is the VMware Native Multipathing Plugin (NMP). The NMP is an extensible module that manages subplugins. There are two types of NMP subplugins: Storage Array Type Plugins (SATPs) and Path Selection Plugins (PSPs). SATPs and PSPs can be built in and provided by VMware, or can be provided by a third party. If more multipathing functionality is required, a third party can also provide an MPP to run in addition to, or as a replacement for, the default NMP.

VMware provides a generic Multipathing Plugin (MPP) called Native Multipathing Plugin (NMP) which supports all storage arrays on the Compatibility list

A single MPP can support multiple SATPs and PSPs. If a storage vendor has not supplied an MPP, SATP or PSP, VMware will use its own, assigned by default.
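
You can see which MPPs the PSA has loaded with esxcli; on ESXi 5.x the command is as follows (4.1 uses a different esxcli namespace):

esxcli storage core plugin list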

What does NMP do?

  1. Manages physical path claiming and unclaiming.
  2. Registers and de-registers logical devices
  3. Associates physical paths with logical devices.
  4. Processes I/O requests to logical devices
  5. Selects an optimal physical path for the request (load balance)
  6. Performs actions necessary to handle failures and request retries.
  7. Supports management tasks such as abort or reset of logical devices.

How it works

The ESX kernel (VMkernel) goes down through three layers when communicating with storage:

  1. In the top layer, VMware native NMP or third-party MPP software decides which SATP to use, or whether to use the native interface.
  2. The SATP layer includes native generic path selection (active/active, active/passive), standard ALUA, as well as allowing third-party plugins (SATP) to override its behavior. The SATP monitors these paths, reports changes, and initiates fail-over on the array as needed.
  3. At the PSP layer, software decides which physical channel to use for I/O requests.

In more detail

  • NMP assigns a SATP to every physical path to the logical device (datastore)
  • NMP associates paths to logical devices
  • NMP decides which PSP to use with the logical device.
  • The VM tells NMP an I/O is ready to send.
  • I/O is issued.
  • PSP is selected. Load-balances if applicable.
  • I/O is sent to the device.
  • Success: the device driver (storage array) indicates the I/O is complete. Failure: NMP calls the appropriate SATP.
  • Success: NMP tells the PSP the I/O is complete. Failure: the SATP interprets error codes and fails over to inactive paths.
  • Failure: the PSP is called again to select which path to use for I/O, excluding the failed path.
  • The PSP checks every 300 seconds whether the path is active again; the SATP is responsible for performing the failover.

PSA Plugins

There are three types of PSA plugins for vSphere 4:

  1. Storage Array Type Plug-In (SATP)
  2. Path Selection Plug-in (PSP)
  3. A complete third-party multipathing software stack (MPP)

As is the case with VAAI, VMware includes a number of third-party plug-ins in the ESXi install. Users can simply activate many of these according to their needs, though some require additional fees and licensing.

SATP Plugins

SATPs allow load balancing across multiple paths, intelligent path selection, and handling of troublesome conditions such as “chatter”, where paths rapidly fail back and forth between controllers.

The SATP has critical tasks to perform in the PSA stack:

  1. Decide which method of communication to use with the storage (PSA or native)
  2. Monitor the health of the physical I/O channels or paths
  3. Report any changes in the state of the paths up the stack
  4. Perform actions required to fail over storage between controllers on the array

VMware vSphere includes a variety of generic SATP plugins for storage arrays.

  • VMW_SATP_LOCAL – Local SATP for direct-attached devices
  • VMW_SATP_DEFAULT_AA – Generic for active/active arrays
  • VMW_SATP_DEFAULT_AP – Generic for active/passive arrays
  • VMW_SATP_ALUA – Asymmetric Logical Unit Access-compliant arrays
  • VMW_SATP_LSI – LSI/NetApp arrays from Dell, HDS, IBM, Oracle, SGI
  • VMW_SATP_SVC – IBM SVC-based systems (SVC, V7000, Actifio)
  • VMW_SATP_SYMM – EMC Symmetrix DMX-3/DMX-4/VMAX, Invista
  • VMW_SATP_CX – EMC/Dell CLARiiON and Celerra (also VMW_SATP_ALUA_CX)
  • VMW_SATP_INV – EMC Invista and VPLEX
  • VMW_SATP_EQL – Dell EqualLogic systems

You can see which SATP plug-ins are available using the following esxcli command:

esxcli storage nmp satp list

PSP Plugins

In contrast to the diversity of VAAI and SATP plug-ins, the universe of path selection plug-ins is fairly small. Most storage arrays are supported with either Most Recently Used (MRU) or Fixed path selection approaches. Many also support Round Robin (RR) path selection. The only vendor with a specific PSP that is not also part of a full MPP (like EMC PowerPath or HDS HDLM) is Dell, which offers a special routed path selection plug-in for the EqualLogic iSCSI arrays.

  • VMW_PSP_MRU – Most-Recently Used (MRU) – Supports hundreds of storage arrays
  • VMW_PSP_FIXED – Fixed – Supports hundreds of storage arrays
  • VMW_PSP_RR – Round-Robin – Supports dozens of storage arrays
  • DELL_PSP_EQL_ROUTED – Dell EqualLogic iSCSI arrays

You can view the Path Polices in vCenter

  • Click on the Host
  • Click Configuration
  • Click Storage
  • Click on a Datastore and click Properties
  • Click Manage Paths to see the path selection policy in use

Array Types

  • Active/Active arrays use the Fixed PSP plugin
  • Active/Passive arrays use the Most Recently Used PSP plugin
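
If you need to change the policy, esxcli can set the PSP for a single device or change the default PSP a SATP hands out. This is ESXi 5.x syntax and the device ID below is a placeholder:

esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR

esxcli storage nmp satp set --satp VMW_SATP_DEFAULT_AA --default-psp VMW_PSP_RR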

ESXCLI Commands

  • esxcli storage nmp psp list

  • esxcli storage nmp satp list

  • esxcli storage core claimrule list

  • esxcli storage nmp device list
