Archive for Objective 6 Advanced Troubleshooting

Utilise the Direct Console User Interface (DCUI) and ESXi Shell to troubleshoot, configure and monitor an environment

The DCUI

The Direct Console User Interface (DCUI) allows you to interact with the host locally using text-based menus. You can use the Direct Console User Interface to enable local and remote access to the ESXi Shell.

Access

  • SSH into your host (for example, using PuTTY)
  • Type dcui at the command line
  • The DCUI welcome screen is displayed
  • Press F2
  • Log in using your account

The ESXi Shell includes a fully supported command list. To access the ESXi Shell from the DCUI, perform the following tasks. (Note that the ESXi Shell can also be enabled from the vSphere Client under Configuration > Security Profile.)

  1. From the physical console of your ESXi host, press the F2 button and authenticate
  2. Select Troubleshooting Options and press Enter
  3. Select Enable ESXi Shell and press Enter. The entry toggles between Enable and Disable, and the status panel should then report that the ESXi Shell is enabled
  4. You may optionally adjust the timeout
  5. Press Esc to return to the main window
  6. At the main console screen, press Alt + F1 to open the ESXi Shell
  7. From within the shell, you may run esxcli commands, esxtop, and access the local filesystem.
  8. (Note: To return to the DCUI, press Alt + F2)

What can you do?

  • Configure Root password
  • Configure Lockdown mode
  • Configure Management network
  • Restart Management Network
  • Test or disable Management Network
  • Configure Keyboard
  • View Support Information
  • View System Logs
  • Reset System Configuration
  • Remove Custom Extensions
  • Shut down or restart the ESXi server

Determine the root cause of a vSphere management or connectivity issue

Points to think about

  • Check Network Connectivity
  • Check Storage Connectivity
  • Check vCenter Connectivity
  • Check Host Connectivity
  • Check VM Connectivity
  • Check Host Logs
  • Check vCenter Logs
  • Check Monitoring Systems
  • Check Physical Switches
  • Check Cables
  • Check Virtual Switches
  • Check FC Switches
  • Check SAN Storage
  • Check vCenter DB Connectivity
  • Check Router Connectivity
  • Check Power Issues
  • Check KB Articles for Error IDs

Troubleshoot ESXi host management and connectivity issues

Troubleshooting

  • Verify that the network adapter and server hardware are supported. For more information, see Verifying ESX/ESXi host hardware (System, Storage, and I/O devices) are supported (1003916).
  • Verify that the network link is up. For more information, see Verifying a network link (1003724).
  • Verify that proper VLAN IDs exist on the portgroup. For more information, see Configuring a VLAN on a portgroup (1003825).
  • If you are using NIC teaming on the virtual switch, verify that the physical switch ports are configured consistently for each teamed network adapter and that the proper load balancing policy is configured on the virtual switch. VMware recommends using the default Route based on the originating virtual port ID load balancing policy. If link aggregation is configured on the physical switch, use the Route based on IP hash load balancing policy. For more information, see NIC teaming in ESX/ESXi (1004088) and ESX/ESXi host requirements for link aggregation (1001938).
  • Verify that the speed and duplex of the network links are consistent. For more information, see Configuring the speed and duplex of an ESX/ESXi Server network adapter (1004089).
  • Verify the ESX host networking configuration. For more information, see Verifying ESX Server host networking configuration on the service console (1003796).
  • Verify that port security is not configured on the physical switch ports. For more information, see Loss of network connectivity when port security is configured on the physical switch (1002811).
  • Verify that portfast (or equivalent) is enabled on all of the ESX host’s physical switch ports. For more information, see STP may cause temporary loss of network connectivity when a failover or failback event occurs (1003804).
  • Verify the integrity of the physical network adapter. For more information, see Verifying the integrity of the physical network adapter (1003686).
  • Verify that no duplicate IP addresses exist on the network. For more information, see Warning for Duplicate IP Address for VMware VMotion Port Group (10165) or Duplicate IP address detected (1020647).
  • Verify that all the NICs participating as uplinks on the vSS and VDS are observing all the network information. For more information, see Observed IP range does not show network in ESX or ESXi (1006744). Until the observed IP range issue is resolved on the external physical network, you can set the problematic NIC to Unused and then verify the networking functionality again.

Useful Link

vSphere_Troubleshooting

Troubleshoot the ESXi firewall

ESXi Firewall Log Location

Firewall changes are logged in /var/log/vobd.log

ESXCLI Command Set

  • esxcli network firewall (shows the commands and namespaces available for the firewall)
  • esxcli network firewall ruleset list (lists each ruleset and whether it is enabled)
  • esxcli network firewall get (shows the current firewall status)
  • esxcli network firewall set --enabled true (enables the firewall)

Firewall Ports to check

The following ports are enabled by default. If your port is not listed, you may need to enable a predefined rule or set up a custom firewall rule.

[Image: table of default firewall ports]
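When a port is not open, a quick first check is whether a predefined ruleset covering it exists but is disabled. The sketch below is only an illustration in Python of how saved output from esxcli network firewall ruleset list could be filtered for disabled rulesets. The sample text, the two-column layout it assumes, and the function names are assumptions made for this example, so compare them against the output from your own host first.

```python
# Hypothetical sample in an assumed two-column layout (ruleset name, enabled flag).
# Capture the real output on your own host rather than relying on this text.
SAMPLE_OUTPUT = """\
Name        Enabled
----------  -------
sshServer      true
sshClient     false
nfsClient     false
"""


def parse_rulesets(text):
    """Map each ruleset name to True (enabled) or False (disabled)."""
    rulesets = {}
    for line in text.splitlines():
        parts = line.split()
        # Header and separator rows do not end in true/false, so they are skipped.
        if len(parts) == 2 and parts[1] in ("true", "false"):
            rulesets[parts[0]] = parts[1] == "true"
    return rulesets


def disabled_rulesets(text):
    """Return the names of the disabled rulesets in alphabetical order."""
    parsed = parse_rulesets(text)
    return sorted(name for name, enabled in parsed.items() if not enabled)


if __name__ == "__main__":
    print(disabled_rulesets(SAMPLE_OUTPUT))
```

A ruleset found this way can then be enabled on the host with esxcli network firewall ruleset set --ruleset-id <name> --enabled true.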

Troubleshooting NFS Mounting and Permission issues

What is NFS?

ESXi hosts can access a designated NFS volume located on a NAS (Network Attached Storage) server, can mount the volume, and can use it for its storage needs. You can use NFS volumes to store and boot virtual machines in the same way that you use VMFS datastores.

NAS stores virtual machine files on remote file servers that are accessed over a standard TCP/IP network. The NFS client built into the ESXi system uses NFS version 3 to communicate with NAS/NFS servers. For network connectivity, the host requires a standard network adapter.

Mounting

To use NFS as a shared repository, you create a directory on the NFS server and then mount the directory as a datastore on all hosts. If you use the datastore for ISO images, you can connect the virtual machine's CD-ROM device to an ISO file on the datastore and install a guest operating system from the ISO file.

ESXCLI Command Set

[Image: ESXCLI NFS command chart]

Troubleshooting

  • Check the MTU size configuration on the port group which is designated as the NFS VMkernel port group. If it is set to anything other than 1500 or 9000, test the connectivity using the vmkping command:

[Image: vmkping command example]

  • See the table below for an explanation of the command:

[Image: command explanation table]

  • Verify connectivity to the NFS server and ensure that it is accessible through the firewalls
  • Use netcat (nc) to see if you can reach the NFS server nfsd TCP/UDP port (default 2049) on the storage array from the host:

[Image: nc command example]

  • Verify that the ESX host can vmkping the NFS server
  • Verify that the virtual switch being used for storage is configured correctly
  • Ensure that there are enough available ports on the virtual switch.
  • Verify that the storage array is listed in the Hardware Compatibility Guide
  • Verify that the physical hardware functions correctly.
  • If this is a Windows server, verify that it is correctly configured for NFS.
  • Verify that the permissions of the NFS server have not been set to read-only for this ESX host.
  • Verify that the NFS share was not mounted with the read-only box selected.
  • Ensure the access on the NFS server is set to Anonymous user, Root Access (no_root_squash), and Read/Write
  • If you cannot connect to an NFS share, there may be a misconfiguration on the switch port. In this case, try using a different vmnic (or move NICs to Unused/Standby in the NIC teaming tab of the vSwitch or Portgroup properties).
  • If the name of the NAS server cannot be resolved from the host side, or vice versa, ensure that the DNS server and host-side entries are set properly.
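Several of the checks above can be approximated from a workstation before logging in to the host. The following Python sketch works out the largest ping payload that fits a given MTU (the value you would pass to vmkping -s together with -d, the do-not-fragment option), tests whether the NFS TCP port answers, and tests name resolution. The function names are made up for this example. The port check covers TCP only and runs from wherever the script is executed, so it does not replace testing from the VMkernel interface itself.

```python
import socket

IP_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8  # ICMP echo header


def icmp_payload_size(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def tcp_port_open(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def resolves(name):
    """Return True if the name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print("MTU", mtu, "payload", icmp_payload_size(mtu))
```

For an MTU of 9000 the payload works out to 8972 bytes, and for 1500 it is 1472 bytes. If a ping of that size with fragmentation disabled fails while a small ping succeeds, some device in the path is likely not configured for the larger MTU.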

Troubleshoot vCenter Server service and database connection issues

Troubleshooting Steps

  • Try restarting the VMware VirtualCenter Server service and verify that it fails to start.
  • Verify that the configuration of the ODBC Data Source (DSN) used for connection to the database for vCenter Server is correct. For more information, see vCenter Server installation fails with ODBC and DSN errors (1003928).
  • Verify that ports 902, 80, and 443 are not being used by any other application. If another application, such as Microsoft Internet Information Server (IIS) (also known as Web Server (IIS) on Windows 2008 Enterprise) or the World Wide Web Publishing Services (W3SVC) or the Citrix Licensing Support service is utilizing any of the ports, vCenter Server cannot start. For more information, see Port already in use when installing vCenter Server (4824652).
  • Verify the health of the database server that is being used for vCenter Server. If the hard drives are out of space, the database transaction logs are full, or if the database is heavily fragmented, vCenter Server may not start. For more information, see Investigating the health of a vCenter Server database (1003979).
  • Verify the VMware VirtualCenter Service is running with the proper credentials. For more information, see After installing vCenter Server, the VMware VirtualCenter Server service fails to start (1004280).
  • Verify that critical folders exist on the vCenter Server host. For more information, see Missing folders on a vCenter Server prevent VirtualCenter Server service from starting (1005882).
  • Verify that no hardware or software changes have been made to the vCenter server that may have caused the failure. If you have recently made any changes to the vCenter server, undo these changes temporarily for testing purposes.
  • Before launching vCenter Server, ensure that the VMware VCMSDS service is running.
  • Check the vCenter logs at C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs
  • Verify that the relevant database services are running, for example the SQL Server services.
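The port check in the list above (902, 80 and 443) can be done quickly while the vCenter Server service is stopped. This Python sketch is one possible way to do it, not a VMware tool: it simply attempts a TCP connection to each port on the local machine, and the function names are invented for the example. It only detects services that accept connections on the address tested (loopback by default).

```python
import socket

VCENTER_PORTS = (80, 443, 902)  # the ports named in the list above


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Nothing is listening, or the connection attempt timed out.
        return False


def busy_ports(ports=VCENTER_PORTS, host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if port_in_use(port, host)]


if __name__ == "__main__":
    print("Ports already in use:", busy_ports())
```

Any port reported here while vCenter Server is stopped is held by another application. On Windows, netstat -ano shows the owning process ID for each listening port.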

Use ESXCLI to troubleshoot iSCSI related issues

Troubleshooting iSCSI

ESXi systems include iSCSI technology to access remote storage using an IP network. You can use the vSphere Client, commands in the esxcli iscsi namespace, or the vicfg-iscsi command to configure both hardware and software iSCSI storage for your ESXi system.

Command Chart

[Image: ESXCLI iSCSI command chart]

VMware Link (Pages 53 onwards)

vSphere Command-Line Interface Concepts and Examples

Use esxcli to troubleshoot VMkernel storage module configurations

Storage Modules

The VMkernel is a high-performance operating system that runs directly on the ESXi host. The VMkernel manages most of the physical resources on the hardware, including memory, physical processors, storage, and networking controllers.

To manage storage, VMkernel has a storage subsystem that supports several Host Bus Adapters (HBAs) including parallel SCSI, SAS, Fibre Channel, FCoE, and iSCSI. These HBAs connect a wide variety of active-active, active-passive, and ALUA storage arrays that are certified for use with the VMkernel.

The primary file system that the VMkernel uses is the VMware Virtual Machine File System (VMFS). VMFS is a cluster file system designed and optimized to support large files such as virtual disks and swap files. The VMkernel also supports the storage of virtual disks on NFS file systems.

The storage I/O path provides virtual machines with access to storage devices through device emulation. This device emulation allows a virtual machine to access files on a VMFS or NFS file system as if they were SCSI devices. The VMkernel provides storage virtualization functions such as the scheduling of I/O requests from multiple virtual machines and multipathing.

In addition, VMkernel offers several Storage APIs that enable storage partners to integrate and optimize their products for vSphere.

The following graphic illustrates the basics of the VMkernel core, with special attention to the storage stack. Storage-related modules reside between the logical device I/O scheduler and the adapter I/O scheduler layers.

[Image: VMkernel storage stack diagram]

The esxcli system module namespace allows you to view, load, and enable VMkernel modules.

To get an overview use this command:

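  • esxcli system module (shows the commands and namespaces available for modules)
  • esxcli system module list (lists the VMkernel modules and whether each is loaded and enabled)
  • esxcli system module parameters list --module ModuleName (lists the parameters of the named module)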

Identify Logs used to troubleshoot storage issues

Logs

Located in /var/log

The logs you will want to look at for storage issues are likely to be

  • /var/log/vmkeventd.log

VMkernel daemon related log

  • /var/log/vmkernel.log

Generic NMP messages, iSCSI and fibre channel messages, driver, device discovery, storage and networking devices

  • /var/log/vpxa.log

vCenter Server vpxa agent logs, including communication with vCenter Server and the Host Management hostd agent

  • /var/log/hostd.log

Host management service logs, including virtual machine and host Task and Events, communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections

  • /var/log/vmkwarning.log

Generic storage messages, like disconnects. A summary of Warning and Alert log messages excerpted from the VMkernel logs.

  • /var/log/storagerm

If Storage I/O Control (SIOC) is enabled, its log entries are written here

  • vCenter logs
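Once the logs have been copied off the host, a small filter makes it easier to pull out the storage-related entries. The Python sketch below is only an illustration: the log paths and descriptions come from the list above, while the function names and the choice of keywords are assumptions for the example.

```python
# Log locations taken from the list above; descriptions are abbreviated.
STORAGE_LOGS = {
    "/var/log/vmkernel.log": "NMP, iSCSI, Fibre Channel, driver and device discovery messages",
    "/var/log/vmkwarning.log": "Warning and Alert messages excerpted from the VMkernel logs",
    "/var/log/vmkeventd.log": "VMkernel daemon related log",
    "/var/log/hostd.log": "Host management service logs",
    "/var/log/vpxa.log": "vCenter Server vpxa agent logs",
}


def matching_lines(lines, keywords):
    """Return the lines that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def scan_log(path, keywords):
    """Read one log file and return only the lines matching the keywords."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as handle:
        return matching_lines(handle, keywords)


if __name__ == "__main__":
    for log_path, description in STORAGE_LOGS.items():
        print(log_path, "-", description)
```

For example, scan_log("vmkernel.log", ["nmp", "iscsi"]) run against a local copy of the file returns only the lines that mention multipathing or iSCSI.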

Analyse troubleshooting data to see if the problem lies in the Virtual or the Physical layer

Troubleshooting

Troubleshooting can often be frustrating and challenging, and knowing where to look and what to do is the key to finding and resolving problems quickly. Do not look through log files only when you are experiencing known problems, however. Many problems are not obvious, and the log files are a good place to look for early signs of them. Keep a list of all the log files handy so that you can access them quickly and do not waste time trying to remember their paths and filenames while a problem is happening. You might not know how to resolve or troubleshoot every problem you encounter, so be sure to rely on the resources available to you, including documentation, support forums, the knowledge base, and VMware’s technical support. Being properly prepared to handle problems when they occur is one of the best troubleshooting skills that you can have.

What you can do

  • Check your monitoring systems if you have them (SCOM, Nagios, etc.). Some companies have real-time monitoring screens
  • Check with your network team, as they will more than likely be alerted to physical problems faster than you
  • Can you isolate the problem to a VM, host, switch or router, or is the issue affecting the whole network?
  • Ensure that the Port Group name(s) associated with the virtual machine’s network adapter(s) exist in your vSwitch or Virtual Distributed Switch and are spelt correctly.
  • Check for any warning triangles or exclamation marks on the standard or distributed switches
  • Verify the virtual network adapter is present and connected for all VMkernel ports
  • Verify that the networking within the virtual machine’s guest operating system is correct
  • Verify that the vSwitch has enough ports for the virtual machine
  • If the vSwitch uses the Route based on IP hash load balancing policy, ensure the physical switch ports are configured as a port-channel
  • Shut down all but one of the physical ports the NICs are connected to, and toggle this between all the ports by keeping only one port connected at a time. Take note of the port/NIC combination where the virtual machines lose network connectivity.
  • Check Logs