
Understand the NIC Teaming failover types and related physical network settings


What is Network Teaming and Load Balancing?

NIC teaming is the practice of combining multiple physical NICs behind a virtual switch to provide greater bandwidth and failover. Load balancing and failover policies let you determine how network traffic is distributed between the network adapters and how traffic is re-routed in the event of an adapter failure.

You can set NIC Teaming on the following objects:

  • Standard Switch
  • Distributed Switch
  • Standard Port Group
  • Distributed Port Group

Settings Overview for a Standard Switch

[Screenshot: port group network policies dialog]

Procedure for a Standard Switch and a Standard Port Group

  • Log in to the vSphere Client and select the host from the inventory panel.
    The hardware configuration page for the host appears.
  • Click the Configuration tab and click Networking.
  • Click Properties for the standard switch you want to edit.
  • On the Ports tab, select the standard switch or a standard port group and click Edit.
  • Click the NIC Teaming tab to edit the Load Balancing and Failover values.
  • You can override the failover order at the port group level. By default, new adapters are active for all policies and carry traffic for the standard switch and its port groups unless you specify otherwise.

Specify how to choose an uplink; a short sketch of these selection policies follows the list.

  • Route based on the originating virtual port
  1. Choose an uplink based on the virtual port where the traffic entered the virtual switch. This is the default configuration and the one most commonly deployed.
  2. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
  3. Replies are received on the same physical adapter as the physical switch learns the port association.
  4. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
  5. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it has multiple virtual adapters.
  6. This setting places slightly less load on the ESX Server host than the MAC hash setting.
  • Route based on IP hash
  1. Choose an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash.
  2. Evenness of traffic distribution depends on the number of TCP/IP sessions to unique destinations. There is no benefit for bulk transfer between a single pair of hosts.
  3. You can use link aggregation — grouping multiple physical adapters to create a fast network pipe for a single virtual adapter in a virtual machine.
  4. When you configure the system to use link aggregation, packet reflections are prevented because aggregated ports do not retransmit broadcast or multicast traffic.
  5. The physical switch sees the client MAC address on multiple ports. There is no way to predict which physical Ethernet adapter will receive inbound traffic.
  6. All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP).
  7. All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch.
  • Route based on source MAC hash
  1. Choose an uplink based on a hash of the source Ethernet MAC address.
  2. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
  3. Replies are received on the same physical adapter as the physical switch learns the port association.
  4. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
  5. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it uses multiple source MAC addresses for traffic it sends.
  • Use explicit failover order

Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.
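The three hash-based policies above differ only in what they feed to the hash. A minimal Python sketch of the idea, assuming a four-NIC team, a simple modulo hash and invented helper names (the real ESX implementation is different):

```python
import ipaddress

UPLINKS = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]  # hypothetical NIC team

def by_virtual_port_id(port_id: int) -> str:
    # Originating virtual port: the vSwitch port the VM or VMkernel interface
    # connects to picks the uplink, so all of that port's traffic stays on one
    # physical NIC until a failover occurs.
    return UPLINKS[port_id % len(UPLINKS)]

def by_ip_hash(src_ip: str, dst_ip: str) -> str:
    # IP hash: source and destination IP of each packet are hashed, so one VM
    # talking to many destinations can spread its sessions across the team.
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return UPLINKS[key % len(UPLINKS)]

def by_source_mac_hash(src_mac: str) -> str:
    # Source MAC hash: only the VM's MAC is hashed, so, like the port-ID
    # policy, each virtual adapter is pinned to a single uplink.
    return UPLINKS[int(src_mac.replace(":", ""), 16) % len(UPLINKS)]

# One VM (one port ID, one MAC) always maps to a single uplink ...
print(by_virtual_port_id(17), by_source_mac_hash("00:50:56:aa:bb:01"))
# ... while IP hash can use different uplinks for different destinations.
print(by_ip_hash("10.0.0.10", "10.0.0.50"), by_ip_hash("10.0.0.10", "10.0.0.53"))
```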

NOTE: IP hash teaming requires that the physical switch be configured with EtherChannel. For all other options, EtherChannel should be disabled.

Settings Overview for a Distributed Switch


Procedure for a Distributed Switch and a Distributed Port Group

  • Log in to the vSphere Client and switch to the Networking inventory view.
  • Select the distributed switch or distributed port group in the inventory panel.
  • Right-click the distributed port group and click Edit Settings.
  • Under Policies, select Teaming and Failover to edit the Load Balancing and Failover values.
  • You can override the failover order at the port group level. By default, new adapters are active for all policies and carry traffic for the distributed switch and its port groups unless you specify otherwise.

Specify how to choose an uplink.

  • Route based on the originating virtual port
  1. Choose an uplink based on the virtual port where the traffic entered the distributed switch. This is the default configuration and the one most commonly deployed.
  2. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
  3. Replies are received on the same physical adapter as the physical switch learns the port association.
  4. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
  5. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it has multiple virtual adapters.
  6. This setting places slightly less load on the ESX Server host than the MAC hash setting.
  • Route based on IP hash
  1. Choose an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash.
  2. Evenness of traffic distribution depends on the number of TCP/IP sessions to unique destinations. There is no benefit for bulk transfer between a single pair of hosts.
  3. You can use link aggregation — grouping multiple physical adapters to create a fast network pipe for a single virtual adapter in a virtual machine.
  4. When you configure the system to use link aggregation, packet reflections are prevented because aggregated ports do not retransmit broadcast or multicast traffic.
  5. The physical switch sees the client MAC address on multiple ports. There is no way to predict which physical Ethernet adapter will receive inbound traffic.
  6. All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP).
  7. All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch.
  • Route based on source MAC hash
  1. Choose an uplink based on a hash of the source Ethernet MAC address.
  2. When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
  3. Replies are received on the same physical adapter as the physical switch learns the port association.
  4. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
  5. A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it uses multiple source MAC addresses for traffic it sends.
  • Route based on physical NIC load
  1. Load-based teaming uses the same initial port assignment as the “Originating Port ID” policy: the first virtual NIC is affiliated with the first physical NIC, the second virtual NIC with the second physical NIC, and so on. After initial placement, load-based teaming examines both the ingress and egress load of each uplink in the team and adjusts the virtual NIC to physical NIC mapping if an uplink is congested. The NIC team load balancer flags a congestion condition if an uplink experiences a mean utilization of 75% or more over a 30-second period (a rough sketch of this check follows the list).
  • Use explicit failover order

Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.
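A rough Python sketch of the load-based teaming check described above; the 75 percent threshold and 30-second window come from the text, while the sample data, names and the single-move rebalance step are simplified assumptions:

```python
from statistics import mean

CONGESTION_THRESHOLD = 0.75   # mean utilization that flags an uplink
# utilization samples (0.0 - 1.0) gathered over the last 30 seconds,
# combining ingress and egress load; the values here are made up
samples = {
    "vmnic0": [0.81, 0.79, 0.85, 0.90],
    "vmnic1": [0.20, 0.15, 0.25, 0.18],
}
vnic_to_uplink = {"vm-web": "vmnic0", "vm-db": "vmnic0", "vm-app": "vmnic1"}

def congested(uplink: str) -> bool:
    return mean(samples[uplink]) >= CONGESTION_THRESHOLD

def rebalance() -> None:
    # Move one virtual NIC off a congested uplink onto the least-loaded one
    # (simplification: one mapping adjusted per pass).
    idle = min(samples, key=lambda u: mean(samples[u]))
    for vnic, uplink in vnic_to_uplink.items():
        if congested(uplink) and uplink != idle:
            vnic_to_uplink[vnic] = idle
            break

rebalance()
print(vnic_to_uplink)   # vm-web has been remapped from vmnic0 to vmnic1
```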

NOTE: IP hash teaming requires that the physical switch be configured with EtherChannel. For all other options, EtherChannel should be disabled.

Further Information on Route based on the originating Port ID

It’s important to understand the basic behavior of this configuration. Because the vSwitch is set to “Route based on originating virtual port ID”, network traffic will be placed onto a specific uplink and won’t use any other uplinks until that uplink fails. Every VM and every VMkernel port gets its own virtual port ID. These virtual port IDs are visible using esxtop (launch esxtop, then press “n” to switch to network statistics).

  • Each VM will only use a single network uplink, regardless of how many different connections that particular VM may be handling. All traffic to and from that VM will be placed on that single uplink, regardless of how many uplinks are configured on the vSwitch.
  • Each VMkernel NIC will only use a single network uplink. This is true both for VMotion as well as IP-based storage traffic, and is true regardless of how many uplinks are configured on the vSwitch.
  • Even when the traffic patterns are such that using multiple uplinks would be helpful—for example, when a VM is copying data to or from two different network locations at the same time, or when a VMkernel NIC is accessing two different iSCSI targets—only a single uplink will be utilized.
  • It’s unclear at what point VMware ESX creates the link between the virtual port ID and the uplink. In tests, rebooting the guest OS and power cycling the VM resulted in the VM coming back on the same uplink. Only a VMotion off the server and back again caused a VM to move to a new uplink.
  • There is no control over the placement of VMs onto uplinks without the use of multiple port groups. (Keep in mind that you can have multiple port groups corresponding to a single VLAN.)
  • Because users have no control over the placement of VMs onto uplinks, it’s quite possible for multiple VMs to end up on the same uplink, or for VMs to be distributed unevenly across the uplinks. Organizations that have multiple “network heavy hitters” on the same host and vSwitch may run into situations where those systems are all sharing the same network bandwidth.

These considerations are not insurmountable, but they are not insignificant, either. The workaround for the potential problems outlined above involves using multiple port groups with different NIC failover orders so as to have more fine-grained control over the placement of VMs on uplinks. In larger environments, however, this quickly becomes unwieldy. The Distributed vSwitch helps to ease configuration management in this sort of situation.

Useful Blogs on VMware NIC Teaming. (Thanks to Scott Lowe)

http://blog.scottlowe.org/2008/10/08/more-on-vmware-esx-nic-utilization/
http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/

VMware NIC Teaming Settings

Benefits of NIC teaming include load balancing and failover. However, these policies affect outbound traffic only. In order to influence inbound traffic, you have to get the physical switches involved.

  • Load balancing: Load balancing allows you to spread network traffic from virtual machines on a virtual switch across two or more physical Ethernet adapters, providing higher throughput. NIC teaming offers different options for load balancing, including routing based on the originating virtual switch port ID, the source MAC hash, or the IP hash.
  • Failover: You can specify either Link Status or Beacon Probing for failover detection. Link Status relies solely on the link state of the network adapter: failures such as cable pulls and physical switch power failures are detected, but configuration errors are not. Beacon Probing sends out beacon probes to detect upstream network connection failures, catching many of the failure types that link status alone misses. By default, NIC teaming applies a failback policy, whereby physical Ethernet adapters are returned to active duty immediately when they recover, displacing standby adapters; a short sketch of this selection logic follows.
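As an illustration only, here is a hedged Python sketch of failover detection and failback as described above; the adapter names, the state dictionaries and the policy object are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class TeamingPolicy:
    active: list                     # preferred uplinks, in failover order
    standby: list                    # used only while active uplinks are down
    detection: str = "link_status"   # or "beacon_probing"
    failback: bool = True            # default: recovered NICs return to duty

def is_healthy(uplink, policy, link_up, beacon_ok) -> bool:
    if policy.detection == "link_status":
        return link_up[uplink]                    # catches cable pulls, switch power loss
    return link_up[uplink] and beacon_ok[uplink]  # also catches upstream failures

def choose_uplink(policy, link_up, beacon_ok) -> str:
    # With failback enabled, the highest-order healthy active adapter is used
    # as soon as it recovers; failback=False (keeping the current uplink after
    # recovery) is not modelled in this simple sketch.
    for uplink in policy.active + policy.standby:
        if is_healthy(uplink, policy, link_up, beacon_ok):
            return uplink
    raise RuntimeError("no healthy uplink in the team")

policy = TeamingPolicy(active=["vmnic0"], standby=["vmnic1"], detection="beacon_probing")
link_up = {"vmnic0": True, "vmnic1": True}
beacon_ok = {"vmnic0": False, "vmnic1": True}    # upstream failure only beaconing sees
print(choose_uplink(policy, link_up, beacon_ok)) # -> vmnic1
```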

NIC Teaming Policies

  • Route based on the originating virtual port: Choose an uplink based on the virtual port where the traffic entered the virtual switch.
  • Route based on IP hash: Choose an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash. This is the policy to use when EtherChannel is configured on the physical switch.
  • Route based on source MAC hash: Choose an uplink based on a hash of the source Ethernet MAC address.
  • Route based on physical NIC load: Choose an uplink based on the current load of the physical NICs.
  • Use explicit failover order: Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.

There are two ways of handling NIC teaming in VMware ESX:

  1. Without any physical switch configuration
  2. With physical switch configuration (EtherChannel, static LACP/802.3ad, or its equivalent)

There is a corresponding vSwitch configuration that matches each of these types of NIC teaming:

  1. For NIC teaming without physical switch configuration, the vSwitch must be set to either “Route based on originating virtual port ID”, “Route based on source MAC hash”, or “Use explicit failover order”
  2. For NIC teaming with physical switch configuration—EtherChannel, static LACP/802.3ad, or its equivalent—the vSwitch must be set to “Route based on IP hash” (sketched briefly below).
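The pairing rule above can be captured in a few lines of Python; a minimal sketch, with the policy names written as they appear in this article and an invented helper:

```python
# Policies that require the physical switch ports to be bundled into a static
# EtherChannel / 802.3ad group (no LACP), per the note earlier in this post.
REQUIRES_ETHERCHANNEL = {
    "Route based on originating virtual port ID": False,
    "Route based on source MAC hash": False,
    "Use explicit failover order": False,
    "Route based on IP hash": True,
}

def validate(policy: str, etherchannel_configured: bool) -> str:
    if REQUIRES_ETHERCHANNEL[policy] and not etherchannel_configured:
        return "Mismatch: IP hash needs a static EtherChannel on the physical switch"
    if not REQUIRES_ETHERCHANNEL[policy] and etherchannel_configured:
        return "Mismatch: disable EtherChannel when not using IP hash"
    return "OK"

print(validate("Route based on IP hash", etherchannel_configured=False))
```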

Considerations for NIC teaming without physical switch configuration

Something to be aware of when setting up NIC teaming without physical switch configuration is that you don’t get true load balancing as you do with EtherChannel. The following applies to these NIC teaming settings:

Route based on the originating virtual switch port ID

Choose an uplink based on the virtual port where the traffic entered the virtual switch. This is the default configuration and the one most commonly deployed.
When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
Replies are received on the same physical adapter as the physical switch learns the port association.

* This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.

Route based on source MAC hash

Choose an uplink based on a hash of the source Ethernet MAC address.
When you use this setting, traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team.
Replies are received on the same physical adapter as the physical switch learns the port association.

* This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.

Choosing a network adapter for your virtual machine

When creating a virtual machine, VMware will normally offer you several choices of network adapter depending on which guest operating system you select (a small illustrative chooser follows the list below).

Network Adapter Types

  • Vlance – An emulated version of the AMD 79C970 PCnet32 LANCE NIC, an older 10 Mbps NIC with drivers available in most 32-bit guest operating systems except Windows Vista and later. A virtual machine configured with this network adapter can use its network immediately.
  • VMXNET – The VMXNET virtual network adapter has no physical counterpart. VMXNET is optimized for performance in a virtual machine. Because operating system vendors do not provide built-in drivers for this card, you must install VMware Tools to have a driver for the VMXNET network adapter available.
  • Flexible – The Flexible network adapter identifies itself as a Vlance adapter when a virtual machine boots, but initializes itself and functions as either a Vlance or a VMXNET adapter, depending on which driver initializes it. With VMware Tools installed, the VMXNET driver changes the Vlance adapter to the higher performance VMXNET adapter.
  • E1000 – An emulated version of the Intel 82545EM Gigabit Ethernet NIC. A driver for this NIC is not included with all guest operating systems. Typically Linux versions 2.4.19 and later, Windows XP Professional x64 Edition and later, and Windows Server 2003 (32-bit) and later include the E1000 driver. Note: E1000 does not support jumbo frames prior to ESX/ESXi 4.1.
  • E1000e – Emulates a newer model of Intel Gigabit NIC (the 82574) in the virtual hardware, known as the “e1000e” vNIC. E1000e is available only for hardware version 8 (and newer) virtual machines in vSphere 5 and is the default vNIC for Windows 8 and later guest operating systems. For Linux guests, e1000e is not available from the UI (e1000, flexible VMXNET, enhanced VMXNET, and VMXNET3 are available for Linux).
  • VMXNET 2 (Enhanced) – The VMXNET 2 adapter is based on the VMXNET adapter but provides some high-performance features commonly used on modern networks, such as jumbo frames and hardware offloads. This virtual network adapter is available only for some guest operating systems on ESX/ESXi 3.5 and later.
  • VMXNET 3 – The VMXNET 3 adapter is the next generation of a paravirtualized NIC designed for performance, and is not related to VMXNET or VMXNET 2. It offers all the features available in VMXNET 2, and adds several new features like multiqueue support (also known as Receive Side Scaling in Windows), IPv6 offloads, and MSI/MSI-X interrupt delivery. VMXNET 3 is supported only for virtual machines version 7 and later, with a limited set of guest operating systems:
  • 32- and 64-bit versions of Microsoft Windows XP, 7, 2003, 2003 R2, 2008, and 2008 R2
  • 32- and 64-bit versions of Red Hat Enterprise Linux 5.0 and later
  • 32- and 64-bit versions of SUSE Linux Enterprise Server 10 and later
  • 32- and 64-bit versions of Asianux 3 and later
  • 32- and 64-bit versions of Debian 4
  • 32- and 64-bit versions of Ubuntu 7.04 and later
  • 32- and 64-bit versions of Sun Solaris 10 U4 and later
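As a purely illustrative aid, the selection constraints quoted above can be sketched as a small Python lookup; the function, its inputs and the fallback choice are assumptions for the example, not a complete VMware support matrix:

```python
def recommend_vnic(guest_os: str, hw_version: int, vmware_tools: bool) -> str:
    # Encodes only the rules quoted in the list above; check the VMware
    # documentation and HCL for the real, per-release support matrix.
    if guest_os.startswith("windows-8") and hw_version >= 8:
        return "e1000e"    # default for Windows 8+ on vSphere 5, hardware version 8
    if hw_version >= 7 and vmware_tools:
        return "vmxnet3"   # paravirtualized, highest-performing option
    return "e1000"         # widely available emulated Intel 82545EM

print(recommend_vnic("windows-8", 8, vmware_tools=True))     # e1000e
print(recommend_vnic("rhel5", 7, vmware_tools=True))         # vmxnet3
print(recommend_vnic("windows-2003", 4, vmware_tools=False)) # e1000
```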

New Features

  • TSO, Jumbo Frames, TCP/IP Checksum Offload

You can enable jumbo frames on a vSphere Distributed Switch or a standard switch by changing the maximum MTU. TSO (TCP Segmentation Offload) is enabled on the VMkernel interface by default but must be enabled at the VM level; change the virtual NIC to VMXNET3 to take advantage of this feature.
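To make the offload concrete, here is a small Python sketch of the segmentation work that TSO pushes down to the NIC; the 40-byte header figure and the helper are illustrative (real MSS calculation also accounts for header options):

```python
def segments_on_wire(payload_len: int, mtu: int = 1500, headers: int = 40) -> int:
    # Number of frames a single large TCP send turns into for a given MTU.
    mss = mtu - headers            # maximum TCP payload per frame
    return -(-payload_len // mss)  # ceiling division

# Without TSO the CPU slices a 64 KB send into ~45 frames; with TSO the guest
# hands down one large buffer and the NIC (or VMkernel) does this work instead.
print(segments_on_wire(64 * 1024))             # standard 1500-byte MTU -> 45
print(segments_on_wire(64 * 1024, mtu=9000))   # jumbo frames -> 8
```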

  • MSI/MSI-X support (subject to guest operating system kernel support)

A Message Signaled Interrupt is a write from the device to a special address which causes an interrupt to be received by the CPU. The MSI capability was first specified in PCI 2.2 and was later enhanced in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X capability was also introduced with PCI 3.0. It supports more interrupts per device than MSI (up to 2048, versus 32 for MSI) and allows interrupts to be independently configured.

MSI (Message Signaled Interrupts) uses an in-band write to PCI memory space to raise an interrupt, instead of the conventional out-of-band PCI INTx pin. MSI-X is an extension to MSI that supports more vectors: MSI can support at most 32 vectors while MSI-X can support up to 2048. Using MSI can lower interrupt latency by giving every kind of interrupt its own vector and handler. When the kernel sees the message, it vectors directly to the interrupt service routine associated with that address/data. The address/data (the vector) is allocated by the system, and the driver registers a handler for that vector.
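A toy Python sketch of the per-vector dispatch idea described above; the vector numbers, the registration helper and the handlers are invented for illustration:

```python
handlers = {}                      # vector number -> interrupt service routine

def request_irq(vector: int, isr) -> None:
    handlers[vector] = isr         # the driver registers one handler per vector

def msi_write(vector: int) -> None:
    # The device "raises" the interrupt by writing its message; the kernel
    # vectors straight to the routine bound to that address/data pair.
    handlers[vector]()

# With MSI-X, each source (rx queue 0, rx queue 1, link events, ...) gets its
# own private vector instead of sharing one INTx line.
request_irq(0, lambda: print("rx queue 0 serviced"))
request_irq(1, lambda: print("rx queue 1 serviced"))
request_irq(2, lambda: print("link state change serviced"))
msi_write(1)
```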

  • Receive Side Scaling (RSS, supported in Windows 2008 when explicitly enabled)

When Receive Side Scaling (RSS) is enabled, all of the receive data processing for a particular TCP connection is shared across multiple processors or processor cores. Without RSS all of the processing is performed by a single processor, resulting in inefficient system cache utilization.

RSS is enabled on the Advanced tab of the adapter property sheet. If your adapter does not support RSS, or if your operating system does not support it, the RSS setting will not be displayed.
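A minimal sketch of the RSS idea, assuming a simple CRC32 over the TCP 4-tuple; real adapters use a Toeplitz hash and an indirection table, so this is only conceptual:

```python
import zlib

NUM_CORES = 4

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    # Every packet of one TCP connection hashes to the same queue/core, which
    # preserves per-connection ordering while spreading connections across cores.
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_CORES

print(rss_queue("10.0.0.10", "10.0.0.50", 49152, 443))
print(rss_queue("10.0.0.11", "10.0.0.50", 49153, 443))  # likely a different core
```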


  • IPv6 TCP Segmentation Offloading (TSO over IPv6)

IPv6 TCP Segmentation Offloading significantly helps to reduce transmit processing performed by the vCPUs and improves both transmit efficiency and throughput. If the uplink NIC supports TSO6, the segmentation work will be offloaded to the network hardware; otherwise, software segmentation will be conducted inside the VMkernel before passing packets to the uplink. Therefore, TSO6 can be enabled for VMXNET3 whether or not the hardware NIC supports it.

  • NAPI (supported in Linux)

The VMXNET3 driver is NAPI-compliant on Linux guests. NAPI is an interrupt mitigation mechanism that improves high-speed networking performance on Linux by switching back and forth between interrupt mode and polling mode during packet receive. It is a proven technique to improve CPU efficiency and allows the guest to process higher packet loads.

New API (also referred to as NAPI) is an interface to use interrupt mitigation techniques for networking devices in the Linux kernel. Such an approach is intended to reduce the overhead of packet receiving. The idea is to defer incoming message handling until there is a sufficient amount of them so that it is worth handling them all at once.

A straightforward method of implementing a network driver is to interrupt the kernel by issuing an interrupt request (IRQ) for each and every incoming packet. However, servicing IRQs is costly in terms of processor resources and time. Therefore the straightforward implementation can be very inefficient in high-speed networks, constantly interrupting the kernel with the thousands of packets per second. Overall performance of the system as well as network throughput can suffer as a result.

Polling is an alternative to interrupt-based processing. The kernel can periodically check for the arrival of incoming network packets without being interrupted, which eliminates the overhead of interrupt processing. Establishing an optimal polling frequency is important, however. Too frequent polling wastes CPU resources by repeatedly checking for incoming packets that have not yet arrived. On the other hand, polling too infrequently introduces latency by reducing system reactivity to incoming packets, and it may result in the loss of packets if the incoming packet buffer fills up before being processed.

As a compromise, the Linux kernel uses the interrupt-driven mode by default and only switches to polling mode when the flow of incoming packets exceeds a certain threshold, known as the “weight” of the network interface.
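Roughly, the switch between interrupt mode and polling mode works as in the Python sketch below; the queue, the fixed weight and the driver hooks are simplified stand-ins for the real Linux NAPI API:

```python
from collections import deque

WEIGHT = 64                                  # packet budget per poll pass
rx_queue = deque(f"pkt{i}" for i in range(200))
interrupts_enabled = True

def irq_handler():
    # First packet arrives via interrupt: mask further interrupts and switch
    # to polling so the rest of the burst is handled without an IRQ per packet.
    global interrupts_enabled
    interrupts_enabled = False
    napi_poll()

def napi_poll():
    global interrupts_enabled
    while rx_queue:
        for _ in range(min(WEIGHT, len(rx_queue))):
            rx_queue.popleft()               # process up to WEIGHT packets
        # a real driver yields back to the kernel here between passes
    interrupts_enabled = True                # queue drained: back to interrupts

irq_handler()
print(len(rx_queue), interrupts_enabled)     # 0 True
```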

  • LRO (supported in Linux, VM‐VM only)

VMXNET3 also supports Large Receive Offload (LRO) on Linux guests. However, in ESX 4.0 the VMkernel backend supports large receive packets only if the packets originate from another virtual machine running on the same host.
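Conceptually, LRO coalesces back-to-back segments of the same flow into one large packet before handing it up the guest's stack, cutting per-packet overhead; a hedged Python sketch with an invented flow key and aggregation limit:

```python
from collections import defaultdict

MAX_LRO_BYTES = 64 * 1024            # illustrative aggregation limit

def coalesce(segments):
    """segments: list of (flow_key, payload_bytes) in arrival order."""
    merged, pending = [], defaultdict(int)
    for flow, size in segments:
        if pending[flow] + size > MAX_LRO_BYTES:
            merged.append((flow, pending[flow]))   # flush the aggregated packet
            pending[flow] = 0
        pending[flow] += size
    merged.extend((flow, size) for flow, size in pending.items() if size)
    return merged                      # fewer, larger packets reach the guest

segments = [("vm-a->vm-b", 1460)] * 10 + [("vm-a->vm-c", 1460)] * 3
print(coalesce(segments))  # [('vm-a->vm-b', 14600), ('vm-a->vm-c', 4380)]
```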