Archive for VMware

Network Partition on vSAN Cluster

The problem

We had an interesting problem with a 6 host vSAN cluster where 1 host appeared to be in a network partition according to Skyline Health. I thought it would be useful to document our troubleshooting steps as they may come in handy. Our problem wasn't one of the usual network misconfigurations, but in order to reach that conclusion we needed to perform the usual tests.

We had removed this host from the vSAN cluster and the HA cluster, removed it from the inventory and rebuilt it, then tried adding it back into the vSAN cluster with the other 5 hosts. It let us add the host to the current vSAN Sub-cluster UUID, but the host then partitioned itself from the other 5 hosts.

Usual restart of hostd, vpxa, clomd and vsanmgmtd did not help.

Test 1 – Check each host’s vSAN details

Running the command below will give you a lot of information about the problem host, in our case techlabesx01

esxcli vsan cluster get

  • Enabled: True
  • Current Local Time: 2021-07-16T14:49:44Z
  • Local Node UUID: 70ed98a3-56b4-4g90-607c4cc0e809
  • Local Node Type: NORMAL
  • Local Node State: MASTER
  • Local Node Health State: HEALTHY
  • Sub-Cluster Master UUID: 70ed45ac-bcb1-c165-98404b745103
  • Sub-Cluster Backup UUID: 70av34ab-cd31-dc45-8734b32d104
  • Sub-Cluster UUID: 527645bc-789d-745e-577a68ba6d27
  • Sub-Cluster Member Entry Revision: 1
  • Sub-Cluster Member Count: 1
  • Sub-Cluster Member UUIDs: 98ab45c4-9640-8ed0-1034-503b4d875604
  • Sub-Cluster Member Hostnames: techlabesx01
  • Unicast Mode Enabled: true
  • Maintenance Mode State: OFF
  • Config Generation: b3289723-3bed-4df5-b34f-a5ed43b4542b43 2021-07-13T12:57:06.153

Straightaway we can see it is partitioned, as the Sub-Cluster Member UUIDs should contain the other 5 hosts' UUIDs and the Sub-Cluster Member Hostnames should list techlabesxi2, techlabesxi3, techlabesxi4, techlabesxi5 and techlabesxi6. It has also made itself a MASTER, whereas we already have a master in the other partition of the vSAN cluster, and there can't be two masters in a cluster.

Master role:

  • A cluster should have only one host with the Master role. More than a single host with the Master role indicates a problem
  • The host with the Master role receives all CMMDS updates from all hosts in the cluster

Backup role:

  • The host with the Backup role assumes the Master role if the current Master fails
  • Normally, only one host has the Backup role

Agent role:

  • Hosts with the Agent role are members of the cluster
  • Hosts with the Agent role can assume the Backup role or the Master role as circumstances change
  • In clusters of four or more hosts, more than one host has the Agent role

Test 2 – Can each host ping the other one?

A lot of problems can be caused by misconfiguration of the vSAN VMkernel and/or other VMkernel ports; however, this was not our issue. It is worth double checking everything though. The IP addresses of the corresponding VMkernel ports on each host must be in the same subnet.

Get the networking details from each host by using the command below. This will give you the full VMkernel networking details including the IP address, subnet mask, gateway and broadcast address.

esxcli network ip interface ipv4 address list
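It can also be worth confirming which VMkernel interface is actually tagged for vSAN traffic on each host, so you know you are testing the right vmknic. The command below should list the vSAN-enabled interface(s).

esxcli vsan network list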

It may be necessary to test VMkernel network connectivity between ESXi hosts in your environment. From the problem host, we tried pinging the other hosts management network.

vmkping -I vmkX x.x.x.x

Where x.x.x.x is the IP address of the host that you want to ping and vmkX is the VMkernel interface to ping out of.
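If jumbo frames are configured on the vSAN VMkernel ports, it is also worth repeating the test with a large packet and fragmentation disabled. A hedged example is shown below; vmk1 and the address are placeholders for your vSAN VMkernel interface and a remote host's vSAN IP.

vmkping -I vmk1 -d -s 8972 192.168.1.11

The -d flag prevents fragmentation and -s 8972 allows for the ICMP/IP header overhead on a 9000 byte MTU.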

This was all successful

Test 3 – Check the unicast agent list and check the NodeUUIDs on each host

To check each host's node UUID, you can run

esxcli vsan cluster unicastagent list
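As a hedged example of what to look for (192.168.1.10 is the problem host's vSAN IP used later in the fix), you can run the command on each of the healthy hosts and check whether an entry for the rebuilt host exists and whether its UUID matches the Local Node UUID shown in Test 1:

esxcli vsan cluster unicastagent list | grep 192.168.1.10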

Conclusion

We think what happened was that the non-partitioned hosts had a reference to an old UUID for techlabesx01 because we had rebuilt the host. The host was removed from the vSAN cluster and the HA cluster and completely rebuilt. However, when we originally removed this host, the other hosts did not seem to update their member lists once it had gone, so when we tried to add it back in, the other hosts didn't recognise it.

The Fix

What we had to do was set the advanced option below on each host so that it ignores cluster member list updates

esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

Then remove the old unicastagent information from each host for techlabesx01

esxcli vsan cluster unicastagent remove -a 192.168.1.10

Then add the new unicastagent details to each host. You can see where to get the host UUID in Test 1

esxcli vsan cluster unicastagent add -t node -u 70ed98a3-56b4-4g90-607c4cc0e809 -U true -a 192.168.1.10 -p 12321

This resolved the issue
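Once the host had rejoined and the Sub-Cluster Member Count showed all 6 hosts, the advanced option can be set back to its default so the hosts accept cluster member list updates again. This is a hedged follow-up step; check the relevant VMware KB for your build before applying it.

esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates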

Comparing VM Encryption performance between ESXi 6.7U3 + vSAN and ESXi 7.0U2 + vSAN

This blog is similar to another I wrote which compared VM Encryption and vSAN encryption on ESXi 6.7U3. This time, I’m comparing VM Encryption performance on ESXi 6.7U3 and ESXi 7.0U2 running on vSAN.

What is the problem which needs to be solved?

I have posted this section before on the previous blog; however, it is important to understand the effect an extra layer of encryption has on the performance of your systems. It has become a requirement (sometimes mandatory) for companies to enable protection of both personally identifiable information and data, including protecting other communications within and across environments. The EU General Data Protection Regulation (GDPR) is now a legal requirement for global companies to protect the personally identifiable information of all European Union residents. In the last year, the United Kingdom has left the EU; however, the General Data Protection Regulation will still be important to implement. “The Payment Card Industry Data Security Standards (PCI DSS) requires encrypted card numbers. The Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Acts (HIPAA/HITECH) require encryption of Electronic Protected Health Information (ePHI).” (Townsendsecurity, 2019) Little is known about the effect encryption has on the performance of different data held on virtual infrastructure. VM encryption and vSAN encryption are the two data protection options I will evaluate for a better understanding of the functionality and performance effect on software defined storage.

It may be important to understand encryption functionality in order to match business and legal requirements. Certain regulations may need to be met which only specific encryption solutions can provide. Additionally, encryption adds a layer of functionality which is known to have an effect on system performance. With systems which scale into thousands, it is critical to understand what effect encryption will have on functionality and performance in large environments. It will also help when purchasing hardware which has been designed for specific environments to allow some headroom in the specification for the overhead of encryption

Testing Components

Test lab hardware (8 Servers)

HCIBench Test VMs

80 HCIBench Test VMs will be used for this test. I have placed 10 VMs on each of the 8 Dell R640 servers to provide a balanced configuration. No virtual machines other than the HCIBench test VMs will be run on this system to avoid interference with the testing.

The HCIBench appliance is running vdBench, not Fio

The specification of the 80 HCIBench Test VMs is as follows.

RAID Configuration

VM encryption will be tested on RAID1 and RAID6 vSAN storage

VM encryption RAID1 storage policy

vCenter Storage Policy settings:

  • Name = raid1_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 2 (RAID 1)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 2
  • Encryption enabled = Yes
  • Deduplication and Compression enabled = No

VM encryption RAID6 storage policy

vCenter Storage Policy settings:

  • Name = raid6_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 2 (RAID 6)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 1
  • Encryption enabled = Yes
  • Deduplication and Compression enabled = No

HCIBench Test Parameters

The test will run through various types of read/write workload at the different block sizes to replicate different types of applications using 1 and 2 threads.

  • 0% Read 100% Write
  • 20% Read 80% Write
  • 70% Read 30% Write

The block sizes used are

  • 4k
  • 16k
  • 64k
  • 128k

The test plan below, containing 24 tests, will be run for VM Encryption on 6.7U3 and again for VM Encryption on 7.0U2. These are all parameter files which are uploaded into HCIBench and can then run sequentially without intervention throughout the test. I think I left these running for 3 days! The cache is refreshed between tests.


Test | Number of disks | Working Set % | Number of threads | Block size | Read % | Write % | Random % | Test time (s)
1 | 2 (O/S and Data) | 100% | 1 | 4k | 0 | 100 | 100 | 7200
2 | 2 (O/S and Data) | 100% | 2 | 4k | 0 | 100 | 100 | 7200
3 | 2 (O/S and Data) | 100% | 1 | 4k | 20 | 80 | 100 | 7200
4 | 2 (O/S and Data) | 100% | 2 | 4k | 20 | 80 | 100 | 7200
5 | 2 (O/S and Data) | 100% | 1 | 4k | 70 | 30 | 100 | 7200
6 | 2 (O/S and Data) | 100% | 2 | 4k | 70 | 30 | 100 | 7200
7 | 2 (O/S and Data) | 100% | 1 | 16k | 0 | 100 | 100 | 7200
8 | 2 (O/S and Data) | 100% | 2 | 16k | 0 | 100 | 100 | 7200
9 | 2 (O/S and Data) | 100% | 1 | 16k | 20 | 80 | 100 | 7200
10 | 2 (O/S and Data) | 100% | 2 | 16k | 20 | 80 | 100 | 7200
11 | 2 (O/S and Data) | 100% | 1 | 16k | 70 | 30 | 100 | 7200
12 | 2 (O/S and Data) | 100% | 2 | 16k | 70 | 30 | 100 | 7200
13 | 2 (O/S and Data) | 100% | 1 | 64k | 0 | 100 | 100 | 7200
14 | 2 (O/S and Data) | 100% | 2 | 64k | 0 | 100 | 100 | 7200
15 | 2 (O/S and Data) | 100% | 1 | 64k | 20 | 80 | 100 | 7200
16 | 2 (O/S and Data) | 100% | 2 | 64k | 20 | 80 | 100 | 7200
17 | 2 (O/S and Data) | 100% | 1 | 64k | 70 | 30 | 100 | 7200
18 | 2 (O/S and Data) | 100% | 2 | 64k | 70 | 30 | 100 | 7200
19 | 2 (O/S and Data) | 100% | 1 | 128k | 0 | 100 | 100 | 7200
20 | 2 (O/S and Data) | 100% | 2 | 128k | 0 | 100 | 100 | 7200
21 | 2 (O/S and Data) | 100% | 1 | 128k | 20 | 80 | 100 | 7200
22 | 2 (O/S and Data) | 100% | 2 | 128k | 20 | 80 | 100 | 7200
23 | 2 (O/S and Data) | 100% | 1 | 128k | 70 | 30 | 100 | 7200
24 | 2 (O/S and Data) | 100% | 2 | 128k | 70 | 30 | 100 | 7200

HCIBench Performance Metrics

These metrics will be measured across all tests

Workload Parameter | Explanation | Value
IOPs | IOPS measures the number of read and write operations per second | Input/Outputs per second
Throughput | Throughput measures the amount of data read or written per second (Average IO size x IOPS = Throughput in MB/s) | MB/s
Read Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a data read, latency is the time it takes for the data to come back | ms
Write Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a write, latency is the time for the write acknowledgement to return | ms
Latency Standard Deviation | Standard deviation is a measure of the amount of variation within a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range | Values are compared to the average latency
Average ESXi CPU usage | Average ESXi host CPU usage | %
Average vSAN CPU usage | Average CPU use for vSAN traffic only | %

Results

IOPs

IOPS measures the number of read and write operations per second. The pattern for the 3 different tests is consistent: the heavier write tests show the lowest IOPs, gradually increasing as the proportion of writes decreases.

IOPS and block size tend to have an inverse relationship. As the block size increases, it takes longer to read a single block, so the number of IOPS decreases; conversely, smaller block sizes yield higher IOPS.

With RAID1 VM Encryption, 7.0U2 performs better than 6.7U3 at the lower block sizes, 4k and 16k, but as we get into the larger 64k and 128k blocks there is less of a difference, with 6.7U3 having a slight edge in IOPs performance.

With RAID6 VM Encryption, 7.0U2 has consistently higher IOPS across all tests than 6.7U3.

RAID6 VM Encryption produces fewer IOPs than RAID1 VM Encryption, which is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror. A single RAID6 write operation results in 3 reads and 3 writes (due to double parity): each write requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity.

RAID 1 VM Encryption

The graph below shows the comparison of IOPs between 6.7U3 and 7.0U2 with RAID 1 VM Encryption

Click the graph for an enlarged view

RAID 6 VM Encryption

The graph below shows the comparison of IOPs between 6.7U3 and 7.0U2 with RAID6 VM Encryption

Click the graph for an enlarged view

Throughput

IOPs and throughput are closely related by the following equation.

Throughput (MB/s) = IOPS * Block size
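As a quick worked example (illustrative numbers only, not taken from the test results):

10,000 IOPS x 4KB = ~40 MB/s
2,000 IOPS x 128KB = ~256 MB/s

This is why the larger block sizes dominate the throughput graphs even though they produce fewer IOPS.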

IOPS measures the number of read and write operations per second, while throughput measures the amount of data read or written per second. The higher the throughput, the more data can be transferred. The graphs follow a consistent pattern from the heavier to the lighter workload tests. I can see the larger block sizes such as 64K and 128K have greater throughput in each of the workload tests than 4K or 16K. As the block sizes get larger in a workload, the number of IOPS will decrease. Even though it's fewer IOPS, you're getting more data throughput because the block sizes are bigger. The vSAN datastore is a native 4K system. It's important to remember that storage systems may be optimized for different block sizes. It is often the operating system and applications which set the block sizes which then run on the underlying storage. It is important to test different block sizes on storage systems to see the effect these have.

With RAID1 VM Encryption at the lower block sizes, 4k and 16k, 7.0U2 performs better with greater throughput. At the higher block sizes, 64k and 128k, there is less of a difference, with 6.7U3 performing slightly better, but the difference is minimal.

With RAID6 VM Encryption, 7.0U2 generally shows higher throughput than 6.7U3 at the lower block sizes but not at the higher block sizes.

RAID1 VM Encryption

The graph below shows the comparison of throughput between 6.7U3 and 7.0U2 with RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

The graph below shows the comparison of throughput between 6.7U3 and 7.0U2 with RAID6 VM Encryption

Click the graph for an enlarged view

Average Latency

With RAID1 VM Encryption at the lower block sizes, 4k and 16k, 7.0U2 shows less latency, but at the higher block sizes it shows slightly more latency than 6.7U3.

With RAID6 VM Encryption, the 7.0U2 tests perform better, showing less latency than the 6.7U3 tests.

RAID1 VM Encryption

The graph below shows the comparison of average latency between 6.7U3 and 7.0U2 with RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

The graph below shows the comparison of average latency between 6.7U3 and 7.0U2 with RAID6 VM Encryption

Click the graph for an enlarged view

Read Latency

The pattern is consistent between the read/write workloads. As the workload decreases, read latency decreases, although the figures are generally quite close. Read latency for all tests varies between 0.30 and 1.40ms, which is under the generally recommended limit of 15-20ms before latency starts to cause performance problems.

RAID1 VM Encryption shows lower read latency for the 7.0U2 tests than 6.7U3. There are outlier values for the Read Latency across the 4K and 16K block size when testing 2 threads which may be something to note if applications will be used at these block sizes.

RAID6 shows a slightly better read latency result than RAID1. RAID6 has more disks to read from than the mirrored RAID1 set, therefore the reads are very fast, which is reflected in the results. Faster reads result in lower latency. Overall, 7.0U2 performs better than 6.7U3 apart from one value at the 128k block size with 2 threads, which may be an outlier.

RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

Click the graph for an enlarged view

Write Latency

The lowest write latency is 0.72ms and the largest is 9.56ms. Up to 20ms is the recommended value from VMware; however, with all-flash arrays these values are expected and well within the limits. With NVMe and flash disks, the faster hardware may expose bottlenecks elsewhere in the hardware stack and architecture, which can be compared with internal VMware host layer monitoring. Write latency can occur at several virtualization layers and filters, each of which causes its own latency. The layers can be seen below.


Latency can be caused by limits on the storage controller, queuing at the VMkernel layer, the disk IOPS limit being reached and the types of workloads being run possibly alongside other types of workloads which cause more processing.

With RAID1 VM Encryption, 7.0U2 performed better at the lower block sizes with less write latency than 6.7U3. However, at the higher block sizes, 64k and 128k, 6.7U3 performs slightly better, but we are talking 1-2ms.

With RAID6 VM Encryption, 7.0U2 performed well with less latency across all tests than 6.7U3.

As expected, all the RAID6 results incurred more write latency than the RAID1 results. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency

RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

Click the graph for an enlarged view

Latency Standard Deviation

The standard deviation value in the testing results uses a 95th percentile. This is explained below with examples.

  • An average latency of 2ms and a 95th percentile of 6ms means that 95% of the IO were serviced under 6ms, and that would be a good result
  • An average latency of 2ms and a 95th percentile latency of 200ms means 95% of the IO were serviced under 200ms (keeping in mind that some will be higher than 200ms). This means that latencies are unpredictable and some may take a long time to complete. An operation could take less than 2ms, but every once in a while, it could take well over 200ms
  • Assuming a good average latency, it is typical to see the 95th percentile latency no more than 3 times the average latency.

With RAID1 VM Encryption, 7.0U2 performed better at the lower block sizes with less latency standard deviation than 6.7U3. However, at the higher block sizes, 64k and 128k, 6.7U3 performs slightly better.

With RAID 6 VM Encryption, 7.0U2 performed with less standard deviation across all the tests.

RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

Click the graph for an enlarged view

ESXi CPU Usage %

With RAID1 VM Encryption, at the lower block sizes, 4k and 16k, 7.0U2 uses more CPU, but at the higher block sizes it uses slightly less CPU.

With RAID6 VM Encryption, there is an increase in CPU usage across all 7.0U2 tests compared to 6.7U3. RAID6 has a higher computational penalty than RAID1.

RAID1 VM Encryption

Click the graph for an enlarged view

RAID6 VM Encryption

Click the graph for an enlarged view

Conclusion

The performance tests were designed to get an overall view from a low workload test of 30% Write, 70% Read through a series of increasing workload tests of 80% Write, 20% Read and 100% Write, 0% Read simulation. These tests used different block sizes to simulate different application block sizes. Testing was carried out on an all flash RAID1 and RAID6 vSAN datastore to compare the performance for VM encryption between ESXi 6.7U3 and 7.0U2. The environment was set up to vendor best practice across vSphere ESXi, vSAN, vCenter and the Dell server configuration.

RAID1 VM Encryption

  • With 6.7U3, IOPs at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
  • With 6.7U3, throughput at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
  • Overall latency for 6.7U3 at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
  • Read latency for 6.7U3 is higher than 7.0U2.
  • With 6.7U3, write latency at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
  • There is more standard deviation for 6.7U3 than 7.0U2.
  • At the lower block sizes, 6.7U3 uses less CPU on the whole, but at the higher block sizes 7.0U2 uses less CPU.

RAID6 VM Encryption

  • There are higher IOPs for 7.0U2 than 6.7U3 across all tests.
  • There is generally a higher throughput for 7.0U2 at the lower block sizes, than 6.7U3 but not at the higher block sizes. However, the difference is minimal.
  • There is lower overall latency for 7.0U2 than 6.7U3 across all tests
  • There is lower read latency for 7.0U2 than 6.7U3 across all tests
  • There is lower write latency for 7.0U2 than 6.7U3 across all tests
  • There is less standard deviation for 7.0U2 than 6.7U3 across all tests
  • There is a higher CPU % usage for 7.0U2 than 6.7U3 across all tests

With newer processors, AES improvements, memory improvements, RDMA NICs and storage controller driver improvements, we may see further performance improvements in new server models.

Using PowerCLI Image Builder CLI to build a new ESXi 7.0U1c image

What do we need to build a custom image?

  • An ESXi image depot zip (downloaded from myvmware.com)
  • VMware PowerCLI and the ESXi Image Builder module

For more information on setting this up, see this blog. Thanks to Michelle Laverick.

  • Other software depots

The vSphere ESXi depot is the main software depot you will need, but there are other depots provided by vendors who create collections of VIBs specially packaged for distribution. Depots can be online or offline. An online software depot is accessed remotely using the HTTP protocol. An offline software depot is downloaded and accessed locally. These depots have the vendor-specific VIBs that you will need to combine with the vSphere ESXi depot in order to create your custom installation image. An example could be HP’s depot on this link

What are VIBs?

VIB actually stands for vSphere Installation Bundle. It is basically a collection of files packaged into a single archive to facilitate distribution. It is composed of 3 parts

  • A file archive (The files which will be installed on the host)
  • An xml descriptor file (Describes the contents of the VIB. It contains the requirements for installing the VIB and identifies who created the VIB and the amount of testing that’s been done including any dependencies, any compatibility issues, and whether the VIB can be installed without rebooting.)
  • A signature file (Verifies the acceptance level of the VIB) There are 4 acceptance levels. See next paragraph

Acceptance levels

Each VIB is released with an acceptance level that cannot be changed. The host acceptance level determines which VIBs can be installed to a host.

VMwareCertified

The VMwareCertified acceptance level has the most stringent requirements. VIBs with this level go through thorough testing fully equivalent to VMware in-house Quality Assurance testing for the same technology. Today, only I/O Vendor Program (IOVP) drivers are published at this level. VMware takes support calls for VIBs with this acceptance level.

VMwareAccepted

VIBs with this acceptance level go through verification testing, but the tests do not fully test every function of the software. The partner runs the tests and VMware verifies the result. Today, CIM providers and PSA plug-ins are among the VIBs published at this level. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.

PartnerSupported

VIBs with the PartnerSupported acceptance level are published by a partner that VMware trusts. The partner performs all testing. VMware does not verify the results. This level is used for a new or nonmainstream technology that partners want to enable for VMware systems. Today, driver VIB technologies such as Infiniband, ATAoE, and SSD are at this level with nonstandard hardware drivers. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.

CommunitySupported

The CommunitySupported acceptance level is for VIBs created by individuals or companies outside of VMware partner programs. VIBs at this level have not gone through any VMware-approved testing program and are not supported by VMware Technical Support or by a VMware partner.

Steps to create a custom ESXi image

  1. I have an ESXi 7.0U1c software depot zip file and I am going to use an Intel VIB which I will add into the custom image

2. Open PowerCLI and connect to your vCenter

Connect-VIServer <vCenterServer>

3. Next I add my vSphere ESXi and Intel software depot zips

Add-EsxSoftwareDepot c:\Users\rhian\Downloads\VMware-ESXi-7.0U1c-17325551-depot.zip

Add-EsxSoftwareDepot c:\Users\rhian\Downloads\intel-nvme-vmd-en_2.5.0.1066-1OEM.700.1.0.15843807_17238162.zip

4. If you want to check what packages are available once the software depots have been added, run

Get-EsxSoftwarePackage

5. Next we can check what image profiles are available. We are going to clone one of these profiles

Get-EsxImageProfile

6. There are two ways to create a new image profile: you can create an empty image profile and manually specify the VIBs you want to add, or you can clone an existing image profile and use that. I have cloned an existing image profile

New-EsxImageProfile -CloneProfile ESXi-7.0U1c-17325551-standard -name esxi701c-imageprofile -vendor vmware -AcceptanceLevel PartnerSupported

If I do a Get-EsxImageProfile now, I can see the new image profile I created

7. Next, I’ll use Add-EsxSoftwarePackage to add VIBs to the image profile (Remove-EsxSoftwarePackage removes them). First of all I’ll check my extra Intel package to get the package name, then I will add the software package

Get-EsxSoftwarePackage | where {$_.Vendor -eq "INT"}

Add-EsxSoftwarePackage -ImageProfile esxi701c-imageprofile -SoftwarePackage intel-nvme-vmd -Force
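As a quick sanity check, you can list the packages now contained in the cloned profile and confirm the Intel VIB is present. This is a hedged example; VibList is the package list property on the image profile object returned by the ImageBuilder module.

(Get-EsxImageProfile -Name esxi701c-imageprofile).VibList | Where-Object {$_.Name -like "*nvme*"}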

8. We now have the option to export the profile as a zip or an iso.

Export-EsxImageProfile -ImageProfile esxi701c-imageprofile -FilePath c:\Users\rhian\Downloads\esxi701c-imageprofile.zip -ExportToBundle -Force -NoSignatureCheck

Export-EsxImageProfile -ImageProfile esxi701c-imageprofile -FilePath c:\Users\rhian\Downloads\esxi701c-imageprofile.iso -ExportToIso -Force -NoSignatureCheck
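As a side note, the exported offline bundle can also be used to patch an existing host directly with esxcli. A hedged example, assuming the zip has been copied to a datastore path of your choosing:

esxcli software profile update -d /vmfs/volumes/datastore1/esxi701c-imageprofile.zip -p esxi701c-imageprofile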

9. Just as a note, if you need to change the acceptance level, you can do so by running the following command before creating the ISO or zip. The example below shows changing the image profile to the PartnerSupported acceptance level.

Set-EsxImageProfile -AcceptanceLevel PartnerSupported -ImageProfile esxi701c-imageprofile

Useful tip

Typing history in PowerCLI will show you all the commands you have typed. Very handy to check mistakes or save the commands for future use.
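For example, to keep a copy of the session's commands for later reuse (a hedged sketch using standard PowerShell history cmdlets; the output path is just an example):

Get-History | Select-Object -ExpandProperty CommandLine | Out-File c:\Users\rhian\Downloads\imagebuilder-history.txt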

Comparing the functionality and performance of VM encryption and vSAN encryption on RAID1 and RAID6 vSAN storage

What is the problem which needs to be solved?

It has become a requirement for companies to enable protection of both personally identifiable information and data, including protecting other communications within and across environments. The EU General Data Protection Regulation (GDPR) is now a legal requirement for global companies to protect the personally identifiable information of all European Union residents. In the last year, the United Kingdom has left the EU; however, the General Data Protection Regulation will still be important to implement. “The Payment Card Industry Data Security Standards (PCI DSS) requires encrypted card numbers. The Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Acts (HIPAA/HITECH) require encryption of Electronic Protected Health Information (ePHI).” (Townsendsecurity, 2019) Little is known about the effect encryption has on the performance of different data held on virtual infrastructure. VM encryption and vSAN encryption are the two data protection options I will evaluate for a better understanding of the functionality and performance effect on software defined storage.

It may be important to understand encryption functionality in order to match business and legal requirements. Certain regulations may need to be met which only specific encryption solutions can provide. Additionally, encryption adds a layer of functionality which is known to have an effect on system performance. With systems which scale into thousands, it is critical to understand what effect encryption will have on functionality and performance in large environments. It will also help when purchasing hardware which has been designed for specific environments to allow some headroom in the specification for the overhead of encryption.

What will be used to test

Key IT Aspects | Description
VMware vSphere ESXi servers | 8 x Dell R640 ESXi servers run the virtual lab environment and the software defined storage.
HCIBench test machines | 80 x Linux Photon 1.0 virtual machines.
vSAN storage | Virtual datastore combining all 8 ESXi server local NVMe disks. The datastore uses RAID (Redundant Array of Inexpensive Disks), a technique combining multiple disks together for data redundancy and performance.
Key Encryption Management Servers | Clustered and load balanced Thales key management servers for encryption key management.
Encryption Software | VM encryption and vSAN encryption
Benchmarking software | HCIBench v2.3.5 and Oracle Vdbench

Test lab hardware

8 servers

Architecture | Details
Server Model | Dell R640 1U rackmount
CPU Model | Intel Xeon Gold 6148
CPU count | 2
Core count | 20 per CPU
Processor AES-NI | Enabled in the BIOS
RAM | 768GB (12 x 64GB LRDIMM)
NIC | Mellanox ConnectX-4 Lx Dual Port 25GbE rNDC
O/S Disk | 1 x 240GB Solid State SATADOM
vSAN Data Disk | 3 x 4TB U2 Intel P4510 NVMe
vSAN Cache Disk | 1 x 350GB Intel Optane P4800X NVMe
Physical switch | Cisco Nexus N9K-C93180YC-EX
Physical switch ports | 48 x 25GbE and 4 x 40GbE
Virtual switch type | VMware Virtual Distributed Switch
Virtual switch port types | Elastic

HCIBench Test VMs

80 HCIBench Test VMs will be used for this test. I have placed 10 VMs on each of the 8 Dell R640 servers to provide a balanced configuration. No virtual machines other than the HCIBench test VMs will be run on this system to avoid interference with the testing.

The specification of the 80 HCIBench Test VMs is as follows.

Resources | Details
CPU | 4
RAM | 8GB
O/S VMDK primary disk | 16GB
Data VMDK disk | 20GB
Network | 25Gb/s

HCIBench Performance Metrics

Workload Parameter | Explanation | Value
IOPs | IOPS measures the number of read and write operations per second | Input/Outputs per second
Throughput | Throughput measures the amount of data read or written per second (Average IO size x IOPS = Throughput in MB/s) | MB/s
Read Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a data read, latency is the time it takes for the data to come back | ms
Write Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a write, latency is the time for the write acknowledgement to return | ms
Latency Standard Deviation | Standard deviation is a measure of the amount of variation within a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range | Values are compared to the average latency
Average ESXi CPU usage | Average ESXi host CPU usage | %
Average vSAN CPU usage | Average CPU use for vSAN traffic only | %

HCIBench Test Parameter Options

The HCIBench performance options allow you to set the block size and the types of read/write ratios. In these tests, I will be using the following block sizes to give a representation of the different types of applications you can see on corporate systems

  • 4k
  • 16k
  • 64k
  • 128k

In these tests I will be using the following Read/Write ratios to also give a representation of the different types of applications you can see on corporate systems

  • 0% Read 100% Write
  • 20% Read 80% Write
  • 70% Read 30% Write

RAID Configuration

  • VM encryption will be tested on RAID1 and RAID6 vSAN storage
  • vSAN encryption will be tested on RAID1 and RAID6 vSAN storage

Note: vSAN encryption is not configured in the storage policy at all, as it is turned on at the datastore level, but we still need generic RAID1 and RAID6 storage policies.

VM encryption RAID1 storage policy

vCenter Storage Policy settings:

  • Name = raid1_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 1 (RAID 1)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 1
  • Encryption enabled = Yes
  • Deduplication and Compression enabled = No

VM encryption RAID6 storage policy

vCenter Storage Policy settings:

  • Name = raid6_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 2 (RAID 6)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 1
  • Encryption enabled = Yes
  • Deduplication and Compression enabled = No

vSAN encryption RAID1 storage policy

vCenter Storage Policy settings:

  • Name = raid1_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 1 (RAID 1)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 1
  • Deduplication and Compression enabled = No

vSAN encryption RAID6 storage policy

vCenter Storage Policy settings:

  • Name = raid6_vsan_policy
  • Storage Type = vSAN
  • Failures to tolerate = 2 (RAID 6)
  • Thin provisioned = Yes
  • Number of disk stripes per object = 1
  • Deduplication and Compression enabled = No

Test Plans

The table below shows one individual test plan I have created. This plan is repeated for each of the test configurations listed below.

  1. RAID1 Baseline
  2. RAID1 VM Encryption
  3. RAID1 vSAN Encryption
  4. RAID6 Baseline
  5. RAID6 VM Encryption
  6. RAID6 vSAN Encryption

The tests were run for 3 hours each including a warm up and warm down period.

Test | Number of disks | Working Set % | Number of threads | Block size | Read % | Write % | Random % | Test time (s)
1 | 2 (O/S and Data) | 100% | 1 | 4k | 0 | 100 | 100 | 7200
2 | 2 (O/S and Data) | 100% | 2 | 4k | 0 | 100 | 100 | 7200
3 | 2 (O/S and Data) | 100% | 1 | 4k | 20 | 80 | 100 | 7200
4 | 2 (O/S and Data) | 100% | 2 | 4k | 20 | 80 | 100 | 7200
5 | 2 (O/S and Data) | 100% | 1 | 4k | 70 | 30 | 100 | 7200
6 | 2 (O/S and Data) | 100% | 2 | 4k | 70 | 30 | 100 | 7200
7 | 2 (O/S and Data) | 100% | 1 | 16k | 0 | 100 | 100 | 7200
8 | 2 (O/S and Data) | 100% | 2 | 16k | 0 | 100 | 100 | 7200
9 | 2 (O/S and Data) | 100% | 1 | 16k | 20 | 80 | 100 | 7200
10 | 2 (O/S and Data) | 100% | 2 | 16k | 20 | 80 | 100 | 7200
11 | 2 (O/S and Data) | 100% | 1 | 16k | 70 | 30 | 100 | 7200
12 | 2 (O/S and Data) | 100% | 2 | 16k | 70 | 30 | 100 | 7200
13 | 2 (O/S and Data) | 100% | 1 | 64k | 0 | 100 | 100 | 7200
14 | 2 (O/S and Data) | 100% | 2 | 64k | 0 | 100 | 100 | 7200
15 | 2 (O/S and Data) | 100% | 1 | 64k | 20 | 80 | 100 | 7200
16 | 2 (O/S and Data) | 100% | 2 | 64k | 20 | 80 | 100 | 7200
17 | 2 (O/S and Data) | 100% | 1 | 64k | 70 | 30 | 100 | 7200
18 | 2 (O/S and Data) | 100% | 2 | 64k | 70 | 30 | 100 | 7200
19 | 2 (O/S and Data) | 100% | 1 | 128k | 0 | 100 | 100 | 7200
20 | 2 (O/S and Data) | 100% | 2 | 128k | 0 | 100 | 100 | 7200
21 | 2 (O/S and Data) | 100% | 1 | 128k | 20 | 80 | 100 | 7200
22 | 2 (O/S and Data) | 100% | 2 | 128k | 20 | 80 | 100 | 7200
23 | 2 (O/S and Data) | 100% | 1 | 128k | 70 | 30 | 100 | 7200
24 | 2 (O/S and Data) | 100% | 2 | 128k | 70 | 30 | 100 | 7200

Results

Click on the graphs for a larger view

IOPS comparison for all RAID1 and RAID6 tests

IOPS measures the number of read and write operations per second. The pattern for the 3 different tests is consistent: the heavier write tests show the lowest IOPs, gradually increasing as the proportion of writes decreases. IOPS and block size tend to have an inverse relationship. As the block size increases, it takes longer to read a single block, so the number of IOPS decreases; conversely, smaller block sizes yield higher IOPS.

It is clear to see from the graphs that RAID1 VM encryption and RAID1 vSAN encryption produce more IOPS in all tests than RAID6 VM encryption and RAID6 vSAN encryption. This is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror, while a single RAID6 write operation results in 3 reads and 3 writes (due to double parity).

Each write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity.

RAID1 VM encryption outperforms RAID1 vSAN encryption in terms of IOPs. The RAID6 results are interesting where at the lower block sizes, RAID6 VM encryption outperforms RAID6 vSAN encryption however at the higher block sizes, RAID6 vSAN encryption outperforms VM encryption.

In order of the highest IOPs

  1. RAID1 VM encryption
  2. RAID1 vSAN encryption
  3. RAID6 VM encryption
  4. RAID 6 vSAN encryption

Throughput comparison for all RAID1 and RAID6 tests

IOPs and throughput are closely related by the following equation.

Throughput (MB/s) = IOPS * Block size

IOPS measures the number of read and write operations per second, while throughput measures the amount of data read or written per second. The higher the throughput, the more data can be transferred. The graphs follow a consistent pattern from the heavier to the lighter workload tests. I can see the larger block sizes such as 64K and 128K have greater throughput in each of the workload tests than 4K or 16K. As the block sizes get larger in a workload, the number of IOPS will decrease. Even though it's fewer IOPS, you're getting more data throughput because the block sizes are bigger. The vSAN datastore is a native 4K system. It's important to remember that storage systems may be optimized for different block sizes. It is often the operating system and applications which set the block sizes which then run on the underlying storage. It is important to test different block sizes on storage systems to see the effect these have.

RAID1 VM encryption has the best performance in terms of throughput against RAID1 vSAN encryption however the results are very close together.

RAID6 vSAN encryption has the best performance in terms of throughput against RAID6 VM encryption.

In order of highest throughput

  1. RAID1 VM encryption
  2. RAID1 vSAN encryption
  3. RAID6 vSAN encryption
  4. RAID6 VM encryption

Read Latency comparison for all RAID1 and RAID6 tests

The pattern is consistent between the read/write workloads. As the workload decreases, read latency decreases although the figures are generally quite close. Read latency for all tests varies between 0.40 and 1.70ms which is under a generally recommended limit of 15ms before latency starts to cause performance problems.

There are outlier values for the Read Latency across RAID1 VM Encryption and RAID1 vSAN encryption at 4K and 16K when testing 2 threads which may be something to note if applications will be used at these block sizes.

RAID1 vSAN encryption incurs a higher read latency in general than RAID1 VM encryption, and RAID6 VM encryption incurs a higher read latency in general than RAID6 vSAN encryption; however, the figures are all very close to the baseline.

RAID6 has more disks to read from than the mirrored RAID1 set, therefore the reads are very fast, which is reflected in the results. Faster reads result in lower latency.

From the lowest read latency to the highest

  1. RAID6 vSAN encryption
  2. RAID6 VM encryption
  3. RAID1 VM encryption
  4. RAID1 vSAN encryption

Write latency comparison for all RAID1 and RAID6 tests

The lowest write latency is 0.8ms and the largest is 9.38ms. Up to 20ms is the recommended value from VMware; however, with all-flash arrays this should be significantly lower, which is what I can see from the results. With NVMe and flash disks, the faster hardware may expose bottlenecks elsewhere in the hardware stack and architecture, which can be compared with internal VMware host layer monitoring. Write latency can occur at several virtualization layers and filters, each of which causes its own latency. The layers can be seen below.

Latency can be caused by limits on the storage controller, queuing at the VMkernel layer, the disk IOPS limit being reached and the types of workloads being run possibly alongside other types of workloads which cause more processing.

The set of tests at the 100% write/0% read and 80% write/20% read have nearly no change in the write latency but it does decrease more significantly for the 30% write/70% read test.

As expected, all the RAID6 results incurred more write latency than the RAID1 results. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency.

When split into the RAID1 VM encryption and RAID1 vSAN encryption results, RAID1 VM encryption incurs less write latency than RAID1 vSAN encryption however the values are very close.

When split into the RAID6 VM encryption and RAID6 vSAN encryption results, RAID6 VM encryption seems to perform with less write latency at the lower block sizes however performs with more write latency at the higher block sizes than RAID6 vSAN encryption.

From the lowest write latency to the highest.

  1. RAID1 VM encryption
  2. RAID1 vSAN encryption
  3. RAID6 vSAN encryption
  4. RAID6 VM encryption

Latency Standard Deviation comparison for all RAID1 and RAID6 tests

The standard deviation value in the testing results uses a 95th percentile. This is explained below with examples.

  • An average latency of 2ms and a 95th percentile of 6ms means that 95% of the IO were serviced under 6ms, and that would be a good result
  • An average latency of 2ms and a 95th percentile latency of 200ms means 95% of the IO were serviced under 200ms (keeping in mind that some will be higher than 200ms). This means that latencies are unpredictable and some may take a long time to complete. An operation could take less than 2ms, but every once in a while, it could take well over 200ms
  • Assuming a good average latency, it is typical to see the 95th percentile latency no more than 3 times the average latency.

I analysed the results to see if the 95th percentile latency was no more than 3 times the average latency for all tests. I added new columns multiplying the latency figures for all tests by 3, then comparing this to the standard deviation figure. The formula for these columns was =SUM(<relevant_latency_column>*3)

In the 80% write, 20% read test for the 64K RAID1 Baseline there was one result which was more than 3 times the average latency however not by a significant amount. In the 30% write, 70% read test for the 64K RAID6 Baseline, there were two results which were more than 3 times the average latency however not by a significant amount.

For all the RAID1 and RAID6 VM encryption and vSAN encryption tests, all standard deviation results overall were less than 3 times the average latency indicating that potentially, AES-NI may give encryption a performance enhancement which prevents significant latency deviations.

ESXi CPU usage comparison for all RAID1 and RAID6 tests

I used a percentage change formula on the ESXi CPU usage data for all tests. Percentage change differs from percent increase and percent decrease formulas because both directions of the change (negative or positive) are seen. VMware calculated, using a percentage change formula, that VM encryption added up to 20% overhead to CPU usage (this was for an older vSphere release). There are no figures for vSAN encryption from VMware, so I have used the same formula for all tests. I used the formula below to calculate the percentage change for all tests.

 % change = 100 x  (test value – baseline value)/baseline value
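As a worked example with illustrative figures only (not taken from the test data): a baseline CPU usage of 40% and an encrypted test result of 44% gives 100 x (44 - 40) / 40 = +10%, while a result of 37% gives 100 x (37 - 40) / 40 = -7.5%, i.e. an improvement over the baseline.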

The lowest percentage change is -7.73% and the highest percentage change is 18.37% so the tests are all within VMware’s recommendation that encryption can add up to 20% more server CPU usage.  Interestingly when the figures are negative, it shows an improvement over the baseline. This could be due to the way AES-NI boosts performance when encryption is enabled. RAID6 VM Encryption and vSAN encryption show more results which outperformed the baseline in these tests than RAID1 VM Encryption and vSAN encryption.

What is interesting about the RAID1 vSAN encryption and RAID6 vSAN encryption figures is that RAID1 vSAN encryption CPU usage goes up between 1 and 2 threads however RAID6 vSAN encryption CPU usage goes down between 1 and 2 threads.

Overall, there is a definite increase in CPU usage when VM encryption or vSAN encryption is enabled for both RAID1 and RAID6 however from looking at graphs, the impact is minimal even at the higher workloads.

RAID6 VM encryption uses less CPU at the higher block sizes than RAID6 vSAN encryption.

From the lowest ESXi CPU Usage to the highest.

  1. RAID6 VM encryption
  2. RAID6 vSAN encryption
  3. RAID1 VM encryption
  4. RAID1 vSAN encryption

vSAN CPU usage comparison for all RAID1 and RAID6 tests

For the vSAN CPU usage tests, I used a percentage change formula on the data for the vSAN CPU usage comparison. Percentage change differs from percent increase and percent decrease formulas because both directions of the change (negative or positive) can be seen. Negative values indicate the vSAN CPU usage with encryption performed better than the baseline. VMware calculated, using a percentage change formula, that VM encryption would add up to 20% overhead. There are no figures for vSAN encryption from VMware, so I have used the same formula for these tests also.

 % change = 100 x  (test value – baseline value)/baseline value

The lowest percentage change is -21.88% and the highest percentage change is 12.50% so the tests are all within VMware’s recommendation that encryption in general can add up to 20% more CPU usage. Interestingly when the figures are negative, it shows an improvement over the baseline. This could be due to the way AES-NI boosts performance when encryption is enabled.

RAID1 VM encryption and RAID1 vSAN encryption use more vSAN CPU than RAID6 VM encryption and RAID6 vSAN encryption. All of the RAID6 VM encryption figures performed better than the RAID6 baseline, with the majority of RAID6 vSAN encryption figures performing better than the baseline. In comparison, RAID1 VM encryption and RAID1 vSAN encryption nearly always used more CPU than the RAID1 baseline.

From the lowest vSAN CPU usage to the highest.

  1. RAID6 VM encryption
  2. RAID6 vSAN encryption
  3. RAID1 vSAN encryption
  4. RAID1 VM encryption

Conclusion

The following sections provide a final conclusion on the comparison between the functionality and performance of VM Encryption and vSAN Encryption.

Functionality

The main functionality differences can be summed up as follows

VM Encryption

  • Storage Policy based (enable per VM)
  • Data travels encrypted.
  • No deduplication or compression.
  • Simple to set up with a key management server.
  • The DEK key is stored encrypted in the VMX file/VM advanced settings.
  • vSAN and VM encryption use the exact same encryption and kmip libraries but they have very different profiles. VM Encryption is a per-VM encryption.
  • VM Encryption utilizes the vCenter server for key management server key transfer. The hosts do not contact the key management server. vCenter only is a licensed key management client reducing license costs.
  • Storage agnostic

vSAN Encryption

  • Enabled on a virtual cluster datastore level. Encryption is happening at different places in the hypervisor’s layers.
  • Data travels unencrypted, but it is written encrypted to the cache layer.
  • Full compatibility with deduplication and compression.
  • More complicated to set up with a key management server, as each vendor has a different way of managing the trust between the key management server and the vCenter Server.
  • The DEK key is stored encrypted in metadata on each disk.
  • vSAN and VM encryption use the exact same libraries but they have very different profiles.
  • VM Encryption utilizes the vCenter server for key management server key transfer. The hosts do not contact the key management server. vCenter only is a licensed key management client reducing license costs.
  • vSAN only, no other storage is able to be used for vSAN encryption.

Functionality conclusion

VM encryption and vSAN encryption are similar in some functionality. Both use a KMS server, both support RAID1, RAID5 and RAID6 encryption, and both use the same encryption libraries and the KMIP protocol. However, there are some fundamental differences. VM encryption gives the flexibility of encrypting individual virtual machines on a datastore, as opposed to vSAN encryption, which encrypts the complete datastore so that all VMs on it are automatically encrypted. Both solutions provide data at rest encryption, but only VM encryption provides end to end encryption, as it writes an encrypted data stream, whereas vSAN encryption receives an unencrypted data stream and encrypts it during the write process. Due to the level at which the data is encrypted, VM encryption cannot be used with features such as deduplication and compression, however vSAN encryption can. It depends whether this functionality is required and whether the space which could be saved is significant. VM encryption is datastore independent and can use vSAN, NAS, FC and iSCSI datastores. vSAN encryption can only be used on virtual machines on a vSAN datastore. Choosing the encryption depends on whether different types of storage reside in the environment and whether they require encryption.

The choice between VM encryption functionality and vSAN encryption functionality will be on a use case dependency of whether individual virtual machine encryption control is required and/or whether there is other storage in an organization targeted for encryption. If this is the case, VM encryption will be best. If these factors are not required and deduplication and compression are required, then vSAN encryption is recommended.

Performance conclusion

The performance tests were designed to get an overall view from a low workload test of 30% Write, 70% Read through a series of increasing workload tests of 80% Write, 20% Read and 100% Write, 0% Read simulation. These tests used different block sizes to simulate different application block sizes. Testing was carried out on an all flash RAID1 and RAID6 vSAN datastore to compare the performance for VM encryption and vSAN encryption. The environment was set up to vendor best practice across vSphere ESXi, vSAN, vCenter and the Dell server configuration.

It can be seen in all these tests that performance is affected by the below factors.

  • Block size.
  • Workload ratios.
  • RAID level.
  • Threads used
  • Application configuration settings.
  • Access pattern of the application.

The table below shows a breakdown of the performance but in some cases the results are very close

Metric | 1st | 2nd | 3rd | 4th
IOPs | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 VM encryption | RAID6 vSAN encryption
Throughput | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption
Read Latency | RAID6 vSAN encryption | RAID6 VM encryption | RAID1 VM encryption | RAID1 vSAN encryption
Write Latency | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption
Standard Dev | All standard deviation results were less than 3 times the average latency, which is recommended, with minor outliers (this applies to all four configurations) | | |
ESXi CPU Usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 VM encryption | RAID1 vSAN encryption
vSAN CPU Usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 vSAN encryption | RAID1 VM encryption

In terms of IOPs, RAID1 VM encryption produces the highest IOPS for all tests. This is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror. A single RAID6 write operation results in 3 reads and 3 writes (due to double parity), causing more latency and decreasing the IOPs.

In terms of throughput, RAID1 VM encryption produces the highest throughput for all tests. Having produced the highest IOPs in the majority of tests, it is expected to produce a similar result for throughput. Whether your environment needs higher IOPs or higher throughput depends on the block sizing. Larger block sizes produce the best throughput due to getting more data through the system in bigger blocks. As the block size increases, it takes longer to read a single block, so the number of IOPS decreases; conversely, smaller block sizes yield higher IOPS.

In terms of read latency, RAID6 vSAN encryption performed best. Read latency for all tests varies between 0.40 and 1.70ms, which is under the generally recommended limit of 15ms before latency starts to cause performance problems. RAID6 has more disks to read from than the mirrored RAID1 set, therefore the reads are very fast, which is reflected in the results. Faster reads result in lower latency. The values overall were very close.

In terms of write latency, RAID1 VM encryption performed best. All the RAID6 results incurred more write latency than the RAID1 results, which was to be expected. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity, producing a heavy write penalty and therefore more latency. The lowest write latency is 0.8ms and the largest is 9.38ms. Up to 20ms is the recommended value, therefore all tests were well within acceptable limits.

The performance of encrypted data also seems to be enhanced by the use of newer flash disks such as SSDs and NVMe, showing latency figures which were within the acceptable values. NVMe uses a streamlined, lightweight protocol compared to the SAS, SCSI and AHCI protocols, while also reducing CPU cycles.

In terms of standard deviation, all standard deviation test results were less than 3 times the average latency which is recommended.

In terms of average ESXi CPU and vSAN CPU usage, RAID6 VM encryption produced the lowest increase in CPU. All encryption appeared to be enhanced by leveraging the AES-NI instructions in Intel and AMD CPUs. The increase in CPU usage by the hosts and vSAN compared to the baseline for both sets of encryption tests is minimal and within acceptable margins by a considerable amount. In some cases, there was lower CPU use than the baseline, possibly due to the AES-NI offload.

Encryption recommendation

Overall, RAID1 VM encryption produces the best IOPs, throughput and write latency, with the standard deviation values for latency well under the acceptable limits. RAID1 ESXi CPU usage and vSAN CPU usage are higher than RAID6; however, the difference is minimal when looking at the graphs, especially in some cases where both sets of tests can outperform the baseline across the different block sizes. For applications which need very fast read performance, RAID6 will always be the best option due to having more disks to read from than mirrored RAID1, therefore the encryption choice should be matched to the specific application requirement if reads are a priority.

Reference

(Townsendsecurity, 2019) The Definitive Guide to VMware Encryption and Key Management [Online]. Available at https://info.townsendsecurity.com/vmware-encryption-key-management-definitive-guide (Accessed 19 February 2020)

Installing and Upgrading UMDS for ESXi 6.7U3 on a Linux-Based Operating System – Ubuntu 14.04.06

In vSphere 6.7 releases and older, the UMDS 6.7 is bundled with the vCenter Server Appliance installer. You can use the UMDS bundle from the vCenter Server Appliance to install UMDS 6.7 on a separate Linux-based system.

UMDS is a 64-bit application and requires a 64-bit Linux-based system.

You cannot upgrade UMDS that runs on a Linux-based operating system. You can uninstall the current version of UMDS, perform a fresh installation of UMDS according to all system requirements, and use the existing patch store from the UMDS that you uninstalled.

Supported Operating Systems

The Update Manager Download Service (UMDS) can run on a limited number of Linux-based operating systems.

  • Ubuntu 14.04
  • Ubuntu 18.04
  • Red Hat Enterprise Linux 7.4
  • Red Hat Enterprise Linux 7.5

Download from

http://releases.ubuntu.com/14.04/

Pre-requisites

  • UMDS Software found on the vCenter Server Appliance 6.7U3 iso.
  • Ubuntu 14.04.x

Procedure

  • Upload the Ubuntu ISO to your datastore
  • In vCenter, create a new Ubuntu VM
  • Once this is built, edit the settings and attach the iso to the Ubuntu CD drive via the datastore
  • On the “Virtual Hardware” tab, expand “CPU,” and select “Expose hardware assisted virtualization to guest OS.”
  • Right-click your newly created VM and select “Power On.”
  • Right-click the VM again and select “Open Console.”
  • The Ubuntu installation process should begin automatically and the first prompt is to choose a language. Select English and press Enter
  • Highlight “Install Ubuntu Server” and press Enter
  • Select English
  • Select your location – In my case the United Kingdom
  • On the “Configure the keyboard” screen select “Yes” and press Enter. Once it has taken you through the configuration, you will see the page below
  • It will run through the below screen
  • Enter a hostname
  • Ubuntu prompts you to create a user account to be used instead of a root account. Start by entering the full name (first and last) of the user and Press Enter
  • Enter a username for the account
  • Choose a password
  • Enter the password again
  • I chose not to encrypt my home directory, so select “No” and press Enter
  • Configure the clock
  • Select “Guided – use entire disk and set up LVM” and press Enter
  • Choose the disk to partition. I only have one
  • Select “Yes” and press enter to write the changes to disk and configure LVM.
  • Accept the default amount of the volume group that will be used for guided partitioning. This tells the installer to use the full disk and press  Enter
  • Select Yes and press enter to write the changes to disk
  • It will proceed to install the system
  • The install will begin and at some point it will prompt you to enter your Internet proxy. I don’t use one, so I left it blank and pressed Enter
  • A dialog will be presented asking you how you want to manage system upgrades. In this case I’ll manually apply updates, so I selected “No automatic updates” and pressed Enter
  • On the Software selection I have just chosen OpenSSH and PostGres as I am installing a VMware UMDS server
  • I clicked Yes to the Grub boot loader message
  • Finish the Installation
  • You are now ready to use the Ubuntu VM

Update VMware Tools

VMware Tools is a group of utilities and drivers that enhance the performance of the virtual machine’s guest operating system when running on an ESXi host. The steps below walk you through installing VMware Tools on our Ubuntu Server 14.04.06 virtual machine using the command line. Note that whenever you update the Linux kernel you will have to reinstall VMware Tools.

  • Launch a Web browser and login to the vSphere Web Client.
  • From the vCenter Home page click on “VMs and Templates.”
  • Right-click the VM and navigate to “All vCenter Actions” > “Guest OS” > “Install VMware Tools.”
  • When prompted click “Mount” to mount the VMware Tools installation disk image on the virtual CD/DVD drive of the Ubuntu Server virtual machine.
  • Right-click the VM again and select “Open Console.”
  • Login with the credentials used during the server installation process
  • Mount the VMware Tools CD image to /media/cdrom:
  • $ sudo mount /dev/cdrom /media/cdrom
  • mount: /dev/sr0 is write-protected, mounting read-only
  • Extract the VMware Tools installer archive file to /tmp
  • $ tar xzvf /media/cdrom/VMwareTools-*.tar.gz -C /tmp/
  • Install VMware Tools by running the command below. Note that the -d switch assumes that you want to accept the defaults. If you don’t use the -d switch you can opt to choose the default or a custom setting for each question.
  • $ cd /tmp/vmware-tools-distrib/
  • $ sudo ./vmware-install.pl -d
  • … The configuration of VMware Tools 9.4.5 build-1598834 for Linux for this running kernel completed successfully. …
  • Reboot the virtual machine after the installation completes:
  • $ sudo reboot

Preparing the VM for UMDS

The next step is to prepare the VM for UMDS and then install it.

The following pre-requisite components for Linux are required, but read on…

  • perl
  • tar
  • sed
  • psmisc
  • unixodbc
  • postgresql
  • postgresql-contrib
  • odbc-postgresql

William Lam has a really great script that automates this: http://www.virtuallyghetto.com/2016/11/automating-the-installation-of-vum-update-manager-download-service-umds-for-linux-in-vsphere-6-5.html

When you install UMDS manually, you are prompted for several responses and the script currently just uses those defaults. If you wish to change them, you simply need to edit the “answer” file that the script generates to provide to the UMDS installer itself.

Here is what the script is doing at a high level:

  1. Extract the UMDS installer into /tmp
  2. Install all OS package dependencies
  3. Create the UMDS installer answer file /tmp/answer
  4. Create the /etc/odbc.ini and /etc/odbcinst.ini configuration files
  5. Update pg_hba.conf to allow the UMDS user to access the DB
  6. Start the Postgres DB
  7. Create the UMDSDB user and set the assigned password
  8. Install UMDS

Procedure

  • Upload both the UMDS install script (install_umds65.sh) as well as the UMDS install package found in the VCSA 6.7U3 ISO to an already deployed Ubuntu system
  • The script needs to run as root and it requires the following 5 command-line options:
  1. UMDS package installer
  2. Name of the UMDS Database
  3. Name of the UMDS DSN Entry
  4. Username for running the UMDS service
  5. Password for the UMDS username

Running the script

I found I had to adjust the permissions on the script which I did via WinSCP first
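
If you would rather sort the permissions out from the shell instead of WinSCP, a one-liner like this (assuming the script is in your current directory) does the same job:

chmod +x install_umds65.sh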

sudo ./install_umds65.sh VMware-UMDS-6.7.0-14203538.tar.gz UMDSDB UMDS_DSN umdsuser VMware1!

It will start to install and this is what you will see

Extracting VMware-UMDS-6.7.0-14203538.tar.gz to /tmp …
vmware-umds-distrib/
vmware-umds-distrib/bin/
vmware-umds-distrib/bin/7z
vmware-umds-distrib/bin/vciInstallUtils
vmware-umds-distrib/bin/vmware-umds
vmware-umds-distrib/bin/downloadConfig.xml
vmware-umds-distrib/bin/vciInstallUtils_config.xml
vmware-umds-distrib/bin/unzip
vmware-umds-distrib/bin/zip
vmware-umds-distrib/bin/umds
vmware-umds-distrib/bin/vmware-updatemgr-wrapper
vmware-umds-distrib/bin/vmware-vciInstallUtils
vmware-umds-distrib/EULA
vmware-umds-distrib/share/
vmware-umds-distrib/share/VCI_proc_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_proc_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_data_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_table_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_base_postgresql.sql
vmware-umds-distrib/share/VCI_undo_postgresql.sql
vmware-umds-distrib/share/VCI_initialsetup_postgresql.sql
vmware-umds-distrib/share/VCI_table_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_data_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_proc_postgresql.sql
vmware-umds-distrib/lib/
vmware-umds-distrib/lib/libvim-types.so
vmware-umds-distrib/lib/libboost_program_options-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libboost_serialization-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libvci-types.so
vmware-umds-distrib/lib/libssl.so.1.0.2
vmware-umds-distrib/lib/libvmacore.so
vmware-umds-distrib/lib/libvci-registrar.so
vmware-umds-distrib/lib/libboost_thread-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libvci-vcIntegrity.so
vmware-umds-distrib/lib/libufa-types.so
vmware-umds-distrib/lib/libssoclient.so
vmware-umds-distrib/lib/libvsanmgmt-types.so
vmware-umds-distrib/lib/liblog4cpp.so.4
vmware-umds-distrib/lib/libstdc++.so.6
vmware-umds-distrib/lib/libboost_filesystem-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libcares.so.2
vmware-umds-distrib/lib/libodbc.so.2
vmware-umds-distrib/lib/libufa-common.so
vmware-umds-distrib/lib/libcurl.so.4
vmware-umds-distrib/lib/libvmomi.so
vmware-umds-distrib/lib/libexpat.so
vmware-umds-distrib/lib/libboost_system-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libnfclib.so
vmware-umds-distrib/lib/libcrypto.so.1.0.2
vmware-umds-distrib/lib/libz.so.1
vmware-umds-distrib/vmware-install.pl
Installing UMDS package dependencies …
Hit http://security.ubuntu.com trusty-security InRelease
Ign http://gb.archive.ubuntu.com trusty InRelease
Hit http://gb.archive.ubuntu.com trusty-updates InRelease
Hit http://gb.archive.ubuntu.com trusty-backports InRelease
Hit http://gb.archive.ubuntu.com trusty Release.gpg
Hit http://gb.archive.ubuntu.com trusty Release
Hit http://security.ubuntu.com trusty-security/main Sources
Hit http://gb.archive.ubuntu.com trusty-updates/main Sources
Hit http://security.ubuntu.com trusty-security/restricted Sources
Hit http://gb.archive.ubuntu.com trusty-updates/restricted Sources
Hit http://security.ubuntu.com trusty-security/universe Sources
Hit http://gb.archive.ubuntu.com trusty-updates/universe Sources
Hit http://security.ubuntu.com trusty-security/multiverse Sources
Hit http://gb.archive.ubuntu.com trusty-updates/multiverse Sources
Hit http://security.ubuntu.com trusty-security/main amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/main amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/restricted amd64 Packages
Hit http://security.ubuntu.com trusty-security/restricted amd64 Packages
Hit http://security.ubuntu.com trusty-security/universe amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/universe amd64 Packages
Hit http://security.ubuntu.com trusty-security/multiverse amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/multiverse amd64 Packages
Hit http://security.ubuntu.com trusty-security/main i386 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/main i386 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/restricted i386 Packages
Hit http://security.ubuntu.com trusty-security/restricted i386 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/universe i386 Packages
Hit http://security.ubuntu.com trusty-security/universe i386 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/multiverse i386 Packages
Hit http://security.ubuntu.com trusty-security/multiverse i386 Packages
Hit http://gb.archive.ubuntu.com trusty-updates/main Translation-en
Hit http://security.ubuntu.com trusty-security/main Translation-en
Hit http://security.ubuntu.com trusty-security/multiverse Translation-en
Hit http://security.ubuntu.com trusty-security/restricted Translation-en
Hit http://security.ubuntu.com trusty-security/universe Translation-en
Hit http://gb.archive.ubuntu.com trusty-updates/multiverse Translation-en
Hit http://gb.archive.ubuntu.com trusty-updates/restricted Translation-en
Hit http://gb.archive.ubuntu.com trusty-updates/universe Translation-en
Hit http://gb.archive.ubuntu.com trusty-backports/main Sources
Hit http://gb.archive.ubuntu.com trusty-backports/restricted Sources
Hit http://gb.archive.ubuntu.com trusty-backports/universe Sources
Hit http://gb.archive.ubuntu.com trusty-backports/multiverse Sources
Hit http://gb.archive.ubuntu.com trusty-backports/main amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/restricted amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/universe amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/multiverse amd64 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/main i386 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/restricted i386 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/universe i386 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/multiverse i386 Packages
Hit http://gb.archive.ubuntu.com trusty-backports/main Translation-en
Hit http://gb.archive.ubuntu.com trusty-backports/multiverse Translation-en
Hit http://gb.archive.ubuntu.com trusty-backports/restricted Translation-en
Hit http://gb.archive.ubuntu.com trusty-backports/universe Translation-en
Hit http://gb.archive.ubuntu.com trusty/main Sources
Hit http://gb.archive.ubuntu.com trusty/restricted Sources
Hit http://gb.archive.ubuntu.com trusty/universe Sources
Hit http://gb.archive.ubuntu.com trusty/multiverse Sources
Hit http://gb.archive.ubuntu.com trusty/main amd64 Packages
Hit http://gb.archive.ubuntu.com trusty/restricted amd64 Packages
Hit http://gb.archive.ubuntu.com trusty/universe amd64 Packages
Hit http://gb.archive.ubuntu.com trusty/multiverse amd64 Packages
Hit http://gb.archive.ubuntu.com trusty/main i386 Packages
Hit http://gb.archive.ubuntu.com trusty/restricted i386 Packages
Hit http://gb.archive.ubuntu.com trusty/universe i386 Packages
Hit http://gb.archive.ubuntu.com trusty/multiverse i386 Packages
Hit http://gb.archive.ubuntu.com trusty/main Translation-en_GB
Hit http://gb.archive.ubuntu.com trusty/main Translation-en
Hit http://gb.archive.ubuntu.com trusty/multiverse Translation-en_GB
Hit http://gb.archive.ubuntu.com trusty/multiverse Translation-en
Hit http://gb.archive.ubuntu.com trusty/restricted Translation-en_GB
Hit http://gb.archive.ubuntu.com trusty/restricted Translation-en
Hit http://gb.archive.ubuntu.com trusty/universe Translation-en_GB
Hit http://gb.archive.ubuntu.com trusty/universe Translation-en
Reading package lists… Done
Reading package lists… Done
Building dependency tree
Reading state information… Done
psmisc is already the newest version.
sed is already the newest version.
perl is already the newest version.
postgresql is already the newest version.
postgresql-contrib is already the newest version.
tar is already the newest version.
vim is already the newest version.
The following extra packages will be installed:
libltdl7 libodbc1 odbcinst odbcinst1debian2
Suggested packages:
libmyodbc tdsodbc unixodbc-bin
The following NEW packages will be installed
libltdl7 libodbc1 odbc-postgresql odbcinst odbcinst1debian2 unixodbc
0 to upgrade, 6 to newly install, 0 to remove and 32 not to upgrade.
Need to get 791 kB of archives.
After this operation, 2,607 kB of additional disk space will be used.
Get:1 http://gb.archive.ubuntu.com/ubuntu/ trusty/main libltdl7 amd64 2.4.2-1.7ubuntu1 [35.0 kB]
Get:2 http://gb.archive.ubuntu.com/ubuntu/ trusty/main libodbc1 amd64 2.2.14p2-5ubuntu5 [175 kB]
Get:3 http://gb.archive.ubuntu.com/ubuntu/ trusty/main odbcinst amd64 2.2.14p2-5ubuntu5 [12.6 kB]
Get:4 http://gb.archive.ubuntu.com/ubuntu/ trusty/main odbcinst1debian2 amd64 2.2.14p2-5ubuntu5 [40.6 kB]
Get:5 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe odbc-postgresql amd64 1:09.02.0100-2ubuntu1 [507 kB]
Get:6 http://gb.archive.ubuntu.com/ubuntu/ trusty/main unixodbc amd64 2.2.14p2-5ubuntu5 [19.8 kB]
Fetched 791 kB in 0s (3,669 kB/s)
Selecting previously unselected package libltdl7:amd64.
(Reading database … 61291 files and directories currently installed.)
Preparing to unpack …/libltdl7_2.4.2-1.7ubuntu1_amd64.deb …
Unpacking libltdl7:amd64 (2.4.2-1.7ubuntu1) …
Selecting previously unselected package libodbc1:amd64.
Preparing to unpack …/libodbc1_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking libodbc1:amd64 (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbcinst.
Preparing to unpack …/odbcinst_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking odbcinst (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbcinst1debian2:amd64.
Preparing to unpack …/odbcinst1debian2_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking odbcinst1debian2:amd64 (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbc-postgresql:amd64.
Preparing to unpack …/odbc-postgresql_1%3a09.02.0100-2ubuntu1_amd64.deb …
Unpacking odbc-postgresql:amd64 (1:09.02.0100-2ubuntu1) …
Selecting previously unselected package unixodbc.
Preparing to unpack …/unixodbc_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking unixodbc (2.2.14p2-5ubuntu5) …
Processing triggers for man-db (2.6.7.1-1ubuntu1) …
Setting up libltdl7:amd64 (2.4.2-1.7ubuntu1) …
Setting up libodbc1:amd64 (2.2.14p2-5ubuntu5) …
Setting up odbcinst (2.2.14p2-5ubuntu5) …
Setting up odbcinst1debian2:amd64 (2.2.14p2-5ubuntu5) …
Setting up odbc-postgresql:amd64 (1:09.02.0100-2ubuntu1) …
odbcinst: Driver installed. Usage count increased to 1.
Target directory is /etc
odbcinst: Driver installed. Usage count increased to 1.
Target directory is /etc
Setting up unixodbc (2.2.14p2-5ubuntu5) …
Processing triggers for libc-bin (2.19-0ubuntu6.14) …
Creating UMDS Installer answer file …
Creating /etc/odbc.ini …
Updating /etc/odbcinst.ini …
Updating pg_hba.conf …
Symlink /var/run/postgresql/.s.PGSQL.5432 /tmp/.s.PGSQL.5432 …
Starting Postgres …

  • Starting PostgreSQL 9.3 database server [ OK ]
    Sleeping for 60 seconds for Postgres DB to be ready …
    Creating UMDS DB + User …
    SELECT pg_catalog.set_config('search_path', '', false)
    CREATE ROLE umdsuser NOSUPERUSER CREATEDB CREATEROLE INHERIT LOGIN;
    ALTER ROLE
    Install VUM UMDS …
    Installing VMware Update Manager Download Service.

Logs would be store at /var/log/vmware/vmware-updatemgr/umds
Creating the log directory if required….

In which directory do you want to install Download service?
[/usr/local/vmware-umds]
The path “/usr/local/vmware-umds” does not exist currently. This program is
going to create it, including needed parent directories. Is this what you want?

Let us setup some things for you…

Do you need proxy to connect to internet? [no]
One more thing…we need a storage location to store patches. Make sure you
have enough space in that location

Where do you want download service to store patches
[/var/lib/vmware-umds]
The path “/var/lib/vmware-umds” does not exist currently. This program is going
to create it, including needed parent directories. Is this what you want?

The installation of VMware Update Manager Download Service 6.7.0 build-14203538
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command:
“/usr/local/vmware-umds/vmware-uninstall-umds.pl”.

Enjoy,

–the VMware team

Post script install

Once the UMDS installer script completes, you can verify by running the following two commands which provides you with the version of UMDS as well as the current configurations:

/usr/local/vmware-umds/bin/vmware-umds -v

/usr/local/vmware-umds/bin/vmware-umds -G

More setup Commands

Log in to the machine where UMDS is installed, and open a Command Prompt window.

The default location in 64-bit Linux is /usr/local/vmware-umds.

Enabling ESXi Updates and VA Updates

To set up a download of all ESXi host updates and all virtual appliance upgrades, run the following command:

vmware-umds -S --enable-host --enable-va

To set up a download of all ESXi host updates and disable the download of virtual appliance upgrades, run the following command:

vmware-umds -S --enable-host --disable-va

To set up a download of all virtual appliance upgrades and disable the download of host updates, run the following command:

vmware-umds -S --disable-host --enable-va
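
For example, to restrict downloads to host updates only and then confirm the configuration (using the default install path mentioned above, and assuming sudo if you are not running as root):

sudo /usr/local/vmware-umds/bin/vmware-umds -S --enable-host --disable-va

sudo /usr/local/vmware-umds/bin/vmware-umds -G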

Changing the Patch Store folder

The default folder to which UMDS downloads patch binaries and patch metadata on a Linux machine is /var/lib/vmware-umds 

vmware-umds -S --patch-store your_new_patchstore_folder
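
A hypothetical example (the path is just an illustration; pick a location with enough free space):

sudo /usr/local/vmware-umds/bin/vmware-umds -S --patch-store /umds/patch-store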

Adding a new URL

To add a new URL address for downloading patches and notifications for ESXi 5.5, ESXi 6.0, or ESXi 6.5 hosts, run the following command:

vmware-umds -S --add-url https://host_URL/index.xml --url-type HOST

And to remove a URL

vmware-umds -S --remove-url https://URL_to_remove/index.xml

Download selected updates

This command downloads all the upgrades, patches and notifications from the configured sources for the first time. Subsequently, it downloads all new patches and notifications released after the previous UMDS download.

vmware-umds -D

You should now see the below when it finishes

Making the Content available via a Web Server

You have now successfully installed UMDS. Once you have downloaded all of your content, you will need to set up an HTTP server to make it available to the VUM instance in the vCenter Server Appliance. You can configure any popular HTTP server such as Nginx or Apache. For my lab, I used the tiny HTTP server that Python provides.

To make the content under /var/lib/vmware-umds available, just change into that directory and run the following command:

python -m SimpleHTTPServer

Then if you navigate to a browser and type http://192.168.1.69:8000, you should see your files
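
For a lab, a minimal sketch that serves the default patch store and keeps running after you log out (the log path is arbitrary):

cd /var/lib/vmware-umds && nohup python -m SimpleHTTPServer 8000 > /tmp/umds-http.log 2>&1 &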

You can now add this URL into your vCenter Update Manager download settings

Thanks

Thanks to William Lam, whose blog I followed to set this up.

Testing port connectivity from vCenters and Hosts

Using Curl to test port connectivity from vCenter

Curl is available in the VMware vCenter Server Appliance command line interface. This small blog provides a simple example of using Curl to simulate a telnet connection to test port connectivity

To test port connectivity in VMware vCenter Server Appliance:

  1. Log in as root user through the VMware vCenter Server Appliance console.
  2. Run this command on the vCenter Server Appliance:

curl -v telnet://target ip address:port number

Example of testing port connectivity

All vCenter servers must have access to the UMDS server on port 80 (http)

The below screen-print shows a working curl test from a vCenter to a Windows UMDS Server on IP address 10.124.74.65 over port 80.
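
Using that IP address and port, the test looks something like this; with -v a successful connection prints a “Connected to 10.124.74.65” line, and you can close it with Ctrl+C:

curl -v telnet://10.124.74.65:80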

Using Netcat to test port connectivity from hosts

The telnet command is not available in any version of ESXi, therefore you must use netcat (nc) to confirm connectivity to a TCP port on a remote host. The syntax of the nc command is:

nc -z <destination-ip> <destination-port>
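
As a hypothetical example (reusing the UMDS server address above purely for illustration), nc exits with 0 when the port is open, so you can chain a quick confirmation:

nc -z 10.124.74.65 80 && echo "port 80 open"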

UMDS Ports required when vCenter, Hosts and UMDS server are on separate networks

So what ports do you need to open when you have the below 3 objects on different networks?

  • vCenter
  • Hosts
  • UMDS Server

VMware Ports

https://kb.vmware.com/s/article/52962

UMDS, vCenter and ESXi Hosts

Customising an ESXi Image Profile (v6.5U2)

Within AutoDeploy, we sometimes need to update our base ESXi image and this blog will go through the process to do this. We use the HPE Custom Image for VMware ESXi 6.5 U2 Offline Bundle currently but what if we want to add a security patch?

Steps

a) Download the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot from myvmware.com

b) Click the icon to add a new Software depot and add a name

We now see our Software Depot named VMware ESXi 6.5U2 including Patches

Click the green up arrow to upload the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot into the Software Depots within AutoDeploy.

d) Log into VMware’s Patch portal

https://my.vmware.com/group/vmware/patch#search

There are filters which allow you to select the type of update and severity including information about the patch

We will download the latest critical security patch

It downloads as a zip file

Upload this file into AutoDeploy. On the Software Depots tab, click the green up arrow to upload the patch zip file

f) We are now going to clone the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot

Click on the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot. Under Image Profiles select the vendor image and click Clone. We are cloning the vendor image so that we can swap in the updated VIBs.

Fill in the Name, Vendor and description. Choose your newly created software depot

Choose Partner Supported from the drop-down

g) Leave this box for a minute as we need to check the bulletins associated with the security patch we downloaded – Link below for reference

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/esxi650-201903001.html

What we see in this bulletin is the VIBs which are updated

h) Use the search function in the clone wizard to find each of the updated VIBs. Un-select the existing version and select the new version to add it to the build. In the example below I have unticked the older version and ticked the newer version

Do the same for the other 3 affected VIBs. Uncheck the older one and tick the newer one

Check the final screen and click Finish

You should now be able to click on your software depot – VMware ESXi 6.5U2 including patches and see the Cloned Image Profile which contains the security patch

i) Now we can add our patched Image Profile into an AutoDeploy Rule

I’m not going to go through the whole process of creating a rule but as you can see below, I can now edit the deploy rule (must be deactivated to edit)

You can then select the software depot which will contain the patched ESXi image with the security patch

j) If you are updating an existing Deploy Rule then you will need to use PowerCLI to connect to the vCenter and run the below command to refresh the Autodeploy cache before rebooting a host and testing the image applies correctly

You can either do a single command on a host you want to test or run a command which updates all the hosts at once. In order to repair a single host to do a test we can use the below piped command. If you get an empty string back then the cache is correct and ready to use the new image

Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local | Repair-DeployRuleSetCompliance

Or you can use the piped command below, which runs the same command on all hosts

foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}

k) Reboot a host and test the image applies correctly

Host Profile Compliance Unknown in vSphere 6.5U2

The Problem

This error comes up on a 6.5U2 environment running on HP Proliant BL460c Gen9 Servers. Where it previously had fully compliant hosts, now when checking compliance of a host against a host profile, all you see is Host Compliance Unknown.

The Resolution

I’m still not entirely sure how we went from a state of Compliant hosts to Unknown but this is how we resolved it

  • Deleted the AutoDeploy Rule
  • Recreated the AutoDeploy Rule
  • Ran the below command. Anytime you make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet, or by running the single PowerCLI command below

foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}

  • Checked the status of the atftpd service in vCenter. Note: We are using the inbuilt TFTP server in vCenter, however this is no longer supported by VMware, but we find it works just fine. Solarwinds is a good alternative.

service atftpd status

service atftpd start

  • Rebooted one host while checking the startup in the ILO console.
  • You may need to remediate your hosts after boot-up, or if everything is configured correctly, each host will simply boot up, add itself into the cluster and become compliant

Taking a look at AutoDeploy in vSphere 6.5U2

What is AutoDeploy?

vSphere Auto Deploy lets you provision hundreds of physical hosts with ESXi software.

Using Auto Deploy, administrators can manage large deployments efficiently. Hosts are network-booted from a central Auto Deploy server. Optionally, hosts are configured with a host profile of a reference host. The host profile can be set up to prompt the user for input. After boot up and configuration complete, the hosts are managed by vCenter Server just like other ESXi hosts.

Types of AutoDeploy Install

Auto Deploy can also be used for Stateless caching or Stateful install. There are several more options than there were before which are shown below in a screen-print from a host profile.

What is stateless caching?

Stateless caching addresses the risk of the Auto Deploy infrastructure being unavailable by caching the ESXi image on the host’s local storage. If AutoDeploy is unavailable then the host will boot from its local cached image. There are a few things that need to be in place before stateless caching can be enabled:

  • Hosts should be set to boot from network first, and local disk second
  • Ensure that there is a disk with at least 1 GB available
  • The host should be set up to get its settings from a Host Profile as part of the AutoDeploy rule set.

To configure a host to use stateless caching, the host profile that it will receive needs to be updated with the relevant settings. To do so, edit the host profile, and navigate to the ‘System Image Cache Profile Settings’ section, and change the drop-down menu to ‘Enable stateless caching on the host’

Stateless caching can be seen in the below diagram

What is a Stateful Install?

It is also possible to have AutoDeploy install ESXi. When the host first boots it will pull the image from the AutoDeploy server, then on all subsequent restarts the host will boot from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.

AutoDeploy stateful installs are configured in the same way as stateless caching. Edit the host profile, this time changing the option to ‘Enable stateful installs on the host’:

AutoDeploy Architecture

Pre-requisites

A vSphere Auto Deploy infrastructure will contain the below components

  • vSphere vCenter Server – vSphere 6.7U1 is the best and most comprehensive option to date.
  • A DHCP server to assign IP addresses and TFTP details to hosts on boot up – Windows Server DHCP will do just fine.
  • A TFTP server to serve the iPXE boot loader
  • An ESXi offline bundle image – Download from my.vmware.com.
  • A host profile to configure and customize provisioned hosts – Use the vSphere Web Client.
  • ESXi hosts with PXE enabled network cards 

1. VMware AutoDeploy Server

  • Serves images and host profiles to ESXi hosts.
  • The vSphere Auto Deploy rules engine sends information to the vSphere Auto Deploy server about which image profile and which host profile to serve to which host. Administrators use vSphere Auto Deploy to define the rules that assign image profiles and host profiles to hosts

2. Image Profile Server

Define the set of VIBs to boot ESXi hosts with.

  • VMware and VMware partners make image profiles and VIBs available in public depots. Use vSphere ESXi Image Builder to examine the depot and use the vSphere Auto Deploy rules engine to specify which image profile to assign to which host.
  • VMware customers can create a custom image profile based on the public image profiles and VIBs in the depot and apply that image profile to the host

3. Host Profiles

Define machine-specific configuration such as networking or storage setup. Use the host profile UI to create host profiles. You can create a host profile for a reference host and apply that host profile to other hosts in your environment for a consistent configuration

4. Host customization

Stores information that the user provides when host profiles are applied to the host. Host customization might contain an IP address or other information that the user supplied for that host. For more information about host customizations, see the vSphere Host Profiles documentation.

Host customization was called answer file in earlier releases of vSphere Auto Deploy

5. Rules and Rule Sets

Rules

Rules can assign image profiles and host profiles to a set of hosts, or specify the location (folder or cluster) of a host on the target vCenter Server system. A rule can identify target hosts by boot MAC address, SMBIOS information, BIOS UUID, Vendor, Model, or fixed DHCP IP address. In most cases, rules apply to multiple hosts. You create rules by using the vSphere Client or vSphere Auto Deploy cmdlets in a PowerCLI session. After you create a rule, you must add it to a rule set. Only two rule sets, the active rule set and the working rule set, are supported. A rule can belong to both sets, which is the default, or only to the working rule set. After you add a rule to a rule set, you can no longer change the rule. Instead, you copy the rule and replace items or patterns in the copy. If you are managing vSphere Auto Deploy with the vSphere Client, you can edit a rule if it is in an inactive state

You can specify the following parameters in a rule.

Active Rule Set

When a newly started host contacts the vSphere Auto Deploy server with a request for an image profile, the vSphere Auto Deploy server checks the active rule set for matching rules. The image profile, host profile, vCenter Server inventory location, and script object that are mapped by matching rules are then used to boot the host. If more than one item of the same type is mapped by the rules, the vSphere Auto Deploy server uses the item that is first in the rule set.

Working Rule Set

The working rule set allows you to test changes to rules before making the changes active. For example, you can use vSphere Auto Deploy cmdlets for testing compliance with the working rule set. The test verifies that hosts managed by a vCenter Server system are following the rules in the working rule set. By default, cmdlets add the rule to the working rule set and activate the rules. Use the NoActivate parameter to add a rule only to the working rule set.

You use the following workflow with rules and rule sets.

  1. Make changes to the working rule set.
  2. Test the working rule set rules against a host to make sure that everything is working correctly.
  3. Refine and retest the rules in the working rule set.
  4. Activate the rules in the working rule set. If you add a rule in a PowerCLI session and do not specify the NoActivate parameter, all rules that are currently in the working rule set are activated. You cannot activate individual rules

AutoDeploy Boot Process

The boot process is different for hosts that have not yet been provisioned with vSphere Auto Deploy (first boot) and for hosts that have been provisioned with vSphere Auto Deploy and added to a vCenter Server system (subsequent boot).

First Boot Prerequisites

Before a first boot process, you must set up your system.

  • Set up a DHCP server that assigns an IP address to each host upon startup and that points the host to the TFTP server to download the iPXE boot loader from.
  • If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
  • Identify an image profile to be used in one of the following ways.
    • Choose an ESXi image profile in a public depot.
    • Create a custom image profile by using vSphere ESXi Image Builder, and place the image profile in a depot that the vSphere Auto Deploy server can access. The image profile must include a base ESXi VIB.
  • If you have a reference host in your environment, export the host profile of the reference host and define a rule that applies the host profile to one or more hosts.
  • Specify rules for the deployment of the host and add the rules to the active rule set.

First Boot Overview

When a host that has not yet been provisioned with vSphere Auto Deploy boots (first boot), the host interacts with several vSphere Auto Deploy components.

  1. When the administrator turns on a host, the host starts a PXE boot sequence. The DHCP Server assigns an IP address to the host and instructs the host to contact the TFTP server.
  2. The host contacts the TFTP server and downloads the iPXE file (executable boot loader) and an iPXE configuration file.
  3. iPXE starts executing. The configuration file instructs the host to make an HTTP boot request to the vSphere Auto Deploy server. The HTTP request includes hardware and network information.
  4. In response, the vSphere Auto Deploy server performs these tasks:
    1. Queries the rules engine for information about the host.
    2. Streams the components specified in the image profile, the optional host profile, and optional vCenter Server location information.
  5. The host boots using the image profile. If the vSphere Auto Deploy server provided a host profile, the host profile is applied to the host.
  6. vSphere Auto Deploy adds the host to the vCenter Server system that vSphere Auto Deploy is registered with.
    1. If a rule specifies a target folder or cluster on the vCenter Server system, the host is placed in that folder or cluster. The target folder must be under a data center.
    2. If no rule exists that specifies a vCenter Server inventory location, vSphere Auto Deploy adds the host to the first datacenter displayed in the vSphere Client UI.
  7. If the host profile requires the user to specify certain information, such as a static IP address, the host is placed in maintenance mode when the host is added to the vCenter Server system. You must reapply the host profile and update the host customization to have the host exit maintenance mode. When you update the host customization, answer any questions when prompted.
  8. If the host is part of a DRS cluster, virtual machines from other hosts might be migrated to the host after the host has successfully been added to the vCenter Server system.

Subsequent Boots Without Updates

For hosts that are provisioned with vSphere Auto Deploy and managed by a vCenter Server system, subsequent boots can become completely automatic.

  1. The administrator reboots the host.
  2. As the host boots up, vSphere Auto Deploy provisions the host with its image profile and host profile.
  3. Virtual machines are brought up or migrated to the host based on the settings of the host.
    • Standalone host. Virtual machines are powered on according to autostart rules defined on the host.
    • DRS cluster host. Virtual machines that were successfully migrated to other hosts stay there. Virtual machines for which no host had enough resources are registered to the rebooted host.

If the vCenter Server system is unavailable, the host contacts the vSphere Auto Deploy server and is provisioned with an image profile. The host continues to contact the vSphere Auto Deploy server until vSphere Auto Deploy reconnects to the vCenter Server system.

vSphere Auto Deploy cannot set up vSphere distributed switches if vCenter Server is unavailable, and virtual machines are assigned to hosts only if they participate in an HA cluster. Until the host is reconnected to vCenter Server and the host profile is applied, the switch cannot be created. Because the host is in maintenance mode, virtual machines cannot start.

Important: Any hosts that are set up to require user input are placed in maintenance mode

Subsequent Boots With Updates

You can change the image profile, host profile, vCenter Server location, or script bundle for hosts. The process includes changing rules and testing and repairing the host’s rule compliance.

  1. The administrator uses the Copy-DeployRule PowerCLI cmdlet to copy and edit one or more rules and updates the rule set.
  2. The administrator runs the Test-DeployRuleSetCompliance cmdlet to check whether each host is using the information that the current rule set specifies.
  3. The host returns a PowerCLI object that encapsulates compliance information.
  4. The administrator runs the Repair-DeployRuleSetCompliance cmdlet to update the image profile, host profile, or vCenter Server location the vCenter Server system stores for each host.
  5. When the host reboots, it uses the updated image profile, host profile, vCenter Server location, or script bundle for the host. If the host profile is set up to request user input, the host is placed in maintenance mode

Note: Do not change the boot configuration parameters to avoid problems with your distributed switch

Prepare your system for AutoDeploy

Before you can PXE boot an ESXi host with vSphere Auto Deploy, you must install prerequisite software and set up the DHCP and TFTP servers that vSphere Auto Deploy interacts with.

Prerequisites

  • Verify that the hosts that you plan to provision with vSphere Auto Deploy meet the hardware requirements for ESXi. See ESXi Hardware Requirements.
  • Verify that the ESXi hosts have network connectivity to vCenter Server and that all port requirements are met. See vCenter Server Upgrade.
  • Verify that you have a TFTP server and a DHCP server in your environment to send files and assign network addresses to the ESXi hosts that Auto Deploy provisions.
  • Verify that the ESXi hosts have network connectivity to DHCP, TFTP, and vSphere Auto Deploy servers.
  • If you want to use VLANs in your vSphere Auto Deploy environment, you must set up the end to end networking properly. When the host is PXE booting, the firmware driver must be set up to tag the frames with proper VLAN IDs. You must do this set up manually by making the correct changes in the UEFI/BIOS interface. You must also correctly configure the ESXi port groups with the correct VLAN IDs. Ask your network administrator how VLAN IDs are used in your environment.
  • Verify that you have enough storage for the vSphere Auto Deploy repository. The vSphere Auto Deploy server uses the repository to store data it needs, including the rules and rule sets you create and the VIBs and image profiles that you specify in your rules. Best practice is to allocate 2 GB to have enough room for four image profiles and some extra space. Each image profile requires approximately 350 MB. Determine how much space to reserve for the vSphere Auto Deploy repository by considering how many image profiles you expect to use.
  • Obtain administrative privileges to the DHCP server that manages the network segment you want to boot from. You can use a DHCP server already in your environment, or install a DHCP server. For your vSphere Auto Deploy setup, replace the gpxelinux.0 filename with snponly64.efi.vmw-hardwired for UEFI or undionly.kpxe.vmw-hardwired for BIOS.
  • Secure your network as you would for any other PXE-based deployment method. vSphere Auto Deploy transfers data over SSL to prevent casual interference and snooping. However, the authenticity of the client or the vSphere Auto Deploy server is not checked during a PXE boot.
  • If you want to manage vSphere Auto Deploy with PowerCLI cmdlets, verify that Microsoft .NET Framework 4.5 or 4.5.x and Windows PowerShell 3.0 or 4.0 are installed on a Windows machine. You can install PowerCLI on the Windows system on which vCenter Server is installed or on a different Windows system. See the vSphere PowerCLI User’s Guide.
  • Set up a remote Syslog server. See the vCenter Server and Host Management documentation for Syslog server configuration information. Configure the first host you boot to use the remote Syslog server and apply that host’s host profile to all other target hosts. Optionally, install and use the vSphere Syslog Collector, a vCenter Server support tool that provides a unified architecture for system logging and enables network logging and combining of logs from multiple hosts.
  • Install ESXi Dump Collector, set up your first host so that all core dumps are directed to ESXi Dump Collector, and apply the host profile from that host to all other hosts.
  • If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.

Starting to configure AutoDeploy

Step 1 – Enable the AutoDeploy, Image Builder Service and Dump Collector Service

  • Install vCenter Server or deploy the vCenter Server Appliance. The vSphere Auto Deploy server is included with the management node.
  • Configure the vSphere Auto Deploy service startup type.
  • On the vSphere Web Client Home page, click Administration.
  • Under System Configuration, click Services
  • Select Auto Deploy, click the Actions menu, and select Edit Startup Type and select Automatic
  • (Optional) If you want to manage vSphere Auto Deploy with the vSphere Web Client, configure the vSphere ESXi Image Builder service startup type
  • Check the Startup
  • Log out of the vSphere Web Client and log in again. The Auto Deploy icon is visible on the Home page of the vSphere Web Client
  • Enable the Dump Collector
  • You can now either set the dump collector manually on each host or configure the host profile with the settings
  • If you want to enter it manually and point the dump collector to the vCenter, the following commands are used (a quick verification sketch follows this list)
  • esxcli system coredump network set --interface-name vmk0 --server-ipv4 10.242.217.11 --server-port 6500
  • esxcli system coredump network set --enable true
  • Enable Automatic Startup
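
A quick verification sketch, assuming the coredump settings above have been applied (the check option sends a test to the configured collector):

esxcli system coredump network get

esxcli system coredump network check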

Step 2 Configure the TFTP server

There are different options here. Some people use Solarwinds or there is the option now to use an inbuilt TFTP service on the vCenter

Important: The TFTP service in vCenter is only supported for dev and test environments, not production and will be coming out of future releases of vCenter. It is best to have a separate TFTP server.

Instructions

  • Now that Auto Deploy is enabled we can configure the TFTP server. Enable SSH on the VCSA by browsing to the Appliance Management page: https://VCSA:5480 where VCSA is the IP or FQDN of your appliance.
  • Log in as the root account. From the Access page enable SSH Login and Bash Shell.
  • SSH onto the vCenter Appliance, using a client such as Putty, and log in with the root account. First type shell and hit enter to launch Bash.
  • To start the TFTP service enter service atftpd start
  • Check the service is started using service atftpd status
  • To allow TFTP traffic through the firewall on port 69; we must run the following command. (Note double dashes in front of dport)
  • iptables -A port_filter -p udp -m udp --dport 69 -j ACCEPT
  • Validate traffic is being accepted over port 69 using the following command
  •  iptables -nL | grep 69
  • iptables can be found in /etc/systemd/scripts just for reference
  • Type chkconfig atftpd on
  • One way to make the iptables rules persistent is to load them after a reboot from a script.
  • Save the current active rules to a file

iptables-save > /etc/iptables.rules

  • Next create the below script and call it starttftp.sh
#! /bin/sh 
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd
### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO
service atftpd start
iptables-restore -c < /etc/iptables.rules
  • Put the starttftp.sh script in /etc/init.d via WinSCP
  • Put full permissions on the script (see the short sketch after this list)
  • This should execute the command and reload the firewall tables after the system is rebooted
  • Reboot the vCenter appliance to test the script is running. If successful the atftpd service will be started and port 69 allowed, you can check these with service atftpd status and iptables -nL | grep 69.
  • Your TFTP directory is located at /tftpboot/
  • The TRAMP file on the vCenter must also now be modified and the DNS name removed and replaced with the IP address of the vCenter. Auto Deploy will not work without doing this part
  • The directory already contains the necessary files for Auto Deploy (tramp file, undionly.kpxe.vmw-hardwired, etc) Normally if you use Solarwinds TFTP server, you would need to download the TFTP Boot Zip and extract the files into the TFTP Root folder
  • Note there may be an issue with downloading this file due to security restrictions being enabled by some of the well known browsers – This is the likely message seen below. You may have to modify a browser setting in order to access the file
  • If everything is ok then you’ll be able to download it but note again, you do not need to download this if you are using the inbuilt TFTP server in vCenter as the files are already there.
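
Going back to the starttftp.sh script mentioned above, if you would rather do the copy and permissions from the shell than WinSCP, a minimal sketch (assuming you created starttftp.sh in /tmp on the appliance):

cp /tmp/starttftp.sh /etc/init.d/starttftp.sh

chmod 755 /etc/init.d/starttftp.sh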

Step 3 – Setting up DHCP options

  • The DHCP server assigns an IP address to the ESXi host when the host boots. The DHCP server also provides two required options to point the host to the TFTP server and to the boot files necessary for vSphere Auto Deploy to work. These additional DHCP options are as follows:
  • 066 – Boot Server Host Name – This option must be enabled, and the IP address of the server running TFTP should be inserted into the data entry field.
  • 067 – Bootfile Name – The “BIOS DHCP File Name” found in the vSphere Auto Deploy settings of your vCenter Server must be used here. The file name is undionly.kpxe.vmw-hardwired.
  • Go to Server Options and click Configure Options

  • In the value for option 066 (next-server) enter the FQDN of the TFTP boot server. In my case my vCenter Server hosting the TFTP service
  • Select option 67 and type in undionly.kpxe.vmw-hardwired. The undionly.kpxe.vmw-hardwired iPXE binary will be used to boot the ESXi host
  • Note: if you were using UEFI, you would need to put snponly64.efi.vmw-hardwired
  • You should now see the two options in DHCP
  • Next we need to add a scope and reservations to this scope
  • Right click IPv4 and select New Scope
  • A wizard will pop up
  • Put in a name and description
  • Put in the network IP range and subnet mask for the scope. Note: I have 3 hosts for testing.
  • Ignore the next screen and click Next
  • Ignore the next screen and click Next
  • Click No to configure options afterwards
  • Click Finish
  • We now need to create a DHCP reservation for each target ESXi host
  • In the DHCP window, navigate to DHCP > hostname > IPv4 > Autodeploy Scope > Reservations.
  • Right-click Reservations and select New Reservation.
  • In the New Reservation window, specify a name, IP address, and the MAC address for one of the hosts. Do not include the colon (:) in the MAC address.
  • The initial installation and setup is now finished and we can now start with the next stage

Stage 4 Image Builder and AutoDeploy GUI

  • The next stage involves logging into myvmware.com and downloading an offline bundle of the version of ESXi you need
  • Go to Home > Autodeploy in vCenter and select Add a Software Depot
  • Click Software Depots and then click Import Software Depot and upload. Space for around 4 images is normally recommended.
  • Once uploaded, click on the depot and you should see the below
  • And
  • If you click on an image, you get options above where you can clone or export to an iso for example

Stage 5 – Creating a Deploy Rule

  • A deploy rule gives you control over the deployment process since you can specify which image profile is rolled out and on which servers. Once a rule is created, you can also Edit or Clone it, and it has to be activated for it to apply. If rules are not activated, Auto Deploy will fail
  • Click on the Deploy Rules tab and add a name
  • Next we want to select hosts that match the following pattern. There are multiple options
  • Asset
  • Domain
  • Gateway IPv4
  • Hostname
  • IPv4
  • IPv6
  • MAC address
  • Model
  • OEM string
  • Serial number
  • UUID
  • Vendor
  • I am going to use an IP range of my 3 hosts which is 192.168.1.100-192.168.1.102
  • Next Select an Image Profile
  • Select the ESXi image to deploy to the hosts, change the software depot from the drop down menu if needed, then click Next. If you have any issues with vib signatures you can skip the signature checks using the tick box.
  • Host Profile selection screen
  • Next Select a location
  • Next you will be on the Ready to Complete screen. Check the details and click Finish if you are happy
  • Note: The rule will be inactive – To use it, you will need to activate it but we will cover this in the next steps
  • The deploy rule is created but in an inactive state. Select the deploy rule and note the options: Activate / Deactivate, Clone, Edit, Delete. Click Activate / Deactivate and a new window will open. Select the newly created deploy rule and click Activate, then Next, and Finish.
  • Now the deploy rule is activated; when you boot a host where the deploy rule is applicable you will see it load ESXi and the customization specified in the host profile. Deploy rules need to be deactivated before they can be edited.
  • You can setup multiple deploy rules using different images for different clusters or host variables. Hosts using an Auto Deploy ruleset are listed in the Deployed Hosts tab, hosts that didn’t match any deploy rules are listed under Discovered Hosts

Stage 6 – Reboot the ESXi host and see if the AutoDeploy deployment works as expected.

  • When you reboot a host, it will then come up as per the below screenprint
  • Once booted up, remediate the host
  • If you type in the following URL – https://<vCenter IP>:6502/vmw/rbd, it should take you to the Auto Deploy Debugging page where you can view registered hosts along with a detailed view of host and PXE information as well as the Auto Deploy Cache content
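
If you would rather check that endpoint from a shell than a browser, a quick example using curl (-k skips certificate validation; replace <vCenter IP> as before):

curl -k https://<vCenter IP>:6502/vmw/rbd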

What do you do when you need to modify the Image Profile or Host Profile?

There are 2 commands you need to run to ensure the hosts can pick up the new data from the AutoDeploy rule, whether it be a new image or a new/modified host profile. If you don’t run these, you will likely find that when you reboot your vSphere hosts they still boot from the old image.

Test-DeployRuleSetCompliance <server-name>

Test-DeployRuleSetCompliance <server-name> | Repair-DeployRuleSetCompliance

This situation occurs when you update the active ruleset without updating the corresponding host entries in the auto deploy cache.  The first time a host boots the Auto Deploy server parses the host attributes against the active ruleset to determine (a) The image profile, (b) The host profile, and (c) the location of the host in the vCenter inventory.  This information then gets saved in the auto deploy cache and reused on all future reboots.  The strategy behind saving this information is to reduce the load on the auto deploy server by eliminating the need to parse each host against the rules engine on every reboot.  With this approach each host only gets parsed against the active ruleset once (on the initial boot) after which the results  get saved and reused on all subsequent reboots.

However, you will sometimes make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location. When you make such changes, not only do you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet.

Use the “Test-DeployRuleSetCompliance” cmdlet to check if the host information saved on the auto deploy server is up-to-date.  This cmdlet parses the host attributes against the active ruleset and compares the results with the information saved in the cache.  If the saved information is incorrect (i.e. out of compliance) the cmdlet will return a status of “Non-Compliant” and show what needs to be updated.  If the information in the cache is correct, then the command will simply return an empty string.

Thanks to Kyle Gleed for his blog on the above

https://blogs.vmware.com/vsphere/2012/11/auto-deploy-host-booting-from-wrong-image-profile.html

Steps to test the DeployRuleSetCompliance

  • Connect to vCenter through Putty
  • In order to check one host, we can use Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local. It will tell us it is non-compliant
  • In order to repair a single host to do a test we can use the below piped command. If you get an empty string back then the cache is correct and ready to use the new image
  • Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local | Repair-DeployRuleSetCompliance
  • However, if we want to be clever about this because we have a lot of hosts, then we can run a quick simple PowerCLI “foreach” loop so we don’t have to update one host at a time
  • foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}
  • At this point, I would now start the TFTP service on the vCenter. Note: If you are using Solarwinds, this is not necessary, unless you want to double check it is all ok first.
  • Next, reboot the hosts and check they come up as the right version; an example of our environment below, pre and post remediation

Other issues we faced!

Issue 1 – TFTP Service on the vCenter

We used the TFTP service which was inbuilt to the vCenter. What you will find if you use this is that it will start but then automatically stop itself after a while, which is fine; it’s just a case of remembering to start it. I found that with our HPE hosts, even after modifying the AutoDeploy rule and running the Test-Deploy and Repair-Deploy cmdlets, a host was still booting from cache. In the ILO screen, you could see it picking up a DHCP address and the DHCP service passing the TFTP server to the host, but then it timed out. Once the service was started on the vCenter it was fine.

service atftpd start

service atftpd status
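
As a stopgap (an untested assumption on my part rather than anything official), a root cron entry along these lines would re-start atftpd every few minutes whenever the status check reports it as stopped:

*/5 * * * * service atftpd status || service atftpd start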

Note: Apparently VMware do not support the inbuilt vCenter Service so when we asked how we could keep the service running, we were told they wouldn’t help with it. So probably best to install something like Solarwinds which will keep the service running continuously.

Issue 2 – HPE Oneview Setting for PXE Boot

We found that with HPE BL460 Blades with SSD cards in, sometimes an empty host would boot up and lock a partition. This resulted in the host profile failing to apply, settings all over the place, and there was absolutely no way of getting around it. We could only resolve it by using gparted to wipe the disk and boot again. There seemed to be no logic to it though, as 5 out of 10 fresh hosts would boot up fine and 5 would not and would lock the partition.

This is what you would see if you hover over the error in vCenter