Patch build 16324942 for ESXi 7.0 was released on June 23rd 2020. It raises ESXi 7.0 GA to ESXi 7.0b. As usual I’m patching my homelab systems ASAP. As all hosts are fully compliant with the HCL, I chose a fully automated cluster remediation by vSphere Lifecycle Manager (vLCM).
7.0 GA build 15843807 (before) / 7.0b build 16324942 (after)
During the host reboot I noticed a temperature warning LED on the chassis. A look into IPMI revealed a critical CPU temperature state. The fans responsible for CPU airflow were also running at maximum speed.
As you can see, system temperature was moderate and fans usually run at low to medium speed under these conditions. Air intake temperature was 25°C.
My ESXi nodes rebooted with the new build 16324942 and there were no errors in vLCM. But I could hear that something was wrong. A fan running at over 8000 RPM will tell you there IS something to look into. The boot procedure also took much longer than usual.
I quickly shut down the whole cluster in order to avoid a core meltdown.
None of the issues above fit the problem I observed. A good starting point should be a look into vua.log on the affected host.
Unfortunately that didn’t help either. So we had (again) a closer look at the VMware upgrade path matrix. A direct host upgrade from ESXi 6.0 to ESXi 6.7U3 is supported, but while we re-checked the matrix, our attention was drawn to a little footnote.
KB 76555 says there’s an issue with expired VIB certificates on hosts below a specific build number.
ESXi 6.0 GA before build 9239799
ESXi 6.5 GA before build 8294253
In fact, our ESXi 6.0 host had a build level of 7967664 (U3e), which is in the critical range. So we had to install patches up to July 2018 (ESXi600-201807001) first. After that, the upgrade to ESXi 6.7U3 went flawlessly.
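The build check described above is easy to script before an upgrade window. The following is only a minimal sketch: the two thresholds are the ones quoted from KB 76555 above, while the function name and the dictionary are made up for illustration and are not part of any VMware tool.

```python
# Minimum builds as quoted from KB 76555 in the text above.
# Hosts below these builds are in the critical range.
MIN_SAFE_BUILD = {
    "6.0": 9239799,
    "6.5": 8294253,
}


def needs_patch_first(version: str, build: int) -> bool:
    """Return True if the host has to be patched before the upgrade."""
    threshold = MIN_SAFE_BUILD.get(version)
    if threshold is None:
        # Release is not covered by the two lines quoted above.
        raise ValueError(f"no threshold known for ESXi {version}")
    return build < threshold


# Our host: ESXi 6.0 U3e, build 7967664
print(needs_patch_first("6.0", 7967664))  # → True
```

Always take the thresholds from the current version of the KB article, as it may be updated again.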
What went wrong?
Of course we checked the matrix during the planning phase in early March 2020. That’s standard operating procedure. Unfortunately something changed in the meantime (the footnote was added). KB 76555 was updated in May 2020, and the issue affects upgrades to versions of ESXi 6.7 released after April 28th 2020.
Take-home message: Check your design and matrices again right before the project starts.
There have been issues with the VMware network driver igbn, which is responsible for Intel 82580, I210, I350, and I354 Gigabit Ethernet Controllers. Under certain conditions this can lead to a PSOD, which makes it a critical issue for all hosts with one of the Ethernet controllers mentioned above.
Currently there’s no VMware patch to solve the problem. It is recommended to replace the VMware driver with a newer version (1.4.10) of Intel’s native driver.
If we start SSH service on the host, we can check the installed igbn version.
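On the host, `esxcli software vib list` prints the installed VIBs with their versions. If you want to check several hosts, the output can be evaluated with a few lines of Python. This is just a sketch under assumptions: it expects the VIB name in the first column and the version in the second, the sample text is made up for illustration (not captured from a real host), and both helper functions are hypothetical names of my own.

```python
from typing import Optional, Tuple

RECOMMENDED = (1, 4, 10)  # driver version recommended in the text above


def find_vib_version(vib_list_output: str, name: str) -> Optional[str]:
    """Return the version column for the VIB called `name`, or None."""
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Assumption: first column is the VIB name, second is the version.
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


def driver_tuple(version: str) -> Tuple[int, ...]:
    """Turn a string like '1.4.10-1OEM...' into (1, 4, 10) for comparison."""
    driver_part = version.split("-", 1)[0]
    return tuple(int(part) for part in driver_part.split("."))


# Illustrative sample only, NOT real output from a host.
SAMPLE = """\
Name   Version                  Vendor  Acceptance Level  Install Date
-----  -----------------------  ------  ----------------  ------------
igbn   0.1.1.0-4vmw.670.0.0.1   VMW     VMwareCertified   2019-01-01
"""

installed = find_vib_version(SAMPLE, "igbn")
if installed is not None:
    older = driver_tuple(installed) < RECOMMENDED
    print(f"igbn {installed} older than recommended: {older}")
```

Only the part before the first hyphen is compared, because the rest of the string describes the packaging and build rather than the driver version.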
First we have to download the driver package from VMware (login required) and extract the archive. It contains documentation with release notes and an update guide, a VMware Installation Bundle (VIB) and an offline bundle (ZIP). While it is possible to install the VIB from a command shell on an ESXi host, it is more convenient to use VMware Update Manager (VUM). The latter is the procedure I will explain here.
Open vSphere-Client and go to Menu > Update Manager. If you’re not running vSphere 6.7 U1 or later, you’ll have to use the infamous Web-Client (Flash-Client). Select Updates and click on “Upload from File”.
Select the extracted ZIP File (Offline Bundle). Just to avoid some confusion: The file you’ve downloaded from VMware is a ZIP-archive. Extract it once. Within that archive there’s another ZIP-archive. Do not extract that one! From the dialogue we select that ‘inner’ ZIP-file for upload to VUM.
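If you prefer to script the "extract once, but only once" step, Python’s standard `zipfile` module can do it. The sketch below extracts the downloaded archive and returns the paths of the ZIP files found inside, which are left untouched. The function name is my own, and the layout of the download (one or more inner ZIP files) is an assumption based on the description above.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_outer_only(download: Path, target_dir: Path) -> List[Path]:
    """Extract the downloaded archive once and return the inner ZIP files.

    The inner ZIP files (offline bundles) are NOT extracted.
    """
    with zipfile.ZipFile(download) as outer:
        outer.extractall(target_dir)
        # Collect every archive member that is itself a ZIP file.
        inner = [n for n in outer.namelist() if n.lower().endswith(".zip")]
    return [target_dir / name for name in inner]
```

Call it with the path of the downloaded file and an empty target folder. The returned path is the file to select in the upload dialogue.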