Read HBA Driver and Firmware Version

Before upgrading an ESXi host, it is best practice to consult the VMware HCL and check the compatibility of the host and its IO devices. The combination of driver version, firmware version and ESXi release is crucial for compatibility. Even minor updates might lead to loss of HCL compatibility. A system that was HCL compliant at the time of deployment might no longer be compatible after, for example, the third ESXi update release. Updates can bring new driver versions, which in turn might require higher firmware versions.
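To make the idea of a "combination" concrete, here is a minimal Python sketch of such a check. Everything in it is an assumption for illustration: the table entries are made-up placeholders and not real HCL data, the names `HclEntry`, `HCL` and `is_supported` are hypothetical, and modelling firmware as a minimum version per driver is a simplification of what the real HCL lists.

```python
from typing import NamedTuple, Tuple


class HclEntry(NamedTuple):
    esxi_release: str               # e.g. "6.7 U1"
    driver: str                     # driver version string
    min_firmware: Tuple[int, ...]   # lowest firmware assumed to be supported


# Placeholder data, NOT taken from the VMware HCL.
HCL = [
    HclEntry("6.7 U1", "1.0.0", (1, 2, 0)),
    HclEntry("6.7 U3", "1.1.0", (1, 4, 0)),
]


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_supported(esxi_release: str, driver: str, firmware: str, hcl=HCL) -> bool:
    """True if release and driver match an entry and the firmware is new enough."""
    fw = parse_version(firmware)
    return any(
        entry.esxi_release == esxi_release
        and entry.driver == driver
        and fw >= entry.min_firmware
        for entry in hcl
    )
```

With this placeholder table, firmware 1.2.0 passes for the first entry but fails once the host moves to the later release with its newer driver, which is exactly the situation described above.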

If you’re lucky you may have a software solution that keeps track of all your firmware and driver versions. Runecast Analyzer for example does a pretty good job and shows you current compatibility issues with a single click. Furthermore you can simulate updates/upgrades to any higher vSphere version and the resulting HCL status.

Unfortunately, many customers do not have a software solution like that. In these cases you need to go back to the roots (literally) and gather all the information on the ESXi shell. To do so, you need to enable the SSH service on all hosts you want to verify. That can be done in the vSphere Client or, more elegantly and faster, with a PowerCLI command.

18. July 2019 / 21. July 2019

vSwitch rescue from the CLI

Virtual Distributed Switches have many advantages over standard switches. Because you have a centralized configuration across all hosts, they're less prone to configuration errors than standard switches. Call me old-fashioned, but I prefer to have at least the host's management interface on a standard switch. In case something bad happens, you can still access the host and make changes on the interface.

Recently a customer's host had failed. After the configuration was restored, for some reason the vmnics were swapped between vdSwitches, and it wasn't possible to configure that host with either the host client or vCenter. The customer had been short on vmnics in the past and had configured the Management Network on a distributed portgroup on a distributed vSwitch. This is legal and usually not a problem. In that special case it was a problem. I was literally locked out of the host. Reassigning NICs in the DCUI didn't work, because they were all claimed by distributed vSwitches and thus not available for standard switches.

What now?

There’s help, but you need to access the CLI from the DCUI.
Log in to the DCUI console and select “Troubleshooting Options” in the main menu.

26. February 2019

ESXi host restore with obstacles

Unable to re-join EVC cluster after restore of ESXi system

Changing the boot media of ESXi hosts has (unfortunately) become a routine job, because many flash media have a limited lifespan. To be fair, I need to point out that many customers use (cheap and dirty) USB flash sticks as boot media. But what works well in a homelab turns out to be a bad idea in enterprise environments.

The usual procedure for media replacement is fairly simple:

export host configuration
evacuate and shut down host
prepare fresh boot medium with installation ISO that has the same or lower patchlevel as the old installation
boot freshly installed host
apply (intermediate) IP address if no DHCP available
restore host configuration
re-connect to cluster
apply patches if necessary
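
The one rule in this list that is easy to get wrong is the patch level of the installation ISO. A minimal Python sketch of that check follows. It assumes that the patch level can be compared by build number, which is my simplification, and the function name `iso_is_usable` and the build numbers in the usage lines are hypothetical.

```python
def iso_is_usable(iso_build: int, old_install_build: int) -> bool:
    """Return True if the ISO has the same or a lower patch level than the old install.

    Assumption: a higher build number means a higher patch level.
    """
    return iso_build <= old_install_build


# Hypothetical build numbers, for illustration only.
old_build = 2000
for iso_build in (1500, 2000, 2500):
    verdict = "ok" if iso_is_usable(iso_build, old_build) else "too new"
    print(iso_build, verdict)
```

An ISO that is too new would be rejected by this check before any boot medium is prepared.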

So far so good. But last week I had a nasty experience with a recovered ESXi host.