Problem with Emulex OneConnect OCe14000 NIC LoM

Troubleshooting driver, firmware and ESXi version combinations

Hardware failures in vSphere clusters normally aren’t a big issue. Almost every component is redundant in one way or another. If one component fails, another one jumps in and takes over its function. A malfunction is a different thing and more serious than a failure. Such a zombie can become a real problem: as long as it shows signs of life, no replacement will jump in and there will be no failover.

I witnessed such a situation after a scheduled reboot of a Top-of-Rack (ToR) switch. An ESXi host that was connected to the switch with a 10 Gbit uplink started to malfunction but didn’t fail.

As you can see in the picture, the link indicator and activity LEDs are active although the cable has been disconnected. A sure sign that something is wrong.

Upgrade ESXi 6.5 with Fujitsu Custom Image

VIB Conflict

Host upgrades with custom images offer extended driver support for vendor-specific hardware or agents. You’ll get drivers that are not included in a standard VMware (vanilla) image. However, upgrading with customized images may lead to trouble when existing driver packages are updated. There used to be a nasty bug with the lsiprovider package on Fujitsu ESXi 5.1 images. Another example was the “death by upgrade” bug (blog post in German) when upgrading a customized Fujitsu installation to ESXi 6.0. There are other examples from different vendors in the hall of shame.

ESX physical uplink resiliency

Ensure vmnic uplink redundancy with Link State Tracking / Smart Links

A vSphere cluster is redundant in many aspects. The loss of one component should not lead to a loss of functionality. Therefore we build RAID sets from multiple disk drives and have redundant controllers in our storage systems, multiple paths, redundant LAN and SAN switches, and multiple uplinks from a host to the physical network.

VMware vSphere teams multiple physical NICs into one logical NIC in order to gain redundancy. This is crucial for VMkernel ports, which are responsible for vMotion, Management Network, FT, iSCSI and heartbeats.

But there are scenarios where all vmnics have a physical link, yet a path loss further along the way towards the core lets packets wander into a black hole.
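
To make the black-hole scenario a bit more tangible, here is a toy Python model. It is only a sketch under simplifying assumptions: it is neither switch nor ESXi code, every name in it (Uplink, pick_active, apply_link_state_tracking, traffic_delivered) is made up for illustration, and it assumes the host fails over purely on link status.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """One vmnic plus the ToR switch port it is plugged into (toy model)."""
    name: str
    host_link_up: bool = True      # link state the ESXi host sees on the vmnic
    upstream_path_up: bool = True  # the switch behind it still reaches the core


def pick_active(uplinks):
    """Link-status-only failover: use the first vmnic whose link is up."""
    for uplink in uplinks:
        if uplink.host_link_up:
            return uplink
    return None


def apply_link_state_tracking(uplinks):
    """Emulate Link State Tracking / Smart Links: a switch that has lost its
    upstream path shuts down its host-facing port, so the host sees link down."""
    for uplink in uplinks:
        if not uplink.upstream_path_up:
            uplink.host_link_up = False


def traffic_delivered(uplinks):
    """Packets arrive only if the active vmnic also has a working upstream path."""
    active = pick_active(uplinks)
    return active is not None and active.upstream_path_up


team = [Uplink("vmnic0"), Uplink("vmnic1")]
team[0].upstream_path_up = False  # path loss behind the first switch

# The host still sees link on vmnic0 and keeps using it: black hole.
print(pick_active(team).name, traffic_delivered(team))  # vmnic0 False

# With link state tracking the switch drops the host-facing link,
# so the team fails over to vmnic1.
apply_link_state_tracking(team)
print(pick_active(team).name, traffic_delivered(team))  # vmnic1 True
```

The point of the model: the host can only react to what it sees on its own port, so the upstream failure has to be propagated down to the vmnic before failover kicks in.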

We will now discuss some network architectures and how to work around the issue.

Backup and Restore of ESXi host configurations with PowerCLI

I’m a big fan of PowerCLI one-liners. 🙂

Before performing updates, upgrades or any other maintenance on ESXi hosts, you should back up your ESXi host configuration. Setting up a new ESXi host as a replacement is a no-brainer, but rebuilding a lost configuration can be a PITA and might take hours.

In the old days it was necessary to open an SSH session or to use vSphere CLI to issue backup commands to ESXi hosts. Recently I realized that there is a very handy PowerShell cmdlet to back up and restore the configuration.