ESX physical uplink resiliency (part 2)

What is beacon probing?

In my recent blog article “ESX physical uplink resiliency” I discussed countermeasures to harden vSphere traffic against downstream physical failures. Today I will discuss another failover detection method, one that can handle uplinks which are not dead yet but no longer functional.

Such failures can be caused by driver or firmware errors on the NIC itself, or by a broken downstream path (cable or switch).

Beacon probing

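Beacon probing is a mechanism in which an ESX host sends beacon packets out of every uplink port once per second and checks whether they arrive on the other uplinks. An uplink whose beacons do not reach the others is considered failed, even if its link state is still up.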
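As a rough illustration, beacon probing can be switched on for a standard vSwitch with PowerCLI. This is only a sketch: the host name esx01.example.com and the switch name vSwitch0 are placeholders that do not come from the article, and an existing Connect-VIServer session is assumed.

```powershell
# Sketch: enable beacon probing as failover detection on a standard vSwitch.
# 'esx01.example.com' and 'vSwitch0' are placeholder names (assumptions).
$vmhost  = Get-VMHost -Name 'esx01.example.com'
$vswitch = Get-VirtualSwitch -VMHost $vmhost -Name 'vSwitch0' -Standard

# Read the current teaming policy and change only the failover detection method.
Get-NicTeamingPolicy -VirtualSwitch $vswitch |
    Set-NicTeamingPolicy -NetworkFailoverDetectionPolicy BeaconProbing
```

Setting the value back to LinkStatus restores detection based on link state only.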


Problem with Emulex OneConnect OCe14000 NIC LoM

Troubleshooting driver, firmware and ESXi version combinations

Hardware failures in vSphere clusters normally aren’t a big issue. Almost every component is redundant in one way or another. If one component fails, another one will jump in and take over its function. A malfunction is a different thing and more serious than a failure. Such a zombie can become a real problem, because as long as there are signs of life, no replacement will jump in and there will be no failover.

I witnessed such a situation after a scheduled reboot of a Top-of-Rack (ToR) switch. An ESXi host that was connected to the switch with a 10 Gbit uplink started to malfunction but didn’t fail.

As you can see in the picture, the link indicator and activity LEDs are active although the cable has been disconnected, a sure sign that something is wrong.


Upgrade ESXi 6.5 with Fujitsu Custom Image

VIB Conflict

Custom images offer extended driver support for vendor-specific hardware or agents: you get drivers that are not included in a standard VMware (vanilla) image. However, upgrading a host with a customized image can cause trouble when existing driver packages are updated. There used to be a nasty bug with the lsiprovider package on Fujitsu ESXi 5.1 images. Another example was the “death by upgrade” bug (blog post in German) when upgrading a customized Fujitsu installation to ESXi 6.0. There are other examples from different vendors in the hall of shame.


Find VMs without tags

Check Backup-Tag SLA

VMware tags are a versatile tool to dynamically assign VMs to groups. One use case is leveraging VM tags to guarantee backup SLAs. In my case there’s a category named “Backup” which contains several backup SLA tags for weekly or daily backups.

One-liner

With PowerCLI you can find out quickly which VMs have no tags.

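# Connect to vCenter (myVC is a placeholder), then list VMs without any tag assignment
Connect-VIServer myVC
Get-VM | Where-Object { -not (Get-TagAssignment -Entity $_) }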

This query isn’t sufficient yet. It reports only VMs that have no tags at all, but we’d like to find VMs that have no tag from the category “Backup”. So we have to modify our query a little bit.

Get-VM | Where-Object { -not (Get-TagAssignment -Entity $_ -Category Backup) }

Replace “Backup” with the name of your own category.
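The one-liner calls Get-TagAssignment once per VM, which can be slow in larger inventories. A possible variant, shown here only as a sketch, fetches all assignments of the category in one call and then filters the VM list locally. It assumes an existing Connect-VIServer session; the variable names are made up for this example.

```powershell
# Sketch: find VMs without a tag from one category using a single tag lookup.
# Assumes an active Connect-VIServer session; variable names are illustrative.
$category = 'Backup'

# IDs of all entities that carry at least one tag from the category.
$taggedIds = @(Get-TagAssignment -Category $category |
    ForEach-Object { $_.Entity.Id })

# Keep only the VMs whose ID does not appear in that list.
Get-VM |
    Where-Object { $taggedIds -notcontains $_.Id } |
    Sort-Object Name |
    Select-Object Name, PowerState
```

The result is a list of VM names and power states that could be piped to Export-Csv for a regular SLA report.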