vSwitch rescue from the CLI

Virtual Distributed Switches have many advantages over standard switches. Because their configuration is centralized across all hosts, they’re less prone to configuration errors than standard switches. Call me old-fashioned, but I prefer to keep at least the host’s management interface on a standard switch. If something bad happens, you can still access the host and make changes on the interface.

Recently a customer’s host failed. After the configuration was restored, for some reason the vmnics were swapped between the distributed vSwitches, and it was no longer possible to configure that host with either the Host Client or vCenter. The customer had been short on vmnics in the past and had configured the Management Network on a distributed port group on a distributed vSwitch. This is a valid configuration and usually not a problem. In this particular case it was: I was literally locked out of the host. Reassigning NICs in the DCUI didn’t work, because they were all claimed by distributed vSwitches and thus not available for standard switches.

What now?

There’s help, but you need to access the CLI from the DCUI.
Log in to the DCUI console and select “Troubleshooting Options” in the main menu.

Continue reading “vSwitch rescue from the CLI”

Automatic Segmentation of VDI Endpoints

Automatic VLAN assignment and use of DHCP relays

Software-defined datacenters (SDDC) enable us to keep many components within the hypervisor’s software layer. But sooner or later we need to leave that layer in order to reach the user. Usually thin clients or zero clients are used as VDI endpoints. Those hardware boxes are connected to the LAN and need an IP address.

I will demonstrate how to separate endpoints into subnet segments and VLANs and still assign their IP addresses from a centralized DHCP server.
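To make the role of the DHCP relay more concrete, here is a minimal sketch of how a central DHCP server can pick the right address scope for a relayed request. A relay agent writes its own interface address into the giaddr field of the forwarded request, and the server chooses the scope whose subnet contains that address. The VLAN IDs, subnets and the function name `select_scope` are hypothetical examples, not taken from a real setup.

```python
import ipaddress

# Hypothetical scopes: one subnet per endpoint VLAN (example values only).
SCOPES = {
    "VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
    "VLAN 30": ipaddress.ip_network("10.0.30.0/24"),
}


def select_scope(giaddr, scopes=SCOPES):
    """Return the name of the scope whose subnet contains the relay address.

    giaddr is the relay agent address found in the relayed DHCP request.
    Returns None if no configured scope matches.
    """
    relay = ipaddress.ip_address(giaddr)
    for name, network in scopes.items():
        if relay in network:
            return name
    return None


if __name__ == "__main__":
    # A request relayed by the gateway interface of the second segment.
    print(select_scope("10.0.20.1"))
    # A relay address that belongs to no configured scope.
    print(select_scope("192.168.1.1"))
```

Because the server only looks at the relay address, an endpoint gets an address from the segment it is physically plugged into, without any per-client configuration on the DHCP server.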

Continue reading “Automatic Segmentation of VDI Endpoints”

ESX physical uplink resiliency (part 2)

What is beacon probing?

In my recent blog article “ESX physical uplink resiliency” I discussed countermeasures to harden vSphere traffic against downstream physical failures. Today I will discuss another failover detection method, one that can handle uplinks that are not yet dead but not functional either.

Reasons for failure can be driver / firmware related errors on the NIC itself, or a broken downstream path (cable / switch).

Beacon probing

Beacon probing is a mechanism in which an ESX host sends out beacon packets over every uplink port once per second to verify that each of the other uplinks is reachable.
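The idea can be illustrated with a small simulation. This is a simplified model of the principle, not ESXi’s actual implementation: every uplink sends a beacon, we record which beacons arrive on which other uplinks, and an uplink that neither hears any beacon nor is heard by anyone is treated as suspect. The function names `simulate_beacons` and `suspect_uplinks` are made up for this sketch.

```python
def simulate_beacons(uplinks, broken):
    """Return the set of (sender, receiver) pairs whose beacon arrived.

    A beacon arrives only if neither the sending nor the receiving
    uplink is in the 'broken' set (simplified model).
    """
    return {
        (sender, receiver)
        for sender in uplinks
        for receiver in uplinks
        if sender != receiver
        and sender not in broken
        and receiver not in broken
    }


def suspect_uplinks(uplinks, received):
    """Return uplinks that exchanged no beacons with any other uplink."""
    suspects = []
    for nic in uplinks:
        others = [o for o in uplinks if o != nic]
        heard_any = any((o, nic) in received for o in others)
        was_heard = any((nic, o) in received for o in others)
        if not heard_any and not was_heard:
            suspects.append(nic)
    return suspects


if __name__ == "__main__":
    three = ["vmnic0", "vmnic1", "vmnic2"]
    print(suspect_uplinks(three, simulate_beacons(three, {"vmnic2"})))

    two = ["vmnic0", "vmnic1"]
    print(suspect_uplinks(two, simulate_beacons(two, {"vmnic1"})))
```

With three uplinks the model singles out the broken one. With only two uplinks both end up as suspects, because neither side sees the other’s beacons and the host cannot tell which of them is at fault. This is why beacon probing is usually recommended with at least three uplinks.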

Continue reading “ESX physical uplink resiliency (part 2)”

Problem with Emulex OneConnect OCe14000 NIC LoM

Troubleshooting driver, firmware and ESXi version combinations

Hardware failures in vSphere clusters normally aren’t a big issue. Almost every component is redundant in one way or another. If one component fails, another one will jump in and take over its function. A malfunction is a different thing and more serious than a failure. Such a zombie can become a real problem, because as long as there are signs of life, no replacement will jump in and there will be no failover.

I witnessed such a situation after a scheduled reboot of a Top-of-Rack (ToR) switch. An ESXi host that was connected to the switch with a 10 Gbit uplink started malfunctioning but didn’t fail.

As you can see in the picture, the link indicator and activity LEDs are active although the cable has been disconnected. A sure sign that something is wrong.

Continue reading “Problem with Emulex OneConnect OCe14000 NIC LoM”