Replace Intel igbn Driver

There have been issues with the VMware network driver igbn, which handles Intel 82580, I210, I350, and I354 Gigabit Ethernet controllers. Under certain conditions these issues can lead to a PSOD (purple screen of death), which makes this a critical issue for every host with one of the controllers mentioned above.

Currently there’s no VMware patch that solves the problem. The recommendation is to replace the VMware driver with a newer version (1.4.10) of Intel’s native driver.

After starting the SSH service on the host, we can check the installed igbn version:

esxcfg-module -i igbn 

esxcfg-module module information
input file: /usr/lib/vmware/vmkmod/igbn
License: ThirdParty:Intel Propietary
Version: 0.1.1.0-5vmw.670.3.73.14320388
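If you want to run this check on several hosts, the version string can also be compared in a script. The following POSIX shell sketch is my own illustration, not VMware tooling: it assumes the output format shown above (a "Version:" line) and compares only the dotted part before the first hyphen, so 0.1.1.0-5vmw.670.3.73.14320388 is treated as 0.1.1.0. The function names igbn_version and version_lt are hypothetical.

```shell
#!/bin/sh
# Sketch: read the igbn version from "esxcfg-module -i igbn" and check whether
# it is older than 1.4.10 (the Intel driver version recommended above).

# Print the value of the "Version:" line of esxcfg-module output on stdin.
igbn_version() {
  awk -F': *' '/^ *Version:/ { print $2; exit }'
}

# Succeed (exit 0) if dotted version $1 is lower than $2. Only the part before
# the first "-" is compared, e.g. 0.1.1.0 from 0.1.1.0-5vmw.670.3.73.14320388.
version_lt() {
  awk -v a="${1%%-*}" -v b="${2%%-*}" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0   # first differing field decides
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1                              # equal versions are not "lower"
  }'
}

# Only query the module when running on an ESXi host.
if command -v esxcfg-module >/dev/null 2>&1; then
  current=$(esxcfg-module -i igbn | igbn_version)
  if version_lt "$current" 1.4.10; then
    echo "igbn $current is older than 1.4.10, consider replacing it"
  else
    echo "igbn $current is 1.4.10 or newer"
  fi
fi
```

The fields are compared numerically rather than as strings, so 1.4.9 is correctly treated as older than 1.4.10.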

Updating the driver

First we have to download the driver package from VMware (login required) and extract the archive. It contains documentation with release notes and an update guide, a VMware Installation Bundle (VIB), and an offline bundle (ZIP). While it is possible to install the VIB from the ESXi host’s command shell, it is more convenient to use VMware Update Manager (VUM). The latter is the procedure I will explain here.
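For completeness, here is a rough sketch of the command-shell route. The bundle path and file name are hypothetical placeholders (copy the inner ZIP to a datastore first), and by default the script only prints the commands so you can review them before running it with DRY_RUN=0. The update guide shipped with the driver remains the authoritative procedure.

```shell
#!/bin/sh
# Sketch: install the Intel igbn offline bundle from the ESXi shell instead of
# VUM. BUNDLE is a hypothetical path; point it at the *inner* ZIP that you
# copied to a datastore. Commands are only printed unless DRY_RUN=0.
BUNDLE="${BUNDLE:-/vmfs/volumes/datastore1/igbn-offline-bundle.zip}"

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

install_igbn() {
  # Evacuate or power off the VMs first; the host must be in maintenance mode.
  run esxcli system maintenanceMode set --enable true
  # -d expects the absolute path of an offline bundle (depot ZIP).
  run esxcli software vib install -d "$BUNDLE"
  # The new driver is only loaded after a reboot.
  run reboot
}

install_igbn
```

After the reboot, "esxcli software vib list | grep igbn" should show the new version, and "esxcli system maintenanceMode set --enable false" takes the host out of maintenance mode again.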

Open the vSphere-Client and go to Menu > Update Manager. If you’re not running vSphere 6.7 U1 or later, you’ll have to use the infamous Web-Client (Flash-Client) instead. Select Updates and click “Upload from File”.

Select the extracted ZIP file (offline bundle). To avoid confusion: the file you’ve downloaded from VMware is a ZIP-archive. Extract it once. Inside that archive there’s another ZIP-archive. Do not extract that one! In the dialog, select that ‘inner’ ZIP-file for upload to VUM.

vSAN Health – vSAN Disk Balance

If you have joined the VMware Customer Experience Improvement Program (CEIP), you can use Skyline-Health in your cluster. In older versions of vSphere/vSAN these features were called vSphere-Health and vSAN-Health respectively; both have been renamed to Skyline-Health. You can access Skyline-Health in the vSphere-Client by navigating to Monitor > vSAN > Skyline-Health.

Today I saw a warning after powering up my homelab.

Drilling into the details showed that one of the 4 hosts had issued a warning: “Proactive rebalance is needed”.

Usually a vSAN cluster distributes load among its capacity disks automatically. For some reason that wasn’t the case in my homelab. Fortunately, you can click “Configure Automatic Rebalance” directly from Skyline-Health (see picture below).

You’ll be redirected to the vSAN cluster configuration. As you can see in the screenshot below, my cluster wasn’t configured for automatic rebalance.

Just move the slider, and vSAN automatically starts balancing the disks. A couple of minutes later the warning had turned green. Depending on the cluster load and how imbalanced the capacity disks are, this process might take a while.
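To get a feeling for what this warning measures, here is a deliberately simplified model in shell and awk. It is not how vSAN implements the check: I assume the health check compares the fullest and the emptiest capacity disk and that the rebalance threshold is 30%, which I believe is the default in recent releases (see KB 2149809 for the details of your version). The function name, disk names, and numbers are made up.

```shell
#!/bin/sh
# Simplified model of the "vSAN Disk Balance" check. Assumption: the check
# compares the fullest and the emptiest capacity disk and reports an
# imbalance once the difference exceeds a threshold (30 % assumed here).
# Input on stdin, one capacity disk per line: "<name> <used_GB> <size_GB>".
disk_imbalance() {
  awk -v t="${1:-30}" '
    NF == 3 && $3 > 0 {
      pct = 100 * $2 / $3            # fullness of this disk in percent
      if (n == 0 || pct > max) max = pct
      if (n == 0 || pct < min) min = pct
      n++
    }
    END {
      if (n == 0) exit 2             # no usable input lines
      diff = max - min
      status = (diff > t) ? "rebalance needed" : "balanced"
      printf "spread %.1f%% (threshold %s%%): %s\n", diff, t, status
    }'
}

# Hypothetical example: two disks at 30 % and 70 % fullness.
printf 'disk-a 300 1000\ndisk-b 700 1000\n' | disk_imbalance
```

With these example numbers the spread is 40 percentage points, so the model reports that a rebalance is needed; an optional first argument overrides the assumed threshold.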

Links

VMware KB 2149809 – vSAN proactive rebalance

Why does a vSAN cluster need slack space?

I get a lot of questions during trainings and vSAN design sessions. People ask me why a vSAN cluster requires 30% slack space. At first glance it looks like a waste of (expensive) resources, and with all-flash clusters in particular it’s a significant cost factor. Slack space is often mistaken for a growth reserve, but that’s wrong: it is by no means a reserve for future growth. On the contrary, it is short-term allocation space that the vSAN cluster needs for rearrangements during storage policy changes.
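A small worked example shows why this space is needed only temporarily. The numbers are hypothetical, and the sketch assumes that when a policy change alters an object’s layout, for example from RAID-1 to RAID-5 (both FTT=1), vSAN writes the complete new layout before it removes the old one. The capacity factors (2 for a RAID-1 mirror, about 1.33 for RAID-5 with 3+1) are the usual raw-capacity overheads; the function name is my own.

```shell
#!/bin/sh
# Worked example of why slack space is needed. Assumption: when a storage
# policy change alters an object's layout, vSAN writes the complete new
# layout before deleting the old one, so both exist for a while.
# Raw-capacity factors for FTT=1: RAID-1 mirror = 2, RAID-5 (3+1) about 1.33.
policy_change_peak() {
  # $1 = VM data in GB, $2 = factor of old policy, $3 = factor of new policy
  awk -v vm="$1" -v f_old="$2" -v f_new="$3" 'BEGIN {
    before = vm * f_old
    after  = vm * f_new
    peak   = before + after        # old and new layout coexist during resync
    printf "before %.0f GB, after %.0f GB, peak %.0f GB\n", before, after, peak
  }'
}

# 1000 GB VM switched from RAID-1 (FTT=1) to RAID-5 (FTT=1)
policy_change_peak 1000 2 1.33
```

Even though the VM ends up consuming less raw capacity (1330 GB instead of 2000 GB), the cluster has to find roughly 1330 GB of additional free space while the resync runs. Apply a policy change to many VMs at once and the need for a generous amount of free capacity becomes obvious.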

Unclaim vSAN Disks in ESXi Host

While playing with the latest ESXi / vSAN beta, I ran into a problem. I was about to deploy a vCenter Server Appliance (VCSA) onto a single ESXi host that was designated to become a vSAN cluster. During the initial configuration of vCenter something stalled. Needless to say, it was a DNS problem. 😉

This part of a vCenter/vSAN deployment is delicate. If something goes wrong here, you have to start over and deploy a new vCenter appliance. When you run the installer a second time (after fixing your DNS issues), you won’t see any disk devices available for vSAN to claim. Where have they gone? They are actually still there, but during the first deployment attempt they were claimed by vSAN and now form a vSAN datastore. A greenfield vSAN deployment on the first host, however, needs disks that do not contain any vSAN or VMFS datastore.

How to release disks?

Usually you would remove disk groups in vCenter, but we don’t have a vCenter at this point. Looks like a chicken-and-egg problem. However, we do have a host, a shell, and esxcli. Start the SSH service on the host and connect to the shell (e.g. with PuTTY).
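As a rough sketch of the idea, the commands below remove the disk groups with esxcli. Run "esxcli vsan storage list" first and note the device ID of the cache-tier SSD of each disk group. The device ID in the example is hypothetical, and the script only prints the commands unless DRY_RUN=0, because removing a disk group destroys the data on it. Leaving the vSAN cluster at the end is my own assumption about what a clean restart needs; check "esxcli vsan cluster get" to see whether the host still reports vSAN membership.

```shell
#!/bin/sh
# Sketch: release disks claimed during a failed vSAN bootstrap. Commands are
# only printed unless DRY_RUN=0, because removing a disk group destroys the
# data on it. Pass the cache-tier device of each disk group as arguments.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

release_vsan_disks() {
  for cache_dev in "$@"; do
    # Removing the cache device removes its whole disk group.
    run esxcli vsan storage remove -s "$cache_dev"
  done
  # Assumption: leave the single-host vSAN cluster the first attempt created,
  # if "esxcli vsan cluster get" still shows the host as a member.
  run esxcli vsan cluster leave
}

# naa.0000000000000000 is a hypothetical device ID; use your own.
release_vsan_disks naa.0000000000000000
```

Afterwards "esxcli vsan storage list" should no longer show any claimed disks, and the installer should offer them again.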