After a failed firmware update on my Intel X722 NICs, one host came up without its 10 Gbit kernel ports (vSAN network). Every recovery attempt failed and I had to send my “bricked” host in to Supermicro. Normally this shouldn’t be a big issue in a 4-node cluster. But the fact that the management interfaces were up while the vSAN interfaces were not must have caused some “disturbance” in the cluster, and all my VM objects were marked as “invalid” on the 3 remaining hosts.
I was busy with projects and didn’t have much lab time anyway, so I waited for the repair of the 4th host. Last week it finally arrived and I immediately reinstalled the boot media, cache and capacity disks. I checked MAC addresses and settings on the repaired host and everything looked good. But after booting the reunited cluster, all objects were still marked invalid.
Time for troubleshooting
First I opened SSH shells to each host. There’s a quick PowerCLI one-liner to enable SSH throughout the cluster. Too bad I didn’t have a functional vCenter at that time, so I had to activate SSH on each host with the host client.
From the shell of the repaired host I checked the vSAN network connection to all other vSAN kernel ports. The command below pings from interface vmk1 (vSAN) to IP 10.0.100.11 (the vSAN kernel port of esx01, for example):
vmkping -I vmk1 10.0.100.11
I received ping responses from all hosts on all vSAN kernel ports, so I could conclude that there was no connectivity issue in the vSAN network.
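Repeating that ping for every peer by hand gets tedious, so here is a minimal sketch of a loop for the ESXi shell. It assumes vmkping accepts -c for the packet count; the function name check_vsan_peers and the PING_CMD override are my own helpers, not part of ESXi.

```shell
#!/bin/sh
# Ping each given vSAN kernel port IP from one vmk interface and
# report OK/FAIL per peer. Returns 1 if any peer did not answer.
# PING_CMD is a hypothetical override for dry runs; default is vmkping.
check_vsan_peers() {
    iface="$1"
    shift
    ping_cmd="${PING_CMD:-vmkping}"
    failed=0
    for ip in "$@"; do
        # -I selects the source vmk interface, -c limits the packet count
        if $ping_cmd -I "$iface" -c 3 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}
```

On the repaired host this would be called as check_vsan_peers vmk1 10.0.100.11 followed by the remaining vSAN kernel port IPs of the other hosts.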
Patch build 16324942 for ESXi 7.0 was released on June 23rd, 2020. It raises ESXi 7.0 GA to ESXi 7.0b. As usual, I’m patching my homelab systems ASAP. As all hosts are fully compliant with the HCL, I chose a fully automated cluster remediation by vSphere Lifecycle Manager (vLCM).
7.0 GA build 15843807 (before) / 7.0b build 16324942 (after)
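The build number can also be verified from the shell of each host after remediation. The sketch below parses the output of esxcli system version get. It assumes the output contains a line of the form "Build: Releasebuild-16324942"; the helper names esxi_build and check_build are made up for this example.

```shell
#!/bin/sh
# Read "esxcli system version get" output on stdin and print
# only the numeric build (assumes a "Build: <prefix><digits>" line).
esxi_build() {
    sed -n 's/^ *Build: *[^0-9]*\([0-9][0-9]*\).*$/\1/p'
}

# Compare the build found on stdin with the expected build number.
# Returns 0 on a match, 1 otherwise.
check_build() {
    expected="$1"
    actual="$(esxi_build)"
    if [ "$actual" = "$expected" ]; then
        echo "build $actual matches"
    else
        echo "build $actual, expected $expected"
        return 1
    fi
}
```

On a host this would be used as: esxcli system version get | check_build 16324942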
During the host reboot I noticed a temperature warning LED on the chassis. A look into IPMI revealed a critical CPU temperature state. The fans responsible for CPU airflow were also running at maximum speed.
As you can see, system temperature was moderate and fans usually run at low to medium speed under these conditions. Air intake temperature was 25°C.
My ESXi nodes rebooted with the new build 16324942 and there were no errors in vLCM. But I could hear that something was wrong. A fan running at over 8000 RPM tells you there IS something to look into. The boot procedure also took much longer than usual.
I quickly shut down the whole cluster in order to avoid a core meltdown.
There have been many new releases in the first quarter of 2020: the long-anticipated Veeam Backup & Replication version 10, which we’ve been waiting for since 2017, and also the latest generation of VMware vSphere. While I had the vSAN 7 beta running on my homelab cluster before GA, I’ve worked with Veeam Backup 10 only in customer projects. Unfortunately there’s no room for playing with new features unless the customer requests it. One of the new features of Veeam v10 is the ability to use Linux proxies and repositories. With an XFS filesystem on the repository you can use the fast clone feature, which is similar to what ReFS offers on Windows.
In this tutorial I will show how to:
Deploy and size the Veeam server
Set up the base configuration to integrate vCenter
Build, configure and deploy a Linux proxy and integrate it into the backup infrastructure
Build, configure and deploy a Linux XFS repository
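For the XFS repository, fast clone depends on the volume being formatted with reflink support. Here is a minimal sketch of that step. The device /dev/sdb and the mount point are placeholders, and the helper names are mine. The first function only prints the mkfs command instead of running it, because formatting is destructive.

```shell
#!/bin/sh
# Print (do not run) the mkfs command for a reflink-enabled XFS volume.
# Block size 4096 with reflink=1 and crc=1 are the options commonly
# used for Veeam fast clone repositories.
xfs_reflink_mkfs_cmd() {
    dev="$1"
    echo "mkfs.xfs -b size=4096 -m reflink=1,crc=1 $dev"
}

# Read xfs_info output on stdin; succeed if it reports reflink=1.
has_reflink() {
    grep -q 'reflink=1'
}
```

After reviewing the printed command and running it as root, the result could be checked with: xfs_info /mnt/veeam-repo | has_reflink (where /mnt/veeam-repo is a hypothetical mount point).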
Using Veeam Backup on a vSAN cluster has special design requirements. There’s no direct SAN backup on VMware vSAN because there’s no SAN, no fabric and no HBAs. Only two backup methods are available: Network Mode (NBD) and Virtual Appliance Mode (HotAdd). The latter is recommended for vSAN, but you should deploy one proxy per host to avoid unnecessary traffic on the vSAN interfaces. HotAdd also utilizes the Veeam Advanced Data Fetcher (ADF).
Talking about licenses: having Linux proxies on each host will reduce the cost of Windows licensing. One more reason to play around with this new feature. A Veeam license will be required too, but as a vExpert I can get an NFR (not for resale) license which is valid for one year. Just one of the advantages of being a vExpert. 🙂
Let the games begin. We’ll need a Veeam server that holds the job database and the main application. The proxy and repository roles will be kept on individual (Linux) servers.
I recently upgraded my homelab from 6.7 U3 to vSphere 7. The workflow is straightforward and very easy. The VMware design team did a very good job with the UI.
I cannot stress this enough: check the VMware HCL. Just because your system is supported under your current vSphere version doesn’t mean it’ll be supported under vSphere 7 too. On the day I upgraded, vSphere 7 was brand new and there were just a few entries in the HCL. But it’s a homelab and if something breaks I don’t mind rebuilding it from scratch. Don’t do this in production!
Although my Supermicro E300-9D is not yet certified for version 7.0, it works like a charm. I guess it’s just a matter of time, because the VMware Nano-Edge cluster is based on that hardware.
Before we can start, you need to download the vCenter Server Appliance 7.0 (VCSA) from VMware downloads (login required). You also need new license keys for vCenter, ESXi and vSAN (if your cluster is hyperconverged).