After a failed firmware update on my Intel x722 NICs one host came up without its 10 Gbit kernelports (vSAN Network). Every effort of recovery failed and I had to send in my “bricked” host to Supermicro. Normally this shouldn’t be a big issue in a 4-node cluster. But the fact that management interfaces were up and vSAN interfaces were not must have caused some “disturbance” on the cluster and all my VM objects were marked as “invalid” on the 3 remaining hosts.
I was busy on projects so I didn’t have much lab-time anyway, so I waited for the repair of the 4th host. Last week it finally arrived and I instantly assembled boot media, cache and capacity disks. I checked MAC addresses and settings on the repaired host and everything looked good. But after booting the reunited cluster still all objects were marked invalid.
Time for troubleshooting
First I opened SSH shells to each host. There’s a quick powerCLI one-liner to enable SSH throughout the cluster. Too bad I didn’t have a functional vCenter at that time, so I had to activate SSH on each host with the host client.
From the shell of the repaired host I’ve checked the vSAN-Network connection to all other vSAN kernel ports . The command below pings from interface vmk1 (vSAN) to IP 10.0.100.11 (vSAN kernel port of esx01 for example)
vmkping -I vmk1 10.0.100.11
I received ping responses from all hosts on all vSAN kernel ports. So I could conclude there’s no connection issue in the vSAN-network.
Servers and switches are built for use in data centers where noise pollution is only a minor issue. The focus is on maximum performance and cooling. In the Homelab, however, things look different. Server rooms in private households are probably the exception and so most homelabs are located somewhere near the desk. A case fan with high speed can be very annoying.
For my vSAN cluster I use a Netgear XS716T 10 Gigabit switch. During system startup the fans rotate at maximum speed and then settle down a bit in normal mode. But even the lower noise level is still annoying.
We need new fans
As part of a handicraft experiment, I tried to get the noise problem under control and bought some Noctua fans which are popular in the homlab scene. The Netgear switch is equipped with two 40 mm fans. These will be replaced by two Noctua NF-A4x20 fans. A simple exchange would be somewhat unsatisfactory, though. There should be at least some kind of quantification (just a science habit).
In the picture below you can see the original fans of the Netgear 10G switch. The 16-port model is equipped with two fans while the 8-port model has just one.
Disclaimer No.1: Before removing the casing cover, the power supply must be disconnected!
Disclaimer No.2: Opening the casing may void your warranty.
There have been many new releases in the first quarter of 2020. The long anticipated release of Veeam Backup & Recovery version 10, we’ve been waiting for since 2017 and also the latest generation of VMware vSphere. While I had vSAN 7 beta running on my homelab cluster before GA, I’ve worked with Veeam Backup 10 only in customer projects. There’s unfortunately no room for playing with new features unless the customer requests it. One of the new features of Veeam v10 is the ability to use Linux proxies and repositories. With XFS filesystem on the repository you can use the fast clone feature which is similar to ReFS on Windows.
In this tutorial I will show how to:
Deploy and size the Veeam server
Show base configuration to integrate vCenter
Build, configure and deploy a Linux proxy and its integration into backup infrastructure
Build, configure and deploy a Linux XFS repository
Using Veeam Backup on a vSAN Cluster has special design requirements. There’s no direct SAN backup on VMware vSAN because there’s neither a SAN, nor a fabric and nor HBAs. There are only two backup methods available: Network Mode (nbd) and Virtual Appliance Mode (hotadd). The latter is recommended for vSAN, but you should deploy one proxy per host to avoid unnecessary traffic on the vSAN interfaces. Hotadd also utilizes Veeam Advanced Data Fetcher (ADF).
Talking about licenses: Having Linux proxies on each host will reduce the cost of Windows licensing. One more reason to play around with this new feature. A Veeam license will be required too, but as a vExpert I can get a NFR (not for resale) license which is valid for one year. Just one of the advantages of being a vExpert. 🙂
Let the games begin. We’ll need a Veeam server that holds the job database and the main application. The proxy and repository role will be kept on individual (Linux) servers.
If you have joined VMware Customer Experience Improvement Program (CEIP), you’re able to use Skyline-Health in your cluster. In older versions of vSphere/vSAN this feature used to be called vSphere-Health and vSAN-Health respectively. They both have been renamed to Skyline Health. You can access Skyline-Health in the vSphere-Client by navigating to Monitor > vSAN > Skyline-Health.
Today I’ve seen a warning after powering on up my homelab.
Drilling into details showed one of 4 hosts issued a warning: “Proactive rebalance is needed”.
Usually a vSAN cluster will distribute load amongst capacity disks automatically. For some reason that wasn’t the case in my homelab. But there’s help. You can click on “Configure Automatic Rebalance” directly from Skyline-Health (see picture below).
You’ll be redirected to vSAN cluster configuration. As you can see in the screenshot below, my cluster wasn’t configured for automatic rebalance.
Just move the slider and vSAN will automatically start to balance disks. A couple of minutes later the warning had switched to green. Depending on the cluster load and how imbalanced the capacity disks are, this process might take a while.