vSAN Objects invalid

After a failed firmware update on my Intel X722 NICs, one host came up without its 10 Gbit kernel ports (vSAN network). Every recovery attempt failed and I had to send the “bricked” host in to Supermicro. Normally this shouldn’t be a big issue in a 4-node cluster. But the fact that the management interfaces were up while the vSAN interfaces were down must have caused some “disturbance” in the cluster, and all my VM objects were marked as “invalid” on the 3 remaining hosts.

I was busy with projects and didn’t have much lab time anyway, so I waited for the repair of the 4th host. Last week it finally arrived and I immediately assembled boot media, cache and capacity disks. I checked MAC addresses and settings on the repaired host and everything looked good. But after booting the reunited cluster, all objects were still marked invalid.

Time for troubleshooting

First I opened SSH shells to each host. There’s a quick PowerCLI one-liner to enable SSH throughout the cluster, but I didn’t have a functional vCenter at that time, so I had to activate SSH on each host with the host client.

From the shell of the repaired host I checked the vSAN network connection to all other vSAN kernel ports. The command below pings from interface vmk1 (vSAN) to the target IP (the vSAN kernel port of esx01, for example):

vmkping -I vmk1 <target IP>

I received ping responses from all hosts on all vSAN kernel ports, so I could conclude there was no connectivity issue in the vSAN network.
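The per-host ping checks can be wrapped in a small loop instead of typing each one by hand. Below is a minimal Python sketch of that idea; the example IPs and the `ping_cmd` parameter are my own assumptions for illustration (the parameter exists so the loop can also be tried off-host with a regular `ping`), not part of any VMware tooling.

```python
import subprocess

def check_vsan_peers(peer_ips, interface="vmk1", ping_cmd="vmkping"):
    """Ping every vSAN kernel-port IP from the given vmkernel interface.

    Returns a dict mapping each IP to True (replies received) or False.
    Relies on vmkping's -I (interface) and -c (packet count) flags.
    """
    results = {}
    for ip in peer_ips:
        proc = subprocess.run(
            [ping_cmd, "-I", interface, "-c", "3", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Exit code 0 means at least one reply was received
        results[ip] = proc.returncode == 0
    return results

# Example with hypothetical vSAN network addressing:
# check_vsan_peers(["10.0.10.11", "10.0.10.12", "10.0.10.13"])
```

If every IP maps to True, the vSAN network itself can be ruled out, which is exactly the conclusion above.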

Continue reading “vSAN Objects invalid”

Announcement of VMware Cloud Foundation 4.1

Together with vSphere 7 and vSAN 7, VMware Cloud Foundation (VCF) 4.0 with Tanzu was released in March this year. Now VMware has announced VMware Cloud Foundation 4.1, along with vSphere 7.0 Update 1 and vSAN 7 U1, which builds on some of the new features of vSAN.

What’s new?

  • vSAN Data Persistence Platform – an important feature for vendors of virtual appliances and container solutions that run on vSAN or VCF. Not all container workloads are stateless; some, like object storage or NoSQL databases, are stateful applications. Until now, separate application-level replication mechanisms were necessary. With the vSAN Data Persistence Platform, these vendors can directly use the high availability of vSAN. Early partners include, for example, MinIO, DataStax, Dell EMC ObjectScale and Cloudian.
  • VMware Cloud Foundation Remote Clusters – a feature based on vSAN HCI Mesh. With this feature, vSAN datastores of other clusters can be mounted. This is especially interesting for remote locations.
  • vVols in workload domains – Now you can deploy workload domains on vVol enabled storage. Supported protocols are FC, iSCSI and NFS.
  • Automatic deployment of vRealize Suite 8.1 – vRealize Suite Lifecycle Manager (vRSLCM) now integrates with SDDC Manager. You can deploy and update vRealize products from vRSLCM.
  • New features and bugfixes in SDDC Manager.
  • VMware Skyline support for VCF.

The update is expected in early October, during or shortly after VMworld 2020.

Update error VCSA 7 – vCenter Server not operational

During patching of a vCenter Server Appliance (VCSA), problems can occur: maybe contact to the update source was lost, or the whole process was cancelled by an operator. If you try to reapply the patch, you might see an error like in the picture below.

Update installation failed. vCenter Server is not operational.

In the VAMI interface of vCenter everything looks fine: all services are up and running and the overall status is green. Even a reboot of the appliance doesn’t help. The root cause is the interrupted update procedure, which left a status file behind. We need to remove that file manually.

To do so, open an SSH shell to the vCenter Server Appliance and change to the directory where the file was left behind.

# cd /etc/applmgmt/appliance

You’ll see a file called software_update_state.conf. Under normal circumstances this file is removed after a successful update, but since the process was interrupted it wasn’t cleaned up. Let’s have a brief look inside:

# cat software_update_state.conf
{
"state": "INSTALL_FAILED",
"version": "",
"latest_query_time": "2020-09-17T11:42:37Z"
}

The state INSTALL_FAILED confirms there’s been a failed update of the VCSA, and this stale entry is what blocks new patch attempts. You can simply remove the file.

# rm software_update_state.conf

If you now trigger a new patch installation, it will work.
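The check-and-remove step above can also be sketched in a few lines of Python. This is a minimal sketch assuming the file is plain JSON as shown above; the function name and the path parameter are mine for illustration, not part of any VMware tooling, and the manual cat/rm route works just as well.

```python
import json
import os

STATE_FILE = "/etc/applmgmt/appliance/software_update_state.conf"

def clear_stale_update_state(path=STATE_FILE):
    """Remove the leftover state file if it records a failed update.

    Returns the removed state string, or None if nothing was removed.
    """
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = json.load(f).get("state")
    # Only clean up after a failed/interrupted run; leave other states alone
    if state == "INSTALL_FAILED":
        os.remove(path)
        return state
    return None
```

Guarding on the state value avoids deleting the file while a legitimate update is in flight.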

Using more than one dvSwitch for overlay traffic in a VCF 4.0.1 VxRail cluster

SDDC Manager is the central management tool in a VMware Cloud Foundation (VCF) environment. You can add workload domains, import clusters into workload domains (WLD) or add Kubernetes namespaces. For every task there’s a workflow in the GUI of SDDC Manager.

Currently, as of VCF 4.0.1, it is not possible to add a cluster with more than two uplinks and more than one dvSwitch to a WLD. If you try to do that in the GUI, you can only define one dvSwitch with two uplinks.

What now?

There’s help inside SDDC Manager.

Continue reading “Using more than one dvSwitch for overlay traffic in a VCF 4.0.1 VxRail cluster”