In the early days of virtualization, vCenter was a nice-to-have. Those times are long gone (at least from an IT point of view). In today’s datacenter many services and applications rely heavily on vCenter. Common examples are VDI environments, cluster balancing mechanisms like DRS or Storage DRS, and even backup software.
The last one is a crucial point. It’s good to have your vCenter Server Appliance (VCSA) backed up regularly, and most of you and your customers will likely do so. But think of what would happen if you lost your vCenter for 10 minutes or even an hour.
Having a backup is not enough – you also need to return to operation fast and keep your recovery time within your Recovery Time Objective (RTO).
vCenter is down
Losing your VCSA is not hypothetical – it happens. I’ve witnessed two cases in the last two months where a VCSA was left broken by a temporary problem on the storage side. The cluster went into an APD (All Paths Down) state and became unresponsive. Most of the Windows VMs just froze, but continued to work after the storage paths came back online. Modern Linux VMs – and this includes the VCSA – are more sensitive to these situations. Once you try to reboot, you’ll most likely see a message like this on the console:
fsck failed. Please repair manually and reboot. The root file system is currently mounted read-only. To remount it read-write do:
bash# mount -n -o remount,rw /
It even happens to the best
To understand this, you need to know that VCSA 6.0 and later use the Logical Volume Manager (LVM), which is harder to troubleshoot than old-school partition mounts.
Cormac Hogan, who had to learn it the hard way too, wrote a good blog post about LVM and fsck on VCSA partitions.
You might try to fix the filesystem and partitions on your broken VCSA, but don’t expect too much. Running mount and fsck requires a bash shell, which is not enabled by default on the VCSA. Enabling it on a broken appliance might work – if you’re lucky.
Restore from backup with obstacles
I usually don’t bother repairing a crashed VCSA. Whenever I’ve tried, it failed for one reason or another. You have to be aware of the implications of a restored vCenter, but they’re usually a minor problem.
Don’t waste time! Get the backups!
Restoring a VM has become very comfortable with a backup solution like Veeam Backup & Replication. With Veeam you have several options at hand.
- Use Instant VM Recovery to run the VM directly from the backup archive.
- Use Quick Rollback to restore only the blocks changed since the last backup.
- Recover single VM files like a VMX or a VMDK.
- Recover the entire VM to the original or a different location.
The drawback is that most of these methods need a functional vCenter. So restoring a vCenter appliance is a different task than restoring any other VM. First you need to add a single ESXi host as a standalone host to your Veeam server and select it as the restore target for the VCSA. Make sure to overwrite the broken VCSA, or at least delete it after a successful recovery.
There’s nothing wrong with this method. But it costs precious time and you can only do full restores.
Better with replication
If disaster hasn’t struck yet, it’s a smart idea to plan in advance.
Veeam Backup & Replication offers a second option (as the name indicates): creating a replica VM. To do so, create a replication job for your VCSA within the Veeam console. It will create a VM with a defined number of restore points, each representing a point in time at which the replication job ran.
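To make the idea of a "defined number of restore points" more tangible, here is a minimal conceptual sketch in Python. It is not Veeam's implementation or API; the names `ReplicaModel`, `RestorePoint`, `run_job` and `latest_before` are made up for illustration. It only models the retention behaviour: each job run adds a restore point, and once the limit is reached the oldest one drops out.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RestorePoint:
    """One point in time captured by a run of the replication job."""
    created: datetime


class ReplicaModel:
    """Toy model of a replica VM with a fixed number of restore points.

    Hypothetical illustration only, not how Veeam stores replicas.
    """

    def __init__(self, max_restore_points: int) -> None:
        if max_restore_points < 1:
            raise ValueError("max_restore_points must be at least 1")
        # A deque with maxlen drops the oldest entry once it is full.
        self._points = deque(maxlen=max_restore_points)

    def run_job(self, when: datetime) -> None:
        """Record the restore point produced by one job run."""
        self._points.append(RestorePoint(created=when))

    def restore_points(self) -> List[RestorePoint]:
        """Return the retained restore points, oldest first."""
        return list(self._points)

    def latest_before(self, incident: datetime) -> Optional[RestorePoint]:
        """Pick the newest restore point taken before the incident."""
        candidates = [p for p in self._points if p.created < incident]
        return candidates[-1] if candidates else None


if __name__ == "__main__":
    replica = ReplicaModel(max_restore_points=3)
    start = datetime(2024, 1, 1)
    for hour in range(5):  # five hourly job runs, only three are kept
        replica.run_job(start + timedelta(hours=hour))
    incident = start + timedelta(hours=3, minutes=30)
    print(replica.latest_before(incident))
```

The point of the model: when the VCSA breaks, you fail over to a restore point taken before the incident, so the number of retained restore points and the job schedule together decide how far back you can go.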
If your vCenter breaks for whatever reason, you can start up the replica in seconds. The VCSA replica will take over the role of the original VCSA.
Once the vCenter is online again, the pressure will be gone. Users can log in to their VDI desktops again, and vSphere clusters become manageable and can be balanced by DRS.
A functional vCenter now gives you a couple of strategies.
- Use the replica to restore your VCSA from backup (without the need to register a single host).
- Perform a permanent failover to the replica. Veeam Backup & Replication then removes the snapshots (restore points) of the replica VM from the snapshot chain and deletes the associated files from the datastore. It also removes the replica VM from the list of replicas in the Veeam Backup & Replication console and modifies the replication job to exclude the original VM from processing.