Storage 101- The Synchronous Mirror Dilemma

A brief introduction to High Availability

Keeping data identical at two locations is becoming increasingly important in a highly available IT world. A few years ago it was an expensive enterprise-level luxury, but recently the demand has reached SMB environments too. The method is called mirroring, and it can be implemented in two ways.

  • Asynchronous – Data is synchronized at defined intervals. Between two synchronization runs there is a difference (delta) between source and target.
  • Synchronous – Data transfer is transaction-consistent, i.e. the data is identical on both sides at all times. A write operation is only considered complete when both the source and the target site have confirmed the write.
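
The difference between the two modes can be sketched in a few lines of Python. This is a toy model only, not how any particular storage product works: the classes Site, SyncMirror and AsyncMirror and all their methods are hypothetical names made up for illustration.

```python
class Site:
    """Hypothetical storage site that keeps blocks in a dictionary."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data
        return True  # acknowledgement of the write


class SyncMirror:
    """A write is complete only when both sites have acknowledged it."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def write(self, lba, data):
        ack_source = self.source.write(lba, data)
        ack_target = self.target.write(lba, data)
        return ack_source and ack_target


class AsyncMirror:
    """A write is complete once the source has it; the target follows later."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.pending = {}  # the delta between source and target

    def write(self, lba, data):
        ack_source = self.source.write(lba, data)
        self.pending[lba] = data
        return ack_source

    def replicate(self):
        # Runs at a defined interval and ships the accumulated delta.
        for lba, data in self.pending.items():
            self.target.write(lba, data)
        self.pending.clear()
```

With AsyncMirror, everything still sitting in pending is lost if the source site fails before the next replicate() run. With SyncMirror there is no such window, at the price of waiting for the remote acknowledgement on every write.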

A prerequisite for high availability is mirroring of data (synchronous or asynchronous). If the data is available at two locations (data centers), a further design question arises: Should the storage target act as a fallback copy in case of emergency (Active-Passive), or should the data be actively used in both locations (Active-Active)?

  • Active-Passive – Only the active side works, and data is transferred to the passive side (synchronously or asynchronously). In case of a failure, the system switches over automatically or manually, and the previously passive side becomes active. It remains so until a failback is triggered. Because resources must be equal on both sides, this method guarantees full performance even in the event of a total site failure. The disadvantage is that at most 50% of the total resources can be used.
  • Active-Active – Resources of both sides can be used in parallel and the hardware is utilized more efficiently. However, this means that in the event of a failure, half of the resources are lost and full performance cannot be guaranteed. Active-Active designs require a synchronous mirror, as both sides have to work with identical data.
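
The capacity trade-off between the two designs comes down to simple arithmetic. The following sketch assumes two sites with equal resources; the function name and the abstract "capacity units" are hypothetical and only serve the illustration.

```python
def capacity_after_site_failure(design, site_capacity, load):
    """Compare the running load with the capacity left after one site fails.

    design        -- "active-passive" or "active-active"
    site_capacity -- capacity of one site (both sites assumed equal)
    load          -- workload currently running, in the same units
    """
    total = 2 * site_capacity
    if design == "active-passive":
        usable_before = site_capacity  # only one side works
    elif design == "active-active":
        usable_before = total          # both sides work in parallel
    else:
        raise ValueError("unknown design: " + design)
    if load > usable_before:
        raise ValueError("load exceeds usable capacity of this design")
    remaining = site_capacity          # one site survives in either design
    return {
        "utilization_of_total": load / total,
        "full_performance_after_failure": load <= remaining,
    }
```

For example, a load of 100 units on two sites of 100 units each is the maximum for Active-Passive (50% of the total) and survives a site failure at full performance. A load of 150 units is only possible with Active-Active, and after a site failure only 100 units remain to serve it.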

Active-Active clusters exist in many different forms. There’s classic SAN storage with integrated mirroring, and there’s software-defined storage (SDS), where the mirroring is not done in hardware but in the software layer. One example is DataCore SANsymphony. VMware vSAN Stretched Cluster plays a special role and will not be covered in this post.

In the following section I will discuss a special pitfall of LUN-based active-active constructs, which is often neglected but can lead to data loss in case of an error. VMware vSAN is not affected because its stretched cluster is based on a different design which prevents the following issue.


Host Upgrade fails with “Cannot execute upgrade script on host”

I recently had the pleasure to time-warp a dinosaur, i.e. to upgrade an old ESXi 6.0 host to ESXi 6.7. Right after I triggered remediation with a current ESXi 6.7 ISO image, I got an error message:

Cannot execute upgrade script on host

That message isn’t really specific. If you google it, you’ll probably find a dozen possible reasons for the failure.

None of the commonly cited causes fit the problem I observed. A good starting point is a look into vua.log on the affected host.

less /var/log/vua.log

Unfortunately that didn’t help either. So we had (again) a closer look at the VMware upgrade path matrix. A direct host upgrade from ESXi 6.0 to ESXi 6.7U3 is supported, but while re-checking the matrix our attention was drawn to a little footnote.

KB 76555 says there’s an issue with expired VIB certificates on hosts below a specific build number.

  • ESXi 6.0 GA before build 9239799
  • ESXi 6.5 GA before build 8294253

In fact our ESXi 6.0 host had a build level of 7967664 (U3e), which is in the critical range. So we had to install patches up to July 2018 (ESXi600-201807001) first. After that the upgrade to ESXi 6.7U3 went flawlessly.
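
The build check is easy to script as part of a pre-upgrade checklist. The sketch below only uses the two thresholds quoted in this post; the names MIN_BUILD and needs_patch_first are made up for illustration, and the numbers should always be verified against the current text of KB 76555.

```python
# Minimum build numbers as quoted in this post from KB 76555.
# Assumption: only these two releases are listed here; check the KB itself.
MIN_BUILD = {
    "6.0": 9239799,
    "6.5": 8294253,
}


def needs_patch_first(version, build):
    """Return True if the host is below the quoted build for its release."""
    threshold = MIN_BUILD.get(version)
    if threshold is None:
        return False  # release not covered by the list in this post
    return build < threshold


# The host from this post: ESXi 6.0 U3e, build 7967664
print(needs_patch_first("6.0", 7967664))
```

For the host described above the function returns True, which matches what we found: patch first, then upgrade.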

What went wrong?

Of course we did check the matrix during the planning phase in early March 2020. That’s standard operating procedure. Unfortunately something had changed in the meantime (the footnote was added). KB 76555 was updated in May 2020, and the issue affects upgrades to versions of ESXi 6.7 released after April 28th, 2020.

Take-home message: Check your design and matrices again right before the project starts.

VeeamON Tour 2020 Virtual Event

22 June 2020, 9:30 – 13:30 CEST

The annual Veeam Roadshow will be held as a virtual event this year.

Topics

  • New Veeam Availability Suite™ v10
  • Live stream and on-demand content
  • Technical deep dive sessions by Veeam Systems Engineers
  • Information about the cloud as a scalable and efficient backup target
  • Networking with other IT professionals and experts

Speakers

  • Anton Gostev – Senior Vice President, Product Management
  • Danny Allan – CTO and SVP Product Strategy
  • Stefano Heisig – Senior Systems Engineer, DE
  • David Bewernick – Senior Systems Engineer, DE
  • Benedikt Däumling – Senior Systems Engineer, DE
  • Marco Horstmann – Senior Systems Engineer, DE
  • Stephan Herzig – Systems Engineer, CH
  • Herbert Szumovski – Systems Engineer, AT
  • Ivan Cioffi – Systems Engineer, CH
  • Andreas Lesslhumer – Systems Engineer, AT

VMUG Germany Virtual Events 2020

Due to the Corona crisis we had to postpone our German VMUG UserCon 2020 to December 11th, 2020. In the meantime we’ll offer short bi-weekly virtual events. One hour, one speaker, one topic.

The first speaker will be Niels Hagoort (VMware), co-author of Host Resources Deep Dive and Clustering Deep Dive.

You can join the Zoom session for free (VMUG registration required).