New DataCore Witness against Split-Brain Scenarios

DataCore SANsymphony offers software-defined storage with transparent mirroring in active/active mode.

The recently released version 10 PSP7 now supports a witness to avoid split-brain scenarios.

The Problem

If both DataCore hosts (DC1, DC2) lose their mirror (MIR) paths and their LAN connection to each other, a split-brain scenario occurs.

Both hosts remain functional and have a fully intact set of data on their storage. Both hosts can handle I/O from initiators in their (split) region. Both datastores receive writes that cannot be mirrored to the opposite site. Those changes cannot be merged when the mirror comes up again.

To rebuild the mirror, one site has to be declared master, and its data is copied to the other site in a full recovery. This means that all writes made on the other site during the split are lost.

Witness

Here comes the witness. As soon as one DataCore host loses both the SAN mirror and the LAN connection to its mirror partner, it will try to reach the witness. If there’s no communication to the witness, the host will refuse all I/O to its datastore.
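The rule above can be written down as a small decision function. This is only a sketch of the behaviour as I understand it, not DataCore's implementation: the names `LinkState` and `keeps_serving_io` are made up for illustration, and I assume the witness is only consulted once both the mirror path and the LAN connection to the partner are gone.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Connectivity as seen from one DataCore host (hypothetical model)."""
    mirror_ok: bool   # SAN mirror path to the partner is up
    lan_ok: bool      # LAN connection to the partner is up
    witness_ok: bool  # witness answers the contact request


def keeps_serving_io(links: LinkState) -> bool:
    """Return True if the host keeps presenting its datastore to initiators."""
    # Assumption: while the partner is reachable on any path,
    # the witness does not come into play at all.
    if links.mirror_ok or links.lan_ok:
        return True
    # Partner is completely unreachable: the witness decides.
    return links.witness_ok


# DC1 sits next to the witness, DC2 is cut off from everything.
dc1 = LinkState(mirror_ok=False, lan_ok=False, witness_ok=True)
dc2 = LinkState(mirror_ok=False, lan_ok=False, witness_ok=False)
print(keeps_serving_io(dc1), keeps_serving_io(dc2))
```

At most one side of a fully isolated pair can reach the witness, so at most one side keeps accepting writes.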

Scenario 1 – without witness

Below I’ve outlined a scenario with two sites in active/active mode. Hosts (H1-H4) are using their preferred local DataCore Servers, but can also switch to the remote site in case of a fault condition.

By breaking the fibre connection between site1 and site2, the Inter-Switch-Link (ISL) of the SAN, the mirror-link (MIR) and the LAN-connection are interrupted. Both datacenters now continue to work autonomously without data synchronization.

As a result, both sides of the mirror diverge and can’t be resynchronized by a log recovery. The only way to get in sync again is to declare one side as master and discard the changes on the other side. Data loss is the result.
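To make the data loss tangible, here is a toy model of the situation. It has nothing to do with SANsymphony internals: `Side` and `full_recovery` are hypothetical names, and a volume is reduced to a dictionary of block numbers. Each side remembers which blocks it wrote while the mirror was down.

```python
class Side:
    """One side of a mirrored volume (toy model, not DataCore code)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.dirty = set()  # blocks written while the mirror was down

    def write(self, block, data):
        self.blocks[block] = data
        self.dirty.add(block)


def full_recovery(master, other):
    """Overwrite `other` with `master`; return the blocks whose split-time
    writes on `other` are discarded."""
    lost = set(other.dirty)
    other.blocks = dict(master.blocks)
    master.dirty.clear()
    other.dirty.clear()
    return lost


initial = {0: "a", 1: "b", 2: "c"}
dc1, dc2 = Side(initial), Side(initial)

# Split brain: both sites keep accepting writes independently.
dc1.write(0, "written on site1")
dc2.write(1, "written on site2")

lost = full_recovery(master=dc1, other=dc2)
print("blocks with lost writes on DC2:", sorted(lost))
```

Whichever side is chosen as master, the writes made on the other side during the split are gone after the full recovery.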

Scenario 2 – with witness

Same scenario, but this time with a witness (W) on site1 close to DataCore Server DC1.

Right after breaking the fibre links between sites, DC1 is still able to contact the witness. It will continue to present LUNs to all initiators. Because the SAN ISL is broken, DC1 can only be reached by its local hosts from site1. DataCore host 2 (DC2) has lost all connections to DC1 and also to the witness. It will instantly stop access to its datastores.

Hosts on site2 will face an APD (All Paths Down) condition, but data on DC2’s storage will remain frozen and consistent. Once the fibre links between site1 and site2 are reestablished, both sides of the mirror can be resynced by a log recovery (green delta arrow).
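Why a log recovery is enough here can be shown with the same kind of toy model. Again this is a sketch under my own assumptions, not the way SANsymphony tracks changes: the volume is a dictionary of blocks, `log_recovery` is a hypothetical helper, and the "log" is just the set of blocks DC1 wrote while DC2 was frozen.

```python
def log_recovery(source, target, changed):
    """Copy only the blocks recorded in `changed` from source to target.
    Returns the number of blocks copied."""
    for block in changed:
        target[block] = source[block]
    return len(changed)


dc1 = {0: "a", 1: "b", 2: "c", 3: "d"}
dc2 = dict(dc1)   # in sync before the links break
changed = set()   # change log kept by the surviving side

# Links break: DC2 refuses I/O, only DC1 accepts writes.
for block, data in [(0, "x"), (2, "y")]:
    dc1[block] = data
    changed.add(block)

copied = log_recovery(dc1, dc2, changed)
print(f"copied {copied} of {len(dc1)} blocks, in sync: {dc1 == dc2}")
```

Because only one side changed, copying the delta brings both sides back in sync and nothing has to be discarded.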

Setup

SANsymphony version 10 PSP7 is a prerequisite for having a witness. After installing or updating to PSP7, you need to do a blind activation of your license keys, i.e. reactivate the cluster without adding new keys. After successful reactivation you’ll see the new witness feature on the license tab.

To define a witness you need PowerShell cmdlets. There’s no GUI yet. Open a PowerShell session on one of the DataCore hosts and connect to the DataCore server. In the example below, the two DataCore servers are named sds1 and sds2. The witness is a physical server (a Veeam Backup proxy on site1).

The first step is essential and wasn’t well documented.

Connect-DcsServer sds1

Now you can define the witness. Choose a name for it. The witness must reply to ping requests.

Add-DcsWitness -Name "witness 1" -Address "172.22.7.110"

Now let’s check if both hosts can reach the witness.

Invoke-DcsWitnessContact

The cmdlet prompts for the name you gave the witness (witness 1).

cmdlet Invoke-DcsWitnessContact at command pipeline position 1
Supply values for the following parameters:
Witness: witness 1

ServerId WitnessId ResponseStatus
-------- --------- --------------
7233CB41-D6A1-4730-B9DB... 79506f86bfa748779b71eb1... Success
1503818E-E8E2-4370-9130... 79506f86bfa748779b71eb1... Success

Both DataCore hosts can reach the witness.

We can check an existing witness with the command below.

PS C:\Program Files\DataCore\SANsymphony> Get-DcsWitness
Alias : witness 1
IPAddress : 172.22.7.110
SequenceNumber : 97879
Id : 79506f86bfa748779b71eb12c1ad45b6
Caption : witness 1
ExtendedCaption :
Internal : False

Use witness as default for all vDisks

Now that the witness is defined, you can add it to each vDisk individually. Usually, though, it is enough to set it as the default for all vDisks.

Set-DcsServerGroupDefaultWitnessProperties -Address "172.22.7.110"

OurGroup : True
Alias : Server Group
Description :
State : Present
SmtpSettings : DataCore.Executive.SmtpServerSettings
LicenseSettings : DataCore.Executive.LicenseSettings
LicenseType : Regular
ContactData : DataCore.Executive.ContactData
StorageUsed : 49.82 TB
BulkStorageUsed : 0 B
MaxStorage : 100 TB
RecoverySpeed : 32
ExistingProductKeys : {, , , ...}
DataCoreStorageUsed : 0 B
SupportBundleRelayAddress :
MirrorTrunkMappingEnabled : False
SelfHealingDelay : 480
DefaultWitness : f33551cf965e4dc887abe750130db21a
DefaultWitnessOption : Automatic
WitnessAllowed : True
SequenceNumber : 103302
Id : 1f929299-c9f9-419c-a7e9-8c7c694102b5
Caption : Server Group
ExtendedCaption : Server Group
Internal : False

Conclusion

Customers have long asked for a witness feature to prevent split-brain scenarios. We know witnesses from other highly available solutions like vSAN or vCenter HA. DataCore has finally taken an important step towards data resiliency. Next on the wish list would be the ability to set up the witness from the DataCore console and better documentation for this new feature.
