Teach-The-Expert: vSAN Diskgroup Management on CLI

As part of my work as a trainer, I often come across questions on topics that are only covered in passing or not at all in the course. This series of articles provides trainee IT experts with tools for everyday use.

Intro – What are Diskgroups?

VMware vSAN OSA (Original Storage Architecture) structures the vSAN datastore into disk groups (DG). Each vSAN node can contain up to 5 disk groups. Each disk group consists of exactly one cache device (SSD) and between one and seven capacity devices. The capacity devices may be either magnetic disks or SSDs, but the two types cannot be mixed within a group. We therefore differentiate between a cache tier and a capacity tier.

Disk groups can be managed using the graphical user interface (GUI). However, there are situations where disk group management on the command line interface (CLI) is necessary or more appropriate.

UUID

Each disk device of a vSAN cluster (OSA) has a universally unique identifier (UUID).

We can list all devices of a vSAN node on the CLI with this command:

esxcli vsan storage list

The output is verbose. If we only want to see the lines containing a UUID, we can filter it with grep:

esxcli vsan storage list | grep UUID

This returns the UUID of every disk device in the vSAN node, together with the UUID of the disk group to which each device is assigned.

If you take a closer look at the output, you will notice that some devices have a UUID identical to the UUID of their disk group. Does this contradict the statement that the UUID is unique? No. These are cache devices. Each disk group in vSAN OSA contains exactly one cache device, and the disk group adopts the UUID of that cache device. In this way, we can quickly distinguish a cache device from a capacity device.
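The comparison described above can be automated. The following is a minimal sketch, not an official tool: the function name classify_vsan_devices is made up for this example, and it assumes that each device block in the output contains the lines "Device:", "VSAN UUID:" and "VSAN Disk Group UUID:". Check the labels against the output of your own ESXi build before relying on it.

```shell
# Classify vSAN devices as cache or capacity.
# Reads the output of 'esxcli vsan storage list' on stdin.
# Assumption: field labels "Device:", "VSAN UUID:" and
# "VSAN Disk Group UUID:" appear once per device block.
classify_vsan_devices() {
    awk -F': ' '
        function emit() {
            if (dev == "") return
            # A device whose own UUID equals the disk group UUID is the cache device.
            role = (uuid == dg) ? "cache" : "capacity"
            print role, dev, "dg=" dg
        }
        /^ *Device: /               { emit(); dev = $2; uuid = ""; dg = "" }
        /^ *VSAN UUID: /            { uuid = $2 }
        /^ *VSAN Disk Group UUID: / { dg = $2 }
        END                         { emit() }
    '
}

# Usage on an ESXi host:
#   esxcli vsan storage list | classify_vsan_devices
```

Each output line contains the role, the device name and the disk group UUID, so the members of one disk group are easy to spot.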

Delete Diskgroup

A disk group can cope with the loss of a capacity device. Missing information will then be reconstructed by disk groups of other hosts. The loss of a cache device, on the other hand, is equivalent to the loss of the entire disk group. This means: if we remove the cache device of a disk group, we will destroy the disk group.

Attention! Before executing the next command, you should be absolutely sure that this disk group may or should be removed.

esxcli vsan storage remove -u VSAN-Disk-Group-UUID

Replace VSAN-Disk-Group-UUID with the corresponding UUID.
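Before removing a disk group, it can help to see exactly which devices belong to it. The sketch below filters the storage list by disk group UUID. The function name diskgroup_members is hypothetical, and the field labels "Device:" and "VSAN Disk Group UUID:" are assumed to match the output of your ESXi build.

```shell
# Print the devices that belong to one disk group.
# Reads the output of 'esxcli vsan storage list' on stdin;
# the first argument is the disk group UUID to look for.
# Assumption: "Device:" precedes "VSAN Disk Group UUID:" in each device block.
diskgroup_members() {
    awk -F': ' -v want="$1" '
        /^ *Device: /               { dev = $2 }
        /^ *VSAN Disk Group UUID: / { if ($2 == want) print dev }
    '
}

# Usage on an ESXi host (the UUID is a placeholder):
#   esxcli vsan storage list | diskgroup_members VSAN-Disk-Group-UUID
```

The function only reads and filters the list. It does not change anything on the host.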

Create Diskgroup

We can also create a new disk group with esxcli. This requires that the selected devices (1 cache and 1 to 7 capacity devices) are not assigned to any other DG. They must be in ‘unclaimed’ status.

esxcli vsan storage add -s naa.xxxx -d naa.xxxx -d naa.xxxx -d naa.xxxx

The syntax reveals the origins of vSAN: originally, SSDs were used as cache devices and magnetic disks as the capacity tier. Nowadays, so-called all-flash configurations are used almost exclusively. However, the parameter -s or --ssd is still used for the cache device and -d or --disk for the capacity devices, even if they are all flash devices.

It is advisable to use a text editor to compose this command, as the names of the individual devices can be very long and cryptic. In the case of NVMe devices, the name does not begin with ‘naa.’ but with the prefix ‘t10.’

For example:

t10.NVMe____INTEL_SSDPE2KX080T8_____________________ABCD1234EFGHIJKLM56789__00000001

The underscores in particular are a minefield when entered manually!

With a few lines of CLI code, the entire disk group configuration of a host can be accomplished in a few seconds.
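As an alternative to a text editor, a small helper can assemble the command from device names that were copied rather than typed. This is a sketch under stated assumptions: the function name build_dg_add_cmd is invented for this example, and it deliberately only prints the command instead of running it, so that it can be reviewed first.

```shell
# Assemble an 'esxcli vsan storage add' command line.
# First argument: cache device; remaining arguments: 1 to 7 capacity devices.
# The command is printed, not executed, so it can be checked before use.
build_dg_add_cmd() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 8 ]; then
        echo "usage: build_dg_add_cmd CACHE CAPACITY [CAPACITY ...] (1 to 7 capacity devices)" >&2
        return 1
    fi
    cache=$1
    shift
    cmd="esxcli vsan storage add -s $cache"
    for dev in "$@"; do
        cmd="$cmd -d $dev"
    done
    echo "$cmd"
}

# Usage (device names are placeholders):
#   build_dg_add_cmd naa.xxxx naa.yyyy naa.zzzz
```

The argument check mirrors the limits of one cache device and at most seven capacity devices per disk group. Whether the devices are actually unclaimed still has to be verified on the host.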

Troubleshooting

Tag as Flash

Occasionally the command to create a new disk group fails with a message like this:

t10.NVMe____INTEL_SSDPE2KX080T8_____________________PHLJ015501AW8P0HGN__00000001 is an SSD and can not be used as a vSAN HDD

If you are now wondering about this message, as all devices in an all-flash cluster are SSDs, you need to take a look back to the early days of vSAN. Back then, SSDs were only used for caching. We therefore have to label our devices with a tag that identifies them as a cache or capacity device.

Mark as Capacity

esxcli vsan storage tag add -d [Device-Identifier] -t capacityFlash

Command using the example of a real NVMe disk (Capacity):

esxcli vsan storage tag add --disk t10.NVMe____INTEL_SSDPE2KX080T8_____________________BTLJ90810H918P0HGN__00000001 -t capacityFlash
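If several capacity devices need the tag, the commands can be generated in one go. The following sketch uses a hypothetical function name, build_capacity_tag_cmds, and only prints one tag command per device so that the long identifiers can be checked before anything is executed.

```shell
# Print one 'tag add' command per capacity device passed as an argument.
# Nothing is executed; review the output, then run the lines on the host.
build_capacity_tag_cmds() {
    for dev in "$@"; do
        echo "esxcli vsan storage tag add -d $dev -t capacityFlash"
    done
}

# Usage (device names are placeholders):
#   build_capacity_tag_cmds naa.yyyy naa.zzzz
```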

Mark as Cache

esxcli vsan storage tag add -s [Device-Identifier] 

The disk group can now be created with the correct tag on all devices.
