Manage ESXi Coredump Files

Okay, I admit it: this is not a new topic, but it cost me some time in a client project. Since this blog also serves as a swap partition for my brain, I am writing it down here for future reference. It is important to follow the steps correctly so that the changes are preserved after a reboot.

Why a Coredump File?

Modern ESXi installations, starting with version 7, use a new partition layout on the boot device, and coredumps normally live there as well. If the boot medium is a USB flash device or an SD card, however, the coredump is relocated to a VMFS datastore with at least 32 GB of capacity.

This is exactly the case I found in a customer environment. The system had been migrated from vSphere 6.7 and therefore still had the old boot layout on an (at that time still fully supported) SD card RAID1. We found a vmkdump folder with files for each host on one of the shared VMFS datastores. This (VMFS5) datastore was supposed to be decommissioned and replaced with a VMFS6 datastore. (Side note from the VCI: there is no online migration path from VMFS5 to VMFS6.) 😉 So the vmkdump files had to be removed from there.

Procedure

First, we get an inventory of the coredump files.

esxcli system coredump file list

All coredump files of all ESXi hosts are listed here. Each line contains the path as well as the Active and Configured states (true or false). Active means that this is the current coredump file of the host you are logged in to; all other files belong to other hosts and therefore show active=false. It is important that Configured is also set to true, otherwise the setting will not survive a reboot.
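On a host whose dumpfiles live on shared VMFS storage, the output looks roughly like this (paths and sizes are invented placeholders; the exact column layout may vary slightly between ESXi builds):

Path                                                 Active  Configured        Size
---------------------------------------------------  ------  ----------  ----------
/vmfs/volumes/<UUID>/vmkdump/<host-1-id>.dumpfile      true        true  1610612736
/vmfs/volumes/<UUID>/vmkdump/<host-2-id>.dumpfile     false       false  1610612736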

By default, the host chooses the first matching VMFS datastore. This is not necessarily the desired one.
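If you only care about the host you are logged in to, the get subcommand is a quicker check; as far as I know it simply prints the Active and Configured dumpfile paths of the local host.

esxcli system coredump file get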

Remove the current Coredump File

First we delete the active coredump file of the host. We have to force the removal because it is set as active=true.

esxcli system coredump file remove --force

If we execute the list command from above again, there should be one line less.

Add a new Coredump File

The next command creates a new coredump file at the destination. If a vmkdump folder does not exist there yet, it is created, and the dumpfile is placed inside it. We specify the desired file name without an extension, because the .dumpfile extension is appended automatically.

esxcli system coredump file add -d <Name | UUID> -f <filename>

Example: the host is named “ESX-01” and the VMFS datastore is named “Service”. The datastore can be specified either by its display name or by its UUID.

esxcli system coredump file add -d Service -f ESX-01

A vmkdump folder will be created on the designated datastore, and a file named ESX-01.dumpfile will be created in it. We can check this using the list command.

esxcli system coredump file list

A new line will appear with the full path to the new dumpfile. However, the status is still active=false and configured=false. It might be useful to copy this full path to the clipboard, because it is required in the next step.
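In our example the new entry would look roughly like this (UUID shortened and size invented for illustration):

/vmfs/volumes/<UUID>/vmkdump/ESX-01.dumpfile   false   false   1610612736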

Activate Dumpfile

In the following step, we set the newly created dumpfile to active. This way, the setting is retained even after a host reboot. We specify the complete path to the dumpfile; the path copied to the clipboard earlier is helpful here and avoids typos.

esxcli system coredump file set -p <path_to_dumpfile>

Example:

esxcli system coredump file set -p /vmfs/volumes/<UUID>/vmkdump/ESX-01.dumpfile

A final list command validates the result.
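The entry for ESX-01 should now show active=true and configured=true (again an illustrative sketch rather than verbatim output):

/vmfs/volumes/<UUID>/vmkdump/ESX-01.dumpfile   true   true   1610612736

Only with configured=true will the host keep using this dumpfile after a reboot.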


ExaGrid Time-Lock – Who’s (still) afraid of ransomware?

Introduction

Ransomware is currently one of the most prominent threats to IT infrastructures. Reports of successful attacks are piling up, and they are hitting ever closer to home. More than 30% of all companies, institutes, universities and public authorities in Germany have already had to deal with such attacks. In some cases a ransom was paid to regain access to their own data.

Even with payment, success is never certain. After all, one negotiates with criminals. Authorities therefore advise against payment.

The essential protective measure against the consequences of such an attack is an up-to-date and consistent backup.

Ransomware vs. Backup

Unfortunately, attackers also know about the importance of backups. The currently circulating malware families, such as Emotet or Ryuk, contain code that actively searches the network for backups. Using previously obtained credentials for Active Directory accounts, RDP exploits, or the brand-new Zerologon exploit, attackers may try to take over the systems that run the company's backups or hold the backup data.

The automated attack is often followed by human attackers who actively browse the network and try to destroy all backups. This is often an easy task, since backups today are usually kept on disk systems that are permanently connected to the infrastructure.

The reason is obvious: if all backups are deleted or encrypted as well, the “customer's” willingness to pay the ransom increases considerably.

Many approaches have therefore been devised to keep backup data out of an attacker's reach. A very simple and secure variant is an air gap, i.e. a physical separation of the backup media from the system. For example, LTO tapes can be physically removed from the library.

Without this kind of time-consuming manual extraction, which would also have to be performed daily, the data remains permanently vulnerable. It does not matter whether it is stored on disk systems, dedup appliances, tapes in a library or even in an S3 cloud repository.

Some time ago, S3 cloud providers therefore introduced an API extension called “immutability”. With it, at least the backups in the cloud tier can be made immune to changes for a certain period of time.

Some of these solutions are natively supported by Veeam; Amazon AWS is one of them, while Microsoft Azure is currently still missing. Furthermore, S3 storage is not suitable for every use case. A primary backup with Veeam directly to S3, for example, is not possible; the S3 tier is only available as an extension of a scale-out backup repository.

Continue reading “ExaGrid Time-Lock – Who’s (still) afraid of ransomware?”

Veeam Storage Plugin for DataCore – Deepdive

Any questions, remarks or additions: melter[at]idicos.de

SANsymphony meets Veeam Backup and Replication – true love in the end!

In December 2019 the plugin for the popular DataCore SANsymphony SDS was finally released, and it was done the only proper way: with full support and validation by Veeam.

In this article series we will cover several aspects of the plugin:

Continue reading “Veeam Storage Plugin for DataCore – Deepdive”

Storage 101 - The Synchronous Mirror Dilemma

A brief introduction to High Availability

Keeping data identical at two locations is becoming increasingly important in a highly available IT world. A couple of years ago this used to be an expensive enterprise-level luxury, but recently the demand can be found in SMB environments too. The method is called mirroring, and it can be implemented in two ways.

  • Asynchronous – Data is synchronized at defined intervals. In between, there is a difference (delta) between source and target.
  • Synchronous – Data transfer is transaction consistent, i.e. the data is identical on both sides at all times. A write operation is only considered complete once both the source and the target site have acknowledged it.

A prerequisite for high availability is mirroring of data (synchronous or asynchronous). If the data is available at two locations (data centers), a further design question arises: Should the storage target act as a fallback copy in case of emergency (Active-Passive), or should the data be actively used in both locations (Active-Active)?

  • Active-Passive – Only the active side does the work, and data is transferred to the passive side (synchronously or asynchronously). In the event of a failure, the system switches over automatically or manually and the previously passive side becomes active; it remains so until a failback is triggered. This method guarantees full performance even after a total site failure, but resources must be equal on both sides, and the disadvantage is that at most 50% of the total resources can be used at any time.
  • Active-Active – Resources of both sides can be used in parallel and the hardware is utilized more efficiently. However, this means that in the event of a failure, half of the resources are lost and full performance cannot be guaranteed. Active-Active designs require a synchronous mirror, as both sides have to work with identical data.

Active-Active clusters exist in many different forms. There is classic SAN storage with integrated mirroring, or software-defined storage (SDS) where the mirroring happens not in hardware but in the software layer; one example is DataCore SANsymphony. VMware vSAN Stretched Cluster plays a special role and will not be covered in this post.

In the following section I will discuss a particular pitfall of LUN-based active-active designs that is often neglected but can lead to data loss in the event of a failure. VMware vSAN is not affected, because its stretched cluster is based on a different design that prevents the issue described below.

Continue reading “Storage 101 - The Synchronous Mirror Dilemma”