Strange thermal Issue after Update to ESXi 7.0b

Patch build 16324942 for ESXi 7.0 was released on June 23rd, 2020. It raises ESXi 7.0 GA to ESXi 7.0b. As usual, I patched my homelab systems ASAP. As all hosts are fully compliant with the HCL, I chose a fully automated cluster remediation by vSphere Lifecycle Manager (vLCM).

The specs

Server: SuperMicro SYS-E300-9D-8CN8TP
BIOS: 1.3
ESXi: 7.0 GA build 15843807 (before) / 7.0b build 16324942 (after)
HCL compliant: yes

During host reboot I noticed a temperature warning LED on the chassis. A look into IPMI revealed a critical CPU temperature state. The fans responsible for CPU airflow were also running at maximum speed.

As you can see, system temperature was moderate and fans usually run at low to medium speed under these conditions. Air intake temperature was 25°C.

My ESXi nodes rebooted with the new build 16324942 and there were no errors in vLCM. But I could hear that something was wrong. A fan running at over 8000 RPM will tell you there IS something to look into. The boot procedure also took much longer than usual.

I quickly shut down the whole cluster in order to avoid a core meltdown.


Quiet please! – Silent fans for the Homelab

Servers and switches are built for use in data centers where noise pollution is only a minor issue. The focus is on maximum performance and cooling. In the homelab, however, things look different. Server rooms in private households are probably the exception, so most homelabs are located somewhere near the desk. A case fan running at high speed can be very annoying.

For my vSAN cluster I use a Netgear XS716T 10 Gigabit switch. During system startup the fans rotate at maximum speed and then settle down a bit in normal mode. But even the lower noise level is still annoying.

We need new fans

As part of a handicraft experiment, I tried to get the noise problem under control and bought some Noctua fans, which are popular in the homelab scene. The Netgear switch is equipped with two 40 mm fans. These will be replaced by two Noctua NF-A4x20 fans. A simple exchange would be somewhat unsatisfactory, though. There should be at least some kind of quantification (just a science habit).

In the picture below you can see the original fans of the Netgear 10G switch. The 16-port model is equipped with two fans while the 8-port model has just one.

Disclaimer No.1: Before removing the casing cover, the power supply must be disconnected!

Disclaimer No.2: Opening the casing may void your warranty.


VMware vExpert 2020 application (2nd round)

The VMware vExpert program is VMware’s global evangelism and advocacy program. The vExpert program was designed by VMware to reward community members for evangelizing VMware’s products and services. Each year the title vExpert is awarded to people who have contributed to the community in an outstanding way. That can be bloggers, book authors, public speakers, VMUG leaders, VMTN contributors, VCDX and other IT professionals who share their knowledge.

Application

Applications open twice a year. Currently the second-half application window is open from June 1st to June 25th.

Why become a vExpert

Yes, there are benefits (I will come back to that later), but that’s not the point. Being a vExpert is not about what you get, but what you can give. Many vExperts put a lot of their spare time into the community. Preparing a blog post or a VMUG presentation, or organizing a VMUG meeting, consumes a lot of time. The vExpert program is for those community warriors.

Since I joined the vExpert program I have made a lot of friends in the community. As a newcomer I also received a very warm welcome from seasoned vExperts. To name just a few, there were Ather Beg from Britain, Andreas Lesslhumer from Austria and Vladan Seget from Reunion Island.


Storage 101- The Synchronous Mirror Dilemma

A brief introduction to High Availability

Keeping data identical at two locations is becoming increasingly important in a highly available IT world. A few years ago it was an expensive enterprise-level luxury, but more recently that demand can be found in SMB environments too. The method is called mirroring, which can be implemented in two ways.

  • Asynchronous – Data is being synchronized in defined intervals. In between there is a difference (delta) between source and target.
  • Synchronous – Data transfer is transaction consistent. I.e. the data is identical on both sides at all times. A write operation is only considered complete when source and target site have confirmed the write.
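The difference between the two modes can be illustrated with a minimal Python sketch. It is a toy model only: the class names `Site`, `SynchronousMirror` and `AsynchronousMirror` are hypothetical and do not correspond to any real storage product or API. It merely shows when a write is acknowledged and where the delta comes from.

```python
class Site:
    """A toy storage site that keeps blocks in a dictionary (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, key, value):
        self.blocks[key] = value
        return True  # acknowledge the write


class SynchronousMirror:
    """A write is complete only when source AND target have confirmed it."""

    def __init__(self, source, target):
        self.source, self.target = source, target

    def write(self, key, value):
        return self.source.write(key, value) and self.target.write(key, value)


class AsynchronousMirror:
    """A write is acknowledged by the source alone; the target catches up later."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.pending = {}  # writes not yet shipped to the target

    def write(self, key, value):
        ok = self.source.write(key, value)
        self.pending[key] = value
        return ok

    def sync(self):
        # Would be called at defined intervals in a real system.
        for key, value in self.pending.items():
            self.target.write(key, value)
        self.pending.clear()

    def delta(self):
        # Keys whose content differs between source and target.
        return {k for k, v in self.source.blocks.items()
                if self.target.blocks.get(k) != v}
```

With the asynchronous mirror, `delta()` is non-empty between a `write()` and the next `sync()`. With the synchronous mirror, both sites hold the same blocks as soon as `write()` returns.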

A prerequisite for high availability is mirroring of data (synchronous or asynchronous). If the data is available at two locations (data centers), a further design question arises: Should the storage target act as a fallback copy in case of emergency (Active-Passive), or should the data be actively used in both locations (Active-Active)?

  • Active-Passive – Only the active side works and data is transferred to the passive side (synchronous or asynchronous). In case of a failure, the system switches automatically or manually and the previously passive side becomes active. It remains so until a failback is triggered. This method guarantees full performance even in the event of a total site failure. Resources must be equal on both sides. The disadvantage is that at most 50% of the total resources can be used.
  • Active-Active – Resources of both sides can be used in parallel and the hardware is utilized more efficiently. However, this means that in the event of a failure, half of the resources are lost and full performance cannot be guaranteed. Active-Active designs require a synchronous mirror, as both sides have to work with identical data.
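The trade-off between the two designs comes down to simple arithmetic, sketched below in Python. The function name `served_after_site_failure` and the capacity figure of 100 units per site are assumptions chosen for illustration, not values from any real environment.

```python
def served_after_site_failure(load, site_capacity):
    """Fraction of the pre-failure load that one surviving site can still serve."""
    if load <= 0:
        return 1.0
    return min(1.0, site_capacity / load)


SITE = 100  # hypothetical capacity units per site, equal on both sides

# Active-Passive: load is capped at one site's capacity (50% of the total),
# so the surviving site can carry all of it.
print(served_after_site_failure(100, SITE))  # 1.0

# Active-Active at full utilization: both sites are busy (200 units of load),
# so losing one site leaves capacity for only half of it.
print(served_after_site_failure(200, SITE))  # 0.5
```

An Active-Active cluster that is deliberately kept at or below 50% total utilization behaves like the first case, which is the sizing rule implied by the two descriptions above.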

Active-Active clusters exist in many different forms. There’s classic SAN storage with integrated mirroring, or software-defined storage (SDS) where the mirroring is done not in hardware but in the software layer. One example is DataCore SANsymphony. VMware vSAN Stretched Cluster plays a special role and will not be covered in this post.

In the following section I will discuss a special pitfall of LUN-based active-active constructs, which is often neglected but can lead to data loss in case of an error. VMware vSAN is not affected because its stretched cluster is based on a different design, which prevents the following issue.
