VMware Bitfusion and Tanzu – Part 2 : Bitfusion server setup

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, how to set up a Bitfusion server and how to use its services from Kubernetes pods.

Bitfusion Server setup preparation

A Bitfusion Server Cluster must meet the following requirements:

  • vSphere 7 or later
  • 10 GBit LAN at least for the Bitfusion data traffic for smaller or PoC deployments. High bandwidth and low latency are essential. 40 Gbit or even 100 Gbit are recommended.
  • Nvidia GPU with CUDA functionality and DirectPath I/O support:
    • Pascal P40
    • Tesla V100
    • T4 Tensor
    • A100 Tensor
  • At least 3 Bitfusion server per cluster for high availability

This setup guide assumes that the graphics cards have been deployed to the ESXi 7+ servers and the hosts have joined a cluster in vCenter.

Continue reading “VMware Bitfusion and Tanzu – Part 2 : Bitfusion server setup”

VMware Bitfusion and Tanzu – Part 1: A primer to Bitfusion

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, how to set up a Bitfusion server and how to use its services from Kubernetes pods.

What is Bitfusion?

In August 2019, VMware acquired BitFusion, a leader in GPU virtualization. Bitfusion provides a software platform that decouples specific physical resources from compute servers. It is not designed for graphics rendering, but rather for machine learning (ML) and artificial intelligence (AI). Bitfusion systems (client and server) only run on selected Linux platforms as of today and support ML applications such as TensorFlow.

Why are GPUs so important for ML/AI applications?

Processors (Central Processing Unit / CPU) in current systems are optimized to process serial tasks in the shortest possible time and to switch quickly between tasks. GPUs (Graphics Processor Units), on the other hand, can process a large number of computing operations in parallel. The original intended application is in the name of the GPU. The CPU was to be offloaded by GPU in graphics rendering by outsourcing all rendering and polygon calculations to the GPU. In the mid-90s, some 3D games could still choose to render with CPU or GPU. Even then, it was a difference like night and day. GPU could calculate the necessary polygon calculations much faster and smoother.

A fine comparison of GPU and CPU architecture is described by Niels Hagoort in his blog post “Exploring the GPU Architecture“.

However, due to their architecture, GPUs are not only ideal for graphics applications, but for all applications where a very large number of arithmetic operations have to be executed in parallel. This includes blockchain, ML, AI and any kind of data analysis (number crunching).

Continue reading “VMware Bitfusion and Tanzu – Part 1: A primer to Bitfusion”

Monitor Tanzu K8s Compliance with Runecast Analyzer

Checking the cluster’s compliance for security or hidden problems is meanwhile a standard task. There are automated tools to do the job such as VMware Skyline or Runecast Analyzer. In addition to standard vSphere clusters, the latter can also check vSAN, NSX-T, AWS, Kubernetes and, since version 5.0, Azure for compliance.

In this blog post I’d like to outline how to connect a vSphere with Tanzu [*] environment to Runcast Analyzer. [* native Kubernetes Pods and TKG on vSphere]

Some steps are simplified because it is a Lab environment. I will point this out at the given point.

Before we can register Tanzu in Runecast Analyzer, we need some information.

  • IP address or FQDN of the SupervisorControlPlane
  • Service account with access to the SupervisorControlPlane
  • Service account access token
Continue reading “Monitor Tanzu K8s Compliance with Runecast Analyzer”

Using ESXi on Arm as a tiny Kubernetes cluster

ESXi on Intel x86 architecture has been a commodity for many years now. In recent years and during VMworld for example we’ve seen early alpha versions of ESXi running on Arm architecture like smart NICs or even Raspberry Pi. Meanwhile VMware developers published a Fling named ESXi Arm Edition to deploy ESXi on Arm architecture. Of course this is a lab project and not supported by VMware for production workloads. But anyway, it’s a great opportunity to play around with ESXi on a cheap and tiny computer like Raspberry Pi. I will not explain how to deploy ESXi on Arm. Check the detailed documentation on the Fling project page (PDF). I will focus on day-2 operation.

I would like to thank William Lam for providing a lot of background information, hacks and tricks around PhotonOS and ESXionArm.

Now I’ve got an ESXi host on my Raspi. What can I do with it?

Just a few remarks before we start:

You can’t run any workload on the ESXi on Arm platform. As the project name says, it’s an Arm architecture, So you can’t run operating systems based on Intel architecture. All guest VMs need to be made for Arm architecture. That will rule out Windows guest systems and also most Linux distributions. But luckily there are a couple of Linux distributions made specific for Arm architecture like Ubuntu Server for Arm, or Photon OS. For my demonstration I chose the latest Photon OS (version 4 beta). As host hardware I’m using the “big” Raspberry Pi 4 with 8 GB RAM. You can imagine that 8 GB of RAM isn’t very much for host OS and guest VMs. We have to use resources sparingly.

Our aim is to deploy a 3 node Kubernets cluster on an ESXi on Arm host on Raspberry Pi with just 8 GB RAM and 4 cores. Sounds crazy, but it’s possible. Thanks to K3s lightweight Kubernetes on Arm.

Hardware used

  • Raspberry Pi 4, Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz
  • Heat sink for Raspberry Pi4 (your Raspi will become hot without)
  • SD-card (only for UEFI BIOS)
  • USB stick for ESXi installation
  • USB 3 hub with external power supply (Raspi doesn’t provide reliable power on USB port for an NVMe SSD)
  • USB 3 NVMe M.2 case
  • Samsung NVMe EvoPlus 250 GB M.2

Using ESXi on Arm in standalone mode

Although I have joined my ESXi on Raspi to my vCenter 7, I will not use any vCenter features. All works are done like on a standalone ESXi (with all the shortcomings and limitations).

First we need 3 VMs for the 3 K3s nodes. It’s a good idea to build a VM with everything we need except K3s and then clone it. Well, if you think cloning a VM on a standalone ESXi on Arm host is just a mouse click in the UI, then welcome to the real world. 😉 I will come to that point later. Let’s build our first Photon OS VM.

Continue reading “Using ESXi on Arm as a tiny Kubernetes cluster”