Upgrade of a K3s Lightweight Kubernetes Cluster

K3s is a lightweight, highly available, open-source Kubernetes distribution designed for easy and resource-efficient installation. It is delivered as a package of less than 60 MB, is optimized for ARM platforms and therefore also runs on hardware such as a Raspberry Pi or as a guest VM on ESXi-on-ARM.
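
To illustrate how lightweight the setup is, the upstream convenience script brings up a cluster in minutes. The snippet below is only a sketch with placeholder values: <server-ip> and <node-token> must be replaced with your own values, and K3S_URL/K3S_TOKEN are the documented variables for joining agents.

# Install K3s on the server (master) node via the upstream installer
curl -sfL https://get.k3s.io | sh -

# Join an agent node; <server-ip> and <node-token> are placeholders
# (the token is stored on the server under /var/lib/rancher/k3s/server/node-token)
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<node-token> sh -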

Prerequisites and collection of information

K3s is a cluster solution, so the order in which the nodes are upgraded matters. The upgrade starts on the master node, which means we first have to find out which node holds this role. The easiest way to do that is a kubectl command:

kubectl get node
NAME                STATUS   ROLES    AGE     VERSION
k3node1.lab.local   Ready    master   2y43d   v1.19.3+k3s3
k3node2.lab.local   Ready    <none>   2y42d   v1.19.3+k3s3
k3node3.lab.local   Ready    <none>   2y42d   v1.19.3+k3s3

The output above shows my three K3s nodes with their FQDN, status, role, age and version. Here, k3node1 holds the master role.

Alternatively, you can run the command with wide output, which adds details such as the internal IP, OS image and container runtime:

kubectl get node -o wide
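
If you only need the node that carries the master role, a label selector does the job as well. Note that the role label differs between K3s releases (older versions use node-role.kubernetes.io/master, newer ones node-role.kubernetes.io/control-plane), so treat the selector below as a sketch for this v1.19 lab:

# List only nodes that carry the master role label
kubectl get node -l node-role.kubernetes.io/master -o name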

VMware Bitfusion and Tanzu – Part 3: Utilize GPU from Kubernetes Pods and TKGS

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, show how to set up a Bitfusion server, and explain how to use its services from Kubernetes pods.

We saw in parts 1 and 2 what Bitfusion is and how to set up a Bitfusion Server cluster. The challenging part is to make this Bitfusion cluster usable from Kubernetes pods.

In order for containers to access Bitfusion GPU resources, a few general conditions must be met.

In this tutorial, I assume that a configured vSphere with Tanzu cluster is available, along with a namespace, a user, a storage class and the Kubernetes CLI tools. The network can be based either on NSX-T or on distributed vSwitches combined with a load balancer such as the AVI load balancer.

In the PoC described here, vSphere with Tanzu was used without NSX-T for simplicity, together with the AVI load balancer, which is now officially called NSX Advanced Load Balancer.

We also need a Linux system with access to GitHub (or a mirror) in order to prepare the cluster.
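
As a hedged sketch of these prerequisites (server address, user and namespace below are placeholders, not values from this PoC), logging in to the Supervisor namespace with the vSphere plugin for kubectl looks roughly like this:

# Log in to the Supervisor cluster with the vSphere plugin for kubectl
# (<supervisor-ip>, <user> and <namespace> are placeholders)
kubectl vsphere login --server=<supervisor-ip> \
  --vsphere-username <user> \
  --tanzu-kubernetes-cluster-namespace <namespace> \
  --insecure-skip-tls-verify

# Point kubectl at the namespace context
kubectl config use-context <namespace>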

The procedure in a nutshell:

  • Create a TKGS cluster
  • Download the Bitfusion baremetal token and create a K8s secret (sketched below)
  • Clone the Git project and modify the makefile
  • Deploy the device plugin to the TKGS cluster
  • Deploy the pods
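
As a rough sketch of the second step (archive name, file names, secret name and namespace are placeholders here and must match whatever the device plugin's makefile and deployment expect), creating the Kubernetes secret from the downloaded Bitfusion baremetal token could look like this:

# Unpack the baremetal token downloaded from the Bitfusion plugin in vCenter
# (archive and file names are placeholders for this sketch)
tar -xvf bitfusion-baremetal-token.tar

# Create a secret from the token files so pods can authenticate against the Bitfusion cluster
kubectl create secret generic bitfusion-secret \
  --from-file=ca.crt \
  --from-file=client.yaml \
  --from-file=servers.conf \
  -n kube-system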

VMware Bitfusion and Tanzu – Part 2: Bitfusion server setup

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, show how to set up a Bitfusion server, and explain how to use its services from Kubernetes pods.

Bitfusion Server setup preparation

A Bitfusion Server Cluster must meet the following requirements:

  • vSphere 7 or later
  • At least 10 Gbit LAN for the Bitfusion data traffic in smaller or PoC deployments. High bandwidth and low latency are essential; 40 Gbit or even 100 Gbit is recommended.
  • NVIDIA GPU with CUDA functionality and DirectPath I/O support:
    • Tesla P40 (Pascal)
    • Tesla V100
    • T4 Tensor Core
    • A100 Tensor Core
  • At least three Bitfusion servers per cluster for high availability

This setup guide assumes that the graphics cards have been deployed to the ESXi 7+ servers and the hosts have joined a cluster in vCenter.
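
Before starting the Bitfusion deployment, it is worth confirming that a host actually sees its NVIDIA card. A quick, hedged way to check this from the ESXi shell:

# List PCI devices on the ESXi host and filter for the NVIDIA GPU
lspci | grep -i nvidia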


VMware Bitfusion and Tanzu – Part 1: A primer to Bitfusion

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, show how to set up a Bitfusion server, and explain how to use its services from Kubernetes pods.

What is Bitfusion?

In August 2019, VMware acquired Bitfusion, a leader in GPU virtualization. Bitfusion provides a software platform that decouples physical GPU resources from the compute servers that consume them. It is not designed for graphics rendering, but rather for machine learning (ML) and artificial intelligence (AI) workloads. As of today, the Bitfusion client and server components run only on selected Linux platforms and support ML frameworks such as TensorFlow.

Why are GPUs so important for ML/AI applications?

Processors (central processing units, CPUs) in current systems are optimized to process serial tasks in the shortest possible time and to switch quickly between tasks. GPUs (graphics processing units), on the other hand, can execute a large number of computing operations in parallel. The original purpose is right in the name: the GPU was meant to offload the CPU during graphics rendering by taking over all rendering and polygon calculations. In the mid-90s, some 3D games still let you choose between CPU and GPU rendering, and even then the difference was like night and day; the GPU handled the necessary polygon calculations much faster and more smoothly.

Niels Hagoort gives a fine comparison of the GPU and CPU architectures in his blog post “Exploring the GPU Architecture”.

However, due to their architecture, GPUs are not only ideal for graphics applications, but also for any application in which a very large number of arithmetic operations has to be executed in parallel. This includes blockchain, ML, AI and any kind of data analysis (number crunching).
