VMware Bitfusion and Tanzu – Part 3: Utilize GPU from Kubernetes Pods and TKGS

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, how to set up a Bitfusion server and how to use its services from Kubernetes pods.

We saw in parts 1 and 2 what Bitfusion is and how to set up a Bitfusion Server cluster. The challenging part is to make this Bitfusion cluster usable from Kubernetes pods.

In order for containers to access Bitfusion GPU resources, a few general conditions must be met.

I assume in this tutorial that we have a configured vSphere-Tanzu cluster available, as well as a namespace, a user, a storage class and the Kubernetes CLI tools. The network can be organized with either NSX-T or distributed vSwitches and a load balancer such as the AVI load balancer.

In the PoC described here, Tanzu on vSphere was used without NSX-T for simplicity. Load balancing was handled by the AVI load balancer, now officially called NSX Advanced Load Balancer.

We also need a Linux system with access to GitHub or a mirror to prepare the cluster.

The procedure in a nutshell:

  • Create TKGS cluster
  • Download Bitfusion baremetal token and create K8s secret
  • Load Git project and modify Makefile
  • Deploy device plugin to TKGS cluster
  • Pod deployment
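
As a rough illustration of the "create K8s secret" step, the following sketch builds a Kubernetes Secret manifest from a set of token files. The Secret layout (base64-encoded values under `data`) is standard Kubernetes; the secret name `bitfusion-secret`, the namespace `kube-system`, and the three file names are assumptions made for this example and should be checked against the device plugin's own documentation.

```python
import base64
import json


def build_secret_manifest(name, namespace, files):
    """Return a Kubernetes Secret manifest (as a dict) for the given files.

    `files` maps a file name to its raw bytes. Kubernetes expects the values
    under `data` to be base64-encoded strings.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            fname: base64.b64encode(content).decode("ascii")
            for fname, content in sorted(files.items())
        },
    }


# Placeholder contents and assumed file names; the real files come from the
# token archive downloaded from the Bitfusion server.
token_files = {
    "ca.crt": b"<certificate>",
    "client.yaml": b"<client config>",
    "servers.conf": b"<server list>",
}

# Secret name and namespace are assumptions for this sketch.
manifest = build_secret_manifest("bitfusion-secret", "kube-system", token_files)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be saved to a file and applied with `kubectl apply -f`.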

VMware Bitfusion and Tanzu – Part 2 : Bitfusion server setup


Bitfusion Server setup preparation

A Bitfusion Server Cluster must meet the following requirements:

  • vSphere 7 or later
  • At least 10 Gbit LAN for the Bitfusion data traffic in smaller or PoC deployments. High bandwidth and low latency are essential; 40 Gbit or even 100 Gbit is recommended.
  • Nvidia GPU with CUDA functionality and DirectPath I/O support:
    • Pascal P40
    • Tesla V100
    • T4 Tensor
    • A100 Tensor
  • At least 3 Bitfusion servers per cluster for high availability
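
To get a feel for why link speed matters, here is a small back-of-the-envelope sketch. It computes the ideal transfer time of a payload at the link speeds mentioned above and checks a planned cluster against the listed minimums. The 1 GB payload is a hypothetical figure, and the calculation ignores latency and protocol overhead, so real transfers will be slower.

```python
def transfer_seconds(payload_bytes, link_gbit_per_s):
    """Ideal time to move a payload over a link (no latency, no overhead)."""
    bits = payload_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


def meets_ha_minimum(vsphere_major, link_gbit_per_s, server_count):
    """Check a planned cluster against the minimums from the list above."""
    return vsphere_major >= 7 and link_gbit_per_s >= 10 and server_count >= 3


payload = 10**9  # hypothetical 1 GB of data sent to a remote GPU
for speed in (10, 40, 100):
    print(f"{speed:>3} Gbit/s: {transfer_seconds(payload, speed):.2f} s")

print(meets_ha_minimum(vsphere_major=7, link_gbit_per_s=10, server_count=3))
```

Under these idealized assumptions the same payload takes 0.80 s at 10 Gbit/s but only 0.08 s at 100 Gbit/s, which is time the GPU would otherwise spend waiting for data.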

This setup guide assumes that the graphics cards have been deployed to the ESXi 7+ servers and the hosts have joined a cluster in vCenter.


VMware Bitfusion and Tanzu – Part 1: A primer to Bitfusion


What is Bitfusion?

In August 2019, VMware acquired Bitfusion, a leader in GPU virtualization. Bitfusion provides a software platform that decouples specific physical resources from compute servers. It is not designed for graphics rendering, but rather for machine learning (ML) and artificial intelligence (AI). Bitfusion systems (client and server) currently run only on selected Linux platforms and support ML applications such as TensorFlow.

Why are GPUs so important for ML/AI applications?

Processors (Central Processing Units / CPUs) in current systems are optimized to process serial tasks in the shortest possible time and to switch quickly between tasks. GPUs (Graphics Processing Units), on the other hand, can process a large number of computing operations in parallel. The original intended application is in the name: the GPU was meant to offload the CPU in graphics rendering by taking over all rendering and polygon calculations. In the mid-90s, some 3D games still let you choose between rendering on the CPU or the GPU. Even then, the difference was like night and day. The GPU performed the necessary polygon calculations much faster and more smoothly.

A fine comparison of GPU and CPU architecture is given by Niels Hagoort in his blog post “Exploring the GPU Architecture”.

However, due to their architecture, GPUs are not only ideal for graphics applications, but for all applications where a very large number of arithmetic operations have to be executed in parallel. This includes blockchain, ML, AI and any kind of data analysis (number crunching).
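
The effect of parallelism can be sketched with simple arithmetic. Multiplying two n x n matrices, a core operation in ML, produces n * n output elements that can each be computed independently. The lane counts below (8 and 4096) are hypothetical stand-ins for a CPU and a GPU, and the model ignores clock speed, memory bandwidth and scheduling, so it only illustrates the scaling idea, not real performance.

```python
import math


def matmul_output_elements(n):
    """Number of independent dot products in an n x n matrix multiplication."""
    return n * n


def idealized_steps(independent_tasks, lanes):
    """Rounds needed if `lanes` tasks can be processed at the same time."""
    return math.ceil(independent_tasks / lanes)


n = 1024
tasks = matmul_output_elements(n)

# Hypothetical lane counts: a few wide cores versus thousands of simple ones.
print(idealized_steps(tasks, 8))     # 131072
print(idealized_steps(tasks, 4096))  # 256
```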


vSphere with Kubernetes

What’s new in v7U1?

VMware will release vSphere 7 Update 1 shortly. Once Update 1 is released, users will be able to run Kubernetes workloads natively on vSphere. So far, that was only possible for installations with VMware Cloud Foundation 4 (VCF). Beginning with Update 1, there will be two kinds of Kubernetes on vSphere:

  • VCF with Tanzu
  • vSphere with Tanzu

VCF offers the full stack but constrains your choices. For example, VCF requires vSAN for storage and NSX-T for networking. NSX-T offers loadbalancer functionality for the supervisor cluster and Tanzu Kubernetes Grid (TKG). Additionally, it provides overlay networks for PodVMs. These are container pods that run directly on the hypervisor by means of a micro-VM.

In contrast to VCF with Tanzu, vSphere with Tanzu has fewer constraints. There’s no requirement to use vSAN as the storage layer, and NSX-T is optional as well. Networking can be done with normal distributed switches (vDS). It’s possible to use HA-proxy as loadbalancer for the supervisor control plane API and the TKG cluster API. The downside of this freedom is reduced functionality. Without NSX-T it is not possible to run PodVMs, and without PodVMs you cannot use Harbor Image Registry, which relies on them. In other words: if you want to use Harbor Image Registry together with vSphere with Tanzu, you have to deploy NSX-T.

Feature            VCF with Tanzu    vSphere with Tanzu
NSX-T              required          optional, vDS
vSAN               required          optional
PodVMs             yes               only with NSX-T
Harbor Registry    yes               only with PodVM, NSX-T
Loadbalancer       NSX-T             HA-proxy
CNI                Calico            Antrea or Calico
Overlay NW         NSX-T
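
The dependency chain for vSphere with Tanzu (Harbor needs PodVMs, PodVMs need NSX-T) can be expressed as a small lookup. This is only an illustrative sketch; the `REQUIRES` mapping and the function name are made up for this example and encode nothing beyond what the text and table above state.

```python
# Prerequisites on "vSphere with Tanzu", as described in the text above.
REQUIRES = {
    "Harbor Registry": ["PodVMs"],
    "PodVMs": ["NSX-T"],
    "NSX-T": [],
}


def prerequisites(feature, requires=REQUIRES):
    """Return every component a feature depends on, directly or indirectly."""
    found = []
    stack = list(requires.get(feature, []))
    while stack:
        item = stack.pop()
        if item not in found:
            found.append(item)
            stack.extend(requires.get(item, []))
    return found


print(prerequisites("Harbor Registry"))  # ['PodVMs', 'NSX-T']
```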

Tanzu Editions

In the future there will be 4 editions of vSphere with Tanzu:

  • Tanzu Basic – Run basic Kubernetes clusters in vSphere. Available as a license bundle together with vSphere 7 Enterprise Plus.
  • Tanzu Standard – Same as Tanzu Basic but with multi-cloud support. Add-on license for vSphere 7 or VCF.
  • Tanzu Advanced – Available later.
  • Tanzu Enterprise – Available later.

Links

vSphere Blog – What’s New with VMware vSphere 7 Update 1

vSphere Blog – Announcing VMware vSphere with Tanzu

Cormac Hogan – Getting started with vSphere with Tanzu

VMware Tanzu – Simplify Your Approach to Application Modernization with 4 Simple Editions for the Tanzu Portfolio