TKGS Archives - ElasticSky

Bitfusion Server setup preparation

A Bitfusion Server Cluster must meet the following requirements:

vSphere 7 or later

At least 10 Gbit/s LAN for the Bitfusion data traffic in smaller or PoC deployments. High bandwidth and low latency are essential; 40 Gbit/s or even 100 Gbit/s is recommended.

NVIDIA GPU with CUDA functionality and DirectPath I/O support:

Pascal P40
Tesla V100
T4 Tensor
A100 Tensor

At least 3 Bitfusion servers per cluster for high availability

This setup guide assumes that the graphics cards have been installed in the ESXi 7+ hosts and that the hosts have joined a cluster in vCenter.
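The numeric requirements above can be turned into a quick pre-flight check. The following is only a minimal sketch: `ClusterPlan` and `check_requirements` are hypothetical names invented for illustration, not part of Bitfusion or vSphere, and the GPU model check is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned Bitfusion server cluster."""
    vsphere_major_version: int
    data_network_gbit: int
    bitfusion_servers: int


def check_requirements(plan):
    """Return a list of problems; an empty list means the plan meets the stated minimums."""
    problems = []
    if plan.vsphere_major_version < 7:
        problems.append("vSphere 7 or later is required")
    if plan.data_network_gbit < 10:
        problems.append("at least 10 Gbit/s is required for Bitfusion data traffic")
    if plan.bitfusion_servers < 3:
        problems.append("at least 3 Bitfusion servers are needed for high availability")
    return problems


# Example: a small PoC plan with only two Bitfusion servers (assumed values).
poc = ClusterPlan(vsphere_major_version=7, data_network_gbit=10, bitfusion_servers=2)
for problem in check_requirements(poc):
    print(problem)
```

The thresholds are taken directly from the list above; everything else about the data model is an assumption and would need to be adapted to a real inventory.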

This will be a multi-part post focused on the VMware Bitfusion product. I will introduce the technology, show how to set up a Bitfusion server, and explain how to use its services from Kubernetes pods.

Part 1 : A primer to Bitfusion (this article)
Part 2 : Bitfusion server setup
Part 3 : Using Bitfusion from Kubernetes pods

What is Bitfusion?

In August 2019, VMware acquired Bitfusion, a leader in GPU virtualization. Bitfusion provides a software platform that decouples specific physical resources from compute servers. It is not designed for graphics rendering, but rather for machine learning (ML) and artificial intelligence (AI). Bitfusion systems (client and server) currently run only on selected Linux platforms and support ML applications such as TensorFlow.

Why are GPUs so important for ML/AI applications?

CPUs (Central Processing Units) in current systems are optimized to process serial tasks in the shortest possible time and to switch quickly between tasks. GPUs (Graphics Processing Units), on the other hand, can process a large number of computing operations in parallel. The originally intended application is in the name: the GPU was meant to offload the CPU in graphics rendering by taking over all rendering and polygon calculations. In the mid-90s, some 3D games still let you choose between rendering on the CPU or on the GPU. Even then, the difference was like night and day: the GPU performed the necessary polygon calculations much faster and more smoothly.

Niels Hagoort gives a fine comparison of GPU and CPU architecture in his blog post “Exploring the GPU Architecture”.

However, due to their architecture, GPUs are ideal not only for graphics applications, but also for all applications where a very large number of arithmetic operations have to be executed in parallel. This includes blockchain, ML, AI and any kind of data analysis (number crunching).
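To make the "many independent arithmetic operations" idea concrete, here is a small sketch in plain Python. It computes a*x[i] + y[i] once serially and once split into independent chunks. The function names are made up for this illustration, and Python threads are used only to show the structure of the work: this does not run on a GPU and will not demonstrate a GPU-like speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_serial(a, x, y):
    """Compute a*x[i] + y[i] one element after another."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_chunked(a, x, y, workers=4):
    """Split the same work into independent chunks that can be processed side by side."""
    size = max(1, (len(x) + workers - 1) // workers)
    chunks = [(x[i:i + size], y[i:i + size]) for i in range(0, len(x), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so the output order is preserved.
        parts = list(pool.map(lambda c: saxpy_serial(a, c[0], c[1]), chunks))
    return [value for part in parts for value in part]


x = [float(i) for i in range(10)]
y = [1.0] * 10
result = saxpy_chunked(2.0, x, y)
print(result == saxpy_serial(2.0, x, y))
```

No element of the result depends on any other element, which is why the work can be cut into chunks at all. This is the kind of workload that maps well onto the many parallel execution units of a GPU.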

Tag: TKGS

VMware Bitfusion and Tanzu – Part 2 : Bitfusion server setup

VMware Bitfusion and Tanzu – Part 1: A primer to Bitfusion