Running an AI workload on a GPU machine requires installing kernel drivers and user-space libraries from GPU vendors such as AMD and NVIDIA. Once the driver and software are installed, to use AI frameworks such as PyTorch and TensorFlow, one needs the framework build that targets the correct GPU. Usually, AI applications run on top of popular AI frameworks, which hide these tedious installation steps. This article highlights the importance of the hardware, driver, software, and frameworks for running AI applications or workloads.
This article deals with the Linux operating system, the ROCm software stack for AMD GPUs, the CUDA software stack for NVIDIA GPUs, and PyTorch as the AI framework. Docker plays a crucial part in bringing up the entire stack, allowing various workloads to be launched in parallel.
The diagram above represents the AI software stack on an 8x AMD GPU node.
The hardware layer consists of a node with the usual CPU, memory, etc., plus the GPU devices. A node can have a single GPU device, but larger AI models require a lot of GPU memory to load, so it is common to use more than one GPU per node. Within a node, the GPUs are interconnected through XGMI (AMD) or NVLink (NVIDIA). A cluster can have multiple such nodes, and GPUs on one node can interact with GPUs on another node; this inter-node interconnect is typically InfiniBand or Ethernet/RoCE. Which GPU interconnect is used depends on the underlying GPU hardware.
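To see how the GPUs in a node are actually connected, both vendors ship topology queries in their user-space tools; as a quick sketch (output format and flags vary by driver and tool version):

rocm-smi --showtopo    # AMD: link type and weight (e.g., XGMI) between GPU pairs
nvidia-smi topo -m     # NVIDIA: NVLink/PCIe topology matrix between GPUs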
Installing the Kernel Driver
At the software layer, the AMD or NVIDIA GPU driver needs to be installed. It is not uncommon to install the entire ROCm or CUDA software package, which includes the kernel driver, on the host OS. Since we are going to use a Docker container to launch the AI workload, the user-space ROCm or CUDA software is redundant on the host OS; however, installing it lets us verify through the user-space tools whether the underlying kernel driver works properly.
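As a minimal sketch for an Ubuntu 22.04 host, the installation looks roughly like the following; the package names and versions are illustrative, so consult the vendor documentation for the exact release you need:

# AMD: install the amdgpu-install helper package (the .deb version is illustrative),
# then use it to install only the DKMS kernel driver
sudo apt update
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install --usecase=dkms
# NVIDIA: install the kernel driver from the distribution repositories
sudo apt install nvidia-driver-550
# Verify the kernel driver works through the user-space tools
rocm-smi      # AMD
nvidia-smi    # NVIDIA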
Launching a ROCm- or CUDA-Based Docker Container
Once the GPU drivers are installed, ROCm- or CUDA-based Docker images can be used for AMD and NVIDIA GPU nodes, respectively.
AMD and NVIDIA periodically release Docker images for various Linux flavors. This is one of the advantages of Dockerized applications over applications running on the native OS: we can have an Ubuntu 22.04 host OS with the GPU drivers installed and then launch CentOS or Ubuntu 20.04-based Docker containers with different ROCm versions in parallel.
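For instance, the two containers below run different ROCm user-space versions side by side on the same host driver (the image tags are illustrative; check Docker Hub for the tags actually published):

docker run -d --name rocm-old --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/dev-ubuntu-20.04:5.7 sleep infinity
docker run -d --name rocm-new --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04:6.1 sleep infinity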
Launching a ROCm-Based Docker Container
ROCm Docker images are available here. Check for dev-ubuntu-22.04 here.
docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04
The above command maps all the GPU devices into the container. You can also expose specific GPUs (more information at "Running ROCm Docker containers"), for example:
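Each GPU is exposed as a render node under /dev/dri, and passing only that node restricts the container to the corresponding GPU. The render node number below is illustrative; check ls /dev/dri on your host.

ls /dev/dri    # each GPU appears as a render node, e.g., renderD128, renderD129, ...
docker run -it --rm --device /dev/kfd --device /dev/dri/renderD128 --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04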
Once the container is running, check whether the GPUs are listed.
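For example, using the ROCm user-space tools bundled in the image:

rocm-smi               # per-GPU summary: temperature, clocks, utilization
rocminfo | grep gfx    # list the GPU agents by their gfx architecture name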
You can download the PyTorch code and build it for AMD GPUs (a sketch follows; more instructions are on GitHub), or you can run any workload that has ROCm support.
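A rough sketch of the source build is shown below, assuming the ROCm toolchain is already installed in the container; the steps can change between releases, so treat the PyTorch repository instructions as authoritative:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
python3 tools/amd_build/build_amd.py    # "hipify" the CUDA sources so they compile for AMD GPUs
USE_ROCM=1 python3 setup.py develop     # build and install with ROCm support enabled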
Launching a ROCm-Based PyTorch Docker Container
If PyTorch does not need to be built from source (in most cases it does not), you can directly download a ROCm-based PyTorch Docker image. Just make sure the ROCm kernel driver is installed, and then launch the PyTorch-based container.
PyTorch Docker images with ROCm support can be found here.
docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/pytorch
Once the container is running, check whether the GPUs are listed, as described earlier.
Let's try a few code snippets in the PyTorch framework to check the GPUs, the ROCm/HIP version, etc.
root@node:/var/lib/jenkins# python3
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.1.2+git70dfd51'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
8
>>> torch.version.hip
'6.1.40091-a8dbc0c19'
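As a final smoke test, run a small computation on the GPU. Note that on ROCm builds of PyTorch, the "cuda" device string maps to AMD GPUs, so the same snippet works on both vendors:

>>> x = torch.randn(1024, 1024, device="cuda")  # allocate a tensor on GPU 0
>>> y = x @ x                                   # the matrix multiply runs on the GPU
>>> y.device
device(type='cuda', index=0)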
Conclusion
In conclusion, this article highlights the importance of software-stack compatibility with the underlying GPU hardware. The wrong choice of software stack for a given GPU type could cause the workload to fall back to the default device (i.e., the CPU), thereby underutilizing the GPU's compute power.
Happy GPU programming!