I have started experimenting with Docker, and one of the things I wanted to test is running TensorFlow containers with Nvidia GPU support. A few things have changed since the Nvidia runtime and its setup were updated, and the official documentation is currently inaccurate in a few details, so I decided to write a guide, since from time to time I format my playground server. These few easy steps should help anyone with a reasonably recent Nvidia GPU run a container with the latest versions under Ubuntu 20.04 LTS.
Step 1 Preparing the system
Since this is a fresh OS install, there are a few things to do before installing Docker and the Nvidia runtime. We need to install the CUDA-enabled Nvidia driver, since the open-source nouveau driver does not provide the required functionality.
I usually run a headless server unless otherwise required, so in this case I will install the headless Nvidia driver and utilities. It is not very different from installing the usual driver; you just have to use the correct meta package.
sudo apt install nvidia-headless-460 nvidia-utils-460
Once that is done, we make sure we have the required tools to add repositories.
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
Since the driver may not be loaded yet, reboot the system with the following command:
sudo reboot
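After the reboot, a quick sanity check with the nvidia-smi utility (which ships with the nvidia-utils-460 package above) should confirm the driver is working; the exact output depends on your GPU:

```shell
# Check that the proprietary nvidia kernel modules are loaded
lsmod | grep nvidia

# Query the driver; on success this prints the driver version,
# the CUDA version it supports, and a table of detected GPUs
nvidia-smi
```

If nvidia-smi reports that it cannot communicate with the driver, the module did not load and the driver install should be revisited before moving on to Docker.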
Step 2 Install Docker CE
The next step is to install Docker CE. It is simple to install and keep up to date on Ubuntu by adding the official repository. For further reference, check the install instructions in the official documentation.
Now, once you have made sure you installed the basic utilities from step 1, you can add the Docker repository key.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
The output of the fingerprint command should show that the key belongs to the Docker Release repository, and its fingerprint should match “9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88”. Once you have verified it, add the repository.
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Install Docker CE
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Once the install is done, make sure that the service is up by running the Hello World container:
sudo docker run --rm hello-world
Since this is the first time the image is run, it will be fetched from Docker Hub.
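If the Hello World run fails, it helps to confirm that the Docker daemon itself is up before suspecting anything else; a minimal check looks like this:

```shell
# Prints "active" when the Docker service is running
sudo systemctl is-active docker

# Shows both client and server versions; the "Server" section
# only appears when the daemon is actually reachable
sudo docker version
```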
Step 3 Install Nvidia Container Toolkit
The installed Linux distribution is an LTS version of Ubuntu, so the current setup will most probably keep working just fine, but as new LTS (or even regular) releases come out, it is good practice to check the official documentation to make sure things are still ok.
Setup repository and install
Run the following script to set up the Nvidia Docker repository and install the Nvidia container toolkit:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker
Now everything should be ready to test the base CUDA container.
sudo docker run --rm --gpus all nvidia/cuda:11.1.1-base nvidia-smi
After the image downloads, you should see the nvidia-smi output from your system, meaning you are done: you have a running Docker service with CUDA support.
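Since the original goal was TensorFlow containers, a quick way to go one step further is to run the official TensorFlow GPU image and ask TensorFlow itself which GPUs it sees. This is a sketch; the latest-gpu tag is an assumption, and you may prefer to pin a specific TensorFlow version:

```shell
# Run the official TensorFlow GPU image and list the devices
# TensorFlow detects; a working setup prints at least one GPU entry
sudo docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

An empty list here, while nvidia-smi works on the host, usually points at the container toolkit setup rather than the driver.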