Docker is the easiest way to run TensorFlow on a GPU since the host machine only requires the NVIDIA® driver (the NVIDIA® CUDA® Toolkit is not required).
- Use TensorFlow with GPU support on Ubuntu with Docker
- Ubuntu 18.04.1
- NVRM 435.21
- GCC 7.5.0
- Docker 19.03.8
Install Docker Engine - Community (using the repository)
- To allow apt to use a repository over HTTPS:
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
- Add Docker’s official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
- Set up the stable repository:
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
- Install the latest version of Docker Engine - Community and containerd:
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
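To confirm the installed version (this gist was tested with Docker 19.03.8; your build string may differ) and to list the versions the repository offers:
$ docker --version
$ apt-cache madison docker-ce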
- Verify that Docker Engine - Community is installed correctly:
$ sudo docker run hello-world
The Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root and other users can only access it using sudo. The Docker daemon always runs as the root user. If you don’t want to preface the docker command with sudo, create a Unix group called docker and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker group.
- Create the docker group:
$ sudo groupadd docker
- Add your user to the docker group:
$ sudo usermod -aG docker $USER
- Log out and log back in so that your group membership is re-evaluated. On Linux, you can also run the following command to activate the changes to groups:
$ newgrp docker
- Verify that you can run docker commands without sudo:
$ docker run hello-world
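If this still fails, a quick sanity check of your group membership (id -nG lists the current user's groups; this check is my own, not from the Docker docs):
$ id -nG | grep -qw docker && echo "docker group active"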
For GPU support on Linux, install NVIDIA Docker support
Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA Toolkit on the host; only the driver needs to be installed.
- Verify the driver version. The output could look like this:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  435.21  Sun Aug 25 08:17:57 CDT 2019
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
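Another quick check is nvidia-smi on the host (assuming the driver utilities were installed alongside the kernel module; these query flags are standard nvidia-smi options). For the setup above it should report 435.21:
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader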
- Verify the CUDA Toolkit version. On the host this may simply fail, which is fine since the toolkit is not needed there:
$ nvcc -V
Command 'nvcc' not found.
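That failure is expected: the CUDA Toolkit lives inside the containers, not on the host. If you do want to query nvcc, you can run it from a devel-tagged CUDA image once the NVIDIA Docker support below is set up (a sketch assuming the nvidia/cuda:10.0-devel tag is available on Docker Hub):
$ docker run --gpus all --rm nvidia/cuda:10.0-devel nvcc -V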
- Set up the nvidia-docker repository and install the NVIDIA Container Toolkit:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
- Test nvidia-smi with the latest official CUDA image (prefix the command with sudo if permission is denied):
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
- Start a GPU-enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
- Start a GPU-enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
- Specify a capability (graphics, compute, ...) for the container
Note that this is rarely, if ever, used this way.
$ docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
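Multiple capabilities can be combined with NVIDIA's quoted list syntax (compute enables CUDA, utility enables nvidia-smi; this follows the documented Docker 19.03 --gpus format):
$ docker run --gpus 'all,"capabilities=compute,utility"' nvidia/cuda:10.0-base nvidia-smi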
Download a TensorFlow Docker image with GPU and Jupyter support:
$ docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter
To check what images are on the machine:
$ docker image ls
You should see the following output if you have followed this gist:
REPOSITORY              TAG                      IMAGE ID       CREATED         SIZE
tensorflow/tensorflow   latest-gpu-py3-jupyter   ce8f7398433c   2 months ago    4.26GB
nvidia/cuda             10.0-base                841d44dd4b3c   4 months ago    110MB
hello-world             latest                   fce289e99eb9   15 months ago   1.84kB
Note that if one attempts to run:
$ docker run -it --rm tensorflow/tensorflow \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
a new image tensorflow/tensorflow:latest will be downloaded:
REPOSITORY              TAG                      IMAGE ID       CREATED         SIZE
tensorflow/tensorflow   latest-gpu-py3-jupyter   ce8f7398433c   2 months ago    4.26GB
tensorflow/tensorflow   latest                   9bf93bf90865   2 months ago    2.47GB
nvidia/cuda             10.0-base                841d44dd4b3c   4 months ago    110MB
hello-world             latest                   fce289e99eb9   15 months ago   1.84kB
In general:
$ docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]
- Start a bash shell session within a TensorFlow-configured container:
$ docker run -it tensorflow/tensorflow bash
- To run a TensorFlow program developed on the host machine within a container, mount the host directory and change the container's working directory (-v hostDir:containerDir -w workDir):
$ docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
- Start a Jupyter Notebook server using TensorFlow's nightly build with Python 3 support:
$ docker run -it -p 8888:8888 tensorflow/tensorflow:nightly-py3-jupyter
- Check if a GPU is available:
$ lspci | grep -i nvidia
- Verify your nvidia-docker installation:
$ docker run --gpus all --rm nvidia/cuda nvidia-smi
- Download and run a GPU-enabled TensorFlow image:
The output is commented below.
$ docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
- Use the latest TensorFlow GPU image to start a bash shell session in the container:
$ docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
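Once inside that shell, you can confirm TensorFlow sees the GPU (tf.config.list_physical_devices is available in TF 2.1+; older images need tf.config.experimental.list_physical_devices instead):
# inside the container's bash session
$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"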
Consider a case where you have a directory source and, when you build the source code, the artifacts are saved into another directory, source/target/. You want the artifacts to be available to the container at /app/, and you want the container to get access to a new build each time you build the source on your development host. Use the following command to bind-mount the target/ directory into your container at /app/. Run the command from within the source directory. The $(pwd) sub-command expands to the current working directory on Linux or macOS hosts.
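A minimal sketch of that pattern (devtest is an arbitrary container name, and it assumes a target/ subdirectory exists under the current directory):
$ docker run -d -it --name devtest \
--mount type=bind,source="$(pwd)"/target,target=/app \
tensorflow/tensorflow:latest-gpu-py3
The command actually used in this gist mounts the whole current directory at /mounteddir instead: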
$ docker run -d \
-it \
--name msc2 \
--mount type=bind,source="$(pwd)"/.,target=/mounteddir \
tensorflow/tensorflow:latest-gpu-py3
-d means detached. To bring the container to the foreground, use docker attach CONTAINER, where CONTAINER is a custom name. Note this command is more complex than necessary, in order to show more complete usage. The simplified command I use, with the port specified for Jupyter notebook, is:
$ docker run -it \
--name msc2 \
-p 8888:8888 \
--mount type=bind,source="$(pwd)"/.,target=/mounteddir \
tensorflow/tensorflow:latest-gpu-py3
Run docker rename CONTAINER NEW_NAME to rename a container, and use docker rm CONTAINER to delete unwanted ones.
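For example (both container names here are hypothetical):
$ docker rename old_name new_name   # give a container a more memorable name
$ docker rm stale_container         # the container must be stopped first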
The docker run command first creates a writeable container layer over the specified image, and then starts it using the specified command. That is, docker run is equivalent to the API /containers/create then /containers/(id)/start. A stopped container can be restarted with all its previous changes intact using docker start. See docker ps -a to view a list of all containers.[*]
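You can reproduce that create-then-start split by hand (hw is an arbitrary container name; -a attaches to the container's output):
$ docker create --name hw hello-world   # the /containers/create step
$ docker start -a hw                    # the /containers/(id)/start step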
The next time you want to use it, with the status the same as when it exited [*]:
$ docker restart msc2
$ docker attach msc2
Now, in the Docker container msc2, you can do pip install notebook and launch Jupyter notebook using [*]:
$ jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root
The Jupyter notebook software will print a URL which can be used in browsers outside the Docker container, since we published the port with -p.
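If you lose that URL, jupyter notebook list (a standard Jupyter command) reprints the running servers and their tokens; with the container running you can invoke it via docker exec:
$ docker exec msc2 jupyter notebook list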
$ sudo docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
gives