The first two sections are from Get Docker Engine - Community for Ubuntu, Post-installation steps for Linux, and the NVIDIA Container Toolkit and validation steps at TensorFlow Docker, so there's nothing really new in those. However, the subsequent five sections contain notes on using TensorFlow with GPU support in a Docker container interactively, building a Docker image, running an IDE within the container, running Jupyter Notebooks from the container, and moving the Docker data directory, which may be of more interest.
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
Add your user to the docker group so you don't have to run commands with sudo:
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
Install NVIDIA Container Toolkit and TensorFlow
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
docker run --gpus all --rm nvidia/cuda nvidia-smi
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
Note: Documentation says to run:
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
But with TensorFlow 2 this gives the error AttributeError: module 'tensorflow' has no attribute 'enable_eager_execution' (eager execution is enabled by default in TF 2), so use the command above instead.
To confirm which version of TensorFlow is installed:
python -c 'import tensorflow as tf; print(tf.__version__)'
Using TensorFlow and Docker
For dev and testing purposes, you can simply start a container with a bind mount to a working directory, where it can read the git-managed Python source files and save the trained model somewhere that will persist once the container is stopped.
Dev sessions are started via:
docker run --gpus all -it -u 1000:1000 -p 8888:8888 --mount type=bind,src=/home/<username>/projects,dst=/home/projects --env HOME=/home/projects -w /home/projects tensorflow/tensorflow:latest-gpu-py3 bash
- 1000 and 1000 are the user and group IDs I want to run as (so files written by the container have the correct permissions outside the container)
- /home/<username>/projects is the directory on the host file system and /home/projects is where it appears inside the container (so I have access to the local git repo; note that git isn't installed in this container, so git pull/push etc. have to be performed outside the container)
- the $HOME environment variable is set to /home/projects rather than the default / (this is for the Visual Studio Code Remote - Containers extension)
- the working directory is set to /home/projects
- the TensorFlow image with GPU support and Python 3 (tensorflow/tensorflow:latest-gpu-py3) is used
- a bash shell is started
Building an image
If you want to install additional packages, it’ll be easier to build an image. In my case I want to install Jupyter Notebook, so I’ll create a basic Dockerfile like:
FROM tensorflow/tensorflow:latest-gpu-py3
WORKDIR /home/projects
RUN pip install notebook
docker build --tag tfdev .
And start subsequent sessions with a simplified version of the original command, i.e.:
docker run --gpus all -it -u 1000:1000 -p 8888:8888 --mount type=bind,src=/home/<username>/projects,dst=/home/projects --env HOME=/home/projects tfdev bash
Of course this is still running it interactively, as a user that can directly read and write the source. If you want to build a completely self-contained image which you can move from dev through to production, you’ll need additional steps such as copying in the source code.
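As a rough sketch of what those additional steps might look like (the requirements.txt and train.py names here are hypothetical, not part of the setup above):

```dockerfile
FROM tensorflow/tensorflow:latest-gpu-py3
WORKDIR /app
# Bake the dependencies and source into the image instead of bind mounting
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Run as a non-root user so the image doesn't depend on host UID mapping
RUN useradd --create-home appuser
USER appuser
CMD ["python", "train.py"]
```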
Running Visual Studio Code from the Tensorflow Docker image
This is so that code completion etc. works in the IDE. I use Visual Studio Code, but there are probably similar approaches for other IDEs. Visual Studio Code uses the Remote - Containers extension, which in turn needs Docker Compose.
sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
To install the Visual Studio Code Remote Development Extension Pack, go to Go > Go to File and enter:
ext install ms-vscode-remote.vscode-remote-extensionpack
Once set up, and a container is running, go to Remote Explorer in the Visual Studio Code Activity Bar, select the container that is running, right click, and choose "Attach to Container". Note that if --env HOME is not set in the original docker run command (or equivalent), it will try to create .vscode-server in /root/, which gives a "Command in container failed: mkdir -p /root/.vscode-server/" error due to insufficient permissions.
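If you'd rather have Visual Studio Code start the container itself than attach to a running one, the extension can be driven by a .devcontainer/devcontainer.json file instead. A minimal sketch (untested; it assumes the tfdev image built earlier and mirrors the docker run flags above):

```json
{
    "image": "tfdev",
    "runArgs": ["--gpus", "all", "-u", "1000:1000"],
    "containerEnv": { "HOME": "/home/projects" },
    "workspaceMount": "source=${localWorkspaceFolder},target=/home/projects,type=bind",
    "workspaceFolder": "/home/projects"
}
```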
Running Jupyter Notebooks from within Docker
If you run pip list within the default tensorflow/tensorflow:latest-gpu image, you'll see that Jupyter Notebook isn't installed by default. It can be installed via a custom Docker image as per the "Building an image" section above.
To start Jupyter Notebook within the interactive Docker container:
jupyter notebook --ip 0.0.0.0 --no-browser
where --ip 0.0.0.0 tells it to listen on all interfaces (so it's reachable from outside the container) and --no-browser tells it not to try to launch a browser session as it normally does. You can then access it from outside the container via http://localhost:8888, noting that you'll be prompted to enter the token shown when the jupyter notebook process started.
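If you've lost the startup output, jupyter notebook list (run inside the container) shows running servers with their tokens. As a toy illustration of pulling the token out of the printed URL (the URL and token value here are made up):

```shell
# Hypothetical sample of the URL Jupyter prints at startup
url='http://127.0.0.1:8888/?token=abc123'
# Strip everything up to and including "token=" to recover just the token
token="${url#*token=}"
echo "$token"
```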
Moving the Docker data directory
When installing some large models via Docker, e.g. GPT-2, I found my root partition, i.e. /, became full and the system barely usable, despite having separate partitions for home and data. This is because Docker on Ubuntu defaults to using /var/lib/docker.
It was possible to recover enough space on / for the system to become usable again by clearing unused Docker files via:
docker system prune
There were a couple of ways of moving the Docker data directory listed online, e.g. DOCKER_OPTS="-g /new/dir/" in /etc/default/docker, but these didn’t seem to work on Ubuntu, so I simply used a symlink instead:
service docker stop
sudo mv /var/lib/docker ~/docker.bak
sudo mkdir /data/docker
sudo chmod go-r /data/docker
sudo ln -s /data/docker /var/lib/
service docker start
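For completeness, more recent Docker versions document moving the data directory via the data-root option in /etc/docker/daemon.json rather than a symlink. A sketch, reusing the /data/docker path from above (restart the Docker service after editing the file):

```json
{
    "data-root": "/data/docker"
}
```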