Recently I decided to build a simple deep learning machine with a single GTX 1080Ti GPU, based on Ubuntu 16.04. Assembling the machine was quite straightforward, but a few minor issues arose while deploying the required software. So it seemed helpful to write down the list of performed steps, in case the system ever needs to be re-deployed.
Optional Step: Prepare Bootable USB and Install OS
Of course, the first step in deploying a deep learning machine based on Ubuntu is to install that operating system onto your computer. This step is thoroughly described in the Ubuntu Tutorials and is not covered here.
Briefly, you just need to download the ISO image and “burn” it onto a flash drive. If your host machine runs macOS, the following tutorial describes how to create a bootable stick.
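For reference, here is a minimal sketch of the “burning” step on macOS with dd (the ISO filename and the /dev/disk2 device are assumptions; verify the device with diskutil list first, because dd overwrites whatever it is pointed at):
> diskutil list                        # find the flash drive, e.g. /dev/disk2
> diskutil unmountDisk /dev/disk2      # unmount it before writing
> sudo dd if=ubuntu-16.04-desktop-amd64.iso of=/dev/rdisk2 bs=1m
> diskutil eject /dev/disk2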
Disabling Graphical Interface and CLI Mode Preparations
If the desktop version of Ubuntu was installed, then by default the user will be logged into graphical mode. That is not what you want if the machine is to be used as a deep learning computations host. To prepare the system for remote control and CLI mode, you need to set up an OpenSSH server and disable the desktop UI so the system boots into a terminal.
Warning: After applying the following steps and rebooting, the UI will not be available anymore. To restore the graphical desktop, the UI daemon has to be re-enabled and the GRUB configuration file changed back to video mode.
First of all, check if openssh-server is installed on the machine. Otherwise, install it (and adjust the configuration, if required):
$ sudo apt install openssh-server
$ sudo service ssh status
$ sudo vim /etc/ssh/sshd_config
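The default configuration usually works as-is; if you do edit sshd_config, these are the options most worth reviewing (the values below are illustrative, not a recommendation):
# /etc/ssh/sshd_config — illustrative values
Port 22                     # move SSH to a non-standard port if desired
PermitRootLogin no          # disallow direct root logins
PasswordAuthentication yes  # switch to "no" once key-based auth works
After editing, apply the changes with sudo service ssh restart.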
To get the IP address assigned to the host by the router, the ifconfig -a command could be used. It returns a list of network devices and their local IP addresses.
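To pull out just the IPv4 addresses, something like the following works (the “inet addr” label matches the net-tools version of ifconfig shipped with Ubuntu 16.04; ip from iproute2 is the modern alternative):
$ ifconfig -a | grep "inet addr"
$ ip -4 addr show   # iproute2 equivalent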
When everything is installed and the machine is ready to be used in “headless” mode, you should change the GRUB loader configuration a bit and disable lightdm.
To update the boot loader configuration, edit the /etc/default/grub file:
$ sudo apt install vim # nano editor could be used instead
$ sudo vim /etc/default/grub
$ sudo update-grub
Here is an example of the configuration file’s content once the host was completely prepared for the “headless” run (the couple of commented-out lines were used while the desktop UI was still enabled):
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'
GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX_DEFAULT="text"
# GRUB_CMDLINE_LINUX="nomodeset"
GRUB_CMDLINE_LINUX="text"
GRUB_TERMINAL=console
And, the final step:
$ sudo systemctl disable lightdm
$ sudo reboot
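If you ever need the graphical desktop back, reverse these steps, as mentioned in the warning above:
$ sudo systemctl enable lightdm
$ sudo vim /etc/default/grub   # restore the commented-out "quiet splash" video-mode lines
$ sudo update-grub
$ sudo reboot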
Generate SSH Key and Upload to the Host
When you connect to your deep learning machine via SSH, you need to enter your credentials every time. To simplify this process a bit, you can generate a public/private key pair and get access without typing a username and password.
To do it, on the machine which you’re going to use to control the deep learning host, generate an SSH key pair with ssh-keygen, copy the public key to the host, and add the private key to ssh-agent (don’t forget to replace the placeholders in the following script with actual values):
> cd ~/.ssh
> ssh-keygen -t rsa # follow keygen instructions to generate key
> cat YOUR-KEY-NAME.pub | ssh USERNAME@IP_ADDRESS 'cat >> .ssh/authorized_keys && echo "Key copied"'
> ssh-add -K YOUR-KEY-NAME >/dev/null 2>&1 # or add this line to your .bashrc, .zshrc, etc.
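On systems where it’s available, ssh-copy-id USERNAME@IP_ADDRESS performs the key-copying step in a single command.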
Now you’re able to connect to your host by typing something like ssh username@192.168.0.10, without entering credentials or providing a path to the key.
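A client-side ~/.ssh/config entry shortens this even further (the alias, user, address, and key name below are placeholders):
# ~/.ssh/config on the client machine
Host deep
    HostName 192.168.0.10
    User USERNAME
    IdentityFile ~/.ssh/YOUR-KEY-NAME
After that, ssh deep is enough to connect.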
Installing Anaconda
When everything is prepared, we’re going to install the Anaconda Python distribution. It is not required to use Anaconda to install TensorFlow or any other “scientific” Python package, but this package management system sometimes makes the process a bit easier.
To install the Anaconda distribution on your host, run the following commands (adjust the first line as needed to pick another version):
$ export conda_version="Anaconda3-5.0.0.1-Linux-x86_64"
$ wget "https://repo.continuum.io/archive/${conda_version}.sh" -O anaconda3.sh
$ chmod +x anaconda3.sh
$ ./anaconda3.sh
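$ source ~/.bashrc # reload the shell so the PATH entry added by the installer takes effect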
$ python
Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Installing CUDA (+cuDNN) Drivers and TensorFlow
To run TensorFlow on a GPU, the CUDA driver and the cuDNN library should be installed. All required software is available from the NVIDIA developer portal. The CUDA installer could be downloaded via the terminal:
wget "https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run" -O cuda_8.0.44_linux.run
But to download the cuDNN library, you need to navigate to the portal in your browser and sign up. You’ll get a confirmation email and then will be able to sign in. Afterwards, go to the cuDNN download page and get the appropriate version of the library.
Note: In general, different versions of TensorFlow require different versions of the cuDNN headers. For example, v1.0 (and lower) requires cuDNN v5.1, but v1.3 requires cuDNN v6.0. You'll see an error message when importing the library if the versions don't match.
As soon as the files are downloaded, run the CUDA installer. Note that it asks whether you want to install the video driver. You don’t need to do it, because the prepackaged driver could be outdated; install the driver yourself via the Ubuntu package manager instead:
$ sudo sh cuda_8.0.44_linux.run --override
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update
$ sudo apt-get purge nvidia-*
$ sudo apt-get purge nvidia-cuda*
$ sudo apt-get install nvidia-384 # or any other recent version of driver
$ which nvcc
$ nvidia-smi
By default, CUDA is installed into the /usr/local/cuda-8.0/ directory. (Note that which nvcc will only find the compiler after CUDA’s bin directory is added to PATH, which is done below.) Unpack the downloaded cuDNN archive and copy the library and headers into the CUDA folder:
tar xvzf cudnn-8.0-linux-x64-v6.0.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda-8.0/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64
sudo chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*
The -P key of the cp command is required to preserve the symlinks of the .so files.
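To confirm which cuDNN version ended up in place, read the version macros straight from the copied header:
$ grep -A 2 CUDNN_MAJOR /usr/local/cuda-8.0/include/cudnn.h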
That’s all. Now the driver and the library are ready. Next, add the following environment variables to your .bashrc so the TensorFlow loader can find the installed software:
# CUDA/cuDNN
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="/usr/local/cuda-8.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH"
# Suppress TF debugging info
# export TF_CPP_MIN_LOG_LEVEL=3
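Reload the shell configuration and check that the toolkit is now visible:
$ source ~/.bashrc
$ nvcc --version   # should report CUDA release 8.0
$ echo $LD_LIBRARY_PATH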
The final step is to create a Python environment and install the GPU version of TensorFlow:
$ conda create -n deep python=3.6
$ source activate deep
$ pip install tensorflow-gpu==1.3 # change to newer/older versions if required
$ python -c "import tensforflow as tf; print(tf.__version__)"
If there are no errors, everything was done right, and now you can train deep models on the GPU. As a final check, run the following script:
import tensorflow as tf

# Build a trivial graph and evaluate it to verify the runtime works
const = tf.constant('Hello World!')
with tf.Session() as session:
    output = session.run(const)
    print(output)
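To explicitly confirm that the GPU is visible, you can also list the local devices (device_lib ships with the TF 1.x Python client):
$ python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"
The output should include a GPU entry alongside the CPU.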
If you don’t suppress the TF debugging output, you’ll also see notifications about the available GPU(s) and memory when the library loads and the session is created.