@gasparian
Last active October 25, 2022 14:38
setting up ec2 machine with GPU for DL

EC2 CentOS machine with GPUs setup guide

All suggestions here apply to CentOS.

Pyenv

First, install pyenv with the pyenv-installer script:

curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash

Then add the following lines to ~/.bashrc so that pyenv is initialized in every new shell, and reload the shell:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
exec "$SHELL" # or source ~/.bashrc
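Running the echo commands above a second time appends the same lines to ~/.bashrc again. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper written for illustration, not something pyenv provides:

```shell
# append_once LINE FILE: append LINE to FILE unless that exact line is already there.
# (append_once is a hypothetical helper for illustration, not a pyenv command.)
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats LINE as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

append_once 'export PYENV_ROOT="$HOME/.pyenv"' "$HOME/.bashrc"
append_once 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' "$HOME/.bashrc"
append_once 'eval "$(pyenv init -)"' "$HOME/.bashrc"
```

With this helper the setup step can be repeated safely, for example from a provisioning script.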

Install the build dependencies needed to compile Python (make and patch are included because pyenv builds Python from source):

sudo yum install gcc make patch zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl-devel tk-devel libffi-devel xz-devel

Then install Python itself:

pyenv install 3.8.9

Optionally, make the installed Python version the global default:

pyenv global 3.8.9

Docker

Check whether the Docker engine is running, and start it if it is not:

sudo service docker status
sudo service docker start

Add your user to the docker group to run docker without sudo. The change takes effect after you log out and back in.
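As a rough sketch of that step, the snippet below checks whether the current session already has the docker group and, if not, prints the usual command for adding it. The helper name in_group is hypothetical, and the snippet assumes the docker group already exists on the machine:

```shell
# in_group GROUP: succeed if the current session already has GROUP.
# (in_group is a hypothetical helper name used for illustration.)
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group docker; then
    echo "docker group is active for $(id -un)"
else
    # usermod -aG appends the group and keeps the user's existing groups.
    echo "run: sudo usermod -aG docker $(id -un)"
    echo "then log out and back in (or run: newgrp docker)"
fi
```

The check looks at the groups of the running session, so it keeps reporting the group as missing until you start a new login session.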

Poetry

Install poetry:

curl -sSL https://install.python-poetry.org | python3 -

Export your Artifactory credentials as environment variables (the values below are placeholders; substitute your own):

export ARTIFACTORY_USER=andrey.g
export ARTIFACTORY_PASS=1234567890qwerty

Then store them in the Poetry config for the miro-artifactory repository:

poetry config http-basic.miro-artifactory $ARTIFACTORY_USER $ARTIFACTORY_PASS
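If you would rather not write the credentials into Poetry's config, Poetry can also read HTTP basic credentials from environment variables. A minimal sketch, assuming the repository is named miro-artifactory as in the command above and that ARTIFACTORY_USER and ARTIFACTORY_PASS are already exported:

```shell
# Poetry derives the variable name from the repository name: upper-cased,
# with dashes replaced by underscores ("miro-artifactory" -> MIRO_ARTIFACTORY).
# Assumes ARTIFACTORY_USER and ARTIFACTORY_PASS were exported beforehand;
# if they were not, the variables below are set to empty strings.
export POETRY_HTTP_BASIC_MIRO_ARTIFACTORY_USERNAME="${ARTIFACTORY_USER:-}"
export POETRY_HTTP_BASIC_MIRO_ARTIFACTORY_PASSWORD="${ARTIFACTORY_PASS:-}"
```

This keeps the password out of Poetry's stored configuration, which can be convenient on shared or short-lived EC2 instances.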

GPUs

If after running your app you see the following error in logs:

Could not load dynamic library 'libcudnn.so.8'

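Then a quick workaround is to find libcuda and symlink it under the missing name. Note that libcuda is the NVIDIA driver library, not cuDNN, so this only gets past the loader error; operations that actually call into cuDNN will likely still fail until cuDNN 8 itself is installed. The version suffix below (470.57.02) is the driver version on this machine; use the one that find reports on yours:

sudo find / -name "*libcuda.so*" 2>/dev/null
cd /usr/lib64/
ls -al | grep cuda
sudo ln -s libcuda.so.470.57.02 libcudnn.so.8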
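Before relying on the symlink above, it can help to check which of these libraries the dynamic linker already knows about. A small sketch using the linker cache; lib_in_cache is a hypothetical helper name:

```shell
# lib_in_cache NAME: print linker-cache entries whose line contains NAME.
# Exits non-zero when nothing matches or ldconfig is unavailable.
# (lib_in_cache is a hypothetical helper name used for illustration.)
lib_in_cache() {
    ldconfig -p 2>/dev/null | grep -F -- "$1"
}

lib_in_cache libcudnn || echo "libcudnn is not in the linker cache"
lib_in_cache libcuda  || echo "libcuda is not in the linker cache"
```

If libcudnn shows up here under a different version suffix, the problem is a version mismatch rather than a missing library.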

If you also see this error:

Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

You can pass the following environment variable when running the script, pointing it at the CUDA directory that contains nvvm/libdevice:

XLA_FLAGS=--xla_gpu_cuda_data_dir=<your CUDA directory>
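If you are not sure where CUDA lives, one way to locate a suitable directory is to search for nvvm/libdevice and strip that suffix. This is only a sketch: find_cuda_dirs is a hypothetical helper, and the default search root of /usr/local is an assumption about where CUDA was installed:

```shell
# find_cuda_dirs [SEARCH_ROOT]: print each directory that contains nvvm/libdevice.
# SEARCH_ROOT defaults to /usr/local (an assumption; adjust for your install).
# (find_cuda_dirs is a hypothetical helper name used for illustration.)
find_cuda_dirs() {
    local search_root="${1:-/usr/local}"
    find "$search_root" -type d -path '*/nvvm/libdevice' 2>/dev/null |
        sed 's|/nvvm/libdevice$||'
}

find_cuda_dirs

# Then pass one of the printed paths, e.g. (train.py is a placeholder script name):
# XLA_FLAGS=--xla_gpu_cuda_data_dir="$(find_cuda_dirs | head -n 1)" python train.py
```

If nothing is printed, try a wider search root such as / (this is slower and may need sudo).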

To see GPU utilization along with the driver and CUDA versions, run:

nvidia-smi

To monitor GPU utilization continuously, wrap the same command in watch:

watch -d -n 0.5 nvidia-smi

This refreshes the output every 0.5 s. Don't set the interval much lower: each refresh launches a new nvidia-smi process, so this is not the most efficient way to monitor GPU utilization.
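A lighter alternative is nvidia-smi's own loop mode, which keeps a single process running and prints selected fields as CSV. A minimal sketch; gpu_log is a hypothetical wrapper name, and the chosen fields are just one reasonable selection:

```shell
# gpu_log [SECONDS]: print one CSV row per GPU every SECONDS seconds (default 1).
# (gpu_log is a hypothetical wrapper name; the flags are standard nvidia-smi options.)
gpu_log() {
    local interval="${1:-1}"
    nvidia-smi \
        --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
        --format=csv \
        -l "$interval"
}

# Usage: gpu_log 1    (stop with Ctrl-C)
# To keep a record: gpu_log 5 > gpu_usage.csv
```

The CSV output is also easier to redirect into a file and plot later than the full nvidia-smi table.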
