- Install all required software: docker, nvidia-docker, gitlab-ci-multi-runner
- Execute: curl -s http://localhost:3476/docker/cli (sample output below)
- Use that data to fill the devices/volumes/volume_driver fields in /etc/gitlab-runner/config.toml
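For reference, the nvidia-docker 1.x plugin answers that request with the docker CLI flags it would inject. On a four-GPU box with driver 384.81 the output looks roughly like this (a single line, wrapped here for readability; your device list and driver-volume name will differ):
curl -s http://localhost:3476/docker/cli
--volume-driver=nvidia-docker --volume=nvidia_driver_384.81:/usr/local/nvidia:ro
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools
--device=/dev/nvidia0 --device=/dev/nvidia1 --device=/dev/nvidia2 --device=/dev/nvidia3
These values map one-to-one onto the devices, volumes, and volume_driver fields in the config.toml below.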
concurrent = 1
check_interval = 0
[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me---->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    devices = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools", "/dev/nvidia3", "/dev/nvidia2", "/dev/nvidia1", "/dev/nvidia0"]
    volumes = ["/cache", "nvidia_driver_384.81:/usr/local/nvidia:ro"]
    volume_driver = "nvidia-docker"
    shm_size = 0
  [runners.cache]
Is there a newer method?
I have tried installing nvidia-docker + docker + the runner itself, then setting only the runner's runtime parameter to "nvidia" and the executor to "docker", but TensorFlow, for example, doesn't detect the GPUs at all.
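As a quick sanity check (an illustrative command, not part of the original setup), you can confirm the nvidia runtime is actually registered with the Docker daemon before blaming the runner config:
docker info | grep -i runtimes
# expected output includes something like: Runtimes: nvidia runc
If nvidia is missing here, no config.toml setting will help.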
The following config.toml provides GPU support (notice the runtime parameter).
concurrent = 1
check_interval = 0
[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me---->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    runtime = "nvidia"
  [runners.cache]
Yet it is not clear to me how to restrict the GPUs assigned to the runner on a multi-GPU server. This functionality is called "GPU isolation".
The docker run command for GPU isolation follows; please notice the -e NVIDIA_VISIBLE_DEVICES=0. How can this be set for the runner in config.toml?
docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:9.0-base nvidia-smi
In the [[runners]] section there's an environment keyword to define environment variables, but I guess it won't work, because you have to pass that environment variable on to docker.
So the only way I see is to specify NVIDIA_VISIBLE_DEVICES directly in the Dockerfile
https://github.com/NVIDIA/nvidia-docker/wiki/Usage#dockerfiles
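For what it's worth, here is a minimal sketch of that Dockerfile approach (the cuda-gpu0 image name is hypothetical), using the ENV convention from the wiki page above:
cat > Dockerfile <<'EOF'
FROM nvidia/cuda:9.0-base
# Bake the GPU restriction into the image itself
ENV NVIDIA_VISIBLE_DEVICES 0
EOF
docker build -t cuda-gpu0 .
docker run --runtime=nvidia --rm cuda-gpu0 nvidia-smi
The obvious drawback is that the image, not the runner, decides which GPU it sees.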
It seems that environment in the [[runners]] section is exactly what we were looking for.
Actually, any environment variable that is set before the script section of the .gitlab-ci.yml file runs will do. See the following two examples; both of them worked for me.
Example 1: using gitlab-runner configuration only
In /etc/gitlab-runner/config.toml:
[[runners]]
  name = "runner-gpu0-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=0"] # <== Notice this
  [runners.docker]
    runtime = "nvidia" # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

[[runners]]
  name = "runner-gpu1-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=1"] # <== Notice this
  [runners.docker]
    runtime = "nvidia" # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
The two runners have been tagged with docker, gpu0 and docker, gpu1 respectively.
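To sanity-check the isolation outside of CI (an illustrative command that mirrors what the gpu0 runner injects):
docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:9.0-base nvidia-smi
# nvidia-smi inside the container should report exactly one GPU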
Example 2: using GitLab CI custom environment variables
The /etc/gitlab-runner/config.toml is the same as in Example 1.
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

variables:
  NVIDIA_VISIBLE_DEVICES: "3" # This is going to override definition(s) in /etc/gitlab-runner/config.toml

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
Do you guys know how to make it work with docker v19.03.2, which integrates native support for NVIDIA GPUs?
The runtime = "nvidia" setting does not work anymore; containers should be executed with the --gpus flag now:
docker run -it --rm --gpus all ubuntu nvidia-smi
It is an open issue and, looking at the comments, it does not seem likely to be fixed soon.
I am using Docker 19.03 together with nvidia-docker2. This provides the new --gpus switch while keeping compatibility with the old --runtime switch (refer to https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated).
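With that setup, both invocations below work side by side (a quick check, assuming a CUDA-capable host; the image tag is only an example):
# New native syntax (Docker >= 19.03)
docker run --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
# Legacy nvidia-docker2 syntax, which gitlab-runner's runtime = "nvidia" still relies on
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:9.0-base nvidia-smi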
This method is outdated.