Last active
July 12, 2019 16:40
Revisions
chexov revised this gist
Jul 12, 2019. No changes.
chexov revised this gist
Jul 12, 2019. 1 changed file with 1 addition and 1 deletion.

    @@ -34,7 +34,7 @@ apt-mark hold kubelet kubeadm kubectl
     ## Prepare docker
     vim /etc/systemd/system/multi-user.target.wants/docker.service
    -# ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
    +//# ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
     ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
     # Reload docker

chexov revised this gist
Jul 12, 2019. 1 changed file with 1 addition and 0 deletions.

    @@ -88,6 +88,7 @@ vim /etc/docker/daemon.json
             "runtimeArgs": []
         }
     },
    +"insecure-registries" : ["blender.local:5000"],
     "default-runtime": "nvidia"
     }

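For orientation, combining this hunk with the `daemon.json` from the initial revision gives roughly the file below. This is a minimal sketch, not the author's tooling: the helper name `write_daemon_json` and the idea of writing to a scratch path first are assumptions; the registry host and runtime path are the ones shown in the gist.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): write the merged daemon.json
# to a path of your choice so it can be reviewed before it replaces
# /etc/docker/daemon.json.
write_daemon_json() {
  target="$1"                          # output path, e.g. /tmp/daemon.json
  registry="${2:-blender.local:5000}"  # insecure registry taken from the hunk
  cat >"$target" <<EOF
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "insecure-registries": ["$registry"],
  "default-runtime": "nvidia"
}
EOF
}

# Assumed workflow:
#   write_daemon_json /tmp/daemon.json
#   sudo cp /tmp/daemon.json /etc/docker/daemon.json
#   sudo systemctl restart docker
```

Writing to a scratch file first keeps a malformed edit from taking Docker down on the next restart.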
chexov revised this gist
Jul 9, 2019. 1 changed file with 3 additions and 0 deletions.

    @@ -107,6 +107,9 @@ git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
     cd exporters/prometheus-dcgm/k8s
     kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml
    +## Dashboard
    +kubectl --context=vgk8s apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta1/aio/deploy/recommended.yaml
     ### How to join new node

chexov revised this gist
Jul 2, 2019. 1 changed file with 12 additions and 0 deletions.

    @@ -107,3 +107,15 @@ git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
     cd exporters/prometheus-dcgm/k8s
     kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml
    +### How to join new node
    +```
    +kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 \
    +    --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
    +```
    +```
    +sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    +sudo chown $(id -u):$(id -g) $HOME/.kube/config
    +```

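Bootstrap tokens created by `kubeadm init` expire after 24 hours by default, so the join line recorded here will probably be stale by the time another node is added. A fresh one can be printed on the control-plane node with `kubeadm token create --print-join-command`. The sketch below only shows how the three parts of that line fit together; `join_command` is a hypothetical helper, not something the gist or kubeadm provides.

```shell
#!/bin/sh
# On the control-plane node, a fresh join line can be printed with:
#   kubeadm token create --print-join-command
#
# Hypothetical helper (not from the gist): assemble the join command from
# its three parts so they can be stored or templated separately.
join_command() {
  endpoint="$1"   # API server address, e.g. 10.0.1.111:6443
  token="$2"      # bootstrap token
  ca_hash="$3"    # hex SHA-256 of the cluster CA public key, without "sha256:"
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$ca_hash"
}

# Assumed usage on the new node (values are placeholders):
#   sudo sh -c "$(join_command 10.0.1.111:6443 <token> <ca-hash>)"
```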
chexov created this gist
Jul 2, 2019

## Install docker-ce
From https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef

For starters, ensure that you've installed the latest Docker Community Edition by following the steps below:

    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo apt-key fingerprint 0EBFCD88
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install docker-ce
    sudo service docker restart

## Install nvidia-docker2
First, if you have older nvidia-docker installations, purge the installation and all associated GPU containers:

    docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
    sudo apt-get purge -y nvidia-docker

Then install the NVIDIA container runtime and nvidia-docker2:

    sudo apt-get install nvidia-container-runtime
    sudo apt-get install nvidia-docker2

## Install kubeadm, kubelet

    apt-get update && apt-get install -y apt-transport-https curl
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
    cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
    deb https://apt.kubernetes.io/ kubernetes-xenial main
    EOF
    apt-get update
    apt-get install -y kubelet kubeadm kubectl
    apt-mark hold kubelet kubeadm kubectl

## Prepare docker
Edit the Docker unit file, comment out the old `ExecStart` line and add the new one:

    vim /etc/systemd/system/multi-user.target.wants/docker.service

    # ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
    ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

    # Reload docker
    sudo systemctl daemon-reload
    sudo systemctl restart docker

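The path under `multi-user.target.wants/` is normally a symlink to the packaged unit file, so edits made there can be overwritten when the Docker package is upgraded. An alternative is a systemd drop-in that overrides only `ExecStart`. The following is a minimal sketch under that assumption: the helper name `write_docker_dropin` and the file name `override.conf` are illustrative choices, and the `ExecStart` line is copied from the step above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): write a systemd drop-in that
# overrides ExecStart for docker.service instead of editing the unit file.
write_docker_dropin() {
  # Standard drop-in directory for docker.service; pass another path to
  # preview the file without touching the system.
  dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dropin_dir"
  cat >"$dropin_dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before a new one is set.
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
}

# Assumed usage (as root), followed by the same reload as above:
#   write_docker_dropin
#   systemctl daemon-reload
#   systemctl restart docker
```

If `/etc/docker/daemon.json` also declares the `nvidia` runtime, dockerd may refuse to start because the same setting is given twice; keeping the runtime in one place avoids that.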
## Join the cluster party

    swapoff -a
    kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3

## Drain the node

    kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
    kubectl drain tanh --delete-local-data --force --ignore-daemonsets

## Install prometheus
```
git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus/
kubectl create -f manifests/
kubectl apply -f manifests/; sleep 4.2; kubectl apply -f manifests/
```

### Monitoring graphs

    kubectl --namespace monitoring port-forward svc/grafana 3000
    kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

## Install MetalLB
`kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml`

## Install cert-manager
`kubectl apply -f kube/cert-manager/0.7-cert-manager.yaml`

## Install nginx-ingress
`kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml`

## GPU support
Enable device plugins in the kubelet:

    sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

    Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true"

Make `nvidia` the default Docker runtime:

    vim /etc/docker/daemon.json

    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia"
    }

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet.service
    sudo systemctl restart docker

    # Check if docker runs via nvidia runtime by default
    docker run --rm --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.0.0-beta

    # GPU
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml

    # nvidia gpu monitoring
    git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
    cd gpu-monitoring-tools/exporters/prometheus-dcgm/k8s
    kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml
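
Once the device plugin is running, one way to confirm that the scheduler can hand out GPUs is a throwaway pod that requests the `nvidia.com/gpu` resource and runs `nvidia-smi`. This is a sketch, not part of the original notes: the helper `gpu_test_pod` and the pod name `gpu-smoke-test` are hypothetical, and the CUDA image is left as a required argument because no particular tag is given in the gist.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): print a pod manifest that
# requests one GPU through the NVIDIA device plugin and runs nvidia-smi.
gpu_test_pod() {
  image="${1:?usage: gpu_test_pod IMAGE}"  # a CUDA-enabled image of your choice
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: $image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Assumed usage:
#   gpu_test_pod <cuda-image> | kubectl apply -f -
#   kubectl logs gpu-smoke-test
#   kubectl delete pod gpu-smoke-test
```

If the pod stays `Pending`, `kubectl describe node <node name>` shows whether `nvidia.com/gpu` appears among the node's allocatable resources.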