@chexov
Last active July 12, 2019 16:40

Revisions

  1. chexov revised this gist Jul 12, 2019. No changes.
  2. chexov revised this gist Jul 12, 2019. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion inhouse-kubernetes.md
    @@ -34,7 +34,7 @@ apt-mark hold kubelet kubeadm kubectl

    ## Prepare docker
    vim /etc/systemd/system/multi-user.target.wants/docker.service
    # ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
    //# ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
    ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

    # Reload docker
  3. chexov revised this gist Jul 12, 2019. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions inhouse-kubernetes.md
    @@ -88,6 +88,7 @@ vim /etc/docker/daemon.json
    "runtimeArgs": []
    }
    },
    "insecure-registries" : ["blender.local:5000"],
    "default-runtime": "nvidia"
    }

  4. chexov revised this gist Jul 9, 2019. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions inhouse-kubernetes.md
    @@ -107,6 +107,9 @@ git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
    cd exporters/prometheus-dcgm/k8s
    kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml

    ## Dashboard
    kubectl --context=vgk8s apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta1/aio/deploy/recommended.yaml
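    Once that manifest is applied, one way to reach the dashboard locally is through kubectl proxy; the namespace and service names below are taken from the v2.0.0-beta1 recommended.yaml and should be double-checked:

    ```
    kubectl proxy
    # then open:
    # http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
    ```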


    ### How to join new node

  5. chexov revised this gist Jul 2, 2019. 1 changed file with 12 additions and 0 deletions.
    12 changes: 12 additions & 0 deletions inhouse-kubernetes.md
    @@ -107,3 +107,15 @@ git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
    cd exporters/prometheus-dcgm/k8s
    kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml


    ### How to join new node

    ```
    kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 \
    --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
    ```

    ```
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    ```
  6. chexov created this gist Jul 2, 2019.
    109 changes: 109 additions & 0 deletions inhouse-kubernetes.md
    @@ -0,0 +1,109 @@
    ## Install docker-ce
    // from https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef

    For starters, ensure that you've installed the latest Docker Community Edition (docker-ce) by following the steps below:

    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo apt-key fingerprint 0EBFCD88
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install docker-ce
    sudo service docker restart
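
    A quick smoke test before moving on, to confirm the daemon restarted cleanly and can run containers:

    ```
    sudo docker version
    sudo docker run --rm hello-world
    ```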

    ## Install nvidia-docker2
    First, if you have an older nvidia-docker installation, purge it and all associated GPU containers:

    docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
    sudo apt-get purge -y nvidia-docker


    ### Install the runtime packages
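    The two packages below come from NVIDIA's apt repository. If that repository is not configured yet, a sketch of the setup as published in NVIDIA's nvidia-docker instructions (worth re-checking against the current docs):

    ```
    # add NVIDIA's package repository (skip if already configured)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    ```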
    sudo apt-get install nvidia-container-runtime
    sudo apt-get install nvidia-docker2
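
    To confirm the runtime works end to end, restart docker and run a throwaway CUDA container; the image tag here is only an example, pick one that matches the installed driver:

    ```
    sudo systemctl restart docker
    # should print the host GPUs
    sudo docker run --rm --runtime=nvidia nvidia/cuda:10.0-base nvidia-smi
    ```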

    ## Install kubeadm, kubelet
    apt-get update && apt-get install -y apt-transport-https curl
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
    cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
    deb https://apt.kubernetes.io/ kubernetes-xenial main
    EOF
    apt-get update
    apt-get install -y kubelet kubeadm kubectl
    apt-mark hold kubelet kubeadm kubectl
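
    A quick check that the tools landed and report matching versions:

    ```
    kubeadm version
    kubectl version --client
    kubelet --version
    ```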


    ## Prepare docker
    vim /etc/systemd/system/multi-user.target.wants/docker.service
    # ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
    ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

    # Reload docker
    sudo systemctl daemon-reload
    sudo systemctl restart docker
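
    To confirm the override took effect (kubelet expects the systemd cgroup driver, and the nvidia runtime should now be registered):

    ```
    # expect "Cgroup Driver: systemd" and an nvidia entry under Runtimes
    sudo docker info | grep -i -E 'cgroup|runtime'
    ```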

    ## Join the cluster party
    swapoff -a
    kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
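
    The token and hash above expire after a while; if needed, a fresh join command can be printed on the control-plane node:

    ```
    # run on the control-plane node
    kubeadm token create --print-join-command
    ```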


    ## Drain the node
    kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
    kubectl drain tanh --delete-local-data --force --ignore-daemonsets
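
    Once maintenance is done, the node stays cordoned until it is explicitly made schedulable again, e.g.:

    ```
    kubectl uncordon tanh
    ```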




    ## Install prometheus
    ```
    git clone https://github.com/coreos/kube-prometheus.git
    cd kube-prometheus/
    kubectl create -f manifests/
    # apply again after a pause: the first pass can fail while the Prometheus CRDs register
    kubectl apply -f manifests/; sleep 4.2; kubectl apply -f manifests/
    ```
    ### Monitoring graphs
    kubectl --namespace monitoring port-forward svc/grafana 3000
    kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090



    ## Install MetalLB
    `kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml`
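
    MetalLB does nothing until it is given an address pool. A minimal layer2 config sketch for the 0.7.x ConfigMap format; the address range is a placeholder and must be free addresses on the local 10.0.1.0/24 network:

    ```
    # placeholder address pool, adjust to the local network
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 10.0.1.200-10.0.1.220
    EOF
    ```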

    ## Install cert-manager
    `kubectl apply -f kube/cert-manager/0.7-cert-manager.yaml`
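
    As a quick sanity check that the controller works, a self-signed ClusterIssuer can be created; the API group below matches the 0.7 release series and the issuer name is arbitrary:

    ```
    cat <<EOF | kubectl apply -f -
    apiVersion: certmanager.k8s.io/v1alpha1
    kind: ClusterIssuer
    metadata:
      name: selfsigned-issuer
    spec:
      selfSigned: {}
    EOF
    ```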

    ## Install nginx-ingress
    `kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml`
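
    mandatory.yaml only creates the controller itself; on bare metal it still needs a LoadBalancer Service so MetalLB can hand it an address. A sketch of such a Service, with label selectors assumed from the upstream manifests of that era (verify against the deployed pods):

    ```
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx
      namespace: ingress-nginx
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
      ports:
      - name: http
        port: 80
        targetPort: http
      - name: https
        port: 443
        targetPort: https
    EOF
    ```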


    ## GPU support
    sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true"

    vim /etc/docker/daemon.json
    {
      "runtimes": {
        "nvidia": {
          "path": "/usr/bin/nvidia-container-runtime",
          "runtimeArgs": []
        }
      },
      "default-runtime": "nvidia"
    }

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet.service
    sudo systemctl restart docker
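
    Before the container test below, it's worth confirming the daemon picked up the new default:

    ```
    # expect "Default Runtime: nvidia"
    sudo docker info | grep -i 'default runtime'
    ```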


    # Check if docker runs via nvidia runtime by default
    docker run --rm --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.0.0-beta

    # GPU device plugin
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
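
    Once the device plugin pods are up, GPU nodes should advertise nvidia.com/gpu, and a throwaway pod can confirm scheduling works; the pod name and CUDA image tag are placeholders:

    ```
    # allocatable GPU count should be non-zero on the GPU nodes
    kubectl describe nodes | grep -i 'nvidia.com/gpu'

    # one-shot pod that requests a GPU and prints nvidia-smi
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-smoke-test
    spec:
      restartPolicy: Never
      containers:
      - name: cuda
        image: nvidia/cuda:10.0-base
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
    EOF
    # check the output once the pod has started
    kubectl logs -f gpu-smoke-test
    ```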

    # nvidia gpu monitoring
    git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
    cd gpu-monitoring-tools/exporters/prometheus-dcgm/k8s
    kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml
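
    To confirm the exporter DaemonSet actually landed on the GPU nodes (object names come from the yaml in that repo, so verify them there):

    ```
    kubectl -n monitoring get daemonsets -o wide
    kubectl -n monitoring get pods -o wide
    ```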