@josemarcosrf
Last active November 13, 2023 20:11



1. Define the time-slicing configuration.

a. Create a file called `time-slicing-config-all.yaml` with the following contents:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```
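With `replicas: 4`, each physical GPU on the node is advertised to the scheduler as four `nvidia.com/gpu` resources, so up to four pods can time-share one GPU. Before applying anything, the manifest can be sanity-checked client-side (a quick sketch; needs only `kubectl`, makes no cluster changes):

```sh
# Render the ConfigMap client-side to catch YAML/indentation mistakes early
kubectl create --dry-run=client -o yaml -f time-slicing-config-all.yaml
```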
b. Create the namespace for the operator and add the config map:

```sh
kubectl create ns gpu-operator
kubectl create -n gpu-operator -f time-slicing-config-all.yaml
```

2. Add NVIDIA's Helm repository and install the GPU Operator

    > See: [docs.nvidia](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html)

```sh
# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --set devicePlugin.config.name=time-slicing-config-all
```
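Once the install completes, it is worth confirming the operator workloads come up before moving on (a quick check, not part of the original steps):

```sh
# All GPU Operator components should eventually reach Running or Completed
kubectl get pods -n gpu-operator
```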

    3. Apply the time-slicing configuration

    > See [docs.nvidia](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html#applying-one-cluster-wide-configuration)

```sh
kubectl patch clusterpolicy/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
kubectl label node <node-name> nvidia.com/device-plugin.config=any
```
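To confirm the patch took effect, the device-plugin config can be read back from the ClusterPolicy (a hedged sketch; `cluster-policy` is the default object name created by the operator):

```sh
# Expect the printed config to reference time-slicing-config-all with default "any"
kubectl get clusterpolicy cluster-policy \
  -o jsonpath='{.spec.devicePlugin.config}'
```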

    4. Verify time-slicing

> See [docs.nvidia](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html#time-slicing-verify)

```sh
kubectl describe node <node-name>
```
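With `replicas: 4` and a single physical GPU, the node's `Capacity` and `Allocatable` sections should now report `nvidia.com/gpu: 4`. Filtering the output narrows this down (sketch):

```sh
# Expect nvidia.com/gpu: 4 under both Capacity and Allocatable
kubectl describe node <node-name> | grep "nvidia.com/gpu"
```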
