@FreesoSaiFared
Forked from pangyuteng/README.md
Created June 16, 2024 01:26

Revisions

  1. @pangyuteng pangyuteng revised this gist Oct 21, 2023. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions README.md
    Original file line number Diff line number Diff line change
    @@ -126,4 +126,10 @@ sudo modprobe nvidia

    + run `nvidia-smi` to confirm presence of gpu.

    + install nvidia container toolkit
    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

    + confirm install success `docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi`
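
A minimal sketch of the container check above, assuming a pinned image tag is wanted. As far as I know the `latest` tag of `nvidia/cuda` has been retired on Docker Hub, so the untagged image name may fail to pull. `gpu_smoke_cmd` is a hypothetical helper that only prints the command, and the tag in the example is an assumption; pick one that is actually listed for the image.

```shell
# Print (do not run) the GPU smoke-test command for a given CUDA image tag.
# The tag comes from the caller; nothing here checks that it exists.
gpu_smoke_cmd() {
  tag="${1:-}"
  if [ -z "$tag" ]; then
    echo "usage: gpu_smoke_cmd <cuda-image-tag>" >&2
    return 1
  fi
  printf 'docker run --rm --runtime=nvidia nvidia/cuda:%s nvidia-smi\n' "$tag"
}

# Example tag is an assumption, not taken from the original notes.
gpu_smoke_cmd "12.2.0-base-ubuntu22.04"
```

Printing first makes it easy to eyeball the command before running it on the VM.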



  2. @pangyuteng pangyuteng revised this gist Aug 9, 2022. 1 changed file with 13 additions and 12 deletions.
    25 changes: 13 additions & 12 deletions README.md
    @@ -46,22 +46,23 @@ Fan at 9480RPM
    GPU Temp 60 C per nvidia-smi
    Fan1 | 9720 RPM | ok
    Fan2 | 9600 RPM | ok
    Fan3 | 9480 RPM | ok
    Fan4 | 9480 RPM | ok
    Fan5 | 9840 RPM | ok
    Fan6 | 9360 RPM | ok
    Inlet Temp | 29 degrees C | ok
    Exhaust Temp | 34 degrees C | ok
    Temp | 47 degrees C | ok
    Fan1 | 8520 RPM | ok
    Fan2 | 8400 RPM | ok
    Fan3 | 8520 RPM | ok
    Fan4 | 9360 RPM | ok
    Fan5 | 10560 RPM | ok
    Fan6 | 9960 RPM | ok
    Inlet Temp | 33 degrees C | ok
    Exhaust Temp | 37 degrees C | ok
    Temp | 55 degrees C | ok
    Temp | 46 degrees C | ok
    Current 1 | 3 Amps | ok
    Current 1 | 2.60 Amps | ok
    Current 2 | no reading | ns
    Voltage 1 | 118 Volts | ok
    Voltage 1 | 110 Volts | ok
    Voltage 2 | no reading | ns
    Pwr Consumption | 294 Watts | ok
    Chassis Temp 34C
    Chassis Temp 38C
    ```
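
A rough sketch for pulling the highest fan speed out of a sensor table like the one above. The pipe-separated rows look like `ipmitool sdr` output, but that is an assumption; the notes do not say which tool produced them. `max_fan_rpm` is a hypothetical helper name.

```shell
# Read "Name | value unit | status" rows on stdin and print the highest
# RPM among rows whose name starts with "Fan" followed by a number.
max_fan_rpm() {
  awk -F'|' '
    $1 ~ /^[[:space:]]*Fan[0-9]+/ {
      split($2, f, " ")              # f[1] is the number, f[2] is "RPM"
      if (f[1] + 0 > max) max = f[1] + 0
    }
    END { print max + 0 }            # prints 0 when no fan rows were seen
  '
}

# Sample rows copied from the readings above.
printf '%s\n' \
  'Fan1 | 8520 RPM | ok' \
  'Fan5 | 10560 RPM | ok' \
  'Temp | 55 degrees C | ok' | max_fan_rpm
```

On a live system the input would be piped from whatever command produced the table.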

  3. @pangyuteng pangyuteng revised this gist Aug 6, 2022. 1 changed file with 12 additions and 2 deletions.
    14 changes: 12 additions & 2 deletions README.md
    @@ -46,10 +46,20 @@ Fan at 9480RPM
    GPU Temp 60 C per nvidia-smi
    Fan1 | 9720 RPM | ok
    Fan2 | 9600 RPM | ok
    Fan3 | 9480 RPM | ok
    Fan4 | 9480 RPM | ok
    Fan5 | 9840 RPM | ok
    Fan6 | 9360 RPM | ok
    Inlet Temp | 29 degrees C | ok
    Exhaust Temp | 34 degrees C | ok
    Temp | 54 degrees C | ok
    Temp | 48 degrees C | ok
    Temp | 47 degrees C | ok
    Temp | 46 degrees C | ok
    Current 1 | 3 Amps | ok
    Current 2 | no reading | ns
    Voltage 1 | 118 Volts | ok
    Voltage 2 | no reading | ns
    Chassis Temp 34C
  4. @pangyuteng pangyuteng revised this gist Aug 6, 2022. 1 changed file with 27 additions and 0 deletions.
    27 changes: 27 additions & 0 deletions README.md
    @@ -30,6 +30,33 @@
    http://www.righteoushack.net/dell-poweredge-13th-gen-fan-noise/
    https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + current settings, likely not optimal.
    ```
    ssh idrac
    racadm set system.thermalsettings.AirExhaustTemp 255
    racadm set system.thermalsettings.FanSpeedOffset 0
    racadm set system.thermalsettings.ThermalProfile 0
    racadm set system.thermalsettings.ThirdPartyPCIFanResponse 1
    racadm get system.thermalsettings
    ```
    ```
    Fan at 9480RPM
    GPU Temp 60 C per nvidia-smi
    Inlet Temp | 29 degrees C | ok
    Exhaust Temp | 34 degrees C | ok
    Temp | 54 degrees C | ok
    Temp | 48 degrees C | ok
    Chassis Temp 34C
    ```
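
A small sketch for checking the GPU temperature quoted above from inside the VM. The 80 C default limit is a placeholder assumption, not a value from these notes, and `temp_status` is a hypothetical helper. The `nvidia-smi` query only runs when the tool is installed.

```shell
# Classify a GPU temperature (degrees C) against a warning limit.
# The default limit of 80 is a placeholder, not a vendor figure.
temp_status() {
  temp="${1:-}"
  limit="${2:-80}"
  case "$temp" in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;   # reject empty or non-numeric input
  esac
  if [ "$temp" -ge "$limit" ]; then
    echo "hot"
  else
    echo "ok"
  fi
}

# Live reading of the first GPU, skipped when nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  t=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | head -n 1)
  echo "GPU ${t} C: $(temp_status "$t")"
fi
```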



    ### proxmox

    + follow below link and stop prior section "Configuring the VM (Windows 10)", note the modifications listed below.
  5. @pangyuteng pangyuteng revised this gist Aug 6, 2022. 1 changed file with 21 additions and 5 deletions.
    26 changes: 21 additions & 5 deletions README.md
    @@ -1,20 +1,36 @@

    + hardware

    ```
    Dell Poweredge R720
    Nvidia Tesla P40 24GB
    GPU pass-through via Proxmox
    ```
    ### psu

    + ? am using only one 1100w psu, the other 1100w is not plugged in.

    + don't have a UPS yet.

    ### idrac

    + upgrade idrac firmware, so you get to power on/off via http.

    ### risers

    + if installing at riser3, ensure the pci-e is x16 - some servers may come with a riser with 2 slots each with x8.

    + enable "3rd Party Card fan behavior" - this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).
    ### gpu power cables

    + Please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus

    ### fan control

    + determine if you should enable/disable "3rd Party Card fan behavior"
    + (?) enabling this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).

    http://www.righteoushack.net/dell-poweredge-13th-gen-fan-noise/
    https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + Lastly, please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus

    ### proxmox

    + follow below link and stop prior section "Configuring the VM (Windows 10)", note the modifications listed below.
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f
  6. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion README.md
    @@ -6,10 +6,13 @@
    Nvidia Tesla P40 24GB
    ```

    from https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3


    + enable "3rd Party Card fan behavior" - this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).

    http://www.righteoushack.net/dell-poweredge-13th-gen-fan-noise/
    https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + Lastly, please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus


  7. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions README.md
    @@ -6,6 +6,13 @@
    Nvidia Tesla P40 24GB
    ```

    from https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + enable "3rd Party Card fan behavior" - this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).

    + Lastly, please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus


    + follow below link and stop prior section "Configuring the VM (Windows 10)", note the modifications listed below.
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    @@ -62,11 +69,4 @@ sudo modprobe nvidia

    + run `nvidia-smi` to confirm presence of gpu.

    --

    from https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + enable "3rd Party Card fan behavior" - this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).

    + Lastly, please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus

  8. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 10 additions and 1 deletion.
    11 changes: 10 additions & 1 deletion README.md
    @@ -60,4 +60,13 @@ sudo modprobe nvidia
    # https://unix.stackexchange.com/questions/219059/remove-nouveau-driver-nvidia-without-rebooting
    ```

    + run `nvidia-smi` to confirm presence of gpu.
    + run `nvidia-smi` to confirm presence of gpu.

    --

    from https://www.reddit.com/r/Proxmox/comments/uf2d7l/proxmox_tesla_m40_passthrough_ubuntu_server_vm/iif2en3/?context=3

    + enable "3rd Party Card fan behavior" - this allowed the GPU temp to hover at around 60C - while being used, Fan hovers in the range of 4200 to 7000 RPM. (typical GPU temp in a server room with AC will be around 55C under load). The Inlet and Exhaust Temp at 31 and 38C respectively (server is placed in the garage with no AC, with outdoor temp at 29C).

    + Lastly, please be mindful when purchasing the gpu power cable: there are 2 kinds of gpu power cables for Dell server PCI risers, one for the Nvidia Teslas and one for consumer "general purpose" GPUs. This is a step that SHALL NOT GO WRONG, or else you may fry your server and GPU! Read more here: https://kenmoini.com/post/2021/03/fun-with-servers-and-gpus

  9. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 4 additions and 3 deletions.
    7 changes: 4 additions & 3 deletions README.md
    @@ -10,11 +10,12 @@
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    + in proxmox web interface, select vm, for hostpci - check `All Functions`,`ROM-Bar`,`PCI-Express`

    + edit your `/etc/pve/qemu-server/${VM_ID}.conf` per below

    + for vm BIOS, use `Default (SeaBIOS)`.
    + for vm Machine, use `q35`.
    + edit vm conf `/etc/pve/qemu-server/${VM_ID}.conf` per below

    ```
    machine: q35
    cpu: host,hidden=1,flags=+pcid
    args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
    ```
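
A quick sanity check for the VM conf lines above, as a sketch. `check_vm_conf` is a hypothetical helper; it only looks for the `machine`, `cpu` and `hostpci` entries and knows nothing else about the Proxmox conf format.

```shell
# Report which of the passthrough-related entries are missing from a VM conf.
# Returns 0 when all are present, 1 when something is missing, 2 on read error.
check_vm_conf() {
  conf="${1:-}"
  if [ ! -r "$conf" ]; then
    echo "cannot read: $conf" >&2
    return 2
  fi
  missing=0
  for pattern in '^machine: q35' '^cpu: host' '^hostpci[0-9]+:'; do
    if ! grep -Eq "$pattern" "$conf"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}
```

On the Proxmox host it would be called as `check_vm_conf /etc/pve/qemu-server/${VM_ID}.conf`.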
  10. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 9 additions and 0 deletions.
    9 changes: 9 additions & 0 deletions README.md
    @@ -18,6 +18,7 @@ machine: q35
    cpu: host,hidden=1,flags=+pcid
    args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
    ```

    ```
    # below is based on needs
    cores: 10
    @@ -27,23 +28,31 @@ boot: order=scsi0;net0
    scsihw: virtio-scsi-pci
    hostpci0: 0000:42:00,pcie=1
    ```

    + turn on the vm.

    + when installing ubuntu, don't install the driver.

    + boot up vm

    + check if gpu is present

    ```
    lspci | grep 01:00
    ```

    + install driver

    ```
    sudo apt-add-repository -r ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt remove 'nvidia*'
    sudo apt autoremove
    sudo ubuntu-drivers autoinstall
    ```

    + run `nvidia-smi` and get complaints, run below.

    ```
    sudo rmmod nouveau
    sudo modprobe nvidia
  11. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 21 additions and 12 deletions.
    33 changes: 21 additions & 12 deletions README.md
    @@ -6,39 +6,48 @@
    Nvidia Tesla P40 24GB
    ```

    + follow below link and stop prior section "Configuring the VM (Windows 10)".
    + follow below link and stop prior section "Configuring the VM (Windows 10)", note the modifications listed below.
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    ```
    in proxmox web interface, select vm, for hostpci - check `All Functions`,`ROM-Bar`,`PCI-Express`
    + in proxmox web interface, select vm, for hostpci - check `All Functions`,`ROM-Bar`,`PCI-Express`

    edit your `/etc/pve/qemu-server/${VM_ID}.conf` per below
    + edit your `/etc/pve/qemu-server/${VM_ID}.conf` per below

    ```
    machine: q35
    cpu: host,hidden=1,flags=+pcid
    args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
    ```
    ```
    # below is based on needs
    cores: 10
    memory: 262144
    scsi0: local-lvm:vm-100-disk-0,size=768G
    boot: order=scsi0;net0
    scsihw: virtio-scsi-pci
    hostpci0: 0000:42:00,pcie=1
    ```
    + turn on the vm.
    + when installing ubuntu, don't install the driver.
    + boot up vm

    --
    + check if gpu is present
    ```
    lspci | grep 01:00
    ```
    + install driver
    ```
    sudo apt-add-repository -r ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt remove 'nvidia*'
    sudo apt autoremove
    sudo ubuntu-drivers autoinstall
    https://unix.stackexchange.com/questions/219059/remove-nouveau-driver-nvidia-without-rebooting
    + run `nvidia-smi` and get complaints, run below.
    ```
    sudo rmmod nouveau
    sudo modprobe nvidia
    # https://unix.stackexchange.com/questions/219059/remove-nouveau-driver-nvidia-without-rebooting
    ```

    + run `nvidia-smi` to confirm presence of gpu.
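
A sketch for deciding whether the `rmmod nouveau` step is needed at all, assuming `lsmod` output with its usual one-line header. `has_module` is a hypothetical helper that reads the module list on stdin.

```shell
# Succeed when the named module appears in lsmod-style input on stdin.
# The first line is skipped because lsmod prints a column header there.
has_module() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Only meaningful on a Linux guest; skipped when lsmod is unavailable.
if command -v lsmod >/dev/null 2>&1 && lsmod | has_module nouveau; then
  echo "nouveau is loaded; unload it before loading nvidia"
fi
```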
  12. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 14 additions and 6 deletions.
    20 changes: 14 additions & 6 deletions README.md
    @@ -10,16 +10,24 @@
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    ```
    cpu: host,hidden=1
    in proxmox web interface, select vm, for hostpci - check `All Functions`,`ROM-Bar`,`PCI-Express`
    edit your `/etc/pve/qemu-server/${VM_ID}.conf` per below
    BIOS: default SeaBIOS
    Machine: default i440fx+
    OS: ubuntu 20.04
    for pci device i only checked "all functions" and "ROM-Bar"
    machine: q35
    cpu: host,hidden=1,flags=+pcid
    args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
    for driver installation, i only had to do the following.
    # below is based on needs
    cores: 10
    memory: 262144
    scsi0: local-lvm:vm-100-disk-0,size=768G
    boot: order=scsi0;net0
    scsihw: virtio-scsi-pci
    hostpci0: 0000:42:00,pcie=1
    --
    sudo apt-add-repository -r ppa:graphics-drivers/ppa
    sudo apt update
  13. @pangyuteng pangyuteng revised this gist Aug 5, 2022. 1 changed file with 24 additions and 1 deletion.
    25 changes: 24 additions & 1 deletion README.md
    @@ -9,5 +9,28 @@
    + follow below link and stop prior section "Configuring the VM (Windows 10)".
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    + hola
    ```
    cpu: host,hidden=1
    BIOS: default SeaBIOS
    Machine: default i440fx+
    OS: ubuntu 20.04
    for pci device i only checked "all functions" and "ROM-Bar"
    for driver installation, i only had to do the following.
    sudo apt-add-repository -r ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt remove 'nvidia*'
    sudo apt autoremove
    sudo ubuntu-drivers autoinstall
    https://unix.stackexchange.com/questions/219059/remove-nouveau-driver-nvidia-without-rebooting
    sudo rmmod nouveau
    sudo modprobe nvidia
    ```

  14. @pangyuteng pangyuteng revised this gist Aug 4, 2022. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -6,7 +6,7 @@
    Nvidia Tesla P40 24GB
    ```

    + follow below all the way down, and stop prior section Configuring the VM (Windows 10)".
    + follow below link and stop prior section "Configuring the VM (Windows 10)".
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    + hola
  15. @pangyuteng pangyuteng created this gist Aug 4, 2022.
    13 changes: 13 additions & 0 deletions README.md
    @@ -0,0 +1,13 @@

    + hardware

    ```
    Dell Poweredge R720
    Nvidia Tesla P40 24GB
    ```

    + follow below all the way down, and stop prior section Configuring the VM (Windows 10)".
    https://gist.github.com/qubidt/64f617e959725e934992b080e677656f

    + hola