@denji
Forked from zrruziev/NUMA node problem.md
Created May 7, 2024 01:25

Revisions

  1. @zrruziev zrruziev revised this gist Apr 27, 2022. 1 changed file with 18 additions and 13 deletions.
    31 changes: 18 additions & 13 deletions NUMA node problem.md
    @@ -5,38 +5,43 @@ In other words, it is a technology to increase memory access efficiency while us

    ## 1. Check Nodes
    ```bash
    $ lspci | grep -i nvidia
    01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
    lspci | grep -i nvidia

    01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
    ```
    The first line shows the address of the VGA-compatible device, the NVIDIA GeForce, as **01:00.0**. This address differs from machine to machine, so substitute your own in the commands that follow.
    ## 2. Check and change NUMA setting values
    If you go to `/sys/bus/pci/devices/`, you can see the following list:
    ```bash
    $ ls /sys/bus/pci/devices/
    0000:00:00.0 0000:00:06.0 0000:00:15.0 0000:00:1c.0 0000:00:1f.3 0000:00:1f.6 0000:02:00.0
    0000:00:01.0 0000:00:14.0 0000:00:16.0 0000:00:1d.0 0000:00:1f.4 0000:01:00.0
    0000:00:02.0 0000:00:14.2 0000:00:17.0 0000:00:1f.0 0000:00:1f.5 0000:01:00.1
    ls /sys/bus/pci/devices/

    0000:00:00.0 0000:00:06.0 0000:00:15.0 0000:00:1c.0 0000:00:1f.3 0000:00:1f.6 0000:02:00.0
    0000:00:01.0 0000:00:14.0 0000:00:16.0 0000:00:1d.0 0000:00:1f.4 0000:01:00.0
    0000:00:02.0 0000:00:14.2 0000:00:17.0 0000:00:1f.0 0000:00:1f.5 0000:01:00.1
    ```
    The device 01:00.0 found above appears in this list, but with the PCI domain prefix 0000: in front.
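
    The mapping from the short `lspci` address to the sysfs directory name can be sketched as a small shell function. This is a minimal illustration, not part of the original steps: `pci_sysfs_path` is a hypothetical helper name, and it assumes the device sits in PCI domain 0000, as on the machine shown above (running `lspci -D` prints the domain explicitly if you want to confirm it).

    ```shell
    # Hypothetical helper: build the sysfs path for a PCI address.
    # Assumption: a short address such as 01:00.0 belongs to domain 0000.
    pci_sysfs_path() {
        addr="$1"
        case "$addr" in
            *:*:*) ;;                 # already has a domain, e.g. 0000:01:00.0
            *) addr="0000:$addr" ;;   # short form as printed by lspci
        esac
        printf '/sys/bus/pci/devices/%s\n' "$addr"
    }

    pci_sysfs_path 01:00.0   # prints /sys/bus/pci/devices/0000:01:00.0
    ```

    Quoting the result, as in `cat "$(pci_sysfs_path 01:00.0)/numa_node"`, avoids the need to escape the colons by hand.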

    ## 3. Check if it is connected.
    ```bash
    $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    -1
    cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node

    -1
    ```
    -1 means the kernel has no NUMA node recorded for this device; 0 means the device is assigned to NUMA node 0.
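
    To make the meaning of the value explicit, the check can be wrapped in a small function. This is only a sketch: `describe_numa_node` is a hypothetical name, and the wording of the messages is an assumption for illustration.

    ```shell
    # Hypothetical helper: turn the content of a numa_node file into a message.
    describe_numa_node() {
        case "$1" in
            -1) echo "no NUMA node reported" ;;
            ''|*[!0-9]*) echo "unexpected value: $1" ;;
            *) echo "NUMA node $1" ;;
        esac
    }

    # Example call against the device used in this guide (path from above):
    # describe_numa_node "$(cat /sys/bus/pci/devices/0000:01:00.0/numa_node)"
    ```

    On a single-socket machine there is normally only node 0, which is why the guide writes 0 in the next step.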

    ## 4. Fix it with the command below.
    ```bash
    $ sudo echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    0
    sudo echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node

    0
    ```
    It shows 0 which means connected!

    ## 5. Check again:
    ```bash
    $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    0
    cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node

    0
    ```
    That's it!
  2. @zrruziev zrruziev revised this gist Apr 7, 2022. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions NUMA node problem.md
    @@ -29,12 +29,14 @@ $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node

    ## 4. Fix it with the command below.
    ```bash
    echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    $ sudo echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    0
    ```
    It shows 0 which means connected!

    ## 5. Check again:
    ```bash
    $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    0
    ```
    It shows 0 which means connected!
    That's it!
  3. @zrruziev zrruziev revised this gist Apr 7, 2022. No changes.
  4. @zrruziev zrruziev created this gist Apr 7, 2022.
    40 changes: 40 additions & 0 deletions NUMA node problem.md
    @@ -0,0 +1,40 @@
    # What is NUMA (Non-Uniform Memory Access)

    **Non-Uniform Memory Access (NUMA)** is a memory design used in multiprocessor systems in which memory access time depends on where the memory sits relative to the processor. In a NUMA architecture, a processor accesses its local memory faster than remote memory. Local memory is memory attached to the processor itself; remote memory is memory attached to another processor.
    In other words, it is a technique for making memory access more efficient when several processors share one motherboard. When one processor monopolizes the memory bus, the other processors have to sit idle. To avoid this, memory is divided among the processors, each region is designated 'access only here', and each such region is called a NUMA node.

    ## 1. Check Nodes
    ```bash
    $ lspci | grep -i nvidia
    01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
    ```
    The first line shows the address of the VGA-compatible device, the NVIDIA GeForce, as **01:00.0**. This address differs from machine to machine, so substitute your own in the commands that follow.
    ## 2. Check and change NUMA setting values
    If you go to `/sys/bus/pci/devices/`, you can see the following list:
    ```bash
    $ ls /sys/bus/pci/devices/
    0000:00:00.0 0000:00:06.0 0000:00:15.0 0000:00:1c.0 0000:00:1f.3 0000:00:1f.6 0000:02:00.0
    0000:00:01.0 0000:00:14.0 0000:00:16.0 0000:00:1d.0 0000:00:1f.4 0000:01:00.0
    0000:00:02.0 0000:00:14.2 0000:00:17.0 0000:00:1f.0 0000:00:1f.5 0000:01:00.1
    ```
    The device 01:00.0 found above appears in this list, but with the PCI domain prefix 0000: in front.

    ## 3. Check if it is connected.
    ```bash
    $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    -1
    ```
    -1 means the kernel has no NUMA node recorded for this device; 0 means the device is assigned to NUMA node 0.

    ## 4. Fix it with the command below.
    ```bash
    echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    ```

    ## 5. Check again:
    ```bash
    $ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    0
    ```
    It shows 0 which means connected!
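
    Steps 3 to 5 can be combined into one function, sketched here under stated assumptions. `set_numa_node_zero` and the `SYSFS_PCI_ROOT` variable are hypothetical names introduced for illustration; the variable only exists so the logic can be tried on a scratch directory before touching the real `/sys/bus/pci/devices`. Writing to the real file requires root, and because sysfs is a virtual filesystem the value is expected to reset on reboot, so the write has to be repeated after each boot.

    ```shell
    # Hypothetical helper: set numa_node to 0 for one device if it reads -1,
    # then print the resulting value.
    # Assumption: SYSFS_PCI_ROOT may point at a scratch directory for a dry run;
    # it defaults to the real sysfs location used in this guide.
    set_numa_node_zero() {
        root="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"
        file="$root/$1/numa_node"
        [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
        current="$(cat "$file")"
        if [ "$current" = "-1" ]; then
            # On the real sysfs file this write needs root privileges.
            echo 0 > "$file" || return 1
        fi
        cat "$file"
    }
    ```

    Called from a root shell as `set_numa_node_zero 0000:01:00.0`, it leaves a device that already reports a node untouched and only rewrites the -1 case.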