Revisions

  1. @angrycub angrycub revised this gist Aug 29, 2018. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions Nomad_Data_Dir_Cleanup.md
    @@ -2,7 +2,7 @@

    ## Issue

    When attemting to remove all of the data in the Nomad data directory, several directories and files are unable to be deleted. Many messages are logged to the console like:
    When attempting to remove all of the data in the Nomad data directory, several directories and files are unable to be deleted. Many messages are logged to the console like:

    ```
    rm: cannot remove ‘alloc/736f61b9-d7dc-cb73-0dd1-76b1b2ba032d/nomad-ui/secrets’: Device or resource busy
    @@ -40,7 +40,7 @@ The HTTP API's `system/gc` endpoint can be used to tell Nomad to immediately mak
    curl -XPUT http://127.0.0.1:4646/v1/system/gc
    ```

    This porcess does require that the Nomad client process be up and availible and that the nodes are drained of all running allocations. Draining the node is necessary to ensure that all of the allocations are stopped and eligible for garbage collation when you run it manually.
    This process does require that the Nomad client process be up and available and that the nodes are drained of all running allocations. Draining the node is necessary to ensure that all of the allocations are stopped and eligible for garbage collection when you run it manually.

    Monitor the `alloc` directory in the Nomad data directory to verify that all of the allocations have been garbage-collected before stopping Nomad.

  2. @angrycub angrycub revised this gist Aug 8, 2017. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions Nomad_Data_Dir_Cleanup.md
    @@ -13,6 +13,8 @@ rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/proc/7464/
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/proc/7464/timers’: Read-only file system
    ```

    ## Process

    Run `ps aux | grep nomad | grep executor | grep -v grep` to verify that there are no running executor processes on the node. Any executors listed by the previous command should be stopped before deleting the allocation state.

    When Nomad creates the environment for the allocation, several folders are mounted into the allocation to provide state storage.
  3. @angrycub angrycub created this gist Aug 8, 2017.
    54 changes: 54 additions & 0 deletions Nomad_Data_Dir_Cleanup.md
    @@ -0,0 +1,54 @@
    # HOWTO: Clean Up Nomad Data Directory

    ## Issue

    When attempting to remove all of the data in the Nomad data directory, several directories and files are unable to be deleted. Many messages are logged to the console like:

    ```
    rm: cannot remove ‘alloc/736f61b9-d7dc-cb73-0dd1-76b1b2ba032d/nomad-ui/secrets’: Device or resource busy
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/alloc’: Device or resource busy
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/secrets’: Device or resource busy
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/proc/7464/projid_map’: Read-only file system
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/proc/7464/setgroups’: Read-only file system
    rm: cannot remove ‘alloc/ddcf5a78-5497-f4a4-a101-221fc4e0180b/fabio/proc/7464/timers’: Read-only file system
    ```

    Run `ps aux | grep nomad | grep executor | grep -v grep` to verify that there are no running executor processes on the node. Any executors listed by the previous command should be stopped before deleting the allocation state.

    When Nomad creates the environment for the allocation, several folders are mounted into the allocation to provide state storage.

    - nomad/alloc/«task-alloc-id»/«task-name»/secrets
    - nomad/alloc/«task-alloc-id»/«task-name»/dev
    - nomad/alloc/«task-alloc-id»/«task-name»/proc
    - nomad/alloc/«task-alloc-id»/«task-name»/alloc

    These folders remain mounted in stopped instances by design. Nomad uses a garbage collection process to unmount these folders during its cleanup.
    This allows an operator to inspect the state of an allocation that has yet to be garbage collected even if the operator chooses to stop the Nomad process.
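
    To see which of these folders are still mounted, the kernel's mount table can be filtered by path. The following is a minimal sketch, not part of Nomad: `list_alloc_mounts` and its optional second argument are hypothetical names, and it assumes a Linux host where `/proc/mounts` is readable.

    ```shell
    # Hypothetical helper: print every mount point under the alloc directory.
    # $1 = Nomad data_dir; $2 = mount table to read (assumed /proc/mounts on Linux).
    list_alloc_mounts() {
        data_root="${1%/}"
        mounts_file="${2:-/proc/mounts}"
        # Field 2 of each mount table line is the mount point.
        awk -v prefix="$data_root/alloc/" 'index($2, prefix) == 1 { print $2 }' "$mounts_file"
    }
    ```

    Calling `list_alloc_mounts "$NOMAD_DATA_ROOT"` prints one mount point per line; empty output suggests nothing under `alloc` is mounted any longer.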

    Ordinarily they are cleaned up by Nomad's garbage collection process and would not cause any issues; however, an operator has two options for removing these directories beyond Nomad's internally scheduled garbage collection.

    * Stimulate a garbage collection run using the `system/gc` API endpoint
    * Unmount the folders manually

    ### Stimulate Garbage Collection

    The HTTP API's `system/gc` endpoint can be used to tell Nomad to immediately make a garbage collection pass. In the case of a completely drained client node, this will serve to remove any remaining allocation state data and prevent you from encountering the read-only tmpfs mounts.

    ```
    curl -XPUT http://127.0.0.1:4646/v1/system/gc
    ```

    This process does require that the Nomad client process be up and available and that the nodes are drained of all running allocations. Draining the node is necessary to ensure that all of the allocations are stopped and eligible for garbage collection when you run it manually.

    Monitor the `alloc` directory in the Nomad data directory to verify that all of the allocations have been garbage-collected before stopping Nomad.
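
    The monitoring step can be scripted as a simple polling loop. This is a sketch under stated assumptions: `wait_for_alloc_gc`, its default timeout, and its default interval are hypothetical, and it treats an empty (or missing) `alloc` directory as the sign that garbage collection has finished.

    ```shell
    # Hypothetical helper: wait until the alloc directory has no entries left.
    # $1 = Nomad data_dir; $2 = timeout in seconds (assumed default 300);
    # $3 = polling interval in seconds (assumed default 5).
    wait_for_alloc_gc() {
        alloc_dir="${1%/}/alloc"
        timeout="${2:-300}"
        interval="${3:-5}"
        elapsed=0
        # ls -A omits . and .. so empty output means the directory is empty
        # (a missing directory also produces empty output here).
        while [ -n "$(ls -A "$alloc_dir" 2>/dev/null)" ]; do
            if [ "$elapsed" -ge "$timeout" ]; then
                echo "allocations still present in $alloc_dir" >&2
                return 1
            fi
            sleep "$interval"
            elapsed=$((elapsed + interval))
        done
        return 0
    }
    ```

    After issuing the `system/gc` request, something like `wait_for_alloc_gc "$NOMAD_DATA_ROOT" && echo "safe to stop Nomad"` would block until the directory is empty or the timeout expires.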

    ### Manual Unmounting

    Since these read-only file systems are regular Linux mounts, you can use the `umount` command to unmount them. This could be done in a one-liner similar to the following:

    ```
    export NOMAD_DATA_ROOT=«Path to your Nomad data_dir»
    for ALLOC in "$NOMAD_DATA_ROOT"/alloc/*/; do for TASK in "$ALLOC"*/; do [ "$(basename "$TASK")" = alloc ] && continue; for M in secrets dev proc alloc; do umount "${TASK}${M}"; done; done; done
    ```

    Once the directories are unmounted, the remaining files and directories can be deleted.
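
    As an alternative to naming the four folders explicitly, the mount table itself can drive the unmounting. The sketch below is an assumption-laden illustration rather than a Nomad-provided tool: `unmount_alloc_mounts`, `DRY_RUN`, and `MOUNTS_FILE` are hypothetical names, and it assumes a Linux host with `/proc/mounts`.

    ```shell
    # Hypothetical helper: unmount everything mounted under the alloc directory.
    # $1 = Nomad data_dir; set DRY_RUN=1 to print the commands instead of running them.
    # MOUNTS_FILE is an assumed override for the mount table (default /proc/mounts).
    unmount_alloc_mounts() {
        data_root="${1%/}"
        # Reverse byte-order sort so nested mount points come before their parents.
        awk -v prefix="$data_root/alloc/" 'index($2, prefix) == 1 { print $2 }' \
            "${MOUNTS_FILE:-/proc/mounts}" | LC_ALL=C sort -r |
        while IFS= read -r mount_point; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "umount $mount_point"
            else
                umount "$mount_point"
            fi
        done
    }
    ```

    Running `DRY_RUN=1 unmount_alloc_mounts "$NOMAD_DATA_ROOT"` first shows what would be unmounted; dropping `DRY_RUN` performs the unmounts, which typically requires root.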