@sukhitashvili
Last active January 26, 2023 13:54
Revisions

  1. sukhitashvili revised this gist Jan 26, 2023. 1 changed file with 1 addition and 6 deletions.
    As a tip for multi-GPU training: there is a rule called the **Linear Scaling Rule**, which says: _When the minibatch size is multiplied by k, multiply the learning rate by k._ Here is [the paper](https://arxiv.org/abs/1706.02677) with the full explanation; my shorter explanation is that with `k` GPUs your effective batch size is multiplied by `k`, so for a given number of epochs you run `k` times fewer training iterations. The authors find that simply scaling your LR by `k` compensates for this, so you can train the model `k` times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take `k` times longer). A minimal sketch of the rule in practice is shown after the revision list below.
  2. sukhitashvili created this gist Jan 26, 2023.
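
To make the rule concrete, here is a minimal PyTorch sketch, assuming the script is launched with `torchrun` (one process per GPU); the `BASE_LR` and `PER_GPU_BATCH` values are placeholders, not numbers from the paper. The base learning rate tuned for one GPU is simply multiplied by the world size `k` before building the optimizer.

```python
# Minimal sketch of the Linear Scaling Rule with PyTorch DDP.
# Assumes launch via `torchrun --nproc_per_node=<k> train.py`, so there is
# one process per GPU and LOCAL_RANK is set by the launcher.
# BASE_LR and PER_GPU_BATCH are placeholder values, not from the paper.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

BASE_LR = 0.1        # learning rate tuned for single-GPU training
PER_GPU_BATCH = 32   # per-process minibatch size (kept fixed per GPU)

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

k = dist.get_world_size()              # number of GPUs / processes
effective_batch = k * PER_GPU_BATCH    # global minibatch grows k times
scaled_lr = k * BASE_LR                # Linear Scaling Rule: LR grows k times

model = torch.nn.Linear(128, 10).cuda()
model = DistributedDataParallel(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)
```

Note that the paper also pairs this rule with a gradual LR warmup over the first few epochs, which becomes important when `k` (and hence the scaled LR) is large.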