@sukhitashvili
Last active January 26, 2023 13:54
Tip for multi-GPU training

As a tip for multi-GPU training, I want to point out the Linear Scaling Rule, which states: "When the minibatch size is multiplied by k, multiply the learning rate by k." The paper (Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour") gives the full explanation; my shorter explanation is that with k GPUs your effective batch size is multiplied by k, so for a given number of epochs you run k times fewer training iterations. The authors found that simply scaling the LR by k compensates for this, so you can train the model k times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take k times longer).
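
Here is a minimal PyTorch sketch of the rule. The concrete numbers (`base_lr`, `per_gpu_batch_size`, `num_gpus`) and the stand-in model are illustrative assumptions, not values from the paper; tune them for your own setup.

```python
import torch
import torch.nn as nn

# Hypothetical values for illustration: a base LR tuned on a single GPU.
base_lr = 0.1
per_gpu_batch_size = 32
num_gpus = 8  # this is k in the Linear Scaling Rule

# The effective batch size grows linearly with the number of GPUs...
effective_batch_size = per_gpu_batch_size * num_gpus
# ...so the Linear Scaling Rule scales the learning rate by the same factor k.
scaled_lr = base_lr * num_gpus

model = nn.Linear(10, 2)  # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)

print(f"effective batch size: {effective_batch_size}, scaled LR: {scaled_lr}")
```

Note that for large k the paper also recommends a gradual warmup phase, ramping the LR from `base_lr` up to `scaled_lr` over the first few epochs, since starting immediately at the large scaled LR can destabilize early training.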
