As a tip for multi-GPU training, I want to mention a rule called the Linear Scaling Rule, which says: "When the minibatch size is multiplied by k, multiply the learning rate by k." See the paper (Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour") for the full explanation; my shorter explanation is that with k GPUs your effective batch size is multiplied by k, so for a given number of epochs you run k times fewer training iterations. The authors found that simply scaling the learning rate by k compensates for this, so you can train the model k times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take k times longer).
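To make it concrete, here is a minimal PyTorch DDP sketch of how you could apply the rule. The base LR, per-GPU batch size, and tiny model below are placeholder values, and it assumes the distributed process group is already initialised (e.g. by launching with torchrun):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder single-GPU hyperparameters for illustration.
base_lr = 0.1              # LR tuned for a single GPU
per_gpu_batch_size = 32    # minibatch size on each GPU

# Assumes dist.init_process_group() has already run (torchrun sets LOCAL_RANK).
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

k = dist.get_world_size()                       # k = number of GPUs / processes
effective_batch_size = per_gpu_batch_size * k   # batch size grows by k

# Linear Scaling Rule: the batch size grew by k, so scale the LR by k too.
scaled_lr = base_lr * k

# Placeholder model just to show where the scaled LR goes.
model = DDP(torch.nn.Linear(10, 1).cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)
```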