As a tip for multi-GPU training, I want to mention a rule called the Linear Scaling Rule, which says: "When the minibatch size is multiplied by k, multiply the learning rate by k." See the paper (Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour") for the full explanation; my shorter explanation is that with k GPUs your effective batch size is multiplied by k, so for a given number of epochs you run k times fewer training iterations. The authors found that simply scaling the learning rate by k compensates for this, so you can train the model k times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take k times longer).
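To make it concrete, here is a minimal PyTorch DDP sketch of how you could apply the rule. The base LR, per-GPU batch size, and tiny model below are placeholder values, and it assumes the distributed process group is already initialised (e.g. by launching with torchrun):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder single-GPU hyperparameters for illustration.
base_lr = 0.1              # LR tuned for a single GPU
per_gpu_batch_size = 32    # minibatch size on each GPU

# Assumes dist.init_process_group() has already run (torchrun sets LOCAL_RANK).
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

k = dist.get_world_size()                       # k = number of GPUs / processes
effective_batch_size = per_gpu_batch_size * k   # batch size grows by k

# Linear Scaling Rule: the batch size grew by k, so scale the LR by k too.
scaled_lr = base_lr * k

# Placeholder model just to show where the scaled LR goes.
model = DDP(torch.nn.Linear(10, 1).cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)
```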