As a tip for multi-GPU training: there is a rule called the **Linear Scaling Rule**, which states: _When the minibatch size is multiplied by k, multiply the learning rate by k._ Here is [the paper](https://arxiv.org/abs/1706.02677) for the full explanation. My shorter explanation is that with `k` GPUs your effective batch size grows by a factor of `k`, so you run `k` times fewer training iterations for a given number of epochs. The authors find that simply scaling the learning rate by `k` compensates for this, so you can train roughly `k` times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take `k` times longer).
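
A minimal sketch of how this might look in practice with PyTorch; the `base_lr` and `base_batch_size` values below are illustrative placeholders, not taken from the paper:

```python
import torch

# Illustrative single-GPU baseline values (assumptions, tune for your own setup).
base_lr = 0.1            # learning rate tuned for one GPU
base_batch_size = 256    # per-step batch size on one GPU

k = max(torch.cuda.device_count(), 1)   # number of GPUs available
scaled_batch_size = base_batch_size * k # effective batch size grows by k
scaled_lr = base_lr * k                 # Linear Scaling Rule: scale LR by the same k

print(f"GPUs: {k}, batch size: {scaled_batch_size}, learning rate: {scaled_lr}")
```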