@sukhitashvili
Last active January 26, 2023 13:54
Revisions

  1. sukhitashvili revised this gist Jan 26, 2023. 1 changed file with 1 addition and 6 deletions.
    As a tip for multi-GPU training: there is a rule called the **Linear Scaling Rule**, which says: _When the minibatch size is multiplied by k, multiply the learning rate by k._ Here is [the paper](https://arxiv.org/abs/1706.02677) with the full explanation; my shorter explanation is that with `k` GPUs your effective batch size is multiplied by `k`, so for a given number of epochs you run `k` times fewer training iterations. The authors find that simply scaling your LR by `k` compensates for this, so you can train the model `k` times faster and still reach the same accuracy you would get on a single-GPU machine with the same number of epochs (which would take `k` times longer). A minimal sketch of the rule in practice is shown after the revision list below.
  2. sukhitashvili created this gist Jan 26, 2023.
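
To make the rule concrete, here is a minimal PyTorch sketch, assuming the script is launched with `torchrun` (one process per GPU); the `BASE_LR` and `PER_GPU_BATCH` values are placeholders, not numbers from the paper. The base learning rate tuned for one GPU is simply multiplied by the world size `k` before building the optimizer.

```python
# Minimal sketch of the Linear Scaling Rule with PyTorch DDP.
# Assumes launch via `torchrun --nproc_per_node=<k> train.py`, so there is
# one process per GPU and LOCAL_RANK is set by the launcher.
# BASE_LR and PER_GPU_BATCH are placeholder values, not from the paper.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

BASE_LR = 0.1        # learning rate tuned for single-GPU training
PER_GPU_BATCH = 32   # per-process minibatch size (kept fixed per GPU)

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

k = dist.get_world_size()              # number of GPUs / processes
effective_batch = k * PER_GPU_BATCH    # global minibatch grows k times
scaled_lr = k * BASE_LR                # Linear Scaling Rule: LR grows k times

model = torch.nn.Linear(128, 10).cuda()
model = DistributedDataParallel(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)
```

Note that the paper also pairs this rule with a gradual LR warmup over the first few epochs, which becomes important when `k` (and hence the scaled LR) is large.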