Last active August 18, 2024 08:34
Revisions
- Arunprakash-A revised this gist Jul 19, 2024. 1 changed file with 4 additions and 5 deletions.
  @@ -1388,11 +1388,10 @@
    },
    "source": [
      "* Note that the GPU memory required to train the model is 7 GB (as if we used SGD).\n",
      "* This approach gives us a better test performance.\n",
      "* BS:1, GAS:10 then in 100 iterations, # of weight updates will be 10 \n",
      "* BS:2, GAS:10 then in 50 iterations, # of weight updates will be 5 \n",
      "* BS:10, GAS:10 then in 1 iterations, # of weight updates will be 1 \n"
    ]
  },
  {
- Arunprakash-A revised this gist Jul 19, 2024. 1 changed file with 5 additions and 1 deletion.
  @@ -1388,7 +1388,11 @@
    },
    "source": [
      "* Note that the GPU memory required to train the model is 7 GB (as if we used SGD).\n",
      "* This approach gives us a better test performance.",
      "* BS:1, GAS:10 then in 100 iterations, # of weight updates will be 10",
      "* BS:2, GAS:10 then in 50 iterations, # of weight updates will be 5",
      "* BS:10, GAS:10 then in 1 iterations, # of weight updates will be 1"
    ]
  },
  {
- Arunprakash-A revised this gist Jul 17, 2024. 1 changed file with 12 additions and 1 deletion.
  @@ -1,5 +1,15 @@
  {
    "cells": [
      {
        "cell_type": "markdown",
        "metadata": {
          "id": "view-in-github",
          "colab_type": "text"
        },
        "source": [
          "<a href=\"https://colab.research.google.com/gist/Arunprakash-A/c27ebe06e6c8fbd21263fc54013bbf49/gradientaccumulation-for-continual-pretraining.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
        ]
      },
      {
        "cell_type": "markdown",
        "metadata": {
  @@ -1394,7 +1404,8 @@
    "metadata": {
      "colab": {
        "provenance": [],
        "gpuType": "T4",
        "include_colab_link": true
      },
      "kernelspec": {
        "display_name": "Python(hf)",
- Arunprakash-A created this gist Jul 17, 2024.
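
The notebook bullets in the Jul 19 revisions relate batch size (BS) and gradient accumulation steps (GAS) to the number of weight updates. The arithmetic can be sketched as below, assuming a hypothetical dataset of 100 samples (the diff does not state the dataset size) and one weight update per GAS iterations. The helper `weight_updates` is illustrative and does not come from the notebook. Under this assumption, BS:10 corresponds to 10 iterations rather than the 1 iteration written in the third bullet.

```python
def weight_updates(num_samples: int, batch_size: int, accumulation_steps: int) -> tuple[int, int]:
    """Return (iterations, weight updates) for one pass over the data.

    Assumes the optimizer steps once every `accumulation_steps` iterations
    and that leftover iterations that do not fill a full accumulation
    window produce no update.
    """
    iterations = num_samples // batch_size          # forward/backward passes
    updates = iterations // accumulation_steps      # optimizer steps
    return iterations, updates


# Hypothetical dataset size of 100 samples, GAS fixed at 10.
for bs in (1, 2, 10):
    iters, updates = weight_updates(num_samples=100, batch_size=bs, accumulation_steps=10)
    print(f"BS:{bs}, GAS:10 -> {iters} iterations, {updates} weight updates")
```

In each case the effective batch size per weight update is BS × GAS, which is why memory use stays at the level of the small batch while the update behaves like one computed on a larger batch.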