Revisions

  1. Arunprakash-A revised this gist Jul 19, 2024. 1 changed file with 4 additions and 5 deletions.
    9 changes: 4 additions & 5 deletions gradientaccumulation-for-continual-pretraining.ipynb
    @@ -1388,11 +1388,10 @@
         },
         "source": [
         "* Note that the GPU memory required to train the model is 7 GB (as if we used SGD).\n",
    -    "* This approach gives us a better test performance.",
    -
    -    "* BS:1, GAS:10 then in 100 iterations, # of weight updates will be 10",
    -    "* BS:2, GAS:10 then in 50 iterations, # of weight updates will be 5",
    -    "* BS:10, GAS:10 then in 1 iterations, # of weight updates will be 1"
    +    "* This approach gives us a better test performance.\n",
    +    "* BS:1, GAS:10 then in 100 iterations, # of weight updates will be 10 \n",
    +    "* BS:2, GAS:10 then in 50 iterations, # of weight updates will be 5 \n",
    +    "* BS:10, GAS:10 then in 1 iterations, # of weight updates will be 1 \n"
         ]
         },
         {
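The BS/GAS arithmetic in the revised cell can be sketched as follows. The function name and the 100-sample dataset size are illustrative assumptions, not taken from the notebook; note that, following the pattern of the first two bullets, BS:10 with GAS:10 yields 10 iterations for the single weight update.

```python
def count_weight_updates(dataset_size, batch_size, accum_steps):
    """Return (iterations, weight_updates) for one pass over the data.

    Each iteration is one forward/backward pass over a micro-batch;
    the optimizer steps only once per `accum_steps` iterations.
    """
    iterations = dataset_size // batch_size       # micro-batches per epoch
    weight_updates = iterations // accum_steps    # optimizer.step() calls
    return iterations, weight_updates

for bs in (1, 2, 10):
    iters, updates = count_weight_updates(100, bs, 10)
    print(f"BS:{bs}, GAS:10 -> {iters} iterations, {updates} weight updates")
```

Raising BS while holding GAS fixed shrinks both the iteration count and the update count proportionally, which is why the revised cell tabulates all three settings.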
  2. Arunprakash-A revised this gist Jul 19, 2024. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gradientaccumulation-for-continual-pretraining.ipynb
    @@ -1388,7 +1388,11 @@
         },
         "source": [
         "* Note that the GPU memory required to train the model is 7 GB (as if we used SGD).\n",
    -    "* This approach gives us a better test performance."
    +    "* This approach gives us a better test performance.",
    +
    +    "* BS:1, GAS:10 then in 100 iterations, # of weight updates will be 10",
    +    "* BS:2, GAS:10 then in 50 iterations, # of weight updates will be 5",
    +    "* BS:10, GAS:10 then in 1 iterations, # of weight updates will be 1"
         ]
         },
         {
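For context, the cells being revised describe gradient accumulation: gradients from GAS micro-batches are summed before a single weight update, so peak memory tracks the micro-batch size while the effective batch is BS × GAS. A minimal, framework-free sketch of that loop (all names and numbers here are illustrative, not from the notebook):

```python
GAS = 10          # gradient accumulation steps
lr = 0.1          # learning rate
w = 0.0           # a single scalar "weight"
grad_accum = 0.0  # running sum of scaled micro-batch gradients

micro_batch_grads = [0.2] * 100   # pretend per-micro-batch gradients

updates = 0
for i, g in enumerate(micro_batch_grads, start=1):
    grad_accum += g / GAS         # scale so the update averages over GAS steps
    if i % GAS == 0:              # every GAS iterations: one optimizer step
        w -= lr * grad_accum
        grad_accum = 0.0
        updates += 1

print(updates)  # 10 weight updates from 100 iterations
```

In a real training loop the `g / GAS` scaling is usually applied to the loss before `backward()`, and the step/zero-grad pair replaces the manual update; the counting is the same either way.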
  3. Arunprakash-A revised this gist Jul 17, 2024. 1 changed file with 12 additions and 1 deletion.
    13 changes: 12 additions & 1 deletion gradientaccumulation-for-continual-pretraining.ipynb
    @@ -1,5 +1,15 @@
         {
         "cells": [
    +    {
    +    "cell_type": "markdown",
    +    "metadata": {
    +    "id": "view-in-github",
    +    "colab_type": "text"
    +    },
    +    "source": [
    +    "<a href=\"https://colab.research.google.com/gist/Arunprakash-A/c27ebe06e6c8fbd21263fc54013bbf49/gradientaccumulation-for-continual-pretraining.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
    +    ]
    +    },
         {
         "cell_type": "markdown",
         "metadata": {
    @@ -1394,7 +1404,8 @@
         "metadata": {
         "colab": {
         "provenance": [],
    -    "gpuType": "T4"
    +    "gpuType": "T4",
    +    "include_colab_link": true
         },
         "kernelspec": {
         "display_name": "Python(hf)",
  4. Arunprakash-A created this gist Jul 17, 2024.
    6,554 changes: 6,554 additions & 0 deletions gradientaccumulation-for-continual-pretraining.ipynb
    6,554 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.