# How many GPUs needed for GPT-3

    GPT-3's computational requirements depend heavily on model size, precision, and hardware optimization. For the full 175B-parameter model, **running inference requires at least 5 NVIDIA A100 GPUs (80GB each)** to meet the 350GB memory requirement when using FP16 precision[4][9][22]. However, practical implementations often use 8 GPUs for improved parallelism and throughput[3][22].

    ### Key Technical Requirements:
    - **Memory**:
    - 175B parameters require **350GB of VRAM** at FP16 precision (2 bytes/parameter)[4][9][14]; the arithmetic is sketched after this list.
    - Consumer GPUs like RTX 3090s (24GB VRAM) can technically work in multi-GPU setups (e.g., 8x24GB = 192GB), but require aggressive memory optimizations like 8-bit quantization[2][28].

    - **Hardware Recommendations**:
    - **Data center GPUs**: 5–8 NVIDIA A100/A800 (80GB) GPUs for stable deployment[3][4][13].
    - **Consumer GPUs**: 8x RTX 3090/4090, ideally with NVLink (RTX 3090 only) or a fast PCIe fabric to reduce communication bottlenecks[2][28].
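
    The VRAM figures above follow directly from parameter count times bytes per parameter. A minimal Python sketch of that arithmetic (it ignores activation and KV-cache overhead, so treat the GPU counts as lower bounds):

    ```python
    import math

    def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
        """GB needed just to hold the weights (no activations or KV cache)."""
        return n_params * bytes_per_param / 1e9

    GPT3_PARAMS = 175e9  # 175B parameters

    print(f"FP16: {weight_memory_gb(GPT3_PARAMS, 2):.0f} GB")  # ~350 GB
    print(f"INT8: {weight_memory_gb(GPT3_PARAMS, 1):.0f} GB")  # ~175 GB

    # Lower bound on GPU count from weight memory alone:
    for name, vram in [("A100 80GB", 80), ("RTX 3090 24GB", 24)]:
        n_fp16 = math.ceil(weight_memory_gb(GPT3_PARAMS, 2) / vram)
        n_int8 = math.ceil(weight_memory_gb(GPT3_PARAMS, 1) / vram)
        print(f"{name}: {n_fp16} GPUs at FP16, {n_int8} at INT8")
    ```

    At FP16 the consumer route would need ~15 of the 24GB cards, which is why the 8x RTX 3090 setup only becomes viable once weights are quantized to roughly 1 byte/parameter.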

    ### Performance Considerations:
    - **Inference speed**:
    - A single A100 generates ~1 word every 350ms for GPT-3[3].
    - An 8-GPU cluster achieves **15–20 words/sec** with batch size 1[3][22].
    - **Cost**:
    - Cloud deployment costs ~$6–7/hour for 8xA100 instances[2]; a per-word back-of-envelope follows this list.
    - On-prem setups with 8xRTX 3090s cost ~$10K for hardware[2].
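
    Combining the throughput and cost estimates above gives a rough per-output price. A back-of-envelope sketch (using midpoints of the cited ranges; real throughput varies with batch size and sequence length):

    ```python
    CLOUD_RATE_PER_HOUR = 6.5  # ~$6-7/hour for an 8xA100 cloud instance (cited estimate)
    WORDS_PER_SEC = 17.5       # midpoint of the 15-20 words/sec estimate at batch size 1

    words_per_hour = WORDS_PER_SEC * 3600
    cost_per_1k_words = CLOUD_RATE_PER_HOUR / words_per_hour * 1_000
    print(f"~{words_per_hour:,.0f} words/hour, ~${cost_per_1k_words:.2f} per 1,000 words")
    # ~63,000 words/hour, ~$0.10 per 1,000 words at batch size 1
    ```

    Larger batch sizes amortize the hourly rate over more concurrent requests, so the effective cost per word drops well below this single-stream figure.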

    ### Optimization Techniques:
    1. **Model Parallelism**: Split layers across GPUs to overcome memory limits[5][6].
    2. **Quantization**: 8-bit weights reduce memory usage to ~1 byte/parameter[2][9].
    3. **KV Caching**: Cache each token's attention keys and values so earlier positions are not recomputed at every decoding step[9]; a memory estimate is sketched below.
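
    KV caching trades memory for compute, and the memory side is easy to estimate. A rough sketch using GPT-3 175B's published shape (96 layers, hidden size 12288; these numbers come from the GPT-3 paper, not from the text above) at FP16:

    ```python
    N_LAYERS, D_MODEL, BYTES = 96, 12288, 2  # GPT-3 175B shape, FP16 keys and values

    def kv_cache_gb(seq_len: int, batch: int = 1) -> float:
        """Keys + values for every layer, every cached token, every sequence."""
        return 2 * N_LAYERS * D_MODEL * BYTES * seq_len * batch / 1e9

    print(f"per token: {kv_cache_gb(1) * 1e3:.1f} MB")        # ~4.7 MB/token
    print(f"2048 ctx : {kv_cache_gb(2048):.1f} GB")           # ~9.7 GB at batch 1
    print(f"batch 8  : {kv_cache_gb(2048, batch=8):.1f} GB")  # ~77 GB at batch 8
    ```

    At larger batch sizes the cache rivals the per-GPU weight budget, which is why quantizing or offloading the cache is a common complement to weight quantization.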

    Ten GPUs exceed the minimum requirement, but the extra capacity provides headroom for larger batch sizes or hybrid training/inference workloads. Oracle has demonstrated fine-tuning GPT-3-sized models on NVIDIA A100 GPUs[13], confirming feasibility for high-end deployments.
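
    To see where that headroom comes from on an 8-GPU A100 box, here is an illustrative (not production) even split of GPT-3's 96 transformer blocks across 8 devices; real frameworks such as Megatron-LM[6] also shard tensors within layers rather than only assigning whole layers:

    ```python
    GB_PER_LAYER = 350 / 96  # ~3.6 GB of FP16 weights per block (embeddings/LM head ignored)

    def assign_layers(n_layers: int, n_gpus: int) -> dict[int, list[int]]:
        """Map each GPU rank to a contiguous block of layer indices."""
        per_gpu = -(-n_layers // n_gpus)  # ceiling division
        return {rank: list(range(rank * per_gpu, min((rank + 1) * per_gpu, n_layers)))
                for rank in range(n_gpus)}

    for rank, layers in assign_layers(n_layers=96, n_gpus=8).items():
        weight_gb = len(layers) * GB_PER_LAYER
        print(f"GPU {rank}: layers {layers[0]:2d}-{layers[-1]:2d}, ~{weight_gb:.0f} GB of weights")
    ```

    Each 80GB A100 then holds roughly 44GB of weights, leaving ~36GB per device for activations, KV cache, and batching headroom.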

    Citations:
    [1] https://ai.stackexchange.com/questions/22877/how-much-computing-power-does-it-cost-to-run-gpt-3
    [2] https://news.ycombinator.com/item?id=33881504
    [3] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt
    [4] https://www.reddit.com/r/OpenAI/comments/10aocxc/does_anyone_have_any_hard_numbers_on_the_gpu/
    [5] https://llmgpuhelper.com/en/blog/optimizing-gpt3-multi-gpu-training
    [6] https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/
    [7] https://lambdalabs.com/blog/demystifying-gpt-3
    [8] https://ai.gopubby.com/multi-gpu-model-training-made-easy-with-distributed-data-parallel-ddp-453ba9f6846e?gi=a737dc56a3e4
    [9] https://blog.spheron.network/how-much-gpu-memory-is-required-to-run-a-large-language-model-find-out-here
    [10] https://developer.nvidia.com/blog/openai-presents-gpt-3-a-175-billion-parameters-language-model/
    [11] https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/
    [12] https://news.ycombinator.com/item?id=37674913
    [13] https://blogs.oracle.com/research/post/oracle-first-to-finetune-gpt3-sized-ai-models-with-nvidia-a100-gpu
    [14] https://en.wikipedia.org/wiki/GPT-3
    [15] https://www.reddit.com/r/nvidia/comments/113euip/openai_trained_chat_gpt_on_10k_a100s/
    [16] https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more
    [17] https://www.lesswrong.com/posts/HBisQEDajGwhWirky/how-feasible-costly-would-it-be-to-train-a-very-large-ai
    [18] https://developer.nvidia.com/blog/efficiently-scale-llm-training-across-a-large-gpu-cluster-with-alpa-and-ray/
    [19] https://lambdalabs.com/blog/demystifying-gpt-3
    [20] https://www.reddit.com/r/singularity/comments/inp025/if_you_want_to_run_your_own_full_gpt3_instance/
    [21] https://techcommunity.microsoft.com/blog/machinelearningblog/unlocking-the-power-of-large-scale-training-in-ai/4303390
    [22] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt
    [23] https://rethinkpriorities.org/research-area/gpt-3-like-models-are-now-much-easier-to-access-and-deploy-than-to-develop/
    [24] https://company.hpc-ai.com/blog/train-18-billion-parameter-gpt-models-with-a-single-gpu-on-your-personal-computer
    [25] https://arstechnica.com/civis/threads/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi.1490659/page-2
    [26] https://www.weka.io/blog/gpu/gpu-for-ai/
    [27] https://www.reddit.com/r/MachineLearning/comments/gzb5uv/d_what_would_it_take_to_run_openais_gpt3_on/
    [28] https://www.reddit.com/r/OpenAI/comments/11fwfjg/what_kind_of_pc_would_you_need_to_have_a_model/
    [29] https://www.reddit.com/r/GPT3/comments/zufeg9/how_long_before_we_can_run_gpt3_locally/
    [30] https://rethinkpriorities.org/research-area/the-replication-and-emulation-of-gpt-3/
    [31] https://arxiv.org/pdf/2104.04473.pdf