GPT-3's computational requirements depend heavily on model size, precision, and hardware optimization. For the full 175B-parameter model, **running inference requires at least 5 NVIDIA A100 GPUs (80GB each)** to meet the 350GB memory requirement at FP16 precision[4][9][22]. In practice, deployments often use 8 GPUs for better parallelism and throughput[3][22].

### Key Technical Requirements:

- **Memory**:
  - 175B parameters require **350GB of VRAM** at FP16 precision (2 bytes/parameter)[4][9][14].
  - Consumer GPUs like the RTX 3090 (24GB VRAM) can technically work in multi-GPU setups (e.g., 8x24GB = 192GB), but only with aggressive memory optimizations such as 8-bit quantization[2][28].
- **Hardware recommendations**:
  - **Data center GPUs**: 5–8 NVIDIA A100/A800 (80GB) GPUs for stable deployment[3][4][13].
  - **Consumer GPUs**: 8x RTX 3090/4090, ideally with NVLink (RTX 3090 only; the 4090 dropped it) to reduce inter-GPU communication bottlenecks[2][28].

### Performance Considerations:

- **Inference speed**:
  - A single A100 generates roughly one word every 350ms for GPT-3[3].
  - An 8-GPU cluster achieves **15–20 words/sec** at batch size 1[3][22].
- **Cost**:
  - Cloud deployment runs ~$6–7/hour for an 8xA100 instance[2].
  - An on-prem 8x RTX 3090 build costs ~$10K in hardware[2].

### Optimization Techniques:

1. **Model parallelism**: split layers across GPUs to overcome per-device memory limits[5][6].
2. **Quantization**: 8-bit weights cut memory usage to ~1 byte/parameter[2][9].
3. **KV caching**: reuse attention keys and values from earlier tokens to avoid redundant computation[9].

While 10 GPUs exceed the minimum requirement, the extra capacity provides headroom for larger batch sizes or hybrid training/inference workloads. Oracle has demonstrated fine-tuning of GPT-3-sized models on 8x A100 GPUs[13], confirming that high-end deployments are feasible. The short Python sketches below illustrate the memory arithmetic, the cloud-vs-on-prem break-even point, and each of the three optimization techniques.
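As a sanity check on the memory figures above, the arithmetic is just parameter count times bytes per parameter, divided across GPUs. A minimal sketch (weights only; the KV cache, activations, and framework overhead add more on top):

```python
import math

def min_gpus(params: float, bytes_per_param: float, gpu_vram_gb: float) -> int:
    """Smallest GPU count whose combined VRAM holds the model weights."""
    weights_gb = params * bytes_per_param / 1e9
    return math.ceil(weights_gb / gpu_vram_gb)

PARAMS = 175e9                       # GPT-3 parameter count
print(PARAMS * 2 / 1e9)              # 350.0 GB of weights at FP16
print(min_gpus(PARAMS, 2, 80))       # 5  -> A100 80GB, FP16 (the cited minimum)
print(min_gpus(PARAMS, 1, 80))       # 3  -> A100 80GB, int8
print(min_gpus(PARAMS, 1, 24))       # 8  -> RTX 3090 24GB, int8 (the 8x3090 setup)
```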
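The throughput and cost figures reduce to simple arithmetic as well. A rough estimate that ignores power, cooling, and depreciation; the hourly rate is an assumption taken as the midpoint of the cited $6–7 range:

```python
single_gpu_wps = 1 / 0.350            # ~2.9 words/sec: one A100 at ~350ms/word
cluster_wps = (15, 20)                # cited 8-GPU throughput at batch size 1

cloud_rate = 6.5                      # $/hour for an 8xA100 instance (assumed midpoint)
onprem_cost = 10_000                  # $ for an 8x RTX 3090 build
breakeven_h = onprem_cost / cloud_rate
print(f"single GPU: ~{single_gpu_wps:.1f} words/sec")
print(f"cloud break-even: ~{breakeven_h:.0f} hours (~{breakeven_h / 24:.0f} days)")
```

At steady utilization, the on-prem build pays for itself in roughly two months of cloud rental, which is why both options appear in the sources.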
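For technique 1, the simplest form of model parallelism places different layers on different devices so no single GPU has to hold all the weights. A toy two-stage PyTorch sketch, not the cited Megatron approach[6] (real deployments also split individual layers); it falls back to CPU when two GPUs aren't present:

```python
import torch
import torch.nn as nn

two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0" if two_gpus else "cpu")
dev1 = torch.device("cuda:1" if two_gpus else "cpu")

# Each stage's weights live on one device only.
stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)
stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev1)

x = torch.randn(1, 512, device=dev0)
h = stage0(x).to(dev1)   # only activations cross the device boundary
y = stage1(h)
print(y.shape)           # torch.Size([1, 512])
```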
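For technique 2, 8-bit quantization stores each weight as an int8 plus a shared scale, halving FP16 memory. A minimal symmetric per-tensor sketch of the core idea; production schemes (e.g., LLM.int8()) add per-channel scales and outlier handling:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map fp16 weights to int8 with one shared scale (symmetric)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return (q.astype(np.float32) * scale).astype(np.float16)

w = np.random.randn(4096, 4096).astype(np.float16)     # one toy weight matrix
q, scale = quantize_int8(w)
print(w.nbytes / q.nbytes)                             # 2.0 -> half of FP16
print(float(np.abs(dequantize(q, scale) - w).max()))   # small rounding error
```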
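For technique 3, a KV cache stores the attention keys and values of already-generated tokens so each decoding step only processes the new token. A single-head NumPy sketch; the key/value projections are stubbed out as identity for brevity:

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])   # one score per cached token
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 64
k_cache, v_cache = [], []                   # grows by one row per token
for step in range(5):                       # toy decode loop
    x = np.random.randn(d)                  # the new token's hidden state
    k_cache.append(x)                       # stand-in for W_k @ x (identity proj.)
    v_cache.append(x)                       # stand-in for W_v @ x
    # Keys/values from earlier steps are reused, never recomputed:
    out = attend(x, np.stack(k_cache), np.stack(v_cache))
print(out.shape)                            # (64,)
```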
Citations:

[1] https://ai.stackexchange.com/questions/22877/how-much-computing-power-does-it-cost-to-run-gpt-3
[2] https://news.ycombinator.com/item?id=33881504
[3] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt
[4] https://www.reddit.com/r/OpenAI/comments/10aocxc/does_anyone_have_any_hard_numbers_on_the_gpu/
[5] https://llmgpuhelper.com/en/blog/optimizing-gpt3-multi-gpu-training
[6] https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/
[7] https://lambdalabs.com/blog/demystifying-gpt-3
[8] https://ai.gopubby.com/multi-gpu-model-training-made-easy-with-distributed-data-parallel-ddp-453ba9f6846e?gi=a737dc56a3e4
[9] https://blog.spheron.network/how-much-gpu-memory-is-required-to-run-a-large-language-model-find-out-here
[10] https://developer.nvidia.com/blog/openai-presents-gpt-3-a-175-billion-parameters-language-model/
[11] https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/
[12] https://news.ycombinator.com/item?id=37674913
[13] https://blogs.oracle.com/research/post/oracle-first-to-finetune-gpt3-sized-ai-models-with-nvidia-a100-gpu
[14] https://en.wikipedia.org/wiki/GPT-3
[15] https://www.reddit.com/r/nvidia/comments/113euip/openai_trained_chat_gpt_on_10k_a100s/
[16] https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more
[17] https://www.lesswrong.com/posts/HBisQEDajGwhWirky/how-feasible-costly-would-it-be-to-train-a-very-large-ai
[18] https://developer.nvidia.com/blog/efficiently-scale-llm-training-across-a-large-gpu-cluster-with-alpa-and-ray/
[19] https://lambdalabs.com/blog/demystifying-gpt-3
[20] https://www.reddit.com/r/singularity/comments/inp025/if_you_want_to_run_your_own_full_gpt3_instance/
[21] https://techcommunity.microsoft.com/blog/machinelearningblog/unlocking-the-power-of-large-scale-training-in-ai/4303390
[22] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt
[23] https://rethinkpriorities.org/research-area/gpt-3-like-models-are-now-much-easier-to-access-and-deploy-than-to-develop/
[24] https://company.hpc-ai.com/blog/train-18-billion-parameter-gpt-models-with-a-single-gpu-on-your-personal-computer
[25] https://arstechnica.com/civis/threads/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi.1490659/page-2
[26] https://www.weka.io/blog/gpu/gpu-for-ai/
[27] https://www.reddit.com/r/MachineLearning/comments/gzb5uv/d_what_would_it_take_to_run_openais_gpt3_on/
[28] https://www.reddit.com/r/OpenAI/comments/11fwfjg/what_kind_of_pc_would_you_need_to_have_a_model/
[29] https://www.reddit.com/r/GPT3/comments/zufeg9/how_long_before_we_can_run_gpt3_locally/
[30] https://rethinkpriorities.org/research-area/the-replication-and-emulation-of-gpt-3/
[31] https://arxiv.org/pdf/2104.04473.pdf