Revisions
veekaybee revised this gist
Feb 7, 2024 . 1 changed file with 1 addition and 0 deletions.

@@ -153,6 +153,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
+ [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
+ [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)
+ [Large Transformer Model Inference Optimization](https://lilianweng.github.io/posts/2023-01-10-inference-optimization/)

## Prompt Engineering and RAG

veekaybee revised this gist
Feb 7, 2024 . 1 changed file with 8 additions and 1 deletion.

@@ -141,11 +141,18 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Against LLM Maximalism](https://explosion.ai/blog/against-llm-maximalism)
+ [A Guide to Inference and Performance](https://www.baseten.co/blog/llm-transformer-inference-guide/)
+ [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
+ [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
+ [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
+ [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

## LLM Inference and K-V Cache
+ [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
+ [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)
+ [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
+ [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
+ [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)

## Prompt Engineering and RAG

veekaybee revised this gist
Dec 30, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -145,6 +145,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
+ [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
+ [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)
+ [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)

## Prompt Engineering and RAG

veekaybee revised this gist
Dec 30, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -27,6 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
+ [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
+ [Tokenization](https://github.com/SumanthRH/tokenization)
+ [LLM Course](https://github.com/mlabonne/llm-course)

## Foundational Deep Learning Papers (in semi-chronological order)

veekaybee revised this gist
Dec 27, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -26,6 +26,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
+ [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
+ [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
+ [Tokenization](https://github.com/SumanthRH/tokenization)

## Foundational Deep Learning Papers (in semi-chronological order)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 3 deletions.

@@ -27,9 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
+ [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

## Foundational Deep Learning Papers (in semi-chronological order)
+ [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
+ [Attention is all you Need](https://arxiv.org/abs/1706.03762)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 0 additions and 1 deletion.

@@ -16,7 +16,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)

### Building Blocks
+ [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
+ [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -4,7 +4,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## Foundational Concepts
<img width="400" alt="Screenshot 2023-12-18 at 10 40 27 PM" src="https://gist.github.com/assets/3837836/4c30ad72-76ee-4939-a5fb-16b570d38cf2">

### Pre-Transformer Models
<img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 2 additions and 0 deletions.

@@ -4,6 +4,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## Foundational Concepts
<img width="400" alt="Screenshot 2023-12-18 at 10 38 06 PM" src="https://gist.github.com/assets/3837836/b3385ca6-f833-4b69-ad92-f9d9f89b6be8">

### Pre-Transformer Models
<img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 2 deletions.

@@ -97,6 +97,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

## RLHF and DPO
<img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">
+ [RLHF](https://huggingface.co/blog/rlhf)
+ [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)

@@ -107,8 +108,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## Fine-Tuning and Compression
+ [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
+ [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
+ [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 2 additions and 1 deletion.

@@ -106,7 +106,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

## Fine-Tuning and Compression
<img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">
+ [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
+ [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -106,6 +106,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

## Fine-Tuning and Compression
<img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="[https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1](https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b)">
+ [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
+ [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 3 additions and 1 deletion.

@@ -37,7 +37,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
+ [T5](https://jmlr.org/papers/v21/20-074.html)
+ [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
+ [InstructGPT: Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
+ [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

## The Transformer Architecture

@@ -150,6 +150,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
+ [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
+ [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)
+ [Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)

## GPUs

@@ -173,6 +174,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

### Eval Frameworks
+ [HELM](https://arxiv.org/pdf/2211.09110.pdf)
+ [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)
+ [LmSys Chatbot Arena](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)

## UX

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -33,7 +33,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
+ [Attention is all you Need](https://arxiv.org/abs/1706.03762)
+ [BERT](https://arxiv.org/abs/1810.04805)
+ [GPT-1](https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf)
+ [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
+ [T5](https://jmlr.org/papers/v21/20-074.html)
+ [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 4 additions and 4 deletions.

@@ -163,17 +163,17 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## Evaluation
+ [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
+ [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
+ [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)
+ [Challenges in Evaluating AI Systems](https://www.anthropic.com/index/evaluating-ai-systems)
+ [LLM Evaluation Papers](https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers)
+ [Evaluating LLMs is a MineField](https://www.cs.princeton.edu/~arvindn/talks/evaluating_llms_minefield/)

### Eval Frameworks
+ [HELM](https://arxiv.org/pdf/2211.09110.pdf)
+ [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

## UX
+ [Generative Interfaces Beyond Chat (YouTube)](https://www.youtube.com/watch?v=rd-J3hmycQs)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 17 additions and 4 deletions.

@@ -30,13 +30,15 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
<img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
+ [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
+ [Attention is all you Need](https://arxiv.org/abs/1706.03762)
+ [BERT](https://arxiv.org/abs/1810.04805)
+ [GPT-1](https://mistral.ai/news/mixtral-of-experts/)
+ [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
+ [T5](https://jmlr.org/papers/v21/20-074.html)
+ [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
+ [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
+ [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

## The Transformer Architecture

@@ -60,9 +62,16 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
+ [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
+ [OpenAI Cookbook](https://cookbook.openai.com/)

## Significant OSS Models
+ [Llama2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/?ref=blog.oxen.ai)
+ [Mistral7B](https://arxiv.org/abs/2310.06825)
+ [Mixtral](https://mistral.ai/news/mixtral-of-experts/)
+ [Phi2](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
+ [Falcon7B](https://huggingface.co/blog/falcon)

### LLMs in 2023
<img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">

@@ -154,6 +163,10 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## Evaluation

### Frameworks:
+ [HELM](https://arxiv.org/pdf/2211.09110.pdf)
+ [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)
+ [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
+ [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
+ [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 4 additions and 1 deletion.

@@ -90,6 +90,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## RLHF and DPO
+ [RLHF](https://huggingface.co/blog/rlhf)
+ [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)
+ [How Abilities in LLMs Are Affected by SFT](https://arxiv.org/abs/2310.05492)
+ [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
+ [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
+ [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

@@ -107,7 +109,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
+ [Adapters](https://arxiv.org/abs/2304.01933)
+ [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)

# Small and Local LLMs

@@ -131,6 +133,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
+ [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
+ [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
+ [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

## Prompt Engineering and RAG

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 3 additions and 1 deletion.

@@ -132,10 +132,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
+ [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)

## Prompt Engineering and RAG
+ [On Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
+ [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
+ [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
+ [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)

## GPUs

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -66,6 +66,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

### LLMs in 2023
<img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
+ [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
+ [How open are open architectures?](https://opening-up-chatgpt.github.io/)
+ [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 4 additions and 0 deletions.

@@ -64,10 +64,14 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [OpenAI Cookbook](https://cookbook.openai.com/)

### LLMs in 2023
<img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
+ [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
+ [How open are open architectures?](https://opening-up-chatgpt.github.io/)
+ [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
+ [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)
+ [Timeline of Transformer Models](https://ai.v-gar.de/ml/transformer/timeline/)
+ [Large Language Model Evolutionary Tree](https://notes.kateva.org/2023/04/large-language-models-evolutionary-tree.html)

## Training Data

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 5 additions and 2 deletions.

@@ -60,10 +60,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
+ [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
+ [OpenAI Cookbook](https://cookbook.openai.com/)

### LLMs in 2023
+ [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
+ [How open are open architectures?](https://opening-up-chatgpt.github.io/)
+ [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
+ [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)

@@ -91,8 +93,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
+ [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
+ [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)
+ Quantiztion
+ [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
+ [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
+ [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
+ [PEFT](https://github.com/huggingface/peft)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 0 deletions.

@@ -24,6 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
+ [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
+ [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
+ [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

## Foundational Deep Learning Papers

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 2 deletions.

@@ -130,8 +130,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## GPUs
<img width="600" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">
+ [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
+ [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 5 additions and 0 deletions.

@@ -22,6 +22,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Language Modeling is Compression](https://arxiv.org/abs/2309.10668)
+ [Vector Search - Long-Term Memory in AI](https://github.com/edoliberty/vector-search-class-notes)
+ [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
+ [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
+ [The Hardware Lottery](https://arxiv.org/abs/2009.06489)

## Foundational Deep Learning Papers

@@ -128,6 +130,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

## GPUs
<img width="865" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">
+ [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
+ [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)
+ [Everything about Distributed Training and Efficient Finetuning](https://sumanthrh.com/post/distributed-and-efficient-finetuning/)

veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 16 additions and 5 deletions.
@@ -77,14 +77,25 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
+ [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)
## RLHF and DPO
+ [RLHF](https://huggingface.co/blog/rlhf)
+ [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
+ [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
+ [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)
## Fine-Tuning and Compression
+ [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
+ [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
+ Quantization
+ [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
+ [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
+ [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
+ [PEFT](https://github.com/huggingface/peft)
+ [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
+ [Adapters](https://arxiv.org/abs/2304.01933)
+ [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
+ [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)
# Small and Local LLMs
-
veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 3 additions and 1 deletion.
@@ -11,6 +11,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
+ [Survey of LLMS](https://arxiv.org/abs/2303.18223)
+ [Deep Learning Systems](https://dlsyscourse.org/lectures/)
+ [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)
### Building Blocks
<img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
@@ -91,7 +92,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [How is LlamaCPP Possible?](https://finbarr.ca/how-is-llama-cpp-possible/)
+ [How to beat GPT-4 with a 13-B Model](https://lmsys.org/blog/2023-11-14-llm-decontaminator/)
+ [Efficient LLM Inference on CPUs](https://arxiv.org/abs/2311.00502v1)
+ [Tiny Language Models Come of Age](https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/)
+ [Efficiency LLM Spectrum](https://github.com/tding1/Efficient-LLM-Survey)
+ [TinyML at MIT](https://efficientml.ai/)
@@ -107,6 +108,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
+ [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
+ [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
+ [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
## Prompt Engineering
-
veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)
## GPT
<img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67">
+ [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
-
veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)
## GPT
<img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="[https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)">
+ [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
+ [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
-
veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 2 additions and 2 deletions.
@@ -13,7 +13,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
+ [Deep Learning Systems](https://dlsyscourse.org/lectures/)
### Building Blocks
<img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
+ [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
+ [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)
@@ -36,7 +36,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
## The Transformer Architecture
<img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165">
+ [Transformers from Scratch](https://e2eml.school/transformers.html)
+ [Transformer Math](https://blog.eleuther.ai/transformer-math/)
-
veekaybee revised this gist
Dec 19, 2023 . 1 changed file with 2 additions and 2 deletions.
@@ -5,7 +5,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
## Foundational Concepts
### Pre-Transformer Models
<img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">
+ [The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)](https://www.youtube.com/watch?v=ISPId9Lhc1g)
+ [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
@@ -24,7 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
## Foundational Deep Learning Papers
<img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
+ [BERT](https://arxiv.org/abs/1810.04805)
+ [Seq2Seq](https://arxiv.org/abs/1409.3215v3)