
@Gyanachand1
Forked from veekaybee/normcore-llm.md
Created September 22, 2024 15:58

Revisions

  1. @veekaybee revised this gist Feb 7, 2024. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -153,6 +153,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
    + [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
    + [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)
    + [Large Transformer Model Inference Optimization](https://lilianweng.github.io/posts/2023-01-10-inference-optimization/)

    ## Prompt Engineering and RAG

  2. @veekaybee revised this gist Feb 7, 2024. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion normcore-llm.md
    @@ -141,11 +141,18 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Against LLM Maximalism](https://explosion.ai/blog/against-llm-maximalism)
    + [A Guide to Inference and Performance](https://www.baseten.co/blog/llm-transformer-inference-guide/)
    + [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)


    ## LLM Inference and K-V Cache

    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)
    + [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
    + [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
    + [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)

    ## Prompt Engineering and RAG
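
    The K-V cache links gathered in this revision all turn on one idea, which can be sketched briefly (a minimal NumPy toy with assumed shapes, not any particular library's implementation): at each decoding step only the new token's key and value are computed, and attention runs over the cached history.

    ```python
    import numpy as np

    def attention(q, K, V):
        # Scaled dot-product attention for one query vector q (d,)
        # against all cached keys K (t, d) and values V (t, d).
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())  # numerically stable softmax
        w /= w.sum()
        return w @ V

    class KVCache:
        """Append-only per-head cache: keys/values for past tokens are
        stored once and reused, so each decode step costs O(t), not O(t^2)."""
        def __init__(self, d):
            self.K = np.empty((0, d))
            self.V = np.empty((0, d))

        def step(self, k, v, q):
            # Only the new token's k, v are computed by the model each step;
            # everything earlier comes straight from the cache.
            self.K = np.vstack([self.K, k])
            self.V = np.vstack([self.V, v])
            return attention(q, self.K, self.V)
    ```

    The cached result at step t matches attention recomputed over the full sequence; avoiding that recomputation is exactly what the posts above optimize.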

  3. @veekaybee revised this gist Dec 30, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -145,6 +145,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)
    + [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)

    ## Prompt Engineering and RAG

  4. @veekaybee revised this gist Dec 30, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -27,6 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
    + [Tokenization](https://github.com/SumanthRH/tokenization)
    + [LLM Course](https://github.com/mlabonne/llm-course)

    ## Foundational Deep Learning Papers (in semi-chronological order)

  5. @veekaybee revised this gist Dec 27, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -26,6 +26,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
    + [Tokenization](https://github.com/SumanthRH/tokenization)

    ## Foundational Deep Learning Papers (in semi-chronological order)

  6. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 3 deletions.
    4 changes: 1 addition & 3 deletions normcore-llm.md
    @@ -27,9 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

    ## Foundational Deep Learning Papers

    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
    ## Foundational Deep Learning Papers (in semi-chronological order)

    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
  7. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion normcore-llm.md
    @@ -16,7 +16,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)

    ### Building Blocks
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">

    + [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
    + [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)
  8. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -4,7 +4,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Concepts

    <img width="400" alt="Screenshot 2023-12-18 at 10 38 06 PM" src="https://gist.github.com/assets/3837836/b3385ca6-f833-4b69-ad92-f9d9f89b6be8">
    <img width="400" alt="Screenshot 2023-12-18 at 10 40 27 PM" src="https://gist.github.com/assets/3837836/4c30ad72-76ee-4939-a5fb-16b570d38cf2">

    ### Pre-Transformer Models
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">
  9. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions normcore-llm.md
    @@ -4,6 +4,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Concepts

    <img width="400" alt="Screenshot 2023-12-18 at 10 38 06 PM" src="https://gist.github.com/assets/3837836/b3385ca6-f833-4b69-ad92-f9d9f89b6be8">

    ### Pre-Transformer Models
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

  10. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions normcore-llm.md
    @@ -97,6 +97,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

    ## RLHF and DPO
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)
    @@ -107,8 +108,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Fine-Tuning and Compression

    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)
  11. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion normcore-llm.md
    @@ -106,7 +106,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="[https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1](https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b)">

    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
  12. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -106,6 +106,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="[https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1](https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b)">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
  13. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -37,7 +37,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [InstructGPT: Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

    ## The Transformer Architecture
    @@ -150,6 +150,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
    + [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
    + [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)
    + [Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)

    ## GPUs

    @@ -173,6 +174,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ### Eval Frameworks
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)
    + [LmSys Chatbot Arena](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)

    ## UX

  14. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -33,7 +33,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
    + [BERT](https://arxiv.org/abs/1810.04805)
    + [GPT-1](https://mistral.ai/news/mixtral-of-experts/)
    + [GPT-1](https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf)
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
  15. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions normcore-llm.md
    @@ -163,17 +163,17 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Evaluation

    ### Frameworks:
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    + [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
    + [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
    + [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)
    + [Challenges in Evaluating AI Systems](https://www.anthropic.com/index/evaluating-ai-systems)
    + [LLM Evaluation Papers](https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers)
    + [Evaluating LLMs is a MineField](https://www.cs.princeton.edu/~arvindn/talks/evaluating_llms_minefield/)

    ### Eval Frameworks
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    ## UX

    + [Generative Interfaces Beyond Chat (YouTube)](https://www.youtube.com/watch?v=rd-J3hmycQs)
  16. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 17 additions and 4 deletions.
    21 changes: 17 additions & 4 deletions normcore-llm.md
    @@ -30,13 +30,15 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">

    + [BERT](https://arxiv.org/abs/1810.04805)
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
    + [BERT](https://arxiv.org/abs/1810.04805)
    + [GPT-1](https://mistral.ai/news/mixtral-of-experts/)
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
    + [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

    ## The Transformer Architecture

    @@ -60,9 +62,16 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
    + [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)

    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ## Significant OSS Models

    + [Llama2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/?ref=blog.oxen.ai)
    + [Mistral7B](https://arxiv.org/abs/2310.06825)
    + [Mixtral](https://mistral.ai/news/mixtral-of-experts/)
    + [Phi2](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
    + [Falcon7B](https://huggingface.co/blog/falcon)

    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
    @@ -154,6 +163,10 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Evaluation

    ### Frameworks:
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    + [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
    + [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
    + [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)
  17. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion normcore-llm.md
    @@ -90,6 +90,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ## RLHF and DPO

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)
    + [How Abilities in LLMs Are Affected by SFT](https://arxiv.org/abs/2310.05492)
    + [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
    + [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)
    @@ -107,7 +109,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + [Adapters](https://arxiv.org/abs/2304.01933)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)


    # Small and Local LLMs

    @@ -131,6 +133,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

    ## Prompt Engineering and RAG

  18. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -132,10 +132,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)

    ## Prompt Engineering
    ## Prompt Engineering and RAG

    + [On Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
    + [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
    + [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
    + [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)

    ## GPUs

  19. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -66,6 +66,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">

    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
  20. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions normcore-llm.md
    @@ -64,10 +64,14 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
    + [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)
    + [Timeline of Transformer Models](https://ai.v-gar.de/ml/transformer/timeline/)
    + [Large Language Model Evolutionary Tree](https://notes.kateva.org/2023/04/large-language-models-evolutionary-tree.html)

    ## Training Data

  21. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 5 additions and 2 deletions.
    7 changes: 5 additions & 2 deletions normcore-llm.md
    @@ -60,10 +60,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
    + [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)

    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ### LLMs in 2023
    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
    + [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)

    @@ -91,8 +93,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)
    + Quantiztion
    + A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
    + [PEFT](https://github.com/huggingface/peft)
  22. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -24,6 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

    ## Foundational Deep Learning Papers

  23. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions normcore-llm.md
    @@ -130,8 +130,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## GPUs


    <img width="865" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">
    <img width="600" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">

    + [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
    + [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)
  24. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions normcore-llm.md
    @@ -22,6 +22,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Language Modeling is Compression](https://arxiv.org/abs/2309.10668)
    + [Vector Search - Long-Term Memory in AI](https://github.com/edoliberty/vector-search-class-notes)
    + [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)

    ## Foundational Deep Learning Papers

    @@ -128,6 +130,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## GPUs


    <img width="865" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">

    + [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
    + [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)
    + [Everything about Distributed Training and Efficient Finetuning](https://sumanthrh.com/post/distributed-and-efficient-finetuning/)
  25. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 16 additions and 5 deletions.
    21 changes: 16 additions & 5 deletions normcore-llm.md
    @@ -77,14 +77,25 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + Training [Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
    + [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

    ## Fine-Tuning
    ## RLHF and DPO

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
    + [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + Quantization
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
    + [PEFT](https://github.com/huggingface/peft)
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + [Adapters](https://arxiv.org/abs/2304.01933)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

    # Small and Local LLMs
  26. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -11,6 +11,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
    + [Survey of LLMS](https://arxiv.org/abs/2303.18223)
    + [Deep Learning Systems](https://dlsyscourse.org/lectures/)
    + [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)

    ### Building Blocks
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
    @@ -91,7 +92,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [How is LlamaCPP Possible?](https://finbarr.ca/how-is-llama-cpp-possible/)
    + [How to beat GPT-4 with a 13-B Model](https://lmsys.org/blog/2023-11-14-llm-decontaminator/)
    + [Efficient LLM Inference on CPUs](https://arxiv.org/abs/2311.00502v1)
    + [Tiny Language Models Come of Age](}https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/)
    + [Tiny Language Models Come of Age](https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/)
    + [Efficiency LLM Spectrum](https://github.com/tding1/Efficient-LLM-Survey)
    + [TinyML at MIT](https://efficientml.ai/)

    @@ -107,6 +108,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)

    ## Prompt Engineering

  27. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)

    ## GPT
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="[https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)">
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67">

    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
  28. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)

    ## GPT
    ![](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="[https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)">

    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
  29. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions normcore-llm.md
    @@ -13,7 +13,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Deep Learning Systems](https://dlsyscourse.org/lectures/)

    ### Building Blocks
    <img width="405" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">

    + [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
    + [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)
    @@ -36,7 +36,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## The Transformer Architecture

    <img width="506" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165">
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165">

    + [Transformers from Scratch](https://e2eml.school/transformers.html)
    + [Transformer Math](https://blog.eleuther.ai/transformer-math/)
  30. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions normcore-llm.md
    @@ -5,7 +5,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ## Foundational Concepts

    ### Pre-Transformer Models
    <img width="858" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

    + [The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)](https://www.youtube.com/watch?v=ISPId9Lhc1g)
    + [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
    @@ -24,7 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Deep Learning Papers

    <img width="730" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">

    + [BERT](https://arxiv.org/abs/1810.04805)
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)