
@Gyanachand1
Forked from veekaybee/normcore-llm.md
Created September 22, 2024 15:58

Revisions

  1. @veekaybee revised this gist Feb 7, 2024. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -153,6 +153,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
    + [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
    + [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)
    + [Large Transformer Model Inference Optimization](https://lilianweng.github.io/posts/2023-01-10-inference-optimization/)

    ## Prompt Engineering and RAG

  2. @veekaybee revised this gist Feb 7, 2024. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion normcore-llm.md
    @@ -141,11 +141,18 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Against LLM Maximalism](https://explosion.ai/blog/against-llm-maximalism)
    + [A Guide to Inference and Performance](https://www.baseten.co/blog/llm-transformer-inference-guide/)
    + [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)


    ## LLM Inference and K-V Cache

    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)
    + [Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/)
    + [Which serving technology to use for LLMs?](https://pages.run.ai/hubfs/PDFs/Serving-Large-Language-Models-Run-ai-Benchmarking-Study.pdf)
    + [Speeding up the K-V cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)

    ## Prompt Engineering and RAG
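
    The K-V cache links gathered in this revision all turn on one idea, which can be sketched briefly (a minimal NumPy toy with assumed shapes, not any particular library's implementation): at each decoding step only the new token's key and value are computed, and attention runs over the cached history.

    ```python
    import numpy as np

    def attention(q, K, V):
        # Scaled dot-product attention for one query vector q (d,)
        # against all cached keys K (t, d) and values V (t, d).
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())  # numerically stable softmax
        w /= w.sum()
        return w @ V

    class KVCache:
        """Append-only per-head cache: keys/values for past tokens are
        stored once and reused, so each decode step costs O(t), not O(t^2)."""
        def __init__(self, d):
            self.K = np.empty((0, d))
            self.V = np.empty((0, d))

        def step(self, k, v, q):
            # Only the new token's k, v are computed by the model each step;
            # everything earlier comes straight from the cache.
            self.K = np.vstack([self.K, k])
            self.V = np.vstack([self.V, v])
            return attention(q, self.K, self.V)
    ```

    The cached result at step t matches attention recomputed over the full sequence; avoiding that recomputation is exactly what the posts above optimize.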

  3. @veekaybee revised this gist Dec 30, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -145,6 +145,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)
    + [How to Make LLMs go Fast](https://vgel.me/posts/faster-inference/)

    ## Prompt Engineering and RAG

  4. @veekaybee revised this gist Dec 30, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -27,6 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
    + [Tokenization](https://github.com/SumanthRH/tokenization)
    + [LLM Course](https://github.com/mlabonne/llm-course)

    ## Foundational Deep Learning Papers (in semi-chronological order)

  5. @veekaybee revised this gist Dec 27, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -26,6 +26,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)
    + [Tokenization](https://github.com/SumanthRH/tokenization)

    ## Foundational Deep Learning Papers (in semi-chronological order)

  6. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 3 deletions.
    4 changes: 1 addition & 3 deletions normcore-llm.md
    @@ -27,9 +27,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

    ## Foundational Deep Learning Papers

    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
    ## Foundational Deep Learning Papers (in semi-chronological order)

    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
  7. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion normcore-llm.md
    @@ -16,7 +16,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)

    ### Building Blocks
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">

    + [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
    + [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)
  8. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -4,7 +4,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Concepts

    <img width="400" alt="Screenshot 2023-12-18 at 10 38 06 PM" src="https://gist.github.com/assets/3837836/b3385ca6-f833-4b69-ad92-f9d9f89b6be8">
    <img width="400" alt="Screenshot 2023-12-18 at 10 40 27 PM" src="https://gist.github.com/assets/3837836/4c30ad72-76ee-4939-a5fb-16b570d38cf2">

    ### Pre-Transformer Models
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">
  9. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions normcore-llm.md
    @@ -4,6 +4,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Concepts

    <img width="400" alt="Screenshot 2023-12-18 at 10 38 06 PM" src="https://gist.github.com/assets/3837836/b3385ca6-f833-4b69-ad92-f9d9f89b6be8">

    ### Pre-Transformer Models
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

  10. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions normcore-llm.md
    @@ -97,6 +97,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

    ## RLHF and DPO
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)
    @@ -107,8 +108,6 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Fine-Tuning and Compression

    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)
  11. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion normcore-llm.md
    @@ -106,7 +106,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="[https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1](https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b)">

    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
  12. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -106,6 +106,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression
    <img width="500" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="[https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1](https://gist.github.com/assets/3837836/1a5cf5af-fd6b-4d11-b3ed-649a4c841f2b)">

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
  13. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -37,7 +37,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [InstructGPT: Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

    ## The Transformer Architecture
    @@ -150,6 +150,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
    + [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
    + [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)
    + [Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)

    ## GPUs

    @@ -173,6 +174,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ### Eval Frameworks
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)
    + [LmSys Chatbot Arena](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)

    ## UX

  14. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -33,7 +33,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
    + [BERT](https://arxiv.org/abs/1810.04805)
    + [GPT-1](https://mistral.ai/news/mixtral-of-experts/)
    + [GPT-1](https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf)
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
  15. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions normcore-llm.md
    @@ -163,17 +163,17 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Evaluation

    ### Frameworks:
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    + [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
    + [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
    + [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)
    + [Challenges in Evaluating AI Systems](https://www.anthropic.com/index/evaluating-ai-systems)
    + [LLM Evaluation Papers](https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers)
    + [Evaluating LLMs is a MineField](https://www.cs.princeton.edu/~arvindn/talks/evaluating_llms_minefield/)

    ### Eval Frameworks
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    ## UX

    + [Generative Interfaces Beyond Chat (YouTube)](https://www.youtube.com/watch?v=rd-J3hmycQs)
  16. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 17 additions and 4 deletions.
    21 changes: 17 additions & 4 deletions normcore-llm.md
    @@ -30,13 +30,15 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">

    + [BERT](https://arxiv.org/abs/1810.04805)
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)
    + [Attention is all you Need](https://arxiv.org/abs/1706.03762)
    + [BERT](https://arxiv.org/abs/1810.04805)
    + [GPT-1](https://mistral.ai/news/mixtral-of-experts/)
    + [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
    + [Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [T5](https://jmlr.org/papers/v21/20-074.html)
    + [GPT-2: Language Models are Unsupervised Multi-Task Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    + [Training Language Models to Follow Instructions](https://arxiv.org/abs/2203.02155)
    + [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
    + [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

    ## The Transformer Architecture

    @@ -60,9 +62,16 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
    + [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)

    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ## Significant OSS Models

    + [Llama2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/?ref=blog.oxen.ai)
    + [Mistral7B](https://arxiv.org/abs/2310.06825)
    + [Mixtral](https://mistral.ai/news/mixtral-of-experts/)
    + [Phi2](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
    + [Falcon7B](https://huggingface.co/blog/falcon)

    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
    @@ -154,6 +163,10 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Evaluation

    ### Frameworks:
    + [HELM](https://arxiv.org/pdf/2211.09110.pdf)
    + [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness)

    + [Evaluating ChatGPT](https://ehudreiter.com/2023/04/04/evaluating-chatgpt/)
    + [ChatGPT: Jack of All Trades, Master of None](https://github.com/CLARIN-PL/chatgpt-evaluation-01-2023)
    + [What's Going on with the Open LLM Leaderboard](https://huggingface.co/blog/evaluating-mmlu-leaderboard)
  17. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion normcore-llm.md
    @@ -90,6 +90,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ## RLHF and DPO

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Supervised Fine-tuning](https://huggingface.co/docs/trl/main/en/sft_trainer)
    + [How Abilities in LLMs Are Affected by SFT](https://arxiv.org/abs/2310.05492)
    + [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
    + [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)
    @@ -107,7 +109,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + [Adapters](https://arxiv.org/abs/2304.01933)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)


    # Small and Local LLMs

    @@ -131,6 +133,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

    ## Prompt Engineering and RAG

  18. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -132,10 +132,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)

    ## Prompt Engineering
    ## Prompt Engineering and RAG

    + [On Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
    + [Prompt Engineering Versus Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting)
    + [Building RAG-Based Applications for Production](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
    + [Full Fine-Tuning, PEFT, or RAG?](https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)

    ## GPUs

  19. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -66,6 +66,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">

    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
  20. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions normcore-llm.md
    @@ -64,10 +64,14 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ### LLMs in 2023

    <img width="600" alt="Screenshot 2023-12-18 at 10 07 57 PM" src="https://gist.github.com/assets/3837836/9fcc3f92-719b-4b2c-b4f1-9be506101eb1">
    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
    + [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)
    + [Timeline of Transformer Models](https://ai.v-gar.de/ml/transformer/timeline/)
    + [Large Language Model Evolutionary Tree](https://notes.kateva.org/2023/04/large-language-models-evolutionary-tree.html)

    ## Training Data

  21. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 5 additions and 2 deletions.
    7 changes: 5 additions & 2 deletions normcore-llm.md
    @@ -60,10 +60,12 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
    + [Karpathy's The State of GPT (YouTube)](https://www.youtube.com/watch?v=bZQun8Y4L2A)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)

    + [OpenAI Cookbook](https://cookbook.openai.com/)

    ### LLMs in 2023
    + [Catching up on the weird world of LLMS](https://simonwillison.net/2023/Aug/3/weird-world-of-llms)
    + [How open are open architectures?](https://opening-up-chatgpt.github.io/)
    + [Building an LLM from Scratch](https://github.com/rasbt/LLMs-from-scratch)
    + [Large Language Models in 2023](https://www.youtube.com/watch?v=dbo3kNKPaUA&feature=youtu.be) and [Slides](https://docs.google.com/presentation/d/1636wKStYdT_yRPbJNrf8MLKpQghuWGDmyHinHhAKeXY/edit#slide=id.g2885e521b53_0_0)

    @@ -91,8 +93,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627)
    + Quantiztion
    + A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
    + [PEFT](https://github.com/huggingface/peft)
  22. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions normcore-llm.md
    @@ -24,6 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)
    + [The Scaling Hypothesis](https://gwern.net/scaling-hypothesis)

    ## Foundational Deep Learning Papers

  23. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions normcore-llm.md
    @@ -130,8 +130,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## GPUs


    <img width="865" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">
    <img width="600" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">

    + [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
    + [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)
  24. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions normcore-llm.md
    @@ -22,6 +22,8 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Language Modeling is Compression](https://arxiv.org/abs/2309.10668)
    + [Vector Search - Long-Term Memory in AI](https://github.com/edoliberty/vector-search-class-notes)
    + [Eight things to know about large language models](https://arxiv.org/pdf/2304.00612.pdf)
    + [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    + [The Hardware Lottery](https://arxiv.org/abs/2009.06489)

    ## Foundational Deep Learning Papers

    @@ -128,6 +130,9 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## GPUs


    <img width="865" alt="Screenshot 2023-12-18 at 10 02 48 PM" src="https://gist.github.com/assets/3837836/655fedc2-dbc8-406a-a583-65b9a91d4ab9">

    + [The Best GPUS for Deep Learning 2023](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/)
    + [Making Deep Learning Go Brr from First Principles](https://horace.io/brrr_intro.html)
    + [Everything about Distributed Training and Efficient Finetuning](https://sumanthrh.com/post/distributed-and-efficient-finetuning/)
  25. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 16 additions and 5 deletions.
    21 changes: 16 additions & 5 deletions normcore-llm.md
    @@ -77,14 +77,25 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + Training [Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
    + [Opt-175B Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf)

    ## Fine-Tuning
    ## RLHF and DPO

    + [RLHF](https://huggingface.co/blog/rlhf)
    + [Instruction-tuning for LLMs: Survey](https://arxiv.org/abs/2308.10792)
    + [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
    + [RLHF and DPO Compared](https://medium.com/aimonks/rlhf-and-dpo-compared-user-feedback-methods-for-llm-optimization-44f4234ae689)

    ## Fine-Tuning and Compression

    + [The Complete Guide to LLM Fine-tuning](https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/)
    + [LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language](https://arxiv.org/pdf/2312.09993.pdf) - Really great overview of SOTA fine-tuning techniques
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + Quantization
    + [A Gentle Introduction to 8-bit matrix multiplication](https://huggingface.co/blog/hf-bitsandbytes-integration)
    + [Which Quantization Method is Right for You?](https://maartengrootendorst.substack.com/p/which-quantization-method-is-right)
    + [Survey of Quantization for Inference](https://arxiv.org/abs/2103.13630)
    + [PEFT](https://github.com/huggingface/peft)
    + [Fine-tuning with LoRA and QLoRA](https://lightning.ai/pages/community/lora-insights/)
    + [Adapters](https://arxiv.org/abs/2304.01933)
    + [Motivation for Parameter-Efficient Fine-tuning](https://www.reddit.com/r/MachineLearning/comments/186ck5k/d_what_is_the_motivation_for_parameterefficient/)
    + [Fine-tuning RedPajama on Slack Data](https://www.union.ai/blog-post/fine-tuning-insights-lessons-from-experimenting-with-redpajama-large-language-model-on-flyte-slack-data)

    # Small and Local LLMs
  26. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion normcore-llm.md
    @@ -11,6 +11,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
    + [Survey of LLMS](https://arxiv.org/abs/2303.18223)
    + [Deep Learning Systems](https://dlsyscourse.org/lectures/)
    + [Fundamental ML Reading List](https://github.com/RoundtableML/ML-Fundamentals-Reading-Lists)

    ### Building Blocks
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
    @@ -91,7 +92,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [How is LlamaCPP Possible?](https://finbarr.ca/how-is-llama-cpp-possible/)
    + [How to beat GPT-4 with a 13-B Model](https://lmsys.org/blog/2023-11-14-llm-decontaminator/)
    + [Efficient LLM Inference on CPUs](https://arxiv.org/abs/2311.00502v1)
    + [Tiny Language Models Come of Age](}https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/)
    + [Tiny Language Models Come of Age](https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/)
    + [Efficiency LLM Spectrum](https://github.com/tding1/Efficient-LLM-Survey)
    + [TinyML at MIT](https://efficientml.ai/)

    @@ -107,6 +108,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild](https://openreview.net/forum?id=Bl8u7ZRlbM)
    + [LLM Inference Performance Engineering: Best Practices](https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices)
    + [The State of Production LLMs in 2023](https://youtu.be/kMb4TmhTlbk?si=Tdbp-2BKGF5G_qk5)
    + [Machine Learning Engineering for successful training of large language models and multi-modal models.](https://github.com/stas00/ml-engineering)

    ## Prompt Engineering

  27. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)

    ## GPT
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="[https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)">
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67">

    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
  28. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion normcore-llm.md
    @@ -51,7 +51,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Keys, Queries, and Values](https://d2l.ai/chapter_attention-mechanisms-and-transformers/queries-keys-values.html)

    ## GPT
    ![](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="[https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165](https://camo.githubusercontent.com/85d00cf9bca67e33c2d1270b51ff1ac01853b26a8d6bb226b711f859d065b4a6/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f6f766572766965772e706e67)">

    + [What is ChatGPT doing and why does it work](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/)
    + [My own notes from a few months back.](https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698)
  29. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions normcore-llm.md
    @@ -13,7 +13,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    + [Deep Learning Systems](https://dlsyscourse.org/lectures/)

    ### Building Blocks
    <img width="405" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">
    <img width="300" alt="Screenshot 2023-12-18 at 8 33 35 PM" src="https://gist.github.com/assets/3837836/a92cc13c-105f-4e9a-9b68-dc9b6d4a6e47">

    + [What are embeddings](https://vickiboykis.com/what_are_embeddings/)
    + [Concepts from Operating Systems that Found their way into LLMS](https://muhtasham.github.io/blog/posts/os-concepts-llm/)
    @@ -36,7 +36,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## The Transformer Architecture

    <img width="506" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165">
    <img width="300" alt="Screenshot 2023-12-18 at 8 37 44 PM" src="https://gist.github.com/assets/3837836/5ada409d-32cf-496e-9572-cb985ec97165">

    + [Transformers from Scratch](https://e2eml.school/transformers.html)
    + [Transformer Math](https://blog.eleuther.ai/transformer-math/)
  30. @veekaybee revised this gist Dec 19, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions normcore-llm.md
    @@ -5,7 +5,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N
    ## Foundational Concepts

    ### Pre-Transformer Models
    <img width="858" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">
    <img width="500" alt="Screenshot 2023-12-18 at 8 25 42 PM" src="https://gist.github.com/assets/3837836/20d3c630-62b1-4717-84d7-e2f24e3f25c7">

    + [The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)](https://www.youtube.com/watch?v=ISPId9Lhc1g)
    + [Transformers as Support Vector Machines](https://arxiv.org/abs/2308.16898v1)
    @@ -24,7 +24,7 @@ Goals: Add links that are reasonable and good explanations of how stuff works. N

    ## Foundational Deep Learning Papers

    <img width="730" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">
    <img width="500" alt="Screenshot 2023-12-18 at 8 35 18 PM" src="https://gist.github.com/assets/3837836/14d51fdf-1ad5-4807-8ad5-d3c81d16fef4">

    + [BERT](https://arxiv.org/abs/1810.04805)
    + [Seq2Seq](https://arxiv.org/abs/1409.3215v3)