# ChatGPT Resources

## Public Info

+ [Blog Post](https://openai.com/blog/chatgpt/)

> ChatGPT is a sibling model to **InstructGPT**, which is trained to follow an instruction in a prompt and provide a detailed response. We trained this model using **Reinforcement Learning from Human Feedback (RLHF)**, using the same methods as InstructGPT, but with slight differences in the data collection setup. ChatGPT is fine-tuned from a model in the **GPT-3.5** series

+ Note that `text-davinci-003` is an improvement on `text-davinci-002`

## Business Context

+ [OpenAI in 2019](https://vicki.substack.com/p/i-spent-1-billion-and-all-i-got-was)

## High-Level Overview

+ [ChatGPT Jailbreak?](https://www.youtube.com/watch?v=0A8ljAkdFtg)

## Training Data

The model was trained on:

+ Books1
+ Books2
+ [Common Crawl](https://en.wikipedia.org/wiki/Common_Crawl)
+ WebText2

More on the training data:

+ [My Twitter Thread Question on Training Data](https://twitter.com/vboykis/status/1290030614410702848)
+ [Books1 and Books2](https://twitter.com/theshawwn/status/1320282151645073408)
  - [Books1 Resources](https://github.com/soskek/bookcorpus/issues/27#issuecomment-716104208)
+ [BookCorpus paper](https://arxiv.org/abs/2105.05241)
+ [What's in My AI? paper](https://lifearchitect.ai/whats-in-my-ai-paper/), [source](https://twitter.com/kdamica/status/1600328844753240065)
+ The training data is current only through 2021, and the model does offline inference (no live access to the web)

## Models

+ My Twitter Thread Question [on the Model](https://twitter.com/vboykis/status/1600307649496522753)
+ [Language Models are Few-Shot Learners (GPT-3)](https://arxiv.org/abs/2005.14165)
+ [trl: reinforcement learning for language models](https://github.com/lvwerra/trl)
+ [Illustrating Reinforcement Learning from Human Feedback (RLHF) tutorial](https://huggingface.co/blog/rlhf)
+ ChatGPT is actually three models in a trench coat: a pretrained language model, a reward model trained on human preference rankings, and the language model fine-tuned against that reward model with RL (a rough `trl` sketch is at the end of these notes)
+ [InstructGPT Blog Post](https://openai.com/blog/instruction-following/)
+ [InstructGPT Model Card](https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md)
+ [Models Referred to as GPT 3.5](https://beta.openai.com/docs/model-index-for-researchers)
+ [OpenAI comes clean about GPT 3.5](https://jmcdonnell.substack.com/p/openai-comes-clean-about-gpt-35)
+ [Possibly davinci-003](https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse)

## Model Evaluation

## Market

+ [OpenAI models as a service](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)

## Infra

+ Azure
+ [K8s](https://openai.com/blog/scaling-kubernetes-to-7500-nodes/) ([source here](https://jawns.club/@april@sigmoid.social/109480732828653579)); a minimal sketch of the MPI pattern described here follows the quotes

> A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.

> We have very little HTTPS traffic, with no need for A/B testing, blue/green, or canaries. Pods communicate directly with one another on their pod IP addresses with MPI via SSH, not service endpoints. Service “discovery” is limited; we just do a one-time lookup for which pods are participating in MPI at job startup time.
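The second quote is worth seeing concretely: every rank (pod) discovers its peers once at startup, then exchanges data rank-to-rank with no service endpoints involved. Here is a minimal sketch of that pattern, assuming `mpi4py` and a standard MPI launcher; OpenAI's actual job code is not public, so the details here are mine.

```python
# Minimal sketch of the pattern in the quote above: one MPI rank per pod,
# one-time peer discovery at startup, then direct rank-to-rank communication.
# Launch with something like: mpirun -n 4 python job.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()       # this pod's position in the job
world = comm.Get_size()      # total number of participating pods

# One-time "service discovery": every rank learns who else is in the job.
peers = comm.allgather(MPI.Get_processor_name())
if rank == 0:
    print(f"{world} pods in this job: {peers}")

# After that, traffic is direct between ranks, e.g. averaging a "gradient"
# (a scalar here for illustration) across all pods.
local_grad = float(rank)
avg_grad = comm.allreduce(local_grad, op=MPI.SUM) / world
```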
+ [Terraform, Python, Chef, GPU workloads on 500+ node clusters](https://boards.greenhouse.io/openai/jobs/4315830004)

## Use-Cases

+ Code completion
+ Semantic search (a toy embedding-based sketch is at the end of these notes)

## My Attempts

*(Screenshots of my ChatGPT sessions from December 8–10, 2022.)*
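## Code Sketches

Back to the "three models in a trench coat" point under Models: the third piece, RL fine-tuning against a reward model, is what the linked `trl` repo implements. The sketch below is condensed from the `trl` README as of late 2022 (the API has likely changed since), with a hard-coded reward standing in for a real reward model.

```python
# RLHF step 3, condensed from the late-2022 trl README: fine-tune a language
# model with PPO against a reward signal. The constant reward below stands in
# for a reward model trained on human preference rankings.
import torch
from transformers import GPT2Tokenizer
from trl.gpt2 import GPT2HeadWithValueModel, respond_to_batch
from trl.ppo import PPOTrainer

# Policy (with a value head for PPO) plus a frozen reference copy; a KL
# penalty against the reference keeps the policy from drifting too far.
model = GPT2HeadWithValueModel.from_pretrained("gpt2")
ref_model = GPT2HeadWithValueModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

ppo_trainer = PPOTrainer(model, ref_model, tokenizer,
                         **{"batch_size": 1, "forward_batch_size": 1})

# One rollout: sample a response to a query...
query = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response = respond_to_batch(model, query)

# ...score it (a real setup queries the reward model here)...
reward = [torch.tensor(1.0)]

# ...and take one PPO optimization step on (query, response, reward).
stats = ppo_trainer.step(query, response, reward)
```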
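For the semantic-search use-case, the standard recipe is to embed documents and queries into one vector space and rank by similarity. A toy sketch, assuming the (2022-era) `openai` Python package with an `OPENAI_API_KEY` set and the `text-embedding-ada-002` model; the sample documents and the cosine-similarity choice are mine.

```python
# Toy semantic search: embed documents once, embed the query, rank by cosine
# similarity. Assumes `pip install openai numpy` and OPENAI_API_KEY in the env.
import numpy as np
import openai

docs = [
    "ChatGPT is fine-tuned from a GPT-3.5 model with RLHF.",
    "Common Crawl is a publicly available web-scrape corpus.",
    "OpenAI runs large training jobs on Kubernetes clusters in Azure.",
]

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return np.array(resp["data"][0]["embedding"])

doc_vecs = np.stack([embed(d) for d in docs])  # shape: (n_docs, dim)

def search(query: str) -> str:
    q = embed(query)
    # Cosine similarity between the query and every document vector.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return docs[int(np.argmax(sims))]

print(search("what infrastructure does OpenAI train on?"))
```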