ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. ChatGPT is fine-tuned from a model in the GPT-3.5 series.
text-davinci-003 is an improvement on text-davinci-002.

The model was trained on:
- Books1
- Books2
- WebText2
The model's training data is current as of 2021, and it does offline inference (no live internet access at runtime). Related reading:
- My Twitter Thread Question on the Model
- Language Models are Few-Shot Learners (GPT-3)
- Reinforcement learning for language models
- Illustrating Reinforcement Learning from Human Feedback tutorial
- ChatGPT is effectively three models in a trench coat: a language model, a reward model, and the language model fine-tuned against that reward model (see the sketch after this list)
- InstructGPT Blog Post
- InstructGPT Model Card
- Models referred to as GPT-3.5
- OpenAI comes clean about GPT-3.5
- Possibly text-davinci-003
- Azure
- K8s Source here
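
As a rough illustration of the "three models in a trench coat" framing above, here is a toy Python sketch of the three RLHF stages. Every class, name, and value in it is a made-up placeholder, not OpenAI's actual training code.

```python
# Toy sketch of the three RLHF stages; all names and values are placeholders.
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class ToyLanguageModel:
    """Stand-in for a pretrained GPT-style language model."""
    vocab: List[str] = field(default_factory=lambda: ["yes", "no", "maybe"])

    def generate(self, prompt: str) -> str:
        # A real model samples tokens autoregressively; here we just pick a word.
        return f"{prompt} {random.choice(self.vocab)}"

@dataclass
class ToyRewardModel:
    """Stand-in for a reward model fit on human preference comparisons."""
    preferred: str = "yes"

    def score(self, response: str) -> float:
        # A real reward model outputs a learned scalar; this one is hard-coded.
        return 1.0 if self.preferred in response else 0.0

def rlhf_sketch(prompts: List[str]) -> None:
    policy = ToyLanguageModel()      # stage 1: supervised fine-tuning (omitted here)
    reward_model = ToyRewardModel()  # stage 2: reward model from human rankings (omitted here)
    # Stage 3: optimize the policy against the reward model (real systems use PPO;
    # here we only show responses being scored).
    for prompt in prompts:
        response = policy.generate(prompt)
        print(response, "-> reward", reward_model.score(response))

if __name__ == "__main__":
    rlhf_sketch(["Is ChatGPT three models in a trench coat?"])
```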
A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.
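
To make the "single pod occupies the entire node" point concrete, here is a hedged sketch using the kubernetes Python client; the pod name, image, labels, and per-node GPU count are my own assumptions, not OpenAI's real manifests.

```python
# Sketch: a pod that requests every GPU on a node, so NVLink and GPUDirect
# traffic stays inside one workload. Names, image, and the GPU count (8)
# are illustrative assumptions.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-worker-0", labels={"job": "big-train"}),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="example.com/trainer:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "8"}   # claim all GPUs on the node
                ),
            )
        ],
    ),
)

# Render the familiar manifest shape; no cluster is needed for this step.
print(client.ApiClient().sanitize_for_serialization(pod))
```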
We have very little HTTPS traffic, with no need for A/B testing, blue/green, or canaries. Pods communicate directly with one another on their pod IP addresses with MPI via SSH, not service endpoints. Service “discovery” is limited; we just do a one-time lookup for which pods are participating in MPI at job startup time.
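
A minimal sketch of that one-time lookup, assuming the kubernetes Python client and a made-up namespace and label selector; mpirun over SSH would then use the resulting hostfile.

```python
# Sketch of one-time MPI "discovery" at job startup: list the job's pods and
# write their pod IPs into an mpirun hostfile. Namespace, label selector, and
# slots-per-host are illustrative assumptions.
from kubernetes import client, config

def write_mpi_hostfile(namespace: str = "training",
                       selector: str = "job=big-train",
                       slots_per_host: int = 8,
                       path: str = "hostfile") -> None:
    config.load_incluster_config()  # assumes this runs inside the cluster
    pods = client.CoreV1Api().list_namespaced_pod(namespace, label_selector=selector)
    with open(path, "w") as f:
        for pod in pods.items:
            if pod.status.pod_ip:   # skip pods that do not have an IP yet
                f.write(f"{pod.status.pod_ip} slots={slots_per_host}\n")

# `mpirun --hostfile hostfile ...` then SSHes straight to these pod IPs;
# no Kubernetes Service or further discovery is involved.
```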
- Code completion
- Semantic search (see the sketch below)
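
For the "Semantic search" item above, here is a hedged sketch of embedding-based search. It assumes the openai Python package (v1+) with an API key in the environment; the embedding model name and example documents are my own choices, not something stated in these notes.

```python
# Sketch of embedding-based semantic search: embed documents and a query,
# then rank documents by cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    # Assumed embedding model; any embedding model works the same way here.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

docs = [
    "Pods communicate directly over MPI via SSH.",
    "ChatGPT is fine-tuned from a GPT-3.5 model with RLHF.",
    "NVLink lets GPUs on one node talk to each other directly.",
]
doc_vecs = embed(docs)
query_vec = embed(["How was ChatGPT trained?"])[0]

# Cosine similarity between the query and each document.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best = int(np.argmax(scores))
print(docs[best], float(scores[best]))
```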










