ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. ChatGPT is fine-tuned from a model in the GPT-3.5 series.
text-davinci-003 is an improvement on text-davinci-002.

The model was trained on:
- Books1
- Books2
- WebText2
The model's training data is current as of 2021, and it does offline inference (no live internet access at runtime). Related reading:
- My Twitter Thread Question on the Model
- Language Models are Few-Shot Learners (GPT-3)
- Reinforcement learning for language models
- Illustrating Reinforcement Learning from Human Feedback tutorial
- ChatGPT is effectively three models in a trench coat: a language model, a reward model, and the language model fine-tuned against that reward model (see the sketch after this list)
- InstructGPT Blog Post
- InstructGPT Model Card
- Models referred to as GPT-3.5
- OpenAI comes clean about GPT-3.5
- Possibly text-davinci-003
- Azure
- K8s Source here
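
As a rough illustration of the "three models in a trench coat" framing above, here is a toy Python sketch of the three RLHF stages. Every class, name, and value in it is a made-up placeholder, not OpenAI's actual training code.

```python
# Toy sketch of the three RLHF stages; all names and values are placeholders.
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class ToyLanguageModel:
    """Stand-in for a pretrained GPT-style language model."""
    vocab: List[str] = field(default_factory=lambda: ["yes", "no", "maybe"])

    def generate(self, prompt: str) -> str:
        # A real model samples tokens autoregressively; here we just pick a word.
        return f"{prompt} {random.choice(self.vocab)}"

@dataclass
class ToyRewardModel:
    """Stand-in for a reward model fit on human preference comparisons."""
    preferred: str = "yes"

    def score(self, response: str) -> float:
        # A real reward model outputs a learned scalar; this one is hard-coded.
        return 1.0 if self.preferred in response else 0.0

def rlhf_sketch(prompts: List[str]) -> None:
    policy = ToyLanguageModel()      # stage 1: supervised fine-tuning (omitted here)
    reward_model = ToyRewardModel()  # stage 2: reward model from human rankings (omitted here)
    # Stage 3: optimize the policy against the reward model (real systems use PPO;
    # here we only show responses being scored).
    for prompt in prompts:
        response = policy.generate(prompt)
        print(response, "-> reward", reward_model.score(response))

if __name__ == "__main__":
    rlhf_sketch(["Is ChatGPT three models in a trench coat?"])
```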
A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.
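
To make the "single pod occupies the entire node" point concrete, here is a hedged sketch using the kubernetes Python client; the pod name, image, labels, and per-node GPU count are my own assumptions, not OpenAI's real manifests.

```python
# Sketch: a pod that requests every GPU on a node, so NVLink and GPUDirect
# traffic stays inside one workload. Names, image, and the GPU count (8)
# are illustrative assumptions.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-worker-0", labels={"job": "big-train"}),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="example.com/trainer:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "8"}   # claim all GPUs on the node
                ),
            )
        ],
    ),
)

# Render the familiar manifest shape; no cluster is needed for this step.
print(client.ApiClient().sanitize_for_serialization(pod))
```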
We have very little HTTPS traffic, with no need for A/B testing, blue/green, or canaries. Pods communicate directly with one another on their pod IP addresses with MPI via SSH, not service endpoints. Service “discovery” is limited; we just do a one-time lookup for which pods are participating in MPI at job startup time.
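
A minimal sketch of that one-time lookup, assuming the kubernetes Python client and a made-up namespace and label selector; mpirun over SSH would then use the resulting hostfile.

```python
# Sketch of one-time MPI "discovery" at job startup: list the job's pods and
# write their pod IPs into an mpirun hostfile. Namespace, label selector, and
# slots-per-host are illustrative assumptions.
from kubernetes import client, config

def write_mpi_hostfile(namespace: str = "training",
                       selector: str = "job=big-train",
                       slots_per_host: int = 8,
                       path: str = "hostfile") -> None:
    config.load_incluster_config()  # assumes this runs inside the cluster
    pods = client.CoreV1Api().list_namespaced_pod(namespace, label_selector=selector)
    with open(path, "w") as f:
        for pod in pods.items:
            if pod.status.pod_ip:   # skip pods that do not have an IP yet
                f.write(f"{pod.status.pod_ip} slots={slots_per_host}\n")

# `mpirun --hostfile hostfile ...` then SSHes straight to these pod IPs;
# no Kubernetes Service or further discovery is involved.
```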
- Code completion
- Semantic search (see the sketch below)
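
For the "Semantic search" item above, here is a hedged sketch of embedding-based search. It assumes the openai Python package (v1+) with an API key in the environment; the embedding model name and example documents are my own choices, not something stated in these notes.

```python
# Sketch of embedding-based semantic search: embed documents and a query,
# then rank documents by cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    # Assumed embedding model; any embedding model works the same way here.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

docs = [
    "Pods communicate directly over MPI via SSH.",
    "ChatGPT is fine-tuned from a GPT-3.5 model with RLHF.",
    "NVLink lets GPUs on one node talk to each other directly.",
]
doc_vecs = embed(docs)
query_vec = embed(["How was ChatGPT trained?"])[0]

# Cosine similarity between the query and each document.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best = int(np.argmax(scores))
print(docs[best], float(scores[best]))
```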










