Everything I understand about ChatGPT

ChatGPT Resources

Public Info

Business Context

Training Data

Models

From OpenAI's ChatGPT announcement: "We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup."
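
As a rough, illustrative sketch of the reward-model half of that setup (not OpenAI's actual code; the network, tensor shapes, and data below are invented), InstructGPT-style RLHF first trains a reward model on human preference pairs with a pairwise ranking loss:

```python
# Minimal reward-model step from RLHF, for illustration only.
# The network, shapes, and data are made up, not OpenAI's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Maps a (prompt + response) embedding to a single scalar reward."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one reward per example

# Dummy embeddings standing in for tokenized (prompt, response) pairs.
chosen = torch.randn(8, 128)    # responses human labelers preferred
rejected = torch.randn(8, 128)  # responses labelers ranked lower

model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pairwise ranking loss: push r(chosen) above r(rejected),
# the objective family used for the InstructGPT reward model.
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
opt.step()
print(f"reward-model loss: {loss.item():.4f}")
```

The trained reward model then scores sampled completions, and the policy model is fine-tuned against those scores with PPO; that loop is omitted here.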

Infra

  • Azure

  • K8s (source here): A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.

    We have very little HTTPS traffic, with no need for A/B testing, blue/green, or canaries. Pods communicate directly with one another on their pod IP addresses with MPI via SSH, not service endpoints. Service “discovery” is limited; we just do a one-time lookup for which pods are participating in MPI at job startup time. A sketch of that startup-time lookup follows this list.

  • Terraform, Python, Chef, GPU workloads on 500+ node clusters
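
The "one-time lookup for which pods are participating in MPI" mentioned above might look roughly like the sketch below, using the official kubernetes Python client. The namespace, label selector, hostfile path, slots-per-node value, and job name are assumptions for illustration, not OpenAI's actual configuration.

```python
# Illustrative startup-time discovery of MPI participants by pod IP.
# Namespace, label selector, hostfile path, and job name are hypothetical.
from kubernetes import client, config

def write_mpi_hostfile(job_name: str, namespace: str = "training",
                       hostfile: str = "/etc/mpi/hostfile") -> None:
    config.load_incluster_config()  # assumes this runs inside the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(
        namespace,
        label_selector=f"job-name={job_name}",  # hypothetical label
    )
    # Pods talk to each other directly on pod IPs (MPI over SSH), not via
    # Service endpoints, so the only thing needed from the API server is
    # each participant's IP address.
    ips = [p.status.pod_ip for p in pods.items if p.status.pod_ip]
    with open(hostfile, "w") as f:
        for ip in ips:
            f.write(f"{ip} slots=8\n")  # 8 GPUs per node is an assumption

if __name__ == "__main__":
    write_mpi_hostfile("gpt-training-run")  # hypothetical job name
```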

My Attempts

[Six screenshots of ChatGPT attempts, December 8, 2022]
