Goals: Add links to reasonable, clear explanations of how things work. Avoid hype and, where possible, vendor content. Practical first-hand accounts of models in production are eagerly sought.
 
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},
ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?
I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.
Lecture 1: Introduction to Research — [📝Lecture Notebooks]
Lecture 2: Introduction to Python — [📝Lecture Notebooks]
Lecture 3: Introduction to NumPy — [📝Lecture Notebooks]
Lecture 4: Introduction to pandas — [📝Lecture Notebooks]
Lecture 5: Plotting Data — [📝Lecture Notebooks]
import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.*
/**
 * Implementation for Delimited Continuations `shift`/`reset` primitives via Kotlin Coroutines.
 * See [https://en.wikipedia.org/wiki/Delimited_continuation].
 *
 * The following LISP code:
 *
 * ```
rem see https://github.com/coreybutler/nvm-windows/issues/300
@echo off
SETLOCAL EnableDelayedExpansion
if [%1] == [] (
echo Pass in the version you would like to install, or "latest" to install the latest npm version.
) else (
set wanted_version=%1
Delimited continuations manipulate the control flow of programs. Like control structures such as conditionals and loops, they allow a program to deviate from a strictly sequential flow of control.
We use exception handling as another example of control flow manipulation and later show how to implement it using delimited continuations. Finally, we show that nondeterminism can also be expressed using delimited continuations.
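To make the two claims above concrete, here is a minimal Python sketch of `shift`/`reset` written in continuation-passing style. It is not the implementation this text goes on to present. Every helper name in it (`unit`, `bind`, `run`, `choose`, `throw`, `try_catch`, `safe_div`, `all_sums`, `divide_or_default`) is hypothetical and chosen only for illustration. It assumes a computation is represented as a function that takes a continuation `k` and returns the final answer.

```python
# A CPS sketch of delimited continuations. All names here are illustrative
# assumptions, not taken from the original text or from any library.

def unit(x):
    """Lift a plain value into a computation."""
    return lambda k: k(x)

def bind(m, f):
    """Sequence m, then feed its value to f (which returns a computation)."""
    return lambda k: m(lambda v: f(v)(k))

def run(m):
    """Run a computation with the identity continuation."""
    return m(lambda v: v)

def reset(m):
    """Delimit m: continuations captured inside m stop at this boundary."""
    return lambda k: k(run(m))

def shift(f):
    """Capture the continuation up to the nearest reset and pass it to f."""
    return lambda k: run(f(k))

# Nondeterminism: resume the captured continuation once per option and
# concatenate the resulting answer lists.
def choose(options):
    return shift(lambda k: unit([r for x in options for r in k(x)]))

def all_sums():
    body = bind(choose([1, 2]), lambda a:
           bind(choose([10, 20]), lambda b:
           unit([a + b])))
    return run(reset(body))

# Exceptions: throwing discards the captured continuation entirely.
def throw(err):
    return shift(lambda _k: unit(("error", err)))

def try_catch(body, handler):
    tagged = reset(bind(body, lambda v: unit(("ok", v))))
    def after(result):
        tag, payload = result
        return unit(payload) if tag == "ok" else handler(payload)
    return bind(tagged, after)

def safe_div(a, b):
    return throw("division by zero") if b == 0 else unit(a / b)

def divide_or_default(a, b, default):
    body = bind(safe_div(a, b), lambda q: unit(q + 1))
    return run(try_catch(body, lambda _err: unit(default)))

if __name__ == "__main__":
    print(all_sums())                     # [11, 21, 12, 22]
    print(divide_or_default(10, 2, -1))   # 6.0
    print(divide_or_default(10, 0, -1))   # -1
```

In this sketch, `choose` calls the captured continuation several times, while `throw` never calls it. The same two primitives therefore give both backtracking-style nondeterminism and early exit, depending only on how the continuation is used.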