Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@ChristopherA
ChristopherA / brew-bundle-brewfile-tips.md
Last active October 22, 2025 20:29
Brew Bundle Brewfile Tips

Copyright & License

Unless otherwise noted (either in this file or in a file's copyright section), the contents of this gist are Copyright ©️2020 by Christopher Allen, and are shared under the spdx:Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0) open-source license.

Sponsor

If you want more tips and advice like these, you can become a monthly patron on my GitHub Sponsor Page for as little as $5 a month, and your contributions will be multiplied, as GitHub is matching the first $5,000! This gist is all about Homebrew, so if you like it you can support it by donating to them or becoming one of their GitHub Sponsors.

@akashpalrecha
akashpalrecha / an-inquiry-into-matplotlib-figures.ipynb
Last active December 27, 2024 14:38
An Inquiry into Matplotlib's Figures, Axes, subplots and the very amazing GridSpec!
@thomwolf
thomwolf / gpt-2-wikitext-103.py
Last active October 25, 2025 13:45
A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103
# Copyright (c) 2019-present, Thomas Wolf.
# All rights reserved. This source code is licensed under the MIT-style license.
""" A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103 """
import os
from collections import namedtuple
from tqdm import tqdm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from ignite.engine import Engine, Events
@thomwolf
thomwolf / top-k-top-p.py
Last active October 25, 2025 20:25
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k >0: keep only top k tokens with highest probability (top-k filtering).
            top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
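The preview above stops before the filtering logic itself. As a rough illustration of what the docstring describes, here is a minimal pure-Python sketch that works on a plain list of floats instead of a tensor. The function name `top_k_top_p_filter_list` and the list-based approach are assumptions made for this example, not the gist's actual code.

```python
import math


def top_k_top_p_filter_list(logits, top_k=0, top_p=0.0, filter_value=-math.inf):
    """Hypothetical list-based sketch of top-k and nucleus (top-p) filtering.

    Returns a new list in which filtered-out logits are replaced by filter_value.
    """
    filtered = list(logits)
    n = len(filtered)
    top_k = min(top_k, n)  # same safety check as in the preview above

    if top_k > 0:
        # The k-th largest logit is the cutoff; anything strictly below it is removed.
        # Ties with the cutoff value are kept, so more than k entries can survive.
        kth_largest = sorted(filtered, reverse=True)[top_k - 1]
        filtered = [x if x >= kth_largest else filter_value for x in filtered]

    if top_p > 0.0:
        # Numerically stable softmax over the (possibly already filtered) logits.
        max_logit = max(filtered)
        exps = [math.exp(x - max_logit) for x in filtered]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Walk tokens from most to least probable and keep the smallest prefix
        # whose cumulative probability reaches top_p.
        order = sorted(range(n), key=lambda i: probs[i], reverse=True)
        keep = set()
        cumulative = 0.0
        for i in order:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        filtered = [x if i in keep else filter_value for i, x in enumerate(filtered)]

    return filtered
```

To sample a token afterwards, one would apply a softmax to the filtered logits and draw from the resulting distribution; entries set to negative infinity get probability zero.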
@MarvinT
MarvinT / black_code_prettify.json
Last active January 16, 2023 15:41
json you can paste into jupyter notebook's code prettify configuration that makes it use black to reformat your code instead of yapf.
{
  "python": {
    "library": "import json\ndef black_reformat(cell_text):\n    import black\n    import re\n    cell_text = re.sub('^%', '#%#', cell_text, flags=re.M)\n    try:\n        reformated_text = black.format_str(cell_text, 88)\n    except TypeError:\n        reformated_text = black.format_str(cell_text, mode=black.FileMode(line_length=88))\n    return re.sub('^#%#', '%', reformated_text, flags=re.M)",
    "prefix": "print(json.dumps(black_reformat(u",
    "postfix": ")))"
  },
  "r": {
    "library": "library(formatR)\nlibrary(jsonlite)",
    "prefix": "cat(toJSON(paste(tidy_source(text=",
    "postfix": ", output=FALSE)[['text.tidy']], collapse='\n')))"
@chirag1992m
chirag1992m / weight_transfer.py
Created December 1, 2017 21:36
weight_transfer
import numpy as np
import torch
import keras
def pyt_to_keras(pytorch_model, keras_model):
    """
    Given a PyTorch model, this method transfers the weights to
    a Keras model (with TensorFlow backend) with the same architecture.
    Assumptions:
    1. The corresponding layer names in both models will be the same
@mommi84
mommi84 / awesome-kge.md
Last active April 14, 2025 11:27
Awesome Knowledge Graph Embedding Approaches

This list contains repositories of libraries and approaches for knowledge graph embeddings, which are vector representations of entities and relations in a multi-relational directed labelled graph. Licensed under CC0.

Libraries

@naotokui
naotokui / audio_lstm_keras.ipynb
Last active July 15, 2025 08:25
Audio generation with LSTM in keras
@nvnhat95
nvnhat95 / basedline
Last active December 12, 2016 16:17
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <iostream>
#include <string>
#include <cmath>
using namespace std;
// display video from array of frames, using left and right button.
void displayVideo(cv::Mat* frames, cv::Point2i* centralPoints, int numFrame, int FPS, string windowName) {