@notpolomarco
Forked from kalomaze/pref_model.md
Created April 5, 2025 10:56
pref modeling overview

the generic basics of preference reward modeling

The Bradley-Terry model works like this:

  • It's based on a chosen/rejected split
  • The model is trained on binary judgements of specific content/samples as being either 'preferred' or 'dispreferred'
  • The log ratio between the preferred and dispreferred probabilities can be used as a natural reward signal
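As a rough illustration of the points above, here is a minimal sketch of the textbook Bradley-Terry probability and the log-ratio reward. The function names are made up for this example, and this is not the training code behind the model linked below.

```python
import math


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that item A is preferred over item B.

    Equivalent to exp(a) / (exp(a) + exp(b)), i.e. a sigmoid of the score gap.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def log_ratio_reward(p_preferred: float, eps: float = 1e-9) -> float:
    """Log ratio between the preferred and dispreferred probabilities.

    p_preferred is clamped away from 0 and 1 so the logs stay finite.
    """
    p = min(max(p_preferred, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)
```

Under this formulation the log ratio recovers the score gap: feeding `bt_preference_prob(a, b)` into `log_ratio_reward` gives back `a - b` (up to floating point error).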

what parts are new when it comes to what i am trying to do

For my experimental setup, I train the reward model on 64-token chunks, where the chunk being judged is the last 64 tokens of the sequence so far, and I evaluate each chunk with a sliding window. I then take the average of these judgements across the sequence as the reward for the whole longform generation.
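A minimal sketch of the chunk-and-average step, under some assumptions that are not stated above: the window advances by a full chunk (no overlap), a shorter trailing chunk is still scored, and `score_chunk` is a hypothetical callable standing in for the reward model.

```python
from typing import Callable, List, Sequence

CHUNK_SIZE = 64  # chunk length in tokens, as described above


def split_into_chunks(token_ids: Sequence[int], chunk_size: int = CHUNK_SIZE) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of chunk_size tokens.

    Assumption: the window moves by a full chunk, and a shorter final chunk is kept.
    """
    return [list(token_ids[i:i + chunk_size]) for i in range(0, len(token_ids), chunk_size)]


def sequence_reward(
    token_ids: Sequence[int],
    score_chunk: Callable[[List[int], List[int]], float],
    chunk_size: int = CHUNK_SIZE,
) -> float:
    """Average the per-chunk scores over the whole sequence.

    score_chunk(context, chunk) is a hypothetical scorer: it receives every
    token before the chunk as context, plus the chunk itself.
    """
    if len(token_ids) == 0:
        raise ValueError("cannot score an empty sequence")
    scores = []
    for start in range(0, len(token_ids), chunk_size):
        context = list(token_ids[:start])
        chunk = list(token_ids[start:start + chunk_size])
        scores.append(score_chunk(context, chunk))
    return sum(scores) / len(scores)
```

Because every chunk contributes one score of equal weight, a long generation and a short one end up on the same scale.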

In addition to this, I'm making synthetic preferred/dispreferred data via the Qwen2.5 7B base model at varying temperatures. For future revisions, I want to experiment with intentionally making the text worse in more diverse ways, such as translating to and from another language.

This creates a preference modeling baseline that is normalized across positions by default and, on average, always judges the same relative "volume" of information at a time.

The model expects input in this precise format:

[Original text from previous 64-token chunks]...

<<JUDGEMENT_REGION>>
[Next 64-token chunk to evaluate]
<</JUDGEMENT_REGION>>

<<JUDGEMENT>>letter

In my setup, the letter is A (chosen) or B (rejected).
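A minimal sketch of assembling that prompt for one chunk. The helper name is hypothetical, and the exact whitespace (blank lines around the tags, no literal ellipsis after the context) is my reading of the template above rather than something confirmed against the dataset.

```python
def build_judgement_prompt(context_text: str, chunk_text: str) -> str:
    """Build the judgement prompt for one chunk.

    The prompt stops right after <<JUDGEMENT>> so the model's next token
    is the letter (A or B). Newline placement is an assumption.
    """
    return (
        f"{context_text}\n\n"
        "<<JUDGEMENT_REGION>>\n"
        f"{chunk_text}\n"
        "<</JUDGEMENT_REGION>>\n\n"
        "<<JUDGEMENT>>"
    )
```

If you use this against the released model, it is worth comparing the output string with a few rows of the dataset linked below, since the format is described as precise.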

I use vLLM to evaluate the probability distribution over the A/B judgement for every chunk.
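A minimal sketch of turning next-token log-probabilities into a chosen probability for one chunk. It assumes you have already pulled the log-probabilities for the position after `<<JUDGEMENT>>` out of your inference engine into a plain dict keyed by token string; that extraction step is engine-specific and is not shown here. The function name is made up for this example.

```python
import math
from typing import Mapping


def prob_chosen(
    token_logprobs: Mapping[str, float],
    chosen: str = "A",
    rejected: str = "B",
) -> float:
    """Probability of the chosen letter, renormalized over just A and B.

    A letter missing from the dict is treated as having probability 0.
    """
    lp_a = token_logprobs.get(chosen, float("-inf"))
    lp_b = token_logprobs.get(rejected, float("-inf"))
    if lp_a == float("-inf") and lp_b == float("-inf"):
        raise ValueError("neither judgement letter is present in the logprobs")
    # Subtract the max before exponentiating for numerical stability.
    m = max(lp_a, lp_b)
    e_a = math.exp(lp_a - m)
    e_b = math.exp(lp_b - m)
    return e_a / (e_a + e_b)
```

Renormalizing over the two letters ignores whatever probability mass the model puts on other tokens, which keeps the per-chunk judgement a clean two-way comparison.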

links to the model and dataset

https://huggingface.co/Quest-AI/pretrain-rm-baseline-7b

https://huggingface.co/datasets/Quest-AI/quest-270k-chunked-64-judgement
