Kire Howard MatrixJockey


willccbb / grpo_demo.py

Last active October 25, 2025 16:39

GRPO Llama-1B

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},