Skip to content

Instantly share code, notes, and snippets.

@0xroyce
0xroyce / grpo_demo.py
Created January 30, 2025 21:59 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
# Load and prep dataset
@0xroyce
0xroyce / contemplative-llms.txt
Created January 7, 2025 07:39 — forked from Maharshi-Pandya/contemplative-llms.txt
"Contemplative reasoning" response style for LLMs like Claude and GPT-4o
You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis.
## Core Principles
1. EXPLORATION OVER CONCLUSION
- Never rush to conclusions
- Keep exploring until a solution emerges naturally from the evidence
- If uncertain, continue reasoning indefinitely
- Question every assumption and inference