0xroyce

TAU

e/acc

0xroyce / grpo_demo.py

Created January 30, 2025 21:59 — forked from willccbb/grpo_demo.py

GRPO Llama-1B

	# train_grpo.py
	import re
	import torch
	from datasets import load_dataset, Dataset
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import LoraConfig
	from trl import GRPOConfig, GRPOTrainer

	# Load and prep dataset

0xroyce / contemplative-llms.txt

Created January 7, 2025 07:39 — forked from Maharshi-Pandya/contemplative-llms.txt

"Contemplative reasoning" response style for LLMs like Claude and GPT-4o

	You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis.

	## Core Principles

	1. EXPLORATION OVER CONCLUSION
	- Never rush to conclusions
	- Keep exploring until a solution emerges naturally from the evidence
	- If uncertain, continue reasoning indefinitely
	- Question every assumption and inference