rasdani’s gists

rasdani / agent loop

Created March 10, 2025 11:00 — forked from jlia0/agent loop

Manus tools and prompts

	You are Manus, an AI agent created by the Manus team.

	You excel at the following tasks:
	1. Information gathering, fact-checking, and documentation
	2. Data processing, analysis, and visualization
	3. Writing multi-chapter articles and in-depth research reports
	4. Creating websites, applications, and tools
	5. Using programming to solve various problems beyond development
	6. Various tasks that can be accomplished using computers and the internet

rasdani / gist:77339643c8c4151e4a947e77dcb1ad29

Created March 2, 2025 16:56 — forked from kalomaze/gist:37c70e022cb1e9428ebb1ee7a4b52275

GRPO Reinforcement Learning - 7b GSM8k on 8xH100 / 8xA100

	# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
	# this is a generic set of "install from scratch" commands complete with a deepspeed z3 config that i have been using when i spin up nodes
	# it will run on the gsm8k example w/ default batch size & generation size (8), and the 8th GPU is used for vllm generations
	# qwen 14b full finetuning will run on this configuration too without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
	# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)

	# NOTE FEB 27: examples have moved into `verifiers/examples` not `/examples`

	cd /root
	mkdir boom

rasdani / grpo_demo.py

Created February 25, 2025 22:52 — forked from willccbb/grpo_demo.py

GRPO Llama-1B

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	import re
	import torch
	from datasets import load_dataset, Dataset
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import LoraConfig
	from trl import GRPOConfig, GRPOTrainer

rasdani / train.py

Created February 25, 2025 22:52 — forked from abacaj/train.py

extending GRPOTrainer to run gsm8k eval during training

	import tqdm
	import numpy as np
	import torch
	import torch.distributed as dist
	import transformers

	def extract_xml_answer(text: str) -> str:
	    answer = text.split("<final_answer>")[-1]
	    answer = answer.split("</final_answer>")[0]
	    return answer.strip()
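
A minimal sketch of how the `extract_xml_answer` helper previewed above behaves, assuming completions wrap the result in `<final_answer>` tags. The sample completion strings are made up for illustration and do not come from the gist.

```python
def extract_xml_answer(text: str) -> str:
    """Return the text after the last <final_answer> tag, up to its closing tag."""
    answer = text.split("<final_answer>")[-1]
    answer = answer.split("</final_answer>")[0]
    return answer.strip()


# Hypothetical model completion (illustrative only).
completion = "<reasoning>3 + 4 = 7</reasoning>\n<final_answer>\n7\n</final_answer>"
extracted = extract_xml_answer(completion)

# When no tags are present, both splits return the whole string,
# so the helper falls back to the stripped input.
untagged = extract_xml_answer("  no tags here  ")

print(repr(extracted), repr(untagged))
```

Because the helper falls back to the full text when the tags are missing, an eval loop built on it may want to check for the tags separately before counting an answer as well-formed.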

rasdani / NVIDIA Docker oneliner

Created February 21, 2025 20:05

docker run --gpus all --runtime=nvidia -it --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/workspace -v $HOME/.cache/huggingface:/root/.cache/huggingface nvcr.io/nvidia/pytorch:24.07-py3 bash

rasdani / anon_git.sh

Last active July 27, 2023 12:28

Set your local user name and email to whatever is set on the remote repository.

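	# Read the author name and email from the most recent commit only;
	# without -1, git log prints one line per commit in the history.
	user_name=$(git log -1 --pretty=format:"%an")
	user_email=$(git log -1 --pretty=format:"%ae")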

	git config user.name "$user_name"
	git config user.email "$user_email"