@rasdani
rasdani / agent loop
Created March 10, 2025 11:00 — forked from jlia0/agent loop
Manus tools and prompts
You are Manus, an AI agent created by the Manus team.
You excel at the following tasks:
1. Information gathering, fact-checking, and documentation
2. Data processing, analysis, and visualization
3. Writing multi-chapter articles and in-depth research reports
4. Creating websites, applications, and tools
5. Using programming to solve various problems beyond development
6. Various tasks that can be accomplished using computers and the internet
rasdani / gist:77339643c8c4151e4a947e77dcb1ad29
Created March 2, 2025 16:56 — forked from kalomaze/gist:37c70e022cb1e9428ebb1ee7a4b52275
GRPO Reinforcement Learning - 7b GSM8k on 8xH100 / 8xA100
# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
# this is a generic set of "install from scratch" commands complete with a deepspeed z3 config that i have been using when i spin up nodes
# it will run on the gsm8k example w/ default batch size & generation size (8), and the 8th GPU is used for vllm generations
# qwen 14b full finetuning will run on this configuration too without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)
# NOTE FEB 27: examples have moved into `verifiers/examples` not `/examples`
cd /root
mkdir boom
rasdani / grpo_demo.py
Created February 25, 2025 22:52 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
rasdani / train.py
Created February 25, 2025 22:52 — forked from abacaj/train.py
extending GRPOTrainer to run gsm8k eval during training
import tqdm
import numpy as np
import torch
import torch.distributed as dist
import transformers
def extract_xml_answer(text: str) -> str:
    answer = text.split("<final_answer>")[-1]
    answer = answer.split("</final_answer>")[0]
    return answer.strip()
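A minimal sketch of how `extract_xml_answer` behaves on two inputs. The sample completions are hypothetical and not taken from the original gist. It shows that a tagged completion yields only the tag contents, while a completion with no tags is returned whole (stripped), which an eval loop may need to account for.

```python
def extract_xml_answer(text: str) -> str:
    """Return the text between the last <final_answer> tag and its closing tag."""
    answer = text.split("<final_answer>")[-1]
    answer = answer.split("</final_answer>")[0]
    return answer.strip()


# Hypothetical model completions, for illustration only.
tagged = "Reasoning...\n<final_answer>\n42\n</final_answer>"
untagged = "The answer is 42."

print(extract_xml_answer(tagged))    # 42
print(extract_xml_answer(untagged))  # The answer is 42.
```

Because the function falls back to the full text when the tags are missing, a malformed completion could still be compared against the gold answer. A stricter variant might return an empty string in that case.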
docker run --gpus all --runtime=nvidia -it --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/workspace -v $HOME/.cache/huggingface:/root/.cache/huggingface nvcr.io/nvidia/pytorch:24.07-py3 bash
rasdani / anon_git.sh
Last active July 27, 2023 12:28
Set your local user name and email to whatever is set on the remote repository.
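# -1 limits the output to the most recent commit; without it, git log prints one line per commit
user_name=$(git log -1 --pretty=format:"%an")
user_email=$(git log -1 --pretty=format:"%ae")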
git config user.name "$user_name"
git config user.email "$user_email"