Al Vinjamur alvinjamur

@alvinjamur
alvinjamur / grpo_demo.py
Created February 9, 2025 01:04 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
# Load and prep dataset
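The excerpt stops at the dataset step. Below is a minimal sketch of how these imports might be wired into a GRPO run; the dataset name, reward function, model id, and hyperparameters are illustrative assumptions, not part of the forked gist.

# Hypothetical continuation (assumptions, not the original gist): a toy
# length-based reward and a GRPOTrainer wired to it.
def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 100 characters.
    return [-abs(100 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # assumed prompt-only dataset

training_args = GRPOConfig(output_dir="grpo-llama-1b", logging_steps=10)
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed base model
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # optional LoRA adapter
)
trainer.train()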
@alvinjamur
alvinjamur / pg-pong.py
Created January 3, 2017 22:54 — forked from karpathy/pg-pong.py
Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import cPickle as pickle
import gym
# hyperparameters
H = 200 # number of hidden layer neurons
batch_size = 10 # every how many episodes to do a param update?
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward
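The preview cuts off at the hyperparameters. The heart of a policy-gradient Pong agent is discounting episode rewards backward in time with gamma; a short illustrative sketch of that step follows (the reset at each nonzero reward marks a Pong game boundary, as in the original gist).

# Illustrative sketch of the reward-discounting step (not the full agent).
# `r` is a 1-D float array of per-step rewards for a batch of episodes.
def discount_rewards(r, gamma=0.99):
    discounted = np.zeros_like(r, dtype=np.float64)
    running_add = 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running_add = 0.0  # reset the running sum at each game boundary
        running_add = running_add * gamma + r[t]
        discounted[t] = running_add
    return discounted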
@alvinjamur
alvinjamur / readme.md
Created November 20, 2016 00:16 — forked from flammit/readme.md
Building mxnet with CUDA 8 GPU support in R - Windows Instructions (Nov 19, 2016)


The goal here is to build mxnet on Windows (tested on Windows 10). The overall procedure is to compile mxnet to produce an up-to-date libmxnet.dll, then build and install the R package, which bundles all dependent libraries, including the newly built libmxnet.dll.

This guide was adapted from the primary setup documentation at http://mxnet.io/get_started/setup.html, which appears to be out of date.

Dependencies

  • Visual Studio 2013 (tested with Community Edition)
  • cmake - from here