
@qunash
qunash / grpo_qwen-0-5b_single_t4.ipynb
Last active October 15, 2025 03:21
@ChrisHayduk
ChrisHayduk / merge_qlora_with_quantized_model.py
Last active September 27, 2025 08:22
Merging QLoRA weights with quantized model
"""
The code below combines approaches published by both @eugene-yh and @jinyongyoo on GitHub.
Thanks for the contributions, guys!
"""
import torch
import peft
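The preview stops at the imports; the merge itself folds the low-rank LoRA update back into the base weights. A minimal sketch of that step, assuming the adapter is simply reattached to an fp16 reload of the base model via peft's standard merge path (the gist's own approach may differ; the model id and adapter path below are placeholders):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholders; substitute your own base model and trained QLoRA adapter.
base_model_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "path/to/qlora-adapter"

# Reload the base in fp16 (not 4-bit), attach the adapter, then fold
# W' = W + (alpha/r) * B @ A into the dense weights and save.
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()
merged.save_pretrained("merged-model")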
@mjpost
mjpost / trim_fairseq_model.py
Created May 15, 2020 14:37
Removes the Adam optimizer state from fairseq models, greatly reducing their size
#!/usr/bin/env python3
"""
This script takes a trained fairseq model and discards the Adam optimizer state,
which is not needed at test time. This can reduce model size by ~70%.
Original author: Brian Thompson
"""
from fairseq import checkpoint_utils
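Only the import line survives in this preview. A minimal sketch of the trimming step, assuming the checkpoint is a torch-serialized dict whose optimizer state sits under the "last_optimizer_state" key (the key name is an assumption here, not taken from the gist):

#!/usr/bin/env python3
# Sketch: drop the Adam optimizer state from a fairseq checkpoint.
# Assumes a torch.save'd dict with the moments under "last_optimizer_state".
import sys
import torch

ckpt = torch.load(sys.argv[1], map_location="cpu")
ckpt.pop("last_optimizer_state", None)  # the Adam moments; unused at test time
torch.save(ckpt, sys.argv[2])

Usage: python trim_sketch.py checkpoint_best.pt checkpoint_trimmed.pt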
@yuchenlin
yuchenlin / masked_word_prediction_bert.py
Last active November 5, 2024 14:23
A simple example script for predicting masked words in a sentence using BERT.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import logging
logging.basicConfig(level=logging.INFO)  # OPTIONAL
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
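The preview cuts off right after loading the model; a short continuation sketch of the prediction step the description promises (the sentence below is an illustrative example, not necessarily the gist's):

# Mask one token, run BertForMaskedLM, and read off the top candidates.
text = "[CLS] I want to [MASK] the car because it is cheap . [SEP]"
tokens = tokenizer.tokenize(text)
masked_index = tokens.index('[MASK]')
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    logits = model(input_ids)[0]  # (1, seq_len, vocab_size)

top5 = torch.topk(logits[0, masked_index], k=5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top5))  # top-5 candidate fills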

CLI script for extracting plain text out of a raw Wikipedia dump. This is an xml.bz2 file provided by MediaWiki, named like wiki--pages-articles.xml.bz2 or wiki-latest-pages-articles.xml.bz2 (e.g. the ~14 GB https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2).

It streams through all the XML articles using multiple cores (#cores - 1 by default), decompressing on the fly and extracting plain-text sections from each article.

For each extracted article, it prints the title, the section names, and the plain-text section contents, one JSON object per line (JSON Lines format).

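A hypothetical consumer of that output, assuming each line is a JSON object with "title", "section_titles", and "section_texts" fields (the field names and the file name are assumptions based on the description above):

# Read the script's JSON-lines output back, one article per line.
import json

with open("articles.jsonl", encoding="utf-8") as f:
    for line in f:
        article = json.loads(line)
        print(article["title"], "-", len(article["section_titles"]), "sections")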
@erickrf
erickrf / tokenizer.py
Last active March 5, 2023 05:12
Portuguese tokenizer
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from nltk.tokenize import RegexpTokenizer
import argparse
import os
"""
Script for tokenizing Portuguese text according to the Universal Dependencies
(UD) tokenization standards. This script was not created by the UD team; it was
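The preview ends mid-docstring. As a rough illustration of the general approach (the pattern below is illustrative only, not the gist's actual rules, which follow the UD standards):

# Illustrative only: a basic nltk RegexpTokenizer that separates words
# from punctuation. The real script implements the fuller UD rules.
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r"\w+|[^\w\s]")
print(tokenizer.tokenize("Ele não sabia que horas eram."))
# -> ['Ele', 'não', 'sabia', 'que', 'horas', 'eram', '.']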