- skip sdpa (leave the scaled dot-product attention ops unquantized)
{
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "ACT_MAXABS_HW_WEIGHTS_PCS_MAXABS_POW2",
    "scale_format": "const",
    "allowlist": {
        "types": [],
        "names": []
    }
}
import ctypes
import torch
import time

def nvrtc_compile(source: str) -> str:
    from ctypes import CDLL, c_void_p, c_char_p, c_size_t, byref, create_string_buffer
    libnvrtc = CDLL('libnvrtc.so')

    def get_error_string(result: int) -> str:
        # nvrtcGetErrorString takes the nvrtcResult code and returns a const char*
        libnvrtc.nvrtcGetErrorString.restype = c_char_p
        return libnvrtc.nvrtcGetErrorString(result).decode()
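Below is a hedged sketch, not from the source, of the standard NVRTC call sequence such a nvrtc_compile body typically drives through the same ctypes handle; error handling is simplified.

# Hedged sketch (assumption, not the source's implementation): the standard
# NVRTC sequence -- create program, compile, fetch PTX -- driven via ctypes.
from ctypes import CDLL, c_void_p, c_char_p, c_size_t, byref, create_string_buffer

def compile_to_ptx(source: str, name: str = "kernel.cu") -> bytes:
    libnvrtc = CDLL("libnvrtc.so")
    prog = c_void_p()
    # nvrtcCreateProgram(&prog, src, name, numHeaders, headers, includeNames)
    libnvrtc.nvrtcCreateProgram(byref(prog), source.encode(), name.encode(), 0, None, None)
    libnvrtc.nvrtcCompileProgram(prog, 0, None)  # no extra compile options
    ptx_size = c_size_t()
    libnvrtc.nvrtcGetPTXSize(prog, byref(ptx_size))
    ptx = create_string_buffer(ptx_size.value)
    libnvrtc.nvrtcGetPTX(prog, ptx)
    libnvrtc.nvrtcDestroyProgram(byref(prog))
    return ptx.value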
Run 1:

Auto-configed device: cuda
WARNING:sglang.srt.server_args:Detected SM100 and MXFP4 quantization format for GPT-OSS model, enabling FlashInfer MXFP4 MOE kernel.
WARNING:sglang.srt.server_args:TensorRT-LLM MHA only supports page_size of 16, 32 or 64, changing page_size from None to 64.
[2025-09-06 08:26:09] server_args=ServerArgs(model_path='/home/yiliu7/models/openai/gpt-oss-120b', tokenizer_path='/home/yiliu7/models/openai/gpt-oss-120b', tokenizer_mode='auto', tokenizer_worker_num=1, skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=False, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='127.0.0.1', port=8400, skip_server_warmup=False, warmups=None, nccl_port=None, dtype='bfloat16', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', mem_fraction_static=0.93, max_running_requests=None, max_queued_requests=9223372036854775807, max_total_tokens=None, chunked_prefill_size=16384, max_p
"names": []| from triton.testing import do_bench | |
import torch
from test_packing import _create_random_e2m1_tensor, pack_fp4_to_uint8_old
from auto_round.export.export_to_autoround.qlinear_fp import FLOAT_TO_E2M1, pack_fp4_to_uint8
from dataclasses import dataclass
from typing import List, Dict
import json

@dataclass
class MoeOpInfo:
    # Input/output tensor counts recorded for a single MoE op.
    num_inputs: int = 0
    num_outputs: int = 0
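Given these imports, a hedged benchmarking sketch comparing the two packing paths; the helper's call signature is an assumption and may differ from test_packing's actual one.

# Hedged sketch: timing old vs. new FP4 packing with Triton's do_bench.
# The (shape, device) signature of the random-tensor helper is an assumption.
x = _create_random_e2m1_tensor((4096, 4096), device="cuda")
ms_old = do_bench(lambda: pack_fp4_to_uint8_old(x))
ms_new = do_bench(lambda: pack_fp4_to_uint8(x))
print(f"pack_fp4_to_uint8_old: {ms_old:.3f} ms, pack_fp4_to_uint8: {ms_new:.3f} ms")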
// INC
[
  {
    "0": {
      "logprob": 0.0,
      "rank": 1,
      "decoded_token": ""
    },
    "113689": {
      "logprob": -18.8125,
###############################################################################
# Copyright (C) 2024 Habana Labs, Ltd. an Intel Company
###############################################################################
import argparse
import json
import os
import sys
import numpy as np
Question: Jen and Tyler are gymnasts practicing flips. Jen is practicing the triple-flip while Tyler is practicing the double-flip. Jen did sixteen triple-flips during practice. Tyler flipped in the air half the number of times Jen did. How many double-flips did Tyler do?
Answer: Jen did 16 triple-flips, so she did 16 * 3 = <<16*3=48>>48 flips.
Tyler did half the number of flips, so he did 48 / 2 = <<48/2=24>>24 flips.
A double flip has two flips, so Tyler did 24 / 2 = <<24/2=12>>12 double-flips.
#### 12

Question: Four people in a law firm are planning a party. Mary will buy a platter of pasta for $20 and a loaf of bread for $2. Elle and Andrea will split the cost for buying 4 cans of soda which cost $1.50 each, and chicken wings for $10. Joe will buy a cake that costs $5. How much more will Mary spend than the rest of the firm put together?
Answer: Mary will spend $20 + $2 = $<<20+2=22>>22.
Elle and Andrea will spend $1.5 x 4 = $<<1.5*4=6>>6 for the soda.
Elle and Andrea will spend $6 + $10 = $<<6+
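The first answer's arithmetic, checked directly:

# Direct check of the reasoning in the first GSM8K answer above.
jen_flips = 16 * 3               # 16 triple-flips = 48 individual flips
tyler_flips = jen_flips // 2     # Tyler flipped half as many times: 24
double_flips = tyler_flips // 2  # two flips per double-flip: 12
assert double_flips == 12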
Warning, examples/language-modeling/main.py is deprecated, please use auto-round cmd line instead. The file will be deleted in the V0.4.1 release
/models/Llama-2-7b-chat-hf
2024-11-13 00:49:05 INFO utils.py L494: Using GPU device
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01, 1.61s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.04s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.12s/it]
2024-11-13 00:49:10 INFO autoround.py L218: using torch.float16 for quantization tuning
2024-11-13 00:49:10 INFO autoround.py L286: start calibration
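Per the deprecation warning above, the supported path is the auto-round entry point; a hedged sketch of the equivalent Python API follows, with illustrative argument values.

# Hedged sketch of the AutoRound Python API replacing the deprecated
# main.py flow; bits/group_size and the output directory are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_path = "/models/Llama-2-7b-chat-hf"  # path from the log above
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./Llama-2-7b-chat-hf-w4g128")  # hypothetical output dir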