Leonard lhl

@lhl
lhl / ANALYSIS-gpus.md
Created October 24, 2025 06:56
Codex 5 High analysis of GGML CUDA paths

GGML CUDA/HIP Inference Paths and Precision by Architecture

This document summarizes how ggml’s CUDA/HIP backend executes inference on different GPU families, which code paths are used, and at what numeric precision the major compute happens. It also provides rough workload composition percentages to relate paths to each architecture’s FLOPS/TOPS.

References are to files under ggml/src/ggml-cuda unless noted.

  • Matmul (quantized): mmq.cu, mmq.cuh, vecdotq.cuh, quantize.cu/.cuh
  • Matmul (float): mmf.cu, mmvf.cu, cuBLAS/hipBLAS calls in ggml-cuda.cu
  • FlashAttention: fattn*.cu/.cuh
  • Softmax: softmax.cu
@lhl
lhl / power-usage.py
Created January 13, 2025 05:58
2025-01 vLLM/Llama 3.3 70B FP8 tokens/joule
# Power Usage Calculator for AI Workloads
'''
# Serving
$ vllm serve meta-llama/Llama-3.3-70B-Instruct --tensor-parallel-size 4 --num-scheduler-steps 20 --quantization=fp8 --gpu-memory-utilization=0.97
INFO 01-13 04:59:05 api_server.py:712] vLLM API server version 0.6.6.post2.dev5+g5ce4627a
# Benchmark - we do bs=64 to emulate https://arxiv.org/pdf/2310.03003
cmd = [
"python", os.path.expanduser("~/vllm/benchmarks/benchmark_serving.py"),
@lhl
lhl / HOWTO.md
Last active July 19, 2018 12:22
How to configure NGINX with LetsEncrypt using the simp_le client

How to configure NGINX with LetsEncrypt using the simp_le client.

This includes the NGINX configs as well as the auto-renewal steps. I took a bunch of these steps from this blog and adapted them to how I like things set up.

simp_le issues three return codes depending on the status of the request.

  • 0 if certificate data was created or updated;
  • 1 if renewal not necessary;
  • 2 in case of errors.
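
The three return codes above lend themselves to a small wrapper in a renewal job. The following is only a minimal sketch, not part of the original HOWTO: `action_for` and `renew` are hypothetical helper names, and the simp_le command line and the web server reload command are assumed to be supplied by the caller. The idea is to reload NGINX only when simp_le reports that certificate data changed (0), do nothing when renewal was not necessary (1), and surface anything else as a failure.

```python
import subprocess

# Exit codes as listed above for simp_le.
UPDATED, NOT_NEEDED, ERROR = 0, 1, 2


def action_for(returncode: int) -> str:
    """Map a simp_le exit code to what the renewal job should do next."""
    if returncode == UPDATED:
        return "reload"  # certificate data was created or updated
    if returncode == NOT_NEEDED:
        return "skip"    # renewal not necessary, leave the server alone
    return "fail"        # 2, or any unexpected code, is treated as an error


def renew(simp_le_cmd, reload_cmd):
    """Run the caller-supplied simp_le command; reload only on an update.

    Both arguments are argument lists passed to subprocess.run; their
    contents are assumptions of this sketch, not taken from the HOWTO.
    """
    result = subprocess.run(simp_le_cmd)
    action = action_for(result.returncode)
    if action == "reload":
        subprocess.run(reload_cmd, check=True)
    return action
```

A cron job could call `renew(...)` with whatever simp_le invocation and reload command the setup uses, and alert when the result is `"fail"`.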
! http://crunchbang.org/forums/viewtopic.php?id=5618
! Xft.dpi: 110
Xft.dpi: 96
Xft.autohint: 0
Xft.lcdfilter: lcddefault
Xft.hintstyle: hintfull
Xft.hinting: 1
Xft.antialias: 1
Xft.rgba: rgb
@lhl
lhl / DAMP.ahk
Last active August 29, 2015 14:10
AutoHotKey for DAI MP
/*
Dragon Age Inquisition Multiplayer Key Bindings
---
You should map WASD (from WQSE to movement).
These tweaks should make DAI easier to control.
What the script does:
* MB4 toggles RMB down/up (freelook)
* Caps lock toggles sprint

Keybase proof

I hereby claim:

  • I am lhl on github.
  • I am lhl (https://keybase.io/lhl) on keybase.
  • I have a public key whose fingerprint is 4DAB 5922 AD2C B6F2 780C CC2A CE9A 69D9 663F C373

To claim this, I am signing this object:

@lhl
lhl / script.rpy
Created August 6, 2013 00:14
You can simply copy this file into the Save the Date http://paperdino.com/games/save-the-date/ game folder and it'll overwrite the script.rpyc on the next open. The edited option still requires the I_AM_A_HACKER boolean but incorporates that as a valid choice instead of an FU.
init:
    define f = Character('Felicia', color="#c8ffc8", show_two_window=False, image="felicia")
    $ narrator = Character(None, color="#c8ffc8")
init:
    image felicia happy = Image("art/f_happy.png")
    image felicia sad = Image("art/f_sad.png")
    image felicia angry = Image("art/f_angry.png")
    image felicia pensive = Image("art/f_pensive.png")
    image felicia surprised = Image("art/f_surprised.png")
    image felicia suspicious = Image("art/f_suspicious.png")
@lhl
lhl / gist:4368260
Created December 24, 2012 07:48
Open a new window and load a page in Google Chrome w/ Applescript
tell application "Google Chrome"
    set myWindow to make new window
    set myTab to active tab of myWindow
    set URL of myTab to "http://randomfoo.net/"
    activate
end tell
@lhl
lhl / gist:4238942
Created December 8, 2012 06:39
Get Recent Tweets for WordPress
function get_tweets($num=3) {
    // Cached
    if($tweets = get_transient('tweets')) {
        return $tweets;
    }
    $url = 'http://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name=lhl&count=20&exclude_replies=true';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);