
Peter pszemraj

@pszemraj
pszemraj / vid_dedupe_gve.py
Created November 6, 2025 01:18
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Apache-2.0
GVE + SimSIMD video deduplication CLI via https://gzn00417.github.io/GVE/
Design highlights
- Embeddings: GVE (Qwen2.5-VL based) last-token pooled + ℓ2-normalized (bf16/float16), per paper/model card.
- Test-time policy: 8 frames baseline, scale with duration (16/32/48) for long videos; ~200 visual tokens per frame.
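With the embeddings already ℓ2-normalized, near-duplicate detection reduces to a cosine-similarity threshold over pairs. A minimal pure-Python sketch of that comparison step (standing in for SimSIMD; the 0.95 threshold is illustrative, not taken from the gist):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, matching how the GVE embeddings are stored."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dedupe(embeddings, threshold=0.95):
    """Greedy near-duplicate filter: keep a video only if its cosine similarity
    to every previously kept embedding stays below the threshold.
    (threshold=0.95 is an illustrative value, not from the gist.)"""
    kept, kept_vecs = [], []
    for i, v in enumerate(embeddings):
        u = l2_normalize(v)
        # For unit vectors, cosine similarity is just the dot product.
        if all(sum(a * b for a, b in zip(u, w)) < threshold for w in kept_vecs):
            kept.append(i)
            kept_vecs.append(u)
    return kept
```

The greedy keep-first policy means order matters: the earliest copy of a duplicate cluster survives.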
@pszemraj
pszemraj / llama.cpp-issue.md
Created November 5, 2025 00:40
issue with llama.cpp server (multimodal) lfm2-vl

Llama.cpp Multimodal Crash (general issue write-up)

Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc

description

  • Symptom: llama-server exits with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
  • Repro: launch any vision-capable GGUF (e.g. -VL-) with default slot reuse (--slot-prompt-similarity 0.10), hit /v1/chat/completions twice using OpenAI-format payloads that include image_url parts (base64 data URIs). The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
  • Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
  • Logs: daemon sees "connection closed before message completed," server backtrace ends in ggml_abort server_context::update_slots.
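The crashing request shape is ordinary OpenAI chat-completions JSON with an inline base64 data-URI image part. A minimal sketch of building such a payload (the model name and image bytes here are placeholders):

```python
import base64
import json

def make_multimodal_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-format /v1/chat/completions body with an inline
    base64 data-URI image part -- the request shape described in the repro."""
    data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "lfm2-vl",  # placeholder model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ],
            }
        ],
    }

# Sending this body twice in a row (curl, urllib, etc.) exercises the
# slot-reuse path that trips the assert.
payload = make_multimodal_payload(b"\x89PNG...", "Describe this image.")
body = json.dumps(payload)
```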
@pszemraj
pszemraj / inference_example_lfm2vl_3b.py
Last active November 5, 2025 04:51
inference with LFM2-VL-3B
"""
example script for inference with LFM2-VL-3B model
https://hf.co/LiquidAI/LFM2-VL-3B
"""
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

# Load model and processor
model_id = "LiquidAI/LFM2-VL-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype="bfloat16", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
@pszemraj
pszemraj / encoding_visualizer.py
Last active October 1, 2025 13:21
helper scripts for tokenizer encoding viz
import argparse
import webbrowser
from pathlib import Path
from typing import Any, Callable, Optional, Union
from tokenizers import Tokenizer as RustTokenizer
from tokenizers.tools import EncodingVisualizer
from transformers import AutoTokenizer, PreTrainedTokenizerBase
SAMPLE_TEXT = '''class DyT(nn.Module):
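The preview cuts off inside SAMPLE_TEXT. For context, EncodingVisualizer renders each token's character span as a shaded HTML segment; a stdlib-only toy illustrating that idea with naive whitespace tokenization (not the real tokenizer):

```python
import html
import re

def visualize_encoding(text: str) -> str:
    """Wrap each (toy, whitespace-delimited) token in a <span> whose class
    alternates, mimicking how EncodingVisualizer shades adjacent tokens."""
    parts = []
    for i, m in enumerate(re.finditer(r"\S+|\s+", text)):
        tok = html.escape(m.group())
        if tok.strip():
            parts.append(f'<span class="tok{i % 2}">{tok}</span>')
        else:
            parts.append(tok)  # keep whitespace as-is between spans
    return "<div class='encoding'>" + "".join(parts) + "</div>"
```

A real tokenizer would yield subword spans with offsets instead of whitespace chunks, but the HTML-highlighting output is the same idea.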
@pszemraj
pszemraj / clipboard_helper_xclip.sh
Created September 25, 2025 02:09
cz() two letter clipboard helper for linux/xclip
# Copy file contents or stdin to clipboard
# Usage: cz [file]
# cz file.txt - copy file to clipboard
# cmd | cz - copy stdin to clipboard
# Fails on: binary files, files >10MB, non-existent files
cz() {
    if [ -z "$1" ]; then
        xclip -selection clipboard
    elif [ -f "$1" ]; then
        # Check if it's a text file (and under the 10MB limit)
        if [ "$(stat -c%s "$1")" -le 10485760 ] \
            && file --mime-type -b "$1" | grep -q '^text/'; then
            xclip -selection clipboard < "$1"
        else
            echo "cz: binary file or larger than 10MB: $1" >&2; return 1
        fi
    else
        echo "cz: no such file: $1" >&2; return 1
    fi
}
@pszemraj
pszemraj / check_md_links.py
Last active September 20, 2025 00:21
check through .md files a repo [directory] and subdirs for broken/nonexistent links to files
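A minimal stdlib sketch of the described check, assuming standard `[text](target)` Markdown links and skipping anything with a URL scheme (the actual gist may differ):

```python
import re
from pathlib import Path

# Capture the link target up to a ")", "#" fragment, or "?" query.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#?]+)")

def find_broken_file_links(root="."):
    """Scan every .md file under root (recursively) and report relative links
    whose target file or directory does not exist. External URLs are skipped."""
    broken = []
    root = Path(root)
    for md in root.rglob("*.md"):
        text = md.read_text(encoding="utf-8", errors="ignore")
        for target in LINK_RE.findall(text):
            if target.startswith(("http://", "https://", "mailto:")):
                continue
            # Relative links resolve against the file's own directory.
            if not (md.parent / target).exists():
                broken.append((str(md), target))
    return broken
```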
@pszemraj
pszemraj / wsl_clipboard.md
Created September 16, 2025 15:35
CLI/'programmatically' copy to clipboard from wsl

copy to clipboard from wsl

An adapted version of the cz() function I use on Ubuntu; it leverages the clip.exe binary that WSL exposes to put text on the Windows clipboard.

# Copy file contents or stdin to clipboard
# Usage: cz [file]
#   cz file.txt  - copy file to clipboard
#   cmd | cz     - copy stdin to clipboard
# Fails on: binary files, files >10MB, non-existent files
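A sketch of the adapted function, assuming clip.exe is on PATH inside WSL and mirroring the documented failure modes (binary files, files over 10MB, missing files):

```shell
# cz for WSL: copy file contents or stdin to the Windows clipboard via clip.exe
cz() {
    if [ -z "$1" ]; then
        clip.exe
    elif [ ! -f "$1" ]; then
        echo "cz: no such file: $1" >&2; return 1
    elif [ "$(stat -c%s "$1")" -gt 10485760 ]; then
        echo "cz: file larger than 10MB: $1" >&2; return 1
    elif ! file --mime-type -b "$1" | grep -q '^text/'; then
        echo "cz: not a text file: $1" >&2; return 1
    else
        clip.exe < "$1"
    fi
}
```

Only the two success branches ever touch clip.exe, so the guard checks work the same on any Linux box.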
@pszemraj
pszemraj / clock.html
Last active September 12, 2025 04:37
simple html clock
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>Clock AI</title>
<style>
:root {
--bg: #0a0b0d;
--panel: #141518;
@pszemraj
pszemraj / lfm_1b6.py
Created September 8, 2025 07:59
LFM2-VL inference with recommended params
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image
# Load model and processor
model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
model_id, device_map="auto", torch_dtype="bfloat16", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)