Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc
- Symptom: llama-server aborts with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
- Repro: launch any vision-capable GGUF (e.g. a model with -VL- in its name) with default slot reuse (--slot-prompt-similarity 0.10), then hit /v1/chat/completions twice with OpenAI-format payloads that include image_url parts (base64 data URIs). The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
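For concreteness, a minimal sketch of the kind of request body that triggers it. The image bytes and prompt are placeholders; only the OpenAI-format shape (a multipart content array with an image_url part carrying a base64 data URI) matters. Sending this payload twice in a row to /v1/chat/completions is the repro.

```python
import base64

def image_payload(png_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-format chat payload with one base64 image_url part."""
    data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode()
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
    }

# Placeholder bytes stand in for a real PNG; POSTing this dict as JSON to
# /v1/chat/completions twice is enough to hit the slot-reuse path.
payload = image_payload(b"\x89PNG...", "describe the image")
```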
- Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
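The variants above correspond to launch lines like the following. The model path is a placeholder, and --no-cont-batching is my reading of "toggling continuous batching", not a flag taken from the report; the other flags are the ones listed.

```shell
# Each variant was tried separately; the crash reproduces on all of them.
llama-server -m vision-model.gguf --slot-prompt-similarity 0.0   # disable slot reuse
llama-server -m vision-model.gguf --ctx-checkpoints 8            # restore checkpoints
llama-server -m vision-model.gguf --no-cont-batching             # assumed flag name
```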
- Logs: the daemon log shows "connection closed before message completed"; the server backtrace ends in ggml_abort, called from server_context::update_slots.