@ubergarm created this gist August 1, 2025 23:41 (testing-llamacpp-glm4.5.md).
Gotta go play some games now, but here's a quick test:

```bash
$ cd llama.cpp
$ git remote -v | grep sam
sammcj  git@github.com:sammcj/llama.cpp.git (fetch)
sammcj  git@github.com:sammcj/llama.cpp.git (push)
$ git checkout glm-4-5
$ git rev-parse --short HEAD
3d15c4a94
# built CPU-only
$ ./build/bin/llama-server --version
version: 6038 (3d15c4a94)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
```

The test script:

```bash
#!/usr/bin/env bash

#ulimit -n 9999

model=/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf

numactl -N 0 -m 0 \
./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/GLM-4.5-Q8_0 \
    --ctx-size 196608 \
    -fa \
    -ctk q8_0 -ctv q8_0 \
    --parallel 1 \
    --threads 128 \
    --threads-batch 192 \
    --numa numactl \
    --host 127.0.0.1 \
    --port 8080 \
    --no-mmap
```

Server output:

```
print_info: model type = 355B.A32B
print_info: model params = 358.34 B
print_info: general.name = GLM 4.5
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151329 '<|endoftext|>'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: max token length = 1024
load_tensors: loading model tensors, this can take a while... (mmap = false)
model has unused tensor blk.92.eh_proj (size = 209715200 bytes) -- ignoring
model has unused tensor blk.92.embed_tokens (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.enorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.hnorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.shared_head.head (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.shared_head.norm (size = 20480 bytes) -- ignoring
llama_model_load: error loading model: missing tensor 'blk.3.exp_probs_b'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv load_model: failed to load model, '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
```
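So the branch expects a tensor (`blk.3.exp_probs_b`) that this GGUF doesn't contain. One way to confirm whether the file or the branch is at fault is to list the tensor names actually present in the GGUF. A minimal sketch, assuming the `gguf` package from llama.cpp's `gguf-py` is installed (`pip install gguf`) and using the model path from the script above; the `required` list here is just a hypothetical spot-check, not the branch's full tensor map:

```python
def missing_tensors(present, required):
    """Return the required tensor names not found among the present ones."""
    present = set(present)
    return [name for name in required if name not in present]

if __name__ == "__main__":
    # GGUFReader ships with llama.cpp's gguf-py; .tensors yields entries
    # with a .name attribute.
    from gguf import GGUFReader

    reader = GGUFReader(
        "/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/"
        "GLM-4.5-Thireus-Q8_0.gguf"
    )
    names = [t.name for t in reader.tensors]

    # The error above names blk.3.exp_probs_b; spot-check a few layers
    # for the expert-probability-bias tensor.
    required = [f"blk.{i}.exp_probs_b" for i in range(3, 6)]
    print(missing_tensors(names, required))
```

If the names come back missing, the GGUF was converted before the branch added that tensor and needs reconverting; if they're present, the loader on this checkout is looking them up wrong.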