
@marcinantkiewicz
Last active July 31, 2025 21:10

Revisions

  1. marcinantkiewicz revised this gist Jul 31, 2025. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions model sources
    @@ -0,0 +1,5 @@
    Hugging Face
    - set your Local Apps in https://huggingface.co/settings/local-apps#local-apps
    - find the model repo, click the `Use This Model` button, then pick your local app and the quantization from the dropdown.
    - the different values indicate how much quality is lost to the decreased precision of the weights, [good overview](https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-1579252501). tl;dr: if GPU-poor, start with Q4_K (see the sketch after this list).
    - at first, stick to the official sources and the `GGUF` or `safetensors` formats. PyTorch files (.pt/.pth) are serialized (pickled) Python data structures; deserializing them can execute arbitrary code, so they are only safe if the contents are 100% trustworthy.
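    A minimal sketch of what the steps above can end up looking like with ollama as the local app; the Hugging Face repo and quantization tag are examples, not picks from these notes:
    # ollama can pull GGUF repos straight from Hugging Face by prefixing the repo with hf.co/
    # the suffix after the colon selects the quantization
    $ docker exec -it ollama ollama pull hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M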
  2. marcinantkiewicz revised this gist Jul 29, 2025. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions making it useful.md
    @@ -0,0 +1,8 @@
    - chat interface - [https://github.com/open-webui/open-webui](https://github.com/open-webui/open-webui)
    - this allows chat history to be recorded
    - and it will take API keys for commercial inference providers
    - for commercial inference I like openrouter; it is cheap to test >70B models I cannot usably run at home, roughly $0.01–$2/day (2025)
    - CLI interface - nothing beats [llm](https://github.com/taketwo/llm-ollama); it is a CLI tool in the best Unix tradition, modular and just pleasant to use
    - this will produce a description of a photo: `$ llm -m moondream:latest -a /space/phonepics/iphone8/YARU7264.JPG`
    - for remote access, set `OLLAMA_HOST=$ip` to point llm at the API; it can be any OpenAI-compatible API (hosted locally via ollama or through openrouter) - see the sketch after this list
    - there are better tools than ollama for hosting models as actual services, with tight control over parallelism, batching, and where which tensors live, but I have not played with those yet
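    A minimal sketch of the remote setup above; the IP address, model, and prompt are examples:
    # llm picks up OLLAMA_HOST and sends the prompt to the remote ollama instance
    $ OLLAMA_HOST=http://192.168.1.20:11434 llm -m moondream:latest 'describe this photo' -a ./YARU7264.JPG
    # the same server also answers on ollama's OpenAI-compatible endpoint
    $ curl http://192.168.1.20:11434/v1/chat/completions \
        -H 'Content-Type: application/json' \
        -d '{"model": "moondream:latest", "messages": [{"role": "user", "content": "hello"}]}'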
  3. marcinantkiewicz revised this gist Jul 29, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion basic ollama care and handling
    @@ -23,4 +23,4 @@ $ docker exec -it ollama ollama ps
    $ docker exec -it ollama ollama show --modelfile dengcao/ERNIE-4.5-21B-A3B-PT > ERNIE.modelfile

    # copy the file into the container and create the new entry (same model but new config)
    - $ ollama create dengcao/ERNIE-4.5-21B-A3B-PT -f /app/ollama/modelfiles/ERNIE-16
    + $ docker exec -it ollama ollama create dengcao/ERNIE-4.5-21B-A3B-PT -f /app/ollama/modelfiles/ERNIE-16
  4. marcinantkiewicz created this gist Jul 29, 2025.
    26 changes: 26 additions & 0 deletions basic ollama care and handling
    @@ -0,0 +1,26 @@
    # docker needs the NVIDIA Container Toolkit to make the GPU and nvidia drivers available inside containers.
    # - you will need nvidia drivers too. https://github.com/NVIDIA/nvidia-container-toolkit
    # - the model directory needs decent IOPS to load models; a dedicated NVMe drive is both fast and naturally limits the sprawl
    # - in GPU stats you will see both (G)raphics and (C)ompute jobs. LLM-related tooling only controls the C jobs.
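    # a quick sanity check (a sketch, the CUDA image tag is just an example) that the toolkit and drivers are wired up:
    $ docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi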

    # -- once Ollama container is running
    #
    # this should produce help output
    $ docker exec -it ollama ollama

    # ollama.com hosts some of the models, so this nicely works
    # ex: https://ollama.com/dengcao/ERNIE-4.5-21B-A3B-PT
    $ docker exec -it ollama ollama pull dengcao/ERNIE-4.5-21B-A3B-PT:latest

    # shows which models are loaded into memory and the balance between layers loaded into GPU vs CPU
    # also check out nvtop
    $ docker exec -it ollama ollama ps

    # you can create custom configs for a model, e.g. set the number of layers kept in the GPU, by editing the default modelfile
    # to set the number of layers in the GPU, either run `/set parameter num_gpu 16` in the interactive interface or set it in the
    # modelfile as `PARAMETER num_gpu 16`. Note - a name like `count_layers_in_gpu` would fit better, `num_gpu` is too generic.
    # `num_gpu 0` disables gpu for the model
    $ docker exec -it ollama ollama show --modelfile dengcao/ERNIE-4.5-21B-A3B-PT > ERNIE.modelfile

    # copy the file into the container and create the new entry (same model but new config)
    $ ollama create dengcao/ERNIE-4.5-21B-A3B-PT -f /app/ollama/modelfiles/ERNIE-16
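    # a minimal sketch of such a modelfile (an assumption - a real `ollama show --modelfile` dump also carries
    # TEMPLATE and other directives; keep those and only change the PARAMETER lines), plus the copy step:
    $ printf 'FROM dengcao/ERNIE-4.5-21B-A3B-PT:latest\nPARAMETER num_gpu 16\n' > ERNIE-16
    $ docker cp ERNIE-16 ollama:/app/ollama/modelfiles/ERNIE-16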
    13 changes: 13 additions & 0 deletions docker-ollama.service
    @@ -0,0 +1,13 @@
    [Unit]
    Description=Ollama Docker Container
    Requires=docker.service
    After=docker.service

    [Service]
    Restart=always
    User=user
    ExecStart=docker run --rm --name ollama --gpus=all -v /space/ollama:/root/.ollama -p 0.0.0.0:11434:11434 -e OLLAMA_DEBUG=1 ollama/ollama
    ExecStop=/usr/bin/docker stop ollama

    [Install]
    WantedBy=multi-user.target
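    A minimal sketch of wiring the unit in, assuming the file above is saved as docker-ollama.service:
    $ sudo cp docker-ollama.service /etc/systemd/system/
    $ sudo systemctl daemon-reload
    $ sudo systemctl enable --now docker-ollama.service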