nathan lile (@nlile) nlile

Prompt Chaining with QwQ, Qwen, o1-mini, Ollama, and LLM

Here we explore prompt chaining with local reasoning models in combination with base models. With shockingly powerful local models like QwQ and Qwen, we can build some powerful prompt chains that let us tap into their capabilities in a immediately useful, local, private, AND free way.

Explore the idea of building prompt chains where the first is a powerful reasoning model that generates a response, and then use a base model to extract the response.

Play with the prompts and models to see what works best for your use cases. Use the o1 series to see how qwq compares.

Setup

Bun (to run bun run chain.ts ...)

Current WebSim prompts and main context. System/User/Assistant blocks denote different roles in the messages array for the API requests. Stuff in {} is either a file that's too big to be inserted directly, or an explanation.

From what I can see, WebSim is mostly "carried" by Claude's creativity.

Main prompt: main_prompt.txt - also check main_flow.txt to see how a complete request is made.
Edit prompt: edit_prompt.txt- used when right-click editing the element. Uses the currently selected model. I didn't manage to get the whole prompt with the examples, but most of it at least.
Fake LLM API prompt: api_prompt.txt - currently WebSim always uses Claude 3.5 Sonnet for this (from info on Discord).
Image rewriting prompt: image_gen_prompt.txt - also uses Claude (don't know what model). Not sure what image model is being used, probably some version SDXL (like SDXL Turbo and similar)

The temperature used is 1, at least for Claude.

LLM Samplers Explained

Everytime a large language model makes predictions, all of the thousands of tokens in the vocabulary are assigned some degree of probability, from almost 0%, to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use which I will cover here.

OpenAI Samplers

Temperature

Temperature is a way to control the overall confidence of the model's scores (the logits). What this means is that, if you use a lower value than 1.0, the relative distance between the tokens will become larger (more deterministic), and if you use a larger value than 1.0, the relative distance between the tokens becomes smaller (less deterministic).
1.0 Temperature is the original distribution that the model was trained to optimize for, since the scores remain the same.
Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4

Stevey's Google Platforms Rant

I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.

I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't real

	#!/usr/bin/env python3
	"""
	Expose Ollama models to LM Studio by symlinking its model files.
	NOTE: On Windows, you need to run this script with administrator privileges.
	"""

	import json
	import os
	from pathlib import Path