nathan lile (@nlile)
@nlile
nlile / genrm.md
Created August 29, 2025 05:11
Generative Reward Models (GenRM) Research Summary

Generative Reward Models

  • Paper: https://www.synthlabs.ai/pdf/Generative_Reward_Models.pdf
  • arXiv: https://arxiv.org/abs/2410.12832
  • Official SynthLabs blog post: https://www.synthlabs.ai/research/generative-reward-models
  • Rentry: https://rentry.org/genrm

Introduction

SynthLabs proposes Generative Reward Models (GenRM): instead of training a separate scalar reward head (e.g., Bradley–Terry), they use an LLM itself as the reward model, prompting it to generate a decision token (and optionally a chain of thought) that selects the preferred response. They introduce two variants: GenRM (a direct classifier via an answer indicator) and CoT-GenRM (produce reasoning first, then the indicator). Trained with STaR-style bootstrapping and a DPO objective (STaR-DPO), the judge matches classical reward models in-distribution and generalizes better out-of-distribution, with the strongest OOD gains coming from the reasoning-based STaR-DPO setup. ([arXiv][1])
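The direct-classifier variant can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: `token_logprobs` stands in for whatever API returns next-token log-probabilities, and the judge prompt wording is invented.

```python
def genrm_judge(token_logprobs, question, response_a, response_b):
    """GenRM-style direct judging: read the preference off a decision token."""
    judge_prompt = (
        f"Question: {question}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better? Answer with a single token, A or B.\n"
        "Answer:"
    )
    # A real implementation would query the LLM for the log-probabilities
    # of the candidate decision tokens at this position.
    logp = token_logprobs(judge_prompt, ["A", "B"])
    return "A" if logp["A"] >= logp["B"] else "B"

# Toy stand-in for an LLM's next-token log-probs:
fake_logprobs = lambda prompt, cands: {"A": -0.2, "B": -1.7}
print(genrm_judge(fake_logprobs, "2+2?", "4", "5"))  # prints "A" under the stub
```

CoT-GenRM would first sample a reasoning trace and only then read the indicator token; STaR-DPO additionally bootstraps training data from the judge's own rationales.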

@nlile
nlile / link-ollama-models-to-lm-studio.py
Created August 7, 2025 06:24 — forked from YuriyGuts/link-ollama-models-to-lm-studio.py
Expose Ollama models to LM Studio by symlinking its model files. Just run `python3 link-ollama-models-to-lm-studio.py`. On Windows, run it as admin.
@nlile
nlile / README.md
Created July 17, 2025 09:41 — forked from disler/README.md
Prompt Chaining with QwQ, Qwen, o1-mini, Ollama, and LLM

Prompt Chaining with QwQ, Qwen, o1-mini, Ollama, and LLM

Here we explore prompt chaining with local reasoning models in combination with base models. With shockingly capable local models like QwQ and Qwen, we can build powerful prompt chains that let us tap into their capabilities in an immediately useful, local, private, AND free way.

Explore the idea of building prompt chains where the first link is a powerful reasoning model that generates a response, and a base model then extracts the final answer from it.

Play with the prompts and models to see what works best for your use cases. Use the o1 series to see how QwQ compares.
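The two-step chain can be sketched as follows; the stub functions are illustrative stand-ins for calls to QwQ (reasoning) and Qwen (base) via Ollama or LLM, and the prompt wording is an assumption:

```python
def prompt_chain(reasoning_llm, base_llm, question):
    # Step 1: the reasoning model thinks out loud and answers.
    raw = reasoning_llm(f"Think step by step, then answer:\n{question}")
    # Step 2: the base model extracts only the final answer from the trace.
    extract_prompt = (
        "Below is a model's full reasoning. Reply with ONLY the final answer.\n\n"
        f"{raw}"
    )
    return base_llm(extract_prompt)

# Toy stand-ins so the sketch runs without a model server:
reasoning = lambda p: "First, 6 * 7 = 42. ... Final answer: 42"
base = lambda p: p.rsplit("Final answer:", 1)[-1].strip()
print(prompt_chain(reasoning, base, "What is 6 * 7?"))  # prints "42"
```

The split of labor is the point: the reasoning model is good at getting the answer, the base model is cheap and fast at cleaning it up.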

Setup

  • Bun (to run `bun run chain.ts ...`)
@nlile
nlile / claude-code-system-prompts.js
Created March 6, 2025 03:46
Claude Code Agent System Prompts and Tool Definitions
// Claude Code is a Beta product per Anthropic's Commercial Terms of Service.
// By using Claude Code, you agree that all code acceptance or rejection decisions you make,
// and the associated conversations in context, constitute Feedback under Anthropic's Commercial Terms,
// and may be used to improve Anthropic's products, including training models.
// You are responsible for reviewing any code suggestions before use.
// (c) Anthropic PBC. All rights reserved. Use is subject to Anthropic's Commercial Terms of Service (https://www.anthropic.com/legal/commercial-terms).
// Version: 0.2.9
@nlile
nlile / WebSim Prompts.md
Last active July 19, 2024 02:47 — forked from hourianto/README.md
Current prompts for WebSim (as of July 13, 2024)

Current WebSim prompts and main context. System/User/Assistant blocks denote different roles in the messages array for the API requests. Stuff in {} is either a file that's too big to be inserted directly, or an explanation.

From what I can see, WebSim is mostly "carried" by Claude's creativity.

  • Main prompt: main_prompt.txt - also check main_flow.txt to see how a complete request is made.
  • Edit prompt: edit_prompt.txt - used when right-click editing the element. Uses the currently selected model. I didn't manage to get the whole prompt with the examples, but most of it at least.
  • Fake LLM API prompt: api_prompt.txt - currently WebSim always uses Claude 3.5 Sonnet for this (from info on Discord).
  • Image rewriting prompt: image_gen_prompt.txt - also uses Claude (don't know what model). Not sure what image model is being used, probably some version of SDXL (like SDXL Turbo or similar).

The temperature used is 1, at least for Claude.

@nlile
nlile / anthropic claude artifacts full system prompt.txt
Created July 9, 2024 22:02
Full system prompt for Anthropic's Claude sonnet-3.5 artifacts
<artifacts_info>
The assistant can create and reference artifacts during conversations. Artifacts are for substantial, self-contained content that users might modify or reuse, displayed in a separate UI window for clarity.
Good artifacts are...
Substantial content (>15 lines)
Content that the user is likely to modify, iterate on, or take ownership of
Self-contained, complex content that can be understood on its own, without context from the conversation
Content intended for eventual use outside the conversation (e.g., reports, emails, presentations)
Content likely to be referenced or reused multiple times
@nlile
nlile / gist:a8a6ea925b7f4872a4491361adcb0dfd
Last active July 6, 2024 23:27
Class with Dynamic Properties and ST
from pydantic import BaseModel
from typing import Any, Dict


class ST(BaseModel):
    value: str


class DynamicPropertyClass:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, f"_{key}", ST(value=str(value)))
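A quick usage sketch of the same pattern; a stdlib dataclass stands in for pydantic's BaseModel here so the example runs without extra dependencies:

```python
from dataclasses import dataclass

@dataclass
class ST:
    value: str

class DynamicPropertyClass:
    def __init__(self, **kwargs):
        # Each keyword argument is wrapped in an ST and stored as _<key>.
        for key, value in kwargs.items():
            setattr(self, f"_{key}", ST(value=str(value)))

obj = DynamicPropertyClass(name="Ada", year=1815)
print(obj._name.value)  # prints "Ada"
print(obj._year.value)  # prints "1815"
```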
@nlile
nlile / llm_samplers_explained.md
Created May 29, 2024 12:48 — forked from kalomaze/llm_samplers_explained.md
LLM Samplers Explained

LLM Samplers Explained

Every time a large language model makes a prediction, all of the thousands of tokens in the vocabulary are assigned some degree of probability, from almost 0% to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use, which I will cover here.

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits). What this means is that if you use a value lower than 1.0, the relative distance between the tokens becomes larger (more deterministic), and if you use a value larger than 1.0, the relative distance between the tokens becomes smaller (less deterministic).
  • 1.0 Temperature is the original distribution that the model was trained to optimize for, since the scores remain the same.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
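The scaling described above is just a division of the logits by the temperature before the softmax; a minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # T < 1 sharpens the distribution, T > 1 flattens it,
    # and T == 1 reproduces the distribution the model was trained for.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_cold = softmax_with_temperature([2.0, 1.0, 0.1], temperature=0.5)
probs_hot = softmax_with_temperature([2.0, 1.0, 0.1], temperature=2.0)
# The top token gets more probability mass at T=0.5 than at T=2.0.
```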
@nlile
nlile / transcribe.sh
Last active October 27, 2023 21:44
Transcribe YouTube Playlist with whisper.cpp & yt-dlp
#!/bin/bash
# This script downloads audio from a YouTube playlist, resamples the audio (if necessary), transcribes it using whisper, and saves the transcription.
# The audio files are saved in the `~/mp3` directory, and transcriptions are saved in the `~/transcripts` directory.
# If the `--remove-audio` flag is set, the audio files are not saved.
# Usage: ./script_name.sh --playlist-url [YouTube playlist URL] [--remove-audio]
# Paths for whisper and model
whisper_path="$HOME/whisper.cpp/main"
model_path="$HOME/whisper.cpp/models/ggml-medium.en.bin"

Stevey's Google Platforms Rant

I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.

I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't real