Goals: add links that are reasonable and give good explanations of how stuff works. No hype, and no vendor content if possible. Practical first-hand accounts of models in prod are eagerly sought.
```ts
/**
 * My Portable RAG
 * $ pnpm add sqlite-vec @ai-sdk/google ai
 * SQLite Vector Search + Google AI Embeddings
 *
 * Required environment variables:
 *   GOOGLE_GENERATIVE_AI_API_KEY=your-api-key
 *
 * Usage:
 *   # Index text content
 */
```
```js
export const requireComment = {
  meta: {
    type: "suggestion",
    docs: {
      description: "useEffect must be explained with a comment.",
    },
    schema: [],
    messages: {
      requireCommentOnUseEffect: "useEffect must be explained with a comment.",
    },
  },
  // ...
};
```
| """ | |
| A minimal, fast example generating text with Llama 3.1 in MLX. | |
| To run, install the requirements: | |
| pip install -U mlx transformers fire | |
| Then generate text with: | |
| python l3min.py "How tall is K2?" |
| """QA Chatbot streaming using FastAPI, LangChain Expression Language , OpenAI, and Chroma. | |
| Features | |
| -------- | |
| - Persistent Chat Memory: | |
| Stores chat history in a local file. | |
| - Persistent Vector Store: | |
| Stores document embeddings in a local vector store. | |
| - Standalone Question Generation: | |
| Rephrases follow-up questions to standalone questions in their original language. |
```bash
#!/bin/bash
SCRIPTNAME=$(basename "$0")
# Portable realpath replacement: print the absolute path of a file or directory.
function realpath () {
    f=$@;
    if [ -d "$f" ]; then
        base="";
        dir="$f";
    else
        base="/$(basename "$f")";
        dir=$(dirname "$f");
    fi
    dir=$(cd "$dir" && pwd);
    echo "$dir$base";
}
```
This worked on 14/May/23. The instructions will probably require updating in the future.
LLaMA is a text-prediction model similar to GPT-2, or to the version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions (like Alpaca or Vicuna) with this, I think; those versions are more focused on answering questions.
Note: I have been told that this does not support multiple GPUs; it can only use a single GPU.
It is now possible to run LLaMA 13B with a 6 GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change adds CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU (the `--n-gpu-layers`/`-ngl` option). This is perfect for low VRAM; a rough sketch follows below.
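A minimal sketch of what that looks like, assuming a quantized 13B GGML model file; the model path, the layer count, and the prompt are placeholders, and the flags are the ones llama.cpp used around May 2023:

```bash
# Build llama.cpp with cuBLAS enabled.
make clean && make LLAMA_CUBLAS=1

# Offload some transformer layers to the GPU with --n-gpu-layers (-ngl);
# lower the number if you run out of VRAM. The model path is a placeholder.
./main -m ./models/13B/ggml-model-q4_0.bin --n-gpu-layers 24 -p "How tall is K2?"
```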
Follow the `ruby.wasm` tutorial:

```bash
curl -LO https://github.com/ruby/ruby.wasm/releases/latest/download/ruby-head-wasm32-unknown-wasi-full.tar.gz
```
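The tutorial's next steps go roughly like this; the extracted directory name and the wasmtime invocation below are assumptions, so check the ruby.wasm README for the exact paths:

```bash
# Assumed follow-up steps; the directory name inside the tarball may differ.
tar xfz ruby-head-wasm32-unknown-wasi-full.tar.gz
mv head-wasm32-unknown-wasi-full/usr/local/bin/ruby ruby.wasm

# Run it with a WASI runtime such as wasmtime.
wasmtime ruby.wasm -- -e 'puts RUBY_VERSION'
```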
```ts
import { Client } from '../../../src/middleware/client'
import type { AppType } from './server'

// Type-safe client for the API described by AppType.
const client = new Client<AppType>('http://127.0.0.1:8787/api')

const res = await client.json('/posts', {
  id: 123,
  title: 'hello',
})
```
```js
const isUseEffect = (node) => node.callee.name === 'useEffect';
const argumentIsArrowFunction = (node) => node.arguments[0].type === 'ArrowFunctionExpression';
const effectBodyIsSingleFunction = (node) => {
  const { body } = node.arguments[0];
  // It's a single unwrapped function call:
  // `useEffect(() => theNameOfAFunction(), []);`
  if (body.type === 'CallExpression') {
    return true;
  }
  return false;
};
```