Last active: October 18, 2025 21:31
Revisions
steipete revised this gist
Oct 7, 2025. 1 changed file with 1 addition and 1 deletion.
Diff `@@ -1,5 +1,5 @@`: the live development key is swapped for a placeholder, so the prompt now reads:

> build swagent swift cli and test it yourself until it works. use this development key for testing with openai API: `<enter key>`
steipete revised this gist
Oct 7, 2025. 1 changed file with 2 additions and 1 deletion.
Diff `@@ -1,4 +1,5 @@`: the development key is added to the prompt, which now reads:

> build swagent swift cli and test it yourself until it works. use this development key for testing with openai API: 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
steipete revised this gist
Sep 30, 2025. No changes.
steipete revised this gist
Sep 30, 2025. 1 changed file with 267 additions and 261 deletions.
Below is a clean, workshop‑ready guide for **swagent**, split into three parts as requested.

---

## 1) Docs — the exact contract (Responses API, tools, streaming, chaining)

**Endpoints**

* Create/continue a response: `POST https://api.openai.com/v1/responses`
  Headers: `Authorization: Bearer $OPENAI_API_KEY`, `Content-Type: application/json`. ([OpenAI Platform][1])

**Core request fields**

* `model`: `"gpt-5-codex"`.
* `instructions`: your system rules (string). Re‑send them on **every** turn.
* `input`: string **or** an array of **items** (e.g., user message, function call outputs).
* `store: true` if you’ll chain turns later with `previous_response_id`. ([OpenAI Platform][1])

**Tools (function calling)**

* Send tools as **top‑level** objects in `tools` with this shape:

```json
{
  "type": "function",
  "name": "run_bash",
  "description": "Run a bash command and return stdout, stderr, exitCode.",
  "parameters": {
    "type": "object",
    "properties": {
      "command": { "type": "string" },
      "cwd": { "type": "string" }
    },
    "required": ["command"]
  }
}
```

* You can let the model choose with `"tool_choice": "auto"`. ([OpenAI Platform][2])

**Function‑call loop (no `tool_outputs` param)**

1. First call: the model may return **items** of `type: "function_call"` in `output` with `call_id`, `name`, and `arguments` (a JSON string).
2. Run the tool locally.
3. Continue the run by calling `POST /v1/responses` **again** with:

   * `previous_response_id`: the prior response `id`
   * `instructions`: the same system rules
   * `input`: an **array of items**, each

```json
{
  "type": "function_call_output",
  "call_id": "<same id>",
  "output": "<stringified JSON like { stdout, stderr, exitCode }>"
}
```

This is how you return tool results. Don’t send a top‑level `tool_outputs` field. ([OpenAI Platform][3])

**Streaming (SSE)**

* Set `"stream": true` to get **Server‑Sent Events** while the model is thinking. You’ll receive events such as:

  * `response.created` (start)
  * `response.output_text.delta` (text chunks)
  * `response.function_call.delta` (incremental function args)
  * `response.completed` (final object, includes `usage`)

  Handle errors via `response.error`. ([OpenAI Platform][4])

**Usage & token counters**

* Use `usage.input_tokens`, `usage.output_tokens`, `usage.total_tokens` (snake_case) to print per‑turn stats. These arrive on the final response (or `response.completed` in streaming). ([OpenAI Platform][5])

**Conversation state**

* To continue a chat **without** resending past text, set `previous_response_id` and re‑send your `instructions`. You may also pass prior output items explicitly if you need to. ([OpenAI Platform][6])

**Progress signal taxonomy (what to show in the CLI)**

* Before the first output token: **“🧠 thinking…”** (spinner) once you receive `response.created`.
* While streaming text: live‑print each `response.output_text.delta`.
* When the model starts a tool: **“🔧 run_bash …”** as soon as you see `response.function_call.delta` / the final `function_call` item.
* While executing the tool: **“⏳ running command…”** until you post the `function_call_output` and the model resumes.
* On finalization: **“✅ done”** once `response.completed` arrives, then print the footer with `usage`. ([OpenAI Platform][4])
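The taxonomy above maps directly onto a small line‑based parser. A minimal sketch, assuming `event:`/`data:` line pairs and a `{"delta": "..."}` payload for text deltas (verify both against the streaming reference; the type names are illustrative):

```swift
import Foundation

/// Events the CLI cares about, mirroring the taxonomy above.
enum StreamEvent {
    case created                      // show the "🧠 thinking…" spinner
    case textDelta(String)            // live-print the chunk
    case functionCallDelta(String)    // show "🔧 run_bash …"
    case completed(Data)              // final JSON, including `usage`, for the footer
}

struct SSEParser {
    private var pendingEvent: String?

    /// Feed one line of the SSE body; emits an event once an `event:` line
    /// has been paired with its `data:` line.
    mutating func consume(_ line: String) -> StreamEvent? {
        if line.hasPrefix("event: ") {
            pendingEvent = String(line.dropFirst("event: ".count))
            return nil
        }
        guard line.hasPrefix("data: "), let event = pendingEvent else { return nil }
        pendingEvent = nil
        let payload = Data(line.dropFirst("data: ".count).utf8)
        switch event {
        case "response.created":
            return .created
        case "response.output_text.delta":
            // Assumed payload shape: { "delta": "..." }.
            let object = try? JSONSerialization.jsonObject(with: payload) as? [String: Any]
            return (object?["delta"] as? String).map(StreamEvent.textDelta)
        case "response.function_call.delta":
            return .functionCallDelta(String(decoding: payload, as: UTF8.self))
        case "response.completed":
            return .completed(payload)
        default:
            return nil
        }
    }
}
```

The UI state machine then flips from the spinner to live text on the first `.textDelta`, to “⏳ running…” on a function‑call event, and prints the footer from the `.completed` payload.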
---

## 2) Full instructions — build the whole CLI in one pass

**Project:** `swagent`
**Language/Tooling:** Swift 6.2 with `SwiftSetting.defaultIsolation(MainActor.self)` enabled via SPM; dependencies: `swift-argument-parser`, `apple/swift-configuration`; built‑in **Swift Testing** (Xcode 26); add **swift-format** and **SwiftLint**.
**Model/API:** OpenAI **Responses API**, `model: gpt-5-codex`, streaming **on**. ([OpenAI Platform][1])

**Startup UX**

* Print **2–3 cheeky lines** (random) and the **masked API key** (first 3 + last 4).
* Examples:

  * “🎩 I code therefore I am.”
  * “⚡ One prompt. One shot. Make it count.”
  * “🔧 Small diffs, big wins.”
  * “🧪 If it compiles, we ship. Mostly.”
  * “🐚 Bashful? I’m not.”

**Flags**

* `-v, --verbose` — extra logs (HTTP status, timings).
* `--version` — print version.
* `-p <prompt>` — one‑shot user interaction; **internally** the agent may loop via tools until `finish` or it needs info.
* `--yolo` — auto‑approve all shell commands (no interactive Y/n).
* `--session <uuid>` — load a persisted session.

**Commands**

* `/new` or `/clear` — reset conversation state.
* `/status` — show masked key, token totals this session, **estimated** remaining context.
* `/exit` — quit; print: *“To resume this session, call `swagent --session <uuid>`.”*

**System prompt (embed verbatim in `instructions` every turn)**

> **You are swagent**, a coding agent for terminal workflows.
> **Runtime:** **macOS 26 or later**.
> **Mission:** Build, run, and refine code + shell workflows; verify your work.
> **Behavior:**
>
> * Think step‑by‑step; prefer small diffs and working patches.
> * When you propose commands, **call `run_bash`** to execute them; **never** ask the user to confirm (the CLI handles approvals).
> * If the runtime says **yolo=true**, treat commands as pre‑approved and run immediately.
> * If **yolo=false** and a command is destructive/ambiguous, call `request_more_info(question)` once; otherwise, just `run_bash`.
> * When done, call `finish(summary)` with a concise summary + next steps.
> * Keep output terminal‑friendly and concise; never print secrets.
>
> **Tools:**
>
> 1. `run_bash(command: string, cwd?: string)` → returns `{stdout, stderr, exitCode}`.
> 2. `request_more_info(question: string)`
> 3. `finish(summary: string)`
>
> **Responses API rules:**
>
> * Use `model: gpt-5-codex`.
> * Re‑send these instructions every turn.
> * Chain with `previous_response_id`.
> * Tools are top‑level `{ type:'function', name, description, parameters }`.
> * Tool calls arrive as `output` items of `type:'function_call'` with a `call_id`. **Return results** by continuing with `previous_response_id` and sending `input: [{ "type":"function_call_output", "call_id":"<same>", "output":"<stringified JSON>" }]`.
> * Read `usage.input_tokens`, `usage.output_tokens`, `usage.total_tokens` for per‑turn stats.
>
> **[swagent runtime]**
> `yolo=true|false` • `verbose=true|false` • `session=<uuid>` • `cwd=<path>`

([OpenAI Platform][2])

**Runtime header**

Append the `[swagent runtime]` block above to `instructions` every turn (so the agent knows about `yolo`, etc.). ([OpenAI Platform][6])

**Tooling & policies**

* **Bash tool**: Implement `run_bash(command, cwd?)` via `bash -lc`. By default, prompt `Run? [Y/n]` (Enter = Yes). With `--yolo`, auto‑approve. Return `{stdout, stderr, exitCode}` (JSON), but **stringify** it before sending as `function_call_output` (see the sketch after this list).
* **Ask‑for‑info tool**: `request_more_info(question)` prints the question and waits for a one‑line user reply; forward that as the next turn’s user message (you can co‑send it alongside tool outputs in `input`).
* **Finish tool**: `finish(summary)` prints the summary and ends the current action (stay in the REPL unless in `-p` mode).
* **Self‑testing**: After code changes, the agent **must** call `run_bash` to run `swift build` (and `swift test` if tests exist), and also self‑invoke the CLI (`swift run swagent …`) to verify flags.
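The executor behind `run_bash` is small. A minimal sketch under the rules above (`bash -lc`, captured streams, `Int32` exit code); the Y/n gate and error reporting are left to the caller, and the type names are illustrative:

```swift
import Foundation

/// What the tool returns to the model, stringified into `function_call_output`.
struct BashResult: Codable {
    let stdout: String
    let stderr: String
    let exitCode: Int32
}

func runBash(_ command: String, cwd: String? = nil) throws -> BashResult {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/bash")
    process.arguments = ["-lc", command]
    if let cwd { process.currentDirectoryURL = URL(fileURLWithPath: cwd) }

    let out = Pipe()
    let err = Pipe()
    process.standardOutput = out
    process.standardError = err

    try process.run()
    process.waitUntilExit()   // blocking wait keeps the sketch simple

    return BashResult(
        stdout: String(decoding: out.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self),
        stderr: String(decoding: err.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self),
        exitCode: process.terminationStatus
    )
}

// Stringify before sending it back as the `output` field:
// let output = String(decoding: try JSONEncoder().encode(result), as: UTF8.self)
```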
**Streaming & progress**

* Always set `"stream": true` when calling `/v1/responses`. Show:

  * **Thinking spinner** after `response.created` until the first `response.output_text.delta`.
  * **Live text streaming** by writing each `delta` chunk immediately.
  * **Tool call progress** when you see a `function_call` (or its deltas): print the command preview; switch to **“⏳ running…”** while executing; resume streaming once you send `function_call_output`.
  * **Footer** on `response.completed` using `usage.*` and a monotonic timer.

  Event names and flow: see the Responses streaming & Realtime guides. ([OpenAI Platform][4])

**Sessions**

* Persist under `~/.swagent/<uuid>.json` via an `actor` (a sketch follows this list).
* Save: `previous_response_id`, the chain of response ids, per‑session token totals, timestamps.
* `--session <uuid>` loads and continues from file.
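A sketch of that session actor, assuming the fields listed above (the type and method names are illustrative):

```swift
import Foundation

struct SessionState: Codable {
    var id: UUID
    var previousResponseID: String?   // latest `previous_response_id` to chain from
    var responseIDs: [String]         // chain of response ids
    var inputTokens: Int              // per-session token totals
    var outputTokens: Int
    var updatedAt: Date
}

actor SessionStore {
    private let directory = FileManager.default.homeDirectoryForCurrentUser
        .appendingPathComponent(".swagent")

    private func fileURL(for id: UUID) -> URL {
        directory.appendingPathComponent("\(id.uuidString).json")
    }

    func save(_ state: SessionState) throws {
        try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
        try JSONEncoder().encode(state).write(to: fileURL(for: state.id), options: .atomic)
    }

    func load(id: UUID) throws -> SessionState {
        try JSONDecoder().decode(SessionState.self, from: Data(contentsOf: fileURL(for: id)))
    }
}
```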
**Config**

* Use `swift-configuration` to read `OPENAI_API_KEY` from the environment; mask it as `sk‑abc…wxyz` on startup. ([OpenAI Platform][2])

**Testing, format, lint**

* Use **Swift Testing** (built‑in with Xcode 26) for unit tests.
* Add `swift-format` + `SwiftLint` targets/scripts.

**Security**

* Never echo secrets.
* Treat dangerous commands conservatively when `yolo=false` (use `request_more_info`).

**Minimal JSON crib sheet (copy/paste)**

*Create (turn 1, with tools & streaming):*

```json
{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT + [swagent runtime]>",
  "input": "Create a Swift package and build it.",
  "tools": [
    { "type":"function","name":"run_bash","description":"Run bash","parameters":{
        "type":"object","properties":{"command":{"type":"string"},"cwd":{"type":"string"}},
        "required":["command"] }},
    { "type":"function","name":"request_more_info","parameters":{
        "type":"object","properties":{"question":{"type":"string"}},"required":["question"] }},
    { "type":"function","name":"finish","parameters":{
        "type":"object","properties":{"summary":{"type":"string"}},"required":["summary"] }}
  ],
  "tool_choice": "auto",
  "store": true,
  "stream": true
}
```

*Continue (turn 2, return tool result):*

```json
{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT + [swagent runtime]>",
  "previous_response_id": "resp_123",
  "input": [
    { "type": "function_call_output",
      "call_id": "call_abc",
      "output": "{\"stdout\":\"initialized…\",\"stderr\":\"\",\"exitCode\":0}" }
  ],
  "stream": true
}
```

Docs: Responses create, streaming events, migration guide (`function_call_output`), usage counters, conversation state. ([OpenAI Platform][1])

---

## 3) Step‑by‑step — 5 tiny stages (each 7–12 minutes), with streaming & checks

### Stage 1 — Minimal one‑shot + streaming

**Build**

* SPM executable target; enable `SwiftSetting.defaultIsolation(MainActor.self)` in `swiftSettings`.
* Deps: `swift-argument-parser`, `swift-configuration`.
* Implement a single **Responses** call with `"stream": true`; stream `response.output_text.delta` to stdout.
* Startup prints **2–3 cheeky lines** + the masked key.
* Flags: `--version`, `-v`.

**Checks**

* `swagent --version` → prints the version only.
* `swagent -v "Ping"` → shows cheeky lines, masked key, streams text live, then the footer `(in: X, out: Y, total: Z, 0m 00s)` from `usage`. Streaming/usage: see docs. ([OpenAI Platform][4])
* No key → clear single‑line error.

---

### Stage 2 — Sticky chat (REPL), `-p` one‑shot, runtime header

**Build**

* Interactive REPL; keep `-p` for one‑shot.
* Maintain state via `previous_response_id` + `store:true`.
* Always re‑send `instructions` and attach a `[swagent runtime]` header with `yolo`, `verbose`, `session`, `cwd`.

**Checks**

* The second user turn uses the first turn’s `previous_response_id` (verify in the logs with `-v`).
* `/new` clears state (the next call has no `previous_response_id`).
* Streaming remains active in both the REPL and `-p`.

Chaining: see the conversation‑state docs. ([OpenAI Platform][6])

---

### Stage 3 — Agent signals (finish / request_more_info), loop via `function_call_output`

**Build**

* Add two tools:

  * `finish(summary: string)`
  * `request_more_info(question: string)`
* Implement the function‑call loop:

  * Parse any `function_call` items.
  * For `request_more_info`, print the question and wait for input; continue by sending a user message item in `input` (you can send it alongside any `function_call_output` items).
  * For `finish`, print the summary and stop the action.

**Checks**

* Prompt: “Ask me one clarifying question, then summarize and finish.” → Model calls `request_more_info` → collects the answer → model calls `finish` → summary printed + footer.
* Confirm there’s **no** top‑level `tool_outputs`; only `input` items with `type:"function_call_output"` on continuations. ([OpenAI Platform][3])

---

### Stage 4 — Bash tool (guardrails), self‑testing, yolo awareness

**Build**

* Add `run_bash(command, cwd?)`:

  * Default approval: `Run? [Y/n]` (Enter = Yes).
  * `--yolo`: auto‑approve.
  * Execute via `bash -lc`; capture `{stdout, stderr, exitCode}`; **stringify** it as the `output` field in `function_call_output`.
* The **system prompt** and runtime header explicitly say: the agent **never** asks for permission; `yolo=true` means pre‑approved.
* After code changes, the agent **must** self‑test: `swift build`, optionally `swift test`, then `swift run swagent …`.

**Checks**

* `swagent --yolo -p "Echo hello"` → model calls `run_bash("echo hello")` immediately (no extra prompt), the CLI runs it, the continuation sends `function_call_output`, and the run finalizes with a reply + footer.
* `swagent -p "Echo hello"` (non‑yolo) → the agent still **does not** ask; the CLI prompts Y/n; the run completes.
* The tool loop uses `previous_response_id` + `input` items, streaming on. ([OpenAI Platform][4])

---

### Stage 5 — Sessions, `/status`, tests, format/lint

**Build**

* Persist sessions under `~/.swagent/<uuid>.json` using an `actor`.
* `/status` prints: masked key; per‑session token totals; **estimated** context left (model limit minus running total).
* On exit: *“To resume this session, call `swagent --session <uuid>`.”*
* Tests with **Swift Testing** for:

  * Arg parsing (`-v`, `--version`, `-p`, `--yolo`, `--session`).
  * Session store save/load roundtrip (concurrent writes protected by the actor).
  * Tool approval logic (Y/n default vs `--yolo`); see the sketch after these checks.
* Add `swift-format` and `SwiftLint` targets (`make fmt`, `make lint`, `make check`).

**Checks**

* Two turns, then `/status` shows totals; `/exit` persists a JSON containing the latest `previous_response_id`, cumulative `usage`, and timestamps.
* `--session <uuid>` resumes and continues chaining.
* `make check` runs format, lint, and tests cleanly.
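The approval logic is the easiest of those tests to pin down. A sketch with Swift Testing, where `approvesCommand` is a hypothetical stand‑in for the real gating function:

```swift
import Testing

// Hypothetical gate: Enter (empty reply) defaults to Yes; --yolo skips the prompt.
func approvesCommand(reply: String?, yolo: Bool) -> Bool {
    if yolo { return true }
    guard let reply, !reply.isEmpty else { return true }
    return reply.lowercased().hasPrefix("y")
}

@Test func yoloBypassesThePrompt() {
    #expect(approvesCommand(reply: nil, yolo: true))
}

@Test func enterDefaultsToYes() {
    #expect(approvesCommand(reply: "", yolo: false))
}

@Test func explicitNoIsRespected() {
    #expect(!approvesCommand(reply: "n", yolo: false))
}
```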
---

### Minimal streaming cURL (for the slides)

```bash
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-5-codex",
    "instructions": "…system prompt…",
    "input": "Say hello, slowly.",
    "stream": true
  }'
# Expect SSE events like: response.created, response.output_text.delta, response.completed
```

SSE event names and flow: Responses streaming docs (plus the Realtime guide for the event taxonomy). ([OpenAI Platform][4])

---

**References**

* Responses API — create & tools. ([OpenAI Platform][1])
* Streaming — SSE events for Responses. ([OpenAI Platform][4])
* Conversation state — `previous_response_id`. ([OpenAI Platform][6])
* Migration guide — `function_call_output` items. ([OpenAI Platform][3])
* Usage counters (snake_case). ([OpenAI Platform][5])

Want a tiny Swift snippet that shows parsing SSE lines and switching the UI between “🧠 thinking…”, streaming text, and tool execution?

[1]: https://platform.openai.com/docs/api-reference/responses/create?utm_cta=website-homepage-industry-card-media&utm_source=chatgpt.com "API Reference"
[2]: https://platform.openai.com/docs/api-reference/responses "OpenAI Platform"
[3]: https://platform.openai.com/docs/guides/migrate-to-responses "OpenAI Platform"
[4]: https://platform.openai.com/docs/api-reference/responses-streaming "OpenAI Platform"
[5]: https://platform.openai.com/docs/api-reference/usage "OpenAI Platform"
[6]: https://platform.openai.com/docs/guides/conversation-state "OpenAI Platform"
steipete revised this gist
Sep 30, 2025. 1 changed file with 269 additions and 224 deletions.
Nice—this feedback is gold. I baked **every missing specificity** into your stage prompts and the **one‑go build** so the agent has zero room to guess. Below you’ll find:

* a **Responses API “contract”** (exact payloads, shapes, key casing)
* a tighter **system prompt** (macOS 26, self‑testing, behavior when unsure)
* **stage acceptance checks** that force the model to actually **call `run_bash`** and verify
* an updated **one‑go build brief** with the full system prompt + a copy‑paste **crib sheet**

All references point to the official docs so agents don’t revert to legacy Chat Completions. ([OpenAI Platform][1])

---

## 🔒 Hardened API contract (copy/paste into your spec)

**Endpoint**

```http
POST https://api.openai.com/v1/responses
Authorization: Bearer $OPENAI_API_KEY
Content-Type: application/json
```

**Required fields & casing**

* `model` (e.g., `"gpt-5-codex"`)
* `instructions` (system‑like rules, string)
* `input` (the user turn, string or array of items—string is fine here)
* `store: true` if you plan to chain with `previous_response_id`
* **Tools (function calling)** go in `tools` as **top‑level** objects:

```json
{
  "type": "function",
  "name": "run_bash",
  "description": "Run a bash command and return stdout, stderr, exitCode.",
  "parameters": {
    "type": "object",
    "properties": {
      "command": { "type": "string" },
      "cwd": { "type": "string" }
    },
    "required": ["command"]
  }
}
```

> **Do not** use the old nested `function: { name, ... }` shape from Assistants. Responses uses **top‑level** `name/description/parameters`. ([OpenAI Platform][2])

**Conversation state**

* To continue a conversation **without** resending the whole transcript, pass `previous_response_id` on the next call.
* **Important:** `instructions` are **not carried over** with `previous_response_id`; **re‑send** your system prompt each turn. ([OpenAI Platform][3])

**Usage metrics (footer numbers)**

* Current casing: `usage.input_tokens`, `usage.output_tokens`, `usage.total_tokens` (snake_case). Use these for the per‑turn stats footer. ([OpenAI Platform][4])

**Output structure (what you’ll parse)**

* `output` is an **array of items**. Expect **assistant messages** and possible **function calls**:

  * Assistant text lives in a message item’s `content` as `{ "type": "output_text", "text": "..." }`. ([OpenAI Platform][3])
  * Function calls appear as items of type **`function_call`** with:

```json
{
  "type": "function_call",
  "call_id": "call_…",
  "name": "run_bash",
  "arguments": "{\"command\":\"swift build\"}"
}
```

  `arguments` is a **JSON string** (a decoding sketch follows). ([OpenAI Platform][5])

* To **return tool results**, create a follow‑up `responses.create` with:

  * the same `model`
  * `previous_response_id: "<id from the prior response>"`
  * `tool_outputs: [ { "call_id": "<call id>", "output": "<stringified JSON like { stdout, stderr, exitCode }>" } ]`
  * This yields a new response; continue until the model produces a final message or asks for more info. ([OpenAI Platform][6])
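Because `arguments` arrives as a JSON *string*, decoding it is a two‑step affair. A sketch before the full example (the field names mirror the item shape above; the Swift type names are illustrative):

```swift
import Foundation

/// One `function_call` item from `output`.
struct FunctionCallItem: Codable {
    let type: String        // "function_call"
    let call_id: String
    let name: String
    let arguments: String   // JSON-encoded string, e.g. {"command":"swift build"}
}

struct RunBashArgs: Codable {
    let command: String
    let cwd: String?
}

/// Step 2: decode the embedded JSON string into typed arguments.
func decodeRunBashArgs(_ item: FunctionCallItem) throws -> RunBashArgs {
    try JSONDecoder().decode(RunBashArgs.self, from: Data(item.arguments.utf8))
}
```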
**Minimal end‑to‑end example**

*Request (turn 1):*

```json
{
  "model": "gpt-5-codex",
  "instructions": "…swagent system rules…",
  "input": "Init a Swift package and build it.",
  "tools": [
    { "type":"function", "name":"run_bash", "description":"Run bash",
      "parameters":{ "type":"object","properties":{
        "command":{"type":"string"},
        "cwd":{"type":"string"}
      },"required":["command"] }},
    { "type":"function","name":"request_more_info",
      "parameters":{"type":"object","properties":{"question":{"type":"string"}},"required":["question"]}},
    { "type":"function","name":"finish",
      "parameters":{"type":"object","properties":{"summary":{"type":"string"}},"required":["summary"]}}
  ],
  "tool_choice": "auto",
  "store": true
}
```

*Response (turn 1 → includes a function call):*

```json
{
  "id": "resp_123",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [ { "type": "output_text", "text": "Creating the package, then building…" } ] },
    { "type": "function_call", "call_id": "call_abc",
      "name": "run_bash",
      "arguments": "{\"command\":\"swift package init --type executable\"}" }
  ],
  "usage": { "input_tokens": 395, "output_tokens": 57, "total_tokens": 452 }
}
```

*Request (turn 2 → return the tool output):*

```json
{
  "model": "gpt-5-codex",
  "instructions": "…swagent system rules…",
  "previous_response_id": "resp_123",
  "tool_outputs": [
    { "call_id": "call_abc",
      "output": "{\"stdout\":\"initialized…\",\"stderr\":\"\",\"exitCode\":0}" }
  ]
}
```

*Response (turn 2 → may include another function call or final text):*

```json
{
  "id": "resp_124",
  "output": [
    { "type":"function_call", "call_id":"call_def",
      "name":"run_bash", "arguments":"{\"command\":\"swift build\"}" }
  ],
  "usage": { "input_tokens": 117, "output_tokens": 12, "total_tokens": 129 }
}
```

…and so on until the model emits a message (with `output_text`) or calls `finish`. ([OpenAI Platform][1])

**Key deltas vs Chat Completions (so agents don’t regress)**

* `input` + `instructions` instead of `messages`
* tool calls are **items** (type `function_call`) with `call_id` + an `arguments` string
* tool **results** are sent via `tool_outputs` (not “tool messages”)
* conversation uses `previous_response_id` (not an array of past messages)

The docs walk through each difference. ([OpenAI Platform][5])

---

## 🧠 Behavior policy (no guesswork)

* **Default stance:** if confident, **answer & act**; if not, call `request_more_info(question)` with one precise question.
* **Self‑testing is mandatory:** after proposing any command, **call `run_bash`** (post‑approval / `--yolo`) and verify the results. If a step fails, iterate until fixed or ask for info.
* **Live calls please:** run the **real** API end‑to‑end (we’ll provide an `OPENAI_API_KEY`). No stubs during the workshop.

Function‑calling & tool‑loop details: see the Function Calling + Responses guides. ([OpenAI Platform][7])

---

## 🧾 System prompt (final, paste verbatim)

> **You are swagent**, a coding agent for terminal workflows.
> **Runtime:** **macOS 26 or later**.
> **Mission:** Build, run, and refine code + shell workflows; verify your work.
> **Behavior:**
>
> * Think step‑by‑step; prefer small diffs and working patches.
> * When you propose commands, you **must** call `run_bash` to execute them (after user approval) and confirm results.
> * If blocked, call `request_more_info(question)` with one precise, answerable question.
> * When done, call `finish(summary)` with a concise summary + next steps.
> * Don’t exfiltrate secrets; avoid destructive commands unless asked.
> * Output stays terminal‑friendly and concise.
>
> **Tools:**
>
> 1. `run_bash(command: string, cwd?: string)` → return `{stdout, stderr, exitCode}`.
> 2. `request_more_info(question: string)`
> 3. `finish(summary: string)`
>
> **API rules (Responses API):**
>
> * Use `model: gpt-5-codex`.
> * Re‑send these instructions every turn.
> * Chain turns with `previous_response_id`.
> * Tools are defined with top‑level `name/description/parameters` (JSON Schema).
> * Tool calls arrive as `function_call` items with a `call_id`; return results using `tool_outputs` with the **same** `call_id`.
> * Read `usage.input_tokens`, `usage.output_tokens`, `usage.total_tokens` for stats.

([OpenAI Platform][1])

---

## 🧪 Stage plan (only the deltas that changed, with **self‑test** baked in)

### Stage 1 — minimal one‑shot

* Startup: print **2–3 cheeky lines** + the masked key (`sk‑abc…wxyz`).
* One call to Responses; print the reply + footer `(in/out/total, time)`.
* **Checks (must pass)**

  * `swagent -v "ping"` → shows cheeky lines, masked key, HTTP 200 log, model text, usage footer.
  * Missing key → single‑line error.
  * **The usage footer uses snake_case fields** from `usage` (helper sketches follow the stage plan). ([OpenAI Platform][4])

### Stage 2 — sticky chat (`-p` still one‑shot)

* Re‑send the system prompt every turn; chain with `previous_response_id`.
* **Checks**

  * REPL: the second turn uses the first turn’s `previous_response_id`.
  * `/new` clears the chain; the next call has **no** `previous_response_id`. ([OpenAI Platform][8])

### Stage 3 — agent signals

* Add `finish(summary)` and `request_more_info(question)`; implement the tool loop.
* **Checks**

  * Prompt: “Ask me one clarifying question, then summarize and finish.” → Model calls `request_more_info` → you answer → model calls `finish` → summary printed + stats.
  * Verify: each function call got a **matching** `tool_outputs` entry with the same `call_id`. ([OpenAI Platform][5])

### Stage 4 — bash tool + guardrails

* Tool: `run_bash(command, cwd?)`; Y/n prompt (Enter = Yes); `--yolo` auto‑approves.
* Inherit the environment so the agent can run `swift build` and `swift run` with your real key available.
* **Checks (self‑test required)**

  * `swagent -p "Create hello.sh, chmod +x, run it"` → the agent actually **calls `run_bash`** and shows `stdout: hello`.
  * `swagent --yolo -p "swift --version"` → auto‑runs, returns the output, then `finish`.
  * Each tool call returned a **tool output** with the same `call_id`. ([OpenAI Platform][7])

### Stage 5 — sessions + `/status` + tests

* Persist `~/.swagent/<uuid>.json` (actor, async file I/O).
* `/status`: masked key; `usage` totals; **estimated** remaining context.
* `--session <uuid>` resumes; on exit: “To resume this session, call `swagent --session <uuid>`.”
* Tests with **Swift Testing** (assume it’s built in to your Xcode 26 setup).
* **Checks**

  * Two turns → `/status` shows snake_case usage fields; `/exit` writes JSON with `previous_response_id`.
  * `--session <uuid>` resumes and chains from the file’s `previous_response_id`. ([OpenAI Platform][3])
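The stages keep referring to the masked key and the usage footer; here is a sketch of both helpers (the names are illustrative, and the snake_case properties deliberately mirror the API’s `usage` fields):

```swift
import Foundation

/// "sk-proj-…" → "sk-…wxyz" (first 3 + last 4, per the startup spec).
func maskKey(_ key: String) -> String {
    guard key.count > 7 else { return "sk-…" }
    return "\(key.prefix(3))…\(key.suffix(4))"
}

/// Mirrors the Responses API `usage` object (snake_case on purpose).
struct Usage: Codable {
    let input_tokens: Int
    let output_tokens: Int
    let total_tokens: Int
}

func footer(_ usage: Usage, seconds: Int) -> String {
    "(in: \(usage.input_tokens), out: \(usage.output_tokens), " +
        "total: \(usage.total_tokens) tokens, \(seconds / 60)m \(String(format: "%02d", seconds % 60))s)"
}
```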
---

## 🚀 One‑go build brief (give this to the model)

> **Project:** `swagent`
> **Env:** Swift 6.2 (SPM `.defaultIsolation(MainActor.self)`), macOS host
> **Deps:** `swift-argument-parser`, `apple/swift-configuration`, built‑in **Swift Testing**, plus **swift-format** and **SwiftLint**
> **System prompt:** *(paste the “System prompt (final)” above verbatim)*
> **Startup UX:** print 2–3 cheeky lines (random) + the masked API key (first 3 + last 4).
> **Flags:** `-v/--verbose`, `--version`, `-p <prompt>` (one‑shot UI; internal tool loop), `--yolo`, `--session <uuid>`.
> **Chat & sessions:** interactive REPL with `/new` `/clear` `/status` `/exit`; chain with `previous_response_id` + `store:true`; **always include** the system prompt each turn; persist under `~/.swagent/<uuid>.json`.
> **Tools (function calling):**
>
> * `run_bash(command, cwd?)` → prompt Y/n unless `--yolo`; execute with `bash -lc`; return `{stdout, stderr, exitCode}` (stringified JSON) via `tool_outputs` bound to the **same `call_id`**.
> * `request_more_info(question)` → ask the user, then continue.
> * `finish(summary)` → end and print the summary.
>
> **Per‑turn stats:** print `(in: N, out: M, total: T tokens, 0m 00s)` from `usage`.
> **Self‑test (mandatory):**
>
> 1. After generating code, **call `run_bash`** to run `swift build`.
> 2. If tests exist, **call `run_bash`** to run `swift test`.
> 3. **Call `run_bash`** to run:
>
>    * `swift run swagent --version`
>    * `swift run swagent -p "Echo hello"`
>    * `swift run swagent --yolo -p "Create hello.sh and run it"`
> 4. On any failure, inspect `stderr`, fix, and retry.
>
> **Responses API (use exactly this shape):** tools are **top‑level** (`type/name/description/parameters`); tool calls arrive as `function_call` items with `call_id` + `arguments` (a JSON string); tool results go in `tool_outputs`; usage fields are `input_tokens`, `output_tokens`, `total_tokens`. Use `previous_response_id` for chaining and **re‑send** the system prompt each turn.
> **Docs:** API Reference (Responses), Function Calling, Conversation State / `previous_response_id`, streaming/`output_text`, migration notes. ([OpenAI Platform][1])

---

## 📎 Responses API crib sheet (drop right into your repo/readme)

**Request (with tools):**

```json
{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT HERE>",
  "input": "<USER INPUT HERE>",
  "tools": [
    { "type":"function","name":"run_bash","description":"Run bash","parameters":{
        "type":"object","properties":{"command":{"type":"string"},"cwd":{"type":"string"}},
        "required":["command"] }},
    { "type":"function","name":"request_more_info","parameters":{
        "type":"object","properties":{"question":{"type":"string"}},"required":["question"] }},
    { "type":"function","name":"finish","parameters":{
        "type":"object","properties":{"summary":{"type":"string"}},"required":["summary"] }}
  ],
  "tool_choice": "auto",
  "store": true
}
```

**Response (snippet):**

```json
{
  "id": "resp_abc",
  "output": [
    { "type":"message", "role":"assistant",
      "content":[{ "type":"output_text", "text":"…" }] },
    { "type":"function_call", "call_id":"call_123",
      "name":"run_bash", "arguments":"{\"command\":\"swift build\"}" }
  ],
  "usage": { "input_tokens": 123, "output_tokens": 45, "total_tokens": 168 }
}
```

**Continue with tool output:**

```json
{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT HERE>",
  "previous_response_id": "resp_abc",
  "tool_outputs": [
    { "call_id":"call_123", "output":"{\"stdout\":\"…\",\"stderr\":\"\",\"exitCode\":0}" }
  ]
}
```

Refs for each bit: Responses API reference, `previous_response_id`, the function‑calling loop, `output_text`, `usage` fields. ([OpenAI Platform][1])

---

## 💬 Startup cheeky lines (pool)

* “🎩 I code therefore I am.”
* “⚡ One prompt. One shot. Make it count.”
* “🧪 If it compiles, we ship. Mostly.”
* “🐚 Bashful? I’m not.”
* “🔧 Small diffs, big wins.”

---

Want me to fold these exact blocks into your Stage 1–5 “paste‑to‑build” prompts so you can run the workshop straight from slides?
[1]: https://platform.openai.com/docs/api-reference/responses "OpenAI Platform"
[2]: https://platform.openai.com/docs/quickstart?utm_source=chatgpt.com "Developer quickstart - OpenAI API"
[3]: https://platform.openai.com/docs/api-reference/responses?utm_source=chatgpt.com "API Reference"
[4]: https://platform.openai.com/docs/guides/structured-outputs?utm_source=chatgpt.com "Structured model outputs - OpenAI API"
[5]: https://platform.openai.com/docs/guides/migrate-to-responses?utm_source=chatgpt.com "Migrate to the Responses API"
[6]: https://platform.openai.com/docs/guides/function-calling?utm_source=chatgpt.com "Function calling - OpenAI API"
[7]: https://platform.openai.com/docs/guides/tools?utm_source=chatgpt.com "Using tools - OpenAI API"
[8]: https://platform.openai.com/docs/guides/text?utm_source=chatgpt.com "Text generation - OpenAI API"
steipete created this gist
Sep 30, 2025.
Perfect—let’s tune the plan so it’s workshop‑friendly, shows **a few cheeky lines** on startup, **doesn’t print compiler flags**, keeps **macOS 26** only in the **system prompt**, and **expands** the checks so the model + humans both have more to chew on.

---

## What changed (quick)

* Startup prints **multiple cheeky lines** + the masked API key; **no mention of Swift 6.2 or MainActor** in the output.
* The **system prompt** now explicitly contains **“Runtime: macOS 26 or later”** and is more detailed (pasted in full below).
* **Stage acceptance checks** beefed up with explicit I/O, example transcripts, and file expectations.
* The **all‑in‑one (one‑go) build prompt** now embeds the **full system prompt**.

> Build knobs you still apply in code: **Swift 6.2** + **default MainActor isolation** via SPM `SwiftSetting.defaultIsolation(MainActor.self)`; conversation chaining via the **Responses API** with `previous_response_id`; **function tools** for finish/ask‑for‑info; and a **bash tool** with Y/n gating or `--yolo`. ([Swift.org][1])

---

## Stage plan (5 steps, updated + expanded checks)

### 1) “Hello, swagent” — minimal one‑shot

**Build scope**

* Swift 6.2; set default isolation at module level in **SPM**:

```swift
// swift-tools-version: 6.2
// ...
.executableTarget(
    name: "swagent",
    // ...
    swiftSettings: [
        .defaultIsolation(MainActor.self)  // SwiftPM 6.2
    ]
)
```

*Docs:* Swift 6.2 main‑actor default option; SPM `defaultIsolation`. ([Swift.org][1])

* Deps: `swift-argument-parser` (CLI), `swift-configuration` (reads `OPENAI_API_KEY` from the env). ([Apple GitHub][2])
* Call the **OpenAI Responses API** (`model: gpt-5-codex`) once; print the reply. Show token usage from `usage` + elapsed time (a request‑builder sketch follows the checks below). ([OpenAI Platform][3])

**Runtime UX**

* On launch, print **2–3 cheeky lines** (randomly sampled) + the masked API key (`sk‑abc…def0`).
* Flags: `--version`, `-v` (verbose HTTP codes + timings).

**Cheeky lines pool (example)**

```
• 🎩 “I code therefore I am. Hit me.”
• 🧰 “Tabs, spaces, or chaos? Your call.”
• ⚡ “One prompt. One shot. Make it count.”
• 🧪 “If it compiles, we ship. Kidding. Mostly.”
• 🐚 “Bashful? I’m not.”
```

**Expanded checks**

* **Env**: with a key → shows the masked key; without → prints a clear error about the missing `OPENAI_API_KEY` (no stacktrace).
* **CLI**:

  * `swagent --version` → semantic version line only.
  * `swagent -v "What’s 2+2?"` → prints the cheeky intro (2–3 lines), masked key, HTTP 200 in the verbose log, model text, and a footer `(in: X, out: Y, total: Z tokens, 0m 01s)`.
* **Failure paths**: a network error surfaces as a single‑line diagnostic in `-v` mode; non‑`-v` shows a brief “request failed (HTTP NNN)”.

References for the API, tokens, and arg parsing. ([OpenAI Platform][3])
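A sketch of the Stage 1 request builder; the endpoint, headers, and body fields come straight from the Responses API contract used throughout this guide, while the function name and force‑unwrapped URL are illustrative shortcuts:

```swift
import Foundation

func makeResponsesRequest(apiKey: String, instructions: String, prompt: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/responses")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-5-codex",
        "instructions": instructions,   // re-sent on every turn
        "input": prompt,
        "store": true                   // enables chaining via previous_response_id
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}

// Send with URLSession, e.g.:
// let (data, _) = try await URLSession.shared.data(for: request)
// then read the reply text and `usage` out of the response JSON.
```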
---

### 2) “Sticky chat” — interactive REPL + one‑shot `-p`

**Build scope**

* Add a REPL (loop until `/exit`).
* Keep one‑shot via `-p "…"`.
* Maintain conversation by passing **`previous_response_id`** each turn. **Always resend your system prompt** each request (instructions aren’t auto‑carried). ([OpenAI Platform][4])

**Expanded checks**

* **REPL basics**:

  * Start `swagent` → cheeky lines + masked key → prompt `›`.
  * Type `Hello` → the model replies; prints per‑turn `(tokens, time)`.
* **Commands**:

  * `/new` (alias `/clear`) → response chain reset; the next call has **no** `previous_response_id`.
  * `/exit` → exits the process.
  * `/status` (preview, wired in Stage 5) → prints a “not persisted yet” message in Stage 2.
* **One‑shot**: `swagent -p "Summarize rust vs swift"` → one response, stats footer, exit.
* **State**: verify the second turn includes the **`previous_response_id`** of the first. (You’ll see more `in:` tokens owing to chaining.) ([OpenAI Platform][4])

---

### 3) “Agent signals” — finish / ask‑for‑info tools

**Build scope**

* Add **two function tools** (Responses API `tools`):

  * `finish(summary: string)`
  * `request_more_info(question: string)`
* Implement the tool‑call loop: when a tool is called, send its **tool output** back (bound to the exact `tool_call_id`), then continue. ([OpenAI Platform][5])

**Tool JSON schemas (sketch)**

```json
{
  "type": "function",
  "name": "finish",
  "description": "Signal task completion with a short summary and next steps.",
  "parameters": {
    "type": "object",
    "properties": { "summary": { "type": "string" } },
    "required": ["summary"]
  }
}
```

```json
{
  "type": "function",
  "name": "request_more_info",
  "description": "Ask the user for missing information to proceed.",
  "parameters": {
    "type": "object",
    "properties": { "question": { "type": "string" } },
    "required": ["question"]
  }
}
```

**Expanded checks**

* Prompt: “Draft a minimal README, ask me one clarifying question, then finish.”

  * Model calls `request_more_info` → CLI prints the question and waits for user input → you answer → the model continues → the model calls `finish` with a summary → CLI prints the summary + stats and returns to the prompt (REPL) or exits (`-p`).
* **Verify**: every assistant tool call is followed by a matching **tool output** before continuing (this is required by function‑calling semantics). If you skip it, you’ll hit tool‑output errors. ([OpenAI Platform][5])

---

### 4) “Run commands” — bash tool with guardrails

**Build scope**

* Add a `run_bash(command: string, cwd?: string)` tool:

  * On invocation, print the proposed command and ask **`Run? [Y/n]`** (Enter = Yes).
  * `--yolo` auto‑approves.
  * Execute with `bash -lc "<command>"`; capture `stdout`, `stderr`, `exitCode`; return them as the tool output.
* `-p` mode: still **one‑shot to the user**, but the **agent may loop internally** across tools until it calls `finish` or `request_more_info`.

**Expanded checks**

* `swagent -p "Create hello.sh that prints hello, make it executable, run it"`

  * Shows the planned commands, asks Y/n, runs, shows `stdout: hello`.
  * On failure (non‑zero exit), the model sees `exitCode != 0` + `stderr` and retries.
* `swagent --yolo -p "swift --version"` → executes without a prompt; output returned to the model; prints the final message + stats.
* **Audit**: ensure each tool call’s **tool output** uses the **same `tool_call_id`** field the model provided (a Responses API requirement). ([OpenAI Platform][5])

---

### 5) “Sessions, polish, tests” — persistence + status + lint/format

**Build scope**

* Persist sessions under `~/.swagent/<uuid>.json` via an **actor** (async file I/O).
* `/status` prints:

  * the masked API key,
  * totals for **input/output/total tokens** this session,
  * **estimated context remaining** (based on the model context limit and the running total).
* `--session <uuid>` resumes a saved chain.
* On exit: print *“To resume this session, call `swagent --session <uuid>`.”*
* Tests with **Swift Testing** (Xcode 26 includes it). ([GitHub][6])
* Formatting/linting: **swift-format** + **SwiftLint** targets or scripts. ([GitHub][7])
**Expanded checks**

* **Persistence**:

  * Start a chat, send two turns → exit → check that `~/.swagent/<uuid>.json` exists and includes: the latest `response_id`, cumulative `usage`, timestamps.
  * `swagent --session <uuid>` → prints the greeting + a “session loaded” note, then the REPL prompt. The next turn continues with the `previous_response_id` from the file. ([OpenAI Platform][4])
* **/status** shows something like:

  ```
  Session: 3B83C2A2-…-8F
  API key: sk-abc…f789
  Tokens used: in=1,820 out=980 total=2,800
  Context headroom (est.): ~170k tokens left
  ```

  (Token counts come from the **Responses API `usage`**; the context estimate is the model limit minus running input/output tokens; show it as an estimate.) ([OpenAI Platform][8])
* **Tests**:

  * `CLITests`: `--version`, `-p`, `--yolo`, `--session`.
  * `SessionStoreTests`: save/load round‑trip; concurrent reads/writes guarded by the actor.
  * `ToolApprovalTests`: Y/n prompt default acceptance; `--yolo` bypass.
* **Lint/format**: `make fmt lint` passes (no warnings on default rules).

---

## System prompt (paste verbatim)

> **You are swagent**, a focused coding agent optimized for terminal workflows.
> **Runtime:** **macOS 26 or later**.
> **Mission:** Help the user build, run, and refine code and shell workflows efficiently.
> **Behavior rules:**
>
> * Think step‑by‑step; propose small diffs; prefer minimal, working patches.
> * Only run shell commands via the `run_bash` tool, after clearly proposing what will run.
> * If missing info, call `request_more_info(question)`.
> * When done, call `finish(summary)` with a concise summary + next steps.
> * Never print or exfiltrate secrets; avoid destructive commands unless explicitly asked.
> * Keep answers **terminal‑friendly** and concise.
>
> **Tools available:**
>
> 1. `run_bash(command: string, cwd?: string)` — execute a shell command and read its output.
> 2. `request_more_info(question: string)` — ask the user for specifics and wait.
> 3. `finish(summary: string)` — signal completion and stop.
>
> **Conversation:** You are part of a chat. Treat each turn as a continuation. When the user uses one‑shot `-p`, you may internally loop tool calls until you either `finish` or must `request_more_info`.

*(Note: we keep the OS here; we intentionally don’t print compiler flags or isolation mode at runtime.)*

---

## One‑go build prompt (full brief, **now includes the system prompt**)

> **Project name:** `swagent`
> **Environment:** Swift 6.2 with **default MainActor isolation** (via `.defaultIsolation(MainActor.self)` in SPM), **macOS host**.
> **Dependencies:** `swift-argument-parser`, `apple/swift-configuration`; built‑in **Swift Testing**; plus **swift-format** and **SwiftLint**.
> **System prompt to embed (verbatim):**
> *[insert the “System prompt (paste verbatim)” block above]*
> **Build & UX requirements:**
>
> * **Startup:** print 2–3 cheeky lines (random), then the masked API key (first 3 + last 4). No mention of compiler flags or isolation modes.
> * **Flags:** `-v/--verbose` (extra logs + HTTP status), `--version`, `-p <prompt>` (one‑shot to the user; internal tool loop), `--yolo` (auto‑approve tools), `--session <uuid>`.
> * **Chat:** interactive REPL with `/new` `/clear` `/status` `/exit`. Maintain state via **`previous_response_id`** and **`store:true`**; **always** include the **system prompt** each turn.
> * **Tools (function calling):**
>
>   * `run_bash(command, cwd?)` → ask **Y/n** per call (Enter = Yes) unless `--yolo`. Run via `bash -lc`. Return `{stdout, stderr, exitCode}` bound to the same `tool_call_id`.
>   * `request_more_info(question)` → print the question; wait for user input.
>   * `finish(summary)` → end; print the summary.
> * **Per‑turn stats:** print `(in: N, out: M, total: T tokens, 0m 00s)` using the **Responses API `usage`** and a monotonic timer.
> * **Sessions:** persist under `~/.swagent/<uuid>.json`; `/status` shows the masked key, token totals, and an **estimated** remaining context; on exit: *“To resume this session, call `swagent --session <uuid>`.”*
> * **Engineering:** idiomatic Swift 6.2; strict concurrency (MainActor by default); state in `actor`s; DTOs `Sendable`; clean terminal text output; `swift test` with **Swift Testing**; `swift-format`/`SwiftLint` wired via scripts or SPM plugins.
> * **Responses API details to follow precisely:** function tools + tool outputs, chaining via `previous_response_id`, `usage` tokens. ([OpenAI Platform][3])

---

## Snippets you can drop in

**Cheeky greetings helper**

```swift
enum Greetings {
    static let pool: [[String]] = [
        ["🎩 I code therefore I am.", "⚡ One prompt. One shot.", "🧰 Tabs, spaces, or chaos?"],
        ["🧪 If it compiles, we ship.", "🐚 Bashful? I’m not.", "📦 Got packages? I do."],
        ["🤖 Ship it?", "🔧 Small diffs, big wins.", "🧭 Point me at a repo."]
    ]
    static func random() -> [String] { pool.randomElement() ?? ["👋 Hey"] }
}
```

**Tool schemas (Swift types → JSON)**

```swift
struct RunBashArgs: Codable { let command: String; let cwd: String? }
struct MoreInfoArgs: Codable { let question: String }
struct FinishArgs: Codable { let summary: String }
```

Use these to build `tools: [...]` for the Responses API request. ([OpenAI Platform][3])

**Session file shape (`~/.swagent/<uuid>.json`)**

```json
{
  "id": "3B83C2A2-...-8F",
  "createdAt": "2025-09-30T19:12:03Z",
  "previous_response_id": "resp_abc123",
  "usage": { "input_tokens": 1820, "output_tokens": 980, "total_tokens": 2800 },
  "history": ["resp_abc123", "resp_def456"]
}
```

**Makefile mini**

```makefile
fmt: ; swift-format --in-place --recursive Sources Tests
lint: ; swiftlint
test: ; swift test -v
check: fmt lint test
```

Formatter / linter docs. ([GitHub][7])

---

## Why the model will behave better with this setup

* **Richer acceptance checks** = clearer affordances for tool loops, approval prompts, and session continuity (what to do next is unambiguous).
* The **system prompt now encodes the runtime (macOS 26)** and strict tool etiquette, so the assistant knows how to “play by the rules” in every turn.
* **SPM‑level default isolation** handles the concurrency foot‑guns without cluttering runtime output. ([Swift.org][1])

---

## Sources

* Swift 6.2 (default MainActor isolation & single‑threaded option). ([Swift.org][1])
* SPM `SwiftSetting.defaultIsolation` docs (PackageDescription 6.2). ([Swift Documentation][9])
* Swift Testing (toolchain framework). ([GitHub][6])
* swift-configuration (env provider). ([GitHub][10])
* ArgumentParser docs. ([Apple GitHub][2])
* Responses API reference; function calling; conversation state & `previous_response_id`; usage tokens. ([OpenAI Platform][3])
* swift-format; SwiftLint. ([GitHub][7])

Want me to also push a tiny repo skeleton with the `Package.swift`, `CLI.swift`, the `SessionStore` actor, and a **Swift Testing** smoke test so you can kick off Stage 1 instantly?
[1]: https://swift.org/blog/swift-6.2-released/?utm_source=chatgpt.com "Swift 6.2 Released"
[2]: https://apple.github.io/swift-argument-parser/documentation/argumentparser/?utm_source=chatgpt.com "ArgumentParser | Documentation - Apple"
[3]: https://platform.openai.com/docs/api-reference/responses?utm_source=chatgpt.com "API Reference"
[4]: https://platform.openai.com/docs/guides/conversation-state?utm_source=chatgpt.com "Conversation state - OpenAI API"
[5]: https://platform.openai.com/docs/guides/function-calling?utm_source=chatgpt.com "Function calling - OpenAI API"
[6]: https://github.com/swiftlang/swift-testing?utm_source=chatgpt.com "swiftlang/swift-testing: A modern, expressive ..."
[7]: https://github.com/swiftlang/swift-format?utm_source=chatgpt.com "Formatting technology for Swift source code"
[8]: https://platform.openai.com/docs/api-reference/usage?utm_source=chatgpt.com "API Reference"
[9]: https://docs.swift.org/swiftpm/documentation/packagedescription/swiftsetting/defaultisolation%28_%3A_%3A%29/?utm_source=chatgpt.com "defaultIsolation(_:_:)"
[10]: https://github.com/apple/swift-configuration?utm_source=chatgpt.com "apple/swift-configuration: API package for reading ..."