
@jawond
Forked from steipete/swagent-spec.md
Created September 30, 2025 21:27
Working Prompt for AI Agent Workshop at Swift Connection 2025

Below is a clean, workshop‑ready guide for swagent, split into three parts.


1) Docs — the exact contract (Responses API, tools, streaming, chaining)

Endpoints

  • Create/continue a response: POST https://api.openai.com/v1/responses Headers: Authorization: Bearer $OPENAI_API_KEY, Content-Type: application/json. (OpenAI Platform)

Core request fields

  • model: "gpt-5-codex".
  • instructions: your system rules (string). Re‑send them on every turn.
  • input: string or an array of items (e.g., user message, function call outputs).
  • store: true if you’ll chain turns later with previous_response_id. (OpenAI Platform)

Tools (function calling)

  • Send tools as top‑level objects in tools with this shape:

    {
      "type": "function",
      "name": "run_bash",
      "description": "Run a bash command and return stdout, stderr, exitCode.",
      "parameters": {
        "type": "object",
        "properties": {
          "command": { "type": "string" },
          "cwd": { "type": "string" }
        },
        "required": ["command"]
      }
    }
  • You can let the model choose with "tool_choice": "auto". (OpenAI Platform)
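A minimal Swift sketch of this tool shape as Codable types. The type names (ToolDef, ParamSchema, Prop) are our own, not SDK types; only the encoded wire format matters:

```swift
import Foundation

// Hypothetical Codable mirror of the Responses API tool shape.
struct ToolDef: Codable {
    let type: String            // always "function"
    let name: String
    let description: String?
    let parameters: ParamSchema
}

struct ParamSchema: Codable {
    let type: String            // "object"
    let properties: [String: Prop]
    let required: [String]
}

struct Prop: Codable {
    let type: String            // JSON Schema primitive, e.g. "string"
}

let runBash = ToolDef(
    type: "function",
    name: "run_bash",
    description: "Run a bash command and return stdout, stderr, exitCode.",
    parameters: ParamSchema(
        type: "object",
        properties: ["command": Prop(type: "string"), "cwd": Prop(type: "string")],
        required: ["command"]
    )
)

let toolJSON = String(data: try! JSONEncoder().encode(runBash), encoding: .utf8)!
```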

Function‑call loop (no tool_outputs param)

  1. First call: model may return items of type: "function_call" in output with call_id, name, and arguments (JSON string).

  2. Run the tool locally.

  3. Continue the run by calling POST /v1/responses again with:

    • previous_response_id: the prior response id

    • instructions: the same system rules

    • input: an array of items, each

      {
        "type": "function_call_output",
        "call_id": "<same id>",
        "output": "<stringified JSON like { stdout, stderr, exitCode }>"
      }

This is how you return tool results. Don’t send a top‑level tool_outputs field. (OpenAI Platform)
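The continuation body can be sketched in Swift as plain JSON construction; note the tool result rides inside `input`, and there is no top‑level `tool_outputs` key (the function name here is our own):

```swift
import Foundation

// Sketch of the continuation request body: tool results go back as
// `function_call_output` items in `input`, never as a top-level field.
func continuationBody(previousResponseID: String,
                      instructions: String,
                      callID: String,
                      toolResult: [String: Any]) throws -> [String: Any] {
    let outputData = try JSONSerialization.data(withJSONObject: toolResult)
    let outputString = String(data: outputData, encoding: .utf8)!   // stringified JSON
    return [
        "model": "gpt-5-codex",
        "instructions": instructions,
        "previous_response_id": previousResponseID,
        "stream": true,
        "input": [[
            "type": "function_call_output",
            "call_id": callID,
            "output": outputString
        ]]
    ]
}

let body = try! continuationBody(previousResponseID: "resp_123",
                                 instructions: "<system prompt>",
                                 callID: "call_abc",
                                 toolResult: ["stdout": "ok", "stderr": "", "exitCode": 0])
```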

Streaming (SSE)

  • Set "stream": true to get Server‑Sent Events while the model is thinking. You’ll receive events such as:

    • response.created (start)
    • response.output_text.delta (text chunks)
    • response.function_call_arguments.delta (incremental function arguments)
    • response.completed (final object, includes usage)

Handle errors via response.failed / error events. (OpenAI Platform)

Usage & token counters

  • Use usage.input_tokens, usage.output_tokens, usage.total_tokens (snake_case) to print per‑turn stats. These arrive on the final response (or response.completed in streaming). (OpenAI Platform)

Conversation state

  • To continue a chat without resending past text, set previous_response_id and re‑send your instructions. You may also pass prior output items explicitly if needed. (OpenAI Platform)

Progress signal taxonomy (what to show in the CLI)

  • Before the first output token: “🧠 thinking…” (spinner) once you receive response.created.
  • While streaming text: live print each response.output_text.delta.
  • When the model starts a tool: “🔧 run_bash …” as soon as you see response.function_call_arguments.delta / the final function_call item.
  • While executing the tool: “⏳ running command…” until you post the function_call_output and the model resumes.
  • On finalization: “✅ done” once response.completed arrives, then print the footer with usage. (OpenAI Platform)
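The taxonomy above can be sketched as a single mapping from streaming event type to progress label. The event names follow the Responses streaming docs; the labels and function name are ours:

```swift
// Maps streaming event types to the CLI progress signals described above.
// Returning nil means "no label": either print the delta text itself,
// or ignore the event.
func progressLabel(for eventType: String) -> String? {
    switch eventType {
    case "response.created":
        return "🧠 thinking…"
    case "response.output_text.delta":
        return nil                      // print the delta text itself, no label
    case "response.function_call_arguments.delta":
        return "🔧 run_bash …"
    case "response.completed":
        return "✅ done"
    default:
        return nil
    }
}
```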

2) Full instructions — build the whole CLI in one pass

Project: swagent

Language/Tooling: Swift 6.2 with SwiftSetting.defaultIsolation(MainActor.self) enabled via SPM; dependencies: swift-argument-parser, apple/swift-configuration; built‑in Swift Testing (Xcode 26); add swift-format and SwiftLint.

Model/API: OpenAI Responses API, model: gpt-5-codex, streaming on. (OpenAI Platform)

Startup UX

  • Print 2–3 cheeky lines (random) and the masked API key (first 3 + last 4).

  • Examples:

    • “🎩 I code therefore I am.”
    • “⚡ One prompt. One shot. Make it count.”
    • “🔧 Small diffs, big wins.”
    • “🧪 If it compiles, we ship. Mostly.”
    • “🐚 Bashful? I’m not.”

Flags

  • -v, --verbose — extra logs (HTTP status, timings).
  • --version — print version.
  • -p <prompt> — one‑shot user interaction; internally the agent may loop via tools until finish or it needs info.
  • --yolo — auto‑approve all shell commands (no interactive Y/n).
  • --session <uuid> — load a persisted session.

Commands

  • /new or /clear — reset conversation state.
  • /status — show masked key, token totals this session, estimated remaining context.
  • /exit — quit; print: “To resume this session, call swagent --session <uuid>.”

System prompt (embed verbatim in instructions every turn)

You are swagent, a coding agent for terminal workflows. Runtime: macOS 26 or later. Mission: Build, run, and refine code + shell workflows; verify your work. Behavior:

  • Think step‑by‑step; prefer small diffs and working patches.
  • When you propose commands, call run_bash to execute them; never ask the user to confirm (the CLI handles approvals).
  • If the runtime says yolo=true, treat commands as pre‑approved and run immediately.
  • If yolo=false and a command is destructive/ambiguous, call request_more_info(question) once; otherwise, just run_bash.
  • When done, call finish(summary) with a concise summary + next steps.
  • Keep output terminal‑friendly and concise; never print secrets.

Tools:

  1. run_bash(command: string, cwd?: string) → returns {stdout, stderr, exitCode}.
  2. request_more_info(question: string)
  3. finish(summary: string)

Responses API rules:
  • Use model: gpt-5-codex.
  • Re‑send these instructions every turn.
  • Chain with previous_response_id.
  • Tools are top‑level { type:'function', name, description, parameters }.
  • Tool calls arrive as output items of type:'function_call' with a call_id. Return results by continuing with previous_response_id and sending input: [{ "type":"function_call_output", "call_id":"<same>", "output":"<stringified JSON>" }].
  • Read usage.input_tokens, usage.output_tokens, usage.total_tokens for per‑turn stats.

[swagent runtime] yolo=true|false verbose=true|false session=<uuid> cwd=<path> (OpenAI Platform)

Runtime header

Append the [swagent runtime] block above to instructions every turn (so the agent knows about yolo, etc.). (OpenAI Platform)
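Composing the header is a one‑liner; a sketch (function name is ours):

```swift
// Appends the [swagent runtime] block to the system prompt each turn,
// so the model always sees the current yolo/verbose/session/cwd state.
func instructionsWithRuntime(systemPrompt: String,
                             yolo: Bool, verbose: Bool,
                             session: String, cwd: String) -> String {
    """
    \(systemPrompt)

    [swagent runtime] yolo=\(yolo) verbose=\(verbose) session=\(session) cwd=\(cwd)
    """
}

let fullInstructions = instructionsWithRuntime(systemPrompt: "You are swagent…",
                                               yolo: true, verbose: false,
                                               session: "123e4567", cwd: "/tmp")
```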

Tooling & policies

  • Bash tool: Implement run_bash(command, cwd?) via bash -lc. By default, prompt Run? [Y/n] (Enter=Yes). With --yolo, auto‑approve. Return {stdout, stderr, exitCode} (JSON), but stringify it before sending as function_call_output.
  • Ask‑for‑info tool: request_more_info(question) prints the question and waits for a one‑line user reply; forward that as the next turn’s user message (you can co‑send alongside tool outputs in input).
  • Finish tool: finish(summary) prints the summary and ends the current action (stay in REPL unless in -p mode).
  • Self‑testing: After code changes, the agent must call run_bash to run swift build (and swift test if tests exist), and also self‑invoke the CLI (swift run swagent …) to verify flags.
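The bash tool above can be sketched with Foundation's Process. This is a minimal version (no timeout, no output truncation) that executes via `bash -lc` and returns the stringified JSON expected in the `output` field:

```swift
import Foundation

struct BashResult: Codable {
    let stdout: String
    let stderr: String
    let exitCode: Int32
}

// Sketch of run_bash: executes via `bash -lc`, captures both streams,
// and returns the JSON string to send as `function_call_output`.
func runBash(_ command: String, cwd: String? = nil) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/bash")
    process.arguments = ["-lc", command]
    if let cwd { process.currentDirectoryURL = URL(fileURLWithPath: cwd) }

    let out = Pipe(), err = Pipe()
    process.standardOutput = out
    process.standardError = err
    try process.run()
    process.waitUntilExit()

    let result = BashResult(
        stdout: String(data: out.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? "",
        stderr: String(data: err.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? "",
        exitCode: process.terminationStatus
    )
    return String(data: try JSONEncoder().encode(result), encoding: .utf8)!
}
```

A production version would also stream output and guard against very large results before stringifying.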

Streaming & progress

  • Always set "stream": true when calling /v1/responses. Show:

    • Thinking spinner after response.created until first response.output_text.delta.
    • Live text streaming by writing each delta chunk immediately.
    • Tool call progress when you see a function_call (or its deltas): print the command preview; switch to “⏳ running…” while executing; resume streaming once you send function_call_output.
    • Footer on response.completed using usage.* and a monotonic timer. Event names and flow: see Responses streaming & Realtime guides. (OpenAI Platform)

Sessions

  • Persist under ~/.swagent/<uuid>.json via an actor.
  • Save: previous_response_id, chain of response ids, per‑session token totals, timestamps.
  • --session <uuid> loads and continues from file.
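A sketch of the on‑disk session shape as Codable JSON (field names are our own guesses at what the spec's list implies); in the CLI an actor would serialize access around these free functions:

```swift
import Foundation

// Hypothetical persisted session file for ~/.swagent/<uuid>.json.
struct SessionState: Codable, Equatable {
    var id: String
    var previousResponseID: String?
    var responseIDs: [String]
    var inputTokens: Int
    var outputTokens: Int
    var updatedAt: Date
}

func save(_ state: SessionState, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.dateEncodingStrategy = .iso8601
    try encoder.encode(state).write(to: url, options: .atomic)
}

func load(from url: URL) throws -> SessionState {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    return try decoder.decode(SessionState.self, from: Data(contentsOf: url))
}
```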

Config

  • Use swift-configuration to read OPENAI_API_KEY from the environment; mask it as sk‑abc…wxyz on startup. (OpenAI Platform)
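Masking can be sketched as below; we read the spec's example (sk‑abc…wxyz) as keeping the first six characters (the sk‑ prefix plus three) and the last four, which is one plausible reading of "first 3 + last 4":

```swift
// Masks the API key for startup output, e.g. "sk-abc…wxyz".
// Assumes keys are long enough that showing 10 characters is safe.
func maskedKey(_ key: String) -> String {
    guard key.count > 10 else { return "…" }     // too short to mask meaningfully
    return "\(key.prefix(6))…\(key.suffix(4))"   // "sk-" + first 3 + "…" + last 4
}
```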

Testing, format, lint

  • Use Swift Testing (built‑in with Xcode 26) for unit tests.
  • Add swift-format + SwiftLint targets/scripts.

Security

  • Never echo secrets.
  • Treat dangerous commands conservatively when yolo=false (use request_more_info).

Minimal JSON crib sheet (copy/paste)

Create (turn 1, with tools & streaming):

{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT + [swagent runtime]>",
  "input": "Create a Swift package and build it.",
  "tools": [ { "type":"function","name":"run_bash","description":"Run bash","parameters":{
    "type":"object","properties":{"command":{"type":"string"},"cwd":{"type":"string"}},
    "required":["command"]
  }}, { "type":"function","name":"request_more_info","parameters":{
    "type":"object","properties":{"question":{"type":"string"}},"required":["question"]
  }}, { "type":"function","name":"finish","parameters":{
    "type":"object","properties":{"summary":{"type":"string"}},"required":["summary"]
  }} ],
  "tool_choice": "auto",
  "store": true,
  "stream": true
}

Continue (turn 2, return tool result):

{
  "model": "gpt-5-codex",
  "instructions": "<SYSTEM PROMPT + [swagent runtime]>",
  "previous_response_id": "resp_123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc",
      "output": "{\"stdout\":\"initialized…\",\"stderr\":\"\",\"exitCode\":0}"
    }
  ],
  "stream": true
}

Docs: Responses create, streaming events, migration guide (function_call_output), usage counters, conversation state. (OpenAI Platform)


3) Step‑by‑step — 5 tiny stages (each 7–12 minutes), with streaming & checks

Stage 1 — Minimal one‑shot + streaming

Build

  • SPM executable target; enable SwiftSetting.defaultIsolation(MainActor.self) in swiftSettings.
  • Deps: swift-argument-parser, swift-configuration.
  • Implement a single Responses call with "stream": true; stream response.output_text.delta to stdout.
  • Startup prints 2–3 cheeky lines + masked key.
  • Flags: --version, -v.

Checks

  • swagent --version → prints version only.
  • swagent -v "Ping" → shows cheeky lines, masked key, streams text live, then footer (in: X, out: Y, total: Z, 0m 00s) from usage. Streaming/usage: see docs. (OpenAI Platform)
  • No key → clear single‑line error.

Stage 2 — Sticky chat (REPL), -p one‑shot, runtime header

Build

  • Interactive REPL; keep -p for one‑shot.
  • Maintain state via previous_response_id + store:true.
  • Always re‑send instructions and attach a [swagent runtime] header with yolo, verbose, session, cwd.

Checks

  • Second user turn uses the first turn’s previous_response_id (verify in logs if -v).
  • /new clears state (next call has no previous_response_id).
  • Streaming remains active in both REPL and -p. Chaining: see conversation state docs. (OpenAI Platform)

Stage 3 — Agent signals (finish / request_more_info), loop via function_call_output

Build

  • Add two tools:

    • finish(summary: string)
    • request_more_info(question: string)
  • Implement the function‑call loop:

    • Parse any function_call items.
    • For request_more_info, print the question and wait for input; continue by sending a user message item in input (you can send it alongside any function_call_output items).
    • For finish, print the summary and stop the action.

Checks

  • Prompt: “Ask me one clarifying question, then summarize and finish.” → Model calls request_more_info → collects answer → model calls finish → summary printed + footer.
  • Confirm there’s no top‑level tool_outputs; only input items with type:"function_call_output" on continuations. (OpenAI Platform)

Stage 4 — Bash tool (guardrails), self‑testing, yolo awareness

Build

  • Add run_bash(command, cwd?):

    • Default approval: Run? [Y/n] (Enter=Yes).
    • --yolo: auto‑approve.
    • Execute via bash -lc; capture {stdout, stderr, exitCode}; stringify as the output field in function_call_output.
  • System prompt and runtime header explicitly say: agent never asks for permission; yolo=true means pre‑approved.

  • After code changes, agent must self‑test: swift build, optional swift test, then swift run swagent ….
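The approval rule above can be sketched as a pure function (name is ours), which keeps it trivially unit-testable for Stage 5:

```swift
import Foundation

// Approval decision for run_bash: --yolo auto-approves; otherwise the
// Y/n prompt defaults to Yes when the user just presses Enter.
func approved(yolo: Bool, reply: String?) -> Bool {
    if yolo { return true }
    guard let reply = reply?.trimmingCharacters(in: .whitespaces).lowercased(),
          !reply.isEmpty else { return true }    // Enter = Yes
    return reply == "y" || reply == "yes"
}
```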

Checks

  • swagent --yolo -p "Echo hello" → model calls run_bash("echo hello") immediately (no extra prompt), CLI runs, continuation sends function_call_output, finalizes with a reply + footer.
  • swagent -p "Echo hello" (non‑yolo) → agent still does not ask; CLI prompts Y/n; run completes.
  • Tool loop uses previous_response_id + input items, streaming on. (OpenAI Platform)

Stage 5 — Sessions, /status, tests, format/lint

Build

  • Persist sessions under ~/.swagent/<uuid>.json using an actor.

  • /status prints: masked key; per‑session token totals; estimated context left (model limit minus running total).

  • On exit: “To resume this session, call swagent --session <uuid>.”

  • Tests with Swift Testing for:

    • Arg parsing (-v, --version, -p, --yolo, --session).
    • Session store save/load roundtrip (concurrent writes protected by actor).
    • Tool approval logic (Y/n default vs --yolo).
  • Add swift-format and SwiftLint targets (make fmt, make lint, make check).
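The /status context estimate is simple arithmetic; a sketch, where the window size would come from the model card (it is not specified here):

```swift
// Estimated context left: model window minus tokens used this session,
// clamped at zero. The window value is the caller's assumption.
func estimatedContextRemaining(windowTokens: Int, usedTokens: Int) -> Int {
    max(0, windowTokens - usedTokens)
}
```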

Checks

  • Two turns, then /status shows totals; /exit persists a JSON containing the latest previous_response_id, cumulative usage, timestamps.
  • --session <uuid> resumes and continues chaining.
  • make check runs format, lint, and tests cleanly.

Minimal streaming cURL (for the slides)

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-5-codex",
    "instructions": "…system prompt…",
    "input": "Say hello, slowly.",
    "stream": true
  }'
# Expect SSE events like: response.created, response.output_text.delta, response.completed

SSE event names and flow: Responses streaming docs (plus Realtime guide for event taxonomy). (OpenAI Platform)
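For the workshop, SSE frames like the curl output above can be parsed with a few lines of Swift. This sketch assumes each chunk contains whole lines; a production parser would buffer partial lines across network reads:

```swift
import Foundation

// Minimal SSE frame parser: splits a raw chunk into (event, data) pairs.
// A blank line terminates each frame, per the SSE wire format.
func parseSSE(_ chunk: String) -> [(event: String, data: String)] {
    var frames: [(event: String, data: String)] = []
    var event = "", data = ""
    for line in chunk.split(separator: "\n", omittingEmptySubsequences: false) {
        if line.hasPrefix("event: ") {
            event = String(line.dropFirst(7))
        } else if line.hasPrefix("data: ") {
            data = String(line.dropFirst(6))
        } else if line.isEmpty, !event.isEmpty {
            frames.append((event, data))
            event = ""; data = ""
        }
    }
    return frames
}
```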

