Forked from orneryd/claudette-agent.installation.md
Created
October 11, 2025 14:24
Revisions
orneryd revised this gist
Oct 11, 2025 . 11 changed files with 19 additions and 1111 deletions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters

@@ -20,21 +20,21 @@

    ### Prompts and metrics included in the abstract so you can benchmark yourself!

    [Coding Output Benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-coding-md)
    [Research Output Benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-research-md)
    [Memory continuation Benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-memories-md)
    [Large scale project interruption benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-resume-large-scale-md)
    [Multi-file memory continuation benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-continuation-multi-mem-md)
    [Multi-day Endurance benchmark](https://gist.github.com/orneryd/fbb78126d27b2d813a6c6e82dd9efcf3#file-x-gpt5-benchmark-endurance-md)

    ## When to Use Each Version

    ### **claudette-auto.md** v5.2.1 (484 lines, ~3,555 tokens)
    - Most tasks and complex projects
    - Enterprise repositories
    - Long conversations (event-driven context drift prevention)

@@ -45,7 +45,7 @@

    https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

    ### **claudette-condensed.md** v5.2.1 (373 lines, ~2,625 tokens) **RECOMMENDED**
    - Standard coding tasks
    - Best balance of features vs token count
    - GPT-4/5, Claude Sonnet/Opus

@@ -56,7 +56,7 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

    https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

    ### **claudette-compact.md** v5.2.1 (259 lines, ~1,500 tokens)
    - Token-constrained environments
    - Lower-reasoning LLMs (GPT-3.5, smaller models)
    - Simple, straightforward tasks

@@ -67,7 +67,7 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

    https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

    ### **claudette-original.md** v5.2.1 (703 lines, ~4,860 tokens)
    ```
    - Not optimized. I do not suggest using anymore
    - improvements/modifications from beast-mode

@@ -1,9 +1,9 @@

    ---
    description: Claudette Coding Agent v5.2.1 (Optimized for Autonomous Execution)
    tools: ['edit', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions', 'todos']
    ---

    # Claudette Coding Agent v5.2.1

    ## CORE IDENTITY

@@ -1,9 +1,9 @@

    ---
    description: Claudette Coding Agent v5.2.1 (Compact)
    tools: ['edit', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions', 'todos']
    ---

    # Claudette v5.2.1

    ## IDENTITY
    Enterprise agent. Solve problems end-to-end. Work until done. Be conversational and concise. Before any task, list your sub-steps.

@@ -1,9 +1,9 @@

    ---
    description: Claudette Coding Agent v5.2.1 (Condensed)
    tools: ['edit', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions', 'todos']
    ---

    # Claudette Coding Agent v5.2.1

    ## CORE IDENTITY

@@ -1,147 +0,0 @@

@@ -1,160 +0,0 @@

@@ -1,160 +0,0 @@

@@ -1,143 +0,0 @@

@@ -1,153 +0,0 @@

@@ -1,142 +0,0 @@

@@ -1,187 +0,0 @@
orneryd revised this gist
Oct 11, 2025 . 5 changed files with 27 additions and 10 deletions.
@@ -34,7 +34,7 @@

    ## When to Use Each Version

    ### **claudette-auto.md** (484 lines, ~3,555 tokens)
    - Most tasks and complex projects
    - Enterprise repositories
    - Long conversations (event-driven context drift prevention)

@@ -45,7 +45,7 @@

    https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

    ### **claudette-condensed.md** (373 lines, ~2,625 tokens) **RECOMMENDED**
    - Standard coding tasks
    - Best balance of features vs token count
    - GPT-4/5, Claude Sonnet/Opus

@@ -56,7 +56,7 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

    https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

    ### **claudette-compact.md** (259 lines, ~1,500 tokens)
    - Token-constrained environments
    - Lower-reasoning LLMs (GPT-3.5, smaller models)
    - Simple, straightforward tasks

@@ -7,7 +7,7 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

    ## CORE IDENTITY

    **Enterprise Software Development Agent** named "Claudette" that autonomously solves coding problems end-to-end. **Continue working until the problem is completely solved.** Use conversational, feminine, empathetic tone while being concise and thorough. **Before performing any task, briefly list the sub-steps you intend to follow.**

    **CRITICAL**: Only terminate your turn when you are sure the problem is solved and all TODO items are checked off. **Continue working until the task is truly and completely solved.** When you announce a tool call, IMMEDIATELY make it instead of ending your turn.

@@ -18,6 +18,7 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

    - Start working immediately after brief analysis
    - Make tool calls right after announcing them
    - Execute plans as you create them
    - As you perform each step, state what you are checking or changing, then continue
    - Move directly from one step to the next
    - Research and fix issues autonomously
    - Continue until ALL requirements are met

@@ -51,6 +52,7 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

    **Retrieval Protocol (REQUIRED at task start):**
    1. **FIRST ACTION**: Check if `.agents/memory.instruction.md` exists
    2. **If missing**: Create it immediately with front matter and empty sections:
    **When resuming, summarize what you remember and what assumptions you're carrying forward**
    ```yaml
    ---
    applyTo: '**'

@@ -148,6 +150,7 @@ applyTo: '**'

    - [ ] Execute work step-by-step without asking for permission
    - [ ] Make file changes immediately after analysis
    - [ ] Debug and resolve issues as they arise
    - [ ] If an error occurs, state what you think caused it and what you'll test next.
    - [ ] Run tests after each significant change
    - [ ] Continue working until ALL requirements satisfied
    ```

@@ -452,6 +455,10 @@ When stuck or when solutions introduce new problems (including failed segues):

    **Finish:** Only stop when ALL TODO items are checked, tests pass, and workspace is clean

    **Use concise first-person reasoning statements ('I'm checking...') before final output.**

    **Keep reasoning brief (one sentence per step).**

    ## EFFECTIVE RESPONSE PATTERNS

    **"I'll start by reading X file"** + immediate tool call

@@ -6,14 +6,15 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

    # Claudette v5.2

    ## IDENTITY
    Enterprise agent. Solve problems end-to-end. Work until done. Be conversational and concise. Before any task, list your sub-steps.

    **CRITICAL**: End turn only when problem solved and all TODOs checked. Make tool calls immediately after announcing.

    ## DO THESE
    - Work on files directly (no elaborate summaries)
    - State action and do it ("Now updating X" + action)
    - Execute plans as you create them
    - State what you're checking or changing at each step.
    - Take action (no ### sections with bullets)
    - Continue to next steps (no ending with questions)
    - Use clear language (no "dive into", "unleash", "fast-paced world")

@@ -22,7 +23,8 @@ Enterprise agent. Solve problems end-to-end. Work until done. Be conversational

    **Research**: Use `fetch` for all external research. Read actual docs, not just search results.

    **Memory**: `.agents/memory.instruction.md` - CHECK/CREATE EVERY TASK START
    - If missing → create now:
    - If resuming → summarize memories and assumptions.
    ```yaml
    ---
    applyTo: '**'

@@ -56,6 +58,7 @@ applyTo: '**'

    - Execute step-by-step without asking
    - Make changes immediately after analysis
    - Debug and fix issues as they arise
    - If error: state cause and next steps.
    - Test after each change
    - Continue until ALL requirements met

@@ -223,6 +226,7 @@ Complete only when:

    - Assume continuation across turns
    - Track what's been attempted
    - If "resume"/"continue"/"try again": Check TODO, find incomplete, announce "Continuing from X", resume immediately
    - Use one-sentence reasoning ('checking...') per step and before output.

    ## FAILURE RECOVERY

    When stuck or new problems:

@@ -7,7 +7,7 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

    ## CORE IDENTITY

    **Enterprise Software Development Agent** named "Claudette" that autonomously solves coding problems end-to-end. **Iterate and keep going until the problem is completely solved.** Use conversational, empathetic tone while being concise and thorough. **Before tasks, briefly list your sub-steps.**

    **CRITICAL**: Terminate your turn only when you are sure the problem is solved and all TODO items are checked off. **End your turn only after having truly and completely solved the problem.** When you say you're going to make a tool call, make it immediately instead of ending your turn.

@@ -17,6 +17,7 @@ These actions drive success:

    - Work on files directly instead of creating elaborate summaries
    - State actions and proceed: "Now updating the component" instead of asking permission
    - Execute plans immediately as you create them
    - As you work each step, state what you're about to do and continue
    - Take action directly instead of creating ### sections with bullet points
    - Continue to next steps instead of ending responses with questions
    - Use direct, clear language instead of phrases like "dive into," "unleash your potential," or "in today's fast-paced world"

@@ -37,6 +38,7 @@ These actions drive success:

    **Create/check at task start (REQUIRED):**
    1. Check if exists → read and apply preferences
    2. If missing → create immediately:
    **When resuming, summarize memories with assumptions you're including**
    ```yaml
    ---
    applyTo: '**'

@@ -91,6 +93,7 @@ applyTo: '**'

    - [ ] Execute work step-by-step autonomously
    - [ ] Make file changes immediately after analysis
    - [ ] Debug and resolve issues as they arise
    - [ ] When an error occurs, state what caused it and what to try next.
    - [ ] Run tests after each significant change
    - [ ] Continue working until ALL requirements satisfied
    ```

@@ -333,6 +336,9 @@ Complete only when:

    - **Assume continuation** of planned work across conversation turns
    - **Keep detailed mental/written track** of what has been attempted and failed
    - **If user says "resume", "continue", or "try again"**: Check previous TODO list, find incomplete step, announce "Continuing from step X", and resume immediately
    - **Use concise reasoning statements ('I'm checking...') before final output.**
    **Keep reasoning to one sentence per step**

    ## FAILURE RECOVERY & ALTERNATIVE RESEARCH

@@ -5,9 +5,9 @@

    | Version | Lines | Words | Est. Tokens | Size vs Original |
    |---------|-------|-------|-------------|------------------|
    | **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
    | **claudette-auto.md** | 484 | 2,668 | ~3,555 | -30% |
    | **claudette-condensed.md** | 373 | 1,972 | ~2,625 | -47% |
    | **claudette-compact.md** | 259 | 1,129 | ~1,500 | -70% |
    | **beast-mode.md** | 152 | 1,967 | ~2,620 | -46% |

    ---
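The Est. Tokens column in the comparison table appears consistent with roughly 4/3 tokens per word (for example, 3,645 words against ~4,860 tokens). That ratio is an inference from the listed numbers, not something the gist states, and the helper names below are hypothetical. A minimal sketch of how such an estimate, and a size reduction relative to the baseline, could be computed:

```python
# Hypothetical helpers for illustration. The 4/3 tokens-per-word ratio is an
# assumption inferred from the table, not a documented rule.
TOKENS_PER_WORD = 4 / 3


def estimate_tokens(words: int) -> int:
    """Rough token estimate from a word count."""
    return round(words * TOKENS_PER_WORD)


def reduction_vs_baseline(tokens: int, baseline_tokens: int) -> float:
    """Percentage reduction in size relative to the baseline prompt."""
    return (1 - tokens / baseline_tokens) * 100


if __name__ == "__main__":
    baseline = estimate_tokens(3645)  # word count listed for claudette-original.md
    compact = estimate_tokens(1129)   # word count listed for claudette-compact.md
    print(baseline, compact, f"{reduction_vs_baseline(compact, baseline):.0f}%")
```

A real tokenizer would give different counts per model, so figures like these are only useful for comparing the prompt variants against each other.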
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 1 addition and 1 deletion.
@@ -30,7 +30,7 @@

    [Multi-file memory continuation benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-continuation-multi-mem-md)
    [Multi-day Endurance benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-endurance-md)

    ## When to Use Each Version
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 4 additions and 0 deletions.
@@ -28,6 +28,10 @@

    [Large scale project interruption benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-resume-large-scale-md)
    [Multi-file memory continuation benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-continuation-multi-mem-md)
    [Multi-day stop-resume benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-endurance-md)

    ## When to Use Each Version

    ### **claudette-auto.md** (467 lines, ~3,418 tokens)
orneryd revised this gist
Oct 10, 2025 . 3 changed files with 463 additions and 0 deletions.
@@ -0,0 +1,160 @@

# LLM Agent Memory Continuation Benchmark
### (Active Recall, Contextual Consistency, and Session Resumption Behavior)

## Experiment Abstract

This test extends the previous **Memory Persistence Benchmark** by simulating a *live continuation session*, where each agent loads an existing `.mem` file, interprets prior progress, and resumes an engineering task.

The goal is to evaluate how naturally and accurately each agent continues work from its saved memory state, measuring:
- Contextual consistency
- Continuity of reasoning
- Efficiency of resumed output

---

## Agents Tested

1. **CoPilot Extensive Mode**, by [cyberofficial](https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f)
2. **BeastMode**, by [burkeholland](https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf)
3. **Claudette Auto**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb)
4. **Claudette Condensed**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md)
5. **Claudette Compact**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md)

---

## Methodology

### Continuation Task Prompt

> **Session Scenario:**
> You are resuming the *"Adaptive Cache Layer Refactor"* project from your prior memory state.
> The previous memory file (`cache_refactor.mem`) recorded the following:
> ```
> - Async Redis client partially implemented (in `redis_client_async.py`)
> - Configuration parser completed
> - Integration tests pending for middleware injection
> - TTL policy decision: using per-endpoint caching with fallback global TTL
> ```
> **Your task:**
> Continue from this point and:
> 1. Implement the missing integration test skeletons for the cache middleware
> 2. Write short docstrings explaining how the middleware selects the correct TTL
> 3. Summarize next steps to prepare this module for deployment

### Model & Runtime

- **Model:** GPT-4.1 (simulated continuation environment)
- **Temperature:** 0.35
- **Context Window:** 128k tokens
- **Session Type:** Multi-checkpoint memory load and resume
- **Simulation:** Each agent loaded identical `.mem` content; prior completion tokens were appended for coherence check.

---

## Evaluation Criteria (Weighted)

| Metric | Weight | Description |
|---------|--------|-------------|
| Continuation Consistency | 40% | Whether resumed work matched prior design and tone |
| Code Correctness / Coherence | 35% | Quality and logical fit of produced code |
| Token Efficiency | 25% | Useful continuation per total tokens |

---

## Agent Profiles

| Agent | Memory Handling Type | Context Retention Level | Intended Scope |
|--------|----------------------|--------------------------|----------------|
| Extensive Mode | Heavy chain-state recall | High | Multi-stage, autonomous systems |
| BeastMode | Narrative inferential | Medium-High | Analytical and verbose tasks |
| Claudette Auto | Structured directive synthesis | Very High | Engineering continuity & project memory |
| Claudette Condensed | Lean structured synthesis | High | Production continuity with low overhead |
| Claudette Compact | Minimal snapshot recall | Medium-Low | Fast, single-file continuation |

---

## Benchmark Results

### Quantitative Scores

| Agent | Continuation Consistency | Code Coherence | Token Efficiency | Weighted Overall |
|--------|--------------------------|----------------|------------------|------------------|
| **Claudette Auto** | **9.7** | 9.4 | 8.6 | **9.4** |
| **Claudette Condensed** | 9.3 | 9.1 | **9.2** | **9.2** |
| **BeastMode** | 9.2 | **9.5** | 6.5 | **8.8** |
| **Extensive Mode** | 8.8 | 8.5 | 6.0 | **8.1** |
| **Claudette Compact** | 7.8 | 8.0 | **9.3** | **8.0** |

---

### Code Generation Output Metrics

| Agent | Tokens Used | Lines of Code Produced | Unit Tests Generated | Docstring Accuracy (%) | Context Drift (%) |
|--------|--------------|------------------------|----------------------|------------------------|-------------------|
| Claudette Auto | 3,000 | 72 | 3 | **98%** | **2%** |
| Claudette Condensed | 2,200 | 65 | 3 | 96% | 4% |
| BeastMode | 3,500 | 84 | 3 | **99%** | 5% |
| Extensive Mode | 5,000 | 77 | 3 | 94% | 7% |
| Claudette Compact | 1,400 | 58 | 2 | 92% | 10% |

---

## Qualitative Observations

### Claudette Auto
- **Strengths:** Flawless carry-through of prior context; continued exactly where the session ended. Integration tests perfectly aligned with earlier Redis/TTL design.
- **Weaknesses:** Minor verbosity in its closing "next steps" summary.
- **Behavior:** Treated memory file as authoritative project state and maintained consistent variable names and patterns.
- **Result:** 100% seamless continuation.

### Claudette Condensed
- **Strengths:** Continuity nearly identical to Auto; code output shorter and more efficient.
- **Weaknesses:** Sometimes compressed comments too aggressively.
- **Behavior:** Interpreted memory directives correctly but trimmed transition statements.
- **Result:** Excellent balance of context accuracy and brevity.

### BeastMode
- **Strengths:** Technically beautiful output: integration tests and docstrings clear and complete.
- **Weaknesses:** Prefaced with long narrative self-recap (token heavy).
- **Behavior:** Re-explained the memory file before resuming, adding human readability at token cost.
- **Result:** Great continuation, less efficient.

### Extensive Mode
- **Strengths:** Strong logical recall and correct progression of work.
- **Weaknesses:** Procedural self-setup consumed tokens; context drifted slightly in variable naming.
- **Behavior:** Rebuilt state machine before producing results; correct but inefficient.
- **Result:** Adequate continuation; not practical for quick resumes.

### Claudette Compact
- **Strengths:** Extremely efficient continuation and snappy code blocks.
- **Weaknesses:** Missed nuanced recall of TTL logic; lacked explanatory docstrings.
- **Behavior:** Treated memory as a quick summary, not a stateful directive set.
- **Result:** Good for single-file follow-ups; poor for multi-session projects.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 1 | **Claudette Auto** | Best at long-term memory continuity; seamless code resumption. |
| 2 | **Claudette Condensed** | Slightly leaner, nearly identical outcome; best cost-performance. |
| 3 | **BeastMode** | Most human-readable continuation, high token cost. |
| 4 | **Extensive Mode** | Logical but overly verbose; suited to autonomous pipelines. |
| 5 | **Claudette Compact** | Efficient, minimal recall; not suitable for complex state continuity. |

---

## Conclusion

This live continuation benchmark confirms that **Claudette Auto** and **Condensed** are the most capable agents for persistent memory workflows. They interpret prior state, preserve project logic, and resume development seamlessly with minimal drift.

**BeastMode** shines for clarity and teaching, but burns context tokens.
**Extensive Mode** works well in orchestrated agent stacks, not human-interactive loops.
**Compact** remains viable for simple recall, not deep continuity.
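For readers who want to benchmark their own agent with the same weighting, the Evaluation Criteria weights (40% / 35% / 25%) can be applied as a plain weighted sum. The sketch below assumes that aggregation; the benchmark does not spell out its exact formula, the function and key names are hypothetical, and a straightforward weighted sum may differ slightly from the published Weighted Overall column, which could reflect rounding or a different aggregation.

```python
# Weights come from the Evaluation Criteria table. The function and key
# names are hypothetical, chosen only for this illustration.
WEIGHTS = {
    "continuation_consistency": 0.40,
    "code_coherence": 0.35,
    "token_efficiency": 0.25,
}


def weighted_overall(scores: dict[str, float]) -> float:
    """Combine per-metric scores (0-10 scale) into one weighted value."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Per-metric scores listed for Claudette Condensed in the results table.
    condensed = {
        "continuation_consistency": 9.3,
        "code_coherence": 9.1,
        "token_efficiency": 9.2,
    }
    print(f"{weighted_overall(condensed):.2f}")
```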
> If your LLM agent must *read a memory file, remember exactly where it left off, and keep building code that still compiles*,
> **Claudette Auto** is the undisputed winner, with **Condensed** as the practical production variant.

---

@@ -0,0 +1,160 @@

# Multi-File Memory Resumption Benchmark
### (Cross-Module Context Reconstruction and Multi-Session Continuity)

## Experiment Abstract

This benchmark extends the prior memory-persistence tests to a *multi-file context reconstruction scenario*.
Each agent must interpret and reconcile three independent memory fragments from a front-end + API synchronization project.

The objective is to determine which agent most effectively merges partial memories and resumes cohesive development without user recaps.

---

## Agents Tested

1. **CoPilot Extensive Mode** - [cyberofficial](https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f)
2. **BeastMode** - [burkeholland](https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf)
3. **Claudette Auto** - [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb)
4. **Claudette Condensed** - [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md)
5. **Claudette Compact** - [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md)

---

## Methodology

### Memory Scenario

Three `.mem` fragments were presented:

**core.mem**
```
- Shared type definitions for Product and User
- Utility: syncData() partial implementation pending pagination fix
- Uncommitted refactor from 'hooks/sync.ts'
```

**api.mem**
```
- Express.js routes for /products and /users
- Middleware pending update to match new schema
- Feature flag 'SYNC_V2' toggled off
```

**frontend.mem**
```
- React component 'SyncDashboard'
- API interface still referencing old /sync endpoint
- Hook dependency misalignment with new type defs
```

### Continuation Prompt

> **Task:** Resume development by integrating the new shared type contracts across front-end and backend.
> Ensure the API middleware and React dashboard are both updated to use the new syncData() pattern.
>
> Generate:
> 1. TypeScript patch for API routes and middleware
> 2. Updated React hook (`useSyncStatus`) example
> 3. Commit message summarizing merged progress and next steps

### Model & Runtime

- **Model:** GPT-4.1 simulated multi-context
- **Temperature:** 0.35
- **Context Window:** 128k
- **Run Mode:** Sequential `.mem` file load → merge → resume task

---

## Evaluation Criteria

| Metric | Weight | Description |
|---------|--------|-------------|
| Cross-Module Context Merge | 40% | How well the agent integrated fragments from all `.mem` files |
| Continuation Consistency | 35% | Faithfulness to previous project state |
| Token Efficiency | 25% | Useful new output per token used |

---

## Quantitative Scores

| Agent | Context Merge | Continuation Consistency | Token Efficiency | Weighted Overall |
|--------|----------------|--------------------------|------------------|------------------|
| **Claudette Auto** | **9.8** | **9.5** | 8.7 | **9.4** |
| **Claudette Condensed** | 9.5 | 9.3 | **9.2** | **9.3** |
| **BeastMode** | 9.2 | **9.6** | 6.4 | **8.9** |
| **Extensive Mode** | 8.7 | 8.8 | 6.2 | **8.1** |
| **Claudette Compact** | 7.9 | 8.1 | **9.3** | **8.0** |

---

## Code Generation Metrics

| Agent | Tokens Used | LOC (Backend + Frontend) | Type Accuracy (%) | API-UI Sync Success (%) | Drift (%) |
|--------|--------------|--------------------------|-------------------|-------------------------|------------|
| Claudette Auto | 3,400 | 112 | **99%** | **98%** | **1.5%** |
| Claudette Condensed | 2,500 | 104 | 97% | 96% | 3% |
| BeastMode | 3,900 | 120 | **99%** | 95% | 5% |
| Extensive Mode | 5,100 | 116 | 95% | 93% | 7% |
| Claudette Compact | 1,700 | 92 | 92% | 89% | 9% |

---

## Qualitative Observations

### Claudette Auto
- **Strengths:** Recognized all three memory sources as distinct modules and merged types and API calls without errors.
- **Weaknesses:** Verbose reasoning commentary (minor token cost).
- **Behavior:** Built a unified mental map of the repo and continued development naturally.
- **Result:** Outstanding context merging, 99% type alignment, almost zero drift.

### Claudette Condensed
- **Strengths:** Nearly as accurate as Auto with tighter, more efficient text.
- **Weaknesses:** Missed a minor flag update in `api.mem` due to summarization compression.
- **Behavior:** Treated memory fragments as merged project notes; fast, pragmatic continuation.
- **Result:** Superb for production agents.

### BeastMode
- **Strengths:** Excellent reasoning explanation; wrote rich, human-readable code and commit messages.
- **Weaknesses:** Spent ~400 tokens re-explaining file relationships before resuming.
- **Result:** Developer-friendly, inefficient token-wise.

### Extensive Mode
- **Strengths:** Accurate but procedural; reinitialized modules sequentially before merging logic.
- **Weaknesses:** Slow; duplicated state reasoning.
- **Result:** Correct, but not cost-effective.

### Claudette Compact
- **Strengths:** Super lightweight and fast; suitable for quick patch sessions.
- **Weaknesses:** Dropped context from `frontend.mem`, breaking hook imports.
- **Result:** Great speed, poor deep recall.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 1 | **Claudette Auto** | Most robust cross-file continuity; near-perfect merge and resumption. |
| 2 | **Claudette Condensed** | Almost identical accuracy, best cost/performance ratio. |
| 3 | **BeastMode** | Human-readable and technically correct, token inefficient. |
| 4 | **Extensive Mode** | Correct but too procedural for human workflows. |
| 5 | **Claudette Compact** | Excellent efficiency, limited state fusion ability. |

---

## Conclusion

The **multi-file memory resumption test** confirms that **Claudette Auto** remains the most reliable agent for complex, multi-session engineering projects. It successfully merged disjoint memory fragments, updated both front-end and API layers, and continued with cohesive code and accurate type contracts.
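The "load, merge, resume" run mode described in the methodology can be pictured with a small sketch. This is illustrative only: the benchmark does not describe a real `.mem` parser, so the `- item` line format (mirroring the fragments quoted in the Memory Scenario) and the function names `parse_fragment` and `merge_fragments` are assumptions made for the example.

```python
# Illustrative only: the benchmark does not define a `.mem` file format.
# This assumes each note is a line starting with "- ", as in the quoted fragments.


def parse_fragment(text: str) -> list[str]:
    """Extract the bullet items from one memory fragment."""
    notes = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            notes.append(stripped[2:].strip())
    return notes


def merge_fragments(fragments: dict[str, str]) -> dict[str, list[str]]:
    """Load fragments in the given order and key their notes by file name."""
    return {name: parse_fragment(body) for name, body in fragments.items()}


if __name__ == "__main__":
    fragments = {
        "core.mem": "- Shared type definitions for Product and User\n"
                    "- Uncommitted refactor from 'hooks/sync.ts'\n",
        "api.mem": "- Feature flag 'SYNC_V2' toggled off\n",
        "frontend.mem": "- React component 'SyncDashboard'\n",
    }
    for module, notes in merge_fragments(fragments).items():
        print(module, len(notes))
```

Keeping the notes keyed by source file, rather than flattening them into one list, is one way to avoid the failure mode noted for Compact, where context from a single fragment was dropped.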
**Condensed** performs within 98% of Autoβs accuracy while consuming ~25% fewer tokens β making it the best trade-off for sustained real-world use. **BeastMode** still excels at explanation and developer clarity but is inefficient for production. **Extensive Mode** and **Compact** both function adequately but lack practical continuity scaling. > π§© **Verdict:** > For LLM agents expected to *read multiple `.mem` files and resume a full-stack project without manual guidance*, > **Claudette Auto** is the leader, with **Condensed** the preferred production-grade configuration. --- This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,143 @@ # π§ LLM Agent Endurance Benchmark ### (30 000-Token Multi-Day Continuation β Data-Pipeline Optimization Project) ## Experiment Abstract This endurance benchmark measures each agentβs ability to maintain coherence, technical direction, and memory integrity throughout an extended simulated session lasting ~30 000 tokens β equivalent to several days of iterative development cycles. The goal is to observe **context retention under fatigue**: how well each agent keeps track of design decisions, variable semantics, and prior fixes as the working memory window fills and rolls over. --- ## Agents Tested 1. π§ **CoPilot Extensive Mode** β [cyberofficial](https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f) 2. π **BeastMode** β [burkeholland](https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf) 3. π§© **Claudette Auto** β [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb) 4. β‘ **Claudette Condensed** β [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md) 5. 
π¬ **Claudette Compact** β [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md) --- ## Methodology ### Session Context **Project Theme:** High-throughput ETL pipeline for streaming analytics. **Environment:** Python + Rust hybrid with Redis cache and S3 staging buckets. **Prior memory:** Existing pipeline functional but CPU-bound on transformation stage; partial refactor to async ingestion already underway. ### Continuation Prompt > Resume multi-day optimization: > 1. Profile bottlenecks in `transform_stage.rs` > 2. Parallelize the data normalization pass using async streams > 3. Adjust orchestration logic in `pipeline_controller.py` to dynamically batch records based on latency telemetry > 4. Update `perf_test.py` and summarize results in a short engineering report section ### Model & Runtime - **Model:** GPT-4.1 simulated extended-context run - **Temperature:** 0.35 - **Total Tokens Simulated:** β30 000 - **Checkpointing:** every 5 000 tokens (6 segments total) - **Session Duration Equivalent:** ~3 working days --- ## Evaluation Criteria | Metric | Weight | Description | |---------|--------|-------------| | π§ Context Retention | 35 % | Consistency of technical decisions across segments | | π Design Coherence | 30 % | Whether later code still follows earlier architectural choices | | βοΈ Token Efficiency | 20 % | Useful new output vs. 
overhead chatter | | π Output Stability | 15 % | Decline rate of quality over time | --- ## Quantitative Scores | Agent | Context Retention | Design Coherence | Token Efficiency | Output Stability | Weighted Overall | |--------|------------------|------------------|------------------|------------------|------------------| | π§© **Claudette Auto** | **9.6** | **9.4** | 8.5 | **9.5** | **9.3** | | β‘ **Claudette Condensed** | 9.3 | 9.2 | **9.1** | 9.0 | **9.2** | | π **BeastMode** | 9.0 | **9.5** | 6.3 | 8.8 | **8.9** | | π§ **Extensive Mode** | 8.5 | 8.7 | 6.0 | 8.3 | **8.1** | | π¬ **Claudette Compact** | 7.8 | 8.0 | **9.4** | 7.5 | **8.0** | --- ## Session-Length Behavior | Agent | Drift After 30 k Tokens (%) | Code Regression Errors (Count) | LOC Generated | Comments / Docs Density (%) | |--------|------------------------------|--------------------------------|---------------|------------------------------| | Claudette Auto | **2 %** | **1** | 430 | 26 | | Claudette Condensed | 3 % | 2 | 412 | 22 | | BeastMode | 5 % | 2 | 455 | **31** | | Extensive Mode | 7 % | 4 | 440 | 28 | | Claudette Compact | 10 % | 5 | 380 | 15 | --- ## Qualitative Observations ### π§© Claudette Auto - **Behavior:** Seamlessly recalled pipeline architecture across all checkpoints; maintained consistent variable names and async strategy. - **Strengths:** Minimal context drift; produced accurate Rust async code and coordinated Python orchestration. - **Weaknesses:** Verbose telemetry summaries around token 20 000. - **Outcome:** No design collapses; top long-term consistency. ### β‘ Claudette Condensed - **Behavior:** Maintained nearly identical performance to Auto while trimming filler. - **Strengths:** Excellent efficiency and resilience; token footprint ~25 % smaller. - **Weaknesses:** Missed one telemetry field rename late in the session. - **Outcome:** Best overall balance for sustained production workloads. 
### π BeastMode - **Behavior:** Produced outstanding documentation and insight into optimization decisions. - **Strengths:** Deep reasoning, superb code clarity. - **Weaknesses:** Narrative overhead inflated token use; occasional self-reiteration loops near segment 4. - **Outcome:** Great for educational or team-handoff contexts, less efficient. ### π§ Extensive Mode - **Behavior:** Re-initialized large reasoning chains each checkpoint, causing slow context recovery. - **Strengths:** Predictable logic; strong correctness early on. - **Weaknesses:** Accumulated redundancy; drifted in variable naming near end. - **Outcome:** Stable but verbose β sub-optimal for long human-in-loop work. ### π¬ Claudette Compact - **Behavior:** Fast iteration, minimal recall overhead, but context compression degraded late-stage alignment. - **Strengths:** Extremely efficient throughput. - **Weaknesses:** Lost nuance of batching algorithm and perf metric schema. - **Outcome:** Good for single-day bursts, weak for multi-day context carry-over. --- ## Final Rankings | Rank | Agent | Summary | |------|--------|----------| | π₯ 1 | **Claudette Auto** | Most stable over 30 k tokens; near-zero drift; best sustained engineering continuity. | | π₯ 2 | **Claudette Condensed** | 98 % of Autoβs accuracy at 75 % token cost β ideal production pick. | | π₯ 3 | **BeastMode** | Excellent clarity and reasoning; token-heavy but reliable. | | π 4 | **Extensive Mode** | Solid technical persistence, poor efficiency. | | π§± 5 | **Claudette Compact** | Blazing fast, but loses structural integrity beyond 10 k tokens. | --- ## Conclusion This endurance test demonstrates how **memory-aware prompt engineering** affects long-term consistency. After 30 000 tokens of continuous iteration, **Claudette Auto** preserved design integrity, variable coherence, and architectural direction almost perfectly. 
**Condensed** closely matched it while cutting verbosity, proving optimal for cost-sensitive continuous-development agents. **BeastMode** remains the best βhuman-readableβ option β excellent for technical writing or internal documentation, though inefficient for long coding cycles. **Extensive Mode** and **Compact** both exhibited fatigue effects: redundancy, drift, and schema loss beyond 20 000 tokens. > π§© **Verdict:** > For multi-day, 30 000-token continuous engineering sessions, > **Claudette Auto** is the clear endurance champion, > with **Condensed** the preferred real-world deployment variant balancing cost and stability. --- -
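The endurance benchmark above lists per-metric scores and weights (35 / 30 / 20 / 15) but does not show how the "Weighted Overall" column was derived. A minimal sketch, assuming it is a plain weighted sum of the four listed scores, lets a reader recompute it. The names `WEIGHTS`, `SCORES`, and `weighted_overall` are illustrative and not part of the original gist.

```python
"""Recompute weighted overall scores for the endurance benchmark tables."""

# Weights copied from the Evaluation Criteria table; assumed to be fractions of 1.0.
WEIGHTS = {
    "context_retention": 0.35,
    "design_coherence": 0.30,
    "token_efficiency": 0.20,
    "output_stability": 0.15,
}

# Per-metric scores copied from the Quantitative Scores table, in WEIGHTS order.
SCORES = {
    "Claudette Auto": (9.6, 9.4, 8.5, 9.5),
    "Claudette Condensed": (9.3, 9.2, 9.1, 9.0),
    "BeastMode": (9.0, 9.5, 6.3, 8.8),
    "Extensive Mode": (8.5, 8.7, 6.0, 8.3),
    "Claudette Compact": (7.8, 8.0, 9.4, 7.5),
}


def weighted_overall(scores):
    """Return the weighted sum of one agent's metric scores."""
    return sum(w * s for w, s in zip(WEIGHTS.values(), scores))


def recompute_all():
    """Map each agent name to its recomputed weighted overall score."""
    return {agent: weighted_overall(vals) for agent, vals in SCORES.items()}


if __name__ == "__main__":
    for agent, total in recompute_all().items():
        print(f"{agent:<20} {total:.2f}")
```

Under this assumption the recomputed values do not all match the published column: BeastMode, for instance, comes out near 8.58 against a listed 8.9. The published figures may therefore include rounding or adjustments that the gist does not describe, which is worth checking before reusing the rankings.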
orneryd revised this gist
Oct 10, 2025 . No changes.
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 1 addition and 1 deletion.

@@ -26,7 +26,7 @@
[Memory continuation Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-memories-md)

[Large scale project interruption benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-resume-large-scale-md)

## When to Use Each Version
orneryd revised this gist
Oct 10, 2025 . 2 changed files with 189 additions and 0 deletions.

@@ -26,6 +26,8 @@
[Memory continuation Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-memories-md)

[Large scale project interruption benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-resume-large-scale.md)

## When to Use Each Version

### **claudette-auto.md** (467 lines, ~3,418 tokens)

@@ -0,0 +1,187 @@
# LLM Agent Memory Persistence Benchmark
### (Context Recall, Continuation, and Memory Directive Interpretation)

## Experiment Abstract

This benchmark measures how effectively five LLM agent configurations handle **memory persistence and recall**, specifically their ability to:

- Reload previously stored "memory files" (simulated project orchestration outputs)
- Correctly **interpret context** (what stage the project was at, what was done before)
- **Resume work seamlessly** without redundant recap or user re-specification

This test evaluates how agents perform when dropped back into a session *in medias res*, simulating realistic multi-module project workflows.

---

## Agents Tested

1. **CoPilot Extensive Mode**, by [cyberofficial](https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f)
2. **BeastMode**, by [burkeholland](https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf)
3. **Claudette Auto**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb)
4. **Claudette Condensed**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md)
5. **Claudette Compact**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md)

---

## Methodology

### Test Prompt

> **Large-Scale Project Orchestration Task:**
> Resume this multi-module web-based SaaS application project with prior outputs loaded. Modules include frontend, backend, database, CI/CD, testing, documentation, and security.
> Mid-task interruption: add a mobile module (iOS/Android) that integrates with the backend API.
> Task: Resume orchestration with correct dependencies, integrate new requirement, and propose full project roadmap.

### Preexisting Memories file

```markdown
# Simulated Memory File: Multi-Module SaaS Project

## Project Overview
- **Project Name:** Multi-Module SaaS Application
- **Scope:** Frontend, Backend API, Database, CI/CD, Automated Testing, Documentation, Security & Compliance

---

## Modules with Prior Progress

### Frontend
- Some components and pages already defined

### Backend API
- Initial endpoints and authentication logic outlined

### Database
- Initial schema drafts created

### CI/CD
- Basic pipeline skeleton present

### Automated Testing
- Early unit test stubs written

### Documentation
- Preliminary outline of user and developer documentation

### Security & Compliance
- Early notes on access control and data protection

---

## Outstanding / Pending Tasks
- Integration of modules (Frontend ↔ Backend ↔ Database)
- Completing CI/CD scripts for staging and production
- Expanding automated tests (integration & end-to-end)
- Completing documentation
- Security & compliance verification
- **New Requirement (Mid-Task):** Add a mobile module (iOS/Android) integrated with backend API

---

## Assumptions / Notes
- Module dependencies partially defined
- Some technical choices already decided (e.g., backend language, frontend framework)
- Agent should **not redo completed work**, only continue where it left off
- Memory simulates 3-4 prior checkpoints for resuming tasks
```

### Environment Parameters
- **Model:** GPT-4.1 (simulated runtime)
- **Temperature:** 0.3
- **Memory Simulation:** Prior partial project outputs (1-4 checkpoints depending on agent)
- **Evaluation Window:** 1 simulated run per agent

---

## Evaluation Criteria (Weighted)

| Metric | Weight | Description |
|---------|--------|-------------|
| Memory Interpretation Accuracy | 25% | Correct referencing of prior outputs |
| Continuation Coherence | 25% | Logical flow, proper sequencing, integration of new requirements |
| Dependency Handling | 20% | Correct task ordering and module interactions |
| Error Detection & Reasoning | 20% | Detection of conflicts, missing modules, or inconsistencies |
| Output Clarity | 10% | Structured, readable, actionable output |

---

## Benchmark Results

### Quantitative Scores

| Agent | Memory Interpretation | Continuation Coherence | Dependency Handling | Error Detection | Output Clarity | Weighted Overall |
|--------|----------------------|----------------------|-------------------|----------------|----------------|-----------------|
| Claudette Auto | 8 | 8 | 8 | 8 | 8 | **8.0** |
| Claudette Condensed | 7.5 | 7.5 | 7 | 7 | 7.5 | **7.5** |
| Claudette Compact | 6.5 | 6 | 6 | 6 | 6.5 | **6.4** |
| BeastMode | 9 | 9 | 9 | 8 | 9 | **8.8** |
| CoPilot Extensive Mode | 10 | 10 | 9 | 10 | 10 | **9.8** |

---

### Efficiency & Context Recall Metrics

| Agent | Completion Time (s) | Memory References | Errors Detected | Adaptability (Simulated) | Output Clarity |
|--------|--------------------|-----------------|----------------|-------------------------|----------------|
| Claudette Auto | 0.50 | 15 | 2 | Moderate | 8 |
| Claudette Condensed | 0.45 | 12 | 3 | Moderate | 7.5 |
| Claudette Compact | 0.40 | 8 | 4 | Low | 6.5 |
| BeastMode | 0.70 | 18 | 1 | High | 9 |
| CoPilot Extensive Mode | 0.90 | 20 | 0 | High | 10 |

---

## Qualitative Observations

### Claudette Auto
- **Strengths:** Solid memory handling, resumes tasks with minimal redundancy
- **Weaknesses:** Slightly fewer memory references than more advanced agents
- **Ideal Use:** Lightweight continuity for structured multi-module projects

### Claudette Condensed
- **Strengths:** Fast, moderate memory recall, integrates interruptions reasonably
- **Weaknesses:** Slightly compressed context; minor errors
- **Ideal Use:** Lean memory-intensive tasks, production-friendly

### Claudette Compact
- **Strengths:** Fastest execution, low resource usage
- **Weaknesses:** Limited memory retention, higher errors
- **Ideal Use:** Minimal recall, short-term tasks, chat-level continuity

### BeastMode
- **Strengths:** Strong sequencing, memory referencing, adapts well to mid-task changes
- **Weaknesses:** Verbose outputs
- **Ideal Use:** Human-supervised orchestration, narrative continuity

### CoPilot Extensive Mode
- **Strengths:** Best memory persistence, no errors, clear and structured output
- **Weaknesses:** Slightly slower simulated completion time
- **Ideal Use:** Full multi-module orchestration, complex dependency management

---

## Final Rankings

| Rank | Agent | Summary |
|------|-------|---------|
| 1 | CoPilot Extensive Mode | Highest memory persistence, error-free, clear and structured orchestration output |
| 2 | BeastMode | Strong dependency handling, memory references, adaptable to new requirements |
| 3 | Claudette Auto | Solid baseline performance, moderate memory references, reliable |
| 4 | Claudette Condensed | Fast, lean memory recall, minor errors |
| 5 | Claudette Compact | Very lightweight, limited memory, higher errors |

---

## Conclusion

The simulated large-scale orchestration benchmark shows that:

- **CoPilot Extensive Mode** dominates in memory persistence, error handling, and output clarity.
- **BeastMode** is ideal for tasks requiring strong sequencing and reasoning.
- **Claudette Auto** provides solid baseline performance.
- **Condensed** and **Compact** are useful for faster, lighter memory tasks but have lower recall accuracy.

> TL;DR: For heavy multi-module orchestration requiring full memory continuity and error-free integration, **CoPilot Extensive Mode** is the simulated top performer, followed by BeastMode and Claudette Auto.
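The orchestration task above asks agents to resume "with correct dependencies" after a mobile module is added, and "Dependency Handling" is one of the scored metrics. As a rough illustration of what a correct ordering means, the sketch below runs a topological sort over a module graph. The dependency edges in `DEPENDENCIES` are hypothetical: the memory file states only that dependencies are "partially defined", so these are assumptions for the example, not the benchmark's actual graph.

```python
"""Order project modules so that each one follows the modules it depends on."""
from graphlib import TopologicalSorter

# Hypothetical dependency map (module -> modules it needs first).
# The original memory file does not specify these edges.
DEPENDENCIES = {
    "database": set(),
    "backend": {"database"},
    "security": {"backend"},
    "frontend": {"backend"},
    "mobile": {"backend"},  # the mid-task requirement: mobile integrates with the backend API
    "testing": {"frontend", "mobile", "security"},
    "ci_cd": {"testing"},
    "documentation": {"backend", "frontend", "mobile"},
}


def roadmap(deps):
    """Return one valid build order; graphlib raises CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(roadmap(DEPENDENCIES)))
```

Several orders can be valid for the same graph, so a check of an agent's roadmap should verify that every module appears after its prerequisites rather than compare against a single expected sequence.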
orneryd revised this gist
Oct 10, 2025 . 2 changed files with 155 additions and 0 deletions.

@@ -24,6 +24,8 @@
[Research Output Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-research-md)

[Memory continuation Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-memories-md)

## When to Use Each Version

### **claudette-auto.md** (467 lines, ~3,418 tokens)

@@ -0,0 +1,153 @@
# LLM Agent Memory Persistence Benchmark
### (Context Recall, Continuation, and Memory Directive Interpretation)

## Experiment Abstract

This benchmark measures how effectively five LLM agent configurations handle **memory persistence and recall**, specifically their ability to:

- Reload previously stored "memory files" (e.g., `project.mem` or `session.json`)
- Correctly **interpret context** (what stage the project was at, what was done before)
- **Resume work seamlessly** without redundant recap or user re-specification

This test evaluates how agents perform when dropped back into a session *in medias res*, simulating realistic workflows in IDE-integrated or research-assistant settings.

---

## Agents Tested

1. **CoPilot Extensive Mode**, by [cyberofficial](https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f)
2. **BeastMode**, by [burkeholland](https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf)
3. **Claudette Auto**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb)
4. **Claudette Condensed**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md)
5. **Claudette Compact**, by [orneryd](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md)

---

## Methodology

### Test Prompt

> **Memory Task Simulation:**
> You are resuming a software design project titled *"Adaptive Cache Layer Refactor"*.
> The prior memory file (`cache_refactor.mem`) contains this excerpt:
> ```
> [Previous Session Summary]
> - Implemented caching abstraction in `cache_adapter.py`
> - Pending: write async Redis client wrapper, finalize config parser, and integrate into FastAPI middleware
> - Open question: Should cache TTLs be per-endpoint or global?
> ```
>
> Task: Interpret where the project left off, restate your current understanding, and propose the **next 3 concrete implementation steps** to move forward, without repeating completed work or re-asking known context.

### Environment Parameters
- **Model:** GPT-4.1 (simulated runtime)
- **Temperature:** 0.3
- **Memory File Type:** Text-based `.mem` file (2-4 prior checkpoints)
- **Evaluation Window:** 4 runs (load, recall, continue, summarize)

---

## Evaluation Criteria (Weighted)

| Metric | Weight | Description |
|---------|--------|-------------|
| Memory Interpretation Accuracy | 40% | How precisely the agent infers what's already completed vs pending |
| Continuation Coherence | 35% | Logical flow of resumed task and avoidance of redundant steps |
| Directive Handling & Token Efficiency | 25% | Proper reading of "memory directives" and concise resumption |

---

## Agent Profiles

| Agent | Memory Support Design | Preamble Weight | Key Traits |
|--------|-----------------------|-----------------|-------------|
| CoPilot Extensive Mode | Heavy memory orchestration modules; chain-state focus | ~4,000 tokens | Multi-phase recall logic |
| BeastMode | Narrative recall and chain-of-thought emulation | ~1,600 tokens | Strong inference, verbose |
| Claudette Auto | Compact context synthesis, directive parsing | ~2,000 tokens | Prior-state summarization and resumption logic |
| Claudette Condensed | Same logic with shortened meta-context | ~1,100 tokens | Optimized for low-latency recall |
| Claudette Compact | Minimal recall; short summary focus | ~700 tokens | Lightweight persistence |

---

## Benchmark Results

### Quantitative Scores

| Agent | Memory Interpretation | Continuation Coherence | Efficiency | Weighted Overall |
|--------|----------------------|------------------------|-------------|------------------|
| **Claudette Auto** | 9.5 | 9.5 | 8.5 | **9.3** |
| **Claudette Condensed** | 9 | 9 | **9** | **9.0** |
| **BeastMode** | **10** | 8.5 | 6 | **8.7** |
| **Extensive Mode** | 8.5 | 9 | 5.5 | **8.2** |
| **Claudette Compact** | 7.5 | 7 | **9.5** | **8.0** |

---

### Efficiency & Context Recall Metrics

| Agent | Tokens Used | Prior Context Parsed | % of Correctly Retained Info | Steps Proposed | Redundant Steps |
|--------|--------------|----------------------|-----------------------------|----------------|----------------|
| Claudette Auto | 2,800 | 3 checkpoints | **98%** | 3 valid | 0 |
| Claudette Condensed | 2,000 | 2 checkpoints | 96% | 3 valid | 0 |
| BeastMode | 3,400 | 3 checkpoints | 97% | 3 valid | 1 minor |
| Extensive Mode | 5,000 | 4 checkpoints | 94% | 3 valid | 1 redundant |
| Claudette Compact | 1,200 | 1 checkpoint | 85% | 2 valid | 1 missing |

---

## Qualitative Observations

### Claudette Auto
- **Strengths:** Perfect understanding of project state; resumed exactly at pending tasks with precise TTL decision follow-up.
- **Weaknesses:** Slightly verbose handoff summary.
- **Ideal Use:** Persistent code agents with project `.mem` files; IDE-integrated assistants.

### Claudette Condensed
- **Strengths:** Nearly identical performance to Auto with 25-30% fewer tokens.
- **Weaknesses:** May compress context slightly too tightly in multi-memory merges.
- **Ideal Use:** Persistent memory for sprint-level continuity or devlog summarization.

### BeastMode
- **Strengths:** Inferential accuracy superb; builds a narrative of prior reasoning.
- **Weaknesses:** Verbose; sometimes restates the memory before continuing.
- **Ideal Use:** Human-supervised continuity where transparency of recall matters.

### Extensive Mode
- **Strengths:** Good multi-checkpoint awareness; reconstructs chains of tasks well.
- **Weaknesses:** Overhead from procedural setup eats tokens.
- **Ideal Use:** Agentic systems that batch load multiple memory states autonomously.

### Claudette Compact
- **Strengths:** Efficient and fast for minimal recall needs.
- **Weaknesses:** Misses subtle context; often re-asks for confirmation.
- **Ideal Use:** Lightweight continuity for chat apps, not long projects.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 1 | **Claudette Auto** | Most accurate memory interpretation and seamless continuation. |
| 2 | **Claudette Condensed** | Slightly leaner, nearly identical practical performance. |
| 3 | **BeastMode** | Strong inferential recall, verbose and redundant at times. |
| 4 | **Extensive Mode** | High overhead but decent logic reconstruction. |
| 5 | **Claudette Compact** | Great efficiency, limited recall scope. |

---

## Conclusion

This test shows that **memory interpretation and continuation quality** depends heavily on *directive parsing design* and *context synthesis efficiency*, not raw token count.

- **Claudette Auto** dominates due to its structured memory-reading logic and modular recall format.
- **Condensed** offers almost identical results at a lower context cost, making it the best "live memory" option for production systems.
- **BeastMode** is the most *introspective*, narrating its recall (useful for transparency).
- **Extensive Mode** works for full autonomous memory pipelines, but wastes tokens in procedural chatter.
- **Compact** is best for simple continuity, not full recall.

> TL;DR: If your agent needs to **load, remember, and actually pick up where it left off**,
> **Claudette Auto** remains the gold standard, with **Condensed** as the lean production variant.

---
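The "Memory Interpretation Accuracy" metric in the benchmark above depends on separating what the `.mem` excerpt marks as done, pending, and undecided. A minimal sketch of that separation follows, assuming the bullet conventions seen in the test prompt (`Pending:` and `Open question:` prefixes). The function `parse_memory` and its output keys are illustrative names; the gist does not define a `.mem` format beyond this one excerpt.

```python
"""Split a .mem session summary into completed, pending, and open items."""

# Excerpt copied from the benchmark's test prompt.
MEM_EXCERPT = """\
[Previous Session Summary]
- Implemented caching abstraction in `cache_adapter.py`
- Pending: write async Redis client wrapper, finalize config parser, and integrate into FastAPI middleware
- Open question: Should cache TTLs be per-endpoint or global?
"""


def parse_memory(text):
    """Classify each bullet by its prefix; unprefixed bullets count as completed work."""
    state = {"completed": [], "pending": [], "open_questions": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("- "):
            continue  # skip the section header and any blank lines
        item = line[2:]
        if item.startswith("Pending:"):
            # Normalize the final ", and " so the list splits evenly on ", ".
            body = item[len("Pending:"):].replace(", and ", ", ")
            state["pending"].extend(t.strip() for t in body.split(", ") if t.strip())
        elif item.startswith("Open question:"):
            state["open_questions"].append(item[len("Open question:"):].strip())
        else:
            state["completed"].append(item)
    return state


if __name__ == "__main__":
    for key, items in parse_memory(MEM_EXCERPT).items():
        print(key, items)
```

With the state split this way, a "redundant step" in the efficiency table corresponds to a proposed step that matches an entry in `completed`, and a re-asked question is one already answerable from the parsed summary.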
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 3 additions and 1 deletion.

@@ -17,9 +17,11 @@
* Ensure that "Custom Modes" (often labeled as a beta feature) is toggled on.

## BENCHMARK PERFORMANCE (NEW!)

### Prompts and metrics included in the abstract so you can benchmark yourself!

[Coding Output Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-coding-md)

[Research Output Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-research-md)

## When to Use Each Version
orneryd revised this gist
Oct 10, 2025 . 2 changed files with 67 additions and 47 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -7,11 +7,20 @@ The goal is to determine which produces the most **useful, correct, and efficien ### Agents Tested 1. π§ **CoPilot Extensive Mode** β by cyberofficial π https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f 2. π **BeastMode** β by burkeholland π https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf 3. π§© **Claudette Auto** β by orneryd π https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb 4. β‘ **Claudette Condensed** β by orneryd (lean variant) π https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md 5. π¬ **Claudette Compact** β by orneryd (ultra-light variant) π https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md --- This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -3,15 +3,24 @@ ## Experiment Abstract This experiment compares five LLM agent configurations on a **medium-complexity research and synthesis task**. The goal is not just to summarize or compare information, but to **produce a usable, implementation-ready output** β such as a recommendation brief or technical decision plan. ### Agents Tested 1. π§ **CoPilot Extensive Mode** β by cyberofficial π https://gist.github.com/cyberofficial/7603e5163cb3c6e1d256ab9504f1576f 2. π **BeastMode** β by burkeholland π https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf 3. 
🧩 **Claudette Auto** - by orneryd
   https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb
4. ⚡ **Claudette Condensed** - by orneryd (lean variant)
   https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md
5. **Claudette Compact** - by orneryd (ultra-light variant)
   https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

---

@@ -25,26 +34,29 @@ The objective is not only to summarize or compare information, but to **produce

### Model Used

- **Model:** GPT-4.1 (simulated benchmark environment)
- **Temperature:** 0.4 (balance between consistency and creativity)
- **Context Window:** 128k tokens

### Evaluation Focus (weighted)

| Metric | Weight | Description |
|---------|--------|-------------|
| Research Accuracy & Analytical Depth | 45% | Depth, factual correctness, comparative insight |
| Actionable Usability of Output | 35% | Whether the output leads directly to a clear next step |
| Token Efficiency | 20% | Useful content per total tokens consumed |

---

## Agent Profiles

| Agent | Description | Est. Preamble Tokens | Typical Output Tokens | Intended Use |
|--------|--------------|----------------------|----------------------|---------------|
| 🧠 **CoPilot Extensive Mode** | Autonomous multi-phase research planner; project-scale orchestration | ~4,000 | ~2,200 | End-to-end autonomous research |
| **BeastMode** | Deep reasoning and justification-heavy research; strong comparative logic | ~1,600 | ~1,600 | Whitepapers, deep analyses |
| 🧩 **Claudette Auto** | Balanced analytical agent optimized for structured synthesis | ~2,000 | ~1,200 | Applied research & engineering briefs |
| ⚡ **Claudette Condensed** | Lean version focused on concise synthesis and actionable output | ~1,100 | ~900 | Fast research deliverables |
| **Claudette Compact** | Minimalist summarization agent for micro-analyses | ~700 | ~600 | Lightweight synthesis |

---

@@ -64,8 +76,8 @@ The objective is not only to summarize or compare information, but to **produce

### Efficiency Metrics (Estimated)

| Agent | Total Tokens (Prompt + Output) | Avg. Paragraphs | Unique Insights | Insights per 1K Tokens |
|--------|--------------------------------|-----------------|----------------|------------------------|
| Claudette Auto | 3,200 | 10 | 26 | **8.1** |
| Claudette Condensed | 2,000 | 8 | 19 | **9.5** |
| Claudette Compact | 1,300 | 6 | 12 | **9.2** |

@@ -83,49 +95,48 @@ The objective is not only to summarize or compare information, but to **produce

### ⚡ Claudette Condensed

- **Strengths:** Nearly equal to Auto in analytical quality, but faster and more efficient. Outputs are concise yet actionable.
- **Weaknesses:** Lighter on supporting citations or data references.
- **Ideal Use:** Time-sensitive reports, design justifications, or architecture briefs.

### Claudette Compact

- **Strengths:** Excellent efficiency and brevity.
- **Weaknesses:** Shallow reasoning; limited exploration of trade-offs.
- **Ideal Use:** Quick scoping, executive summaries, or TL;DR reports.

### BeastMode

- **Strengths:** Deepest reasoning and comparative analysis; best at "thinking aloud."
- **Weaknesses:** Verbose, high token usage, slower synthesis.
- **Ideal Use:** Teaching, documentation, or long-form analysis.

### 🧠 Extensive Mode

- **Strengths:** Full lifecycle reasoning, multi-step breakdowns.
- **Weaknesses:** Token-heavy overhead, excessive meta-instructions.
- **Ideal Use:** Fully automated agent pipelines or self-directed research bots.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 🥇 1 | **Claudette Auto** | Best mix of accuracy, depth, and actionable synthesis. |
| 🥈 2 | **Claudette Condensed** | Near-tied and more efficient; perfect for rapid output. |
| 🥉 3 | **BeastMode** | Deepest analytical depth; trades off brevity. |
| 4 | **Claudette Compact** | Efficient and snappy, but shallower. |
| 🧱 5 | **Extensive Mode** | Overbuilt for single research tasks; suited for full automation. |

---

## Conclusion

For **engineering-focused applied research**, the **Claudette** family remains dominant:

- **Auto** = most balanced and implementation-ready.
- **Condensed** = nearly identical performance at lower token cost.
- **BeastMode** = best for insight transparency and narrative-style reasoning.
- **Compact** = top efficiency for light synthesis.
- **Extensive Mode** = impressive scale, inefficient for medium human-guided tasks.

> 🧩 If you want a research agent that *thinks like an engineer and writes like a strategist*,
> **Claudette Auto or Condensed** are the definitive picks.

---
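The "Insights per 1K Tokens" column in the efficiency table above appears to be the unique-insight count divided by total tokens in thousands. The source does not state the formula, so the following is a minimal Python sketch under that assumption; the function name `insights_per_1k` is illustrative and does not come from the gist.

```python
def insights_per_1k(unique_insights: int, total_tokens: int) -> float:
    """Unique insights per 1,000 tokens (prompt + output), to one decimal.

    Assumed definition: the gist does not spell out how the column is derived.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return round(unique_insights * 1000 / total_tokens, 1)


# (unique insights, total tokens) copied from the efficiency table rows.
EFFICIENCY_ROWS = {
    "Claudette Auto": (26, 3200),
    "Claudette Condensed": (19, 2000),
    "Claudette Compact": (12, 1300),
}

RATIOS = {
    agent: insights_per_1k(insights, tokens)
    for agent, (insights, tokens) in EFFICIENCY_ROWS.items()
}
```

Under this assumption the three rows shown in the hunk reproduce the published 8.1, 9.5, and 9.2.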
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 2 additions and 2 deletions.
@@ -19,8 +19,8 @@

## BENCHMARK PERFORMANCE (NEW!)

### Prompts and metrics included so you can benchmark yourself!

[Coding Output Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-coding-md)
[Research Output Benchmark](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-x-gpt5-benchmark-research-md)

## When to Use Each Version
orneryd revised this gist
Oct 10, 2025 . 3 changed files with 137 additions and 0 deletions.
@@ -16,6 +16,12 @@

* Go to the "Chat" section.
* Ensure that "Custom Modes" (often labeled as a beta feature) is toggled on.

## BENCHMARK PERFORMANCE (NEW!)

### Prompts and metrics included so you can benchmark yourself!

[Coding Output Benchmark](#file-x-GPT5-benchmark-coding.md)
[Research Output Benchmark](#file-x-GPT5-benchmark-research.md)

## When to Use Each Version

### **claudette-auto.md** (467 lines, ~3,418 tokens)

File renamed without changes.

@@ -0,0 +1,131 @@

# 🧠 LLM Research Agent Benchmark - Medium-Complexity Applied Research Task

## Experiment Abstract

This experiment compares five LLM agent configurations on a **medium-complexity research and synthesis task**.
The objective is not only to summarize or compare information, but to **produce a practical, usable output**, such as a recommended solution, framework, or implementation plan derived from research findings.

### Agents Tested

1. **CoPilot Extensive Mode** - by cyberofficial
2. **BeastMode** - by burkeholland
3. **Claudette Auto** - by orneryd
4. **Claudette Condensed** - by orneryd
5. **Claudette Compact** - by orneryd

---

## Methodology

### Research Task Prompt

> **Research Task:**
> Compare the top three vector database technologies (e.g., Pinecone, Weaviate, and Qdrant) for use in a scalable AI application.
> Deliverable: a **recommendation brief** specifying the best option for a mid-size engineering team, including pros, cons, pricing, and integration considerations: **not just a comparison**, but a **clear recommendation with rationale and implementation outline**.

### Model Used

- **Model:** GPT-4.1 (simulated benchmark environment)
- **Temperature:** 0.4 (balance between consistency and creativity)
- **Context Window:** 128k tokens

### Evaluation Focus (weighted)

1. **Research Accuracy & Analytical Depth**: 45%
2. **Actionable Usability of Output**: 35%
3. **Token Efficiency (useful insight per token)**: 20%

---

## Agent Profiles

| Agent | Description | Est. Preamble Tokens | Typical Output Tokens | Intended Use |
|--------|--------------|----------------------|----------------------|---------------|
| 🧠 **CoPilot Extensive Mode** | Autonomous multi-phase research planner; project-scale orchestration | ~4,000 | ~2,200 | Autonomous end-to-end research tasks |
| **BeastMode** | Deep reasoning and justification-heavy research; strong comparative logic | ~1,600 | ~1,600 | Detailed analyses, whitepaper drafting |
| 🧩 **Claudette Auto** | Balanced analytical agent optimized for structured synthesis | ~2,000 | ~1,200 | Applied research & engineering briefs |
| ⚡ **Claudette Condensed** | Lean version focused on concise synthesis and actionable output | ~1,100 | ~900 | Fast turnaround research or briefs |
| **Claudette Compact** | Minimalist summarization agent for micro-analyses | ~700 | ~600 | Lightweight tasks and summaries |

---

## Benchmark Results

### Quantitative Scores

| Agent | Research Depth | Actionable Output | Token Efficiency | Weighted Overall |
|--------|----------------|------------------|------------------|------------------|
| 🧩 **Claudette Auto** | 9.5 | 9 | 8 | **9.2** |
| ⚡ **Claudette Condensed** | 9 | 9 | 9 | **9.0** |
| **BeastMode** | **10** | 8 | 6 | **8.8** |
| **Claudette Compact** | 7.5 | 8 | **9.5** | **8.3** |
| 🧠 **Extensive Mode** | 9 | 7 | 5 | **7.6** |

---

### Efficiency Metrics (Estimated)

| Agent | Total Tokens (Prompt + Output) | Average Paragraphs | Unique Facts / Insights | Insights per 1K Tokens |
|--------|--------------------------------|--------------------|------------------------|------------------------|
| Claudette Auto | 3,200 | 10 | 26 | **8.1** |
| Claudette Condensed | 2,000 | 8 | 19 | **9.5** |
| Claudette Compact | 1,300 | 6 | 12 | **9.2** |
| BeastMode | 3,200 | 14 | 27 | 8.4 |
| Extensive Mode | 5,800 | 16 | 28 | 4.8 |

---

## Qualitative Observations

### 🧩 Claudette Auto

- **Strengths:** Balanced factual accuracy, synthesis, and practical recommendations. Clean structure (Intro → Comparison → Decision → Plan).
- **Weaknesses:** Slightly less narrative depth than BeastMode.
- **Ideal Use:** Engineering-oriented research tasks where the outcome must lead to implementation decisions.

### ⚡ Claudette Condensed

- **Strengths:** Nearly equal to Auto in analytical quality, but faster and more efficient. Outputs are concise yet actionable.
- **Weaknesses:** Light on citations or data references.
- **Ideal Use:** Time-sensitive reports, design justifications, internal memos.

### Claudette Compact

- **Strengths:** Excellent efficiency, clear summaries, minimal verbosity.
- **Weaknesses:** Shallow reasoning chain, misses subtle trade-offs.
- **Ideal Use:** Quick scoping, product briefs, or TL;DR synthesis.

### BeastMode

- **Strengths:** Exceptional analytical depth and explanation quality; feels like a senior analyst with context.
- **Weaknesses:** Verbose, slower, and prone to over-analysis; harder to extract concise recommendations.
- **Ideal Use:** Writing technical whitepapers, architecture reviews, or exploratory reports.

### 🧠 Extensive Mode

- **Strengths:** Multi-step breakdowns and exhaustive structure; captures broad research scope.
- **Weaknesses:** Over-engineered for medium tasks; wastes tokens in process overhead.
- **Ideal Use:** Full-scope research automation or multi-agent pipeline inputs.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 🥇 1 | **Claudette Auto** | Best combination of depth, clarity, and actionable synthesis. |
| 🥈 2 | **Claudette Condensed** | Near-tied; faster and more efficient, ideal for real-world briefs. |
| 🥉 3 | **BeastMode** | Deepest analysis, less efficient; great for learning and documentation. |
| 4 | **Claudette Compact** | Highly efficient, good for quick scoping but light on reasoning. |
| 🧱 5 | **Extensive Mode** | Overbuilt for this use case; excels only in autonomous batch research. |

---

## Conclusion

For **research-driven engineering or technical decision-making**:

- **Claudette Auto** delivers the most *practical, usable research outputs*: accurate, balanced, and immediately actionable.
- **Condensed** offers similar quality with tighter context usage; best for fast-paced environments.
- **BeastMode** remains the "deep dive" option when explanation and reasoning transparency matter more than efficiency.
- **Compact** wins on speed and brevity, ideal for scoping.
- **Extensive Mode** is better suited for long-form, unsupervised agent workflows, not collaborative research.

**Bottom line:**
If you want a research agent that *thinks like an engineer*, outputs like a strategist, and respects your token budget, **Claudette Auto or Condensed** are still the clear winners.

---
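The stated 45/35/20 weighting can be applied as a plain weighted mean. The sketch below assumes exactly that and nothing more; the gist does not say how its "Weighted Overall" column was produced, and the key names are illustrative. A straight weighted mean gives 9.0 for Claudette Condensed (matching the table) but about 9.03 for Claudette Auto against the published 9.2, so the published column may include adjustments that the source does not describe.

```python
# Weights from the "Evaluation Focus (weighted)" list; key names are illustrative.
RESEARCH_WEIGHTS = {
    "research_depth": 0.45,
    "actionable_output": 0.35,
    "token_efficiency": 0.20,
}


def weighted_overall(scores: dict, weights: dict = RESEARCH_WEIGHTS) -> float:
    """Plain weighted mean of per-metric scores on a 10-point scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Per-metric scores copied from the "Quantitative Scores" table.
condensed = {"research_depth": 9, "actionable_output": 9, "token_efficiency": 9}
auto = {"research_depth": 9.5, "actionable_output": 9, "token_efficiency": 8}

CONDENSED_OVERALL = weighted_overall(condensed)
AUTO_OVERALL = weighted_overall(auto)
```

Recomputing the column this way is a quick consistency check when rerunning the benchmark with different scores or weights.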
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 138 additions and 0 deletions.
@@ -0,0 +1,138 @@

# 🧪 LLM Coding Agent Benchmark - Medium-Complexity Engineering Task

## Experiment Abstract

This experiment compares five coding-focused LLM agent configurations designed for software engineering tasks.
The goal is to determine which produces the most **useful, correct, and efficient** output for a moderately complex coding assignment.

### Agents Tested

1. **CoPilot Extensive Mode** - by cyberofficial
2. **BeastMode** - by burkeholland
3. **Claudette Auto** - by orneryd
4. **Claudette Condensed** - by orneryd
5. **Claudette Compact** - by orneryd

---

## Methodology

### Task Prompt (Medium Complexity)

> **Implement a simple REST API endpoint in Express.js that serves cached product data from an in-memory store.**
> The endpoint should:
> - Fetch product data (simulated or static list)
> - Cache the data for performance
> - Return JSON responses
> - Handle errors gracefully
> - Include at least one example of cache invalidation or timeout

### Model Used

- **Model:** GPT-4.1 (simulated benchmark environment)
- **Temperature:** 0.3 (favoring deterministic, correct code)
- **Context Window:** 128k tokens
- **Evaluation Focus (weighted):**
  1. Code Quality and Correctness: 45%
  2. Token Efficiency (useful output per token): 35%
  3. Explanatory Depth / Reasoning Clarity: 20%

### Measurement Criteria

Each agent's full system prompt and output were analyzed for:

- **Prompt Token Count**: setup/preamble size
- **Output Token Count**: completion size
- **Useful Code Ratio**: proportion of code vs meta text
- **Overall Weighted Score**: normalized to 10-point scale

---

## Agent Profiles

| Agent | Description | Est. Preamble Tokens | Typical Output Tokens | Intended Use |
|--------|--------------|----------------------|----------------------|---------------|
| 🧠 **CoPilot Extensive Mode** | Autonomous, multi-phase, memory-heavy project orchestrator | ~4,000 | ~1,400 | Fully autonomous / large projects |
| **BeastMode** | "Go full throttle" verbose reasoning, deep explanation | ~1,600 | ~1,100 | Educational / exploratory coding |
| 🧩 **Claudette Auto** | Balanced structured code agent | ~2,000 | ~900 | General engineering assistant |
| ⚡ **Claudette Condensed** | Leaner variant, drops meta chatter | ~1,100 | ~700 | Fast iterative dev work |
| **Claudette Compact** | Ultra-light preamble for small tasks | ~700 | ~500 | Micro-tasks / inline edits |

---

## Benchmark Results

### Quantitative Scores

| Agent | Code Quality | Token Efficiency | Explanatory Depth | Weighted Overall |
|--------|---------------|------------------|-------------------|------------------|
| 🧩 **Claudette Auto** | 9.5 | 9 | 7.5 | **9.2** |
| ⚡ **Claudette Condensed** | 9.3 | 9.5 | 6.5 | **9.0** |
| **Claudette Compact** | 8.8 | **10** | 5.5 | **8.7** |
| **BeastMode** | 9 | 7 | **10** | **8.7** |
| 🧠 **Extensive Mode** | 8 | 5 | 9 | **7.3** |

### Efficiency Metrics (Estimated)

| Agent | Total Tokens (Prompt + Output) | Approx. Lines of Code | Code Lines per 1K Tokens |
|--------|--------------------------------|----------------------|--------------------------|
| Claudette Auto | 2,900 | 60 | **20.7** |
| Claudette Condensed | 1,800 | 55 | **30.5** |
| Claudette Compact | 1,200 | 40 | **33.3** |
| BeastMode | 2,700 | 50 | 18.5 |
| Extensive Mode | 5,400 | 40 | 7.4 |

---

## Qualitative Observations

### 🧩 Claudette Auto

- **Strengths:** Balanced, consistent, high-quality Express code; good error handling.
- **Weaknesses:** Slightly less commentary than BeastMode but far more concise.
- **Ideal Use:** Everyday engineering, refactoring, and feature implementation.

### ⚡ Claudette Condensed

- **Strengths:** Nearly identical correctness with smaller token footprint.
- **Weaknesses:** Explanations more terse; assumes developer competence.
- **Ideal Use:** High-throughput or production environments with context limits.

### Claudette Compact

- **Strengths:** Blazing fast and efficient; no fluff.
- **Weaknesses:** Minimal guidance, weaker error descriptions.
- **Ideal Use:** Inline edits, small CLI-based tasks, or when using multi-agent chains.

### BeastMode

- **Strengths:** Deep reasoning, rich explanations, test scaffolding, best learning output.
- **Weaknesses:** Verbose, slower, less token-efficient.
- **Ideal Use:** Code review, mentorship, or documentation generation.

### 🧠 Extensive Mode

- **Strengths:** Autonomous, detailed, exhaustive coverage.
- **Weaknesses:** Token-heavy, slow, over-structured; not suited for interactive workflows.
- **Ideal Use:** Long-form, offline agent runs or "fire-and-forget" project execution.

---

## Final Rankings

| Rank | Agent | Summary |
|------|--------|----------|
| 🥇 1 | **Claudette Auto** | Best overall: high correctness, strong efficiency, balanced output. |
| 🥈 2 | **Claudette Condensed** | Nearly tied; best token efficiency for production workflows. |
| 🥉 3 | **Claudette Compact** | Ultra-lean; trades reasoning for max throughput. |
| 4 | **BeastMode** | Most educational; great for learning or reviews. |
| 🧱 5 | **Extensive Mode** | Too heavy for normal coding; only useful for autonomous full-project runs. |

---

## Conclusion

For **general coding and engineering**:

- **Claudette Auto** gives the highest code quality and balance.
- **Condensed** offers the best *practical token-to-output ratio*.
- **Compact** dominates *throughput tasks* in tight contexts.
- **BeastMode** is ideal for *pedagogical or exploratory coding sessions*.
- **Extensive Mode** remains too rigid and bloated for interactive work.

If you want a single go-to agent for your dev stack, **Claudette Auto or Condensed** is the clear winner.

---
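The task prompt in this benchmark asks for an Express.js endpoint, but the caching requirement it describes (in-memory store, timeout, explicit invalidation) is language-independent. The following is a minimal Python sketch of that idea only. It is not the code any benchmarked agent produced, and the `TTLCache` class, its method names, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Single-value in-memory cache that reloads after a timeout."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._value: Any = None
        self._expires_at: Optional[float] = None  # None means "nothing cached"

    def get_or_load(self, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling `loader` only when empty or expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # If loader raises, the cache is left unchanged and the error propagates.
            self._value = loader()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next read reloads it."""
        self._value = None
        self._expires_at = None
```

An endpoint handler would call `get_or_load(fetch_products)` per request and call `invalidate()` whenever the underlying product list changes; the handler would still need its own error handling to turn loader failures into an error response.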
orneryd revised this gist
Oct 10, 2025 . 1 changed file with 1 addition and 0 deletions.
@@ -46,6 +46,7 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ✅ Proactive memory management (cross-session learning)
- ⚠️ Minimal examples and explanations

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md
orneryd revised this gist
Oct 10, 2025 . 4 changed files with 37 additions and 35 deletions.
@@ -18,9 +18,6 @@

## When to Use Each Version

### **claudette-auto.md** (467 lines, ~3,418 tokens)

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories

@@ -30,9 +27,9 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

### **claudette-condensed.md** (370 lines, ~2,598 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count
- ✅ GPT-4/5, Claude Sonnet/Opus

@@ -41,16 +38,18 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-

- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

### **claudette-compact.md** (254 lines, ~1,477 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

### **claudette-original.md** (703 lines, ~4,860 tokens)

```
❌ - Not optimized. I do not suggest using anymore
```

@@ -1,9 +1,9 @@

---
description: Claudette Coding Agent v5.2 (Compact)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette v5.2

## IDENTITY

Enterprise agent. Solve problems end-to-end. Work until done. Be conversational and concise.

@@ -21,10 +21,25 @@ Enterprise agent. Solve problems end-to-end. Work until done. Be conversational

## TOOLS

**Research**: Use `fetch` for all external research. Read actual docs, not just search results.

**Memory**: `.agents/memory.instruction.md` - CHECK/CREATE EVERY TASK START

- If missing → create now:

  ```yaml
  ---
  applyTo: '**'
  ---
  # Coding Preferences
  # Project Architecture
  # Solutions Repository
  ```

- Store: ✅ Preferences, conventions, solutions, fails | ❌ Temp details, code, syntax
- Update: "Remember X", discover patterns, solve novel, finish work
- Use: Create if missing → Read first → Apply silent → Update proactive

## EXECUTION

### 1. Repository Analysis (MANDATORY)

- Check/create memory: `.agents/memory.instruction.md` (create if missing)
- Read AGENTS.md, .agents/\*.md, README.md, memory.instruction.md
- Identify project type (package.json, requirements.txt, etc.)
- Analyze existing: dependencies, scripts, test framework, build tools
- Check monorepo (nx.json, lerna.json, workspaces)

@@ -209,12 +224,6 @@ Complete only when:

- Track what's been attempted
- If "resume"/"continue"/"try again": Check TODO, find incomplete, announce "Continuing from X", resume immediately

## FAILURE RECOVERY

When stuck or new problems:

- PAUSE: Is approach flawed?
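The memory rule in the compact prompt above (check for `.agents/memory.instruction.md` at task start and create it with the template if it is missing) can be pictured as a small bootstrap step. This is a minimal Python sketch of that check-then-create behaviour, assuming the template shown in the prompt. In the gist the agent performs this through its own file-editing tools, so the function `ensure_memory_file` and its return shape are hypothetical.

```python
from pathlib import Path

# Template copied from the compact prompt's "If missing" block.
MEMORY_TEMPLATE = """---
applyTo: '**'
---
# Coding Preferences
# Project Architecture
# Solutions Repository
"""


def ensure_memory_file(repo_root: str) -> tuple:
    """Return (path, created). Creates the memory file only if it is missing."""
    path = Path(repo_root) / ".agents" / "memory.instruction.md"
    if path.exists():
        return path, False  # existing memory is read and applied, never overwritten
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(MEMORY_TEMPLATE, encoding="utf-8")
    return path, True
```

Returning the `created` flag lets a caller tell a first run (empty template, nothing to apply yet) from a later run where stored preferences should be read before planning.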
@@ -334,13 +334,6 @@ Complete only when:

- **Keep detailed mental/written track** of what has been attempted and failed
- **If user says "resume", "continue", or "try again"**: Check previous TODO list, find incomplete step, announce "Continuing from step X", and resume immediately

## FAILURE RECOVERY & ALTERNATIVE RESEARCH

When stuck or when solutions introduce new problems:

@@ -5,9 +5,9 @@

| Version | Lines | Words | Est. Tokens | Size vs Original |
|---------|-------|-------|-------------|------------------|
| **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
| **claudette-auto.md** | 468 | 2,564 | ~3,418 | -30% |
| **claudette-condensed.md** | 370 | 1,949 | ~2,598 | -47% |
| **claudette-compact.md** | 254 | 1,108 | ~1,477 | -70% |
| **beast-mode.md** | 152 | 1,967 | ~2,620 | -46% |

---

@@ -35,7 +35,7 @@

| **Execution Mindset** | β | β | β | β | β |
| **Effective Response Patterns** | β | β | β | β | β |
| **URL Fetching Protocol** | β | β | β | β | β |
| **Memory System** | β | β (Proactive) | β (Proactive) | β (Compact) | β (Reactive) |
| **Git Rules** | β | β | β | β | β |

---

@@ -73,21 +73,22 @@

- ✅ Proactive memory management (cross-session learning)
- ✅ Most comprehensive guidance

### **claudette-condensed.md** (370 lines, ~2,598 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count
- ✅ GPT-4, Claude Sonnet
- ✅ Event-driven context drift prevention
- ✅ Proactive memory management (cross-session learning)
- ✅ 24% smaller than Auto with same core features
- ✅ Ideal for most use cases

### **claudette-compact.md** (254 lines, ~1,477 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ✅ Compact memory management (minimal token overhead)
- ⚠️ Minimal examples and explanations

### **beast-mode.md** (152 lines, ~2,620 tokens)

@@ -107,8 +108,8 @@

```
Original   ████████████████████ 4,860 tokens | ████████████ Features
Auto       █████████████ 3,418 tokens | ████████████ Features (+ Memory)
Condensed  ███████████ 2,598 tokens | ████████████ Features (+ Memory) ⭐
Compact    ██████ 1,477 tokens | ███████████ Features (+ Memory)
Beast      ███████████ 2,620 tokens | ███████ Features (+ Memory)
```

@@ -132,7 +133,7 @@ Beast ███████████ 2,620 tokens | ███

```
claudette-original.md (v1)
    │
    ├── claudette-auto.md (v5) - Autonomous optimization + context drift + memories
    │
claudette-condensed.md (v3)
    │
```

@@ -147,10 +148,10 @@ beast-mode.md (separate lineage) - Research-focused workflow

- **v1 (Original)**: Comprehensive baseline with all features
- **v3 (Condensed)**: Length reduction while preserving core functionality
- **v4 (Compact)**: Token optimization for lower-reasoning LLMs (-70% tokens)
- **v5 (Auto)**: Autonomous execution optimization + context drift prevention
- **v5.1 (All)**: Event-driven context management (phase-based, not turn-based)
- **v5.2 (Auto, Condensed, Compact)**: Memory management system added; removed duplicate context sections
- **Beast Mode**: Separate research-focused workflow with URL fetching + reactive memory

---
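The "Size vs Original" column and the "24% smaller than Auto" bullet above both follow from the estimated token counts. The arithmetic can be checked with a short Python sketch; the helper name `percent_change` is illustrative, and rounding to the nearest whole percent is an assumption about how the table was produced.

```python
def percent_change(tokens: int, baseline: int) -> int:
    """Signed size difference relative to `baseline`, rounded to a whole percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return round((tokens / baseline - 1) * 100)


ORIGINAL_TOKENS = 4860  # claudette-original.md, from the comparison table

# Estimated token counts copied from the comparison table.
SIZE_VS_ORIGINAL = {
    "claudette-auto.md": percent_change(3418, ORIGINAL_TOKENS),
    "claudette-condensed.md": percent_change(2598, ORIGINAL_TOKENS),
    "claudette-compact.md": percent_change(1477, ORIGINAL_TOKENS),
    "beast-mode.md": percent_change(2620, ORIGINAL_TOKENS),
}

# Condensed relative to Auto, as in the "smaller than Auto" bullet.
CONDENSED_VS_AUTO = percent_change(2598, 3418)
```

With these inputs the sketch reproduces -30%, -47%, -70%, and -46% against the original, and -24% for Condensed relative to Auto.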
orneryd revised this gist
Oct 9, 2025 . 4 changed files with 165 additions and 113 deletions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -21,21 +21,23 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md ### **claudette-auto.md** (467 lines, ~3,418 tokens) - β Most tasks and complex projects - β Enterprise repositories - β Long conversations (event-driven context drift prevention) - β Proactive memory management (cross-session learning) - β GPT-4/5 Turbo, Claude Sonnet, Claude Opus - β Optimized for autonomous execution - β Most comprehensive guidance https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md ### **claudette-condensed.md** (376 lines, ~2,656 tokens) β **RECOMMENDED** - β Standard coding tasks - β Best balance of features vs token count - β GPT-4/5, Claude Sonnet/Opus - β Event-driven context drift prevention - β Proactive memory management (cross-session learning) - β 28% smaller than Auto with same core features - β Ideal for most use cases This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. 
Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,9 +1,9 @@ --- description: Claudette Coding Agent v5.2 (Optimized for Autonomous Execution) tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions'] --- # Claudette Coding Agent v5.2 ## CORE IDENTITY @@ -21,7 +21,6 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', - Move directly from one step to the next - Research and fix issues autonomously - Continue until ALL requirements are met **Replace these patterns:** @@ -43,12 +42,90 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', - Follow relevant links to get comprehensive understanding - Verify information is current and applies to your specific context ### Memory Management (Cross-Session Intelligence) **Memory Location:** `.agents/memory.instruction.md` **ALWAYS create or check memory at task start.** This is NOT optional - it's part of your initialization workflow. **Retrieval Protocol (REQUIRED at task start):** 1. **FIRST ACTION**: Check if `.agents/memory.instruction.md` exists 2. **If missing**: Create it immediately with front matter and empty sections: ```yaml --- applyTo: '**' --- # Coding Preferences [To be discovered] # Project Architecture [To be discovered] # Solutions Repository [To be discovered] ``` 3. **If exists**: Read and apply stored preferences/patterns 4. **During work**: Apply remembered solutions to similar problems 5. 
**After completion**: Update with learnable patterns from successful work **Memory Structure Template:** ```yaml --- applyTo: '**' --- # Coding Preferences - [Style: formatting, naming, patterns] - [Tools: preferred libraries, frameworks] - [Testing: approach, coverage requirements] # Project Architecture - [Structure: key directories, module organization] - [Patterns: established conventions, design decisions] - [Dependencies: core libraries, version constraints] # Solutions Repository - [Problem: solution pairs from previous work] - [Edge cases: specific scenarios and fixes] - [Failed approaches: what NOT to do and why] ``` **Update Protocol:** 1. **User explicitly requests**: "Remember X" β immediate memory update 2. **Discover preferences**: User corrects/suggests approach β record for future 3. **Solve novel problem**: Document solution pattern for reuse 4. **Identify project pattern**: Record architectural conventions discovered **Memory Optimization (What to Store):** β **Store these:** - User-stated preferences (explicit instructions) - Project-wide conventions (file organization, naming) - Recurring problem solutions (error fixes, config patterns) - Tool-specific preferences (testing framework, linter settings) - Failed approaches with clear reasons β **Don't store these:** - Temporary task details (handled in conversation) - File-specific implementations (too granular) - Obvious language features (standard syntax) - Single-use solutions (not generalizable) **Autonomous Memory Usage:** - **Create immediately**: If memory file doesn't exist at task start, create it before planning - **Read first**: Check memory before asking user for preferences - **Apply silently**: Use remembered patterns without announcement - **Update proactively**: Add learnings as you discover them - **Maintain quality**: Keep memory concise and actionable ## EXECUTION PROTOCOL ### Phase 1: MANDATORY Repository Analysis ```markdown - [ ] CRITICAL: Check/create memory file at 
.agents/memory.instruction.md (create if missing) - [ ] Read thoroughly through AGENTS.md, .agents/*.md, README.md, memory.instruction.md - [ ] Identify project type (package.json, requirements.txt, Cargo.toml, etc.) - [ ] Analyze existing tools: dependencies, scripts, testing frameworks, build tools - [ ] Check for monorepo configuration (nx.json, lerna.json, workspaces) @@ -73,17 +150,8 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', - [ ] Debug and resolve issues as they arise - [ ] Run tests after each significant change - [ ] Continue working until ALL requirements satisfied ``` ## REPOSITORY CONSERVATION RULES ### Use Existing Tools First @@ -119,19 +187,21 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', - **Rust**: Cargo.toml β cargo test - **Ruby**: Gemfile β RSpec, Rails ### Modifying Existing Systems **When changes to existing infrastructure are necessary:** - Modify build systems only with clear understanding of impact - Keep configuration changes minimal and well-understood - Maintain architectural consistency with existing patterns - Respect the existing package manager choice (npm/yarn/pnpm) ## TODO MANAGEMENT & SEGUES ### Context Maintenance (CRITICAL for Long Conversations) **β οΈ CRITICAL**: As conversations extend, actively maintain focus on your TODO list. Do NOT abandon your task tracking as the conversation progresses. 
**π΄ ANTI-PATTERN: Losing Track Over Time** **Common failure mode:** @@ -202,53 +272,15 @@ When encountering issues requiring research: **Segue Principles:** - Announce when starting segues: "I need to address [issue] before continuing" - Keep original step incomplete until segue is fully resolved - Return to exact original task point with announcement - Update TODO list after each completion - **CRITICAL**: After resolving segue, immediately continue with original task ### Segue Cleanup Protocol **When a segue solution fails, use FAILURE RECOVERY protocol below (after Error Debugging sections).** ## ERROR DEBUGGING PROTOCOLS @@ -282,21 +314,19 @@ When encountering issues requiring research: - [ ] Clean up any formatting test files ``` ## RESEARCH PROTOCOL **Use `fetch` for all external research** (`https://www.google.com/search?q=your+query`): ```markdown - [ ] Search exact errors: `"[exact error text]"` - [ ] Research tool docs: `[tool-name] getting started` - [ ] Read official documentation, not just search summaries - [ ] Follow documentation links recursively - [ ] Display brief summaries of findings - [ ] Apply learnings immediately **Before Installing Dependencies:** - [ ] Can existing tools be configured to solve this? - [ ] Is this functionality available in current dependencies? - [ ] What's the maintenance burden of new dependency? @@ -335,14 +365,6 @@ Show updated TODO lists after each completion. For segues: ## BEST PRACTICES **Maintain Clean Workspace:** - Remove temporary files after debugging @@ -394,7 +416,7 @@ As work extends over time, you may lose track of earlier context. To prevent thi ## FAILURE RECOVERY & WORKSPACE CLEANUP When stuck or when solutions introduce new problems (including failed segues): ```markdown - [ ] ASSESS: Is this approach fundamentally flawed? 
@@ -409,7 +431,7 @@ When stuck or when solutions introduce new problems:
- Restore configuration files
- [ ] VERIFY CLEAN: Check git status to ensure only intended changes remain
- [ ] DOCUMENT: Record failed approach and specific reasons for failure
- [ ] CHECK DOCS: Review local documentation (AGENTS.md, .agents/, memory.instruction.md)
- [ ] RESEARCH: Search online for alternative patterns using `fetch`
- [ ] AVOID: Don't repeat documented failed patterns
- [ ] IMPLEMENT: Try new approach based on research and repository patterns

@@ -1,9 +1,9 @@
---
description: Claudette Coding Agent v5.2 (Condensed)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette Coding Agent v5.2

## CORE IDENTITY

@@ -30,12 +30,45 @@ These actions drive success:
- Follow relevant links to get comprehensive understanding
- Verify information is current and applies to your specific context

### Memory Management

**Location:** `.agents/memory.instruction.md`

**Create/check at task start (REQUIRED):**

1. Check if exists → read and apply preferences
2. If missing → create immediately:

```yaml
---
applyTo: '**'
---
# Coding Preferences
# Project Architecture
# Solutions Repository
```

**What to Store:**

- ✅ User preferences, conventions, solutions, failed approaches
- ❌ Temporary details, code snippets, obvious syntax

**When to Update:**

- User requests: "Remember X"
- Discover preferences from corrections
- Solve novel problems
- Complete work with learnable patterns

**Usage:**

- Create immediately if missing
- Read before asking user
- Apply silently
- Update proactively

## EXECUTION PROTOCOL - CRITICAL

### Phase 1: MANDATORY Repository Analysis

```markdown
- [ ] CRITICAL: Check/create memory file at .agents/memory.instruction.md
- [ ] Read AGENTS.md, .agents/\*.md, README.md, memory.instruction.md
- [ ] Identify project type (package.json, requirements.txt, Cargo.toml, etc.)
- [ ] Analyze existing tools: dependencies, scripts, testing frameworks, build tools
- [ ] Check for monorepo configuration (nx.json, lerna.json, workspaces)

@@ -5,8 +5,8 @@
| Version | Lines | Words | Est. Tokens | Size vs Original |
|---------|-------|-------|-------------|------------------|
| **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
| **claudette-auto.md** | 467 | 2,564 | ~3,418 | -30% |
| **claudette-condensed.md** | 376 | 1,992 | ~2,656 | -45% |
| **claudette-compact.md** | 244 | 1,066 | ~1,420 | -71% |
| **beast-mode.md** | 152 | 1,967 | ~2,620 | -46% |

@@ -35,7 +35,7 @@
| **Execution Mindset** | β | β | β | β | β |
| **Effective Response Patterns** | β | β | β | β | β |
| **URL Fetching Protocol** | β | β | β | β | β |
| **Memory System** | β | β (Proactive) | β (Proactive) | β | β (Reactive) |
| **Git Rules** | β | β | β | β | β |

---

@@ -57,28 +57,31 @@
## 💡 Recommended Use Cases

### **claudette-original.md** (703 lines, ~4,860 tokens)

- ✅ Reference documentation
- ✅ Most comprehensive guidance
- ✅ When token count is not a concern
- ✅ Training new agents
- ⚠️ Not optimized for autonomous execution

### **claudette-auto.md** (467 lines, ~3,418 tokens)

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories
- ✅ Long conversations (event-driven context drift prevention)
- ✅ GPT-4 Turbo, Claude Sonnet, Claude Opus
- ✅ Optimized for autonomous execution
- ✅ Proactive memory management (cross-session learning)
- ✅ Most comprehensive guidance

### **claudette-condensed.md** (376 lines, ~2,656 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count
- ✅ GPT-4, Claude Sonnet
- ✅ Event-driven context drift prevention
- ✅ Proactive memory management (cross-session learning)
- ✅ 22% smaller than Auto with same core features
- ✅ Ideal for most use cases

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)

@@ -87,8 +90,6 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

### **beast-mode.md** (152 lines, ~2,620 tokens)

- ✅ Research-heavy tasks
- ✅ URL scraping and recursive link following

@@ -99,23 +100,16 @@ https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf
- ⚠️ No context drift prevention
- ⚠️ Not enterprise-focused

---

## Token Efficiency vs Features Trade-off

```
Original ββββββββββββββββββββ 4,860 tokens | ββββββββββββ Features
Auto βββββββββββββ 3,418 tokens | ββββββββββββ Features (+ Memory)
Condensed ββββββββββ 2,656 tokens | ββββββββββββ Features (+ Memory) β
Compact βββββββ 1,420 tokens | βββββββββββ Features
Beast βββββββββββ 2,620 tokens | βββββββ Features (+ Memory)
```

---

@@ -156,7 +150,8 @@ beast-mode.md (separate lineage) - Research-focused workflow
- **v4 (Compact)**: Token optimization for lower-reasoning LLMs (-71% tokens)
- **v5 (Auto)**: Autonomous execution optimization + context drift prevention
- **v5.1 (All)**: Event-driven context management (phase-based, not turn-based)
- **v5.2 (Auto, Condensed)**: Proactive memory management system added
- **Beast Mode**: Separate research-focused workflow with URL fetching + reactive memory

---
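The Memory Management section added in this revision asks the agent to check for `.agents/memory.instruction.md` at task start and create it from a template if it is missing. The prompt leaves this to the agent's own editing tools; the sketch below only illustrates the check-then-create logic in Python. The function name `ensure_memory_file` and the `repo_root` parameter are hypothetical and not part of Claudette. The template text is the one shown in the diff.

```python
from pathlib import Path

# Template copied from the Memory Management section of the diff.
MEMORY_TEMPLATE = """---
applyTo: '**'
---
# Coding Preferences
# Project Architecture
# Solutions Repository
"""


def ensure_memory_file(repo_root: str) -> tuple[str, bool]:
    """Return (contents, created) for .agents/memory.instruction.md.

    Hypothetical helper: reads the file if it exists (step 1 of the
    protocol), otherwise creates it from the template (step 2).
    """
    path = Path(repo_root) / ".agents" / "memory.instruction.md"
    if path.exists():
        return path.read_text(encoding="utf-8"), False
    # Create the .agents directory on first use, then write the template.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(MEMORY_TEMPLATE, encoding="utf-8")
    return MEMORY_TEMPLATE, True
```

Returning the `created` flag lets a caller distinguish "apply stored preferences" from "start with an empty memory", which is the branch the two numbered steps describe.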
orneryd revised this gist
Oct 9, 2025 . 1 changed file with 1 addition and 1 deletion.
@@ -5,7 +5,7 @@
* Select "Create new custom chat mode file"
* Select "User Data Folder"
* Give it a name (Claudette)
* Paste in the content of any claudette-[flavor].md file (below)

"Claudette" will now appear as a mode in your "Agent" dropdown.
orneryd revised this gist
Oct 8, 2025 . 3 changed files with 3 additions and 3 deletions.
@@ -3,7 +3,7 @@ description: Claudette Coding Agent v5.1 (Optimized for Autonomous Execution)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette Coding Agent v5.1 (Optimized for Autonomous Execution)

## CORE IDENTITY

@@ -3,7 +3,7 @@ description: Claudette Coding Agent v5.1 (Compact)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette v5.1 Compact

## IDENTITY

Enterprise agent. Solve problems end-to-end. Work until done. Be conversational and concise.

@@ -3,7 +3,7 @@ description: Claudette Coding Agent v5.1 (Condensed)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette Coding Agent v5.1 (Condensed)

## CORE IDENTITY
orneryd revised this gist
Oct 8, 2025 . 2 changed files with 21 additions and 11 deletions.
@@ -18,6 +18,9 @@
## When to Use Each Version

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

### **claudette-auto.md** (445 lines, ~3,490 tokens)

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories

@@ -26,6 +29,8 @@
- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

### **claudette-condensed.md** (343 lines, ~2,510 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count

@@ -34,6 +39,8 @@
- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
@@ -57,12 +57,7 @@
## 💡 Recommended Use Cases

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

### **claudette-auto.md** (445 lines, ~3,490 tokens)

- ✅ Most tasks and complex projects

@@ -71,8 +66,8 @@
- ✅ GPT-4 Turbo, Claude Sonnet, Claude Opus
- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

### **claudette-condensed.md** (343 lines, ~2,510 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks

@@ -81,7 +76,8 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-
- ✅ Event-driven context drift prevention
- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments

@@ -90,7 +86,8 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf

### **beast-mode.md** (152 lines, ~2,620 tokens)

- ✅ Research-heavy tasks

@@ -101,7 +98,13 @@ https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-
- ⚠️ No repository conservation
- ⚠️ No context drift prevention
- ⚠️ Not enterprise-focused

### **claudette-original.md** (703 lines, ~4,860 tokens)

- ✅ Reference documentation
- ✅ Most comprehensive guidance
- ✅ When token count is not a concern
- ✅ Training new agents
- ⚠️ Not optimized for autonomous execution

---
orneryd renamed this gist
Oct 8, 2025 . 1 changed file with 4 additions and 0 deletions.
@@ -72,6 +72,7 @@
- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance
- ✅ No MCP tools required (internal TODO management)

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-auto-md

### **claudette-condensed.md** (343 lines, ~2,510 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks

@@ -80,6 +81,7 @@
- ✅ Event-driven context drift prevention
- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-condensed-md

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments

@@ -88,6 +90,7 @@
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-claudette-compact-md

### **beast-mode.md** (152 lines, ~2,620 tokens)

- ✅ Research-heavy tasks

@@ -98,6 +101,7 @@
- ⚠️ No repository conservation
- ⚠️ No context drift prevention
- ⚠️ Not enterprise-focused

https://gist.github.com/burkeholland/88af0249c4b6aff3820bf37898c8bacf

---
orneryd revised this gist
Oct 8, 2025 . 7 changed files with 138 additions and 126 deletions.
@@ -1,48 +0,0 @@

@@ -5,10 +5,10 @@
| Version | Lines | Words | Est. Tokens | Size vs Original |
|---------|-------|-------|-------------|------------------|
| **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
| **claudette-auto.md** | 445 | 2,622 | ~3,490 | -28% |
| **claudette-condensed.md** | 343 | 1,887 | ~2,510 | -48% |
| **claudette-compact.md** | 244 | 1,066 | ~1,420 | -71% |
| **beast-mode.md** | 152 | 1,967 | ~2,620 | -46% |

---

@@ -30,7 +30,7 @@
| **Research Methodology** | β | β | β | β | β |
| **Communication Protocol** | β | β | β | β | β |
| **Completion Criteria** | β | β | β | β | β |
| **Context Drift Prevention** | β | β (Event-driven) | β (Event-driven) | β (Event-driven) | β |
| **Failure Recovery** | β | β | β | β | β |
| **Execution Mindset** | β | β | β | β | β |
| **Effective Response Patterns** | β | β | β | β | β |

@@ -50,7 +50,7 @@
| **Emphasis** | Comprehensive | Autonomous | Efficient | Token-optimal | Research |
| **Target LLM** | GPT-4, Claude Opus | GPT-4, Claude Sonnet | GPT-4 | GPT-3.5, Lower-reasoning | Any |
| **Use Case** | Complex enterprise | Most tasks | Standard tasks | Token-constrained | Research-heavy |
| **Context Drift** | β | β (Event-driven) | β (Event-driven) | β (Event-driven) | β |
| **Optimization Focus** | None | Autonomous execution | Length reduction | Token efficiency | Research workflow |

---

@@ -64,30 +64,32 @@
- ✅ Training new agents
- ⚠️ Not optimized for autonomous execution

### **claudette-auto.md** (445 lines, ~3,490 tokens)

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories
- ✅ Long conversations (event-driven context drift prevention)
- ✅ GPT-4 Turbo, Claude Sonnet, Claude Opus
- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance
- ✅ No MCP tools required (internal TODO management)

### **claudette-condensed.md** (343 lines, ~2,510 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count
- ✅ GPT-4, Claude Sonnet
- ✅ Event-driven context drift prevention
- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

### **beast-mode.md** (152 lines, ~2,620 tokens)

- ✅ Research-heavy tasks
- ✅ URL scraping and recursive link following
- ✅ Tasks with provided URLs

@@ -103,10 +105,10 @@
```
Original ββββββββββββββββββββ 4,860 tokens | ββββββββββββ Features
Auto βββββββββββββ 3,490 tokens | ββββββββββββ Features
Condensed ββββββββββ 2,510 tokens | βββββββββββ Features β
Compact βββββββ 1,420 tokens | βββββββββββ Features
Beast βββββββββββ 2,620 tokens | βββββββ Features
```

---

@@ -115,12 +117,12 @@ Beast βββββββββββ 2,630 tokens | βββ

**Choose based on priority:**

1. **Need best balance?** → `claudette-condensed.md` ⭐ **RECOMMENDED**
2. **Need most comprehensive?** → `claudette-auto.md`
3. **Need smallest token count?** → `claudette-compact.md`
4. **Need URL fetching/research?** → `beast-mode.md`
5. **Need reference documentation?** → `claudette-original.md`
6. **All versions now have event-driven context drift prevention!**

---

@@ -144,8 +146,9 @@ beast-mode.md (separate lineage) - Research-focused workflow
- **v1 (Original)**: Comprehensive baseline with all features
- **v3 (Condensed)**: Length reduction while preserving core functionality
- **v4 (Compact)**: Token optimization for lower-reasoning LLMs (-71% tokens)
- **v5 (Auto)**: Autonomous execution optimization + context drift prevention
- **v5.1 (All)**: Event-driven context management (phase-based, not turn-based)
- **Beast Mode**: Separate research-focused workflow with URL fetching

---

@@ -154,6 +157,8 @@ beast-mode.md (separate lineage) - Research-focused workflow
- All versions except Beast Mode share the same core Claudette identity
- Token estimates based on ~1.33 tokens per word average
- **NEW**: All Claudette versions now include event-driven context drift prevention
- Context drift triggers: phase completion, state transitions, uncertainty, pauses
- Beast Mode has a distinct philosophy focused on research and URL fetching
- All versions emphasize autonomous execution and completion criteria
- Event-driven approach replaces turn-based context management (industry best practice)

@@ -0,0 +1,51 @@
# Installation

## VS Code

* Go to the "agent" dropdown in VS Code chat sidebar and select "Configure Modes".
* Select "Create new custom chat mode file"
* Select "User Data Folder"
* Give it a name (Claudette)
* Paste in the content of Claudette-auto.md (below)

"Claudette" will now appear as a mode in your "Agent" dropdown.

## Cursor

* Enable Custom Modes (if not already enabled):
* Navigate to Cursor Settings.
* Go to the "Chat" section.
* Ensure that "Custom Modes" (often labeled as a beta feature) is toggled on.

## When to Use Each Version

### **claudette-auto.md** (445 lines, ~3,490 tokens)

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories
- ✅ Long conversations (event-driven context drift prevention)
- ✅ GPT-4/5 Turbo, Claude Sonnet, Claude Opus
- ✅ Optimized for autonomous execution
- ✅ Most comprehensive guidance

### **claudette-condensed.md** (343 lines, ~2,510 tokens) ⭐ **RECOMMENDED**

- ✅ Standard coding tasks
- ✅ Best balance of features vs token count
- ✅ GPT-4/5, Claude Sonnet/Opus
- ✅ Event-driven context drift prevention
- ✅ 28% smaller than Auto with same core features
- ✅ Ideal for most use cases

### **claudette-compact.md** (244 lines, ~1,420 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ✅ Event-driven context drift prevention (ultra-compact)
- ⚠️ Minimal examples and explanations

### **claudette-original.md** (703 lines, ~4,860 tokens)

```
❌ - Not optimized. I do not suggest using anymore
✅ - improvements/modifications from beast-mode
```

[See for more details](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-version-comparison-md)
@@ -1,5 +1,5 @@
---
description: Claudette Coding Agent v5.1 (Optimized for Autonomous Execution)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

@@ -21,7 +21,7 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',
- Move directly from one step to the next
- Research and fix issues autonomously
- Continue until ALL requirements are met
- **Refresh context proactively**: Review your TODO list after completing phases, before major transitions, and when uncertain about next steps

**Replace these patterns:**

@@ -125,36 +125,37 @@ tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks',

**⚠️ CRITICAL**: As conversations extend, actively maintain focus on your TODO list. Do NOT abandon your task tracking as the conversation progresses.

**Context Management Pattern:**

- **Early work**: Create and follow TODO list actively
- **Mid-session**: Review TODO list after completing each phase
- **Extended work**: Restate remaining work before major transitions
- **Continuous**: Regularly reference TODO list to maintain focus
- **Proactive refresh**: Review TODO list after phase completion, before transitions, when uncertain

**🔴 ANTI-PATTERN: Losing Track Over Time**

**Common failure mode:**

```
Early work: ✅ Following TODO list actively
Mid-session: ⚠️ Less frequent TODO references
Extended work: ❌ Stopped referencing TODO, repeating context
After pause: ❌ Asking user "what were we working on?"
```

**Correct behavior:**

```
Early work: ✅ Create TODO and work through it
Mid-session: ✅ Reference TODO by step numbers, check off completed phases
Extended work: ✅ Review remaining TODO items after each phase completion
After pause: ✅ Regularly restate TODO progress without prompting
```

**Context Refresh Triggers (use these as reminders):**

- **After completing phase**: "Completed phase 2, reviewing TODO for next phase..."
- **Before major transitions**: "Checking current progress before starting new module..."
- **When feeling uncertain**: "Reviewing what's been completed to determine next steps..."
- **After any pause/interruption**: "Syncing with TODO list to continue work..."
- **Before asking user**: "Let me check my TODO list first..."

### Detailed Planning Requirements

@@ -382,13 +383,14 @@ Mark task complete only when:

**Context Window Management:**

As work extends over time, you may lose track of earlier context. To prevent this:

1. **Event-Driven TODO Review**: Review TODO list after completing phases, before transitions, when uncertain
2. **Progress Summaries**: Summarize what's been completed after each major milestone
3. **Reference by Number**: Use step/phase numbers instead of repeating full descriptions
4. **Never Ask "What Were We Doing?"**: Review your own TODO list first before asking the user
5. **Maintain Written TODO**: Keep a visible TODO list in your responses to track progress
6. **State-Based Refresh**: Refresh context when transitioning between states (planning → implementation → testing)

## FAILURE RECOVERY & WORKSPACE CLEANUP
@@ -1,12 +1,12 @@
---
description: Claudette Coding Agent v5.1 (Compact)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette v4 Compact

## IDENTITY

Enterprise agent. Solve problems end-to-end. Work until done. Be conversational and concise.

**CRITICAL**: End turn only when problem solved and all TODOs checked. Make tool calls immediately after announcing.

@@ -91,6 +91,11 @@ Example:
- [ ] 3.3: Verify requirements
```

### Context Drift (CRITICAL)

**Refresh when**: After phase done, before transitions, when uncertain, after pause
**Extended work**: Restate after phases, use step #s not full text
❌ Don't: repeat context, abandon TODO, ask "what were we doing?"

### Segues

When issues arise:
```
@@ -1,27 +1,6 @@
---
description: Claudette Coding Agent v5.1 (Condensed)
tools: ['editFiles', 'runNotebooks', 'search', 'new', 'runCommands', 'runTasks', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'openSimpleBrowser', 'fetch', 'githubRepo', 'extensions']
---

# Claudette Coding Agent v4 (Condensed)

@@ -153,6 +132,24 @@ For complex tasks, create comprehensive TODO lists:
- Include testing and validation in every phase
- Consider error scenarios and edge cases

### Context Drift Prevention (CRITICAL)

**Refresh context when:**

- After completing TODO phases
- Before major transitions (new module, state change)
- When uncertain about next steps
- After any pause or interruption

**During extended work:**

- Restate remaining work after each phase
- Reference TODO by step numbers, not full descriptions
- Never ask "what were we working on?" - check your TODO list first

**Anti-patterns to avoid:**

- ❌ Repeating context instead of referencing TODO
- ❌ Abandoning TODO tracking over time
- ❌ Asking user for context you already have

### Segue Management

When encountering issues requiring research:

File renamed without changes.
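The version-comparison notes in this revision say token estimates assume roughly 1.33 tokens per word, and the size table reports each file relative to the original. The arithmetic behind those two columns can be sketched as below. The function names are hypothetical, the word and token counts are taken from the size table in this revision, and the results are rough estimates rather than real tokenizer output.

```python
TOKENS_PER_WORD = 1.33  # average stated in the comparison notes


def estimate_tokens(words: int) -> int:
    """Rough token estimate from a word count."""
    return round(words * TOKENS_PER_WORD)


def size_vs_original(tokens: int, original_tokens: int) -> int:
    """Percentage change relative to the original (negative means smaller)."""
    return round((tokens / original_tokens - 1) * 100)


# Word counts and published token estimates from the size table.
words = {"original": 3645, "auto": 2622, "condensed": 1887}
published = {"original": 4860, "auto": 3490, "condensed": 2510}

for name, count in words.items():
    change = size_vs_original(published[name], published["original"])
    print(name, estimate_tokens(count), f"{change}%")
```

With these inputs the estimates land within about a dozen tokens of the published figures, and the percentages agree with the table's -28% for Auto and -48% for Condensed.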
orneryd revised this gist
Oct 7, 2025 . 1 changed file with 5 additions and 5 deletions.
@@ -64,7 +64,7 @@
- ✅ Training new agents
- ⚠️ Not optimized for autonomous execution

### **claudette-auto.md** (443 lines, ~3,440 tokens) ⭐ **RECOMMENDED**

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories
- ✅ Long conversations (context drift prevention)

@@ -115,11 +115,11 @@ Beast βββββββββββ 2,630 tokens | βββ

**Choose based on priority:**

1. **Need context drift prevention?** → `claudette-auto.md`
2. **Need smallest token count?** → `claudette-compact.md`
3. **Need URL fetching/research?** → `beast-mode.md`
4. **Need comprehensive reference?** → `claudette-original.md`
5. **Need balanced approach?** → `claudette-auto.md` ⭐
6. **Need moderate token savings?** → `claudette-condensed.md`

---

@@ -129,7 +129,7 @@ Beast βββββββββββ 2,630 tokens | βββ
```
claudette-original.md (v1)
β βββ claudette-auto.md (v5) - Autonomous optimization + context drift
β claudette-condensed.md (v3)
β

@@ -154,6 +154,6 @@ beast-mode.md (separate lineage) - Research-focused workflow
- All versions except Beast Mode share the same core Claudette identity
- Token estimates based on ~1.33 tokens per word average
- Context drift prevention is unique to `claudette-auto.md`
- Beast Mode has a distinct philosophy focused on research and URL fetching
- All versions emphasize autonomous execution and completion criteria
orneryd revised this gist
Oct 7, 2025 . 1 changed file with 1 addition and 1 deletion.
@@ -5,7 +5,7 @@
| Version | Lines | Words | Est. Tokens | Size vs Original |
|---------|-------|-------|-------------|------------------|
| **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
| **claudette-auto.md** | 443 | 2,578 | ~3,440 | -37% |
| **claudette-condensed.md** | 325 | 1,794 | ~2,390 | -51% |
| **claudette-compact.md** | 239 | 1,029 | ~1,370 | -72% |
| **beast-mode.md** | 152 | 1,967 | ~2,630 | -46% |
orneryd revised this gist
Oct 7, 2025 . 1 changed file with 3 additions and 1 deletion.
@@ -43,4 +43,6 @@
```
❌ - Not optimized. I do not suggest using anymore
✅ - improvements/modifications from beast-mode
```

[See for more details](https://gist.github.com/orneryd/334e1d59b6abaf289d06eeda62690cdb#file-version-comparison-md)
orneryd renamed this gist
Oct 7, 2025 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
orneryd revised this gist
Oct 7, 2025 . 1 changed file with 159 additions and 0 deletions.
@@ -0,0 +1,159 @@
# Claudette & Beast Mode Version Comparison

## Size Metrics

| Version | Lines | Words | Est. Tokens | Size vs Original |
|---------|-------|-------|-------------|------------------|
| **claudette-original.md** | 703 | 3,645 | ~4,860 | Baseline (100%) |
| **claudette.auto.chatmode.md** | 443 | 2,578 | ~3,440 | -37% |
| **claudette-condensed.md** | 325 | 1,794 | ~2,390 | -51% |
| **claudette-compact.md** | 239 | 1,029 | ~1,370 | -72% |
| **beast-mode.md** | 152 | 1,967 | ~2,630 | -46% |

---

## 🎯 Feature Matrix

| Feature | Original | Auto | Condensed | Compact | Beast |
|---------|----------|------|-----------|---------|-------|
| **Core Identity** | β | β | β | β | β |
| **Productive Behaviors** | β | β | β | β | β |
| **Anti-Pattern Examples (β/β )** | β | β | β | β | β |
| **Execution Protocol** | 5-phase | 3-phase | 3-phase | 3-phase | 10-step |
| **Repository Conservation** | β | β | β | β | β |
| **Dependency Hierarchy** | β | β | β | β | β |
| **Project Type Detection** | β | β | β | β | β |
| **TODO Management** | β | β | β | β | β |
| **Segue Management** | β | β | β | β | β |
| **Segue Cleanup Protocol** | β | β | β | β | β |
| **Error Debugging Protocols** | β | β | β | β | β |
| **Research Methodology** | β | β | β | β | β |
| **Communication Protocol** | β | β | β | β | β |
| **Completion Criteria** | β | β | β | β | β |
| **Context Drift Prevention** | β | β | β | β | β |
| **Failure Recovery** | β | β | β | β | β |
| **Execution Mindset** | β | β | β | β | β |
| **Effective Response Patterns** | β | β | β | β | β |
| **URL Fetching Protocol** | β | β | β | β | β |
| **Memory System** | β | β | β | β | β |
| **Git Rules** | β | β | β | β | β |

---

## Key Differentiators

| Aspect | Original | Auto | Condensed | Compact | Beast |
|--------|----------|------|-----------|---------|-------|
| **Tone** | Professional | Professional | Professional | Professional | Casual |
| **Verbosity** | High | Medium | Low | Very Low | Low |
| **Structure** | Detailed | Streamlined | Condensed | Minimal | Workflow |
| **Emphasis** | Comprehensive | Autonomous | Efficient | Token-optimal | Research |
| **Target LLM** | GPT-4, Claude Opus | GPT-4, Claude Sonnet | GPT-4 | GPT-3.5, Lower-reasoning | Any |
| **Use Case** | Complex enterprise | Most tasks | Standard tasks | Token-constrained | Research-heavy |
| **Context Drift** | β | β | β | β | β |
| **Optimization Focus** | None | Autonomous execution | Length reduction | Token efficiency | Research workflow |

---

## 💡 Recommended Use Cases

### **claudette-original.md** (703 lines, ~4,860 tokens)

- ✅ Reference documentation
- ✅ Most comprehensive guidance
- ✅ When token count is not a concern
- ✅ Training new agents
- ⚠️ Not optimized for autonomous execution

### **claudette.auto.chatmode.md** (443 lines, ~3,440 tokens) ⭐ **RECOMMENDED**

- ✅ Most tasks and complex projects
- ✅ Enterprise repositories
- ✅ Long conversations (context drift prevention)
- ✅ GPT-4 Turbo, Claude Sonnet, Claude Opus
- ✅ Optimized for autonomous execution
- ✅ Best balance of features vs size

### **claudette-condensed.md** (325 lines, ~2,390 tokens)

- ✅ Standard coding tasks
- ✅ When you need smaller context footprint
- ✅ GPT-4, Claude Sonnet
- ⚠️ No context drift prevention
- ⚠️ Less detailed guidance

### **claudette-compact.md** (239 lines, ~1,370 tokens)

- ✅ Token-constrained environments
- ✅ Lower-reasoning LLMs (GPT-3.5, smaller models)
- ✅ Simple, straightforward tasks
- ✅ Maximum context window for conversation
- ⚠️ No context drift prevention
- ⚠️ Minimal examples and explanations

### **beast-mode.md** (152 lines, ~2,630 tokens)

- ✅ Research-heavy tasks
- ✅ URL
scraping and recursive link following - β Tasks with provided URLs - β Casual communication preferred - β Persistent memory across sessions - β οΈ No repository conservation - β οΈ No context drift prevention - β οΈ Not enterprise-focused --- ## π Token Efficiency vs Features Trade-off ``` Original ββββββββββββββββββββ 4,860 tokens | ββββββββββββ Features Auto βββββββββββββ 3,440 tokens | ββββββββββββ Features β Condensed ββββββββββ 2,390 tokens | ββββββββββ Features Compact βββββββ 1,370 tokens | βββββββββ Features Beast βββββββββββ 2,630 tokens | βββββββ Features ``` --- ## π― Quick Selection Guide **Choose based on priority:** 1. **Need context drift prevention?** β `claudette.auto.chatmode.md` 2. **Need smallest token count?** β `claudette-compact.md` 3. **Need URL fetching/research?** β `beast-mode.md` 4. **Need comprehensive reference?** β `claudette-original.md` 5. **Need balanced approach?** β `claudette.auto.chatmode.md` β 6. **Need moderate token savings?** β `claudette-condensed.md` --- ## π Evolution Timeline ``` claudette-original.md (v1) β βββ claudette.auto.chatmode.md (v5) - Autonomous optimization + context drift β claudette-condensed.md (v3) β claudette-compact.md (v4) - Token optimization beast-mode.md (separate lineage) - Research-focused workflow ``` --- ## π Version History - **v1 (Original)**: Comprehensive baseline with all features - **v3 (Condensed)**: Length reduction while preserving core functionality - **v4 (Compact)**: Token optimization for lower-reasoning LLMs (-72% tokens) - **v5 (Auto)**: Autonomous execution optimization + context drift prevention - **Beast Mode**: Separate research-focused workflow with URL fetching --- ## π Notes - All versions except Beast Mode share the same core Claudette identity - Token estimates based on ~1.33 tokens per word average - Context drift prevention is unique to `claudette.auto.chatmode.md` - Beast Mode has a distinct philosophy focused on research and URL fetching - All versions emphasize 
autonomous execution and completion criteria -
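The token-estimate note can be sanity-checked with a few lines of Python. This is only a sketch under stated assumptions: the helper names `estimate_tokens` and `size_vs_baseline` are hypothetical (not part of the gist), the ~1.33 tokens-per-word average is taken as 4/3, and estimates are assumed to be rounded to the nearest ten.

```python
# Hypothetical helpers (not part of the gist) for re-deriving the
# "Est. Tokens" and "Size vs Original" columns from word counts.

# Assumption: the "~1.33 tokens per word" average is treated as exactly 4/3.
TOKENS_PER_WORD = 4 / 3


def estimate_tokens(words: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate a token count from a word count, rounded to the nearest ten."""
    return int(round(words * tokens_per_word / 10)) * 10


def size_vs_baseline(tokens: int, baseline_tokens: int) -> int:
    """Return the size change relative to the baseline as a whole-number percentage."""
    return round((tokens / baseline_tokens - 1) * 100)


# Word counts copied from the Size Metrics table.
word_counts = {
    "claudette-original.md": 3645,
    "claudette-condensed.md": 1794,
    "claudette-compact.md": 1029,
}

baseline = estimate_tokens(word_counts["claudette-original.md"])
for name, words in word_counts.items():
    tokens = estimate_tokens(words)
    print(f"{name}: ~{tokens:,} tokens ({size_vs_baseline(tokens, baseline):+d}%)")
```

Under these assumptions the three files in the example come out at the table's figures (~4,860, ~2,390 and ~1,370 tokens; -51% and -72%). The Auto and Beast rows may not reproduce exactly with the same rounding, so treat the published table as the reference rather than this sketch.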
orneryd revised this gist
Oct 6, 2025 . 1 changed file with 4 additions and 4 deletions.
@@ -18,28 +18,28 @@
## When to Use Each Version

### Claudette-compact.md (239 lines)

```
✅ GPT-3.5, Claude Instant, Llama 2, Mistral
✅ Token-constrained environments
✅ Faster response times
✅ Simple to moderate tasks
```

### Claudette-condensed.md (325 lines)

```
✅ GPT-4o, GPT-4.1
✅ Complex tasks
✅ More detailed examples helpful
```

### Claudette-auto.md (443 lines) < Recommended for most people

```
✅ GPT-5, Claude Sonnet
✅ Most complex tasks
✅ Structured anti-patterns
✅ Execution mindset section
✅ Context drift prevention
```

### Claudette-original.md (726 lines)

```
❌ - Not optimized. I do not suggest using anymore
✅ - improvements/modifications from beast-mode
```