Iterative planning, code generation, and code review: these are my observations about what works when developing code with a large language model (mostly Claude).
It is interesting to independently rediscover that the best approach to writing code with a model is to work in rapid iterations on small, well-reviewed improvements, because large, detailed, complicated prompts do not work well, or at least not reliably.
The academic literature refers to this as “prompt chaining,” and there are numerous documented cases where iterative prompting yields better results than so-called one-shot prompting with a single extensive, detailed prompt. There are also many examples where chaining prompts across different models delivers better results.
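A minimal sketch of what such a chain can look like in code, assuming the Anthropic Python SDK, an API key in the environment, and an illustrative model name; the plan-then-implement split and the fetch_orders() example are mine, invented for illustration:

```python
# A minimal two-step prompt chain: ask for a plan first, then feed that plan
# back in as the input to a narrower implementation prompt.
# Assumes the Anthropic Python SDK with ANTHROPIC_API_KEY set; the model name
# and the fetch_orders() example are illustrative.
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

source = "..."  # the code under discussion would go here

# Step 1: a short planning prompt instead of one giant specification.
plan = ask(
    "As a numbered list, outline the steps needed to add retry logic to the "
    "fetch_orders() function in this file:\n\n" + source
)

# Step 2: the previous answer becomes part of the next, narrower prompt.
patch = ask("Implement only step 1 of this plan, as a unified diff:\n\n" + plan)
print(patch)
```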
This aligns with what my team has found as we increasingly use models to write code. We began by trying to keep the work at hand in the most recent part of the model’s context window. Our initial observation was that even a model with a very large context window has a recency bias: the most recent context has outsized influence on the output. So although the context window can hold hundreds of thousands of tokens, the model’s effective attention span is much shorter. You can see this happening when GPT or Claude starts to leave code out of the solution you are working on; that code is older in its context, and the model has stopped paying attention to it. As a workaround we made a habit of starting a fresh session every hour or so to reset the conversation history and reclaim the most recent, most influential part of the context window.
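One way to script that reset, rather than doing it by hand, is to summarize the session and start a new history seeded only with the recap. This is a rough sketch of the idea, not the tooling we actually used; the SDK calls, model name, and word budget are assumptions:

```python
# Sketch of the "fresh session" workaround: once the transcript has grown,
# replace it with a short recap so the active work sits in a new, mostly
# empty context. Assumes the Anthropic Python SDK with an API key set.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # assumed model name
history: list[dict] = []

def ask(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    response = client.messages.create(
        model=MODEL, max_tokens=2048, messages=history
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

def reset_session() -> None:
    """Start a new history seeded only with a compressed recap of the work."""
    recap = ask("Summarize the current state of this task in under 200 words.")
    history.clear()
    history.append({"role": "user", "content": "Recap of the previous session:\n" + recap})
    history.append({"role": "assistant", "content": "Got it. Ready to continue."})
```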
At the same time, people in our network were telling us that the best results come from taking the output of one model, such as GPT, and feeding it into a different model family (Claude) for evaluation.
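A sketch of that handoff, assuming the OpenAI and Anthropic Python SDKs with API keys in the environment; the model names and the task string are only examples:

```python
# Cross-model review: one model family drafts the code, a second family
# critiques it. Model names and the task are illustrative.
from openai import OpenAI
import anthropic

openai_client = OpenAI()
claude_client = anthropic.Anthropic()

TASK = "Write a Python function that merges two sorted lists without using sort()."

# Draft with one family (GPT)...
draft = openai_client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": TASK}],
).choices[0].message.content

# ...then hand the draft to a different family (Claude) for evaluation.
review = claude_client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Review this solution for bugs and edge cases:\n\n" + draft,
    }],
).content[0].text

print(review)
```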
Keeping context small while drawing on multiple models eventually led to a working style in which we have many sessions open at once, across several different model families, all in service of a single task.
It is almost a multi-agent approach, except that there is no direct interaction between the open sessions. Each session holds a series of prompts that are themselves part of a larger chain extending across all of the open sessions and model families. It resembles the way a software engineer typically works, with many files and terminals open, moving between them in an iterative cycle of plan, execute, review, execute, constantly adjusting the details in response to feedback from the code.
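Roughly, the shape of that workflow in code: each session is just its own message history that never sees the others, and the only thing connecting them is whatever a person (or a driving script) copies across. The Session helper, model names, and parse_config() example are assumptions for illustration, and for brevity the sketch uses a single SDK rather than several model families:

```python
# Independent sessions: each keeps its own history; nothing is shared between
# them except the text explicitly passed along. Assumes the Anthropic Python
# SDK with an API key set; model names and the example task are illustrative.
import anthropic

client = anthropic.Anthropic()

class Session:
    """One open chat with one model, holding only its own prompt history."""

    def __init__(self, model: str):
        self.model = model
        self.history: list[dict] = []

    def ask(self, prompt: str) -> str:
        self.history.append({"role": "user", "content": prompt})
        reply = client.messages.create(
            model=self.model, max_tokens=2048, messages=self.history
        ).content[0].text
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Separate sessions for planning, implementation, and review; the chain of
# prompts runs across all of them, but they never talk to each other directly.
planner = Session("claude-3-5-sonnet-latest")   # assumed model names
coder = Session("claude-3-5-sonnet-latest")
reviewer = Session("claude-3-5-haiku-latest")

plan = planner.ask("Plan, in three steps, how to add input validation to parse_config().")
code = coder.ask("Implement step 1 of this plan:\n\n" + plan)
critique = reviewer.ask("Review this change for defects:\n\n" + code)
```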