# Cross-Model Prompt Chaining: Expanded and Filtered Literature Review

This document consolidates highly cited foundational papers and their citing works relevant to **cross-model prompt chaining** across different LLM families (e.g., GPT, Claude, Qwen). Each entry includes a link to its source.

---

## Highly-cited seed papers (≥ 5 citations)

1. [**AI Chains (CHI’22)**][ai-chains] — formalizes prompt chaining; tooling makes swapping steps/models straightforward.
2. [**Prompt Chaining vs Stepwise (Findings ACL’24)**][prompt-stepwise] — chaining empirically outperforms single long prompts; supports staged flows that can be mapped onto different models.
3. [**Mixture-of-Agents (ICLR’25)**][moa] — layered ensembles of different LLMs; strong heterogeneous results.
4. [**Exchange-of-Thought (EMNLP’23)**][eot] — explicit cross-model communication (Memory / Report / Relay / Debate) to pass reasoning between models.
5. [**FrugalGPT (TMLR / ICLR’24)**][frugalgpt] — routing / cascades select among multiple LLM APIs per query (router + scorer + stop-judger).

---

## New citing papers relevant to cross-model chaining

- [**Rethinking Mixture-of-Agents (2025)**][rethinking-moa] — evaluates when heterogeneous mixing (different families) helps vs a “Self-MoA” using only the single best model; decision insights are directly useful for GPT→Claude→Qwen handoffs. (Cites MoA.)
- [**Deep Research Agents (2025)**][deep-research] — surveys agent systems that blend multiple model families in pipelines (e.g., GPT-4.x, Claude-Sonnet, Gemini, DeepSeek); practical cross-model orchestration patterns. (Cites MoA and related multi-agent work.)
- [**When Two LLMs Debate (2025)**][llm-debate] — analyzes inter-model debate dynamics and confidence revision; applicable as a handoff stage where models critique each other’s code/patches. (Cites / extends debate-style cross-model interaction lines that also reference EoT.)
- [**From Standalone LLMs to Integrated Intelligence (CAIS Survey 2025)**][integrated-intel] — taxonomy of orchestration strategies (components / roles / routing) for multi-model systems; design references for chained, cross-family pipelines. (Surveys and cites routing / ensemble literature incl. MoA / FrugalGPT-style methods.)
- [**Knowledge-Empowered, Collaborative, and Co-Evolving LLMs (2024)**][knowledge-collab] — focuses on model collaboration and co-evolution, covering mechanisms to combine different LLMs / tools; relevant for deciding what role each family plays in a chain.
- [**Human Intervention in LLM Multi-Agent Debate (2024)**][human-intervention] — studies human-in-the-loop control in multi-agent (often cross-model) debate pipelines; helpful guardrails for cross-model code handoffs. (Cites multi-agent debate lines related to EoT-style setups.)
- [**ChainBuddy (2024)**][chainbuddy] — assistant that generates evaluative LLM pipelines in ChainForge; supports planning / evaluating multi-step chains where models can be swapped — useful for implementing cross-family stage assignments. (Builds atop prompt-chaining HCI work such as AI Chains.)
- [**Advances & Open Problems for LLMs (2025 Survey)**][advances-llm] — synthesizes evidence around MoA and heterogeneous teaming; extracts conditions where mixing different models is beneficial, informing when to escalate across families.

---

## How to use these for GPT→Claude→Qwen handoffs

- **[Design the chain][ai-chains]** with AI Chains / ChainBuddy patterns; assign roles per family (e.g., GPT for drafting / spec-aware scaffolds, Claude for safety / compliance critique, Qwen for refactor / optimization).
- **[Add routing / cascades][frugalgpt]** to escalate to stronger / more expensive families only if a cheap pass (e.g., Qwen-small) or an automated scorer flags low quality / uncertainty.
- **[Enable cross-model reasoning transfer (EoT)][eot]**: pass not just code but rationales / diffs / tests between models; optionally add a short debate round before merging.
- **[Sanity-check mixing][rethinking-moa]** with MoA + Rethinking-MoA insights: in some contexts, a single strong model with self-aggregation can beat mixing; measure before committing to heavy cross-family ensembles.

A minimal code sketch of this cascade-and-handoff pattern follows the reference links below.

---

## Reference Links

[ai-chains]: https://dl.acm.org/doi/abs/10.1145/3491102.3517582
[prompt-stepwise]: https://arxiv.org/pdf/2406.00507
[moa]: https://arxiv.org/abs/2406.04692
[eot]: https://arxiv.org/abs/2312.01823
[frugalgpt]: https://arxiv.org/abs/2305.05176
[rethinking-moa]: https://arxiv.org/abs/2501.00064
[deep-research]: https://arxiv.org/abs/2503.10007
[llm-debate]: https://arxiv.org/abs/2504.02888
[integrated-intel]: https://arxiv.org/abs/2502.00643
[knowledge-collab]: https://arxiv.org/abs/2407.05619
[human-intervention]: https://arxiv.org/abs/2410.09077
[chainbuddy]: https://arxiv.org/abs/2403.18417
[advances-llm]: https://arxiv.org/abs/2503.02401
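---

## Appendix: minimal handoff sketch

To make the cascade-and-handoff pattern above concrete, here is a minimal sketch in Python. It is illustrative only: the model-calling functions, the scorer, and the 0.7 escalation threshold are hypothetical stubs rather than any vendor’s actual API; a real pipeline would wrap each stage around the corresponding GPT / Claude / Qwen endpoint and a real quality signal (tests, lint, or an LLM judge).

```python
"""Minimal sketch of a cross-family prompt chain with a cheap-first cascade.

Every function below is a hypothetical stub: in a real pipeline each call_*
function would wrap the corresponding vendor API or local inference endpoint,
and quality_score would be a real signal (tests, lint, or an LLM judge).
"""

from dataclasses import dataclass


@dataclass
class StageResult:
    output: str     # code / patch produced by this stage
    rationale: str  # reasoning handed to the next model (EoT-style transfer)


def call_qwen_small(prompt: str) -> StageResult:
    # Hypothetical cheap first pass (entry point of a FrugalGPT-style cascade).
    return StageResult(output=f"[qwen-small draft] {prompt}", rationale="fast draft")


def call_gpt(prompt: str) -> StageResult:
    # Hypothetical stronger drafting / spec-aware scaffolding stage.
    return StageResult(output=f"[gpt draft] {prompt}", rationale="spec-aware draft")


def call_claude_critique(result: StageResult) -> StageResult:
    # Hypothetical safety / compliance critique stage; it receives both the
    # code and the upstream rationale, not just the raw diff.
    return StageResult(
        output=result.output + "\n# reviewed",
        rationale=result.rationale + " -> critique applied",
    )


def quality_score(result: StageResult) -> float:
    # Placeholder scorer; a real one might run the test suite or an LLM judge.
    return 0.4 if "qwen-small" in result.output else 0.9


def run_chain(task: str, escalate_threshold: float = 0.7) -> StageResult:
    """Draft cheaply, escalate across families only when the scorer flags the
    draft, then hand code plus rationale to a critique stage."""
    draft = call_qwen_small(task)
    if quality_score(draft) < escalate_threshold:
        draft = call_gpt(task)          # escalate to a stronger family
    return call_claude_critique(draft)  # cross-model handoff: code + rationale


if __name__ == "__main__":
    final = run_chain("Implement a retry wrapper with exponential backoff")
    print(final.output)
    print(final.rationale)
```

The shape mirrors the papers above: a FrugalGPT-style cheap-first cascade decides whether to escalate across families, and the EoT-style handoff passes a rationale alongside the code so the critique stage sees more than the raw diff.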