This document consolidates highly cited foundational papers and their citing works relevant to cross-model prompt chaining across different LLM families (e.g., GPT, Claude, Qwen). Each entry includes a link to its source.
- AI Chains (CHI’22) — formalizes prompt chaining; tooling makes swapping steps/models straightforward.
- Prompt Chaining vs Stepwise (Findings ACL’24) — chaining empirically outperforms single long prompts; supports staged flows that can be mapped onto different models.
- Mixture-of-Agents (ICLR’25) — layered ensembles of different LLMs; strong heterogeneous results.
- Exchange-of-Thought (EMNLP’23) — explicit cross-model communication (Memory / Report / Relay / Debate) to pass reasoning between models.
- FrugalGPT (TMLR / ICLR’24) — routing / cascades that select among multiple LLM APIs per query (router + answer scorer + stopping rule deciding whether to escalate).
- Rethinking Mixture-of-Agents (2025) — evaluates when heterogeneous mixing (different families) helps vs a “Self-MoA” using only the single best model; decision insights are directly useful for GPT→Claude→Qwen handoffs. (Cites MoA.)
- Deep Research Agents (2025) — surveys agent systems that blend multiple model families in pipelines (e.g., GPT-4.x, Claude-Sonnet, Gemini, DeepSeek); practical cross-model orchestration patterns. (Cites MoA and related multi-agent work.)
- When Two LLMs Debate (2025) — analyzes inter-model debate dynamics and confidence revision; applicable as a handoff stage where models critique each other’s code/patches. (Cites / extends debate-style cross-model interaction lines that also reference EoT.)
- From Standalone LLMs to Integrated Intelligence (CAIS Survey 2025) — taxonomy of orchestration strategies (components / roles / routing) for multi-model systems; design references for chained, cross-family pipelines. (Surveys and cites routing / ensemble literature incl. MoA / FrugalGPT-style methods.)
- Knowledge-Empowered, Collaborative, and Co-Evolving LLMs (2024) — focuses on model collaboration and co-evolution, covering mechanisms to combine different LLMs / tools; relevant for deciding what role each family plays in a chain.
- Human Intervention in LLM Multi-Agent Debate (2024) — studies human-in-the-loop control in multi-agent (often cross-model) debate pipelines; helpful guardrails for cross-model code handoffs. (Cites multi-agent debate lines related to EoT-style setups.)
- ChainBuddy (2024) — assistant that generates evaluative LLM pipelines in ChainForge; supports planning / evaluating multi-step chains where models can be swapped — useful for implementing cross-family stage assignments. (Builds atop prompt-chaining HCI work such as AI Chains.)
- Advances & Open Problems for LLMs (2025 Survey) — synthesizes evidence around MoA and heterogeneous teaming; extracts conditions where mixing different models is beneficial, informing when to escalate across families.
- Design the chain with AI Chains / ChainBuddy patterns; assign roles per family (e.g., GPT for drafting / spec-aware scaffolds, Claude for safety / compliance critique, Qwen for refactor / optimization); see the role-assigned chain sketch below.
- Add routing / cascades to escalate to stronger / more expensive families only when a cheap pass (e.g., Qwen-small) or an automated scorer flags low quality / uncertainty; see the cascade sketch below.
- Enable cross-model reasoning transfer (EoT): pass not just code but rationales / diffs / tests between models; optionally add a short debate round before merging (see the handoff sketch below).
- Sanity-check mixing with MoA + Rethinking-MoA insights: in some contexts, a single strong model with self-aggregation can beat mixing; measure before committing to heavy cross-family ensembles (see the comparison sketch below).
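
To make the role-assignment takeaway concrete, here is a minimal sketch of an AI Chains / ChainBuddy-style pipeline where each stage is a (role, model, prompt template) triple, so any stage's model can be swapped without touching the rest of the chain. The `ModelFn` wrapper, the `gpt` / `claude` / `qwen` placeholder functions, and the prompt templates are assumptions for illustration, not APIs from the cited papers.

```python
from dataclasses import dataclass
from typing import Callable, List

# A model is just "prompt in, text out"; wrap whichever provider SDK you use behind this.
ModelFn = Callable[[str], str]

@dataclass
class Stage:
    role: str        # e.g. "draft", "critique", "refactor"
    model: ModelFn   # GPT / Claude / Qwen wrapper, swappable per stage
    template: str    # prompt template with an {input} placeholder

def run_chain(task: str, stages: List[Stage]) -> str:
    """Feed each stage's output into the next stage's prompt."""
    current = task
    for stage in stages:
        current = stage.model(stage.template.format(input=current))
    return current

# Placeholder wrappers so the sketch runs end to end; replace with real SDK calls.
def gpt(prompt: str) -> str:
    return f"[gpt draft for: {prompt[:40]}...]"

def claude(prompt: str) -> str:
    return f"[claude review of: {prompt[:40]}...]"

def qwen(prompt: str) -> str:
    return f"[qwen refactor of: {prompt[:40]}...]"

pipeline = [
    Stage("draft",    gpt,    "Write code for this spec:\n{input}"),
    Stage("critique", claude, "Review this draft; return revised code plus notes:\n{input}"),
    Stage("refactor", qwen,   "Optimize the reviewed code without changing behavior:\n{input}"),
]

print(run_chain("parse a CSV file into dataclasses", pipeline))
```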
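
For the routing takeaway, a minimal FrugalGPT-style cascade sketch: query tiers from cheapest to most expensive and stop at the first answer the scorer accepts. Tier names, thresholds, and the toy scorer are illustrative assumptions; in practice the scorer would be a trained quality model, a test signal, or an uncertainty estimate.

```python
from typing import Callable, List, Tuple

ModelFn = Callable[[str], str]
ScoreFn = Callable[[str, str], float]   # (prompt, answer) -> estimated quality in [0, 1]

def cascade(prompt: str,
            tiers: List[Tuple[str, ModelFn, float]],   # (name, model, accept_threshold)
            scorer: ScoreFn) -> Tuple[str, str]:
    """Query tiers from cheapest to most expensive; stop at the first accepted answer."""
    name, answer = "", ""
    for name, model, threshold in tiers:
        answer = model(prompt)
        if scorer(prompt, answer) >= threshold:
            break   # scorer accepts this tier's answer, so no escalation
    return name, answer

# Illustrative wiring: a small Qwen pass first, then GPT, then Claude as last resort.
# A final-tier threshold of 0.0 means its answer is always accepted.
def qwen_small(p: str) -> str: return "cheap first-pass answer"
def gpt(p: str) -> str: return "stronger answer"
def claude(p: str) -> str: return "strongest answer"
def toy_scorer(prompt: str, answer: str) -> float: return 0.9 if "answer" in answer else 0.0

tier, result = cascade("Explain the bug in this diff: ...",
                       [("qwen-small", qwen_small, 0.8),
                        ("gpt", gpt, 0.8),
                        ("claude", claude, 0.0)],
                       toy_scorer)
print(tier, "->", result)
```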
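
For the EoT-style handoff takeaway, a sketch of a payload that carries rationale, diff, and test output alongside the code, plus one critique round in which several models review the full payload before a merge decision. The `Handoff` fields, the prompt wording, and the 'LGTM' convention are assumptions for illustration, not a protocol defined by the EoT paper.

```python
from dataclasses import dataclass
from typing import Callable, List

ModelFn = Callable[[str], str]

@dataclass
class Handoff:
    code: str
    rationale: str          # why the producing model made these choices
    diff: str = ""          # change relative to the previous hop
    test_report: str = ""   # test-suite output, if tests were run

def debate_round(handoff: Handoff, critics: List[ModelFn]) -> List[str]:
    """Each critic model reviews the full handoff (code + rationale + diff + tests)."""
    prompt = (
        f"Code:\n{handoff.code}\n\n"
        f"Author rationale:\n{handoff.rationale}\n\n"
        f"Diff since last hop:\n{handoff.diff}\n\n"
        f"Test report:\n{handoff.test_report}\n\n"
        "List concrete objections, or reply exactly 'LGTM'."
    )
    return [critic(prompt) for critic in critics]

def ready_to_merge(critiques: List[str]) -> bool:
    """Merge only when every critic signs off (or loop back for one revision pass)."""
    return all(c.strip() == "LGTM" for c in critiques)
```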
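
For the final takeaway, a sketch of how one might measure MoA-style heterogeneous aggregation against Self-MoA (several samples from the single best model, aggregated the same way) on your own tasks before committing to a cross-family ensemble. The aggregation prompt, the `n_self` default, and the scoring interface are assumptions; Self-MoA sampling presumes nonzero temperature so repeated calls differ.

```python
import statistics
from typing import Callable, List

ModelFn = Callable[[str], str]
TaskScore = Callable[[str, str], float]   # (task, final_answer) -> task-level score

def aggregate(task: str, proposals: List[str], aggregator: ModelFn) -> str:
    """One MoA-style layer: an aggregator model synthesizes the candidate answers."""
    joined = "\n\n---\n\n".join(proposals)
    return aggregator(f"Task:\n{task}\n\nCandidate answers:\n{joined}\n\n"
                      "Synthesize the single best answer.")

def compare_moa_vs_self_moa(tasks: List[str],
                            families: List[ModelFn],   # one model per family (GPT, Claude, Qwen, ...)
                            best: ModelFn,             # single strongest model, sampled with temperature > 0
                            aggregator: ModelFn,
                            score: TaskScore,
                            n_self: int = 3) -> None:
    """Score heterogeneous mixing vs. repeated sampling of one strong model."""
    moa_scores, self_scores = [], []
    for task in tasks:
        mixed = aggregate(task, [m(task) for m in families], aggregator)
        self_mixed = aggregate(task, [best(task) for _ in range(n_self)], aggregator)
        moa_scores.append(score(task, mixed))
        self_scores.append(score(task, self_mixed))
    print("MoA mean score:     ", statistics.mean(moa_scores))
    print("Self-MoA mean score:", statistics.mean(self_scores))
```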