@textarcana · Last active October 10, 2025 13:33
# Cross-Model Prompt Chaining: Expanded and Filtered Literature Review

This document consolidates highly cited foundational papers and their citing works relevant to **cross-model prompt chaining** across different LLM families (e.g., GPT, Claude, Qwen). Each entry includes a link to its source.

---

## Highly cited seed papers (≥ 5 citations)

1. [**AI Chains (CHI’22)**][ai-chains] — formalizes prompt chaining; its tooling makes swapping steps and models straightforward.
2. [**Prompt Chaining vs Stepwise (Findings ACL’24)**][prompt-stepwise] — chaining empirically outperforms single long prompts; supports staged flows that can be mapped onto different models.
3. [**Mixture-of-Agents (ICLR’25)**][moa] — layered ensembles of different LLMs; strong results from heterogeneous model mixes.
4. [**Exchange-of-Thought (EMNLP’23)**][eot] — explicit cross-model communication (Memory / Report / Relay / Debate) to pass reasoning between models.
5. [**FrugalGPT (TMLR / ICLR’24)**][frugalgpt] — routing / cascades that select among multiple LLM APIs per query (router + scorer + stop-judger).
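The chaining pattern these papers formalize can be sketched as a linear pipeline in which each stage names a model and a prompt template, so any stage or model family can be swapped independently. This is a minimal illustrative sketch: the `call_model` signature, the stub client, and the model names are assumptions, not any paper's actual implementation.

```python
from typing import Callable

# Hypothetical client: maps (model_name, prompt) to a completion string.
# Any real SDK (OpenAI, Anthropic, DashScope, ...) could sit behind this signature.
ModelFn = Callable[[str, str], str]

def run_chain(steps: list[tuple[str, str]], task: str, call_model: ModelFn) -> str:
    """Run a linear prompt chain: each step is (model_name, prompt_template).

    The template receives the previous step's output via {prev}, so a
    step or model can be replaced without touching the rest of the chain.
    """
    prev = task
    for model_name, template in steps:
        prev = call_model(model_name, template.format(prev=prev))
    return prev

# Example: three stages, each assigned to a different (hypothetical) family.
chain = [
    ("gpt",    "Draft a solution for: {prev}"),
    ("claude", "Critique this draft for safety issues:\n{prev}"),
    ("qwen",   "Refactor and optimize:\n{prev}"),
]

def fake_model(name: str, prompt: str) -> str:
    # Stub standing in for real APIs; echoes which model handled the step.
    return f"[{name}] {prompt[:40]}"

result = run_chain(chain, "parse a CSV file", fake_model)
```

Because the chain is just data, the stage-to-family assignment can be tuned or A/B-tested without code changes.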

---

## New citing papers relevant to cross-model chaining

- [**Rethinking Mixture-of-Agents (2025)**][rethinking-moa] — evaluates when heterogeneous mixing (different families) helps vs a “Self-MoA” using only the single best model; decision insights are directly useful for GPT→Claude→Qwen handoffs. (Cites MoA.)

- [**Deep Research Agents (2025)**][deep-research] — surveys agent systems that blend multiple model families in pipelines (e.g., GPT-4.x, Claude-Sonnet, Gemini, DeepSeek); practical cross-model orchestration patterns. (Cites MoA and related multi-agent work.)

- [**When Two LLMs Debate (2025)**][llm-debate] — analyzes inter-model debate dynamics and confidence revision; applicable as a handoff stage where models critique each other’s code/patches. (Cites / extends debate-style cross-model interaction lines that also reference EoT.)

- [**From Standalone LLMs to Integrated Intelligence (CAIS Survey 2025)**][integrated-intel] — taxonomy of orchestration strategies (components / roles / routing) for multi-model systems; design references for chained, cross-family pipelines. (Surveys and cites routing / ensemble literature incl. MoA / FrugalGPT-style methods.)

- [**Knowledge-Empowered, Collaborative, and Co-Evolving LLMs (2024)**][knowledge-collab] — focuses on model collaboration and co-evolution, covering mechanisms to combine different LLMs / tools; relevant for deciding what role each family plays in a chain.

- [**Human Intervention in LLM Multi-Agent Debate (2024)**][human-intervention] — studies human-in-the-loop control in multi-agent (often cross-model) debate pipelines; helpful guardrails for cross-model code handoffs. (Cites multi-agent debate lines related to EoT-style setups.)

- [**ChainBuddy (2024)**][chainbuddy] — assistant that generates evaluative LLM pipelines in ChainForge; supports planning / evaluating multi-step chains where models can be swapped — useful for implementing cross-family stage assignments. (Builds atop prompt-chaining HCI work such as AI Chains.)

- [**Advances & Open Problems for LLMs (2025 Survey)**][advances-llm] — synthesizes evidence around MoA and heterogeneous teaming; extracts conditions where mixing different models is beneficial, informing when to escalate across families.

---

## How to use these for GPT→Claude→Qwen handoffs

- **[Design the chain][ai-chains]** with AI Chains / ChainBuddy patterns; assign roles per family (e.g., GPT for drafting / spec-aware scaffolds, Claude for safety / compliance critique, Qwen for refactor / optimization).
- **[Add routing / cascades][frugalgpt]** to escalate to stronger / more expensive families only if a cheap pass (e.g., Qwen-small) or an automated scorer flags low quality / uncertainty.
- **[Enable cross-model reasoning transfer (EoT)][eot]**: pass not just code but rationales / diffs / tests between models; optionally add a short debate round before merging.
- **[Sanity-check mixing][rethinking-moa]** with MoA + Rethinking-MoA insights: in some contexts, a single strong model with self-aggregation can beat mixing; measure before committing to heavy cross-family ensembles.
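Taken together, the recipe above amounts to a two-tier cascade: run a cheap pass first, gate on an automated scorer, and escalate by forwarding the draft as a rationale (EoT-style) rather than restarting from scratch. The sketch below is a hedged illustration under stated assumptions: the model stubs, the scorer, and the threshold are placeholders, not FrugalGPT's or EoT's actual components.

```python
def cheap_model(prompt: str) -> str:
    # Stand-in for a small, inexpensive checkpoint (e.g., a Qwen-small).
    return "draft answer"

def strong_model(prompt: str) -> str:
    # Stand-in for a larger, more expensive family.
    return "high-quality answer"

def score(answer: str) -> float:
    # Stand-in for a learned quality scorer / stop-judger returning 0..1.
    return 0.3 if answer == "draft answer" else 0.9

def cascade(prompt: str, threshold: float = 0.8) -> str:
    """Try the cheap model first; escalate only when the scorer flags low quality."""
    answer = cheap_model(prompt)
    if score(answer) >= threshold:
        return answer
    # Escalation passes the prompt *plus* the cheap draft as a rationale,
    # so the stronger model critiques and improves instead of redrafting.
    return strong_model(f"{prompt}\n\nDraft to critique and improve:\n{answer}")

result = cascade("Implement a rate limiter")  # draft scores 0.3 < 0.8, so it escalates
```

The threshold is the cost/quality dial: raising it escalates more queries to the expensive family, lowering it keeps more traffic on the cheap model.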

---

## Reference Links

[ai-chains]: https://dl.acm.org/doi/abs/10.1145/3491102.3517582
[prompt-stepwise]: https://arxiv.org/pdf/2406.00507
[moa]: https://arxiv.org/abs/2406.04692
[eot]: https://arxiv.org/abs/2312.01823
[frugalgpt]: https://arxiv.org/abs/2305.05176
[rethinking-moa]: https://arxiv.org/abs/2501.00064
[deep-research]: https://arxiv.org/abs/2503.10007
[llm-debate]: https://arxiv.org/abs/2504.02888
[integrated-intel]: https://arxiv.org/abs/2502.00643
[knowledge-collab]: https://arxiv.org/abs/2407.05619
[human-intervention]: https://arxiv.org/abs/2410.09077
[chainbuddy]: https://arxiv.org/abs/2403.18417
[advances-llm]: https://arxiv.org/abs/2503.02401