@gokuljs
Last active August 26, 2025 00:14

Revisions

  1. gokuljs revised this gist Aug 26, 2025. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion stichfin.md
    @@ -7,8 +7,9 @@ This example demonstrates how to implement a conversation agent that collects co
    1. Clone the repository and navigate to the LLM complete chunks example:
    ```bash
    git clone https://github.com/rimelabs/rime-pipecat-agents.git
+   cd rime-pipecat-agents
    git checkout stichfin-llm-complete-chunk-tts
-   cd rime-pipecat-agents/llm-complete-chunks-example
+   cd llm-complete-chunks-example
    ```

    2. Set up your environment variables in `.env`:
  2. gokuljs revised this gist Aug 25, 2025. 1 changed file with 0 additions and 2 deletions.
    2 changes: 0 additions & 2 deletions stichfin.md
    @@ -2,8 +2,6 @@

    This example demonstrates how to implement a conversation agent that collects complete LLM responses before sending them to Text-to-Speech (TTS). The implementation provides a simple yet effective approach to ensure smooth and natural speech output.

-   To see how this is implemented, take a look at the code changes in this [PR](https://github.com/rimelabs/rime-pipecat-agents/pull/16)
-
    ## Quick Setup Guide

    1. Clone the repository and navigate to the LLM complete chunks example:
  3. gokuljs revised this gist Aug 25, 2025. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions stichfin.md
    @@ -9,6 +9,7 @@ To see how this is implemented, take a look at the code changes in this [PR](htt
    1. Clone the repository and navigate to the LLM complete chunks example:
    ```bash
    git clone https://github.com/rimelabs/rime-pipecat-agents.git
+   git checkout stichfin-llm-complete-chunk-tts
    cd rime-pipecat-agents/llm-complete-chunks-example
    ```

  4. gokuljs revised this gist Aug 25, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion stichfin.md
    @@ -29,7 +29,7 @@ To see how this is implemented, take a look at the code changes in this [PR](htt

    This example extends the basic STT → LLM → TTS pipeline by introducing a custom frame processor (`LLMCompleteResponseProcessor`) that aggregates LLM responses before sending them to TTS.

-   To implement a custom FrameProcessor, check out the guide at [Pipecat's Custom FrameProcessor documentation](https://docs.pipecat.ai/guides/fundamentals/custom-frame-processorhttps://docs.pipecat.ai/guides/fundamentals/custom-frame-processor)
+   To implement a custom FrameProcessor, check out the guide at [Pipecat's Custom FrameProcessor documentation](https://docs.pipecat.ai/guides/fundamentals/custom-frame-processor)

    The key component is the `LLMCompleteResponseProcessor` class, which:
    - Acts as a text collector between LLM and TTS
  5. gokuljs revised this gist Aug 25, 2025. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions stichfin.md
    @@ -29,6 +29,8 @@ To see how this is implemented, take a look at the code changes in this [PR](htt

    This example extends the basic STT → LLM → TTS pipeline by introducing a custom frame processor (`LLMCompleteResponseProcessor`) that aggregates LLM responses before sending them to TTS.

+   To implement a custom FrameProcessor, check out the guide at [Pipecat's Custom FrameProcessor documentation](https://docs.pipecat.ai/guides/fundamentals/custom-frame-processorhttps://docs.pipecat.ai/guides/fundamentals/custom-frame-processor)
+
    The key component is the `LLMCompleteResponseProcessor` class, which:
    - Acts as a text collector between LLM and TTS
    - Accumulates text frames from LLM until it receives an `LLMFullResponseEndFrame`
  6. gokuljs revised this gist Aug 25, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion stichfin.md
    @@ -2,7 +2,7 @@

    This example demonstrates how to implement a conversation agent that collects complete LLM responses before sending them to Text-to-Speech (TTS). The implementation provides a simple yet effective approach to ensure smooth and natural speech output.

-   For the complete implementation and setup instructions, please refer to [here](https://github.com/rimelabs/rime-pipecat-agents/pull/16/files).
+   To see how this is implemented, take a look at the code changes in this [PR](https://github.com/rimelabs/rime-pipecat-agents/pull/16)

    ## Quick Setup Guide

  7. gokuljs revised this gist Aug 25, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion stichfin.md
    @@ -67,4 +67,4 @@ We encourage you to look for approaches to determine which better suits your nee

    ## Additional Issue: Handling Short Text Chunks

-   Regarding another issue that's different from the above - you've come across a situation where text like "Oh that's exciting! You got 50%" gets sent as two separate chunks: first "Oh that's exciting!" then "You got 50%". Our suggestion is that you need to implement another custom logic using a custom frame processor or text aggregator. This function should take a minimum character length parameter, and as frames come in, it stores them first then chunks them. Right after chunking, check the length of the chunked text - if it's less than the minimum character length, store it in a buffer variable. When the next chunk is produced, merge both the buffer variable and recently chunked text before sending it to TTS. This approach ensures that short, related text fragments stay together for more natural speech output, and you can implement this logic alongside your existing `LLMCompleteResponseProcessor`.
+   Regarding another issue that's different from the above - you've come across a situation where text like "Oh that's exciting! You got 50%" gets sent as two separate chunks: first "Oh that's exciting!" then "You got 50%". Our suggestion is that you need to implement another custom logic using a custom frame processor or text aggregator. This function should take a minimum character length parameter, and as frames come in, it stores them first then chunks them. Right after chunking, check the length of the chunked text - if it's less than the minimum character length, store it in a buffer variable. When the next chunk is produced, merge both the buffer variable and recently chunked text before sending it to TTS. This approach ensures that short, related text fragments stay together for more natural speech output.
  8. gokuljs renamed this gist Aug 25, 2025. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  9. gokuljs revised this gist Aug 25, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions Readme.md
    @@ -2,7 +2,7 @@

    This example demonstrates how to implement a conversation agent that collects complete LLM responses before sending them to Text-to-Speech (TTS). The implementation provides a simple yet effective approach to ensure smooth and natural speech output.

-   For the complete implementation and setup instructions, please refer to [PR #15](https://github.com/rimelabs/rime-pipecat-agents/pull/15).
+   For the complete implementation and setup instructions, please refer to [here](https://github.com/rimelabs/rime-pipecat-agents/pull/16/files).

    ## Quick Setup Guide

    @@ -22,7 +22,7 @@ For the complete implementation and setup instructions, please refer to [PR #15]
    3. Install and run:
    ```bash
    uv sync
-   uv run rime-http.py
+   uv run main.py
    ```

    ## Implementation Details
  10. gokuljs created this gist Aug 25, 2025.
    70 changes: 70 additions & 0 deletions Readme.md
    @@ -0,0 +1,70 @@
    # Complete LLM Response Handling for TTS

    This example demonstrates how to implement a conversation agent that collects complete LLM responses before sending them to Text-to-Speech (TTS). The implementation provides a simple yet effective approach to ensure smooth and natural speech output.

    For the complete implementation and setup instructions, please refer to [PR #15](https://github.com/rimelabs/rime-pipecat-agents/pull/15).

    ## Quick Setup Guide

    1. Clone the repository and navigate to the LLM complete chunks example:
    ```bash
    git clone https://github.com/rimelabs/rime-pipecat-agents.git
    cd rime-pipecat-agents/llm-complete-chunks-example
    ```

    2. Set up your environment variables in `.env`:
    ```
    RIME_API_KEY=your_rime_api_key
    OPENAI_API_KEY=your_openai_api_key
    DEEPGRAM_API_KEY=your_deepgram_api_key
    ```

    3. Install and run:
    ```bash
    uv sync
    uv run rime-http.py
    ```

    ## Implementation Details

    This example extends the basic STT → LLM → TTS pipeline by introducing a custom frame processor (`LLMCompleteResponseProcessor`) that aggregates LLM responses before sending them to TTS.

    The key component is the `LLMCompleteResponseProcessor` class, which:
    - Acts as a text collector between LLM and TTS
    - Accumulates text frames from LLM until it receives an `LLMFullResponseEndFrame`
    - Only sends the complete response to TTS once all chunks are collected
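
    The full processor is in the PR linked above. As a rough sketch of the pattern, assuming Pipecat's standard `FrameProcessor` API (the exact frame handling in the real implementation may differ):

    ```python
    from pipecat.frames.frames import Frame, LLMFullResponseEndFrame, TextFrame
    from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


    class LLMCompleteResponseProcessor(FrameProcessor):
        """Buffers LLM text frames and emits one TextFrame per complete response."""

        def __init__(self):
            super().__init__()
            self._buffer: list[str] = []  # text chunks for the in-flight response

        async def process_frame(self, frame: Frame, direction: FrameDirection):
            await super().process_frame(frame, direction)

            if isinstance(frame, TextFrame):
                # Hold the chunk instead of forwarding it straight to TTS.
                self._buffer.append(frame.text)
            elif isinstance(frame, LLMFullResponseEndFrame):
                # The LLM finished: flush the whole response as one frame, then
                # forward the end frame so downstream processors stay in sync.
                if self._buffer:
                    await self.push_frame(TextFrame("".join(self._buffer)), direction)
                    self._buffer = []
                await self.push_frame(frame, direction)
            else:
                # Pass all other frames (audio, control, system) through untouched.
                await self.push_frame(frame, direction)
    ```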

    Pipeline configuration:
    ```python
    Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        llm_complete_processor,  # Collects complete LLM response
        tts,
        rtvi_processor,
        transport.output(),
        context_aggregator.assistant(),
    ])
    ```

    Note: The positioning of `llm_complete_processor` between `llm` and `tts` is crucial for proper text aggregation.

    ## Important Considerations

    By default, Pipecat streams LLM responses directly to TTS as they're generated, providing immediate audio feedback. Our current implementation waits for the complete response before processing, which means:
    - You'll experience a slight delay while waiting for the full response.
    - Longer responses will take more time to begin playing.

    ### Text Length Limitations
    Rime TTS has a character limit of 500 per request. For optimal implementation, you should:
    - Monitor the length of accumulated text
    - Split responses longer than 500 characters into appropriate chunks
    - Ensure proper sentence breaks for natural speech flow

    We encourage you to experiment with these approaches to determine which best suits your needs; one possible splitter is sketched below. The example provides a foundation that you can build upon based on your specific requirements.
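
    As a starting point, a sentence-aware splitter might look like this; `split_for_tts` and the sentence regex are illustrative choices, not part of the example code:

    ```python
    import re

    RIME_MAX_CHARS = 500  # Rime TTS accepts at most 500 characters per request


    def split_for_tts(text: str, max_chars: int = RIME_MAX_CHARS) -> list[str]:
        """Split text into chunks of at most max_chars, breaking at sentence ends."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks: list[str] = []
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            # Fallback: hard-split a single sentence that exceeds the limit.
            while len(sentence) > max_chars:
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            current = sentence
        if current:
            chunks.append(current)
        return chunks
    ```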

    ## Additional Issue: Handling Short Text Chunks

    Regarding another issue that's different from the above - you've come across a situation where text like "Oh that's exciting! You got 50%" gets sent as two separate chunks: first "Oh that's exciting!" then "You got 50%". Our suggestion is that you need to implement another custom logic using a custom frame processor or text aggregator. This function should take a minimum character length parameter, and as frames come in, it stores them first then chunks them. Right after chunking, check the length of the chunked text - if it's less than the minimum character length, store it in a buffer variable. When the next chunk is produced, merge both the buffer variable and recently chunked text before sending it to TTS. This approach ensures that short, related text fragments stay together for more natural speech output, and you can implement this logic alongside your existing `LLMCompleteResponseProcessor`.