This example demonstrates how to implement a conversation agent that collects complete LLM responses before sending them to Text-to-Speech (TTS). The implementation provides a simple yet effective approach to ensure smooth and natural speech output.
1. Clone the repository and navigate to the LLM complete chunks example:

```shell
git clone https://github.com/rimelabs/rime-pipecat-agents.git
cd rime-pipecat-agents
git checkout stichfin-llm-complete-chunk-tts
cd llm-complete-chunks-example
```
2. Set up your environment variables in `.env`:

```shell
RIME_API_KEY=your_rime_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
```
3. Install and run:

```shell
uv sync
uv run main.py
```
This example extends the basic STT → LLM → TTS pipeline by introducing a custom frame processor (LLMCompleteResponseProcessor) that aggregates LLM responses before sending them to TTS.
For guidance on implementing a custom FrameProcessor, see Pipecat's Custom FrameProcessor documentation.
The key component is the LLMCompleteResponseProcessor class, which:
- Acts as a text collector between the LLM and TTS
- Accumulates text frames from the LLM until it receives an LLMFullResponseEndFrame
- Only sends the complete response to TTS once all chunks are collected
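The aggregation behavior can be sketched without Pipecat installed. This is a dependency-free illustration of the logic only: the real processor subclasses Pipecat's FrameProcessor and pushes frames downstream, whereas the callback and method names below are ours.

```python
class CompleteResponseAggregator:
    """Dependency-free sketch of the LLMCompleteResponseProcessor idea:
    buffer streamed text chunks until an end-of-response signal,
    then emit the full response once."""

    def __init__(self, send_to_tts):
        self._buffer = []                 # accumulated text chunks
        self._send_to_tts = send_to_tts   # downstream callback (stands in for push_frame)

    def on_text_chunk(self, text):
        # Collect streamed LLM text instead of forwarding it immediately.
        self._buffer.append(text)

    def on_response_end(self):
        # Equivalent of receiving LLMFullResponseEndFrame:
        # flush the complete response as a single utterance.
        complete = "".join(self._buffer)
        self._buffer.clear()
        if complete:
            self._send_to_tts(complete)


# Usage: chunks arrive as the LLM streams; TTS fires once at the end.
sent = []
agg = CompleteResponseAggregator(sent.append)
agg.on_text_chunk("Hello, ")
agg.on_text_chunk("world!")
agg.on_response_end()
# sent == ["Hello, world!"]
```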
Pipeline configuration:

```python
Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    llm_complete_processor,  # Collects complete LLM response
    tts,
    rtvi_processor,
    transport.output(),
    context_aggregator.assistant(),
])
```

Note: The positioning of llm_complete_processor between llm and tts is crucial for proper text aggregation.
By default, Pipecat streams LLM responses directly to TTS as they're generated, providing immediate audio feedback. Our current implementation waits for the complete response before processing, which means:
- You'll experience a slight delay while waiting for the full response
- Longer responses will take more time to begin playing
Rime TTS has a character limit of 500 per request. For optimal implementation, you should:
- Monitor the length of accumulated text
- Split responses longer than 500 characters into appropriate chunks
- Ensure proper sentence breaks for natural speech flow
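One way to satisfy the limit is to accumulate text into chunks at sentence boundaries, falling back to a hard wrap only when a single sentence exceeds the limit. This is a sketch, not the example's code; the regex-based sentence split is an assumption.

```python
import re

MAX_CHARS = 500  # Rime TTS per-request character limit


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            # Fallback: hard-wrap a single sentence longer than the limit.
            while len(sentence) > max_chars:
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence punctuation keeps each request within the limit while preserving natural pauses in the synthesized speech.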
We encourage you to explore these approaches and determine which best suits your needs. The example provides a foundation you can build on to match your specific requirements.
A related but distinct issue: text like "Oh that's exciting! You got 50%" may be sent as two separate chunks, first "Oh that's exciting!" and then "You got 50%". Our suggestion is to add further custom logic in a custom frame processor or text aggregator that takes a minimum character length parameter. As frames come in, store and chunk them as usual, then check the length of each chunk: if it is shorter than the minimum, hold it in a buffer instead of sending it. When the next chunk is produced, merge the buffered text with it before sending the result to TTS. This keeps short, related text fragments together for more natural speech output.
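The buffering suggestion above can be sketched as a small helper. The class and parameter names are illustrative, and in a real pipeline this logic would live inside a custom frame processor or text aggregator rather than a standalone class.

```python
class MinLengthChunkBuffer:
    """Sketch of the suggested logic: chunks shorter than min_chars are
    buffered and merged into the next chunk before being sent to TTS."""

    def __init__(self, send_to_tts, min_chars=20):
        self._send_to_tts = send_to_tts
        self._min_chars = min_chars
        self._pending = ""  # short chunk awaiting a merge partner

    def on_chunk(self, text):
        # Merge any buffered short fragment with the incoming chunk.
        merged = f"{self._pending} {text}".strip() if self._pending else text
        if len(merged) < self._min_chars:
            self._pending = merged  # still too short; keep buffering
        else:
            self._pending = ""
            self._send_to_tts(merged)

    def flush(self):
        # Send any leftover buffered text at the end of a response.
        if self._pending:
            self._send_to_tts(self._pending)
            self._pending = ""


# Usage: the two short fragments are spoken as one utterance.
sent = []
buf = MinLengthChunkBuffer(sent.append, min_chars=25)
buf.on_chunk("Oh that's exciting!")  # 19 chars: buffered, not sent
buf.on_chunk("You got 50%")          # merged with the buffer and sent
buf.flush()
# sent == ["Oh that's exciting! You got 50%"]
```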