# PipeCat Integration

> Use KugelAudio TTS with the PipeCat voice AI framework

KugelAudio provides an official TTS service for [PipeCat](https://github.com/pipecat-ai/pipecat), enabling high-quality voice synthesis in your voice AI pipelines.

## Why Use KugelAudio with PipeCat?

* **Native service:** Drop-in `TTSService` for PipeCat pipelines
* **Persistent WebSocket:** Connection reuse keeps the handshake off the hot path
* **Built-in metrics:** Automatic TTFB and usage metrics tracking
* **Ultra-low latency:** Streaming TTS built for real-time agents; see [Latency](/latency) for current TTFA figures

## Installation

```bash theme={null}
# Quote the extra so shells such as zsh do not expand the brackets
pip install "kugelaudio[pipecat]"
```

This installs the KugelAudio SDK along with the required PipeCat dependency (`pipecat-ai>=1.0`).

<Note>
  The PipeCat integration requires Python 3.10 or higher. Pipecat 1.x is supported. Manage conversation context with `LLMContext` and `LLMContextAggregatorPair`; see `sdks/python/examples/pipecat_local_bot.py` for an example.
</Note>

## Quick Start

### Basic Pipeline

```python theme={null}
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from kugelaudio.pipecat import KugelAudioTTSService

# Create the TTS service
tts = KugelAudioTTSService(
    api_key="your-api-key",
    model="kugel-3",
    voice_id=1071,
    sample_rate=24000,
    language="en",  # Set language to skip auto-detection (lower latency)
)
tts.prewarm()  # Pre-establish WebSocket connection for faster first request

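# Use in a PipeCat pipeline. `transport`, `stt`, and `llm` are created elsewhere;
# see the Complete Voice Bot Example further down this page.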
pipeline = Pipeline([
    transport.input(),   # Audio/text input
    stt,                 # Speech-to-text
    llm,                 # Language model
    tts,                 # KugelAudio TTS
    transport.output(),  # Audio output
])

runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)  # call this from inside an async function
```

<Note>
  Set the `KUGELAUDIO_API_KEY` environment variable or pass `api_key` directly to the constructor.
</Note>

## Configuration

### Service Parameters

| Parameter        | Type          | Default                      | Description                                                                                            |
| ---------------- | ------------- | ---------------------------- | ------------------------------------------------------------------------------------------------------ |
| `api_key`        | `str`         | `KUGELAUDIO_API_KEY` env     | Your KugelAudio API key                                                                                |
| `model`          | `str`         | `kugel-3`                    | TTS model (`kugel-3`)                                                                                  |
| `voice_id`       | `int`         | **required**                 | Voice ID to use for synthesis                                                                          |
| `sample_rate`    | `int`         | `24000`                      | Output sample rate in Hz                                                                               |
| `cfg_scale`      | `float`       | `2.0`                        | CFG scale for generation quality                                                                       |
| `max_new_tokens` | `int`         | `2048`                       | Maximum tokens to generate                                                                             |
| `language`       | `str \| None` | `None`                       | ISO 639-1 language code (e.g., `en`, `de`). Skips server-side auto-detection — see [Latency](/latency) |
| `normalize`      | `bool`        | `True`                       | Apply text normalization                                                                               |
| `base_url`       | `str`         | `https://api.kugelaudio.com` | API base URL                                                                                           |

### Supported Sample Rates

| Rate    | Notes                     |
| ------- | ------------------------- |
| `24000` | Native rate (recommended) |
| `22050` | Half the 44.1 kHz CD rate |
| `16000` | Wideband telephony        |
| `8000`  | Narrowband telephony      |

<Tip>
  Use the native `24000` Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with negligible impact — see [Latency](/latency).
</Tip>
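
A pre-flight check along the following lines can catch rate problems before the pipeline starts. This is only a sketch: `check_sample_rate` is a hypothetical helper, not part of the KugelAudio SDK. The supported values are copied from the table above, and the requirement that the TTS and transport rates match is taken from the Troubleshooting section.

```python
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)  # from the table above


def check_sample_rate(tts_rate: int, transport_rate: int) -> None:
    """Hypothetical helper: raise ValueError for an unsupported or mismatched rate."""
    if tts_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"Unsupported sample rate {tts_rate}; choose one of {SUPPORTED_SAMPLE_RATES}"
        )
    if tts_rate != transport_rate:
        raise ValueError(
            f"TTS rate {tts_rate} does not match transport output rate {transport_rate}"
        )


check_sample_rate(24000, 24000)  # passes silently
```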

### Models

Use `kugel-3` — the current production model for all use cases (voice agents,
narration, brand voices). See [Models](/models) for capabilities and
[Latency](/latency) for TTFA figures.

## Performance Optimization

### Pre-warming the Connection

Call `prewarm()` during pipeline setup to establish the WebSocket connection before the first synthesis request. This keeps the TCP+TLS+WebSocket handshake out of the first call — see [Latency](/latency).

```python theme={null}
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
    language="en",
)
tts.prewarm()  # Connects in the background so the first run_tts() skips the handshake
```

### Turn context pre-provisioning (Pipecat 1.x)

Pipecat 1.x mints a fresh TTS `context_id` on every assistant turn. The service automatically calls the server's `create_context` on `LLMFullResponseStartFrame` (when the LLM starts responding), **before** the first TTS text chunk arrives. That hides the WebSocket round-trip behind LLM time-to-first-token instead of adding it to measured TTFA.

No configuration required — call `prewarm()` as usual and ensure `language` is set.
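
The benefit of overlapping context creation with LLM time-to-first-token can be illustrated with a small, self-contained simulation. It is a sketch under stated assumptions: `create_context` and `llm_first_token` are stand-in coroutines with made-up delays (50 ms and 200 ms), not the service's actual implementation or measured figures.

```python
import asyncio
import time

CONTEXT_SETUP_S = 0.05  # hypothetical create_context round-trip
LLM_TTFT_S = 0.20       # hypothetical LLM time-to-first-token


async def create_context() -> str:
    """Stand-in for the server round-trip that opens a turn context."""
    await asyncio.sleep(CONTEXT_SETUP_S)
    return "ctx-1"


async def llm_first_token() -> str:
    """Stand-in for waiting on the first LLM token."""
    await asyncio.sleep(LLM_TTFT_S)
    return "Hello"


async def sequential() -> float:
    """Open the context only after the first text chunk arrives."""
    start = time.perf_counter()
    await llm_first_token()
    await create_context()
    return time.perf_counter() - start


async def pre_provisioned() -> float:
    """Start opening the context as soon as the LLM begins responding."""
    start = time.perf_counter()
    ctx_task = asyncio.create_task(create_context())
    await llm_first_token()
    await ctx_task  # already finished, so this adds no extra wait
    return time.perf_counter() - start


async def compare() -> tuple:
    return await sequential(), await pre_provisioned()


if __name__ == "__main__":
    seq, pre = asyncio.run(compare())
    print(f"sequential: {seq:.3f}s  pre-provisioned: {pre:.3f}s")
```

In the pre-provisioned variant the total wait is bounded by the slower of the two steps rather than their sum, which is why the context round-trip disappears from the measured TTFA whenever the LLM takes longer than the context setup.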

### Setting the Language

When you know the language of your input text, always set the `language` parameter. Without it, the server auto-detects the language on each request, adding latency — see [Latency](/latency).

```python theme={null}
# Fast: explicit language skips auto-detection
tts = KugelAudioTTSService(voice_id=1071, language="de")

# Slower: server auto-detects language on every request
tts = KugelAudioTTSService(voice_id=1071)
```

<Tip>
  For lowest latency, always set `language` and call `prewarm()` — see [Latency](/latency) for what each saves.
</Tip>

### Connection Reuse

The service automatically reuses a persistent WebSocket connection across `run_tts()` calls. This avoids the TCP+TLS+WebSocket handshake overhead on every request. If the connection drops, a new one is established transparently on the next call.

Each Pipecat 1.x turn still opens a **new server-side context** (required for correct turn isolation and to avoid context-cap leaks). Only the WebSocket connection is reused — not the engine KV session across turns.
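
The reuse-and-reconnect behaviour can be pictured with the lazy-connection pattern below. This is an illustrative sketch, not the SDK's code: `FakeConnection` and `ReusingClient` are hypothetical stand-ins, and no real WebSocket is opened.

```python
import asyncio
from typing import Optional


class FakeConnection:
    """Stand-in for a WebSocket connection (hypothetical)."""

    def __init__(self) -> None:
        self.open = True


class ReusingClient:
    """Keeps one connection and reconnects only when it is missing or closed."""

    def __init__(self) -> None:
        self._conn: Optional[FakeConnection] = None
        self.handshakes = 0  # counts how often a new connection was needed

    async def _ensure_connected(self) -> FakeConnection:
        if self._conn is None or not self._conn.open:
            self.handshakes += 1  # a real client would pay TCP+TLS+WebSocket here
            self._conn = FakeConnection()
        return self._conn

    async def synthesize(self, text: str) -> str:
        await self._ensure_connected()
        return f"audio for {text!r}"


async def demo() -> int:
    client = ReusingClient()
    await client.synthesize("first")   # connects
    await client.synthesize("second")  # reuses the same connection
    client._conn.open = False          # simulate a dropped connection
    await client.synthesize("third")   # reconnects transparently
    return client.handshakes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Three requests cost two handshakes here: one for the initial connection and one after the simulated drop.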

### TTFA logging

When `KugelAudio TTFA:` appears in logs, it measures **text send → first audio chunk** on the WebSocket (after any turn-context pre-provision). It does not include LLM or STT latency. End-to-end numbers depend heavily on network path — co-located clients see much lower numbers than remote dev machines. See [Latency](/latency) for reference figures and how to measure correctly.
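
To make the measurement window concrete, the sketch below times the same interval against a simulated stream. It assumes nothing about the SDK's internals: `fake_audio_stream` is a hypothetical stand-in for the server, and the 50 ms delay is an arbitrary value, not a reference figure.

```python
import asyncio
import time
from typing import AsyncIterator


async def fake_audio_stream(first_chunk_delay_s: float) -> AsyncIterator[bytes]:
    """Stand-in for the server: waits, then yields two chunks of silence."""
    await asyncio.sleep(first_chunk_delay_s)
    yield b"\x00\x00" * 480
    yield b"\x00\x00" * 480


async def measure_ttfa(first_chunk_delay_s: float) -> float:
    """Return the seconds elapsed between 'text send' and the first audio chunk."""
    sent_at = time.perf_counter()  # the moment the text would be sent
    stream = fake_audio_stream(first_chunk_delay_s)
    try:
        async for _chunk in stream:
            # Stop the clock at the first chunk; later chunks do not count.
            return time.perf_counter() - sent_at
    finally:
        await stream.aclose()
    raise RuntimeError("stream ended before any audio arrived")


if __name__ == "__main__":
    print(f"TTFA: {asyncio.run(measure_ttfa(0.05)) * 1000:.0f} ms")
```

The clock starts at text send and stops at the first chunk, so anything that happens earlier in the turn (STT, LLM, context setup) is outside the measured interval.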

## Usage Patterns

### Updating Voice and Model at Runtime

You can change the voice or model dynamically during a pipeline session:

```python theme={null}
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
)

# Switch voice mid-conversation (closes cached WebSocket first)
await tts.set_voice("300")

# Set the model (`kugel-3` is currently the only production model)
await tts.set_model("kugel-3")
```

### Pipeline Frame Flow

The `KugelAudioTTSService` emits standard PipeCat frames:

1. `TTSStartedFrame` - Audio generation has begun
2. `TTSAudioRawFrame` - Raw PCM audio chunks (16-bit, mono)
3. `TTSStoppedFrame` - Audio generation is complete
4. `ErrorFrame` - If an error occurs during synthesis

```python theme={null}
from pipecat.frames.frames import (
    TTSStartedFrame,
    TTSAudioRawFrame,
    TTSStoppedFrame,
)

# The TTS service yields frames in this order:
# TTSStartedFrame -> TTSAudioRawFrame* -> TTSStoppedFrame
```
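
Because the audio frames carry 16-bit mono PCM, the playback duration of a chunk follows directly from its byte length. The helper below is a hypothetical illustration of that arithmetic and is not part of the SDK; it defaults to the native 24000 Hz rate.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000) -> float:
    """Playback duration of a raw PCM buffer of the given size."""
    return num_bytes / (BYTES_PER_SAMPLE * CHANNELS * sample_rate)


# One second of audio at 24000 Hz is 24000 samples * 2 bytes = 48000 bytes.
print(pcm_duration_seconds(48000))
```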

### Metrics Support

KugelAudio's PipeCat service automatically tracks performance metrics:

```python theme={null}
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
)

# Metrics are tracked automatically:
# - TTFB (Time to First Byte): measured from request to first audio chunk
# - TTS Usage: character count per request
print(tts.can_generate_metrics())  # True
```

## Complete Voice Bot Example

Here's a complete voice bot using PipeCat with Daily as the transport:

```python theme={null}
import asyncio
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from kugelaudio.pipecat import KugelAudioTTSService

async def main():
    # Transport (Daily WebRTC)
    transport = DailyTransport(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
        bot_name="KugelAudio Bot",
        params=DailyParams(audio_out_sample_rate=24000),
    )

    # STT
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    # LLM
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    )

    # TTS - KugelAudio
    tts = KugelAudioTTSService(
        model="kugel-3",
        voice_id=1071,
        sample_rate=24000,
        language="en",  # Skip auto-detection for lower latency
    )
    tts.prewarm()  # Pre-establish WebSocket connection

    # Build pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        llm,
        tts,
        transport.output(),
    ])

    runner = PipelineRunner()
    task = PipelineTask(pipeline)
    await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())
```

### Running the Bot

```bash theme={null}
# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export DAILY_ROOM_URL="https://your-domain.daily.co/room"
export DAILY_TOKEN="your-daily-token"
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"

# Run the bot
python voice_bot.py
```
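
A small check at the top of `voice_bot.py` can report missing variables before any service is constructed. This is a sketch: `missing_env_vars` is a hypothetical helper, and the variable list simply mirrors the exports above for the Daily, Deepgram, and OpenAI setup used in this example.

```python
import os

# Variables used by the complete example above (Daily + Deepgram + OpenAI).
REQUIRED_VARS = (
    "KUGELAUDIO_API_KEY",
    "DAILY_ROOM_URL",
    "DAILY_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```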

## Environment Variables

| Variable              | Required | Description                                                                |
| --------------------- | -------- | -------------------------------------------------------------------------- |
| `KUGELAUDIO_API_KEY`  | Yes      | Your KugelAudio API key                                                    |
| `KUGELAUDIO_BASE_URL` | No       | Override API base URL (e.g. `http://127.0.0.1:8002` for local ingress dev) |
| `DAILY_ROOM_URL`      | Yes\*    | Daily room URL (if using Daily transport)                                  |
| `DAILY_TOKEN`         | Yes\*    | Daily room token                                                           |
| `DEEPGRAM_API_KEY`    | Yes\*    | Required if using Deepgram STT                                             |
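| `OPENAI_API_KEY`      | Yes\*    | Required if using OpenAI LLM                                               |

\* Required only if you use the corresponding service (Daily transport, Deepgram STT, or OpenAI LLM).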

## Troubleshooting

<AccordionGroup>
  <Accordion title="API key not found">
    Make sure `KUGELAUDIO_API_KEY` is set in your environment or pass `api_key` directly:

    ```python theme={null}
    tts = KugelAudioTTSService(api_key="your-api-key", voice_id=1071)
    ```
  </Accordion>

  <Accordion title="Unsupported sample rate error">
    KugelAudio supports these sample rates: `24000`, `22050`, `16000`, `8000`. Make sure your transport output sample rate matches:

    ```python theme={null}
    # Both must match
    tts = KugelAudioTTSService(voice_id=1071, sample_rate=24000)
    transport = DailyTransport(
        params=DailyParams(audio_out_sample_rate=24000),
    )
    ```
  </Accordion>

  <Accordion title="WebSocket connection fails">
    Verify your `base_url` is correct and the KugelAudio API is reachable. The service connects via WebSocket (`wss://`) for audio streaming. If a persistent connection drops, the service automatically reconnects on the next `run_tts()` call.
  </Accordion>

  <Accordion title="High latency">
    Check in order:

    1. **`language` unset:** every request pays for language auto-detection.
    2. **`prewarm()` not called:** the first request pays for the WebSocket handshake.
    3. **Network path:** measuring from a laptop against a remote engine adds your full RTT on top of inference. For a like-for-like TTFA, run the client from the ingress pod or measure against the production API. See [Latency](/latency) for reference figures.
    4. **Pipecat 1.x per-turn contexts:** each turn opens a fresh server context (by design). Turn-context pre-provisioning hides the WebSocket setup cost behind LLM latency; it does not remove the engine cold-open per turn.

    See [Performance Optimization](#performance-optimization) and [Measuring TTFA correctly](/latency#measuring-ttfa-correctly).
  </Accordion>

  <Accordion title="Python version incompatibility">
    The PipeCat integration requires Python 3.10 or higher. Check your version:

    ```bash theme={null}
    python --version
    ```
  </Accordion>
</AccordionGroup>

## Next Steps

<CardGroup cols={2}>
  <Card title="LiveKit Integration" icon="signal-stream" href="/integrations/livekit">
    Use KugelAudio with LiveKit Agents
  </Card>

  <Card title="Streaming" icon="wave-pulse" href="/streaming/overview">
    Advanced streaming techniques
  </Card>
</CardGroup>
