Why Use KugelAudio with PipeCat?
- Native service: Drop-in TTSService for PipeCat pipelines
- Persistent WebSocket: Connection reuse keeps the handshake off the hot path
- Built-in metrics: Automatic TTFB and usage metrics tracking
- Ultra-low latency: streaming TTS built for real-time agents — see Latency for current TTFA figures
Installation
The PipeCat integration requires Python 3.10 or higher and Pipecat 1.x (pipecat-ai>=1.0). With Pipecat 1.x, use LLMContext and LLMContextAggregatorPair (see sdks/python/examples/pipecat_local_bot.py).
Quick Start
Basic Pipeline
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the constructor.
Configuration
Service Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | KUGELAUDIO_API_KEY env | Your KugelAudio API key |
| model | str | kugel-3 | TTS model (kugel-3) |
| voice_id | int | required | Voice ID to use for synthesis |
| sample_rate | int | 24000 | Output sample rate in Hz |
| cfg_scale | float | 2.0 | CFG scale for generation quality |
| max_new_tokens | int | 2048 | Maximum tokens to generate |
| language | str \| None | None | ISO 639-1 language code (e.g., en, de). Skips server-side auto-detection (see Latency) |
| normalize | bool | True | Apply text normalization |
| base_url | str | https://api.kugelaudio.com | API base URL |
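The parameters above can be gathered in one place before the service is constructed. The following is a minimal sketch, not part of the KugelAudio SDK: the KugelTTSConfig dataclass and its to_kwargs() helper are hypothetical names introduced for illustration, and only the field names and defaults are taken from the table. It also shows the API key falling back to the KUGELAUDIO_API_KEY environment variable.

```python
import os
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class KugelTTSConfig:
    """Hypothetical container mirroring the Service Parameters table."""

    voice_id: int                       # required, no default
    api_key: Optional[str] = None       # falls back to KUGELAUDIO_API_KEY
    model: str = "kugel-3"
    sample_rate: int = 24000
    cfg_scale: float = 2.0
    max_new_tokens: int = 2048
    language: Optional[str] = None      # e.g. "en"; None means auto-detect
    normalize: bool = True
    base_url: str = "https://api.kugelaudio.com"

    def to_kwargs(self) -> dict:
        """Return keyword arguments with the API key resolved."""
        kwargs = asdict(self)
        kwargs["api_key"] = self.api_key or os.environ.get("KUGELAUDIO_API_KEY")
        if not kwargs["api_key"]:
            raise ValueError("Set KUGELAUDIO_API_KEY or pass api_key explicitly")
        return kwargs
```

Under these assumptions, the dict returned by to_kwargs() could be unpacked into the service constructor, with an explicit api_key taking precedence over the environment variable.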
Supported Sample Rates
| Rate | Notes |
|---|---|
| 24000 | Native rate (recommended) |
| 22050 | Half the CD sample rate (44.1 kHz) |
| 16000 | Wideband telephony |
| 8000 | Narrowband telephony |
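Validating the rate before constructing the service surfaces a mismatch early. This is a small sketch under the assumption that only the four rates listed above are accepted; check_sample_rate and SUPPORTED_SAMPLE_RATES are hypothetical names, not SDK symbols.

```python
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)  # from the table above


def check_sample_rate(rate: int) -> int:
    """Return rate unchanged if it is supported, otherwise raise ValueError."""
    if rate not in SUPPORTED_SAMPLE_RATES:
        allowed = ", ".join(str(r) for r in SUPPORTED_SAMPLE_RATES)
        raise ValueError(f"Unsupported sample rate {rate}; expected one of: {allowed}")
    return rate
```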
Models
Use kugel-3, the current production model for all use cases (voice agents,
narration, brand voices). See Models for capabilities and
Latency for TTFA figures.
Performance Optimization
Pre-warming the Connection
Call prewarm() during pipeline setup to establish the WebSocket connection before the first synthesis request. This keeps the TCP+TLS+WebSocket handshake out of the first call (see Latency).
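The effect of pre-warming can be illustrated with a stand-in class. This sketch does not use the real KugelAudioTTSService: StubTTS is a hypothetical stub, and it assumes prewarm() is awaitable, which this page does not state. It only shows the pattern of paying the handshake during setup rather than on the first synthesis call.

```python
import asyncio


class StubTTS:
    """Stand-in for the TTS service; NOT the real KugelAudioTTSService."""

    def __init__(self) -> None:
        self.handshakes = 0
        self.connected = False

    async def prewarm(self) -> None:
        # Assumed awaitable; simulates the TCP+TLS+WebSocket handshake.
        if not self.connected:
            await asyncio.sleep(0.01)
            self.handshakes += 1
            self.connected = True

    async def run_tts(self, text: str) -> bytes:
        # Connects lazily if prewarm() was skipped, so the first call would pay.
        await self.prewarm()
        return text.encode()


async def setup_pipeline() -> StubTTS:
    tts = StubTTS()
    await tts.prewarm()  # handshake happens here, during setup
    return tts
```

With the stub, a run_tts() call made after setup_pipeline() finds the connection already open and performs no further handshake.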
Turn context pre-provisioning (Pipecat 1.x)
Pipecat 1.x mints a fresh TTS context_id on every assistant turn. The service automatically calls the server’s create_context on LLMFullResponseStartFrame (when the LLM starts responding), before the first TTS text chunk arrives. That hides the WebSocket round-trip behind LLM time-to-first-token instead of adding it to measured TTFA.
No configuration required — call prewarm() as usual and ensure language is set.
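The overlap described above can be sketched with plain asyncio. The functions here are hypothetical stand-ins with simulated delays, not the service's implementation; they only illustrate why starting context creation when the LLM begins responding keeps the round-trip off the TTFA path.

```python
import asyncio

events: list[str] = []


async def create_context(context_id: str) -> None:
    """Stand-in for the server round-trip that opens a turn context."""
    await asyncio.sleep(0.02)  # simulated WebSocket round-trip
    events.append(f"context_ready:{context_id}")


async def llm_first_token() -> str:
    """Stand-in for LLM time-to-first-token."""
    await asyncio.sleep(0.05)
    events.append("first_text_chunk")
    return "Hello"


async def run_turn(context_id: str) -> str:
    # Start context creation as soon as the LLM begins responding...
    pending = asyncio.create_task(create_context(context_id))
    text = await llm_first_token()
    # ...so it has usually finished by the time the first text chunk arrives.
    await pending
    events.append("send_text")
    return text
```

In this simulation the context is ready before the first text chunk, so sending text does not wait on the round-trip.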
Setting the Language
When you know the language of your input text, always set the language parameter. Without it, the server auto-detects the language on each request, adding latency (see Latency).
Connection Reuse
The service automatically reuses a persistent WebSocket connection across run_tts() calls. This avoids the TCP+TLS+WebSocket handshake overhead on every request. If the connection drops, a new one is established transparently on the next call.
Each Pipecat 1.x turn still opens a new server-side context (required for correct turn isolation and to avoid context-cap leaks). Only the WebSocket connection is reused — not the engine KV session across turns.
TTFA logging
When KugelAudio TTFA: appears in logs, it measures the time from text send to the first audio chunk on the WebSocket (after any turn-context pre-provision). It does not include LLM or STT latency. End-to-end numbers depend heavily on network path: co-located clients see much lower numbers than remote dev machines. See Latency for reference figures and how to measure correctly.
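To reproduce a comparable number in your own client, time the gap between sending text and receiving the first audio chunk. The sketch below assumes the audio arrives as an async iterator of bytes and that calling measure_ttfa marks the moment of text send; both measure_ttfa and fake_stream are hypothetical helpers, and fake_stream merely simulates a delayed response.

```python
import asyncio
import time
from typing import AsyncIterator


async def measure_ttfa(audio_chunks: AsyncIterator[bytes]) -> float:
    """Return seconds from the call (text send) to the first audio chunk."""
    start = time.perf_counter()
    async for _chunk in audio_chunks:
        return time.perf_counter() - start
    raise RuntimeError("stream ended before any audio arrived")


async def fake_stream(delay: float) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    await asyncio.sleep(delay)
    yield b"\x00\x00"
```

As with the logged value, a measurement taken this way excludes LLM and STT time but includes the full network round-trip from wherever the client runs.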
Usage Patterns
Updating Voice and Model at Runtime
You can change the voice or model dynamically during a pipeline session.
Pipeline Frame Flow
The KugelAudioTTSService emits standard PipeCat frames:
- TTSStartedFrame - Audio generation has begun
- TTSAudioRawFrame - Raw PCM audio chunks (16-bit, mono)
- TTSStoppedFrame - Audio generation is complete
- ErrorFrame - If an error occurs during synthesis
Metrics Support
KugelAudio’s PipeCat service automatically tracks performance metrics (TTFB and usage).
Complete Voice Bot Example
Here’s a complete voice bot using PipeCat with Daily as the transport.
Running the Bot
Environment Variables
| Variable | Required | Description |
|---|---|---|
| KUGELAUDIO_API_KEY | Yes | Your KugelAudio API key |
| KUGELAUDIO_BASE_URL | No | Override API base URL (e.g. http://127.0.0.1:8002 for local ingress dev) |
| DAILY_ROOM_URL | Yes* | Daily room URL (if using Daily transport) |
| DAILY_TOKEN | Yes* | Daily room token |
| DEEPGRAM_API_KEY | Yes* | Required if using Deepgram STT |
| OPENAI_API_KEY | Yes* | Required if using OpenAI LLM |
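A startup check can report missing variables before the bot connects to anything. This is a sketch under the assumption that the starred entries are required only when the corresponding component (Daily, Deepgram, OpenAI) is in use; missing_env and the component names are hypothetical.

```python
import os
from typing import Mapping, Optional

ALWAYS_REQUIRED = ("KUGELAUDIO_API_KEY",)
CONDITIONAL = {
    "daily": ("DAILY_ROOM_URL", "DAILY_TOKEN"),
    "deepgram": ("DEEPGRAM_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
}


def missing_env(
    components: list[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """List required variables that are unset or empty for the given components."""
    env = os.environ if env is None else env
    needed = list(ALWAYS_REQUIRED)
    for name in components:
        needed.extend(CONDITIONAL[name])
    return [var for var in needed if not env.get(var)]
```

Calling missing_env(["daily", "deepgram", "openai"]) at startup returns an empty list when everything the example bot needs is set.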
Troubleshooting
API key not found
Make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly.
Unsupported sample rate error
KugelAudio supports these sample rates: 24000, 22050, 16000, 8000. Make sure your transport's output sample rate matches the sample_rate passed to the service.
WebSocket connection fails
Verify your base_url is correct and the KugelAudio API is reachable. The service connects via WebSocket (wss://) for audio streaming. If a persistent connection drops, the service automatically reconnects on the next run_tts() call.
High latency
Check in order:
- language unset: every request pays language auto-detection.
- prewarm() not called: the first request pays the WebSocket handshake.
- Network path: measuring from a laptop against a remote engine adds your full RTT on top of inference. Run the measurement from the ingress pod or use the production API for apples-to-apples TTFA. See Latency for reference figures.
- Pipecat 1.x per-turn contexts: each turn opens a fresh server context (by design). Turn-context pre-provisioning hides the WS setup cost behind LLM latency; it does not remove engine cold-open per turn.
Python version incompatibility
The PipeCat integration requires Python 3.10 or higher. Check your version:
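One way to check from within Python itself is sketched below; python_is_supported and MIN_PYTHON are hypothetical names, and the minimum is the 3.10 stated above.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum stated for the PipeCat integration


def python_is_supported(version: tuple = sys.version_info[:2]) -> bool:
    """True if the given (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


if __name__ == "__main__":
    found = sys.version.split()[0]
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {found}: {status} (need {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+)")
```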
Next Steps
LiveKit Integration
Use KugelAudio with LiveKit Agents
Streaming
Advanced streaming techniques