Why Use KugelAudio with LiveKit?
- Native plugin: Drop-in TTS provider for LiveKit’s AgentSession
- Streaming support: Real-time WebSocket-based audio streaming
- Ultra-low latency: Streaming TTS built for real-time agents; see Latency for current TTFA figures
- Simple setup: Works with VoicePipelineAgent and the new AgentSession API
Installation
Requires livekit-agents>=1.0.0.
Quick Start
Minimal Voice Agent
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the TTS constructor.
Configuration
TTS Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | KUGELAUDIO_API_KEY env | Your KugelAudio API key |
| model | str | kugel-3 | TTS model (kugel-3) |
| voice_id | int \| None | None | Voice ID to use (server default if None) |
| sample_rate | int | 24000 | Output sample rate in Hz |
| cfg_scale | float | 2.0 | CFG scale for generation quality |
| max_new_tokens | int | 2048 | Maximum tokens to generate |
| normalize | bool | True | Apply loudness normalization to output audio |
| language | str \| None | None | ISO 639-1 language code (e.g. "de", "en"). Skips auto-detection; see Latency |
| base_url | str | https://api.kugelaudio.com | API base URL |
| word_timestamps | bool | False | Enable word-level time alignments (opt-in; required for aligned transcript) |
| http_session | ClientSession \| None | None | Optional aiohttp session to reuse |
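
As a rough illustration of how the parameters above fit together, the sketch below mirrors the documented names and defaults in a small dataclass. `KugelTTSConfig` and `to_kwargs` are hypothetical helpers, not part of the plugin, and the voice ID `42` is a placeholder value. It assumes the TTS constructor accepts these names as keyword arguments, as the table suggests.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class KugelTTSConfig:
    """Hypothetical container mirroring the documented TTS parameters."""

    api_key: Optional[str] = None   # None: rely on KUGELAUDIO_API_KEY
    model: str = "kugel-3"
    voice_id: Optional[int] = None  # None: server default voice
    sample_rate: int = 24000
    cfg_scale: float = 2.0
    max_new_tokens: int = 2048
    normalize: bool = True
    language: Optional[str] = None  # None: server auto-detects the language
    base_url: str = "https://api.kugelaudio.com"
    word_timestamps: bool = False

    def to_kwargs(self) -> dict:
        """Drop unset (None) values so the constructor defaults apply."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Placeholder voice ID; replace with a real one from your account.
german_agent = KugelTTSConfig(language="de", voice_id=42)
kwargs = german_agent.to_kwargs()
```

The resulting `kwargs` dictionary could then be unpacked into the TTS constructor, keeping all tuning values in one reviewable place.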
Supported Sample Rates
| Rate | Notes |
|---|---|
| 24000 | Native rate (recommended) |
| 22050 | Half the CD sample rate (44.1 kHz) |
| 16000 | Wideband telephony |
| 8000 | Narrowband telephony |
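
A minimal sketch of guarding against unsupported rates before constructing the TTS, assuming the four rates in the table are the only accepted values. `check_sample_rate` is a hypothetical helper, not a plugin function.

```python
# Rates taken from the table above.
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)


def check_sample_rate(rate: int) -> int:
    """Return the rate unchanged if it is listed, otherwise raise ValueError."""
    if rate not in SUPPORTED_SAMPLE_RATES:
        allowed = ", ".join(str(r) for r in SUPPORTED_SAMPLE_RATES)
        raise ValueError(f"Unsupported sample rate {rate}; expected one of: {allowed}")
    return rate


telephony_rate = check_sample_rate(8000)
```

Failing early in application code gives a clearer error than a rejected request at synthesis time.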
Models
Use kugel-3, the current production model for all use cases (voice agents, narration, brand voices). See Models for capabilities and the full comparison, and Latency for TTFA figures.
Usage Patterns
Non-Streaming Synthesis
Use synthesize() for one-shot text-to-speech:
Streaming Synthesis
Use stream() for real-time text input (e.g., from an LLM):
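
The sketch below shows one way to feed text fragments into a stream and collect the resulting audio. It assumes the plugin follows LiveKit's usual streaming interface (`stream()`, `push_text()`, `end_input()`, async iteration over events carrying a `frame`, and `aclose()`). `FakeTTS` and its helper classes are stand-ins invented here so the example runs without credentials; in a real agent you would pass the KugelAudio TTS instance instead.

```python
import asyncio
from typing import Iterable, List


async def speak_tokens(tts, tokens: Iterable[str]) -> List[object]:
    """Push text fragments into a TTS stream and collect the audio frames."""
    stream = tts.stream()
    try:
        for token in tokens:
            stream.push_text(token)  # feed text as it arrives from the LLM
        stream.end_input()           # signal that no more text will follow
        return [event.frame async for event in stream]
    finally:
        await stream.aclose()


# --- Stand-ins for illustration only; not part of any library ---
class _FakeEvent:
    def __init__(self, frame):
        self.frame = frame


class _FakeStream:
    def __init__(self):
        self._text = []

    def push_text(self, text):
        self._text.append(text)

    def end_input(self):
        pass

    async def aclose(self):
        pass

    def __aiter__(self):
        return self._events()

    async def _events(self):
        # Emit one placeholder "frame" per pushed text fragment.
        for text in self._text:
            yield _FakeEvent(frame=f"<audio for {text!r}>")


class FakeTTS:
    def stream(self):
        return _FakeStream()


frames = asyncio.run(speak_tokens(FakeTTS(), ["Hello ", "world."]))
```

In a live agent you would typically consume frames concurrently with pushing text rather than collecting them into a list, so playback can start before the LLM finishes.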
Setting the Language
Set language to skip server-side auto-detection on every request (see Latency):
Language codes: de, en, fr, es, it, pt, nl, pl, sv, da, no, fi, cs, hu, ro, el, uk, bg, tr, vi, ar, hi, zh, ja, ko.
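
A small sketch for normalizing user-supplied locale strings to one of the codes listed above before passing it as `language`. `normalize_language` is a hypothetical helper. Reducing a regional tag such as `en-US` to its primary subtag is a design choice made here, not documented plugin behavior.

```python
from typing import Optional

# Codes copied from the list above.
SUPPORTED_LANGUAGES = frozenset(
    "de en fr es it pt nl pl sv da no fi cs hu ro el uk bg tr vi ar hi zh ja ko".split()
)


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Map inputs like 'DE' or 'en-US' to a listed two-letter code.

    None is passed through, which leaves language detection to the server.
    """
    if code is None:
        return None
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    if primary not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return primary


language = normalize_language("en-US")
```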
Updating Options at Runtime
You can change TTS options dynamically without creating a new instance:
Word-Level Alignment
Word timestamps are off by default (including for kugel-3), which avoids server-side post-processing errors on models where alignment is not yet supported.
When you set word_timestamps=True, the server performs forced alignment on each audio chunk and delivers per-word timing alongside the audio. LiveKit’s AgentSession uses these timings for barge-in and transcript sync via the aligned_transcript capability (advertised only when timestamps are enabled).
Word alignments add no extra audio latency when supported. Timestamps are delivered shortly after each audio chunk — see Word timestamps.
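
To make the barge-in use case concrete, the sketch below works out which words had finished playing when the user interrupted. It is a conceptual illustration only: `WordTiming` is a hypothetical structure, not the plugin's wire format or LiveKit's implementation, and the timing values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the utterance
    end: float


def spoken_so_far(timings: List[WordTiming], interrupted_at: float) -> str:
    """Return the words whose playback finished before the interruption."""
    return " ".join(t.word for t in timings if t.end <= interrupted_at)


# Invented timings for the sentence "Hello there, how are you?"
timings = [
    WordTiming("Hello", 0.00, 0.35),
    WordTiming("there,", 0.35, 0.70),
    WordTiming("how", 0.70, 0.90),
    WordTiming("are", 0.90, 1.05),
    WordTiming("you?", 1.05, 1.40),
]
heard = spoken_so_far(timings, interrupted_at=0.95)
```

Truncating the transcript this way keeps the conversation history aligned with what the user actually heard.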
Plugin Registration
You can also register KugelAudio as a LiveKit plugin namespace:
Complete Voice Agent Example
Here’s a production-ready voice agent with metrics logging:
Running the Agent
Environment Variables
| Variable | Required | Description |
|---|---|---|
| KUGELAUDIO_API_KEY | Yes | Your KugelAudio API key |
| LIVEKIT_URL | Yes | Your LiveKit server URL |
| LIVEKIT_API_KEY | Yes | LiveKit API key |
| LIVEKIT_API_SECRET | Yes | LiveKit API secret |
| KUGELAUDIO_VOICE_ID | No | Default voice ID to use |
| DEEPGRAM_API_KEY | Yes* | Required if using Deepgram STT |
| OPENAI_API_KEY | Yes* | Required if using OpenAI LLM |
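
A startup check along the following lines can surface missing variables before the agent connects. `missing_env_vars` is a hypothetical helper, and the example environment uses placeholder values only.

```python
import os
from typing import List, Mapping, Sequence

# Variables marked "Yes" in the table above.
REQUIRED = (
    "KUGELAUDIO_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(
    env: Mapping[str, str] = os.environ,
    extra_required: Sequence[str] = (),
) -> List[str]:
    """List required variables that are unset or empty in env."""
    return [name for name in (*REQUIRED, *extra_required) if not env.get(name)]


# Placeholder environment for illustration; real code would use os.environ.
example_env = {
    "KUGELAUDIO_API_KEY": "placeholder",
    "LIVEKIT_URL": "wss://example.invalid",
}
missing = missing_env_vars(example_env, extra_required=("DEEPGRAM_API_KEY",))
```

Pass the conditionally required keys (those marked `Yes*`) through `extra_required` depending on which STT and LLM providers you use.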
Troubleshooting
API key not found
Make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly:
WebSocket connection fails
Verify your base_url is correct and the KugelAudio API is reachable. The plugin connects via WebSocket (wss://) for audio streaming.
Audio quality issues
- Use the native 24000 Hz sample rate for best results
- Try increasing cfg_scale (e.g., 2.5) for more expressive output
- Switch to the kugel-3 model for premium quality
High latency
- Set language explicitly (e.g. language="de") to skip auto-detection; see Latency
- Use kugel-3 for real-time conversations when latency matters more than prosody
- Lowering cfg_scale (e.g., 1.5) trades slight quality for speed
- Reuse http_session across requests to avoid connection overhead
Next Steps
- PipeCat Integration: Use KugelAudio with PipeCat pipelines
- Streaming: Advanced streaming techniques