KugelAudio provides an official plugin for the LiveKit Agents framework that adds ultra-low-latency text-to-speech to your voice AI agents.

Why Use KugelAudio with LiveKit?

  • Native plugin: Drop-in TTS provider for LiveKit’s AgentSession
  • Streaming support: Real-time WebSocket-based audio streaming
  • Ultra-low latency: streaming TTS built for real-time agents — see Latency for current TTFA figures
  • Simple setup: Works with VoicePipelineAgent and the new AgentSession API

Installation

pip install kugelaudio[livekit]
This installs the KugelAudio SDK along with the required LiveKit Agents dependencies (livekit-agents>=1.0.0).

Quick Start

Minimal Voice Agent

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=KugelAudioTTS(
            model="kugel-3",
            voice_id=1071,
            sample_rate=24000,
        ),
        vad=silero.VAD.load(),
    )

    agent = Agent(
        instructions="You are a helpful voice assistant."
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the TTS constructor.

Configuration

TTS Parameters

Parameter        Type                  Default                     Description
api_key          str                   KUGELAUDIO_API_KEY env      Your KugelAudio API key
model            str                   kugel-3                     TTS model (kugel-3)
voice_id         int | None            None                        Voice ID to use (server default if None)
sample_rate      int                   24000                       Output sample rate in Hz
cfg_scale        float                 2.0                         CFG scale for generation quality
max_new_tokens   int                   2048                        Maximum tokens to generate
normalize        bool                  True                        Apply loudness normalization to output audio
language         str | None            None                        ISO 639-1 language code (e.g. "de", "en"). Skips auto-detection; see Latency
base_url         str                   https://api.kugelaudio.com  API base URL
word_timestamps  bool                  False                       Enable word-level time alignments (opt-in; required for aligned transcript)
http_session     ClientSession | None  None                        Optional aiohttp session to reuse

Supported Sample Rates

Rate (Hz)  Notes
24000      Native rate (recommended)
22050      Half the 44.1 kHz CD rate
16000      Wideband telephony
8000       Narrowband telephony
Use the native 24000 Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with negligible impact — see Latency.
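
When sizing audio buffers for a given sample_rate, the arithmetic below may be a useful starting point. It is a minimal sketch that assumes 16-bit mono PCM output, which this page does not state explicitly; the helper names are illustrative and not part of the KugelAudio SDK.

```python
# Assumption: audio is 16-bit (2 bytes per sample) mono PCM.
BYTES_PER_SAMPLE = 2
CHANNELS = 1

# Rates listed in the "Supported Sample Rates" table.
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)


def samples_per_frame(sample_rate: int, frame_ms: int = 20) -> int:
    """Number of samples in one frame lasting frame_ms milliseconds."""
    return sample_rate * frame_ms // 1000


def bytes_per_second(sample_rate: int) -> int:
    """Raw PCM throughput for one second of audio."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS


for rate in SUPPORTED_SAMPLE_RATES:
    print(rate, samples_per_frame(rate), bytes_per_second(rate))
```

Under these assumptions, a 20 ms frame at the native rate holds 480 samples, and one second of audio is 48000 bytes.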

Models

Use kugel-3, the current production model, for all use cases (voice agents, narration, brand voices). See Models for capabilities and the full comparison, and Latency for TTFA figures.

Usage Patterns

Non-Streaming Synthesis

Use synthesize() for one-shot text-to-speech:
from kugelaudio.livekit import TTS

tts = TTS(model="kugel-3", voice_id=1071)

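# Synthesize a complete text.
# Run this inside an async function (for example your agent entrypoint), because `async for` requires one.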
stream = tts.synthesize("Hello, this is a complete sentence.")
async for event in stream:
    # Process audio frames
    pass

Streaming Synthesis

Use stream() for real-time text input (e.g., from an LLM):
from kugelaudio.livekit import TTS

tts = TTS(model="kugel-3", voice_id=1071)

# Create a streaming session
stream = tts.stream()

# Send text chunks as they arrive from an LLM
stream.push_text("Hello, ")
stream.push_text("how are you today?")
stream.flush()
stream.end_input()

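# Receive audio frames.
# Run this inside an async function (for example your agent entrypoint), because `async for` requires one.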
async for event in stream:
    # Process audio frames
    pass
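
If you prefer to hand text to push_text() in sentence-sized pieces rather than token by token, a small buffer can sit between the LLM and the stream. The sketch below is plain Python with no SDK dependency. SentenceBuffer is a hypothetical helper, and whether the plugin already does its own chunking is not covered on this page.

```python
import re

# Split after sentence-ending punctuation that is followed by whitespace.
# Caveat: abbreviations such as "e.g. " will also trigger a split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Hypothetical helper that groups LLM tokens into whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, token: str) -> list[str]:
        """Add a token and return any sentences that are now complete."""
        self._pending += token
        parts = _SENTENCE_END.split(self._pending)
        # The last part may be an unfinished sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p for p in parts if p]

    def drain(self) -> str:
        """Return whatever is left once the LLM has finished."""
        rest, self._pending = self._pending, ""
        return rest.strip()
```

Each sentence returned by feed() would then be passed to stream.push_text(), and the result of drain() pushed just before stream.end_input().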

Setting the Language

Set language to skip server-side auto-detection on every request (see Latency):
tts = KugelAudioTTS(
    model="kugel-3",
    voice_id=1071,
    language="de",  # German text normalization (e.g. "123" → "einhundertdreiundzwanzig")
)
Supported languages: de, en, fr, es, it, pt, nl, pl, sv, da, no, fi, cs, hu, ro, el, uk, bg, tr, vi, ar, hi, zh, ja, ko.
Always set language when you know the output language in advance. This is especially important for real-time voice agents where every millisecond counts.
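
Language codes often arrive from user settings or browser locales in forms such as "de-DE" or "EN". The following sketch reduces such input to one of the two-letter codes listed above before it is passed as language. normalize_language is a hypothetical helper, not an SDK function, and it makes no claim about how the server treats unsupported codes.

```python
from typing import Optional

# Copied from the supported-languages list on this page.
SUPPORTED_LANGUAGES = frozenset({
    "de", "en", "fr", "es", "it", "pt", "nl", "pl", "sv", "da", "no", "fi", "cs",
    "hu", "ro", "el", "uk", "bg", "tr", "vi", "ar", "hi", "zh", "ja", "ko",
})


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Reduce a locale-style code to a supported two-letter code.

    None is passed through unchanged, which leaves auto-detection enabled.
    """
    if code is None:
        return None
    # Keep only the primary subtag: "de-DE" and "de_DE" both become "de".
    primary = code.strip().replace("_", "-").split("-")[0].lower()
    if primary not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return primary
```

The result can be passed as language=normalize_language(user_locale) when constructing the TTS instance.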

Updating Options at Runtime

You can change TTS options dynamically without creating a new instance:
tts = KugelAudioTTS(model="kugel-3", voice_id=1071)

# Switch voice mid-conversation
tts.update_options(voice_id=300)

# Set the model (kugel-3 is the current production model)
tts.update_options(model="kugel-3")

# Set or change language
tts.update_options(language="de")

# Adjust generation parameters
tts.update_options(cfg_scale=1.5, max_new_tokens=4096)

Word-Level Alignment

Word timestamps are off by default (including for kugel-3), which avoids server-side post-processing errors on models where alignment is not yet supported. When you set word_timestamps=True, the server performs forced alignment on each audio chunk and delivers per-word timing alongside the audio. LiveKit’s AgentSession uses these timings for barge-in and transcript sync via the aligned_transcript capability (advertised only when timestamps are enabled).
tts = KugelAudioTTS(
    model="kugel-3",
    voice_id=1071,
    word_timestamps=True,  # opt-in
)

# LiveKit receives TimedString objects with word boundaries automatically
Word alignments add no extra audio latency when supported. Timestamps are delivered shortly after each audio chunk — see Word timestamps.
If synthesis fails with “Audio post-processing failed”, keep word_timestamps=False (the default) or switch to a model that supports alignment.
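
To illustrate why per-word timing matters for barge-in, the sketch below works out which part of a reply had already been spoken when the user interrupted. WordTiming is a hypothetical stand-in, not LiveKit's TimedString or the KugelAudio wire format, and the timing values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordTiming:
    # Hypothetical shape; the real field names may differ.
    text: str
    start: float  # seconds from the start of the utterance
    end: float


def spoken_so_far(words: list[WordTiming], interrupted_at: float) -> str:
    """Join the words whose audio had fully played by interrupted_at."""
    return " ".join(w.text for w in words if w.end <= interrupted_at)


# Illustrative timings only.
words = [
    WordTiming("Hello", 0.00, 0.35),
    WordTiming("how", 0.40, 0.55),
    WordTiming("are", 0.55, 0.70),
    WordTiming("you", 0.70, 0.95),
]
print(spoken_so_far(words, 0.60))  # Hello how
```

With AgentSession this bookkeeping happens for you; the sketch only shows the idea behind truncating the transcript at the point of interruption.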

Plugin Registration

You can also register KugelAudio as a LiveKit plugin namespace:
from kugelaudio.livekit import register_plugin

# Register the plugin
register_plugin()

# Now available via livekit.plugins namespace
from livekit.plugins import kugelaudio
tts = kugelaudio.TTS(model="kugel-3")

Complete Voice Agent Example

Here’s a production-ready voice agent with metrics logging:
import logging
import os
from livekit.agents import (
    Agent, AgentSession, JobContext,
    WorkerOptions, cli, metrics,
)
from livekit.agents.voice import MetricsCollectedEvent
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

logger = logging.getLogger("voice-agent")

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    # Initialize components
    tts = KugelAudioTTS(
        voice_id=int(os.environ.get("KUGELAUDIO_VOICE_ID", "280")),
        model="kugel-3",
        sample_rate=24000,
    )

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts,
        vad=silero.VAD.load(),
    )

    # Log TTS metrics
    @session.on("metrics_collected")
    def on_metrics(ev: MetricsCollectedEvent):
        for metric in ev.metrics:
            if hasattr(metric, "ttfb") and hasattr(metric, "characters_count"):
                logger.info(
                    f"TTS: ttfb={metric.ttfb:.3f}s, "
                    f"duration={metric.duration:.3f}s, "
                    f"chars={metric.characters_count}"
                )
        metrics.log_metrics(ev.metrics)

    agent = Agent(
        instructions="""You are a helpful voice assistant. 
Keep responses concise (1-3 sentences) for natural conversation."""
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you today?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
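
Per-event log lines are hard to compare across a long call, so it can help to aggregate the TTFB values. The sketch below uses only the standard library. TTFBStats is a hypothetical helper that you would feed from the on_metrics handler, for example with stats.record(metric.ttfb).

```python
import math
import statistics


class TTFBStats:
    """Hypothetical collector for time-to-first-byte samples, in seconds."""

    def __init__(self) -> None:
        self._samples: list[float] = []

    def record(self, ttfb_seconds: float) -> None:
        self._samples.append(ttfb_seconds)

    def summary(self) -> dict[str, float]:
        """Return count, mean, median and nearest-rank p95, or {} if empty."""
        if not self._samples:
            return {}
        ordered = sorted(self._samples)
        # Nearest-rank percentile: the smallest sample with at least
        # 95% of all samples at or below it.
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return {
            "count": float(len(ordered)),
            "mean": statistics.fmean(ordered),
            "median": statistics.median(ordered),
            "p95": ordered[rank - 1],
        }
```

Logging summary() when the session ends gives one line per call instead of one per utterance.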

Running the Agent

# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export LIVEKIT_URL="wss://your-livekit-server.com"
export LIVEKIT_API_KEY="your-livekit-key"
export LIVEKIT_API_SECRET="your-livekit-secret"

# Run in console mode (for testing)
python voice_agent.py console

# Run as a worker (for production)
python voice_agent.py start

Environment Variables

Variable             Required  Description
KUGELAUDIO_API_KEY   Yes       Your KugelAudio API key
LIVEKIT_URL          Yes       Your LiveKit server URL
LIVEKIT_API_KEY      Yes       LiveKit API key
LIVEKIT_API_SECRET   Yes       LiveKit API secret
KUGELAUDIO_VOICE_ID  No        Default voice ID to use
DEEPGRAM_API_KEY     Yes*      Required if using Deepgram STT
OPENAI_API_KEY       Yes*      Required if using OpenAI LLM
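
A missing variable usually surfaces only once the first request fails, so a check at startup can save time. The sketch below is one way to do that with the standard library. missing_env_vars and the provider keys "deepgram" and "openai" are hypothetical names chosen for this example.

```python
import os
from typing import Mapping, Optional, Sequence

# Always required, per the table above.
REQUIRED_ENV_VARS = (
    "KUGELAUDIO_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)

# Required only when the corresponding provider is used (the "Yes*" rows).
PROVIDER_ENV_VARS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_env_vars(
    providers: Sequence[str] = ("deepgram", "openai"),
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    names = list(REQUIRED_ENV_VARS) + [PROVIDER_ENV_VARS[p] for p in providers]
    return [name for name in names if not env.get(name)]
```

Calling missing_env_vars() before cli.run_app() and exiting with a clear message when the list is non-empty makes configuration problems visible immediately.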

Troubleshooting

Authentication errors

Make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly:
tts = KugelAudioTTS(api_key="your-api-key")

Connection errors

Verify that your base_url is correct and that the KugelAudio API is reachable. The plugin connects via WebSocket (wss://) for audio streaming.

Audio quality

  • Use the native 24000 Hz sample rate for best results
  • Try increasing cfg_scale (e.g., 2.5) for more expressive output
  • Confirm that you are using the kugel-3 model

High latency

  • Set language explicitly (e.g. language="de") to skip auto-detection; see Latency
  • Use kugel-3 for real-time conversations when latency matters more than prosody
  • Lower cfg_scale (e.g., 1.5) to trade slight quality for speed
  • Reuse http_session across requests to avoid connection overhead

Next Steps

  • PipeCat Integration: Use KugelAudio with PipeCat pipelines
  • Streaming: Advanced streaming techniques