LiveKit Integration

KugelAudio provides an official plugin for the LiveKit Agents framework, enabling ultra-low latency text-to-speech in your voice AI agents.

Why Use KugelAudio with LiveKit?

Native plugin: Drop-in TTS provider for LiveKit’s AgentSession
Streaming support: Real-time WebSocket-based audio streaming
Ultra-low latency: streaming TTS built for real-time agents — see Latency for current TTFA figures
Simple setup: Works with VoicePipelineAgent and the new AgentSession API

Installation

pip install "kugelaudio[livekit]"

This installs the KugelAudio SDK along with the required LiveKit Agents dependencies (livekit-agents>=1.0.0).

Quick Start

Minimal Voice Agent

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=KugelAudioTTS(
            model="kugel-3",
            voice_id=1071,
            sample_rate=24000,
        ),
        vad=silero.VAD.load(),
    )

    agent = Agent(
        instructions="You are a helpful voice assistant."
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the TTS constructor.

Configuration

TTS Parameters

Parameter	Type	Default	Description
`api_key`	`str`	`KUGELAUDIO_API_KEY` env	Your KugelAudio API key
`model`	`str`	`kugel-3`	TTS model (`kugel-3`)
`voice_id`	`int | None`	`None`	Voice ID to use (server default if `None`)
`sample_rate`	`int`	`24000`	Output sample rate in Hz
`cfg_scale`	`float`	`2.0`	CFG scale for generation quality
`max_new_tokens`	`int`	`2048`	Maximum tokens to generate
`normalize`	`bool`	`True`	Apply loudness normalization to output audio
`language`	`str | None`	`None`	ISO 639-1 language code (e.g. `"de"`, `"en"`). Skips auto-detection; see Latency
`base_url`	`str`	`https://api.kugelaudio.com`	API base URL
`word_timestamps`	`bool`	`False`	Enable word-level time alignments (opt-in; required for aligned transcript)
`http_session`	`ClientSession | None`	`None`	Optional aiohttp session to reuse

Supported Sample Rates

Rate	Notes
`24000`	Native rate (recommended)
`22050`	Half the CD sampling rate (44.1 kHz)
`16000`	Wideband telephony
`8000`	Narrowband telephony

Use the native 24000 Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with negligible impact — see Latency.
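To get a feel for what each rate means in practice, the sketch below computes the size of one audio frame per supported rate. It assumes 16-bit mono PCM and a 20 ms frame, neither of which is stated on this page; `frame_bytes` and `SUPPORTED_RATES` are illustrative names, not part of the plugin.

```python
# Sample rates listed in the table above.
SUPPORTED_RATES = (24000, 22050, 16000, 8000)

# Assumption: 16-bit (2-byte) mono PCM; the actual wire format may differ.
BYTES_PER_SAMPLE = 2


def frame_bytes(sample_rate: int, frame_ms: int = 20) -> int:
    """Return the size in bytes of one mono PCM frame lasting frame_ms."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    samples = sample_rate * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE
```

Calling `frame_bytes(rate)` for each entry in `SUPPORTED_RATES` shows how frame size scales linearly with the sample rate under these assumptions.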

Models

Use kugel-3, the current production model for all use cases (voice agents, narration, brand voices). See Models for capabilities and the full comparison, and Latency for TTFA figures.

Usage Patterns

Non-Streaming Synthesis

Use synthesize() for one-shot text-to-speech:

from kugelaudio.livekit import TTS

tts = TTS(model="kugel-3", voice_id=1071)

# Synthesize a complete text
stream = tts.synthesize("Hello, this is a complete sentence.")
async for event in stream:
    # Process audio frames
    pass

Streaming Synthesis

Use stream() for real-time text input (e.g., from an LLM):

from kugelaudio.livekit import TTS

tts = TTS(model="kugel-3", voice_id=1071)

# Create a streaming session
stream = tts.stream()

# Send text chunks as they arrive from an LLM
stream.push_text("Hello, ")
stream.push_text("how are you today?")
stream.flush()
stream.end_input()

# Receive audio frames
async for event in stream:
    # Process audio frames
    pass
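LLM output usually arrives as small token deltas rather than whole sentences. If you prefer to hand text to the stream in sentence-sized pieces, a small buffer like the one below is one option. This is a stdlib-only sketch: `sentences_from_deltas` is a hypothetical helper, and whether the plugin benefits from it is not confirmed here, since it may already buffer text on its own.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ".", "!" or "?" followed by whitespace.
# This is a deliberately simple heuristic (it will split after "e.g. ").
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_deltas(deltas: Iterable[str]) -> Iterator[str]:
    """Group incoming text deltas into complete sentences."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for complete in parts[:-1]:
            if complete:
                yield complete
        # The last part may still be growing; keep it buffered.
        buffer = parts[-1]
    # Emit whatever remains once the input is exhausted.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be passed to `stream.push_text(...)` as in the example above.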

Setting the Language

Set language once on the TTS instance so that the server does not run auto-detection on each request (see Latency):

tts = KugelAudioTTS(
    model="kugel-3",
    voice_id=1071,
    language="de",  # German text normalization (e.g. "123" → "einhundertdreiundzwanzig")
)

Supported languages: de, en, fr, es, it, pt, nl, pl, sv, da, no, fi, cs, hu, ro, el, uk, bg, tr, vi, ar, hi, zh, ja, ko.

Always set language when you know the output language in advance. This is especially important for real-time voice agents where every millisecond counts.
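If the language code comes from user settings or a locale string, it can help to validate it before passing it on. The sketch below checks a code against the list above. `normalize_language` is a hypothetical helper, and reducing a locale such as `de-DE` to its primary subtag is an assumption about what you want, not documented plugin behaviour.

```python
from typing import Optional

# Language codes copied from the supported list above.
SUPPORTED_LANGUAGES = frozenset(
    "de en fr es it pt nl pl sv da no fi cs hu ro el uk bg tr vi ar hi zh ja ko".split()
)


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Return a supported two-letter code, or None to keep auto-detection."""
    if code is None:
        return None
    # Accept "DE", "de-DE" or "de_DE" and keep only the primary subtag.
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    if primary not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {code!r}")
    return primary
```

The result can be passed as the `language` argument shown above, for example `language=normalize_language("de-DE")`.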

Updating Options at Runtime

You can change TTS options dynamically without creating a new instance:

tts = KugelAudioTTS(model="kugel-3", voice_id=1071)

# Switch voice mid-conversation
tts.update_options(voice_id=300)

# Change the model (kugel-3 is the current production model)
tts.update_options(model="kugel-3")

# Set or change language
tts.update_options(language="de")

# Adjust generation parameters
tts.update_options(cfg_scale=1.5, max_new_tokens=4096)

Word-Level Alignment

Word timestamps are off by default (including for kugel-3), which avoids server-side post-processing errors on models where alignment is not yet supported. When you set word_timestamps=True, the server performs forced alignment on each audio chunk and delivers per-word timing alongside the audio. LiveKit’s AgentSession uses these timings for barge-in and transcript sync via the aligned_transcript capability (advertised only when timestamps are enabled).

tts = KugelAudioTTS(
    model="kugel-3",
    voice_id=1071,
    word_timestamps=True,  # opt-in
)

# LiveKit receives TimedString objects with word boundaries automatically

Word alignments add no extra audio latency when supported. Timestamps are delivered shortly after each audio chunk — see Word timestamps.

If synthesis fails with “Audio post-processing failed”, keep word_timestamps=False (the default) or switch to a model that supports alignment.
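To illustrate why per-word timing matters for barge-in, the sketch below works out which part of a reply was actually heard before the user interrupted. LiveKit handles this internally; `WordTiming` is a stand-in dataclass for illustration, not LiveKit's `TimedString`, and `spoken_text` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordTiming:
    """Stand-in for a word boundary: text plus start/end time in seconds."""
    text: str
    start: float
    end: float


def spoken_text(words: List[WordTiming], interrupted_at: float) -> str:
    """Join the words that finished playing before the interruption."""
    return " ".join(w.text for w in words if w.end <= interrupted_at)
```

Without word timings, a transcript can only be cut at coarser boundaries, so the recorded text may drift from what the user actually heard.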

Plugin Registration

You can also register KugelAudio as a LiveKit plugin namespace:

from kugelaudio.livekit import register_plugin

# Register the plugin
register_plugin()

# Now available via livekit.plugins namespace
from livekit.plugins import kugelaudio
tts = kugelaudio.TTS(model="kugel-3")

Complete Voice Agent Example

Here’s a production-ready voice agent with metrics logging:

import logging
import os
from livekit.agents import (
    Agent, AgentSession, JobContext,
    WorkerOptions, cli, metrics,
)
from livekit.agents.voice import MetricsCollectedEvent
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

logger = logging.getLogger("voice-agent")

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    # Initialize components
    tts = KugelAudioTTS(
        voice_id=int(os.environ.get("KUGELAUDIO_VOICE_ID", "280")),
        model="kugel-3",
        sample_rate=24000,
    )

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts,
        vad=silero.VAD.load(),
    )

    # Log TTS metrics
    @session.on("metrics_collected")
    def on_metrics(ev: MetricsCollectedEvent):
        for metric in ev.metrics:
            if hasattr(metric, "ttfb") and hasattr(metric, "characters_count"):
                logger.info(
                    f"TTS: ttfb={metric.ttfb:.3f}s, "
                    f"duration={metric.duration:.3f}s, "
                    f"chars={metric.characters_count}"
                )
        metrics.log_metrics(ev.metrics)

    agent = Agent(
        instructions="""You are a helpful voice assistant. 
Keep responses concise (1-3 sentences) for natural conversation."""
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you today?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
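Individual log lines are hard to compare across sessions. One option is to collect the `ttfb` values from the handler above into a list and summarize them periodically. The sketch below uses a nearest-rank percentile. `percentile` and `summarize_ttfb` are hypothetical helpers, not part of LiveKit or the KugelAudio plugin.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of values, for 0 < pct <= 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_ttfb(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize time-to-first-byte samples given in seconds."""
    return {
        "count": len(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }
```

Appending `metric.ttfb` to a list inside `on_metrics` and logging `summarize_ttfb(...)` at the end of a session would give a per-session view of tail latency.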

Running the Agent

# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export LIVEKIT_URL="wss://your-livekit-server.com"
export LIVEKIT_API_KEY="your-livekit-key"
export LIVEKIT_API_SECRET="your-livekit-secret"

# Run in console mode (for testing)
python voice_agent.py console

# Run as a worker (for production)
python voice_agent.py start

Environment Variables

Variable	Required	Description
`KUGELAUDIO_API_KEY`	Yes	Your KugelAudio API key
`LIVEKIT_URL`	Yes	Your LiveKit server URL
`LIVEKIT_API_KEY`	Yes	LiveKit API key
`LIVEKIT_API_SECRET`	Yes	LiveKit API secret
`KUGELAUDIO_VOICE_ID`	No	Default voice ID to use
`DEEPGRAM_API_KEY`	Yes*	Required if using Deepgram STT
`OPENAI_API_KEY`	Yes*	Required if using OpenAI LLM
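A preflight check at startup can turn a missing variable into a clear error instead of a failure in the middle of a call. The sketch below encodes the table above. `missing_env` and its `use_deepgram` / `use_openai` flags are hypothetical names introduced for illustration.

```python
from typing import List, Mapping

# Variables the table above marks as always required.
ALWAYS_REQUIRED = (
    "KUGELAUDIO_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env(
    env: Mapping[str, str],
    use_deepgram: bool = True,
    use_openai: bool = True,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    required = list(ALWAYS_REQUIRED)
    # Conditionally required, depending on the chosen STT and LLM providers.
    if use_deepgram:
        required.append("DEEPGRAM_API_KEY")
    if use_openai:
        required.append("OPENAI_API_KEY")
    return [name for name in required if not env.get(name)]
```

Calling `missing_env(os.environ)` before `cli.run_app(...)` and exiting with the returned names is one way to use it.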

Troubleshooting

API key not found

Make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly:

tts = KugelAudioTTS(api_key="your-api-key")

WebSocket connection fails

Verify your base_url is correct and the KugelAudio API is reachable. The plugin connects via WebSocket (wss://) for audio streaming.

Audio quality issues

Use the native 24000 Hz sample rate for best results
Try increasing cfg_scale (e.g., 2.5) for more expressive output
Confirm that model is set to kugel-3, the current production model

High latency

Set language explicitly (e.g. language="de") to skip auto-detection — see Latency
Confirm that model is set to kugel-3, the current production model for real-time conversations
Lower cfg_scale (e.g., 1.5) trades slight quality for speed
Reuse http_session across requests to avoid connection overhead

Next Steps

PipeCat Integration

Streaming
