KugelAudio provides an official plugin for the LiveKit Agents framework, enabling ultra-low latency text-to-speech in your voice AI agents.

Why Use KugelAudio with LiveKit?

  • Native plugin: Drop-in TTS provider for LiveKit’s AgentSession
  • Streaming support: Real-time WebSocket-based audio streaming
  • Ultra-low latency: ~39ms time-to-first-audio with kugel-1-turbo
  • Simple setup: Works with VoicePipelineAgent and the new AgentSession API

Installation

pip install "kugelaudio[livekit]"
This installs the KugelAudio SDK along with the required LiveKit Agents dependencies (livekit-agents>=1.0.0).

Quick Start

Minimal Voice Agent

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=KugelAudioTTS(
            model="kugel-1-turbo",
            voice_id=280,
            sample_rate=24000,
        ),
        vad=silero.VAD.load(),
    )

    agent = Agent(
        instructions="You are a helpful voice assistant."
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the TTS constructor.

Configuration

TTS Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | KUGELAUDIO_API_KEY env | Your KugelAudio API key |
| model | str | kugel-1-turbo | TTS model (kugel-1-turbo or kugel-1) |
| voice_id | int or None | None | Voice ID to use (server default if None) |
| sample_rate | int | 24000 | Output sample rate in Hz |
| cfg_scale | float | 2.0 | CFG scale for generation quality |
| max_new_tokens | int | 2048 | Maximum tokens to generate |
| base_url | str | https://api.kugelaudio.com | API base URL |
| word_timestamps | bool | True | Enable word-level time alignments |
| http_session | ClientSession or None | None | Optional aiohttp session to reuse |

Supported Sample Rates

| Rate (Hz) | Notes |
| --- | --- |
| 24000 | Native rate (recommended) |
| 22050 | Half the CD sample rate (44.1 kHz) |
| 16000 | Wideband telephony |
| 8000 | Narrowband telephony |
Use the native 24000 Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with minimal impact (~0.1ms per chunk).
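To see what the sample-rate choice means for buffer sizes, the sketch below computes how many samples and bytes one audio frame occupies at each rate in the table above. It assumes 16-bit mono PCM and a 20 ms frame, neither of which is stated in this guide, and frame_size is a hypothetical helper rather than part of the KugelAudio SDK.

```python
# Rates listed in the "Supported Sample Rates" table.
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)


def frame_size(sample_rate, frame_ms=20, bytes_per_sample=2):
    """Return (samples, bytes) for one mono audio frame.

    Assumption: 16-bit PCM (2 bytes per sample) and a 20 ms frame by default.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    samples = sample_rate * frame_ms // 1000
    return samples, samples * bytes_per_sample


if __name__ == "__main__":
    for rate in SUPPORTED_SAMPLE_RATES:
        samples, size = frame_size(rate)
        print(f"{rate} Hz: {samples} samples, {size} bytes per 20 ms frame")
```

Under these assumptions a telephony pipeline at 8000 Hz moves a third of the data of the native rate, which is the usual reason to request a lower rate despite the resampling step.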

Models

| Model | Parameters | Latency | Quality | Use Case |
| --- | --- | --- | --- | --- |
| kugel-1-turbo | 1.5B | ~39ms TTFA | High | Real-time conversations |
| kugel-1 | 7B | ~77ms TTFA | Exceptional | Premium quality applications |
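One way to apply the table is to choose a model from the latency budget you can spare for TTS. The following is a minimal sketch: pick_model and the budget idea are hypothetical, and the millisecond figures are the approximate time-to-first-audio values copied from the table, not guarantees.

```python
# Approximate time-to-first-audio per model, copied from the Models table.
MODEL_TTFA_MS = {"kugel-1-turbo": 39, "kugel-1": 77}


def pick_model(tts_budget_ms):
    """Prefer the higher-quality model when its TTFA fits the budget.

    Falls back to the fastest model when nothing fits.
    """
    if MODEL_TTFA_MS["kugel-1"] <= tts_budget_ms:
        return "kugel-1"
    return "kugel-1-turbo"


if __name__ == "__main__":
    for budget in (50, 100):
        print(budget, "ms budget ->", pick_model(budget))
```

The returned name can be passed as the model argument of the TTS constructor.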

Usage Patterns

Non-Streaming Synthesis

Use synthesize() for one-shot text-to-speech:
from kugelaudio.livekit import TTS

tts = TTS(model="kugel-1-turbo", voice_id=280)

# Synthesize a complete text.
# Run this inside an async function, such as your agent entrypoint.
stream = tts.synthesize("Hello, this is a complete sentence.")
async for event in stream:
    # Each event carries a chunk of synthesized audio
    pass

Streaming Synthesis

Use stream() for real-time text input (e.g., from an LLM):
from kugelaudio.livekit import TTS

tts = TTS(model="kugel-1-turbo", voice_id=280)

# Create a streaming session
stream = tts.stream()

# Send text chunks as they arrive from an LLM
stream.push_text("Hello, ")
stream.push_text("how are you today?")
stream.flush()
stream.end_input()

# Receive audio frames
async for event in stream:
    # Process audio frames
    pass
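When the text comes from an LLM, tokens often arrive as arbitrary fragments. The sketch below shows one possible way to group fragments into whole sentences before handing them to push_text(). The sentence_chunks helper is hypothetical, and whether the plugin benefits from sentence-sized pushes is an assumption rather than something this guide states.

```python
import re

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens):
    """Group streamed text fragments into whole sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence + " "
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer


if __name__ == "__main__":
    fragments = ["Hel", "lo the", "re. How ", "are you", "? Fine."]
    for chunk in sentence_chunks(fragments):
        print(repr(chunk))
```

In an agent, each yielded chunk would be passed to stream.push_text(), followed by stream.end_input() once the LLM finishes.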

Updating Options at Runtime

You can change TTS options dynamically without creating a new instance:
from kugelaudio.livekit import TTS as KugelAudioTTS

tts = KugelAudioTTS(model="kugel-1-turbo", voice_id=280)

# Switch voice mid-conversation
tts.update_options(voice_id=300)

# Switch to higher quality model
tts.update_options(model="kugel-1")

# Adjust generation parameters
tts.update_options(cfg_scale=1.5, max_new_tokens=4096)

Word-Level Alignment

KugelAudio provides word-level time alignments out of the box. When word_timestamps=True (the default), the server performs forced alignment on each audio chunk and delivers per-word timing information alongside the audio. LiveKit’s AgentSession uses these timings automatically for accurate barge-in handling and transcript synchronization via the aligned_transcript capability.
from kugelaudio.livekit import TTS as KugelAudioTTS

tts = KugelAudioTTS(
    model="kugel-1-turbo",
    voice_id=280,
    word_timestamps=True,  # enabled by default
)

# LiveKit will receive TimedString objects with word boundaries
# No additional code needed — the plugin converts timestamps automatically
Word alignments add no extra audio latency. The alignment runs on the same GPU as the TTS model and timestamps are delivered shortly after each audio chunk (~50-200ms).
To disable word timestamps (e.g., to reduce server-side computation):
tts = KugelAudioTTS(word_timestamps=False)
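To illustrate why per-word timings matter for barge-in, the sketch below works out which part of an utterance the user actually heard before interrupting. AgentSession does this for you. The WordTiming tuple and spoken_text function are hypothetical stand-ins for illustration and do not reflect the plugin's internal types.

```python
from typing import NamedTuple


class WordTiming(NamedTuple):
    # Hypothetical shape for illustration only.
    text: str
    start: float  # seconds from the start of the utterance
    end: float


def spoken_text(words, interrupted_at):
    """Return the words whose playback had started before the interruption."""
    return " ".join(w.text for w in words if w.start < interrupted_at)


if __name__ == "__main__":
    words = [
        WordTiming("Hello", 0.0, 0.4),
        WordTiming("how", 0.5, 0.7),
        WordTiming("are", 0.7, 0.9),
        WordTiming("you", 0.9, 1.2),
    ]
    print(spoken_text(words, interrupted_at=0.8))
```

Without timings, an agent can only guess how much of its reply was heard, so the transcript kept in the conversation history may not match what the user experienced.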

Plugin Registration

You can also register KugelAudio as a LiveKit plugin namespace:
from kugelaudio.livekit import register_plugin

# Register the plugin
register_plugin()

# Now available via livekit.plugins namespace
from livekit.plugins import kugelaudio
tts = kugelaudio.TTS(model="kugel-1-turbo")

Complete Voice Agent Example

Here’s a production-ready voice agent with metrics logging:
import logging
import os
from livekit.agents import (
    Agent, AgentSession, JobContext,
    WorkerOptions, cli, metrics,
)
from livekit.agents.voice import MetricsCollectedEvent
from livekit.plugins import deepgram, openai, silero
from kugelaudio.livekit import TTS as KugelAudioTTS

logger = logging.getLogger("voice-agent")

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    # Initialize components
    tts = KugelAudioTTS(
        voice_id=int(os.environ.get("KUGELAUDIO_VOICE_ID", "280")),
        model="kugel-1-turbo",
        sample_rate=24000,
    )

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts,
        vad=silero.VAD.load(),
    )

    # Log TTS metrics
    @session.on("metrics_collected")
    def on_metrics(ev: MetricsCollectedEvent):
        # Each event carries a single metrics object
        metric = ev.metrics
        if isinstance(metric, metrics.TTSMetrics):
            logger.info(
                f"TTS: ttfb={metric.ttfb:.3f}s, "
                f"duration={metric.duration:.3f}s, "
                f"chars={metric.characters_count}"
            )
        metrics.log_metrics(metric)

    agent = Agent(
        instructions="""You are a helpful voice assistant.
Keep responses concise (1-3 sentences) for natural conversation."""
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hello! How can I help you today?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
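Individual TTFB log lines are hard to compare across runs, so it can help to summarize them. The sketch below assumes you have collected the ttfb values (in seconds) from the metrics handler into a list. summarize_ttfb is a hypothetical helper, and the nearest-rank percentile is one of several common definitions.

```python
import math
import statistics


def summarize_ttfb(ttfb_seconds):
    """Return median, nearest-rank p95, and max TTFB in milliseconds."""
    if not ttfb_seconds:
        raise ValueError("no TTFB samples collected")
    ordered = sorted(ttfb_seconds)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "median_ms": round(statistics.median(ordered) * 1000, 1),
        "p95_ms": round(ordered[rank - 1] * 1000, 1),
        "max_ms": round(ordered[-1] * 1000, 1),
    }


if __name__ == "__main__":
    samples = [0.040, 0.045, 0.039, 0.120, 0.050]
    print(summarize_ttfb(samples))
```

Tracking the p95 alongside the median makes occasional slow responses visible, which an average would hide.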

Running the Agent

# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export LIVEKIT_URL="wss://your-livekit-server.com"
export LIVEKIT_API_KEY="your-livekit-key"
export LIVEKIT_API_SECRET="your-livekit-secret"

# Run in console mode (for testing)
python voice_agent.py console

# Run as a worker (for production)
python voice_agent.py start

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| KUGELAUDIO_API_KEY | Yes | Your KugelAudio API key |
| LIVEKIT_URL | Yes | Your LiveKit server URL |
| LIVEKIT_API_KEY | Yes | LiveKit API key |
| LIVEKIT_API_SECRET | Yes | LiveKit API secret |
| KUGELAUDIO_VOICE_ID | No | Default voice ID to use |
| DEEPGRAM_API_KEY | Yes* | Required if using Deepgram STT |
| OPENAI_API_KEY | Yes* | Required if using OpenAI LLM |
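A missing variable usually surfaces only when the first request fails, so checking at startup can save debugging time. The sketch below lists which variables from the table are unset. missing_env and the provider names are hypothetical and not part of any SDK.

```python
import os

# Always required, per the table above.
REQUIRED = (
    "KUGELAUDIO_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)
# Required only when the matching provider is used.
CONDITIONAL = {"deepgram": "DEEPGRAM_API_KEY", "openai": "OPENAI_API_KEY"}


def missing_env(providers=("deepgram", "openai"), env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    needed = list(REQUIRED) + [CONDITIONAL[p] for p in providers]
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Calling this before cli.run_app() makes the worker exit with a clear message instead of failing on the first conversation.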

Troubleshooting

Authentication errors: make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly:
tts = KugelAudioTTS(api_key="your-api-key")
Connection errors: verify that your base_url is correct and the KugelAudio API is reachable. The plugin connects via WebSocket (wss://) for audio streaming.
Audio quality issues:
  • Use the native 24000 Hz sample rate for best results
  • Try increasing cfg_scale (e.g., 2.5) for more expressive output
  • Switch to the kugel-1 model for premium quality
High latency:
  • Use kugel-1-turbo for real-time conversations
  • Lower cfg_scale (e.g., 1.5) to trade slight quality for speed
  • Reuse http_session across requests to avoid connection overhead
