KugelAudio provides an official TTS service for PipeCat, enabling high-quality voice synthesis in your voice AI pipelines.

Why Use KugelAudio with PipeCat?

  • Native service: Drop-in TTSService for PipeCat pipelines
  • Persistent WebSocket: Connection reuse keeps the handshake off the hot path
  • Built-in metrics: Automatic TTFB and usage metrics tracking
  • Ultra-low latency: streaming TTS built for real-time agents; see Latency for current time-to-first-audio (TTFA) figures

Installation

pip install "kugelaudio[pipecat]"  # quotes keep shells such as zsh from expanding the brackets
This installs the KugelAudio SDK along with the required PipeCat dependency (pipecat-ai>=1.0).
The PipeCat integration requires Python 3.10 or higher. Pipecat 1.x is supported; use LLMContext + LLMContextAggregatorPair (see sdks/python/examples/pipecat_local_bot.py).

Quick Start

Basic Pipeline

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from kugelaudio.pipecat import KugelAudioTTSService

# Create the TTS service
tts = KugelAudioTTSService(
    api_key="your-api-key",
    model="kugel-3",
    voice_id=1071,
    sample_rate=24000,
    language="en",  # Set language to skip auto-detection (lower latency)
)
tts.prewarm()  # Pre-establish WebSocket connection for faster first request

# Use in a PipeCat pipeline (transport, stt, and llm are created elsewhere;
# see the Complete Voice Bot Example)
pipeline = Pipeline([
    transport.input(),   # Audio/text input
    stt,                 # Speech-to-text
    llm,                 # Language model
    tts,                 # KugelAudio TTS
    transport.output(),  # Audio output
])

runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)  # must be awaited inside an async function
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the constructor.

Configuration

Service Parameters

Parameter       Type         Default                      Description
api_key         str          KUGELAUDIO_API_KEY env       Your KugelAudio API key
model           str          kugel-3                      TTS model (kugel-3)
voice_id        int          required                     Voice ID to use for synthesis
sample_rate     int          24000                        Output sample rate in Hz
cfg_scale       float        2.0                          CFG scale for generation quality
max_new_tokens  int          2048                         Maximum tokens to generate
language        str | None   None                         ISO 639-1 language code (e.g., en, de). Skips server-side auto-detection; see Latency
normalize       bool         True                         Apply text normalization
base_url        str          https://api.kugelaudio.com   API base URL

Supported Sample Rates

Rate (Hz)   Notes
24000       Native rate (recommended)
22050       Half the CD sampling rate (44.1 kHz)
16000       Wideband telephony
8000        Narrowband telephony
Use the native 24000 Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with negligible impact — see Latency.
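Because only four output rates are listed, it can help to validate a configured rate before constructing the service. The following is a minimal sketch, not part of the KugelAudio SDK: SUPPORTED_SAMPLE_RATES, check_sample_rate, and pcm_bytes_per_second are hypothetical helper names, and the byte-rate calculation assumes the 16-bit mono PCM format described under Pipeline Frame Flow.

```python
# Hypothetical helpers for illustration; not part of the KugelAudio SDK.
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)  # rates listed in the table above
NATIVE_SAMPLE_RATE = 24000


def check_sample_rate(rate: int) -> int:
    """Return the rate unchanged if it is supported, otherwise raise ValueError."""
    if rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"Unsupported sample rate {rate}; choose one of {SUPPORTED_SAMPLE_RATES}"
        )
    return rate


def pcm_bytes_per_second(rate: int) -> int:
    """Bytes per second of 16-bit (2 bytes per sample) mono PCM at the given rate."""
    return check_sample_rate(rate) * 2
```

Running the check once at startup, on the same value passed to both the TTS service and the transport, catches a mismatch before any audio is produced.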

Models

Use kugel-3 — the current production model for all use cases (voice agents, narration, brand voices). See Models for capabilities and Latency for TTFA figures.

Performance Optimization

Pre-warming the Connection

Call prewarm() during pipeline setup to establish the WebSocket connection before the first synthesis request. This keeps the TCP+TLS+WebSocket handshake out of the first call — see Latency.
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
    language="en",
)
tts.prewarm()  # Connects in background, first run_tts() is fast

Turn context pre-provisioning (Pipecat 1.x)

Pipecat 1.x mints a fresh TTS context_id on every assistant turn. The service automatically calls the server’s create_context on LLMFullResponseStartFrame (when the LLM starts responding), before the first TTS text chunk arrives. That hides the WebSocket round-trip behind LLM time-to-first-token instead of adding it to measured TTFA. No configuration required — call prewarm() as usual and ensure language is set.
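The effect of pre-provisioning can be illustrated with a toy timing model. This is a sketch under stated assumptions, not a measurement: every millisecond value is an invented placeholder, the function names are hypothetical, and the model assumes that context creation costs one WebSocket round trip and that it can run fully in parallel with the LLM's time-to-first-token.

```python
# Toy timing model; all names and numbers are illustrative assumptions.

def ttfa_without_preprovision(context_rtt_ms: float, synth_ms: float) -> float:
    """Context is created when the first text chunk arrives, so the round
    trip is paid inside the measured TTFA window."""
    return context_rtt_ms + synth_ms


def ttfa_with_preprovision(
    context_rtt_ms: float, llm_ttft_ms: float, synth_ms: float
) -> float:
    """Context creation starts when the LLM starts responding and overlaps
    with the LLM's time-to-first-token. Only the part of the round trip that
    outlasts the LLM wait is left inside the TTFA window."""
    leftover_ms = max(0.0, context_rtt_ms - llm_ttft_ms)
    return leftover_ms + synth_ms


# Placeholder numbers, chosen only to make the arithmetic visible.
baseline = ttfa_without_preprovision(context_rtt_ms=40.0, synth_ms=120.0)
hidden = ttfa_with_preprovision(context_rtt_ms=40.0, llm_ttft_ms=300.0, synth_ms=120.0)
```

With these placeholder inputs, baseline is 160.0 and hidden is 120.0: the round trip disappears from TTFA whenever the LLM takes longer to produce its first token than the context takes to create.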

Setting the Language

When you know the language of your input text, always set the language parameter. Without it, the server auto-detects the language on each request, adding latency — see Latency.
# Fast: explicit language skips auto-detection
tts = KugelAudioTTSService(voice_id=1071, language="de")

# Slower: server auto-detects language on every request
tts = KugelAudioTTSService(voice_id=1071)
For lowest latency, always set language and call prewarm() — see Latency for what each saves.

Connection Reuse

The service automatically reuses a persistent WebSocket connection across run_tts() calls. This avoids the TCP+TLS+WebSocket handshake overhead on every request. If the connection drops, a new one is established transparently on the next call. Each Pipecat 1.x turn still opens a new server-side context (required for correct turn isolation and to avoid context-cap leaks). Only the WebSocket connection is reused — not the engine KV session across turns.
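The reuse-and-reconnect behavior described above follows a common lazy-connection pattern. The sketch below illustrates that pattern only; it is not the KugelAudio implementation. FakeConnection and ReusingClient are hypothetical stand-ins, and no real network traffic is involved.

```python
import asyncio
from typing import Optional


class FakeConnection:
    """Stand-in for a WebSocket connection; it only tracks open/closed state."""

    def __init__(self) -> None:
        self.open = True


class ReusingClient:
    """Caches one connection and reconnects lazily if it has dropped."""

    def __init__(self) -> None:
        self._conn: Optional[FakeConnection] = None
        self.handshakes = 0  # counts how often the expensive step ran

    async def _ensure_connected(self) -> FakeConnection:
        if self._conn is None or not self._conn.open:
            self.handshakes += 1  # a real client would do TCP+TLS+WebSocket here
            self._conn = FakeConnection()
        return self._conn

    async def synthesize(self, text: str) -> str:
        await self._ensure_connected()
        return f"audio for {text!r}"

    def simulate_drop(self) -> None:
        """Mark the cached connection as closed, as if the network dropped it."""
        if self._conn is not None:
            self._conn.open = False


async def demo() -> int:
    client = ReusingClient()
    await client.synthesize("one")    # first call: handshake
    await client.synthesize("two")    # reuses the cached connection
    client.simulate_drop()
    await client.synthesize("three")  # reconnects transparently
    return client.handshakes
```

In this sketch three requests cost two handshakes: one for the initial connection and one after the simulated drop.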

TTFA logging

A log line beginning with KugelAudio TTFA: reports the time from sending text to receiving the first audio chunk on the WebSocket (after any turn-context pre-provisioning). It does not include LLM or STT latency. End-to-end numbers depend heavily on the network path: co-located clients see much lower figures than remote development machines. See Latency for reference figures and how to measure correctly.
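To cross-check a logged figure, the same interval can be timed on the client side. The following is a minimal sketch that assumes the audio arrives as an async iterator of byte chunks; fake_audio_stream is a hypothetical stand-in with a placeholder delay, and measure_ttfa is an illustrative helper rather than an SDK function.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple


async def fake_audio_stream(first_chunk_delay_s: float) -> AsyncIterator[bytes]:
    """Stand-in for a streaming TTS response (placeholder delay and silence)."""
    await asyncio.sleep(first_chunk_delay_s)
    yield b"\x00\x00" * 240
    yield b"\x00\x00" * 240


async def measure_ttfa(stream: AsyncIterator[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total bytes received)."""
    start = time.perf_counter()  # taken before the stream is first polled
    ttfa = None
    total_bytes = 0
    async for chunk in stream:
        if ttfa is None:
            ttfa = time.perf_counter() - start
        total_bytes += len(chunk)
    if ttfa is None:
        raise RuntimeError("stream produced no audio")
    return ttfa, total_bytes


async def demo() -> Tuple[float, int]:
    return await measure_ttfa(fake_audio_stream(first_chunk_delay_s=0.01))
```

A client-side timer like this includes the network path, so it will read higher from a remote machine than from a co-located one.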

Usage Patterns

Updating Voice and Model at Runtime

You can change the voice or model dynamically during a pipeline session:
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
)

# Switch voice mid-conversation (closes cached WebSocket first)
await tts.set_voice("300")

# Switch model (kugel-3 is currently the only production model)
await tts.set_model("kugel-3")

Pipeline Frame Flow

The KugelAudioTTSService emits standard PipeCat frames:
  1. TTSStartedFrame - Audio generation has begun
  2. TTSAudioRawFrame - Raw PCM audio chunks (16-bit, mono)
  3. TTSStoppedFrame - Audio generation is complete
  4. ErrorFrame - If an error occurs during synthesis
from pipecat.frames.frames import (
    TTSStartedFrame,
    TTSAudioRawFrame,
    TTSStoppedFrame,
)

# The TTS service yields frames in this order:
# TTSStartedFrame -> TTSAudioRawFrame* -> TTSStoppedFrame
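The documented ordering can be checked in tests, and the stated audio format lets you convert chunk sizes into playback time. This is a minimal sketch: it works on frame class names as strings so that it runs without PipeCat installed, and is_valid_tts_sequence and pcm_duration_seconds are hypothetical helper names. In a real pipeline you would presumably derive the names with type(frame).__name__.

```python
import re
from typing import Iterable

# One letter per frame type so the expected order can be written as a regex.
_CODES = {
    "TTSStartedFrame": "S",
    "TTSAudioRawFrame": "A",
    "TTSStoppedFrame": "E",
}


def is_valid_tts_sequence(frame_names: Iterable[str]) -> bool:
    """True if the names follow Started -> zero or more Audio -> Stopped."""
    try:
        encoded = "".join(_CODES[name] for name in frame_names)
    except KeyError:
        return False  # any other frame (for example ErrorFrame) breaks the pattern
    return re.fullmatch(r"SA*E", encoded) is not None


def pcm_duration_seconds(num_bytes: int, sample_rate: int) -> float:
    """Playback time of 16-bit (2 bytes per sample) mono PCM."""
    return num_bytes / (2 * sample_rate)
```

For example, 48000 bytes of audio at 24000 Hz correspond to one second of playback.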

Metrics Support

KugelAudio’s PipeCat service automatically tracks performance metrics:
tts = KugelAudioTTSService(
    model="kugel-3",
    voice_id=1071,
)

# Metrics are tracked automatically:
# - TTFB (Time to First Byte): measured from request to first audio chunk
# - TTS Usage: character count per request
print(tts.can_generate_metrics())  # True

Complete Voice Bot Example

Here’s a complete voice bot using PipeCat with Daily as the transport:
import asyncio
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from kugelaudio.pipecat import KugelAudioTTSService

async def main():
    # Transport (Daily WebRTC)
    transport = DailyTransport(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
        bot_name="KugelAudio Bot",
        params=DailyParams(audio_out_sample_rate=24000),
    )

    # STT
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    # LLM
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    )

    # TTS - KugelAudio
    tts = KugelAudioTTSService(
        model="kugel-3",
        voice_id=1071,
        sample_rate=24000,
        language="en",  # Skip auto-detection for lower latency
    )
    tts.prewarm()  # Pre-establish WebSocket connection

    # Build pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        llm,
        tts,
        transport.output(),
    ])

    runner = PipelineRunner()
    task = PipelineTask(pipeline)
    await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())

Running the Bot

# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export DAILY_ROOM_URL="https://your-domain.daily.co/room"
export DAILY_TOKEN="your-daily-token"
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"

# Run the bot
python voice_bot.py

Environment Variables

Variable              Required   Description
KUGELAUDIO_API_KEY    Yes        Your KugelAudio API key
KUGELAUDIO_BASE_URL   No         Override API base URL (e.g. http://127.0.0.1:8002 for local ingress dev)
DAILY_ROOM_URL        Yes*       Daily room URL (if using Daily transport)
DAILY_TOKEN           Yes*       Daily room token
DEEPGRAM_API_KEY      Yes*       Required if using Deepgram STT
OPENAI_API_KEY        Yes*       Required if using OpenAI LLM
* Required only when the corresponding service (Daily, Deepgram, OpenAI) is used.
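A startup check for these variables gives a clearer failure than an authentication error partway through a call. The following is a minimal sketch; missing_env_vars and REQUIRED_FOR_DAILY_BOT are hypothetical names, and the required set shown assumes the Daily, Deepgram, and OpenAI combination from the complete example.

```python
import os
from typing import Iterable, List, Mapping

# Variables needed by the Daily + Deepgram + OpenAI example bot (assumed set).
REQUIRED_FOR_DAILY_BOT = (
    "KUGELAUDIO_API_KEY",
    "DAILY_ROOM_URL",
    "DAILY_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the required names that are unset or empty, in the given order."""
    return [name for name in required if not env.get(name)]


def require_env(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise a single error naming every missing variable."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_env(REQUIRED_FOR_DAILY_BOT) at the top of main() would report all missing variables at once instead of failing on the first one.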

Troubleshooting

Make sure KUGELAUDIO_API_KEY is set in your environment or pass api_key directly:
tts = KugelAudioTTSService(api_key="your-api-key", voice_id=1071)
KugelAudio supports these sample rates: 24000, 22050, 16000, 8000. Make sure your transport output sample rate matches:
# Both must match
tts = KugelAudioTTSService(voice_id=1071, sample_rate=24000)
transport = DailyTransport(
    params=DailyParams(audio_out_sample_rate=24000),
)
Verify your base_url is correct and the KugelAudio API is reachable. The service connects via WebSocket (wss://) for audio streaming. If a persistent connection drops, the service automatically reconnects on the next run_tts() call.
If latency is higher than expected, check in order:
  1. language unset: every request pays for language auto-detection.
  2. prewarm() not called: the first request pays for the WebSocket handshake.
  3. Network path: measuring from a laptop against a remote engine adds your full round-trip time (RTT) on top of inference. Run the measurement from the ingress pod, or use the production API, for a like-for-like TTFA comparison. See Latency for reference figures.
  4. Pipecat 1.x per-turn contexts: each turn opens a fresh server context (by design). Turn-context pre-provisioning hides the WebSocket setup cost behind LLM latency; it does not remove the engine's per-turn cold open.
See Performance Optimization and Measuring TTFA correctly.
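The first three checks in this list can be encoded as a small self-diagnosis helper. This is a sketch only: TtsSetup and latency_warnings are hypothetical names, and the fields simply record what you already know about your own configuration rather than reading anything from the service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TtsSetup:
    """Facts about a deployment, filled in by hand (hypothetical structure)."""

    language: Optional[str]  # value passed as language=, or None
    prewarmed: bool          # whether prewarm() was called during setup
    client_is_remote: bool   # True when measuring from outside the cluster


def latency_warnings(setup: TtsSetup) -> List[str]:
    """Return warnings in the same order as the checklist above."""
    warnings: List[str] = []
    if setup.language is None:
        warnings.append("language unset: every request pays for auto-detection")
    if not setup.prewarmed:
        warnings.append("prewarm() not called: first request pays for the handshake")
    if setup.client_is_remote:
        warnings.append("remote client: measured TTFA includes full network RTT")
    return warnings
```

An empty result means none of the three configuration-level causes apply, which leaves the per-turn context behavior described in item 4.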
The PipeCat integration requires Python 3.10 or higher. Check your version:
python --version

Next Steps

  • LiveKit Integration: Use KugelAudio with LiveKit Agents
  • Streaming: Advanced streaming techniques