KugelAudio provides an official TTS service for PipeCat, enabling high-quality voice synthesis in your voice AI pipelines.

Why Use KugelAudio with PipeCat?

  • Native service: Drop-in TTSService for PipeCat pipelines
  • Persistent WebSocket: Connection reuse eliminates ~100-220ms handshake overhead per request
  • Built-in metrics: Automatic TTFB and usage metrics tracking
  • Ultra-low latency: ~39ms time-to-first-audio with kugel-1-turbo

Installation

pip install "kugelaudio[pipecat]"
This installs the KugelAudio SDK along with the required PipeCat dependency (pipecat-ai>=0.0.60).
The PipeCat integration requires Python 3.10 or higher.

Quick Start

Basic Pipeline

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from kugelaudio.pipecat import KugelAudioTTSService

# Create the TTS service
tts = KugelAudioTTSService(
    api_key="your-api-key",
    model="kugel-1-turbo",
    voice_id=280,
    sample_rate=24000,
    language="en",  # Set language to skip auto-detection (~60-150ms savings)
)
tts.prewarm()  # Pre-establish WebSocket connection for faster first request

# Use in a PipeCat pipeline (transport, stt, and llm are assumed to be created elsewhere)
pipeline = Pipeline([
    transport.input(),   # Audio/text input
    stt,                 # Speech-to-text
    llm,                 # Language model
    tts,                 # KugelAudio TTS
    transport.output(),  # Audio output
])

runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)  # Must be awaited from inside an async function
Set the KUGELAUDIO_API_KEY environment variable or pass api_key directly to the constructor.
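
The two ways of supplying the key can be sketched as a small helper. This is a minimal illustration and not the SDK's internal logic: the function name resolve_api_key is hypothetical, and the precedence (explicit argument first, environment variable second) is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else fall back to KUGELAUDIO_API_KEY.

    Hypothetical helper: the precedence shown here is an assumption,
    not documented SDK behavior.
    """
    key = api_key or env.get("KUGELAUDIO_API_KEY")
    if not key:
        raise ValueError("Set KUGELAUDIO_API_KEY or pass api_key explicitly")
    return key
```

Calling a check like this before building the pipeline surfaces a missing key at startup instead of on the first synthesis request.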

Configuration

Service Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | KUGELAUDIO_API_KEY env | Your KugelAudio API key |
| model | str | kugel-1-turbo | TTS model (kugel-1-turbo or kugel-1) |
| voice_id | int | required | Voice ID to use for synthesis |
| sample_rate | int | 24000 | Output sample rate in Hz |
| cfg_scale | float | 2.0 | CFG scale for generation quality |
| max_new_tokens | int | 2048 | Maximum tokens to generate |
| language | str \| None | None | ISO 639-1 language code (e.g., en, de). Skips server-side auto-detection, saving ~60-150ms per request |
| normalize | bool | True | Apply text normalization |
| base_url | str | https://api.kugelaudio.com | API base URL |

Supported Sample Rates

| Rate (Hz) | Notes |
| --- | --- |
| 24000 | Native rate (recommended) |
| 22050 | Half the CD sample rate (44100 Hz) |
| 16000 | Wideband telephony |
| 8000 | Narrowband telephony |
Use the native 24000 Hz sample rate for best quality and lowest latency. Lower rates use server-side resampling with minimal impact (~0.1ms per chunk).
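
When sizing buffers for a transport, it helps to know how many bytes a chunk of audio occupies at each rate. The sketch below assumes the 16-bit mono PCM format that this guide describes for TTSAudioRawFrame. The helper name pcm_chunk_bytes and the 20 ms chunk duration are illustrative choices, not values from the SDK.

```python
SUPPORTED_SAMPLE_RATES = (24000, 22050, 16000, 8000)  # rates listed in this guide
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def pcm_chunk_bytes(sample_rate: int, duration_ms: int) -> int:
    """Size in bytes of a 16-bit mono PCM chunk of the given duration."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


# Print the size of a 20 ms chunk at each supported rate.
for rate in SUPPORTED_SAMPLE_RATES:
    print(rate, pcm_chunk_bytes(rate, 20))
```

At the native 24000 Hz rate, a 20 ms chunk holds 480 samples, or 960 bytes.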

Models

| Model | Latency | Quality | Use Case |
| --- | --- | --- | --- |
| kugel-1-turbo | ~39ms TTFA | High | Real-time conversations |
| kugel-1 | ~77ms TTFA | Exceptional | Premium quality applications |

Performance Optimization

Pre-warming the Connection

Call prewarm() during pipeline setup to establish the WebSocket connection before the first synthesis request. This eliminates ~100-220ms of TCP+TLS+WebSocket handshake latency from the first call.
tts = KugelAudioTTSService(
    model="kugel-1-turbo",
    voice_id=280,
    language="en",
)
tts.prewarm()  # Connects in background, first run_tts() is fast

Setting the Language

When you know the language of your input text, always set the language parameter. Without it, the server auto-detects the language on each request, adding ~60-150ms to time-to-first-audio.
# Fast: explicit language skips auto-detection
tts = KugelAudioTTSService(language="de")

# Slower: server auto-detects language on every request
tts = KugelAudioTTSService()
For lowest latency, always set language and call prewarm(). Together these can save ~160-370ms on the first request and ~60-150ms on subsequent requests.
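
The combined figure is simply the sum of the two ranges quoted in this section. A short worked calculation, using only those numbers (the variable and function names are illustrative):

```python
# Latency ranges quoted in this guide, in milliseconds (low, high).
HANDSHAKE_MS = (100, 220)     # TCP+TLS+WebSocket handshake avoided by prewarm()
LANG_DETECT_MS = (60, 150)    # auto-detection avoided by setting language


def add_ranges(a, b):
    """Add two (low, high) ranges element-wise."""
    return (a[0] + b[0], a[1] + b[1])


# First request: both the handshake and language detection are avoided.
first_request_savings = add_ranges(HANDSHAKE_MS, LANG_DETECT_MS)

# Later requests: the connection is already open, so only detection is avoided.
subsequent_request_savings = LANG_DETECT_MS

print(first_request_savings, subsequent_request_savings)
```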

Connection Reuse

The service automatically reuses a persistent WebSocket connection across run_tts() calls. This avoids the ~100-220ms TCP+TLS+WebSocket handshake overhead on every request. If the connection drops, a new one is established transparently on the next call.
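
The reuse-and-reconnect behavior described above follows a common lazy-connection pattern. The sketch below illustrates that pattern with stand-in classes. FakeConnection and ReusingClient are hypothetical names, and this is not KugelAudio's actual implementation.

```python
import asyncio


class FakeConnection:
    """Stand-in for a WebSocket connection (hypothetical, for illustration)."""

    def __init__(self):
        self.open = True

    async def send(self, text: str) -> str:
        if not self.open:
            raise ConnectionError("connection closed")
        return f"audio for {text!r}"


class ReusingClient:
    """Opens a connection lazily and reuses it until it drops."""

    def __init__(self):
        self._conn = None
        self.handshakes = 0  # how many times a new connection was opened

    async def _ensure_connected(self) -> FakeConnection:
        if self._conn is None or not self._conn.open:
            await asyncio.sleep(0)  # placeholder for the real handshake
            self._conn = FakeConnection()
            self.handshakes += 1
        return self._conn

    async def synthesize(self, text: str) -> str:
        conn = await self._ensure_connected()
        return await conn.send(text)


async def demo() -> int:
    client = ReusingClient()
    await client.synthesize("one")    # opens the connection
    await client.synthesize("two")    # reuses it, no new handshake
    client._conn.open = False         # simulate a dropped connection
    await client.synthesize("three")  # reconnects transparently
    return client.handshakes


handshakes = asyncio.run(demo())
print(handshakes)
```

Three requests cost two handshakes here: one for the initial connection and one after the simulated drop.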

Usage Patterns

Updating Voice and Model at Runtime

You can change the voice or model dynamically during a pipeline session:
tts = KugelAudioTTSService(
    model="kugel-1-turbo",
    voice_id=280,
)

# Switch voice mid-conversation
tts.set_voice("300")

# Switch to higher quality model
await tts.set_model("kugel-1")

Pipeline Frame Flow

The KugelAudioTTSService emits standard PipeCat frames:
  1. TTSStartedFrame - Audio generation has begun
  2. TTSAudioRawFrame - Raw PCM audio chunks (16-bit, mono)
  3. TTSStoppedFrame - Audio generation is complete
  4. ErrorFrame - If an error occurs during synthesis
from pipecat.frames.frames import (
    TTSStartedFrame,
    TTSAudioRawFrame,
    TTSStoppedFrame,
)

# The TTS service yields frames in this order:
# TTSStartedFrame -> TTSAudioRawFrame* -> TTSStoppedFrame
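
When testing a pipeline, it can be useful to assert that collected frames follow this order. The sketch below works on frame class names as plain strings so that it runs without PipeCat installed. The helper name is_valid_tts_sequence is hypothetical.

```python
from typing import List


def is_valid_tts_sequence(frame_names: List[str]) -> bool:
    """Check for TTSStartedFrame, zero or more TTSAudioRawFrame, then TTSStoppedFrame."""
    if len(frame_names) < 2:
        return False
    if frame_names[0] != "TTSStartedFrame":
        return False
    if frame_names[-1] != "TTSStoppedFrame":
        return False
    # Everything between the start and stop markers must be audio.
    return all(name == "TTSAudioRawFrame" for name in frame_names[1:-1])


ok = is_valid_tts_sequence(
    ["TTSStartedFrame", "TTSAudioRawFrame", "TTSAudioRawFrame", "TTSStoppedFrame"]
)
print(ok)
```

With real frames, the names could be obtained with `[type(f).__name__ for f in frames]`.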

Metrics Support

KugelAudio’s PipeCat service automatically tracks performance metrics:
tts = KugelAudioTTSService(
    model="kugel-1-turbo",
    voice_id=280,
)

# Metrics are tracked automatically:
# - TTFB (Time to First Byte): measured from request to first audio chunk
# - TTS Usage: character count per request
print(tts.can_generate_metrics())  # True
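
To make the two metrics concrete, the sketch below measures TTFB and character usage against a simulated audio stream. It is an independent illustration of the measurement and not the service's internal metrics code. The names fake_tts_stream and measure_tts, and the 20 ms delay, are hypothetical.

```python
import asyncio
import time


async def fake_tts_stream(chunks, first_chunk_delay: float):
    """Simulated TTS stream: waits, then yields audio chunks (hypothetical)."""
    await asyncio.sleep(first_chunk_delay)
    for chunk in chunks:
        yield chunk


async def measure_tts(text: str, stream):
    """Return (ttfb_seconds, total_audio_bytes, characters) for one request."""
    start = time.monotonic()
    ttfb = None
    total_bytes = 0
    async for chunk in stream:
        if ttfb is None:
            # Time from request start to the first audio chunk.
            ttfb = time.monotonic() - start
        total_bytes += len(chunk)
    return ttfb, total_bytes, len(text)


text = "Hello there"
chunks = [b"\x00" * 960, b"\x00" * 960]
ttfb, total_bytes, characters = asyncio.run(
    measure_tts(text, fake_tts_stream(chunks, first_chunk_delay=0.02))
)
print(f"TTFB: {ttfb * 1000:.1f} ms, audio bytes: {total_bytes}, characters: {characters}")
```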

Complete Voice Bot Example

Here’s a complete voice bot using PipeCat with Daily as the transport:
import asyncio
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from kugelaudio.pipecat import KugelAudioTTSService

async def main():
    # Transport (Daily WebRTC)
    transport = DailyTransport(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
        bot_name="KugelAudio Bot",
        params=DailyParams(audio_out_sample_rate=24000),
    )

    # STT
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    # LLM
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    )

    # TTS - KugelAudio
    tts = KugelAudioTTSService(
        model="kugel-1-turbo",
        voice_id=280,
        sample_rate=24000,
        language="en",  # Skip auto-detection for lower latency
    )
    tts.prewarm()  # Pre-establish WebSocket connection

    # Build pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        llm,
        tts,
        transport.output(),
    ])

    runner = PipelineRunner()
    task = PipelineTask(pipeline)
    await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())

Running the Bot

# Set environment variables
export KUGELAUDIO_API_KEY="your-api-key"
export DAILY_ROOM_URL="https://your-domain.daily.co/room"
export DAILY_TOKEN="your-daily-token"
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"

# Run the bot
python voice_bot.py

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| KUGELAUDIO_API_KEY | Yes | Your KugelAudio API key |
| DAILY_ROOM_URL | Yes* | Daily room URL (if using Daily transport) |
| DAILY_TOKEN | Yes* | Daily room token |
| DEEPGRAM_API_KEY | Yes* | Required if using Deepgram STT |
| OPENAI_API_KEY | Yes* | Required if using OpenAI LLM |
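
A preflight check at startup can report every missing variable at once, rather than failing on the first os.environ lookup. This is a minimal sketch. The helper name missing_env_vars is hypothetical, and the list assumes the Daily, Deepgram, and OpenAI setup from the complete example.

```python
import os
from typing import Iterable, List, Mapping

# Variables used by the complete voice bot example in this guide.
REQUIRED_VARS = [
    "KUGELAUDIO_API_KEY",
    "DAILY_ROOM_URL",
    "DAILY_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
]


def missing_env_vars(names: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty, in the order given."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

If you use a different transport, STT, or LLM provider, trim the list to the variables your pipeline actually needs.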

Troubleshooting

If authentication fails, make sure KUGELAUDIO_API_KEY is set in your environment, or pass api_key directly:
tts = KugelAudioTTSService(api_key="your-api-key")
KugelAudio supports these sample rates: 24000, 22050, 16000, 8000. Make sure your transport output sample rate matches:
# Both must match
tts = KugelAudioTTSService(sample_rate=24000)
transport = DailyTransport(
    params=DailyParams(audio_out_sample_rate=24000),
)
If the service cannot connect, verify that your base_url is correct and that the KugelAudio API is reachable. The service connects via WebSocket (wss://) for audio streaming. If a persistent connection drops, the service automatically reconnects on the next run_tts() call.
High latency is usually caused by omitting the language parameter (which triggers auto-detection) or by not calling prewarm(). See the Performance Optimization section for details.
The PipeCat integration requires Python 3.10 or higher. Check your version:
python --version
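
The same check can be made from inside Python, which is handy when several interpreters are installed and it is unclear which one runs the bot. A minimal sketch (the helper name meets_minimum is hypothetical):

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)) -> bool:
    """True if the given (major, minor, ...) version is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print("The PipeCat integration requires Python 3.10 or higher; "
          f"found {sys.version_info.major}.{sys.version_info.minor}")
```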

Next Steps

  • LiveKit Integration: Use KugelAudio with LiveKit Agents
  • Streaming: Advanced streaming techniques