Prerequisites

Before you begin, make sure you have:
  • An API key from kugelaudio.com
  • Python 3.9+, Node.js 18+, Java 17+, or cURL

Installation

Install the Python SDK using pip or uv:
pip install kugelaudio
Or with uv (recommended):
uv add kugelaudio

Basic Usage

Initialize the Client

Pre-connect at startup. Without client.connect(), the first TTS request pays for the WebSocket handshake; subsequent requests reuse the connection. Pre-connecting moves the handshake cost to application startup, where it doesn’t affect user-perceived latency. See Latency for the numbers.
from kugelaudio import KugelAudio

# Initialize with your API key
client = KugelAudio(api_key="your_api_key")

# Pre-connect at startup (handshake happens here)
client.connect()

# Confirm connection is ready
print(f"Connected: {client.is_connected()}")

# The first request now skips the handshake on the hot path

Generate Speech

Examples below use kugel-3, the canonical production model. Legacy IDs such as kugel-2.5 and kugel-2-turbo are still accepted for backwards compatibility; see Models for details.
# Generate speech
audio = client.tts.generate(
    text="Welcome to KugelAudio! This is high-quality text-to-speech.",
    model_id="kugel-3",
)

# Save to file
audio.save("output.wav")

# Or get the raw bytes
wav_bytes = audio.to_wav_bytes()
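
Once you have the raw bytes, a quick sanity check is to read the duration back from the WAV header. The sketch below uses only the Python standard library and assumes that to_wav_bytes() returns a standard PCM RIFF/WAV container, which this page does not state explicitly. The helper name wav_duration_seconds is hypothetical and not part of the SDK.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback duration of an in-memory PCM WAV file."""
    # wave.open accepts any file-like object, so no temporary file is needed.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

With the SDK this would be called as wav_duration_seconds(audio.to_wav_bytes()). If the bytes turn out to use a non-PCM encoding, the standard wave module will raise wave.Error instead of returning a duration.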

Stream Audio

For lower latency, stream audio chunks as they’re generated:
# Synchronous streaming
for chunk in client.tts.stream(
    text="Hello, this is streaming audio.",
    model_id="kugel-3",
):
    if hasattr(chunk, 'audio'):
        # Process audio chunk immediately
        print(f"Chunk {chunk.index}: {len(chunk.audio)} bytes")
        # play_audio(chunk.audio)
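
If you want to keep the streamed audio as well as play it, one option is to accumulate the chunk payloads as they arrive. This is a minimal sketch, assuming chunk.audio is a bytes-like object (the loop above prints its length in bytes) and that items without an audio attribute can be skipped. The helper collect_audio and the stand-in chunks are hypothetical, and the encoding of the joined bytes depends on what the stream delivers, which this page does not specify.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_audio(chunks: Iterable[object]) -> bytes:
    """Concatenate the audio payload of every chunk that carries one."""
    buffer = bytearray()
    for chunk in chunks:
        audio = getattr(chunk, "audio", None)
        if audio is not None:
            buffer.extend(audio)
    return bytes(buffer)


# Stand-in chunks for illustration; with the SDK you would pass
# client.tts.stream(...) instead.
fake_chunks = [
    SimpleNamespace(index=0, audio=b"\x01\x02"),
    SimpleNamespace(index=1),  # no audio attribute, so it is skipped
    SimpleNamespace(index=2, audio=b"\x03"),
]
joined = collect_audio(fake_chunks)
print(len(joined))
```

A bytearray is used so that each chunk is appended in place rather than copying the whole buffer on every iteration.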
For async applications:
import asyncio

async def stream_audio():
    async for chunk in client.tts.stream_async(
        text="Async streaming example.",
        model_id="kugel-3",
    ):
        if hasattr(chunk, 'audio'):
            # Process chunk
            pass

asyncio.run(stream_audio())
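
Because the point of streaming is lower latency, it can be useful to measure how long the first chunk takes to arrive. The following sketch times the first item of any async iterator. The names time_to_first_chunk and fake_stream are hypothetical, and fake_stream only stands in for client.tts.stream_async(...) so the example runs without the SDK. Note that it times the first item of any kind, not specifically the first item carrying audio.

```python
import asyncio
import time
from typing import AsyncIterator, Optional


async def time_to_first_chunk(chunks: AsyncIterator[object]) -> Optional[float]:
    """Return seconds until the first item arrives, or None for an empty stream."""
    start = time.perf_counter()
    async for _ in chunks:
        return time.perf_counter() - start
    return None


async def fake_stream(delay: float, count: int):
    """Hypothetical stand-in for client.tts.stream_async(...)."""
    for index in range(count):
        await asyncio.sleep(delay)
        yield index


latency = asyncio.run(time_to_first_chunk(fake_stream(0.01, 3)))
print(f"First chunk after {latency * 1000:.1f} ms")
```

Running the same measurement with and without a prior client.connect() is one way to see the handshake cost described under Initialize the Client.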

Working with Voices

Pick your voice deliberately. Voices differ widely in baseline energy, age, and warmth: a peppy DTC bot and a calm clinical agent should not share a voice, even with the same prompt. Listen to several before locking one in. Building an LLM-driven voice agent? See Voice Agent Prompting for the prompt patterns that matter most.

List Available Voices

# List all voices
result = client.voices.list()

for voice in result.voices:
    print(f"{voice.id}: {voice.name}")
    print(f"  Languages: {', '.join(voice.supported_languages)}")
print(f"Total: {result.total}")

# Filter by language
result = client.voices.list(language="de")
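
When you need a voice that covers more than one language, you can filter the listed voices on the client side. This is a sketch under the assumption that voice.supported_languages is an iterable of language codes such as "de", as the listing loop above suggests. The helper voices_supporting and the sample voices (including their IDs and names) are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List


def voices_supporting(voices: Iterable[object], languages: Iterable[str]) -> List[object]:
    """Keep only the voices whose supported_languages cover every requested code."""
    wanted = set(languages)
    return [v for v in voices if wanted.issubset(v.supported_languages)]


# Made-up voices for illustration; with the SDK you would pass result.voices.
sample_voices = [
    SimpleNamespace(id=1, name="Voice A", supported_languages=["en"]),
    SimpleNamespace(id=2, name="Voice B", supported_languages=["en", "de"]),
    SimpleNamespace(id=3, name="Voice C", supported_languages=["de"]),
]
bilingual = voices_supporting(sample_voices, ["en", "de"])
print([v.name for v in bilingual])
```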

Use a Specific Voice

audio = client.tts.generate(
    text="Hello with a specific voice!",
    model_id="kugel-3",
    voice_id=1071,  # Use a specific voice ID
)

Next Steps

  • Generate Speech: All generation options and parameters
  • Streaming: Real-time audio streaming techniques
  • Using Voices: Browse and filter available voices
  • Text Processing: Normalization and spell tags