Generate complete audio from text. This is the simplest way to get started: provide text and receive audio back.

Basic Generation

from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-3",
)

# Save to file
audio.save("output.wav")

# Or get WAV bytes
wav_bytes = audio.to_wav_bytes()
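
If you want to sanity-check the returned WAV bytes before storing or forwarding them, the standard-library wave module can read the header. The sketch below assumes to_wav_bytes() returns an ordinary RIFF/WAV container; describe_wav and _silent_wav are hypothetical helpers, and the silent clip stands in for real SDK output so the example runs offline.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read the header of an in-memory WAV file and return its basic properties."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            "duration_seconds": frames / rate,
        }


def _silent_wav(seconds: float = 0.5, rate: int = 24000) -> bytes:
    """Build a short silent 16-bit mono WAV (stand-in for audio.to_wav_bytes())."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


# With the SDK you would pass audio.to_wav_bytes() instead of the stand-in.
info = describe_wav(_silent_wav())
print(info)
```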

Generation Parameters

The parameters you’ll touch most often (Python/REST snake_case; JavaScript uses camelCase):
  • text (required) and model_id: use kugel-3
  • voice_id: the voice to speak with (see Using voices)
  • cfg_scale: expressiveness (see the CFG Scale Guide below)
  • normalize + language: text normalization; always set the language when you know it
  • word_timestamps: word-level timestamps
  • speed: playback speed (see Speed Control below)
The complete table — every field with type, default, range, and error behavior — lives in the Generate Speech API reference.

CFG Scale Guide

The cfg_scale parameter controls how closely the model follows the voice characteristics. Accepted range: 1.2 to 2.5 (inclusive). Values outside this range are clamped into it.
Range | Style | Best For
1.2-1.5 | Relaxed, natural | Conversational AI, long-form narration
2.0 | Balanced (default) | General purpose
2.5 | Expressive | Storytelling, emphasis-heavy content
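
Because out-of-range values are clamped rather than rejected, it can help to apply the same rule on the client so that logs show the value that will actually be used. This is a minimal sketch of that arithmetic only; clamp_cfg_scale is a hypothetical helper, not part of the SDK.

```python
CFG_SCALE_MIN = 1.2  # lower bound stated in the guide above
CFG_SCALE_MAX = 2.5  # upper bound stated in the guide above


def clamp_cfg_scale(value: float) -> float:
    """Mirror the documented clamping of cfg_scale into [1.2, 2.5]."""
    return max(CFG_SCALE_MIN, min(CFG_SCALE_MAX, value))


for requested in (0.5, 2.0, 3.0):
    print(requested, "->", clamp_cfg_scale(requested))
```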

Speed Control

The speed parameter adjusts playback rate using pitch-preserving time-stretching (WSOLA), so the voice pitch stays natural even at different speeds. Range: 0.8 (20% slower) to 1.2 (20% faster).
Dashboard: The playground in the KugelAudio dashboard includes a Slow / Normal / Fast speed toggle next to the model selector. Changes are reflected live in the SDK code snippet shown below the generator.
# Global speed — whole request at 80% speed
audio = client.tts.generate(
    text="Bitte rufen Sie uns an unter: 0 30 12 34 56 78.",
    language="de",
    speed=0.8,
)
speed value | Rate | Typical use
0.8 | 20% slower | Phone numbers, addresses, medical terms
1.0 | Normal (default) | General purpose
1.2 | 20% faster | Notifications, fast-paced content
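
A quick way to reason about the effect on clip length: if speed is a playback-rate multiplier (an assumption consistent with "80% speed" above), the duration scales by 1 / speed. The helper below is a hypothetical sketch of that arithmetic. It raises on out-of-range values as a local choice, since this page does not say how the API treats them.

```python
SPEED_MIN = 0.8
SPEED_MAX = 1.2


def stretched_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length after time-stretching, assuming duration = base / speed."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        # Local validation only; the API's own out-of-range behavior is not documented here.
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}")
    return base_seconds / speed


for speed in (0.8, 1.0, 1.2):
    print(f"speed={speed}: {stretched_duration(10.0, speed):.2f}s")
```

Under this assumption a rate that is 20% slower yields a clip that is 25% longer, which is worth budgeting for in telephony prompts.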
Speed applies to the whole request; to change the rate for just part of it, wrap that text in <prosody rate="...">:
Unsere Rückrufnummer lautet <prosody rate="slow">0800 5834552.</prosody> Danke!
For pauses, codes, and pronunciation fixes, see the Prompting guide: <break> tags, <spell> tags, and the unsupported-tags table.
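
When the slow portion comes from application data, the prosody markup shown above can be built with plain string formatting. This sketch uses only the rate="slow" value that appears on this page; slow_span is a hypothetical helper, and whether other rate values are accepted is not confirmed here.

```python
def slow_span(text: str) -> str:
    """Wrap a fragment in the <prosody rate="slow"> markup shown above."""
    return f'<prosody rate="slow">{text}</prosody>'


callback_number = "0800 5834552."
message = f"Unsere Rückrufnummer lautet {slow_span(callback_number)} Danke!"
print(message)
```

The resulting string is passed as the text argument of the generate call in the usual way.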

Full Example with All Options

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-3",
    voice_id=1071,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,
    normalize=True,
    language="en",
    word_timestamps=False,
    speed=1.0,
)

# Inspect the response
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")

# Save to WAV file
audio.save("output.wav")

# Get raw PCM bytes
pcm_data = audio.audio

# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()
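
The response fields printed above are related by simple arithmetic. The sketch below assumes the conventional definitions: duration is samples divided by sample rate, and RTF (real-time factor) is generation time divided by audio duration. Whether the SDK computes its rtf field exactly this way is an assumption, and both helper names are hypothetical.

```python
def duration_from_samples(samples: int, sample_rate: int) -> float:
    """Audio length in seconds for a given sample count and sample rate."""
    return samples / sample_rate


def real_time_factor(generation_ms: float, audio_seconds: float) -> float:
    """Generation time divided by audio duration; below 1.0 means faster than real time."""
    return (generation_ms / 1000.0) / audio_seconds


# Illustrative numbers, not measured results.
seconds = duration_from_samples(samples=72000, sample_rate=24000)
rtf = real_time_factor(generation_ms=750.0, audio_seconds=seconds)
print(f"Duration: {seconds:.2f}s, RTF: {rtf:.2f}")
```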

Async Generation

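import asyncio

from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

async def main():
    # generate_async is awaited instead of blocking the event loop
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-3",
    )
    audio.save("async_output.wav")

asyncio.run(main())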
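
One common reason to use the async method is to generate several clips concurrently. The following is a minimal sketch of a bounded fan-out with asyncio.gather and a semaphore. The generate argument is a stand-in for a wrapper around client.tts.generate_async, fake_generate replaces the network call so the example runs offline, and whether a single client handles concurrent requests well is an assumption to verify.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_many(
    generate: Callable[[str], Awaitable[object]],
    texts: List[str],
    max_concurrency: int = 4,
) -> List[object]:
    """Run generate(text) for every text, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(text: str) -> object:
        async with semaphore:
            return await generate(text)

    # gather returns results in the same order as the input texts.
    return await asyncio.gather(*(run_one(text) for text in texts))


async def fake_generate(text: str) -> str:
    """Offline stand-in for a call such as client.tts.generate_async(text=...)."""
    await asyncio.sleep(0.01)
    return f"audio for: {text}"


results = asyncio.run(
    generate_many(fake_generate, ["One.", "Two.", "Three."], max_concurrency=2)
)
print(results)
```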

Playing Audio in the Browser

The JavaScript SDK provides utility functions for audio playback:
import { KugelAudio, createWavBlob } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-3',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();

Pre-connecting for Low Latency

For latency-sensitive applications, pre-establish the WebSocket connection at startup to keep the handshake out of your first request — see Latency.
import asyncio
from kugelaudio import KugelAudio

async def main():
    # Create a pre-connected client (handshake happens here)
    client = await KugelAudio.create(api_key="your_api_key")
    
    # First request is now fast — no handshake on the hot path
    audio = await client.tts.generate_async(
        text="Hello, world!",
        model_id="kugel-3",
    )
    audio.save("output.wav")
    
    await client.aclose()

asyncio.run(main())
Without pre-connecting, the first TTS request includes WebSocket connection setup; subsequent requests reuse the connection. Pre-connecting moves this overhead to application startup. See Latency for typical numbers.
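
To see the effect in your own environment, you can time the first request against later ones. This is a hedged sketch of a generic timing wrapper: timed is a hypothetical helper, and fake_request stands in for an awaited SDK call so the example runs without network access.

```python
import asyncio
import time
from typing import Any, Awaitable, Tuple


async def timed(awaitable: Awaitable[Any]) -> Tuple[Any, float]:
    """Await something and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = await awaitable
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


async def fake_request(label: str) -> str:
    """Offline stand-in for a call such as client.tts.generate_async(...)."""
    await asyncio.sleep(0.02)
    return label


async def demo() -> list:
    measurements = []
    for label in ("first", "second"):
        result, elapsed_ms = await timed(fake_request(label))
        measurements.append((result, elapsed_ms))
    return measurements


measurements = asyncio.run(demo())
for label, elapsed_ms in measurements:
    print(f"{label} request: {elapsed_ms:.1f} ms")
```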

Word Timestamps

Request per-word time alignments alongside the generated audio. Useful for subtitles, karaoke, lip-sync, and barge-in handling.
audio = client.tts.generate(
    text="Hello, how are you today?",
    model_id="kugel-3",
    word_timestamps=True,
)

for ts in audio.word_timestamps:
    print(f"{ts.word}: {ts.start_ms}ms - {ts.end_ms}ms (score: {ts.score:.2f})")

# Output:
# Hello: 0ms - 320ms (score: 0.98)
# how: 350ms - 480ms (score: 0.95)
# are: 500ms - 580ms (score: 0.97)
# you: 600ms - 720ms (score: 0.96)
# today: 750ms - 1100ms (score: 0.94)
Word timestamps add no extra audio latency. For streaming use cases, see the Streaming Guide.

Next Steps

  • Streaming: lower latency with real-time audio streaming
  • Text Processing: text normalization and spell tags
  • Voices: browse and use different voices
  • Models: learn about available models