Generate complete audio from text in a single call. This is the simplest way to get started: provide text and receive the full audio back.

Basic Generation

from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
)

# Save to file
audio.save("output.wav")

# Or get WAV bytes
wav_bytes = audio.to_wav_bytes()

Generation Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | The text to synthesize |
| model_id / modelId | string | kugel-1-turbo | kugel-1-turbo (fast) or kugel-1 (quality) |
| voice_id / voiceId | int | - | Specific voice to use |
| cfg_scale / cfgScale | float | 2.0 | Guidance scale (1.0-5.0) |
| max_new_tokens / maxNewTokens | int | 2048 | Maximum tokens to generate |
| sample_rate / sampleRate | int | 24000 | Output sample rate in Hz (8000, 16000, 22050, 24000) |
| normalize | bool | true | Enable text normalization |
| language | string | - | Language for normalization (ISO 639-1 code) |
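
The ranges and defaults in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `GenerationParams` and `to_kwargs` are hypothetical names introduced here, and whether the SDK or the server performs the same validation is not stated on this page.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Allowed values copied from the parameter table above.
ALLOWED_MODELS = {"kugel-1-turbo", "kugel-1"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}
CFG_SCALE_MIN, CFG_SCALE_MAX = 1.0, 5.0


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical helper (not an SDK class) mirroring the documented parameters."""

    text: str
    model_id: str = "kugel-1-turbo"
    voice_id: Optional[int] = None
    cfg_scale: float = 2.0
    max_new_tokens: int = 2048
    sample_rate: int = 24000
    normalize: bool = True
    language: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not self.text.strip():
            raise ValueError("text is required and must not be empty")
        if self.model_id not in ALLOWED_MODELS:
            raise ValueError(f"unknown model_id: {self.model_id!r}")
        if not CFG_SCALE_MIN <= self.cfg_scale <= CFG_SCALE_MAX:
            raise ValueError("cfg_scale must be between 1.0 and 5.0")
        if self.max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        if self.sample_rate not in ALLOWED_SAMPLE_RATES:
            raise ValueError(f"unsupported sample_rate: {self.sample_rate}")

    def to_kwargs(self) -> dict:
        """Return keyword arguments, omitting optional values left unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}


params = GenerationParams(text="Hello", cfg_scale=1.5, language="en")
kwargs = params.to_kwargs()
```

The resulting dictionary could then be passed on as `client.tts.generate(**kwargs)`.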

CFG Scale Guide

The cfg_scale parameter controls how closely the model follows the voice characteristics:

| Range | Style | Best For |
| --- | --- | --- |
| 1.0-1.5 | Relaxed, natural | Conversational AI, long-form narration |
| 2.0 | Balanced (default) | General purpose |
| 2.5-3.0 | Expressive | Storytelling, emphasis-heavy content |
| 3.5-5.0 | Maximum expression | Character voices, dramatic readings |
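
As an illustration of how the guide might be used in application code, the sketch below maps a `cfg_scale` value to the style label from the table. The function name `cfg_style` is hypothetical. The table leaves gaps between rows (for example 1.5-2.0), so assigning those values to the next row up is an assumption made here, not documented behavior.

```python
def cfg_style(cfg_scale: float) -> str:
    """Return the style label from the CFG Scale Guide for a given value.

    Values that fall between table rows are assigned to the next row up,
    which is an assumption of this sketch.
    """
    if not 1.0 <= cfg_scale <= 5.0:
        raise ValueError("cfg_scale must be between 1.0 and 5.0")
    if cfg_scale <= 1.5:
        return "Relaxed, natural"
    if cfg_scale <= 2.0:
        return "Balanced"
    if cfg_scale <= 3.0:
        return "Expressive"
    return "Maximum expression"


labels = [cfg_style(v) for v in (1.0, 2.0, 2.5, 4.0)]
```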

Full Example with All Options

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
    voice_id=123,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,
    normalize=True,
    language="en",
)

# Inspect the response
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")

# Save to WAV file
audio.save("output.wav")

# Get raw PCM bytes
pcm_data = audio.audio

# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()
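
To make the difference between raw PCM bytes and WAV bytes concrete, and to show how the response metrics relate to each other, here is a small sketch using only the Python standard library. It assumes 16-bit mono PCM, which this page does not confirm, and it uses the conventional definition of real-time factor (generation time divided by audio duration); the SDK may compute `rtf` differently. The helper names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV container (assumes 16-bit mono by default)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Audio duration is the sample count divided by the sample rate."""
    return num_samples / sample_rate


def real_time_factor(generation_ms: float, duration_s: float) -> float:
    """Conventional RTF: values below 1.0 mean faster than real time."""
    return (generation_ms / 1000.0) / duration_s


# One second of silence at 24 kHz: 24000 samples of 2 bytes each.
pcm = b"\x00\x00" * 24000
wav = pcm_to_wav_bytes(pcm, sample_rate=24000)
duration = duration_seconds(24000, 24000)
rtf = real_time_factor(generation_ms=250.0, duration_s=duration)
```

In this example the WAV output is the PCM payload plus a 44-byte header, and 250 ms of generation time for one second of audio gives an RTF of 0.25.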

Async Generation

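import asyncio
from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

async def main():
    # Await the async variant so the event loop is not blocked during generation
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-1-turbo",
    )
    audio.save("async_output.wav")

asyncio.run(main())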
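
One common reason to use the async API is to generate several clips concurrently. The sketch below shows the general pattern with `asyncio.gather` and a semaphore that caps the number of requests in flight. `generate_many` and `fake_generate` are hypothetical names; `fake_generate` is a stand-in so the example runs without network access. Whether the service accepts concurrent requests on a single client, and what limits apply, is not stated on this page.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_many(
    generate: Callable[[str], Awaitable[object]],
    texts: List[str],
    max_concurrency: int = 4,
) -> list:
    """Run `generate` for every text, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def generate_one(text: str) -> object:
        async with semaphore:
            return await generate(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(generate_one(t) for t in texts))


async def fake_generate(text: str) -> str:
    """Stand-in for a real TTS call; sleeps briefly instead of calling an API."""
    await asyncio.sleep(0.01)
    return f"audio for {text!r}"


results = asyncio.run(
    generate_many(fake_generate, ["one", "two", "three"], max_concurrency=2)
)
```

With the real client, the callable could be something like `lambda t: client.tts.generate_async(text=t, model_id="kugel-1-turbo")`.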

Playing Audio in the Browser

The JavaScript SDK provides utility functions for audio playback:
import { KugelAudio, createWavBlob } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();

Pre-connecting for Low Latency

For latency-sensitive applications, pre-establish the WebSocket connection at startup to eliminate cold start latency (~500ms) from your first request.
import asyncio
from kugelaudio import KugelAudio

async def main():
    # Create a pre-connected client (~500ms happens here)
    client = await KugelAudio.create(api_key="your_api_key")
    
    # First request is now fast (~100-150ms TTFA instead of ~600ms)
    audio = await client.tts.generate_async(
        text="Hello, world!",
        model_id="kugel-1-turbo",
    )
    audio.save("output.wav")
    
    await client.aclose()

asyncio.run(main())
Without pre-connecting, the first TTS request includes WebSocket connection setup (~500ms), so its time to first audio (TTFA) is roughly 600ms. Subsequent requests reuse the connection and are fast (~100-150ms TTFA). Pre-connecting moves the setup overhead to application startup.
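
The effect can be illustrated with simple arithmetic on the approximate figures quoted above. This is a sketch, not a measurement: the constants are the rough numbers from this page (using the lower bound of the 100-150ms range), and `first_audio_latencies` is a hypothetical helper.

```python
CONNECT_MS = 500    # approximate WebSocket setup cost quoted above
WARM_TTFA_MS = 100  # lower bound of the ~100-150ms warm TTFA quoted above


def first_audio_latencies(num_requests: int, preconnected: bool) -> list:
    """Estimate TTFA per request; only a cold first request pays the setup cost."""
    latencies = []
    for index in range(num_requests):
        is_cold = index == 0 and not preconnected
        latencies.append(WARM_TTFA_MS + (CONNECT_MS if is_cold else 0))
    return latencies


cold_start = first_audio_latencies(3, preconnected=False)
pre_connected = first_audio_latencies(3, preconnected=True)
```

Under these assumptions the estimates are 600, 100, 100 ms without pre-connecting and 100 ms for every request with it. The 500 ms is still spent, but at startup rather than on the first user-facing request.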
