Generate Speech

Generate complete audio from text. This is the simplest way to get started: provide text and receive audio back.

Basic Generation

Python

from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
)

# Save to a WAV file
audio.save("output.wav")

# Or get the WAV bytes (with header) in memory
wav_bytes = audio.to_wav_bytes()

JavaScript

import { KugelAudio } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-1-turbo',
});

// audio.audio is an ArrayBuffer with PCM16 data
console.log(`Duration: ${audio.durationMs}ms`);
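
The examples above mention both raw PCM16 data and WAV bytes. As a rough illustration of the difference, the sketch below wraps raw 16-bit PCM in a WAV container using only the Python standard library. The helper name `pcm16_to_wav_bytes` is hypothetical, mono output is an assumption, and this is not the SDK's own implementation of `to_wav_bytes()`.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)     # assumption: mono output
        wav_file.setsampwidth(2)            # PCM16 means 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Half a second of silence at 24 kHz stands in for real SDK output.
silence = b"\x00\x00" * 12000
wav_bytes = pcm16_to_wav_bytes(silence)
```

The result is the same sample data preceded by a header that records the sample rate, channel count, and sample width, which is what lets ordinary audio players open it.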

Generation Parameters

Parameter	Type	Default	Description
`text`	string	required	The text to synthesize
`model_id` / `modelId`	string	`kugel-1-turbo`	`kugel-1-turbo` (fast) or `kugel-1` (quality)
`voice_id` / `voiceId`	int	-	Specific voice to use
`cfg_scale` / `cfgScale`	float	`2.0`	Guidance scale (1.0-5.0)
`max_new_tokens` / `maxNewTokens`	int	`2048`	Maximum tokens to generate
`sample_rate` / `sampleRate`	int	`24000`	Output sample rate (8000, 16000, 22050, 24000)
`normalize`	bool	`true`	Enable text normalization
`language`	string	-	Language for normalization (ISO 639-1 code)
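
The ranges in the table can be checked on the client side before a request is sent. The following is a minimal sketch: `validate_tts_params` is a hypothetical helper, not part of the SDK, and whether the API itself rejects out-of-range values is not stated here.

```python
ALLOWED_MODELS = {"kugel-1-turbo", "kugel-1"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}


def validate_tts_params(
    text: str,
    model_id: str = "kugel-1-turbo",
    cfg_scale: float = 2.0,
    max_new_tokens: int = 2048,
    sample_rate: int = 24000,
) -> dict:
    """Check values against the documented ranges and return keyword arguments."""
    if not text:
        raise ValueError("text is required")
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"unknown model_id: {model_id!r}")
    if not 1.0 <= cfg_scale <= 5.0:
        raise ValueError("cfg_scale must be between 1.0 and 5.0")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    return {
        "text": text,
        "model_id": model_id,
        "cfg_scale": cfg_scale,
        "max_new_tokens": max_new_tokens,
        "sample_rate": sample_rate,
    }


params = validate_tts_params("Hello, world!", cfg_scale=2.5)
# The validated dict could then be passed on, e.g. client.tts.generate(**params)
```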

CFG Scale Guide

The `cfg_scale` parameter controls how closely the model follows the voice characteristics. Lower values sound more relaxed and natural; higher values sound more expressive:

Range	Style	Best For
1.0-1.5	Relaxed, natural	Conversational AI, long-form narration
2.0	Balanced (default)	General purpose
2.5-3.0	Expressive	Storytelling, emphasis-heavy content
3.5-5.0	Maximum expression	Character voices, dramatic readings
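
One way to apply the guide is to keep a small table of presets in application code. The sketch below is only an illustration: the preset names and the specific values are assumptions picked from inside each documented range, not recommendations from the SDK.

```python
# Illustrative values chosen from within each range in the guide above.
CFG_PRESETS = {
    "conversational": 1.2,  # 1.0-1.5: relaxed, natural
    "general": 2.0,         # 2.0: balanced default
    "storytelling": 2.8,    # 2.5-3.0: expressive
    "dramatic": 4.0,        # 3.5-5.0: maximum expression
}


def cfg_scale_for(use_case: str) -> float:
    """Return a cfg_scale for a use case, falling back to the default of 2.0."""
    return CFG_PRESETS.get(use_case, 2.0)


scale = cfg_scale_for("storytelling")
```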

Full Example with All Options

Python

# Reuses the client created in Basic Generation
audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
    voice_id=123,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,
    normalize=True,
    language="en",
)

# Inspect the response
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")

# Save to a WAV file
audio.save("output.wav")

# Get raw PCM bytes (no header)
pcm_data = audio.audio

# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()

JavaScript

const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-1-turbo',
  voiceId: 123,
  cfgScale: 2.0,
  maxNewTokens: 2048,
  sampleRate: 24000,
  normalize: true,
  language: 'en',
});

// Inspect the response
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`);

// audio.audio is an ArrayBuffer with PCM16 data
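
The response fields printed above are related by simple arithmetic. The sketch below assumes that `samples` counts sample frames and that `rtf` is the conventional real-time factor (generation time divided by audio duration); the source does not define either field, so treat both as assumptions. The function `audio_metrics` is hypothetical.

```python
def audio_metrics(samples: int, sample_rate: int, generation_ms: float) -> dict:
    """Derive duration and real-time factor from the raw response numbers."""
    duration_seconds = samples / sample_rate
    # RTF below 1.0 means the audio was generated faster than it plays back.
    rtf = (generation_ms / 1000.0) / duration_seconds
    return {"duration_seconds": duration_seconds, "rtf": rtf}


metrics = audio_metrics(samples=48000, sample_rate=24000, generation_ms=500.0)
print(metrics)  # {'duration_seconds': 2.0, 'rtf': 0.25}
```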

Async Generation

Python

import asyncio
from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

async def main():
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-1-turbo",
    )
    audio.save("async_output.wav")

asyncio.run(main())

JavaScript

// The JavaScript SDK is async by default
const audio = await client.tts.generate({
  text: 'Async generation example.',
  modelId: 'kugel-1-turbo',
});
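
A common reason to use an awaitable call is to overlap several requests. The sketch below shows the general `asyncio.gather` pattern with a stand-in coroutine, `generate_stub`, that sleeps instead of calling the API. Whether a single KugelAudio client supports concurrent `generate_async` calls is not stated here, so that is an assumption to verify before using this pattern with the real SDK.

```python
import asyncio


async def generate_stub(text: str) -> str:
    """Stand-in for an awaitable TTS call; sleeps instead of calling the API."""
    await asyncio.sleep(0.01)
    return f"audio for: {text}"


async def generate_all(texts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(generate_stub(text) for text in texts))


results = asyncio.run(generate_all(["First line.", "Second line."]))
```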

Playing Audio in the Browser

The JavaScript SDK provides utility functions for audio playback:

import { KugelAudio, createWavBlob } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();

Pre-connecting for Low Latency

For latency-sensitive applications, establish the WebSocket connection at startup so that your first request does not pay the cold-start cost (~500ms).

Python (Async)

import asyncio
from kugelaudio import KugelAudio

async def main():
    # Create a pre-connected client (the ~500ms connection setup happens here)
    client = await KugelAudio.create(api_key="your_api_key")

    # First request is now fast (~100-150ms TTFA instead of ~600ms)
    audio = await client.tts.generate_async(
        text="Hello, world!",
        model_id="kugel-1-turbo",
    )
    audio.save("output.wav")

    # Close the connection when done
    await client.aclose()

asyncio.run(main())

Python (Sync)

from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

# Pre-connect at startup (the ~500ms connection setup happens here)
client.connect()

# First request is now fast
audio = client.tts.generate(
    text="Hello, world!",
    model_id="kugel-1-turbo",
)

JavaScript

import { KugelAudio } from 'kugelaudio';

// Create a pre-connected client (the ~500ms connection setup happens here)
const client = await KugelAudio.create({ apiKey: 'your_api_key' });

// First request is now fast (~100-150ms TTFA instead of ~600ms)
const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});

Without pre-connecting, the first TTS request includes the WebSocket connection setup (~500ms) on top of the usual time to first audio (TTFA), for roughly 600ms in total. Subsequent requests reuse the connection and are fast (~100-150ms TTFA). Pre-connecting moves the setup cost to application startup.
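
To see the effect in your own application, you can time the first request with and without a connection established up front. The sketch below uses a toy `SimulatedClient` that sleeps to imitate connection setup; the class, its timings, and the `timed_ms` helper are all hypothetical stand-ins, not SDK behavior.

```python
import time


class SimulatedClient:
    """Toy client: the first request pays a one-time connection cost."""

    def __init__(self, connect_seconds: float = 0.05):
        self.connect_seconds = connect_seconds
        self.connected = False

    def connect(self) -> None:
        if not self.connected:
            time.sleep(self.connect_seconds)  # simulated connection setup
            self.connected = True

    def generate(self, text: str) -> str:
        self.connect()  # connects lazily if connect() was not called earlier
        return f"audio for: {text}"


def timed_ms(func, *args) -> float:
    """Return how long a single call takes, in milliseconds."""
    start = time.perf_counter()
    func(*args)
    return (time.perf_counter() - start) * 1000.0


cold = SimulatedClient()
cold_first_ms = timed_ms(cold.generate, "Hello")  # includes connection setup

warm = SimulatedClient()
warm.connect()  # pay the setup cost at startup instead
warm_first_ms = timed_ms(warm.generate, "Hello")  # setup already done
```

The same measurement wrapped around real `generate` calls would show whether pre-connecting matters for your latency budget.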

Next Steps

Streaming: Lower latency with real-time audio streaming
Text Processing: Text normalization and spell tags
Voices: Browse and use different voices
Models: Learn about available models
