Generate complete audio from text. This is the simplest way to get started: provide text and receive audio back.
Basic Generation
from kugelaudio import KugelAudio
client = KugelAudio(api_key="your_api_key")
audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
)
# Save to file
audio.save("output.wav")
# Or get WAV bytes
wav_bytes = audio.to_wav_bytes()
import { KugelAudio } from 'kugelaudio';
const client = new KugelAudio({ apiKey: 'your_api_key' });
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-1-turbo',
});
// audio.audio is an ArrayBuffer with PCM16 data
console.log(`Duration: ${audio.durationMs}ms`);
Generation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | required | The text to synthesize |
| model_id / modelId | string | kugel-1-turbo | kugel-1-turbo (fast) or kugel-1 (quality) |
| voice_id / voiceId | int | - | Specific voice to use |
| cfg_scale / cfgScale | float | 2.0 | Guidance scale (1.0-5.0) |
| max_new_tokens / maxNewTokens | int | 2048 | Maximum tokens to generate |
| sample_rate / sampleRate | int | 24000 | Output sample rate in Hz (8000, 16000, 22050, or 24000) |
| normalize | bool | true | Enable text normalization |
| language | string | - | Language for normalization (ISO 639-1 code) |
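The constraints in the table can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `validate_tts_params` is a hypothetical helper, the inclusive treatment of the `cfg_scale` bounds is an assumption, and so is the rule that `max_new_tokens` must be positive.

```python
# Values copied from the parameter table above.
ALLOWED_MODELS = {"kugel-1-turbo", "kugel-1"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}
CFG_SCALE_RANGE = (1.0, 5.0)  # treated as inclusive (assumption)


def validate_tts_params(
    text: str,
    model_id: str = "kugel-1-turbo",
    cfg_scale: float = 2.0,
    max_new_tokens: int = 2048,
    sample_rate: int = 24000,
) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    if not text.strip():
        problems.append("text is required")
    if model_id not in ALLOWED_MODELS:
        problems.append(f"unknown model_id: {model_id!r}")
    low, high = CFG_SCALE_RANGE
    if not low <= cfg_scale <= high:
        problems.append(f"cfg_scale {cfg_scale} outside {low}-{high}")
    if max_new_tokens <= 0:  # assumption: the table gives no lower bound
        problems.append("max_new_tokens must be positive")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {sample_rate}")
    return problems
```

Calling `validate_tts_params("Hello", cfg_scale=6.0)` returns a one-item list describing the out-of-range guidance scale, which can be surfaced to the caller without a round trip to the API.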
CFG Scale Guide
The cfg_scale parameter controls how closely the model follows the voice characteristics:
| Range | Style | Best For |
|---|---|---|
| 1.0-1.5 | Relaxed, natural | Conversational AI, long-form narration |
| 2.0 | Balanced (default) | General purpose |
| 2.5-3.0 | Expressive | Storytelling, emphasis-heavy content |
| 3.5-5.0 | Maximum expression | Character voices, dramatic readings |
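The guide can also be expressed as a small lookup, which is handy for logging or for a settings UI. This is a sketch only: `cfg_style` is a hypothetical helper, the range bounds are treated as inclusive, and values that fall between the listed ranges (for example 2.2) return `None` because the table does not describe them.

```python
from typing import Optional

# (low, high, style) rows copied from the CFG scale table; bounds inclusive.
CFG_STYLE_RANGES = [
    (1.0, 1.5, "Relaxed, natural"),
    (2.0, 2.0, "Balanced (default)"),
    (2.5, 3.0, "Expressive"),
    (3.5, 5.0, "Maximum expression"),
]


def cfg_style(cfg_scale: float) -> Optional[str]:
    """Return the documented style for cfg_scale, or None if the value
    lies outside or between the ranges listed in the table."""
    for low, high, style in CFG_STYLE_RANGES:
        if low <= cfg_scale <= high:
            return style
    return None
```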
Full Example with All Options
audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
    voice_id=123,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,
    normalize=True,
    language="en",
)
# Inspect the response
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")
# Save to WAV file
audio.save("output.wav")
# Get raw PCM bytes
pcm_data = audio.audio
# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-1-turbo',
  voiceId: 123,
  cfgScale: 2.0,
  maxNewTokens: 2048,
  sampleRate: 24000,
  normalize: true,
  language: 'en',
});
// Inspect the response
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`);
// audio.audio is an ArrayBuffer with PCM16 data
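The response fields printed above relate to each other in a simple way. The sketch below assumes the conventional definition of real-time factor (generation time divided by audio duration) and assumes `samples` counts samples per channel; the SDK may compute these values differently. The helper names and the numbers are made up for illustration.

```python
def audio_duration_seconds(samples: int, sample_rate: int) -> float:
    """Duration of the clip, given its per-channel sample count."""
    return samples / sample_rate


def real_time_factor(generation_ms: float, duration_seconds: float) -> float:
    """Generation time divided by audio duration (conventional RTF)."""
    return (generation_ms / 1000.0) / duration_seconds


# Made-up numbers: 72,000 samples at 24 kHz, generated in 750 ms.
duration = audio_duration_seconds(samples=72000, sample_rate=24000)
rtf = real_time_factor(generation_ms=750, duration_seconds=duration)
print(f"{duration:.2f}s, RTF {rtf:.2f}")  # 3.00s, RTF 0.25
```

Under this definition, an RTF below 1.0 means the audio was generated faster than it takes to play back.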
Async Generation
import asyncio

# Assumes `client` was created as shown in Basic Generation
async def main():
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-1-turbo",
    )
    audio.save("async_output.wav")

asyncio.run(main())
// JavaScript SDK is async by default
const audio = await client.tts.generate({
  text: 'Async generation example.',
  modelId: 'kugel-1-turbo',
});
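A common reason to use the async API is to run several requests at once. The following sketch shows one way to do that with a concurrency cap. `fake_generate` is a stand-in for `client.tts.generate_async` so the example runs without an API key, and `generate_all` is a hypothetical helper. Whether a single client handles concurrent requests well is not stated here, so treat that as an assumption to verify.

```python
import asyncio


async def fake_generate(text: str) -> str:
    """Stand-in for client.tts.generate_async; returns a label, not audio."""
    await asyncio.sleep(0.01)  # simulate network and synthesis time
    return f"audio for: {text}"


async def generate_all(texts: list[str], max_concurrent: int = 2) -> list[str]:
    """Run one request per text, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(text: str) -> str:
        async with semaphore:
            return await fake_generate(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(limited(t) for t in texts))


results = asyncio.run(generate_all(["First.", "Second.", "Third."]))
```

The semaphore keeps the number of in-flight requests bounded, which helps avoid hitting rate limits when the input list is long.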
Playing Audio in the Browser
The JavaScript SDK provides utility functions for audio playback:
import { KugelAudio, createWavBlob } from 'kugelaudio';
const client = new KugelAudio({ apiKey: 'your_api_key' });
const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});
// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);
// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();
// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
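Raw PCM16 cannot be played directly by most players; it first needs a WAV header describing the sample rate, channel count, and sample width. As an illustration of what that wrapping involves, here is a Python sketch using only the standard library. It is not the implementation of `createWavBlob` or `to_wav_bytes()`, `pcm16_to_wav_bytes` is a hypothetical helper, and mono output is an assumption.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (mono is an assumption)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # PCM16 uses 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at 24 kHz stands in for real PCM bytes.
silence = b"\x00\x00" * 24000
wav_bytes = pcm16_to_wav_bytes(silence, sample_rate=24000)
```

The result starts with the `RIFF` and `WAVE` markers followed by the unchanged PCM payload, so it can be written to a `.wav` file or served to a browser as `audio/wav`.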
Pre-connecting for Low Latency
For latency-sensitive applications, pre-establish the WebSocket connection at startup to eliminate cold start latency (~500ms) from your first request.
Python (Async)
import asyncio
from kugelaudio import KugelAudio

async def main():
    # Create a pre-connected client (~500ms happens here)
    client = await KugelAudio.create(api_key="your_api_key")
    # First request is now fast (~100-150ms TTFA instead of ~600ms)
    audio = await client.tts.generate_async(
        text="Hello, world!",
        model_id="kugel-1-turbo",
    )
    audio.save("output.wav")
    await client.aclose()

asyncio.run(main())
Python (Sync)
from kugelaudio import KugelAudio
client = KugelAudio(api_key="your_api_key")
# Pre-connect at startup (~500ms happens here)
client.connect()
# First request is now fast
audio = client.tts.generate(
    text="Hello, world!",
    model_id="kugel-1-turbo",
)
JavaScript
import { KugelAudio } from 'kugelaudio';
// Create a pre-connected client (~500ms happens here)
const client = await KugelAudio.create({ apiKey: 'your_api_key' });
// First request is now fast (~100-150ms TTFA instead of ~600ms)
const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});
Without pre-connecting, the first TTS request includes WebSocket connection setup (~500ms).
Subsequent requests reuse the connection and are fast (~100-150ms TTFA).
Pre-connecting moves this overhead to application startup.
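The arithmetic behind these figures can be written out as a small model. This is only a back-of-the-envelope sketch using the approximate numbers quoted in this section; `first_request_latency_ms` is a hypothetical helper, and real latencies depend on network conditions.

```python
# Approximate figures quoted in this section, in milliseconds.
CONNECT_MS = 500
TTFA_RANGE_MS = (100, 150)


def first_request_latency_ms(preconnected: bool) -> tuple[int, int]:
    """Estimated (min, max) time to first audio for the first request."""
    setup = 0 if preconnected else CONNECT_MS
    low, high = TTFA_RANGE_MS
    return (setup + low, setup + high)


print(first_request_latency_ms(preconnected=False))  # (600, 650)
print(first_request_latency_ms(preconnected=True))   # (100, 150)
```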
Next Steps