Python
from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-3",
)

# Save to file
audio.save("output.wav")

# Or get WAV bytes
wav_bytes = audio.to_wav_bytes()

JavaScript
import { KugelAudio } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-3',
});

// audio.audio is an ArrayBuffer with PCM16 data
console.log(`Duration: ${audio.durationMs}ms`);

Java
import com.kugelaudio.sdk.KugelAudio;
import com.kugelaudio.sdk.KugelAudioOptions;
import com.kugelaudio.sdk.GenerateRequest;
import com.kugelaudio.sdk.AudioResponse;

KugelAudio client = new KugelAudio(
    KugelAudioOptions.builder("your_api_key").build()
);

AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello, this is a test of the KugelAudio text-to-speech system.")
        .modelId("kugel-3")
        .language("en")
        .build()
);

// Save to WAV file
audio.saveWav(java.nio.file.Path.of("output.wav"));

// Or get raw PCM bytes
byte[] pcmData = audio.getAudio();

cURL
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the KugelAudio text-to-speech system.",
    "model_id": "kugel-3"
  }' \
  --output output.pcm

# The response is raw PCM16 audio (signed 16-bit LE, mono, 24kHz)
# Convert to WAV for playback:
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
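If ffmpeg is not available, the raw PCM returned by the HTTP endpoint can also be wrapped in a WAV container with Python's standard library. The following is a minimal sketch, assuming the format stated above (signed 16-bit little-endian, mono, 24 kHz). The helper name `pcm16_to_wav_bytes` is hypothetical and not part of the KugelAudio SDK.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container.

    Hypothetical helper; the defaults mirror the format described for the
    HTTP response (mono, 24 kHz).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

To convert the file saved by the cURL command, read `output.pcm` in binary mode, pass the bytes to the helper, and write the result to `output.wav`.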
The cfg_scale parameter controls how closely the model follows the voice characteristics. Accepted range: 1.2–2.5 (inclusive). Values outside this range are clamped into it.
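The clamping rule can be illustrated with a few lines of Python. This is only a sketch of the documented behavior, not the service's implementation. The constant and function names are hypothetical.

```python
CFG_SCALE_MIN = 1.2  # documented lower bound (inclusive)
CFG_SCALE_MAX = 2.5  # documented upper bound (inclusive)


def clamp_cfg_scale(value: float) -> float:
    """Return value limited to the documented inclusive cfg_scale range."""
    return max(CFG_SCALE_MIN, min(CFG_SCALE_MAX, value))
```

Under this rule a request with `cfg_scale=3.0` behaves like `cfg_scale=2.5`, and `cfg_scale=0.5` behaves like `cfg_scale=1.2`. Values already inside the range pass through unchanged.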
The speed parameter adjusts playback rate using pitch-preserving time-stretching (WSOLA), so the voice pitch stays natural even at different speeds. Range: 0.8 (20% slower) to 1.2 (20% faster).
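To estimate how speed affects clip length, the sketch below assumes that `speed` acts as a plain playback-rate multiplier, so the output duration is the original duration divided by `speed`. The helper is hypothetical. It rejects out-of-range values on the client side as a design choice, since the text above does not say how the API treats them.

```python
SPEED_MIN = 0.8  # documented lower bound
SPEED_MAX = 1.2  # documented upper bound


def stretched_duration_ms(base_duration_ms: float, speed: float) -> float:
    """Estimate clip duration after time-stretching.

    Assumption: speed is a playback-rate multiplier, so duration scales
    by 1 / speed. Pitch is unaffected, so only the length changes.
    """
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}")
    return base_duration_ms / speed
```

Under that assumption, a clip that lasts 1000 ms at normal speed lasts roughly 1250 ms at `speed=0.8`.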
Dashboard: The playground in the KugelAudio dashboard includes a Slow / Normal / Fast speed toggle next to the model selector. Changes are reflected live in the SDK code snippet shown below the generator.
Python
# Global speed: whole request at 80% speed
audio = client.tts.generate(
    text="Bitte rufen Sie uns an unter: 0 30 12 34 56 78.",
    language="de",
    speed=0.8,
)

JavaScript
// Global speed
const audio = await client.tts.generate({
  text: 'Bitte rufen Sie uns an unter: 0 30 12 34 56 78.',
  language: 'de',
  speed: 0.8,
});

Java
// Global speed
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Bitte rufen Sie uns an unter: 0 30 12 34 56 78.")
        .language("de")
        .speed(0.8)
        .build()
);

cURL
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Bitte rufen Sie uns an unter: 0 30 12 34 56 78.",
    "language": "de",
    "speed": 0.8
  }' \
  --output output.pcm
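| speed value | Rate             | Typical use                             |
|-------------|------------------|-----------------------------------------|
| 0.8         | 20% slower       | Phone numbers, addresses, medical terms |
| 1.0         | Normal (default) | General purpose                         |
| 1.2         | 20% faster       | Notifications, fast-paced content       |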
The speed parameter applies to the whole request. To change the rate for only part of the text, wrap that part in <prosody rate="...">.
Python
audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-3",
    voice_id=1071,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,
    normalize=True,
    language="en",
    word_timestamps=False,
    speed=1.0,
)

# Inspect the response
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")

# Save to WAV file
audio.save("output.wav")

# Get raw PCM bytes
pcm_data = audio.audio

# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()

JavaScript
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-3',
  voiceId: 1071,
  cfgScale: 2.0,
  maxNewTokens: 2048,
  sampleRate: 24000,
  normalize: true,
  language: 'en',
  wordTimestamps: false,
  speed: 1.0,
});

// Inspect the response
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`);

// audio.audio is an ArrayBuffer with PCM16 data

Java
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello, this is a test of the KugelAudio text-to-speech system.")
        .modelId("kugel-3")
        .voiceId(1071)
        .cfgScale(2.0)
        .maxNewTokens(2048)
        .sampleRate(24000)
        .normalize(true)
        .language("en")
        .wordTimestamps(false)
        .speed(1.0)
        .build()
);

// Inspect the response
System.out.printf("Duration: %.2fs%n", audio.getDurationSeconds());
System.out.printf("Samples: %d%n", audio.getTotalSamples());
System.out.printf("Sample rate: %d Hz%n", audio.getSampleRate());
System.out.printf("Generation time: %.0fms%n", audio.getGenerationMs());
System.out.printf("RTF: %.2f%n", audio.getRtf());

// Save to WAV file
audio.saveWav(java.nio.file.Path.of("output.wav"));

cURL
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the KugelAudio text-to-speech system.",
    "model_id": "kugel-3",
    "voice_id": 1071,
    "cfg_scale": 2.0,
    "max_new_tokens": 2048,
    "sample_rate": 24000,
    "normalize": true,
    "language": "en",
    "word_timestamps": false,
    "speed": 1.0
  }' \
  --output output.pcm
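The response fields printed above relate to each other in a simple way, which the following sketch makes explicit. It assumes that `samples` counts mono samples and that RTF follows the conventional definition (generation time divided by audio duration). The SDK may compute its `rtf` value differently, and both function names here are hypothetical.

```python
def audio_duration_seconds(samples: int, sample_rate: int) -> float:
    """Duration of mono audio given its sample count and sample rate."""
    return samples / sample_rate


def real_time_factor(generation_ms: float, samples: int, sample_rate: int) -> float:
    """Conventional real-time factor: generation time / audio duration.

    Values below 1.0 mean the audio was generated faster than it plays.
    """
    duration = audio_duration_seconds(samples, sample_rate)
    if duration <= 0:
        raise ValueError("audio must contain at least one sample")
    return (generation_ms / 1000.0) / duration
```

For example, 48000 samples at 24000 Hz are 2 seconds of audio. If generating them took 500 ms, the real-time factor is 0.25.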
For latency-sensitive applications, pre-establish the WebSocket connection at startup to keep the handshake out of your first request — see Latency.
Python (Async)
import asyncio

from kugelaudio import KugelAudio


async def main():
    # Create a pre-connected client (handshake happens here)
    client = await KugelAudio.create(api_key="your_api_key")

    # First request is now fast: no handshake on the hot path
    audio = await client.tts.generate_async(
        text="Hello, world!",
        model_id="kugel-3",
    )
    audio.save("output.wav")

    await client.aclose()


asyncio.run(main())

Python (Sync)
from kugelaudio import KugelAudio

client = KugelAudio(api_key="your_api_key")

# Pre-connect at startup (handshake happens here)
client.connect()

# First request is now fast
audio = client.tts.generate(
    text="Hello, world!",
    model_id="kugel-3",
)

JavaScript
import { KugelAudio } from 'kugelaudio';

// Create a pre-connected client (handshake happens here)
const client = await KugelAudio.create({ apiKey: 'your_api_key' });

// First request is now fast: no handshake on the hot path
const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-3',
});

Java
import com.kugelaudio.sdk.KugelAudio;
import com.kugelaudio.sdk.KugelAudioOptions;
import com.kugelaudio.sdk.GenerateRequest;
import com.kugelaudio.sdk.AudioResponse;

// autoConnect warms the WebSocket in the background during construction
KugelAudio client = new KugelAudio(
    KugelAudioOptions.builder("your_api_key")
        .autoConnect(true)
        .build()
);

// First request is now fast: the connection is already established
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello, world!")
        .modelId("kugel-3")
        .language("en")
        .build()
);

client.close();
Without pre-connecting, the first TTS request includes WebSocket connection setup. Pre-connecting moves this overhead to application startup.
Subsequent requests reuse the connection either way. See Latency for typical numbers.
JavaScript
const audio = await client.tts.generate({
  text: 'Hello, how are you today?',
  modelId: 'kugel-3',
  wordTimestamps: true,
});

for (const ts of audio.wordTimestamps) {
  console.log(`${ts.word}: ${ts.startMs}ms - ${ts.endMs}ms (score: ${ts.score.toFixed(2)})`);
}

Java
import com.kugelaudio.sdk.WordTimestamp;

AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello, how are you today?")
        .modelId("kugel-3")
        .language("en")
        .wordTimestamps(true)
        .build()
);

for (WordTimestamp ts : audio.getWordTimestamps()) {
    System.out.printf("%s: %dms - %dms (score: %.2f)%n",
        ts.getWord(), ts.getStartMs(), ts.getEndMs(), ts.getScore());
}

cURL
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, how are you today?",
    "model_id": "kugel-3",
    "word_timestamps": true
  }' \
  --output output.pcm
Word timestamps are included in the response headers or as a JSON
preamble before the audio bytes. Use an SDK for convenient access
to parsed timestamp objects.
Word timestamps add no extra audio latency. For streaming use cases, see the Streaming Guide.
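One common use of word timestamps is caption generation. The sketch below turns a list of timestamps into SubRip (SRT) cues, one word per cue. The `WordTimestamp` dataclass is a stand-in defined here for illustration. Its field names (`word`, `start_ms`, `end_ms`, `score`) are assumptions that mirror the JavaScript and Java examples, not a confirmed Python SDK type, and the millisecond values are assumed to be integers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WordTimestamp:
    """Stand-in for the SDK's timestamp object (field names assumed)."""
    word: str
    start_ms: int
    end_ms: int
    score: float


def format_srt_time(ms: int) -> str:
    """Format integer milliseconds as an SRT timecode (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: Iterable[WordTimestamp]) -> str:
    """Build SRT text with one numbered cue per word."""
    cues = []
    for index, ts in enumerate(words, start=1):
        start = format_srt_time(ts.start_ms)
        end = format_srt_time(ts.end_ms)
        cues.append(f"{index}\n{start} --> {end}\n{ts.word}\n")
    # A blank line separates consecutive cues
    return "\n".join(cues)
```

In practice you would likely group several words into each cue. The per-word version keeps the mapping from timestamps to captions easy to follow.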