Prerequisites
Before you begin, make sure you have:
- An API key from kugelaudio.com
- Python 3.9+, Node.js 18+, Java 17+, or cURL
Installation
- Python
- JavaScript/TypeScript
- Java
- cURL
Install the Python SDK using pip:
pip install kugelaudio
Or with uv (recommended):
uv add kugelaudio
Install the JavaScript SDK using your preferred package manager:
npm install kugelaudio
Or with yarn/pnpm:
yarn add kugelaudio
# or
pnpm add kugelaudio
Add the dependency to your pom.xml (requires Java 17+):
<dependency>
  <groupId>com.kugelaudio</groupId>
  <artifactId>kugelaudio</artifactId>
  <version>0.1.0</version>
</dependency>
Or with Gradle:
implementation 'com.kugelaudio:kugelaudio:0.1.0'
cURL comes pre-installed on macOS, most Linux distributions, and Windows 10+. No installation needed.
Set your API key as an environment variable:
export KUGELAUDIO_API_KEY="your_api_key"
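Once the variable is exported, application code can read it from the environment instead of hard-coding the key. The following is a minimal sketch using only the Python standard library; the helper name load_api_key is hypothetical and is not part of the KugelAudio SDK.

```python
import os


def load_api_key(env_var: str = "KUGELAUDIO_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    # load_api_key is an illustrative helper, not an SDK function.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the app")
    return key
```

The returned string can then be passed as the api_key argument when constructing the client.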
Basic Usage
Initialize the Client
Pre-connect at startup. Without client.connect(), the first TTS request pays the WebSocket handshake; subsequent requests reuse the connection. Pre-connecting moves the handshake cost to application startup, where it doesn't affect user-perceived latency. See Latency for the numbers.
- Python
- JavaScript
- Java
- cURL
from kugelaudio import KugelAudio
# Initialize with your API key
client = KugelAudio(api_key="your_api_key")
# Pre-connect at startup (handshake happens here)
client.connect()
# Confirm connection is ready
print(f"Connected: {client.is_connected()}")
# First request is now fast — no handshake on the hot path
import { KugelAudio } from 'kugelaudio';
// Initialize with your API key
const client = new KugelAudio({ apiKey: 'your_api_key' });
// Pre-connect at startup (handshake happens here)
await client.connect();
// Confirm connection is ready
console.log(`Connected: ${client.isConnected()}`);
// First request is now fast — no handshake on the hot path
import com.kugelaudio.sdk.KugelAudio;
import com.kugelaudio.sdk.KugelAudioOptions;
KugelAudio client = new KugelAudio(
    KugelAudioOptions.builder("your_api_key").build()
);
// Pre-connect at startup (handshake happens here)
client.connect();
// Confirm connection is ready
System.out.println("Connected: " + client.isConnected());
// First request is now fast — no handshake on the hot path
No client initialization needed; just pass the API key in the Authorization header:
curl https://api.kugelaudio.com/v1/models \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY"
Generate Speech
Examples below use kugel-3, the canonical production model. Legacy IDs such as kugel-2.5 and kugel-2-turbo are still accepted for backwards compatibility; see Models for details.
- Python
- JavaScript
- Java
- cURL
# Generate speech
audio = client.tts.generate(
    text="Welcome to KugelAudio! This is high-quality text-to-speech.",
    model_id="kugel-3",
)
# Save to file
audio.save("output.wav")
# Or get the raw bytes
wav_bytes = audio.to_wav_bytes()
// Generate speech
const audio = await client.tts.generate({
  text: 'Welcome to KugelAudio! This is high-quality text-to-speech.',
  modelId: 'kugel-3',
});
// audio.audio is an ArrayBuffer with PCM16 data
console.log(`Duration: ${audio.durationMs}ms`);
import com.kugelaudio.sdk.GenerateRequest;
import com.kugelaudio.sdk.AudioResponse;
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Welcome to KugelAudio! This is high-quality text-to-speech.")
        .modelId("kugel-3")
        .language("en")
        .build()
);
// Save to file
audio.saveWav(java.nio.file.Path.of("output.wav"));
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to KugelAudio! This is high-quality text-to-speech.",
    "model_id": "kugel-3"
  }' \
  --output output.pcm
# Convert to WAV for playback
ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
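If ffmpeg is not available, the same conversion can be done with Python's standard wave module. This is a sketch under the assumption, taken from the ffmpeg flags above, that the file holds signed 16-bit little-endian mono PCM at 24000 Hz; the function name pcm_to_wav is illustrative only.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str,
               sample_rate: int = 24000, channels: int = 1) -> None:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    with open(pcm_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # the header is finalized when the file closes
```

Calling pcm_to_wav("output.pcm", "output.wav") should produce a file equivalent to the ffmpeg output, provided the sample format assumption holds.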
Stream Audio
For lower latency, stream audio chunks as they're generated:
- Python
- JavaScript
- Java
- cURL
# Synchronous streaming
for chunk in client.tts.stream(
    text="Hello, this is streaming audio.",
    model_id="kugel-3",
):
    if hasattr(chunk, 'audio'):
        # Process audio chunk immediately
        print(f"Chunk {chunk.index}: {len(chunk.audio)} bytes")
        # play_audio(chunk.audio)
# Asynchronous streaming
import asyncio
async def stream_audio():
    async for chunk in client.tts.stream_async(
        text="Async streaming example.",
        model_id="kugel-3",
    ):
        if hasattr(chunk, 'audio'):
            # Process chunk
            pass
asyncio.run(stream_audio())
await client.tts.stream(
  {
    text: 'Hello, this is streaming audio.',
    modelId: 'kugel-3',
  },
  {
    onChunk: (chunk) => {
      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
      // Play or process the audio chunk
    },
    onFinal: (stats) => {
      console.log(`Total duration: ${stats.durationMs}ms`);
      console.log(`Generation time: ${stats.generationMs}ms`);
    },
  }
);
import com.kugelaudio.sdk.StreamCallbacks;
import com.kugelaudio.sdk.AudioChunk;
client.tts().stream(
    GenerateRequest.builder("Hello, this is streaming audio.")
        .modelId("kugel-3")
        .language("en")
        .build(),
    new StreamCallbacks() {
        @Override
        public void onChunk(AudioChunk chunk) {
            System.out.printf("Chunk %d: %d bytes%n",
                chunk.getIndex(), chunk.getAudio().length);
            // playAudio(chunk.getAudio());
        }
    }
);
The REST endpoint streams raw PCM bytes, so you can pipe them directly to ffplay for real-time playback:
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is streaming audio.",
    "model_id": "kugel-3"
  }' \
  --no-buffer | ffplay -f s16le -ar 24000 -ac 1 -nodisp -
For advanced streaming (WebSocket-based, token-by-token from LLMs), use the Python, JavaScript, or Java SDK or the raw WebSocket API.
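The streaming examples above deliver audio one chunk at a time. When you also need the complete clip afterwards, for instance to save it or report its length, the chunks can be accumulated as they arrive. The sketch below assumes each chunk payload is raw PCM16 mono at 24000 Hz, matching the ffplay flags in the cURL example; whether SDK chunks use exactly this format is an assumption, and collect_pcm is a hypothetical helper.

```python
from typing import Iterable

SAMPLE_RATE = 24000    # Hz, assumed from the ffplay flags (-ar 24000)
BYTES_PER_SAMPLE = 2   # 16-bit PCM, mono


def collect_pcm(chunks: Iterable[bytes]) -> tuple[bytes, float]:
    """Concatenate PCM chunks and return (audio_bytes, duration_in_seconds)."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
    duration = len(buffer) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
    return bytes(buffer), duration
```

With the Python SDK loop shown earlier, this could be fed with a generator such as (chunk.audio for chunk in stream if hasattr(chunk, 'audio')).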
Working with Voices
Pick your voice deliberately. Different voices have wildly different baseline energy, age, and warmth — a peppy DTC bot and a calm clinical agent should not share the same voice even with the same prompt. Listen to several before locking one in. Building an LLM-driven voice agent? See Voice Agent Prompting for the prompt patterns that matter most.
List Available Voices
- Python
- JavaScript
- Java
- cURL
# List all voices
result = client.voices.list()
for voice in result.voices:
    print(f"{voice.id}: {voice.name}")
    print(f"  Languages: {', '.join(voice.supported_languages)}")
print(f"Total: {result.total}")
# Filter by language
result = client.voices.list(language="de")
// List all voices
const result = await client.voices.list();
for (const voice of result.voices) {
  console.log(`${voice.id}: ${voice.name}`);
  console.log(`  Languages: ${voice.supportedLanguages.join(', ')}`);
}
console.log(`Total: ${result.total}`);
// Filter by language
const germanVoices = await client.voices.list({ language: 'de' });
import com.kugelaudio.sdk.Voice;
import com.kugelaudio.sdk.VoiceListResponse;
VoiceListResponse result = client.voices().list();
for (Voice voice : result.getVoices()) {
    System.out.printf("%d: %s%n", voice.getId(), voice.getName());
}
System.out.printf("Total: %d%n", result.getTotal());
// Filter by language
VoiceListResponse germanVoices = client.voices().list("de", null, null, null);
# List all voices
curl https://api.kugelaudio.com/v1/voices \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY"
# Filter by language
curl "https://api.kugelaudio.com/v1/voices?language=de" \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY"
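When comparing several voices across languages, it can help to index an unfiltered listing locally rather than issuing one filtered request per language. The sketch below uses a stand-in Voice dataclass with the same id, name, and supported_languages attributes seen in the Python example above; the dataclass and the voices_by_language helper are hypothetical and not part of the SDK.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Voice:
    """Stand-in for an SDK voice object (illustrative only)."""
    id: int
    name: str
    supported_languages: list[str] = field(default_factory=list)


def voices_by_language(voices) -> dict[str, list[int]]:
    """Map each language code to the IDs of the voices that support it."""
    index = defaultdict(list)
    for voice in voices:
        for lang in voice.supported_languages:
            index[lang].append(voice.id)
    return dict(index)
```

In practice the input would be result.voices from client.voices.list(), assuming those objects expose the attributes shown.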
Use a Specific Voice
- Python
- JavaScript
- Java
- cURL
audio = client.tts.generate(
    text="Hello with a specific voice!",
    model_id="kugel-3",
    voice_id=1071,  # Use a specific voice ID
)
const audio = await client.tts.generate({
  text: 'Hello with a specific voice!',
  modelId: 'kugel-3',
  voiceId: 1071, // Use a specific voice ID
});
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello with a specific voice!")
        .modelId("kugel-3")
        .voiceId(1071)
        .language("en")
        .build()
);
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello with a specific voice!",
    "model_id": "kugel-3",
    "voice_id": 1071
  }' \
  --output output.pcm
Next Steps
- Generate Speech: All generation options and parameters
- Streaming: Real-time audio streaming techniques
- Using Voices: Browse and filter available voices
- Text Processing: Normalization and spell tags