base_url. No other code changes required.
Quick Start
Python SDK
Node.js SDK
Migrating from ElevenLabs
The only changes needed:
- Replace base_url: point it to your KugelAudio server.
- Update voice_id: use KugelAudio voice IDs (not ElevenLabs IDs).
- Update output_format: use a PCM format for the lowest overhead, or MP3 for integrations that expect ElevenLabs’ default MP3 output (see Output Formats).
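As a rough illustration of those three changes at the HTTP level, the sketch below builds (but does not send) a request to the text-to-speech endpoint using only the Python standard library. Several details are assumptions: the server address is a placeholder, the /11labs prefix is inferred from the Open WebUI endpoint path on this page and may differ on your deployment, the voice ID is hypothetical, and authenticating with ElevenLabs’ xi-api-key header is assumed rather than documented here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: placeholder server address plus the "/11labs" prefix seen in the
# Open WebUI endpoint path; adjust both for your deployment.
KUGEL_BASE_URL = "https://kugel.example.com/11labs"


def build_tts_request(
    base_url: str,
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = "eleven_multilingual_v2",
    output_format: str = "pcm_24000",
) -> Request:
    """Build a POST to /v1/text-to-speech/{voice_id} without sending it."""
    # Change 1: base_url points at the KugelAudio server instead of ElevenLabs.
    # Change 2: voice_id must be a KugelAudio voice ID.
    path = f"/v1/text-to-speech/{quote(voice_id, safe='')}"
    # Change 3: output_format is a query parameter; PCM avoids encoding overhead.
    query = urlencode({"output_format": output_format})
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}{path}?{query}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the ElevenLabs-style auth header is accepted.
            "xi-api-key": api_key,
        },
    )


# Hypothetical voice ID and key, for illustration only.
req = build_tts_request(KUGEL_BASE_URL, "YOUR_API_KEY", "my-kugel-voice", "Hello!")
```

Passing req to urllib.request.urlopen would send it; with the pcm_24000 default, the body would be raw 24 kHz PCM16 audio.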
Migrating a streaming integration
ElevenLabs’ text_chunker flushes on every internal trigger, and their WebSocket protocol tolerates mid-stream flushes because each flush is comparatively cheap. KugelAudio’s /ws/tts/stream does not: each flush triggers a fresh model prefill. The mechanical translation, “flush=True on KugelAudio == flush=true on ElevenLabs”, is the single most common cause of poor time-to-first-audio (TTFA) when porting an existing ElevenLabs integration. See Chunking & per-segment latency for why.
The right translation:
| ElevenLabs pattern | KugelAudio equivalent |
|---|---|
| send(text, flush=True) after every chunk | send(text) with no flush; let the server’s text buffer do the chunking. |
| try_trigger_generation=True | Default behavior. The server starts generation at sentence boundaries automatically. |
| auto_mode=true | Same name on KugelAudio (StreamConfig.auto_mode). |
| One context per turn | One StreamingSession per turn; see Turn lifecycle. |
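A minimal sketch of the first row of that table, assuming a hypothetical send(text, flush) callable that wraps whichever client you use (it is not a KugelAudio SDK method). Intermediate chunks go out without a flush so the server’s text buffer decides where segments start. Flushing once with the turn’s final chunk is an assumption about how you want trailing text without a sentence boundary handled.

```python
from typing import Callable, Iterable, Optional

# Hypothetical: send(text, flush) wraps whatever your client actually calls.
SendFn = Callable[[str, bool], None]


def stream_turn(chunks: Iterable[str], send: SendFn) -> None:
    """Forward one turn of LLM output, flushing only once, with the last chunk."""
    pending: Optional[str] = None
    for chunk in chunks:
        if not chunk:
            continue  # skip empty deltas from the LLM stream
        if pending is not None:
            # No flush: the server's text buffer decides where segments start,
            # so intermediate chunks do not each trigger a fresh prefill.
            send(pending, False)
        pending = chunk
    if pending is not None:
        # Assumption: flush once at end of turn so trailing text without a
        # sentence boundary is still synthesized.
        send(pending, True)


def stream_turn_naive(chunks: Iterable[str], send: SendFn) -> None:
    """Anti-pattern ported 1:1 from ElevenLabs: one flush (and prefill) per chunk."""
    for chunk in chunks:
        send(chunk, True)
```

With four LLM chunks, stream_turn issues one flush where stream_turn_naive issues four; per the explanation above, each extra flush costs a fresh prefill.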
Output Formats
KugelAudio generates audio natively at 24 kHz PCM16. Lower sample rates use server-side resampling. MP3 output is encoded server-side for ElevenLabs-compatible tools that expect audio/mpeg.
| Format | Status | Notes |
|---|---|---|
| pcm_24000 | ✅ Recommended | Native rate, zero conversion cost |
| pcm_22050 | ✅ Supported | |
| pcm_16000 | ✅ Supported | Common for telephony |
| pcm_8000 | ✅ Supported | |
| pcm_44100 | ✅ Supported | Higher-rate PCM for ElevenLabs compatibility |
| mp3_44100_128 | ✅ Supported | ElevenLabs default; also selected when Accept: audio/mpeg is sent without output_format |
| mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_192 | ✅ Supported | |
| mp3_22050_32 | ✅ Supported | Lower-bandwidth MP3 |
| ulaw_8000 | ✅ Supported | G.711 µ-law at 8 kHz; audio/basic, audio.ulaw |
| alaw_8000 | ✅ Supported | G.711 a-law at 8 kHz; audio/basic, audio.alaw |
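The format names in the table follow the pattern codec_samplerate[_bitrate]. The sketch below parses them and estimates each format’s payload rate, which can help when sizing buffers. It assumes mono audio and constant-bitrate MP3 with the suffix in kbps; neither is stated on this page. The SUPPORTED set simply mirrors the table.

```python
from typing import NamedTuple, Optional

# Mirrors the table above.
SUPPORTED = {
    "pcm_24000", "pcm_22050", "pcm_16000", "pcm_8000", "pcm_44100",
    "mp3_44100_128", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96",
    "mp3_44100_192", "mp3_22050_32", "ulaw_8000", "alaw_8000",
}


class OutputFormat(NamedTuple):
    codec: str
    sample_rate: int
    bitrate_kbps: Optional[int]  # only MP3 names carry a bitrate suffix


def parse_output_format(name: str) -> OutputFormat:
    """Split e.g. 'mp3_44100_128' into codec, sample rate, and bitrate."""
    if name not in SUPPORTED:
        raise ValueError(f"unsupported output_format: {name}")
    parts = name.split("_")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(parts[0], int(parts[1]), bitrate)


def bytes_per_second(name: str) -> int:
    """Approximate payload rate, assuming mono audio and CBR MP3."""
    fmt = parse_output_format(name)
    if fmt.codec == "pcm":
        return fmt.sample_rate * 2  # 16-bit samples
    if fmt.codec in ("ulaw", "alaw"):
        return fmt.sample_rate  # 8-bit G.711 samples
    return fmt.bitrate_kbps * 1000 // 8  # MP3 bitrate in kbps
```

For example, pcm_24000 works out to 48,000 bytes per second, three times the 16,000 of mp3_44100_128, which is the trade-off between zero conversion cost and bandwidth.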
Open WebUI
Open WebUI’s ElevenLabs TTS path sends Accept: audio/mpeg and saves the response as an .mp3 file. KugelAudio honors that header on /11labs/v1/text-to-speech/{voice_id} and returns audio/mpeg MP3 bytes when no explicit output_format query parameter is present.
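The selection rule described above can be modelled client-side, for example to predict which file extension to use. This is a sketch of the documented behavior, not the server’s code. Returning None when neither output_format nor Accept: audio/mpeg is present is an assumption, since this page does not state the server default for that case.

```python
from typing import Optional

MP3_DEFAULT = "mp3_44100_128"  # ElevenLabs default, per the format table


def expected_output_format(query_format: Optional[str], accept: Optional[str]) -> Optional[str]:
    """Predict the output format for /11labs/v1/text-to-speech/{voice_id}."""
    if query_format:
        # An explicit output_format query parameter takes precedence.
        return query_format
    # Assumption: audio/mpeg is also matched inside a comma-separated Accept
    # list; Open WebUI sends it on its own.
    media_types = [p.split(";")[0].strip().lower() for p in (accept or "").split(",")]
    if "audio/mpeg" in media_types:
        return MP3_DEFAULT
    return None  # server default for this case is not specified on this page
```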
Optional client-side G.711 conversion
KugelAudio can emit ulaw_8000 and alaw_8000 directly. If you need to convert an existing PCM stream client-side instead, resample it to 8 kHz before applying the G.711 encoding.
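A minimal pure-Python sketch of that path for µ-law, assuming little-endian mono PCM16 input at an integer multiple of 8 kHz (24 kHz or 16 kHz). It uses a crude moving-average low-pass before decimating; a proper resampler, or simply requesting ulaw_8000 from the server, will sound better. The encoder follows the classic G.711 µ-law bias-and-segment algorithm; A-law works analogously and is omitted. The standard-library audioop module is avoided because it was removed in Python 3.13.

```python
from __future__ import annotations

import struct

BIAS = 0x84   # 132: standard G.711 µ-law bias
CLIP = 32635  # keeps magnitude + BIAS within 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 µ-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(-sample if sign else sample, CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit among bits 7..14.
    exponent = max(((magnitude >> 7) & 0xFF).bit_length() - 1, 0)
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # µ-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def downsample_pcm16(pcm: bytes, src_rate: int, dst_rate: int = 8000) -> list[int]:
    """Average each block of src_rate/dst_rate samples (crude anti-aliasing)."""
    if src_rate % dst_rate:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = src_rate // dst_rate
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return [
        sum(samples[i : i + factor]) // factor
        for i in range(0, count - factor + 1, factor)
    ]


def pcm16_to_ulaw(pcm: bytes, src_rate: int) -> bytes:
    """Resample to 8 kHz first, then µ-law encode each sample."""
    return bytes(linear_to_ulaw(s) for s in downsample_pcm16(pcm, src_rate))
```

Silence encodes to 0xFF bytes and full-scale positive and negative samples to 0x80 and 0x00, the standard µ-law endpoints.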
Supported Endpoints
Text-to-Speech
| Endpoint | Method | Status |
|---|---|---|
| /v1/text-to-speech/{voice_id} | POST | ✅ Supported |
| /v1/text-to-speech/{voice_id}/stream | POST | ✅ Supported |
| /v1/text-to-speech/{voice_id}/stream-input | WebSocket | ✅ Supported |
stream-input: Feed text tokens as they arrive from an LLM. Synthesis starts as soon as a sentence boundary is detected, minimizing time-to-first-audio. The server sends ElevenLabs-format audio frames ({"audio": "<base64>", "isFinal": false}), then a final {"audio": "", "isFinal": true} frame, and then closes the WebSocket with code 1000. This normal close is required by the official ElevenLabs Python SDK (convert_realtime), which keeps reading until the server closes the connection and does not stop on isFinal alone.
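If you consume stream-input without the ElevenLabs SDK, the receive side of that framing reduces to decoding base64 audio until the socket closes. The sketch below takes already-received text messages as input (hypothetical here; in practice they would come from your WebSocket client’s receive loop, which ends on the code 1000 close) and keeps reading past isFinal, mirroring how convert_realtime behaves.

```python
import base64
import json
from typing import Iterable, Tuple


def collect_audio(messages: Iterable[str]) -> Tuple[bytes, bool]:
    """Concatenate audio from ElevenLabs-format frames; report whether isFinal arrived.

    `messages` should end when the server closes the socket, not when
    isFinal is seen.
    """
    audio = bytearray()
    saw_final = False
    for raw in messages:
        frame = json.loads(raw)
        if frame.get("audio"):  # the final frame carries an empty string
            audio += base64.b64decode(frame["audio"])
        if frame.get("isFinal"):
            saw_final = True  # keep iterating: the normal close follows
    return bytes(audio), saw_final
```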
Voices
| Endpoint | Method | Status |
|---|---|---|
| /v1/voices | GET | ✅ Supported |
| /v1/voices/{voice_id} | GET | ✅ Supported |
| /v1/voices/add | POST | ❌ Not supported |
| /v1/voices/{voice_id}/edit | POST | ❌ Not supported |
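To look up the KugelAudio voice IDs that replace your ElevenLabs ones, GET /v1/voices is the supported route. The sketch below assumes the response mirrors ElevenLabs’ shape, a JSON object with a voices array whose entries carry voice_id and name; check a real response from your server before relying on it.

```python
import json
from typing import Dict


def voice_ids_by_name(payload: str) -> Dict[str, str]:
    """Map voice name -> voice_id from a /v1/voices response body (assumed shape)."""
    data = json.loads(payload)
    return {voice["name"]: voice["voice_id"] for voice in data.get("voices", [])}
```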
Other
| Endpoint | Method | Status |
|---|---|---|
| /v1/models | GET | ✅ Supported |
| /v1/user | GET | ⚠️ Stub |
| /v1/user/subscription | GET | ⚠️ Stub |
| /v1/history | GET | ⚠️ Stub |
Available Models
| Model ID (ElevenLabs alias) | KugelAudio model | Description |
|---|---|---|
| eleven_turbo_v2, eleven_turbo_v2_5 | kugel-3 | Fast, low-latency |
| eleven_multilingual_v2 | kugel-3 | High quality, multilingual |
kugel-3.
Parameter Mapping
| ElevenLabs | KugelAudio | Notes |
|---|---|---|
| voice_id | voice_id | Use KugelAudio voice IDs |
| model_id | model | See model table above |
| similarity_boost | cfg_scale | cfg_scale = 1.0 + (similarity_boost × 2.0), clamped to the accepted [1.2, 2.5] range |
| stability | (none) | Not used |
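The mapping can be expressed as a small translation function. It is a sketch: the output dictionary’s layout is hypothetical (this page names the KugelAudio fields model and cfg_scale but not the request shape), the alias table mirrors the model table above, and unknown model IDs raise a KeyError rather than guessing a fallback. On the ElevenLabs side, similarity_boost and stability are read from voice_settings, where ElevenLabs requests carry them.

```python
from typing import Any, Dict

# Mirrors the model table above.
MODEL_ALIASES = {
    "eleven_turbo_v2": "kugel-3",
    "eleven_turbo_v2_5": "kugel-3",
    "eleven_multilingual_v2": "kugel-3",
}
CFG_MIN, CFG_MAX = 1.2, 2.5


def similarity_to_cfg_scale(similarity_boost: float) -> float:
    """cfg_scale = 1.0 + similarity_boost * 2.0, clamped to [1.2, 2.5]."""
    return min(CFG_MAX, max(CFG_MIN, 1.0 + similarity_boost * 2.0))


def translate_params(eleven: Dict[str, Any]) -> Dict[str, Any]:
    """Translate ElevenLabs request fields to KugelAudio ones (hypothetical layout)."""
    params: Dict[str, Any] = {
        "voice_id": eleven["voice_id"],  # must already be a KugelAudio voice ID
        "model": MODEL_ALIASES[eleven["model_id"]],
    }
    settings = eleven.get("voice_settings", {})
    if "similarity_boost" in settings:
        params["cfg_scale"] = similarity_to_cfg_scale(settings["similarity_boost"])
    # stability has no KugelAudio equivalent and is dropped.
    return params
```

As worked examples of the formula: similarity_boost = 0.5 gives 1.0 + 1.0 = 2.0; 0.75 gives 2.5, the top of the range; 1.0 would give 3.0 and is clamped to 2.5; and anything up to 0.1 is clamped to 1.2.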
Troubleshooting
Python SDK
Native KugelAudio SDK with full feature access
JavaScript SDK
Native KugelAudio SDK with full feature access