KugelAudio exposes an ElevenLabs-compatible HTTP API, so existing ElevenLabs integrations work with minimal changes: point base_url at KugelAudio, switch to KugelAudio voice IDs, and pick a supported output_format (see Migrating from ElevenLabs).

Quick Start

Python SDK

from elevenlabs import ElevenLabs

client = ElevenLabs(
    api_key="your-kugelaudio-api-key",
    base_url="https://api.kugelaudio.com/11labs",
)

audio = client.text_to_speech.convert(
    voice_id="480",  # use client.voices.get_all() to list available voices
    text="Hello from KugelAudio!",
    model_id="kugel-3",
    output_format="pcm_24000",
)

with open("output.pcm", "wb") as f:
    for chunk in audio:
        f.write(chunk)
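
A raw .pcm file has no header, so most audio players cannot open it. The sketch below wraps the pcm_24000 bytes in a WAV container using Python's standard wave module. It assumes the stream is mono, signed 16-bit little-endian (the usual PCM16 layout; the channel count and byte order are assumptions, not stated above), and pcm_to_wav is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm: bytes, path: str, sample_rate: int = 24000) -> None:
    """Wrap raw PCM16 audio in a WAV header so ordinary players can open it.

    Assumes mono, signed 16-bit little-endian samples (an assumption, not
    confirmed by the KugelAudio docs). Pass sample_rate=16000 etc. for
    other pcm_* output formats.
    """
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)  # must match the requested output_format
        wav.writeframes(pcm)           # WAV stores PCM little-endian, written as-is


# Usage after the Quick Start call has written output.pcm:
# with open("output.pcm", "rb") as f:
#     pcm_to_wav(f.read(), "output.wav")
```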

Node.js SDK

import { ElevenLabsClient } from "elevenlabs";

const client = new ElevenLabsClient({
  apiKey: "your-kugelaudio-api-key",
  baseUrl: "https://api.kugelaudio.com/11labs",
});

const stream = await client.textToSpeech.convertAsStream("480", {
  text: "Hello from KugelAudio!",
  modelId: "kugel-3",
  outputFormat: "pcm_24000",
});

Migrating from ElevenLabs

The only changes needed:
  1. Replace base_url: point it at your KugelAudio server, e.g. https://api.kugelaudio.com/11labs.
  2. Update voice_id: use KugelAudio voice IDs, not ElevenLabs IDs.
  3. Update output_format: use a PCM format for the lowest overhead, or MP3 for integrations that expect ElevenLabs’ default response shape (see Output Formats).
# Before
client = ElevenLabs(api_key="your-elevenlabs-key")

# After
client = ElevenLabs(
    api_key="your-kugelaudio-key",
    base_url="https://api.kugelaudio.com/11labs",
)
List your available voices to get the right IDs:
voices = client.voices.get_all()
for v in voices.voices:
    print(f"{v.voice_id}: {v.name}")

Migrating a streaming integration

ElevenLabs’ text_chunker flushes on every internal trigger, and ElevenLabs’ WebSocket protocol tolerates mid-stream flushes because each flush is comparatively cheap. KugelAudio’s /ws/tts/stream does not: every flush triggers a fresh model prefill. Translating mechanically, i.e. treating flush=True on KugelAudio as equivalent to flush=true on ElevenLabs, is the single most common source of poor time-to-first-audio (TTFA) when porting an existing ElevenLabs integration. See Chunking & per-segment latency for why. Translate as follows:
| ElevenLabs pattern | KugelAudio equivalent |
|---|---|
| send(text, flush=True) after every chunk | send(text) with no flush; let the server’s text buffer do the chunking. |
| try_trigger_generation=True | Default behavior: the server starts generation at sentence boundaries automatically. |
| auto_mode=true | Same name on KugelAudio (StreamConfig.auto_mode). |
| One context per turn | One StreamingSession per turn; see Turn lifecycle. |
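
To make the two columns concrete, here is a minimal sketch of the JSON messages a single turn produces. It assumes the ElevenLabs-style {"text": ...} message shape and the empty-text end-of-stream message used by the stream-input endpoint under Supported Endpoints; the flush field in the anti-pattern is ElevenLabs’ own. Both function names are hypothetical, and KugelAudio’s native /ws/tts/stream messages may differ.

```python
import json
from typing import Iterable, List


def ported_naively(tokens: Iterable[str]) -> List[str]:
    """Anti-pattern: flushing after every chunk forces one model prefill per chunk."""
    messages = [json.dumps({"text": t, "flush": True}) for t in tokens]
    messages.append(json.dumps({"text": ""}))  # end of stream
    return messages


def ported_correctly(tokens: Iterable[str]) -> List[str]:
    """Send text as it arrives with no flush; the server buffers up to sentence boundaries."""
    messages = [json.dumps({"text": t}) for t in tokens]
    messages.append(json.dumps({"text": ""}))  # exactly one end-of-stream per turn
    return messages
```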

Output Formats

KugelAudio generates audio natively as 24 kHz PCM16. Other sample rates, lower or higher, are produced by server-side resampling. MP3 output is encoded server-side for ElevenLabs-compatible tools that expect audio/mpeg.
| Format | Status | Notes |
|---|---|---|
| pcm_24000 | ✅ Recommended | Native rate, zero conversion cost |
| pcm_22050 | ✅ Supported | |
| pcm_16000 | ✅ Supported | Common for telephony |
| pcm_8000 | ✅ Supported | |
| pcm_44100 | ✅ Supported | Higher-rate PCM for ElevenLabs compatibility |
| mp3_44100_128 | ✅ Supported | ElevenLabs default; also selected when Accept: audio/mpeg is sent without output_format |
| mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_192 | ✅ Supported | |
| mp3_22050_32 | ✅ Supported | Lower-bandwidth MP3 |
| ulaw_8000 | ✅ Supported | G.711 µ-law at 8 kHz; audio/basic, audio.ulaw |
| alaw_8000 | ✅ Supported | G.711 A-law at 8 kHz; audio/basic, audio.alaw |
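
Client code often needs the sample rate of a format (to configure a player) or the playing time of a PCM buffer. A minimal sketch, assuming the codec_samplerate[_kbps] naming pattern visible in the table above holds for every format and that PCM output is mono; both helper names are hypothetical. At the native pcm_24000 rate, mono PCM16 is 2 × 24,000 = 48,000 bytes per second.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str                   # "pcm", "mp3", "ulaw" or "alaw"
    sample_rate: int             # Hz
    bitrate_kbps: Optional[int]  # present for MP3 formats only


def parse_output_format(name: str) -> OutputFormat:
    """Split a name such as 'pcm_24000' or 'mp3_44100_128' into its parts."""
    parts = name.split("_")
    if len(parts) == 2:
        codec, rate = parts
        return OutputFormat(codec, int(rate), None)
    if len(parts) == 3:
        codec, rate, kbps = parts
        return OutputFormat(codec, int(rate), int(kbps))
    raise ValueError(f"unrecognized output_format: {name!r}")


def pcm16_duration_seconds(num_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Playing time of raw PCM16 audio: 2 bytes per sample per channel."""
    return num_bytes / (2 * channels * sample_rate)
```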

Open WebUI

Open WebUI’s ElevenLabs TTS path sends Accept: audio/mpeg and saves the response as an .mp3 file. KugelAudio honors that header on /11labs/v1/text-to-speech/{voice_id} and returns audio/mpeg MP3 bytes when no explicit output_format query parameter is present.
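
The same content negotiation works from any HTTP client. A minimal sketch using only the standard library that builds, but does not send, the request described above: Accept: audio/mpeg and no output_format query parameter. The helper name is hypothetical; the model_id and voice ID mirror the Troubleshooting examples.

```python
import json
import urllib.request

BASE_URL = "https://api.kugelaudio.com/11labs"


def build_mp3_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build an Open WebUI-style TTS request that should come back as audio/mpeg."""
    body = json.dumps({"text": text, "model_id": "kugel-3"}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/text-to-speech/{voice_id}",  # no ?output_format=..., so Accept decides
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
    )


# Sending it performs a network call:
# req = build_mp3_request("480", "Hello world", "your-api-key")
# with urllib.request.urlopen(req) as resp, open("test.mp3", "wb") as f:
#     f.write(resp.read())
```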

Optional client-side G.711 conversion

KugelAudio can emit ulaw_8000 and alaw_8000 directly. If you need to convert an existing PCM stream client-side, resample to 8 kHz first:
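import audioop  # stdlib through Python 3.12; removed in 3.13 (the audioop-lts package restores it)

# audio_stream: an iterator of 24 kHz PCM16 chunks, e.g. from client.text_to_speech.convert(..., output_format="pcm_24000")
pcm_bytes = b"".join(chunk for chunk in audio_stream)
# ratecv(fragment, width, nchannels, inrate, outrate, state) returns (converted, new_state)
pcm_8k, _ = audioop.ratecv(pcm_bytes, 2, 1, 24000, 8000, None)

ulaw_bytes = audioop.lin2ulaw(pcm_8k, 2)
alaw_bytes = audioop.lin2alaw(pcm_8k, 2)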
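
On Python 3.13 and later, where audioop is no longer in the standard library, one alternative is to request pcm_8000 so the server does the resampling, then encode µ-law in pure Python. A minimal sketch of the standard G.711 µ-law segment encoder (bias 0x84, clip 32635), assuming little-endian signed 16-bit mono input; the function names are hypothetical. Requesting ulaw_8000 directly avoids this step entirely.

```python
import struct

ULAW_BIAS = 0x84   # 132, added before the segment lookup (standard G.711 µ-law)
ULAW_CLIP = 32635  # largest magnitude that still fits in 15 bits after adding the bias


def linear16_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 µ-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    exponent = magnitude.bit_length() - 8            # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4 bits just below the leading 1
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # µ-law bytes are stored inverted


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian PCM16 mono bytes (already at 8 kHz) to µ-law."""
    if len(pcm) % 2:
        raise ValueError("PCM16 input must contain an even number of bytes")
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return bytes(linear16_to_ulaw(s) for s in samples)
```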

Supported Endpoints

Paths below are relative to https://api.kugelaudio.com/11labs.

Text-to-Speech

| Endpoint | Method | Status |
|---|---|---|
| /v1/text-to-speech/{voice_id} | POST | ✅ Supported |
| /v1/text-to-speech/{voice_id}/stream | POST | ✅ Supported |
| /v1/text-to-speech/{voice_id}/stream-input | WebSocket | ✅ Supported |

About stream-input: feed text tokens as they arrive from an LLM; synthesis starts as soon as a sentence boundary is detected, which minimizes time-to-first-audio. The server sends ElevenLabs-format audio frames ({"audio": "<base64>", "isFinal": false}), then a final {"audio": "", "isFinal": true} frame, and then closes the WebSocket with code 1000. The official ElevenLabs Python SDK (convert_realtime) depends on that normal close: it keeps reading until the server closes the connection and does not stop on isFinal alone.
import asyncio, base64, json
import websockets

async def stream_tts():
    url = "wss://api.kugelaudio.com/11labs/v1/text-to-speech/480/stream-input?model_id=eleven_turbo_v2&output_format=pcm_24000"
    # websockets >= 14 names this parameter additional_headers; older releases use extra_headers
    async with websockets.connect(url, additional_headers={"xi-api-key": "your-api-key"}) as ws:
        # Send text tokens one by one (e.g. from an LLM stream)
        for token in ["Hello, ", "this is ", "streamed ", "speech."]:
            await ws.send(json.dumps({"text": token}))

        # Signal end of stream
        await ws.send(json.dumps({"text": ""}))

        # Receive audio frames
        with open("output.pcm", "wb") as f:
            async for msg in ws:
                frame = json.loads(msg)
                if frame.get("isFinal"):
                    break
                if audio := frame.get("audio"):
                    f.write(base64.b64decode(audio))

asyncio.run(stream_tts())
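
The example above sends every token before reading any audio, which is fine for a demo but delays playback when text trickles in from an LLM. A minimal sketch of the same protocol with the sender and receiver running concurrently under asyncio.gather, so audio frames are written while tokens are still being sent. It assumes a websockets-style connection object (an async send() method plus async iteration over incoming messages); pump_stream_input and llm_tokens are hypothetical names.

```python
import asyncio
import base64
import json
from typing import AsyncIterable, Callable


async def pump_stream_input(
    ws,
    tokens: AsyncIterable[str],
    write: Callable[[bytes], object],
) -> None:
    """Send text tokens and receive audio frames concurrently on one stream-input socket."""

    async def send_text() -> None:
        async for token in tokens:
            await ws.send(json.dumps({"text": token}))
        await ws.send(json.dumps({"text": ""}))  # signal end of stream

    async def receive_audio() -> None:
        async for msg in ws:
            frame = json.loads(msg)
            if audio := frame.get("audio"):  # the final frame carries empty audio
                write(base64.b64decode(audio))
            if frame.get("isFinal"):
                break

    await asyncio.gather(send_text(), receive_audio())


# Usage inside stream_tts(), with llm_tokens() as a hypothetical async token generator:
# async with websockets.connect(url, additional_headers={"xi-api-key": "your-api-key"}) as ws:
#     with open("output.pcm", "wb") as f:
#         await pump_stream_input(ws, llm_tokens(), f.write)
```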

Voices

| Endpoint | Method | Status |
|---|---|---|
| /v1/voices | GET | ✅ Supported |
| /v1/voices/{voice_id} | GET | ✅ Supported |
| /v1/voices/add | POST | ❌ Not supported |
| /v1/voices/{voice_id}/edit | POST | ❌ Not supported |

Other

| Endpoint | Method | Status |
|---|---|---|
| /v1/models | GET | ✅ Supported |
| /v1/user | GET | ⚠️ Stub |
| /v1/user/subscription | GET | ⚠️ Stub |
| /v1/history | GET | ⚠️ Stub |

Available Models

| Model ID (ElevenLabs alias) | KugelAudio model | Description |
|---|---|---|
| eleven_turbo_v2, eleven_turbo_v2_5 | kugel-3 | Fast, low-latency |
| eleven_multilingual_v2 | kugel-3 | High quality, multilingual |

You can also pass the KugelAudio model ID directly: kugel-3.

Parameter Mapping

| ElevenLabs | KugelAudio | Notes |
|---|---|---|
| voice_id | voice_id | Use KugelAudio voice IDs |
| model_id | model | See Available Models above |
| similarity_boost | cfg_scale | cfg_scale = 1.0 + (similarity_boost × 2.0), clamped to the accepted [1.2, 2.5] range |
| stability | Not used | |
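
The similarity_boost conversion from the table, written out as a function. A minimal sketch for predicting which cfg_scale a given ElevenLabs voice_settings value maps to; the function name is hypothetical, and the server’s own floating-point handling may differ slightly.

```python
def similarity_boost_to_cfg_scale(similarity_boost: float) -> float:
    """cfg_scale = 1.0 + 2.0 * similarity_boost, clamped to [1.2, 2.5]."""
    raw = 1.0 + similarity_boost * 2.0
    return min(max(raw, 1.2), 2.5)
```

Because of the clamp, every similarity_boost at or below 0.1 maps to 1.2, and every value at or above 0.75 maps to 2.5; only the range in between changes the result.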

Troubleshooting

# Check server health
curl https://api.kugelaudio.com/11labs/health

# List voices
curl -H "xi-api-key: your-api-key" https://api.kugelaudio.com/11labs/v1/voices | jq '.voices[:5]'

# Test PCM TTS
curl -X POST "https://api.kugelaudio.com/11labs/v1/text-to-speech/480?output_format=pcm_24000" \
  -H "xi-api-key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "kugel-3"}' \
  --output test.pcm

# Test Open WebUI-style MP3 TTS
curl -X POST https://api.kugelaudio.com/11labs/v1/text-to-speech/480 \
  -H "xi-api-key: your-api-key" \
  -H "Accept: audio/mpeg" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "kugel-3"}' \
  --output test.mp3

See also: Python SDK and JavaScript SDK, the native KugelAudio SDKs with full feature access.