ElevenLabs API Compatibility

KugelAudio exposes an ElevenLabs-compatible HTTP API, so most integrations built for ElevenLabs work after pointing base_url at KugelAudio. You will also need KugelAudio voice IDs and, in most cases, an explicit output_format (see Migrating from ElevenLabs).

Quick Start

Python SDK

from elevenlabs import ElevenLabs

client = ElevenLabs(
    api_key="your-kugelaudio-api-key",
    base_url="https://api.kugelaudio.com/11labs",
)

audio = client.text_to_speech.convert(
    voice_id="480",  # use client.voices.get_all() to list available voices
    text="Hello from KugelAudio!",
    model_id="kugel-3",
    output_format="pcm_24000",
)

with open("output.pcm", "wb") as f:
    for chunk in audio:
        f.write(chunk)

Node.js SDK

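import { ElevenLabsClient } from "elevenlabs";

const client = new ElevenLabsClient({
  apiKey: "your-kugelaudio-api-key",
  baseUrl: "https://api.kugelaudio.com/11labs",
});

// Field casing depends on the SDK version: 1.x of the "elevenlabs" package expects
// snake_case (model_id, output_format); the newer "@elevenlabs/elevenlabs-js" package
// uses camelCase (modelId, outputFormat) and names this method stream().
const stream = await client.textToSpeech.convertAsStream("480", {
  text: "Hello from KugelAudio!",
  model_id: "kugel-3",
  output_format: "pcm_24000",
});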

Migrating from ElevenLabs

The only changes needed:

Replace base_url — point to your KugelAudio server
Update voice_id — use KugelAudio voice IDs (not ElevenLabs IDs)
Update output_format — use a PCM format for lowest overhead, or MP3 for integrations that require ElevenLabs’ default response shape (see Output Formats)

# Before
client = ElevenLabs(api_key="your-elevenlabs-key")

# After
client = ElevenLabs(
    api_key="your-kugelaudio-key",
    base_url="https://api.kugelaudio.com/11labs",
)

List your available voices to get the right IDs:

voices = client.voices.get_all()
for v in voices.voices:
    print(f"{v.voice_id}: {v.name}")

Migrating a streaming integration

ElevenLabs’ text_chunker flushes on every internal trigger, and their WebSocket protocol tolerates mid-stream flushes because each flush is comparatively cheap. KugelAudio’s /ws/tts/stream does not: each flush triggers a fresh model prefill. Translating mechanically (treating flush=True on KugelAudio as equivalent to flush=true on ElevenLabs) is the single most common source of poor time-to-first-audio (TTFA) when porting an existing ElevenLabs integration. See Chunking & per-segment latency for why. The right translation:

ElevenLabs pattern	KugelAudio equivalent
`send(text, flush=True)` after every chunk	`send(text)` with no flush; let the server’s text buffer chunk.
`try_trigger_generation=True`	Default behavior. The server starts generation at sentence boundaries automatically.
`auto_mode=true`	Same name on KugelAudio (`StreamConfig.auto_mode`).
One context per turn	One `StreamingSession` per turn — see Turn lifecycle.
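
As a rough illustration of the first row, the sketch below takes the (text, flush) pairs an ElevenLabs-style sender might produce and emits messages with the per-chunk flush dropped, followed by one end-of-stream message. The helper name `without_per_chunk_flush` is hypothetical, and the message shape ({"text": ...}, then {"text": ""}) is assumed from the stream-input example in this document rather than from the native StreamingSession API.

```python
import json
from typing import Iterable, Iterator, Tuple


def without_per_chunk_flush(chunks: Iterable[Tuple[str, bool]]) -> Iterator[str]:
    """Yield JSON text messages, ignoring the per-chunk flush flag (hypothetical helper)."""
    for text, _flush in chunks:
        # An empty text message means end of stream, so empty chunks are skipped here.
        if text:
            yield json.dumps({"text": text})
    # One empty-text message at the end signals that no more text will follow.
    yield json.dumps({"text": ""})


# ElevenLabs-style input in which every chunk asked for a flush
ported = list(without_per_chunk_flush([("Hello, ", True), ("world.", True)]))
```

The server's text buffer is then free to pick sentence boundaries itself, which is the behavior the table describes.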

Output Formats

KugelAudio generates audio natively as 24 kHz PCM16. Other sample rates are resampled server-side. MP3 output is encoded server-side for ElevenLabs-compatible tools that expect audio/mpeg.

Format	Status	Notes
`pcm_24000`	✅ Recommended	Native rate, zero conversion cost
`pcm_22050`	✅ Supported
`pcm_16000`	✅ Supported	Common for telephony
`pcm_8000`	✅ Supported
`pcm_44100`	✅ Supported	Higher-rate PCM for ElevenLabs compatibility
`mp3_44100_128`	✅ Supported	ElevenLabs default; also selected when `Accept: audio/mpeg` is sent without `output_format`
`mp3_44100_32`, `mp3_44100_64`, `mp3_44100_96`, `mp3_44100_192`	✅ Supported
`mp3_22050_32`	✅ Supported	Lower-bandwidth MP3
`ulaw_8000`	✅ Supported	G.711 µ-law at 8 kHz; `audio/basic`, `audio.ulaw`
`alaw_8000`	✅ Supported	G.711 A-law at 8 kHz; `audio/basic`, `audio.alaw`
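
The pcm_* formats are headerless, so most audio players cannot open them directly. A minimal sketch for wrapping a PCM16 response in a WAV container with Python's standard wave module follows. It assumes mono audio, which is what the single-channel resampling call later in this document implies; the function name `pcm16_to_wav` is illustrative only.

```python
import io
import wave


def pcm16_to_wav(pcm_bytes: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit PCM in a WAV container (mono is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # PCM16 uses 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


# One second of silence at the native 24 kHz rate
wav_bytes = pcm16_to_wav(b"\x00\x00" * 24000)
```

Pass the sample rate that matches the requested format, for example 16000 for pcm_16000.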

Open WebUI

Open WebUI’s ElevenLabs TTS path sends Accept: audio/mpeg and saves the response as an .mp3 file. KugelAudio honors that header on /11labs/v1/text-to-speech/{voice_id} and returns audio/mpeg MP3 bytes when no explicit output_format query parameter is present.
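
The precedence described above can be sketched as a small function. This is an illustration of the documented rule, not the server's code: the name `resolve_output_format` is hypothetical, and the function returns None when neither input decides the format, because the fallback in that case is not covered here.

```python
from typing import Optional


def resolve_output_format(query_format: Optional[str], accept: Optional[str]) -> Optional[str]:
    """Pick the response format from the query parameter and Accept header."""
    # An explicit output_format query parameter always wins.
    if query_format:
        return query_format
    # Without it, Accept: audio/mpeg selects the ElevenLabs default MP3 format.
    if accept and "audio/mpeg" in accept.lower():
        return "mp3_44100_128"
    # Neither input decides; the server default applies (not modeled here).
    return None


open_webui_choice = resolve_output_format(None, "audio/mpeg")
```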

Optional client-side G.711 conversion

KugelAudio can emit ulaw_8000 and alaw_8000 directly. If you need to convert an existing PCM stream client-side, resample to 8 kHz first:

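# Note: audioop was removed from the standard library in Python 3.13;
# on newer interpreters the audioop-lts backport package provides the same module.
import audioop

# audio_stream: an iterable of 24 kHz mono PCM16 chunks, e.g. the result of
# client.text_to_speech.convert(..., output_format="pcm_24000")
pcm_bytes = b"".join(audio_stream)

# Resample 24 kHz -> 8 kHz (2 bytes per sample, 1 channel)
pcm_8k = audioop.ratecv(pcm_bytes, 2, 1, 24000, 8000, None)[0]

# Encode the 16-bit samples as 8-bit G.711
ulaw_bytes = audioop.lin2ulaw(pcm_8k, 2)
alaw_bytes = audioop.lin2alaw(pcm_8k, 2)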

Supported Endpoints

Text-to-Speech

Endpoint	Method	Status
`/v1/text-to-speech/{voice_id}`	POST	✅ Supported
`/v1/text-to-speech/{voice_id}/stream`	POST	✅ Supported
`/v1/text-to-speech/{voice_id}/stream-input`	WebSocket	✅ Supported

About stream-input: Feed text tokens as they arrive from an LLM — synthesis starts as soon as a sentence boundary is detected, minimizing time-to-first-audio. The server sends ElevenLabs-format audio frames ({"audio": "<base64>", "isFinal": false}), then {"audio": "", "isFinal": true}, then closes the WebSocket with code 1000. That normal close is required for the official ElevenLabs Python SDK (convert_realtime), which keeps reading until the server closes (it does not stop on isFinal alone).

import asyncio, base64, json
import websockets

async def stream_tts():
    url = "wss://api.kugelaudio.com/11labs/v1/text-to-speech/480/stream-input?model_id=eleven_turbo_v2&output_format=pcm_24000"
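    # Older websockets releases take extra_headers; from websockets 14 the default
    # connect() expects additional_headers instead.
    async with websockets.connect(url, extra_headers={"xi-api-key": "your-api-key"}) as ws: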
        # Send text tokens one by one (e.g. from an LLM stream)
        for token in ["Hello, ", "this is ", "streamed ", "speech."]:
            await ws.send(json.dumps({"text": token}))

        # Signal end of stream
        await ws.send(json.dumps({"text": ""}))

        # Receive audio frames
        with open("output.pcm", "wb") as f:
            async for msg in ws:
                frame = json.loads(msg)
                if frame.get("isFinal"):
                    break
                if audio := frame.get("audio"):
                    f.write(base64.b64decode(audio))

asyncio.run(stream_tts())

Voices

Endpoint	Method	Status
`/v1/voices`	GET	✅ Supported
`/v1/voices/{voice_id}`	GET	✅ Supported
`/v1/voices/add`	POST	❌ Not supported
`/v1/voices/{voice_id}/edit`	POST	❌ Not supported

Other

Endpoint	Method	Status
`/v1/models`	GET	✅ Supported
`/v1/user`	GET	⚠️ Stub
`/v1/user/subscription`	GET	⚠️ Stub
`/v1/history`	GET	⚠️ Stub

Available Models

Model ID (ElevenLabs alias)	KugelAudio model	Description
`eleven_turbo_v2`, `eleven_turbo_v2_5`	`kugel-3`	Fast, low-latency
`eleven_multilingual_v2`	`kugel-3`	High quality, multilingual

You can also pass the KugelAudio model ID directly: kugel-3.
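
For client code that wants to log or validate which model a request will use, the alias table can be mirrored as a lookup. This is a sketch built only from the table above; the names `MODEL_ALIASES` and `resolve_model` are hypothetical, and passing unknown IDs through unchanged is an assumption, not documented server behavior.

```python
# ElevenLabs model IDs and the KugelAudio model each one maps to (from the table above)
MODEL_ALIASES = {
    "eleven_turbo_v2": "kugel-3",
    "eleven_turbo_v2_5": "kugel-3",
    "eleven_multilingual_v2": "kugel-3",
}


def resolve_model(model_id: str) -> str:
    """Return the KugelAudio model for an alias; other IDs pass through unchanged."""
    return MODEL_ALIASES.get(model_id, model_id)


resolved = resolve_model("eleven_turbo_v2_5")  # "kugel-3"
```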

Parameter Mapping

ElevenLabs	KugelAudio	Notes
`voice_id`	`voice_id`	Use KugelAudio voice IDs
`model_id`	`model`	See model table above
`similarity_boost`	`cfg_scale`	`cfg_scale = 1.0 + (similarity_boost × 2.0)`, clamped to the accepted `[1.2, 2.5]` range
`stability`	—	Not used
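
To see what a given similarity_boost turns into, the formula and clamp from the table can be written out as below. This is a worked sketch of the stated mapping, not the server's implementation, and the function name is illustrative.

```python
def similarity_boost_to_cfg_scale(similarity_boost: float) -> float:
    """Apply cfg_scale = 1.0 + similarity_boost * 2.0, clamped to [1.2, 2.5]."""
    raw = 1.0 + similarity_boost * 2.0
    return min(2.5, max(1.2, raw))


low = similarity_boost_to_cfg_scale(0.0)   # raw 1.0, clamped up to 1.2
mid = similarity_boost_to_cfg_scale(0.5)   # raw 2.0, inside the range
high = similarity_boost_to_cfg_scale(1.0)  # raw 3.0, clamped down to 2.5
```

In practice this means similarity_boost values below 0.1 or above 0.75 all land on the range limits.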

Troubleshooting

# Check server health
curl https://api.kugelaudio.com/11labs/health

# List voices
curl -H "xi-api-key: your-api-key" https://api.kugelaudio.com/11labs/v1/voices | jq '.voices[:5]'

# Test PCM TTS (output_format set explicitly so the response is raw PCM)
curl -X POST "https://api.kugelaudio.com/11labs/v1/text-to-speech/480?output_format=pcm_24000" \
  -H "xi-api-key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "kugel-3"}' \
  --output test.pcm

# Test Open WebUI-style MP3 TTS
curl -X POST https://api.kugelaudio.com/11labs/v1/text-to-speech/480 \
  -H "xi-api-key: your-api-key" \
  -H "Accept: audio/mpeg" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "kugel-3"}' \
  --output test.mp3
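
When curl is not available, the same request can be assembled with Python's standard library. The sketch below only builds the request object so it can be inspected before anything is sent; the helper name `build_tts_request` is hypothetical, and the URL, header, and body fields are taken from the curl commands above.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.kugelaudio.com/11labs"


def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "kugel-3",
                      output_format: str = "pcm_24000") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech POST request."""
    query = urlencode({"output_format": output_format})
    url = f"{BASE_URL}/v1/text-to-speech/{quote(voice_id)}?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


req = build_tts_request("480", "Hello world", "your-api-key")
# To send it: with urllib.request.urlopen(req) as resp: audio = resp.read()
```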
