Manage up to 20 independent audio streams over a single WebSocket connection. This is useful for multi-speaker conversations, pre-buffering, and interleaved audio. For the concepts behind this endpoint, see Multi-context streaming.
WebSocket

Connection

wss://api.kugelaudio.com/ws/tts/multi?api_key=YOUR_API_KEY

Client → Server Messages

| Message | Description |
| --- | --- |
| `{"text": " ", "context_id": "ctx1", "voice_settings": {"voice_id": 1071}}` | Initialize context with voice |
| `{"text": "Hello", "context_id": "ctx1"}` | Send text to context |
| `{"text": "...", "context_id": "ctx1", "flush": true}` | Send text and flush buffer |
| `{"flush": true, "context_id": "ctx1"}` | Flush context buffer |
| `{"text": "", "context_id": "ctx1"}` | Keep-alive: an empty-text frame resets the context's inactivity timeout without generating audio |
| `{"close_context": true, "context_id": "ctx1"}` | Close a context, letting queued sentences finish first |
| `{"close_context": true, "context_id": "ctx1", "immediate": true}` | Barge-in: cancel the context's in-flight generation immediately and drop buffered text (see Barge-in) |
| `{"close_socket": true}` | Close all contexts and connection |
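The frames in the table above can be built with a few small helpers. This is a minimal sketch, not part of any official SDK: the function names (`init_context`, `send_text`, `flush_context`, `keep_alive`, `close_context`, `close_socket`) are hypothetical, and only the JSON shapes come from the table.

```python
import json

# Hypothetical helper names; only the JSON shapes come from the message table.

def init_context(context_id, voice_id, **voice_settings):
    """Initialize a context: single-space text plus nested voice_settings."""
    settings = {"voice_id": voice_id, **voice_settings}
    return json.dumps({"text": " ", "context_id": context_id,
                       "voice_settings": settings})

def send_text(context_id, text, flush=False):
    """Send text to a context, optionally flushing its buffer in the same frame."""
    msg = {"text": text, "context_id": context_id}
    if flush:
        msg["flush"] = True
    return json.dumps(msg)

def flush_context(context_id):
    """Flush the context buffer without sending new text."""
    return json.dumps({"flush": True, "context_id": context_id})

def keep_alive(context_id):
    """Empty-text frame: resets the inactivity timeout, generates no audio."""
    return json.dumps({"text": "", "context_id": context_id})

def close_context(context_id, immediate=False):
    """Graceful close by default; immediate=True is the barge-in variant."""
    msg = {"close_context": True, "context_id": context_id}
    if immediate:
        msg["immediate"] = True
    return json.dumps(msg)

def close_socket():
    """Close all contexts and the connection."""
    return json.dumps({"close_socket": True})
```

Each helper returns a JSON string ready for `ws.send(...)`. Note the difference between the initialization frame (`" "`, a single space) and the keep-alive frame (`""`, empty text).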

Server → Client Messages

| Message | Description |
| --- | --- |
| `{"context_created": true, "context_id": "ctx1"}` | Context created |
| `{"generation_started": true, "context_id": "ctx1", "chunk_id": 0, "text": "..."}` | Generation started |
| `{"audio": "base64...", "enc": "pcm_s16le", "context_id": "ctx1", "idx": 0, "sr": 24000, "samples": 4800, "chunk_id": 0}` | Audio chunk (see field reference) |
| `{"chunk_complete": true, "context_id": "ctx1", "chunk_id": 0, "audio_seconds": 1.2, "gen_ms": 150}` | Chunk complete |
| `{"word_timestamps": [...], "context_id": "ctx1", "chunk_id": 0}` | Word-level time alignments (when enabled) |
| `{"final": true, "context_id": "ctx1"}` | End of audio for a flush (ElevenLabs is_final equivalent): every audio frame for text sent before your `{"flush": true}` has been delivered. Also sent right before context_closed on a graceful close. Not sent on an immediate (barge-in) close |
| `{"context_closed": true, "context_id": "ctx1", "usage": {"audio_seconds": 4.1, "cost_cents": 0.37, "currency": "eur", "model_id": "kugel-3"}}` | Context closed (terminal; all audio sent). usage carries this conversation's audio time and the amount charged (EUR cents; null + cost_unavailable if undetermined) |
| `{"session_closed": true, "total_audio_seconds": 5.4}` | Session ended (all contexts). Per-conversation usage is on each context_closed, not here |
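Because frames from all contexts arrive interleaved on one socket, a client typically routes each message by `context_id`. The sketch below shows one way to do that: it classifies server messages by their marker field and accumulates decoded `pcm_s16le` audio per context. The class name `ContextAudio` and its methods are hypothetical. Writing the WAV as mono, 16-bit is an assumption, since the table does not state a channel count.

```python
import base64
import io
import wave
from collections import defaultdict

# Marker fields taken from the server message table above.
EVENT_KEYS = ("context_created", "generation_started", "chunk_complete",
              "word_timestamps", "final", "context_closed", "session_closed")

class ContextAudio:
    """Hypothetical per-context accumulator for interleaved audio frames."""

    def __init__(self):
        self.pcm = defaultdict(bytearray)   # context_id -> raw PCM bytes
        self.sample_rate = {}               # context_id -> last seen "sr"
        self.finished = set()               # contexts that reported "final"

    def handle(self, msg):
        """Route one decoded server message; return the kind of event seen."""
        ctx = msg.get("context_id")
        if "audio" in msg:
            if msg.get("enc") != "pcm_s16le":
                raise ValueError(f"unexpected encoding: {msg.get('enc')!r}")
            self.pcm[ctx].extend(base64.b64decode(msg["audio"]))
            self.sample_rate[ctx] = msg["sr"]
            return "audio"
        for key in EVENT_KEYS:
            if key in msg:
                if key == "final":
                    self.finished.add(ctx)
                return key
        return "other"

    def to_wav(self, ctx):
        """Return the context's audio as WAV bytes (assumes mono, 16-bit)."""
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)                    # assumption: mono output
            w.setsampwidth(2)                    # s16le = 2 bytes per sample
            w.setframerate(self.sample_rate[ctx])
            w.writeframes(bytes(self.pcm[ctx]))
        return buf.getvalue()
```

In a receive loop this would be called as `kind = acc.handle(json.loads(message))`. Once a context appears in `acc.finished`, `acc.to_wav(ctx)` holds all audio for the text flushed so far.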

Voice Settings

When creating a context, pass voice settings as a nested object:
{
  "voice_settings": {
    "voice_id": 1071,
    "cfg_scale": 2.0,
    "max_new_tokens": 2048
  }
}

Session-Level Config

These options can be set on any message and apply to the entire session:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_id | string | kugel-3 | Model to use for generation. Use kugel-3 for new integrations. |
| sample_rate | integer | 24000 | Output sample rate in Hz. Options: 8000, 16000, 22050, 24000 |
| output_format | string | - | Combined codec + rate token (e.g. ulaw_8000); see Audio formats. Set once per session; may be sent top-level or inside voice_settings. |
| normalize | boolean | true | Enable text normalization |
| language | string | - | ISO 639-1 language code for normalization |
| word_timestamps | boolean | false | Enable word-level timestamp alignment |
| dictionary_ids | integer[] | omitted | Per-session dictionary selection. Omitted = all active dictionaries (language-filtered); [] = none; a list = exactly those (including inactive ones), bypassing the language filter |

Reuse the same context_id across turns to keep one context alive (recommended for a single conversation), or open new ids for parallel speakers:
// Create / address a context. Session-level fields (sample_rate,
// output_format, language, …) may be sent top-level or inside voice_settings.
{
  "context_id": "call-42",
  "text": "Hello, how can I help you today?",
  "output_format": "ulaw_8000",
  "voice_settings": { "voice_id": 1071, "cfg_scale": 2.0 }
}
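A client may want to validate session-level options before sending them. The following is a sketch under stated assumptions: the helper names `session_fields` and `first_message` are hypothetical, the checks are client-side only, and the server's own validation may differ. It keeps the distinction between omitting `dictionary_ids` (all active dictionaries) and sending `[]` (none).

```python
# Parameter names and allowed sample rates are taken from the table above.
KNOWN_FIELDS = {"model_id", "sample_rate", "output_format", "normalize",
                "language", "word_timestamps", "dictionary_ids"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}

def session_fields(**options):
    """Return only the session-level options that were explicitly set."""
    unknown = set(options) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown session options: {sorted(unknown)}")
    rate = options.get("sample_rate")
    if rate is not None and rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {rate}")
    # None means "omit the field"; an empty list is kept, because
    # dictionary_ids=[] (no dictionaries) differs from omitting it.
    return {k: v for k, v in options.items() if v is not None}

def first_message(context_id, text, voice_id, **options):
    """Build a frame that carries session-level fields at the top level."""
    return {
        "context_id": context_id,
        "text": text,
        "voice_settings": {"voice_id": voice_id},
        **session_fields(**options),
    }
```

For example, `first_message("call-42", "Hello", 1071, output_format="ulaw_8000")` produces a frame with the same layout as the JSON above, minus `cfg_scale`.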

Example

import asyncio
import websockets
import json
import base64

async def multi_speaker():
    uri = "wss://api.kugelaudio.com/ws/tts/multi?api_key=YOUR_API_KEY"

    async with websockets.connect(uri) as ws:
        # Create narrator context
        await ws.send(json.dumps({
            "text": " ",
            "context_id": "narrator",
            "voice_settings": {"voice_id": 1071},
        }))

        # Create character context
        await ws.send(json.dumps({
            "text": " ",
            "context_id": "character",
            "voice_settings": {"voice_id": 1072},
        }))

        # Send text to different speakers
        await ws.send(json.dumps({
            "text": "The story begins.",
            "context_id": "narrator",
            "flush": True,
        }))

        await ws.send(json.dumps({
            "text": "Hello, I'm the main character!",
            "context_id": "character",
            "flush": True,
        }))

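        # Receive audio from both contexts. session_closed only arrives after
        # close_socket, so request the close once every context has reported
        # "final" for its flushed text.
        pending = {"narrator", "character"}
        closing = False

        async for message in ws:
            data = json.loads(message)

            if "audio" in data:
                ctx = data["context_id"]
                audio_bytes = base64.b64decode(data["audio"])
                print(f"[{ctx}] Chunk {data['idx']}: {len(audio_bytes)} bytes")

            # "final" is sent again right before context_closed on a graceful
            # close, so ignore it once the close has been requested.
            if data.get("final") and not closing:
                pending.discard(data["context_id"])
                if not pending:
                    # All flushed audio delivered: close contexts and connection
                    await ws.send(json.dumps({"close_socket": True}))
                    closing = True

            if data.get("context_closed"):
                usage = data.get("usage", {})
                # Per-context (per-conversation) usage: audio time + charge (EUR cents)
                print(f"[{data['context_id']}] usage: {usage.get('audio_seconds')}s, "
                      f"{usage.get('cost_cents')} ct")

            if data.get("session_closed"):
                break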

asyncio.run(multi_speaker())
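Since per-conversation usage arrives on each `context_closed` frame rather than on `session_closed`, a client that wants a session total has to add it up itself. A minimal sketch, assuming the `usage` object has the shape shown in the server message table and that an undetermined charge appears as a null `cost_cents`. The function name `summarize_usage` is hypothetical.

```python
def summarize_usage(closed_messages):
    """Aggregate usage over a list of decoded context_closed messages."""
    total_seconds = 0.0
    total_cents = 0.0
    undetermined = []  # contexts whose charge could not be determined
    for msg in closed_messages:
        usage = msg.get("usage") or {}
        total_seconds += usage.get("audio_seconds") or 0.0
        cost = usage.get("cost_cents")
        if cost is None:
            # Assumption: a null cost_cents marks an undetermined charge.
            undetermined.append(msg.get("context_id"))
        else:
            total_cents += cost
    return {
        "audio_seconds": total_seconds,
        "cost_cents": total_cents,
        "undetermined": undetermined,
    }
```

Collect the `context_closed` frames in a list during the receive loop and pass that list in once `session_closed` arrives. Contexts listed under `undetermined` are excluded from the cost total, not counted as zero cost.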

Limits

  • Maximum 20 concurrent contexts per connection
  • Contexts auto-close after 20 seconds of inactivity (send the empty-text keep-alive to reset)
  • Opening a context beyond the limit returns a per-context error (error_code: "TOO_MANY_CONTEXTS", code: 429) without closing the connection — close an existing context, or wait for an idle one to be released, then retry.
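The inactivity timeout and the context limit can both be handled client-side. The sketch below makes several assumptions: the 10-second interval is an arbitrary margin below the 20-second timeout, `keep_alive_loop` and `is_context_limit_error` are hypothetical names, and the error frame is assumed to carry `error_code` as a top-level field. `ws` is any object with an async `send` method, such as a `websockets` connection.

```python
import asyncio
import json

KEEP_ALIVE_INTERVAL = 10.0  # seconds; assumed margin below the 20 s timeout

async def keep_alive_loop(ws, context_ids, interval=KEEP_ALIVE_INTERVAL,
                          rounds=None):
    """Send the empty-text keep-alive to each context every `interval` seconds.

    Runs until cancelled, or for `rounds` iterations when given.
    """
    done = 0
    while rounds is None or done < rounds:
        await asyncio.sleep(interval)
        # Copy the collection so callers can add or remove ids while we run.
        for ctx in list(context_ids):
            await ws.send(json.dumps({"text": "", "context_id": ctx}))
        done += 1

def is_context_limit_error(msg):
    """True for the per-context error returned when the limit is exceeded."""
    return msg.get("error_code") == "TOO_MANY_CONTEXTS"
```

A typical use is `task = asyncio.create_task(keep_alive_loop(ws, open_contexts))` next to the receive loop, followed by `task.cancel()` before closing the socket. When `is_context_limit_error` matches, the connection remains open, so the client can close an existing context and retry the new one.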

Errors

See Error Codes for the full TTS error lookup table, including HTTP status codes, WebSocket close codes, and rate-limit behavior.