Multi-Context Streaming

Manage up to 20 independent audio streams over a single WebSocket connection. This is useful for multi-speaker conversations, pre-buffering, and interleaved audio. For the conceptual guide, see Multi-context streaming.

WebSocket

Connection

wss://api.kugelaudio.com/ws/tts/multi?api_key=YOUR_API_KEY

Client → Server Messages

Message	Description
`{"text": " ", "context_id": "ctx1", "voice_settings": {"voice_id": 1071}}`	Initialize context with voice
`{"text": "Hello", "context_id": "ctx1"}`	Send text to context
`{"text": "...", "context_id": "ctx1", "flush": true}`	Send text and flush buffer
`{"flush": true, "context_id": "ctx1"}`	Flush context buffer
`{"text": "", "context_id": "ctx1"}`	Keep-alive: an empty-text frame resets the context’s inactivity timeout without generating audio
`{"close_context": true, "context_id": "ctx1"}`	Close a context, letting queued sentences finish first
`{"close_context": true, "context_id": "ctx1", "immediate": true}`	Barge-in: cancel the context’s in-flight generation immediately and drop buffered text — see Barge-in
`{"close_socket": true}`	Close all contexts and connection
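The client frames above can be built in one place with a few small helpers. This is only a sketch: the function names (`init_context`, `send_text`, `keep_alive`, `close_context`) are hypothetical and not part of any SDK, while the JSON fields are the ones listed in the table.

```python
import json


def init_context(context_id: str, voice_id: int) -> str:
    """Initialize a context with a voice (text is a single space)."""
    return json.dumps({
        "text": " ",
        "context_id": context_id,
        "voice_settings": {"voice_id": voice_id},
    })


def send_text(context_id: str, text: str, flush: bool = False) -> str:
    """Send text to a context, optionally flushing its buffer."""
    msg = {"text": text, "context_id": context_id}
    if flush:
        msg["flush"] = True
    return json.dumps(msg)


def keep_alive(context_id: str) -> str:
    """Empty-text frame: resets the inactivity timeout, generates no audio."""
    return json.dumps({"text": "", "context_id": context_id})


def close_context(context_id: str, immediate: bool = False) -> str:
    """Graceful close by default; immediate=True is a barge-in."""
    msg = {"close_context": True, "context_id": context_id}
    if immediate:
        msg["immediate"] = True
    return json.dumps(msg)
```

Each helper returns a JSON string that can be passed directly to the WebSocket `send` call, for example `await ws.send(send_text("ctx1", "Hello", flush=True))`.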

Server → Client Messages

Message	Description
`{"context_created": true, "context_id": "ctx1"}`	Context created
`{"generation_started": true, "context_id": "ctx1", "chunk_id": 0, "text": "..."}`	Generation started
`{"audio": "base64...", "enc": "pcm_s16le", "context_id": "ctx1", "idx": 0, "sr": 24000, "samples": 4800, "chunk_id": 0}`	Audio chunk (field reference)
`{"chunk_complete": true, "context_id": "ctx1", "chunk_id": 0, "audio_seconds": 1.2, "gen_ms": 150}`	Chunk complete
`{"word_timestamps": [...], "context_id": "ctx1", "chunk_id": 0}`	Word-level time alignments (when enabled)
`{"final": true, "context_id": "ctx1"}`	End of audio for a flush (ElevenLabs `is_final` equivalent): every audio frame for text sent before your `{"flush": true}` has been delivered. Also sent right before `context_closed` on a graceful close. Not sent on an `immediate` (barge-in) close
`{"context_closed": true, "context_id": "ctx1", "usage": {"audio_seconds": 4.1, "cost_cents": 0.37, "currency": "eur", "model_id": "kugel-3"}}`	Context closed (terminal — all audio sent). `usage` carries this conversation’s audio time + amount charged (EUR cents; `null` + `cost_unavailable` if undetermined)
`{"session_closed": true, "total_audio_seconds": 5.4}`	Session ended (all contexts). Per-conversation usage is on each `context_closed`, not here
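Because every server message carries a distinguishing key, a client can route messages by checking which key is present. The sketch below shows one way to do that and to decode an audio frame; `classify` and `decode_audio` are hypothetical names, and the duration is derived from the `samples` and `sr` fields shown in the table.

```python
import base64

# Marker keys from the table above, checked in this order.
EVENT_KEYS = (
    "audio", "word_timestamps", "context_created", "generation_started",
    "chunk_complete", "final", "context_closed", "session_closed",
)


def classify(msg: dict) -> str:
    """Return the event name of a server message, or "unknown"."""
    for key in EVENT_KEYS:
        if key in msg:
            return key
    return "unknown"


def decode_audio(msg: dict) -> tuple[bytes, float]:
    """Decode an audio frame into raw bytes and its duration in seconds."""
    pcm = base64.b64decode(msg["audio"])
    duration = msg["samples"] / msg["sr"]
    return pcm, duration
```

For the audio frame shown in the table (`samples` 4800 at `sr` 24000), the duration works out to 0.2 seconds.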

Voice Settings

When creating a context, pass voice settings as a nested object:

{
  "voice_settings": {
    "voice_id": 1071,
    "cfg_scale": 2.0,
    "max_new_tokens": 2048
  }
}

Session-Level Config

These options can be set on any message and apply to the entire session:

Parameter	Type	Default	Description
`model_id`	string	`kugel-3`	Model to use for generation. Use `kugel-3` for new integrations.
`sample_rate`	integer	24000	Output sample rate in Hz. Options: 8000, 16000, 22050, 24000
`output_format`	string	-	Combined codec + rate token (e.g. `ulaw_8000`) — see Audio formats. Set-once per session; may be sent top-level or inside `voice_settings`.
`normalize`	boolean	true	Enable text normalization
`language`	string	-	ISO 639-1 language code for normalization
`word_timestamps`	boolean	false	Enable word-level timestamp alignment
`dictionary_ids`	`integer[]`	omitted	Per-session dictionary selection. Omitted = all active dictionaries (language-filtered); `[]` = none; a list = exactly those (including inactive ones), bypassing the language filter
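The difference between omitting `dictionary_ids` and sending `[]` is easy to lose when building messages programmatically. The sketch below is a hypothetical helper (`session_config` is not part of any SDK) that drops unset fields but keeps an explicit empty list, and rejects sample rates outside the options listed in the table.

```python
from typing import Optional

ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}


def session_config(
    sample_rate: Optional[int] = None,
    language: Optional[str] = None,
    word_timestamps: Optional[bool] = None,
    dictionary_ids: Optional[list[int]] = None,
) -> dict:
    """Build session-level fields, leaving out anything that is not set.

    None means "omit the field" so the server default applies. An empty
    list for dictionary_ids is kept, because [] means "no dictionaries".
    """
    if sample_rate is not None and sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    fields = {
        "sample_rate": sample_rate,
        "language": language,
        "word_timestamps": word_timestamps,
        "dictionary_ids": dictionary_ids,
    }
    return {key: value for key, value in fields.items() if value is not None}
```

The result can be merged into any outgoing message, for example `{"context_id": "ctx1", "text": "Hello", **session_config(language="de", dictionary_ids=[])}`.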

Reuse the same `context_id` across turns to keep one context alive (recommended for a single conversation), or open new IDs for parallel speakers:

// Create / address a context. Session-level fields (sample_rate,
// output_format, language, …) may be sent top-level or inside voice_settings.
{
  "context_id": "call-42",
  "text": "Hello, how can I help you today?",
  "output_format": "ulaw_8000",
  "voice_settings": { "voice_id": 1071, "cfg_scale": 2.0 }
}

Example

import asyncio
import websockets
import json
import base64

async def multi_speaker():
    uri = "wss://api.kugelaudio.com/ws/tts/multi?api_key=YOUR_API_KEY"

    async with websockets.connect(uri) as ws:
        # Create narrator context
        await ws.send(json.dumps({
            "text": " ",
            "context_id": "narrator",
            "voice_settings": {"voice_id": 1071},
        }))

        # Create character context
        await ws.send(json.dumps({
            "text": " ",
            "context_id": "character",
            "voice_settings": {"voice_id": 1072},
        }))

        # Send text to different speakers
        await ws.send(json.dumps({
            "text": "The story begins.",
            "context_id": "narrator",
            "flush": True,
        }))

        await ws.send(json.dumps({
            "text": "Hello, I'm the main character!",
            "context_id": "character",
            "flush": True,
        }))

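        # Receive audio from both contexts
        pending = {"narrator", "character"}  # contexts with audio still due
        async for message in ws:
            data = json.loads(message)

            if "audio" in data:
                ctx = data["context_id"]
                audio_bytes = base64.b64decode(data["audio"])
                print(f"[{ctx}] Chunk {data['idx']}: {len(audio_bytes)} bytes")

            # "final" marks the end of audio for a flush. Once both contexts
            # are done, close the socket so the server ends the session.
            if data.get("final") and data["context_id"] in pending:
                pending.discard(data["context_id"])
                if not pending:
                    await ws.send(json.dumps({"close_socket": True}))

            if data.get("context_closed"):
                usage = data.get("usage") or {}  # tolerate a missing or null usage
                # Per-context (per-conversation) usage: audio time + charge (EUR cents)
                print(f"[{data['context_id']}] usage: {usage.get('audio_seconds')}s, "
                      f"{usage.get('cost_cents')} ct")

            if data.get("session_closed"):
                break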

asyncio.run(multi_speaker())
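Since audio frames from different contexts arrive interleaved on one socket, a client usually needs to separate them again by `context_id`. The following sketch collects PCM per context and wraps it in a WAV container using the standard library. It assumes mono `pcm_s16le` audio and that frames for a context arrive in playback order; `ContextAudio` is a hypothetical helper, not part of any SDK.

```python
import base64
import io
import wave
from collections import defaultdict


class ContextAudio:
    """Collects PCM audio per context_id from decoded server messages."""

    def __init__(self) -> None:
        self.pcm = defaultdict(bytearray)  # context_id -> raw PCM bytes
        self.sample_rate = {}              # context_id -> sample rate in Hz

    def feed(self, msg: dict) -> None:
        """Append the audio of one server message; ignore other messages."""
        if "audio" not in msg:
            return
        ctx = msg["context_id"]
        self.pcm[ctx] += base64.b64decode(msg["audio"])
        self.sample_rate[ctx] = msg["sr"]

    def wav_bytes(self, ctx: str) -> bytes:
        """Return everything collected for one context as a WAV file."""
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)  # assumption: mono output
            wav.setsampwidth(2)  # 16-bit samples (pcm_s16le)
            wav.setframerate(self.sample_rate[ctx])
            wav.writeframes(bytes(self.pcm[ctx]))
        return buf.getvalue()
```

In the receive loop of the example above, calling `collector.feed(data)` for every message and writing `collector.wav_bytes("narrator")` to disk after `session_closed` would produce one file per speaker.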

Limits

Maximum 20 concurrent contexts per connection
Contexts auto-close after 20 seconds of inactivity (send the empty-text keep-alive to reset)
Opening a context beyond the limit returns a per-context error (error_code: "TOO_MANY_CONTEXTS", code: 429) without closing the connection — close an existing context, or wait for an idle one to be released, then retry.
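A client can track these limits locally to decide when a keep-alive is due and whether another context may be opened. The sketch below is an illustration under assumptions: `ContextTracker` is a hypothetical helper, and it assumes the inactivity timer restarts whenever the client sends a frame for that context, which should be checked against actual server behavior.

```python
import time
from typing import Optional

IDLE_TIMEOUT_S = 20.0  # contexts auto-close after this much inactivity
MAX_CONTEXTS = 20      # maximum concurrent contexts per connection


class ContextTracker:
    """Tracks open contexts and when each one was last sent a frame."""

    def __init__(self, margin_s: float = 5.0) -> None:
        self.margin_s = margin_s  # send keep-alives this long before timeout
        self.last_sent = {}       # context_id -> monotonic timestamp

    def can_open(self) -> bool:
        """True if opening one more context stays within the limit."""
        return len(self.last_sent) < MAX_CONTEXTS

    def touch(self, context_id: str, now: Optional[float] = None) -> None:
        """Record that a frame was just sent for this context."""
        self.last_sent[context_id] = time.monotonic() if now is None else now

    def close(self, context_id: str) -> None:
        """Forget a context once it has been closed."""
        self.last_sent.pop(context_id, None)

    def due_for_keep_alive(self, now: Optional[float] = None) -> list[str]:
        """Return contexts idle long enough that a keep-alive should be sent."""
        now = time.monotonic() if now is None else now
        threshold = IDLE_TIMEOUT_S - self.margin_s
        return [c for c, t in self.last_sent.items() if now - t >= threshold]
```

A background task could call `due_for_keep_alive()` every few seconds, send the empty-text frame for each returned context, and then call `touch()` for it. If the server still answers with `TOO_MANY_CONTEXTS`, the client should close a context or wait before retrying, as described above.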

Errors

See Error Codes for the full TTS error lookup table, including HTTP status codes, WebSocket close codes, and rate-limit behavior.
