For real-time LLM pipelines, use client.tts.streamingSession() instead of client.tts.stream(). The session endpoint (/ws/tts/stream) keeps a persistent WebSocket connection and accumulates LLM tokens server-side, starting generation at natural sentence boundaries.
Why not flush per sentence?
Calling send(text, true) (flush=true) on every sentence feels intuitive, but it actually increases latency:
- Each flush triggers a full model prefill (the fixed cost of loading context into the model).
- The server’s KV cache cannot be reused across separate flushes, so each segment is cold.
- Flushing on every sentence (or, worse, every word) adds avoidable latency compared with letting the server batch text; see Latency.
Let the server handle chunking via chunkLengthSchedule and autoMode.
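As a minimal sketch of that advice, the helper below forwards every token with a plain send() and ends the turn exactly once. It assumes only the send(text, flush?) and endSession() methods listed under Session methods; TurnSink and forwardTurn are hypothetical names, not part of the SDK.

```typescript
// Hypothetical structural type covering the two StreamingSession methods used here.
interface TurnSink {
  send(text: string, flush?: boolean): void;
  endSession(): Promise<void>;
}

// Forward LLM tokens without flushing; the server decides where to cut chunks.
// Returns the number of non-empty tokens sent.
async function forwardTurn(tokens: AsyncIterable<string>, sink: TurnSink): Promise<number> {
  let sent = 0;
  for await (const token of tokens) {
    if (token.length === 0) continue; // skip empty deltas
    sink.send(token); // no flush: server-side chunking stays in control
    sent++;
  }
  await sink.endSession(); // flush the remaining text once, at the end of the turn
  return sent;
}
```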
Basic usage
```typescript
const session = client.tts.streamingSession(
  {
    voiceId: 1071,
    modelId: 'kugel-3',
    // autoMode: emit at first sentence boundary (lowest TTFA)
    autoMode: true,
    chunkLengthSchedule: [50, 100, 150, 250],
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onChunkComplete: (chunkId, audioSecs, genMs) => {
      console.log(`Chunk ${chunkId}: ${audioSecs.toFixed(2)}s audio in ${genMs}ms`);
    },
    onSessionClosed: (totalSecs) => {
      console.log(`Session complete: ${totalSecs.toFixed(2)}s total audio`);
    },
    onError: (err) => console.error('TTS error:', err),
  }
);

await session.connect();

// Feed LLM tokens as they arrive (no flush: the server chunks at sentence boundaries)
for await (const part of openai.chat.completions.stream(...)) {
  const text = part.choices[0]?.delta?.content;
  if (text) session.send(text);
}

// Flush remaining buffer and close
await session.close();
```
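The onChunkComplete callback above reports audio duration and generation time per chunk, which is enough to track a real-time factor (generation time divided by audio duration; below 1.0 means audio is generated faster than it plays). A sketch; RtfTracker is a hypothetical helper you would wire in as `onChunkComplete: (_id, secs, ms) => tracker.record(secs, ms)`.

```typescript
// Hypothetical helper: aggregates onChunkComplete(chunkId, audioSecs, genMs)
// reports into a real-time factor for the whole session.
class RtfTracker {
  private audioSecs = 0;
  private genMs = 0;
  private chunks = 0;

  record(audioSecs: number, genMs: number): void {
    this.audioSecs += audioSecs;
    this.genMs += genMs;
    this.chunks++;
  }

  // Generation time divided by audio duration; NaN until some audio is recorded.
  rtf(): number {
    return this.audioSecs > 0 ? this.genMs / 1000 / this.audioSecs : NaN;
  }

  summary(): string {
    return `${this.chunks} chunks, ${this.audioSecs.toFixed(2)}s audio, RTF ${this.rtf().toFixed(3)}`;
  }
}
```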
Session Reuse
End a session without closing the WebSocket to avoid reconnection overhead (see Latency):
```typescript
const session = client.tts.streamingSession(
  { voiceId: 1071 },
  { onChunk: (chunk) => playAudio(chunk.audio) }
);
await session.connect();

// Session 1
session.send('Hello from voice one.');
await session.endSession(); // Keeps WebSocket open

// Session 2: no reconnection needed
session.updateConfig({ voiceId: 1072 });
session.send('Hello from voice two.');
await session.close(); // Closes session + WebSocket
```
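Extending the same pattern to a list of turns, possibly in different voices, could look like the sketch below. It assumes only the documented send(), endSession(), updateConfig() and close() methods, and that the session was created with the first turn's voice and is already connected; Turn, ReusableSession and runTurns are hypothetical names.

```typescript
interface Turn {
  voiceId: number;
  text: string;
}

// Hypothetical structural type mirroring the documented StreamingSession methods.
interface ReusableSession {
  send(text: string, flush?: boolean): void;
  endSession(): Promise<void>;
  updateConfig(config: { voiceId?: number }): void;
  close(): Promise<void>;
}

// Runs each turn as its own session over one WebSocket.
async function runTurns(session: ReusableSession, turns: Turn[]): Promise<void> {
  let currentVoice = turns.length > 0 ? turns[0].voiceId : undefined;
  for (let i = 0; i < turns.length; i++) {
    const turn = turns[i];
    if (i > 0 && turn.voiceId !== currentVoice) {
      session.updateConfig({ voiceId: turn.voiceId }); // only between sessions
      currentVoice = turn.voiceId;
    }
    session.send(turn.text);
    if (i < turns.length - 1) {
      await session.endSession(); // flush this turn, keep the socket open
    }
  }
  await session.close(); // flush the last turn, then close session + socket
}
```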
Barge-in (interrupt the current turn)
When the end user speaks over the agent, call cancelCurrent() to stop
generating the current turn immediately and drop any buffered/queued text —
without closing the WebSocket. Unlike endSession(), no remaining text is
flushed; the turn is abandoned. The socket stays open so you can send() the
next turn right away.
```typescript
const session = client.tts.streamingSession(
  { voiceId: 1071 },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onInterrupted: () => stopLocalPlayback(),
  }
);
await session.connect();

session.send('This is a very long answer the user talks over');

// VAD detected the user speaking: stop local playback right away, then barge in
stopLocalPlayback();
await session.cancelCurrent();

// Socket still open: the next turn starts immediately
session.send('Sure, what would you like instead?', true);
```
cancelCurrent() resolves once the server acknowledges the interrupt
(onInterrupted fires), or after a short quiet timeout if the server goes
silent. Stop local playback as soon as you call it, because a few in-flight
frames may arrive before the acknowledgement. See Barge-in for the full
protocol.
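Because frames can still arrive between calling cancelCurrent() and its resolution, one option is a small client-side gate that discards them. In this sketch, PlaybackGate is hypothetical and audio frames are left generic because the AudioChunk payload type is documented elsewhere. The gate closes on barge-in and reopens once cancelCurrent() resolves, which also covers the quiet-timeout path where no acknowledgement arrives.

```typescript
// Hypothetical client-side gate: drops audio that arrives between barge-in
// and the moment cancelCurrent() resolves. Frame type T is left generic.
class PlaybackGate<T> {
  private open = true;
  private queue: T[] = [];
  dropped = 0;

  // Call from onChunk.
  push(frame: T): void {
    if (this.open) this.queue.push(frame);
    else this.dropped++;
  }

  // Call just before `await session.cancelCurrent()`.
  bargeIn(): void {
    this.open = false;
    this.queue.length = 0; // discard anything not yet played
  }

  // Call after cancelCurrent() resolves (acknowledged or timed out).
  resume(): void {
    this.open = true;
  }

  // The playback loop pulls frames from here.
  next(): T | undefined {
    return this.queue.shift();
  }
}
```

Call gate.resume() before sending the next turn, so none of its audio is dropped.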
Chunking presets
| Preset | Config | Best for |
|---|---|---|
| Low-latency | autoMode: true, chunkLengthSchedule: [50, 100, 150, 250] | Voice assistants, chatbots |
| Balanced | chunkLengthSchedule: [80, 150, 250] (default) | General LLM streaming |
| High-quality | chunkLengthSchedule: [120, 200, 300] | Narration, long-form audio |
autoMode: true and small chunkLengthSchedule values minimise time-to-first-audio. Use larger values when prosody quality matters more than TTFA.
Avoid calling send(text, true) (flush=true) on every sentence. This bypasses server-side semantic chunking, forces a cold model prefill per segment, and degrades both latency and audio quality.
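The exact server-side chunking algorithm is not described on this page. As a rough mental model only, the sketch below assumes that each chunkLengthSchedule entry is the minimum length, in characters, of the n-th chunk (the last entry repeats); that a chunk is cut at the first sentence boundary past that length; and that autoMode cuts the first chunk at the very first sentence boundary. planChunks is hypothetical, and it runs on complete text rather than a live token stream.

```typescript
// Hypothetical model of schedule-based chunking; NOT the server's algorithm.
function planChunks(text: string, schedule: number[], autoMode = false): string[] {
  const chunks: string[] = [];
  const isSpace = (c: string) => /\s/.test(c);
  let start = 0;
  for (;;) {
    while (start < text.length && isSpace(text.charAt(start))) start++;
    if (start >= text.length) break;
    // Minimum length for this chunk: last schedule entry repeats; autoMode zeroes the first.
    const idx = Math.min(chunks.length, schedule.length - 1);
    const minLen = autoMode && chunks.length === 0 ? 0 : (schedule[idx] ?? 0);
    let cut = -1;
    for (let i = start; i < text.length; i++) {
      // Crude sentence boundary: . ! or ? followed by whitespace or end of text.
      const boundary =
        /[.!?]/.test(text.charAt(i)) && (i + 1 === text.length || isSpace(text.charAt(i + 1)));
      if (boundary && i + 1 - start >= minLen) {
        cut = i + 1;
        break;
      }
    }
    if (cut === -1) cut = text.length; // no boundary left: the final flush takes the rest
    chunks.push(text.slice(start, cut).trim());
    start = cut;
  }
  return chunks;
}
```

Under this model, autoMode lets the first short sentence go out alone, which is why it lowers time-to-first-audio, while later chunks grow toward the larger schedule values.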
Session methods
streamingSession(config, callbacks) returns a StreamingSession:
| Member | Returns / type | Description |
|---|---|---|
| session.connect() | Promise<void> | Open and authenticate the WebSocket. |
| session.send(text, flush?) | void | Buffer text; flush=true forces synthesis of whatever is buffered. |
| session.cancelCurrent() | Promise<void> | Barge-in: abandon the current turn and drop buffered/queued text, keeping the socket open. |
| session.endSession() | Promise<void> | End the current turn (flushing remaining text) but keep the WebSocket open for reuse. |
| session.updateConfig(config) | void | Update configuration (e.g. voiceId) for the next session after endSession(). |
| session.close() | Promise<void> | Close the session and the WebSocket. |
| session.isConnected | boolean | Whether the underlying WebSocket is open. |
| session.lastUsage | SessionUsage \| null | Per-session usage (audio time + amount charged) from the most recently closed session, for billing your own customers per conversation. null before the first session closes. See SessionUsage. |
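For per-conversation billing, lastUsage can be folded into a running total per customer. This sketch uses placeholder field names: audioSeconds and chargedCents are assumptions standing in for the real SessionUsage fields documented in Types & Errors, and UsageLedger is hypothetical. Call ledger.add(customerId, session.lastUsage) once a session has closed.

```typescript
// Placeholder shape (ASSUMED field names): map to the real SessionUsage fields.
interface UsageLike {
  audioSeconds: number;
  chargedCents: number;
}

// Hypothetical per-customer ledger fed from session.lastUsage.
class UsageLedger {
  private totals = new Map<string, UsageLike>();

  add(customerId: string, usage: UsageLike | null): void {
    if (usage === null) return; // lastUsage is null before the first session closes
    const prev = this.get(customerId);
    this.totals.set(customerId, {
      audioSeconds: prev.audioSeconds + usage.audioSeconds,
      chargedCents: prev.chargedCents + usage.chargedCents,
    });
  }

  get(customerId: string): UsageLike {
    return this.totals.get(customerId) ?? { audioSeconds: 0, chargedCents: 0 };
  }
}
```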
Multi-Context Sessions
A multi-context session manages up to 20 independent audio-generation
contexts over a single WebSocket. Each context has its own text buffer,
voice settings, and generation queue — useful for multi-speaker
conversations, pre-buffering one stream while another plays, or interleaving
audio for dynamic dialogue.
```typescript
const session = client.tts.createMultiContextSession({
  defaultVoiceId: 1071,
  language: 'en',
});

await session.connect({
  onChunk: (chunk) => {
    // chunk.contextId tells you which speaker this audio belongs to
    playAudio(chunk.contextId, chunk.audio);
  },
  onContextClosed: (contextId, usage) => {
    // `usage` carries this conversation's audio time + amount charged (EUR cents)
    console.log(`${contextId} finished`, usage);
  },
  onError: (err, contextId) => console.error(contextId, err),
});

// Create contexts, optionally with different voices
session.createContext('narrator', { voiceId: 1071 });
session.createContext('character', { voiceId: 1072 });

// Send text to a specific context
session.send('narrator', 'The story begins.');
session.send('character', 'Hello there!', true); // flush

// Close one context, then the whole session
session.closeContext('narrator');
session.close();
```
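Since one session holds at most 20 contexts, an application that opens a context per speaker or per conversation may need to decide which one to close before creating another. Below is a sketch of least-recently-used bookkeeping; ContextLru is hypothetical, and what the server does with a 21st context is not specified here.

```typescript
// Hypothetical LRU bookkeeping for the per-session context cap.
class ContextLru {
  private readonly max: number;
  private order: string[] = []; // least recently used first

  constructor(max = 20) {
    this.max = max;
  }

  // Mark a context as used. For a new ID at capacity, returns the ID to close first.
  touch(contextId: string): string | undefined {
    const i = this.order.indexOf(contextId);
    if (i !== -1) {
      this.order.splice(i, 1);
      this.order.push(contextId);
      return undefined;
    }
    const evicted = this.order.length >= this.max ? this.order.shift() : undefined;
    this.order.push(contextId);
    return evicted;
  }

  // Call from onContextClosed so closed contexts stop counting toward the cap.
  remove(contextId: string): void {
    const i = this.order.indexOf(contextId);
    if (i !== -1) this.order.splice(i, 1);
  }

  get size(): number {
    return this.order.length;
  }
}
```

Before createContext(id), call lru.touch(id) and pass any returned ID to session.closeContext(); touching again on each send() keeps active speakers resident.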
Create the session with createMultiContextSession(config?):
| Config field | Type | Default | Description |
|---|---|---|---|
| defaultVoiceId | number | – | Default voice for contexts that don’t override it. |
| sampleRate | number | 24000 | Output sample rate (Hz). |
| outputFormat | string | – | Combined codec + rate token (pcm_8000, pcm_16000, pcm_22050, pcm_24000, ulaw_8000, alaw_8000). |
| cfgScale | number | 2.0 | Guidance scale. |
| temperature | number | 0.5 | Sampling temperature (0.0–1.0); higher values give more varied output. |
| maxNewTokens | number | 2048 | Maximum tokens per generation. |
| normalize | boolean | true | Enable text normalization. |
| language | string | – | Normalization language. |
| inactivityTimeout | number | 20.0 | Seconds before an idle context auto-closes. |
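Because outputFormat packs codec and sample rate into one token, a small validator can catch typos before connecting. A sketch; parseOutputFormat is a hypothetical helper, and the format list is copied from the table above.

```typescript
// The outputFormat tokens listed in the config table.
const OUTPUT_FORMATS = ["pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "ulaw_8000", "alaw_8000"] as const;
type OutputFormat = (typeof OUTPUT_FORMATS)[number];

interface ParsedFormat {
  codec: "pcm" | "ulaw" | "alaw";
  sampleRate: number;
}

function isOutputFormat(value: string): value is OutputFormat {
  return (OUTPUT_FORMATS as readonly string[]).indexOf(value) !== -1;
}

// Splits e.g. "pcm_16000" into { codec: "pcm", sampleRate: 16000 }; throws on unknown tokens.
function parseOutputFormat(token: string): ParsedFormat {
  if (!isOutputFormat(token)) {
    throw new Error(`Unsupported outputFormat: ${token}`);
  }
  const [codec, rate] = token.split("_");
  return { codec: codec as ParsedFormat["codec"], sampleRate: Number(rate) };
}
```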
MultiContextSession methods and properties:
| Member | Returns / type | Description |
|---|---|---|
| connect(callbacks) | Promise<void> | Open the WebSocket with MultiContextCallbacks. |
| createContext(contextId, { voiceId?, voiceSettings? }) | void | Create a context with an optional voice override. |
| send(contextId, text, flush?) | void | Send text to a context. |
| flush(contextId) | void | Flush a context’s buffer. |
| closeContext(contextId, immediate?) | void | Close a context. immediate=true barges in, discarding buffered/queued text. |
| keepAlive(contextId) | void | Reset a context’s inactivity timeout. |
| close() | void | Close the session. |
| usageFor(contextId) | SessionUsage \| null | Per-context usage (audio time + amount charged) for a closed context; each context is its own conversation. null until that context closes. Also delivered as the second arg to onContextClosed. See SessionUsage. |
| contextUsage | Map<string, SessionUsage> | Map of contextId → usage for every context closed so far. |
| sessionId | string \| null | Server-assigned session ID. |
| activeContexts | string[] | Currently active context IDs. |
| isConnected | boolean | Whether the WebSocket is open. |
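keepAlive() matters for contexts that must stay open while idle, such as a pre-buffered speaker waiting for its turn. Below is a sketch of a heartbeat that pings pinned contexts at half the inactivity timeout (20 seconds by default, per the config table). ContextHeartbeat and KeepAliveTarget are hypothetical; KeepAliveTarget mirrors only keepAlive() and activeContexts from the table above. The half-timeout interval is an arbitrary safety margin.

```typescript
// Hypothetical structural type: the two MultiContextSession members used here.
interface KeepAliveTarget {
  keepAlive(contextId: string): void;
  readonly activeContexts: string[];
}

// Hypothetical heartbeat that keeps selected idle contexts from timing out.
class ContextHeartbeat {
  private readonly target: KeepAliveTarget;
  private pinned = new Set<string>();
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(target: KeepAliveTarget) {
    this.target = target;
  }

  pin(contextId: string): void {
    this.pinned.add(contextId);
  }

  unpin(contextId: string): void {
    this.pinned.delete(contextId);
  }

  // Pings every pinned context that is still active; returns how many were pinged.
  tick(): number {
    let pinged = 0;
    for (const id of this.target.activeContexts) {
      if (this.pinned.has(id)) {
        this.target.keepAlive(id);
        pinged++;
      }
    }
    return pinged;
  }

  start(inactivityTimeoutSecs = 20): void {
    this.stop();
    this.timer = setInterval(() => this.tick(), (inactivityTimeoutSecs * 1000) / 2);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```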
Audio arrives via the onChunk callback as a MultiContextAudioChunk —
an AudioChunk plus a contextId field identifying its context.
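To pre-buffer one speaker while another plays, chunks can be routed by contextId: the active context's audio plays immediately and everything else is held until its turn. A sketch; SpeakerRouter is hypothetical and the audio payload type is left generic. Wire it in as `onChunk: (chunk) => router.onChunk(chunk.contextId, chunk.audio)` and call router.setActive() when the speaker changes.

```typescript
// Hypothetical router: plays the active context's audio immediately and
// pre-buffers every other context until it becomes active.
class SpeakerRouter<T> {
  private readonly play: (contextId: string, audio: T) => void;
  private buffers = new Map<string, T[]>();
  private active: string | null = null;

  constructor(play: (contextId: string, audio: T) => void) {
    this.play = play;
  }

  onChunk(contextId: string, audio: T): void {
    if (contextId === this.active) {
      this.play(contextId, audio);
      return;
    }
    const buf = this.buffers.get(contextId) ?? [];
    buf.push(audio);
    this.buffers.set(contextId, buf);
  }

  // Switch speakers: the new speaker's pre-buffered audio plays first, in order.
  setActive(contextId: string): void {
    this.active = contextId;
    const buf = this.buffers.get(contextId) ?? [];
    this.buffers.delete(contextId);
    for (const audio of buf) this.play(contextId, audio);
  }

  buffered(contextId: string): number {
    return this.buffers.get(contextId)?.length ?? 0;
  }
}
```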
Multi-context types
```typescript
// AudioChunk and SessionUsage are documented in Types & Errors.
interface MultiContextConfig {
  defaultVoiceId?: number;
  sampleRate?: number;
  outputFormat?: string; // e.g. 'pcm_16000', 'ulaw_8000'
  cfgScale?: number;
  temperature?: number;
  maxNewTokens?: number;
  normalize?: boolean;
  language?: string;
  inactivityTimeout?: number;
}

interface MultiContextAudioChunk extends AudioChunk {
  contextId: string; // which context this audio belongs to
}

interface ContextVoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
  speed?: number;
}

interface MultiContextCallbacks {
  onSessionStarted?: (sessionId: string) => void;
  onContextCreated?: (contextId: string) => void;
  onChunk?: (chunk: MultiContextAudioChunk) => void;
  // All audio admitted before a flush has been delivered for this context
  // (ElevenLabs is_final equivalent); also fires before a graceful
  // onContextClosed.
  onFinal?: (contextId: string) => void;
  // usage: this context's audio time + amount charged (same value as usageFor)
  onContextClosed?: (contextId: string, usage: SessionUsage) => void;
  onContextTimeout?: (contextId: string) => void;
  onSessionClosed?: (stats: Record<string, unknown>) => void;
  onError?: (error: Error, contextId?: string) => void;
}
```
Shared interfaces (StreamConfig, StreamingSessionCallbacks, SessionUsage, AudioChunk) are documented in Types & Errors.