For real-time LLM pipelines, use client.tts.streamingSession() instead of client.tts.stream(). The session endpoint (/ws/tts/stream) keeps a persistent WebSocket connection and accumulates LLM tokens server-side, starting generation at natural sentence boundaries.

Why not flush per sentence?

Calling send(token, flush=true) on every sentence feels intuitive, but it actually increases latency:
  • Each flush triggers a full model prefill (the fixed cost of loading context into the model).
  • The server’s KV cache cannot be reused across separate flushes, so each segment is cold.
  • Word-level flushing adds avoidable latency per sentence compared to letting the server batch — see Latency.
Let the server handle chunking via chunkLengthSchedule and autoMode.
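
The difference between the two strategies can be sketched by counting forced flushes. This is an illustration only: `TokenSink` is a hypothetical interface that mirrors just the `send(text, flush?)` method documented on this page, and `CountingSink` is a local test double, not part of the SDK.

```typescript
// Hypothetical minimal shape: only the send() signature documented on this page.
interface TokenSink {
  send(text: string, flush?: boolean): void;
}

// Anti-pattern: force a flush whenever a token ends a sentence.
// Each flush=true call asks the server to synthesize immediately.
function sendFlushingPerSentence(sink: TokenSink, token: string): void {
  const endsSentence = /[.!?]\s*$/.test(token);
  sink.send(token, endsSentence);
}

// Recommended: forward the token untouched and let the server pick boundaries.
function sendUnflushed(sink: TokenSink, token: string): void {
  if (token.length > 0) sink.send(token);
}

// Local test double that records how many flushes a strategy forces.
class CountingSink implements TokenSink {
  flushes = 0;
  text = '';

  send(text: string, flush?: boolean): void {
    this.text += text;
    if (flush) this.flushes += 1;
  }
}
```

With a real session, `sendUnflushed(session, token)` would sit inside the LLM token loop, and the final `close()` or `endSession()` takes care of whatever text is still buffered.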

Basic usage

const session = client.tts.streamingSession(
  {
    voiceId: 1071,
    modelId: 'kugel-3',
    // autoMode: emit at first sentence boundary (lowest TTFA)
    autoMode: true,
    chunkLengthSchedule: [50, 100, 150, 250],
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onChunkComplete: (chunkId, audioSecs, genMs) => {
      console.log(`Chunk ${chunkId}: ${audioSecs.toFixed(2)}s audio in ${genMs}ms`);
    },
    onSessionClosed: (totalSecs) => {
      console.log(`Session complete: ${totalSecs.toFixed(2)}s total audio`);
    },
    onError: (err) => console.error('TTS error:', err),
  }
);

await session.connect();

// Feed LLM tokens as they arrive
for await (const delta of openai.chat.completions.stream(...)) {
  const text = delta.choices[0]?.delta?.content;
  if (text) session.send(text);
}

// Flush remaining buffer and close
await session.close();

Session reuse

End a session without closing the WebSocket to avoid reconnection overhead (see Latency):
const session = client.tts.streamingSession(
  { voiceId: 1071 },
  { onChunk: (chunk) => playAudio(chunk.audio) }
);
await session.connect();

// Session 1
session.send('Hello from voice one.');
await session.endSession(); // Keeps WebSocket open

// Session 2 — no reconnection needed
session.updateConfig({ voiceId: 1072 });
session.send('Hello from voice two.');

await session.close(); // Closes session + WebSocket

Barge-in (interrupt the current turn)

When the end user speaks over the agent, call cancelCurrent() to stop generating the current turn immediately and drop any buffered/queued text — without closing the WebSocket. Unlike endSession(), no remaining text is flushed; the turn is abandoned. The socket stays open so you can send() the next turn right away.
const session = client.tts.streamingSession(
  { voiceId: 1071 },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onInterrupted: () => stopLocalPlayback(),
  }
);
await session.connect();

session.send('This is a very long answer the user talks over');

// VAD detected the user speaking — barge in:
await session.cancelCurrent();

// Socket still open — next turn starts immediately:
session.send('Sure, what would you like instead?', true);
cancelCurrent() resolves once the server acknowledges (onInterrupted fires), or after a short quiet timeout if the server goes silent. Stop local playback as soon as you call it — a few in-flight frames may arrive before the acknowledgement. See Barge-in for the full protocol.
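
Because a few frames can still arrive after the barge-in request, one option is to gate playback on the client. The sketch below is an assumption about how an application might do this, not SDK behaviour: `PlaybackGate` is a hypothetical helper, and it assumes that no audio from the abandoned turn arrives once `cancelCurrent()` has resolved.

```typescript
// Hypothetical client-side helper: discards audio that arrives while a
// barge-in is in progress, so late frames of the abandoned turn never play.
class PlaybackGate<T> {
  private readonly play: (audio: T) => void;
  private open = true;
  private dropped = 0;

  constructor(play: (audio: T) => void) {
    this.play = play;
  }

  // Wire this to onChunk: audio plays only while the gate is open.
  handleChunk(audio: T): void {
    if (this.open) this.play(audio);
    else this.dropped += 1;
  }

  // Call right before cancelCurrent(): frames are discarded from now on.
  closeForBargeIn(): void {
    this.open = false;
  }

  // Call after cancelCurrent() resolves, before sending the next turn.
  reopen(): void {
    this.open = true;
  }

  // Number of frames discarded so far (useful for logging).
  get droppedCount(): number {
    return this.dropped;
  }
}
```

In the example above this would look like `onChunk: (chunk) => gate.handleChunk(chunk.audio)`, then `gate.closeForBargeIn()` and `stopLocalPlayback()` before `await session.cancelCurrent()`, and `gate.reopen()` before the next `send()`.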

Chunking presets

| Preset | Config | Best for |
| --- | --- | --- |
| Low-latency | autoMode: true, chunkLengthSchedule: [50, 100, 150, 250] | Voice assistants, chat bots |
| Balanced | chunkLengthSchedule: [80, 150, 250] (default) | General LLM streaming |
| High-quality | chunkLengthSchedule: [120, 200, 300] | Narration, long-form audio |
autoMode: true and small chunkLengthSchedule values minimise time-to-first-audio. Use larger values when prosody quality matters more than TTFA.
Avoid calling send(text, true) (flush=true) on every sentence. This bypasses server-side semantic chunking, forces a cold model prefill per segment, and degrades both latency and audio quality.
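
If several parts of an application need the same presets, they can be kept in one place. The following is a minimal sketch: the preset values are copied from the table above, while `ChunkingPreset`, `CHUNKING_PRESETS` and `withChunkingPreset` are hypothetical names that do not exist in the SDK.

```typescript
// Hypothetical names; only the values come from the presets table.
type ChunkingPreset = 'low-latency' | 'balanced' | 'high-quality';

interface ChunkingOptions {
  autoMode?: boolean;
  chunkLengthSchedule: number[];
}

const CHUNKING_PRESETS: Record<ChunkingPreset, ChunkingOptions> = {
  'low-latency': { autoMode: true, chunkLengthSchedule: [50, 100, 150, 250] },
  balanced: { chunkLengthSchedule: [80, 150, 250] },
  'high-quality': { chunkLengthSchedule: [120, 200, 300] },
};

// Merge a preset into an existing session config object.
// Presets that do not set autoMode leave any autoMode in `base` untouched.
function withChunkingPreset<C extends object>(
  base: C,
  preset: ChunkingPreset,
): C & ChunkingOptions {
  const options = CHUNKING_PRESETS[preset];
  // Copy the schedule so callers cannot mutate the shared preset table.
  return {
    ...base,
    ...options,
    chunkLengthSchedule: [...options.chunkLengthSchedule],
  };
}
```

The result can be passed as the first argument of `streamingSession`, for example `withChunkingPreset({ voiceId: 1071 }, 'low-latency')`.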

Session methods

streamingSession(config, callbacks) returns a StreamingSession:
| Method | Returns | Description |
| --- | --- | --- |
| session.connect() | Promise<void> | Open and authenticate the WebSocket. |
| session.send(text, flush?) | void | Buffer text; flush=true forces synthesis of whatever is buffered. |
| session.cancelCurrent() | Promise<void> | Barge-in: abandon the current turn and drop buffered/queued text, keeping the socket open. |
| session.endSession() | Promise<void> | End the current turn (flushing remaining text) but keep the WebSocket open for reuse. |
| session.updateConfig(config) | void | Update configuration (e.g. voiceId) for the next session after endSession(). |
| session.close() | Promise<void> | Close the session and the WebSocket. |
| session.isConnected | boolean | Whether the underlying WebSocket is open. |
| session.lastUsage | SessionUsage \| null | Per-session usage (audio time + amount charged) from the most recently closed session, for billing your own customers per conversation. null before the first session closes. See SessionUsage. |

Multi-context sessions

A multi-context session manages up to 20 independent audio-generation contexts over a single WebSocket. Each context has its own text buffer, voice settings, and generation queue — useful for multi-speaker conversations, pre-buffering one stream while another plays, or interleaving audio for dynamic dialogue.
const session = client.tts.createMultiContextSession({
  defaultVoiceId: 1071,
  language: 'en',
});

await session.connect({
  onChunk: (chunk) => {
    // chunk.contextId tells you which speaker this audio belongs to
    playAudio(chunk.contextId, chunk.audio);
  },
  onContextClosed: (contextId, usage) =>
    // `usage` carries this conversation's audio time + amount charged (EUR cents)
    console.log(`${contextId} finished`, usage),
  onError: (err, contextId) => console.error(contextId, err),
});

// Create contexts, optionally with different voices
session.createContext('narrator', { voiceId: 1071 });
session.createContext('character', { voiceId: 1072 });

// Send text to a specific context
session.send('narrator', 'The story begins.');
session.send('character', 'Hello there!', true); // flush

// Close one context, then the whole session
session.closeContext('narrator');
session.close();
Create the session with createMultiContextSession(config?):
| Config field | Type | Default | Description |
| --- | --- | --- | --- |
| defaultVoiceId | number | | Default voice for contexts that don’t override it. |
| sampleRate | number | 24000 | Output sample rate. |
| outputFormat | string | | Combined codec + rate token (pcm_8000, pcm_16000, pcm_22050, pcm_24000, ulaw_8000, alaw_8000). |
| cfgScale | number | 2.0 | Guidance scale. |
| temperature | number | 0.5 | Sampling variance (0.0–1.0). |
| maxNewTokens | number | 2048 | Maximum tokens per generation. |
| normalize | boolean | true | Enable text normalization. |
| language | string | | Normalization language. |
| inactivityTimeout | number | 20.0 | Seconds before an idle context auto-closes. |
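
A context that is pre-buffered but not yet fed (for example while another speaker's audio plays) can outlive `inactivityTimeout` if it is pinged periodically with `keepAlive(contextId)`. The sketch below shows one way to schedule those pings. `KeepAliveTarget`, `keepAliveIntervalMs` and `startKeepAlive` are hypothetical helpers, and pinging at half the timeout is an arbitrary safety margin rather than a documented recommendation.

```typescript
// Hypothetical minimal shape: only the keepAlive() method documented on this page.
interface KeepAliveTarget {
  keepAlive(contextId: string): void;
}

// Ping interval as a fraction of the inactivity timeout (assumed margin: 0.5).
function keepAliveIntervalMs(inactivityTimeoutSec: number, safetyFactor = 0.5): number {
  if (!(inactivityTimeoutSec > 0)) {
    throw new RangeError('inactivityTimeoutSec must be positive');
  }
  if (!(safetyFactor > 0 && safetyFactor < 1)) {
    throw new RangeError('safetyFactor must be between 0 and 1 (exclusive)');
  }
  return Math.floor(inactivityTimeoutSec * 1000 * safetyFactor);
}

// Start periodic keep-alive pings for one context.
// Returns a function that stops the pings; call it once text flows again
// or the context is closed.
function startKeepAlive(
  target: KeepAliveTarget,
  contextId: string,
  inactivityTimeoutSec = 20,
): () => void {
  const intervalMs = keepAliveIntervalMs(inactivityTimeoutSec);
  const timer = setInterval(() => target.keepAlive(contextId), intervalMs);
  return () => clearInterval(timer);
}
```

With the default 20-second timeout this pings every 10 seconds, for example `const stop = startKeepAlive(session, 'narrator');` followed by `stop()` when the context becomes active again.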
MultiContextSession methods and properties:
| Member | Returns | Description |
| --- | --- | --- |
| connect(callbacks) | Promise<void> | Open the WebSocket with MultiContextCallbacks. |
| createContext(contextId, { voiceId?, voiceSettings? }) | void | Create a context with an optional voice override. |
| send(contextId, text, flush?) | void | Send text to a context. |
| flush(contextId) | void | Flush a context’s buffer. |
| closeContext(contextId, immediate?) | void | Close a context. immediate=true barges in, discarding buffered/queued text. |
| keepAlive(contextId) | void | Reset a context’s inactivity timeout. |
| close() | void | Close the session. |
| usageFor(contextId) | SessionUsage \| null | Per-context usage (audio time + amount charged) for a closed context; each context is its own conversation. null until that context closes. Also delivered as the second arg to onContextClosed. See SessionUsage. |
| contextUsage | Map<string, SessionUsage> | Map of contextId → usage for every context closed so far. |
| sessionId | string \| null | Server-assigned session ID. |
| activeContexts | string[] | Currently active context IDs. |
| isConnected | boolean | Whether the WebSocket is open. |
Audio arrives via the onChunk callback as a MultiContextAudioChunk — an AudioChunk plus a contextId field identifying its context.
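
Since all contexts share one `onChunk` callback, the application usually has to demultiplex audio by `contextId`. The sketch below buffers chunks per context so that one stream can be held back while another plays. `ContextChunk` and `ContextAudioBuffers` are hypothetical, and treating `audio` as a `Uint8Array` is an assumption; the actual `AudioChunk` shape is documented in Types & Errors.

```typescript
// Assumed minimal chunk shape: contextId is documented on this page,
// the Uint8Array type of `audio` is an assumption made for this sketch.
interface ContextChunk {
  contextId: string;
  audio: Uint8Array;
}

// Collects audio per context, preserving arrival order within each context.
class ContextAudioBuffers {
  private readonly buffers = new Map<string, Uint8Array[]>();

  // Wire this to onChunk.
  push(chunk: ContextChunk): void {
    const parts = this.buffers.get(chunk.contextId);
    if (parts) parts.push(chunk.audio);
    else this.buffers.set(chunk.contextId, [chunk.audio]);
  }

  // Remove everything buffered for a context and return it as one array.
  drain(contextId: string): Uint8Array {
    const parts = this.buffers.get(contextId) ?? [];
    this.buffers.delete(contextId);
    const total = parts.reduce((sum, part) => sum + part.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const part of parts) {
      out.set(part, offset);
      offset += part.length;
    }
    return out;
  }
}
```

A caller would register `onChunk: (chunk) => buffers.push(chunk)` and call `buffers.drain('character')` when it is that speaker's turn to play.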

Multi-context types

interface MultiContextConfig {
  defaultVoiceId?: number;
  sampleRate?: number;
  outputFormat?: string;  // listed in the config table above
  cfgScale?: number;
  temperature?: number;
  maxNewTokens?: number;
  normalize?: boolean;
  language?: string;
  inactivityTimeout?: number;
}

interface MultiContextAudioChunk extends AudioChunk {
  contextId: string;  // which context this audio belongs to
}

interface ContextVoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
  speed?: number;
}

interface MultiContextCallbacks {
  onSessionStarted?: (sessionId: string) => void;
  onContextCreated?: (contextId: string) => void;
  onChunk?: (chunk: MultiContextAudioChunk) => void;
  // All audio admitted before a flush has been delivered for this context
  // (ElevenLabs is_final equivalent); also fires before a graceful
  // onContextClosed.
  onFinal?: (contextId: string) => void;
  // Second argument carries the closed context's usage (see usageFor)
  onContextClosed?: (contextId: string, usage?: SessionUsage) => void;
  onContextTimeout?: (contextId: string) => void;
  onSessionClosed?: (stats: Record<string, unknown>) => void;
  onError?: (error: Error, contextId?: string) => void;
}
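
The `onFinal` callback can be turned into a promise when the caller needs to wait until all audio for a flush has been delivered. This is a sketch under assumptions: `FinalWaiter` is a hypothetical helper, and it relies only on `onFinal` receiving the `contextId` as declared above.

```typescript
// Hypothetical helper: lets callers await the next onFinal for a context.
class FinalWaiter {
  private readonly pending = new Map<string, Array<() => void>>();

  // Wire this to onFinal: resolves every waiter registered for the context.
  notifyFinal(contextId: string): void {
    const resolvers = this.pending.get(contextId) ?? [];
    this.pending.delete(contextId);
    for (const resolve of resolvers) resolve();
  }

  // Resolves the next time onFinal fires for this context.
  // Register the waiter before flushing so the event cannot be missed.
  waitFor(contextId: string): Promise<void> {
    return new Promise<void>((resolve) => {
      const resolvers = this.pending.get(contextId) ?? [];
      resolvers.push(() => resolve());
      this.pending.set(contextId, resolvers);
    });
  }

  // Number of callers currently waiting on a context.
  pendingCount(contextId: string): number {
    return this.pending.get(contextId)?.length ?? 0;
  }
}
```

A typical sequence would be `const done = waiter.waitFor('narrator'); session.flush('narrator'); await done;` with `onFinal: (id) => waiter.notifyFinal(id)` passed to `connect()`.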

Shared interfaces (StreamConfig, StreamingSessionCallbacks, SessionUsage, AudioChunk) are documented in Types & Errors.