LLM Integration: Streaming Sessions

Use a streaming session for real-time TTS when text arrives incrementally from an LLM (such as GPT-4 or Claude):
import com.kugelaudio.sdk.StreamConfig;
import com.kugelaudio.sdk.StreamCallbacks;
import com.kugelaudio.sdk.StreamingSession;

StreamConfig config = StreamConfig.builder()
    .voiceId(1071)
    .modelId("kugel-3")
    .language("en")
    .flushTimeoutMs(500)  // Auto-flush after 500ms of no input
    .build();

// Simulate LLM token stream
String[] tokens = {"Hello, ", "this ", "is ", "a ", "streamed ", "response."};

try (StreamingSession session = client.streamingSession(config, new StreamCallbacks() {
    @Override
    public void onChunk(AudioChunk chunk) {
        playAudio(chunk.getAudio());
    }

    @Override
    public void onComplete(AudioResponse response) {
        System.out.printf("Done: %.0fms%n", response.getDurationMs());
    }
})) {
    // Send tokens as they arrive from LLM
    for (String token : tokens) {
        session.send(token);
    }
    // Flush any remaining text to trigger generation
    session.flush();
}
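
When tokens come from an iterator rather than an array, the send-then-flush loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: `TextSink` is a hypothetical stand-in for the `send` and `flush` methods of `StreamingSession`, and `RecordingSink` exists only so the helper can be exercised without a connection. It skips empty tokens and flushes once at the end.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class TokenPump {
    private TokenPump() {}

    // Forwards each non-empty token to the sink, then flushes once so the
    // trailing text is not left waiting in the buffer.
    // Returns the number of tokens actually sent.
    static int pump(Iterator<String> tokens, TextSink sink) {
        int sent = 0;
        while (tokens.hasNext()) {
            String token = tokens.next();
            if (token == null || token.isEmpty()) {
                continue; // nothing to synthesize
            }
            sink.send(token);
            sent++;
        }
        sink.flush();
        return sent;
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        int sent = pump(List.of("Hello, ", "", "world.").iterator(), sink);
        System.out.println("tokens sent: " + sent); // prints "tokens sent: 2"
    }
}

// Hypothetical stand-in for the send/flush pair on StreamingSession;
// it is not an SDK type.
interface TextSink {
    void send(String text);
    void flush();
}

// Records calls in order so the helper can be checked offline.
final class RecordingSink implements TextSink {
    final List<String> events = new ArrayList<>();

    @Override public void send(String text) { events.add("send:" + text); }
    @Override public void flush() { events.add("flush"); }
}
```

With a real session, a thin adapter that implements `TextSink` by delegating to `session.send(...)` and `session.flush()` would take the place of `RecordingSink`.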

Session Reuse

End a session without closing the WebSocket to avoid reconnection overhead (see Turn lifecycle):
StreamingSession session = client.streamingSession(config, callbacks);
session.connect();

// Session 1
session.send("Hello from voice one.");
session.flush();
session.endSession(); // Keeps WebSocket open

// Session 2 — no reconnection needed
// (send new config on the next send)
session.send("Hello from voice two.");
session.flush();

session.close(); // Closes session + WebSocket

Barge-in (interrupt the current turn)

When the end user speaks over the agent, call cancelCurrent() to stop generating the current turn immediately and drop any buffered/queued text — without closing the WebSocket. Unlike endSession(), no remaining text is flushed; the turn is abandoned. The socket stays open so the next send() starts the next turn right away. Override onInterrupted() on your StreamCallbacks to stop local playback at the cancellation point.
StreamCallbacks callbacks = new StreamCallbacks() {
    @Override public void onChunk(AudioChunk chunk) { playAudio(chunk.getAudio()); }
    @Override public void onInterrupted() { stopLocalPlayback(); }
};

StreamingSession session = client.streamingSession(config, callbacks);
session.connect();

session.send("This is a very long answer the user talks over");

// VAD detected the user speaking — barge in:
session.cancelCurrent();

// Socket still open — next turn starts immediately:
session.send("Sure, what would you like instead?", true);
cancelCurrent() blocks until the server acknowledges (onInterrupted fires), or up to ~5 seconds if the server goes silent. Stop local playback as soon as you call it — a few in-flight frames may arrive before the acknowledgement. See Barge-in for the full protocol.
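
One way to handle the in-flight frames mentioned above is a small gate in front of the audio player. This is a sketch under assumptions, not SDK code: `PlaybackGate` and its method names are hypothetical, and it assumes frames reach you as `byte[]`. The idea is to call `bargeIn()` just before `session.cancelCurrent()`, route every `onChunk` frame through `offer(...)`, and call `resume()` once `cancelCurrent()` has returned and before the next `send()`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not an SDK type: decides whether an incoming audio
// frame should reach the speaker.
final class PlaybackGate {
    private final AtomicBoolean muted = new AtomicBoolean(false);
    private final AtomicInteger playedFrames = new AtomicInteger();

    // Call immediately before session.cancelCurrent().
    void bargeIn() { muted.set(true); }

    // Call after cancelCurrent() has returned, before the next turn's send().
    void resume() { muted.set(false); }

    // Call from onChunk. Returns true if the frame was played, false if it
    // was dropped because a barge-in is in progress.
    boolean offer(byte[] frame) {
        if (muted.get()) {
            return false; // late frame from the abandoned turn
        }
        // A real player would write `frame` to the audio device here.
        playedFrames.incrementAndGet();
        return true;
    }

    int playedFrames() { return playedFrames.get(); }

    public static void main(String[] args) {
        PlaybackGate gate = new PlaybackGate();
        gate.offer(new byte[] {1});   // played
        gate.bargeIn();
        gate.offer(new byte[] {2});   // dropped
        gate.resume();
        gate.offer(new byte[] {3});   // played
        System.out.println("frames played: " + gate.playedFrames()); // prints "frames played: 2"
    }
}
```

Atomics are used because audio callbacks and the thread that detects the barge-in are likely to differ; whether that holds depends on how the SDK dispatches callbacks.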

Tuning latency with StreamConfig

StreamConfig.builder() exposes the same generation knobs as GenerateRequest plus session-specific chunking controls:
import java.util.List;

StreamConfig config = StreamConfig.builder()
    .voiceId(1071)
    .modelId("kugel-3")
    .language("en")
    .temperature(0.5)
    .speed(1.0)
    .flushTimeoutMs(500)                            // server-side auto-flush timeout
    .autoMode(true)                                 // emit at first clean sentence boundary
    .chunkLengthSchedule(List.of(50, 100, 150, 250)) // low-latency schedule
    .build();

| Builder method | Description |
| --- | --- |
| .flushTimeoutMs(int) | Emit buffered text after this many ms of no new input. Default 500. |
| .chunkLengthSchedule(List<Integer>) | Minimum buffer size (chars) before each successive auto-chunk. Entry i applies to chunk i; the last value repeats. Default [5, 80, 150, 250]. Smaller = lower TTFA; larger = better prosody. |
| .autoMode(boolean) | Start generating at the very first clean sentence boundary (ElevenLabs auto_mode). Lowest TTFA. |
| .dictionaryIds(List<Integer>) | Per-session dictionary selection, applied to every turn. Not set = all active dictionaries (language-filtered); List.of() = none; a list = exactly those dictionaries (including inactive ones), bypassing the language filter. |
| .speed(double) | Playback speed 0.8–1.2 (pitch-preserving WSOLA). |
| .cfgScale, .temperature, .maxNewTokens, .sampleRate, .outputFormat, .normalize, .wordTimestamps | Same meaning as on GenerateRequest. outputFormat accepts pcm_8000, pcm_16000, pcm_22050, pcm_24000, ulaw_8000, or alaw_8000. |
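
The `.chunkLengthSchedule` rule ("entry i applies to chunk i; the last value repeats") can be made concrete with a few lines of Java. This sketch only illustrates the lookup as described; it assumes zero-based chunk indices, the class and method names are hypothetical, and the server's actual implementation is not shown in this guide.

```java
import java.util.List;

// Hypothetical illustration of the schedule lookup; not an SDK type.
final class ChunkSchedule {
    private ChunkSchedule() {}

    // Returns the minimum buffer size (in characters) that applies to the
    // chunk at the given zero-based index. Indices past the end of the
    // schedule reuse the last entry.
    static int minCharsForChunk(List<Integer> schedule, int chunkIndex) {
        if (schedule.isEmpty()) {
            throw new IllegalArgumentException("schedule must not be empty");
        }
        if (chunkIndex < 0) {
            throw new IllegalArgumentException("chunkIndex must be >= 0");
        }
        int i = Math.min(chunkIndex, schedule.size() - 1);
        return schedule.get(i);
    }

    public static void main(String[] args) {
        List<Integer> defaults = List.of(5, 80, 150, 250);
        for (int chunk = 0; chunk < 6; chunk++) {
            System.out.println("chunk " + chunk + " -> "
                    + minCharsForChunk(defaults, chunk) + " chars");
        }
    }
}
```

With the default schedule, the first chunk needs only 5 buffered characters, which is what keeps time to first audio low, while every chunk from the fourth onward waits for 250.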

StreamCallbacks reference

StreamCallbacks is used by both tts().stream(...) and streamingSession(...). Only onChunk is required; the rest are default no-ops you override as needed:

| Method | Fires when |
| --- | --- |
| onChunk(AudioChunk chunk) | An audio chunk arrives (required). |
| onComplete(AudioResponse response) | The one-shot stream() request finishes. |
| onGenerationStarted(int chunkId, String text) | The server starts generating a text segment (sessions). |
| onChunkComplete(int chunkId, double audioSeconds, double genMs) | A flushed segment finishes (sessions). |
| onSessionClosed(double totalAudioSeconds, int totalTextChunks, int totalAudioChunks) | The session is fully closed. |
| onWordTimestamps(List<WordTimestamp> timestamps) | Word timestamps arrive (requires wordTimestamps(true)). |
| onInterrupted() | The server acknowledges a barge-in (cancelCurrent()). |
| onError(KugelAudioException error) | Any error occurs. |

tts().stream(request, callbacks, reuseConnection) takes an optional third argument: pass true to reuse the client’s pooled WebSocket connection instead of opening a fresh one for the request.

Per-session usage

For billing your own customers per conversation, every closed session and request reports a SessionUsage (audio time + the actual amount charged):
  • session.getLastUsage() on a StreamingSession — usage from the most recently closed session (null before the first close).
  • response.getUsage() on the AudioResponse from a one-shot generate() / stream() request — per-request usage.
  • session.getUsageFor(contextId) on a MultiContextSession — per-context usage (see Multi-Context Sessions).
import com.kugelaudio.sdk.SessionUsage;

SessionUsage usage = session.getLastUsage();
if (usage != null) {
    System.out.printf("audio: %.1fs%n", usage.getAudioSeconds());
    if (usage.isCostAvailable()) {
        System.out.printf("charged: %.2f %s ct%n",
                usage.getCostCents(), usage.getCurrency());
    }
}
SessionUsage getters: getAudioSeconds() (double, always present), getCostCents() (Double, the EUR-cents charge or null), getCurrency() (String), getCharacters() (Integer), getModelId() (String), and isCostAvailable() (boolean).
getCostCents() is null (and isCostAvailable() is false) when the charge cannot be determined at session end — e.g. a transient billing error or an internal session. It is never a misleading 0; getAudioSeconds() is always reported.
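
Because the cost can be null, totals for a customer are best kept in two parts: a sum of known charges and a count of sessions whose charge is still unknown. The sketch below shows one way to do that. `UsageTotals` and its nested `Sample` class are hypothetical stand-ins that carry only the audio-seconds and cost-cents values, so the example runs without the SDK.

```java
import java.util.List;

// Hypothetical aggregation helper; not an SDK type.
final class UsageTotals {
    // Stand-in for the two SessionUsage values used here.
    static final class Sample {
        final double audioSeconds;
        final Double costCents; // null when the charge is unavailable

        Sample(double audioSeconds, Double costCents) {
            this.audioSeconds = audioSeconds;
            this.costCents = costCents;
        }
    }

    double audioSeconds;      // always summed
    double knownCostCents;    // sum over sessions that reported a charge
    int sessionsWithoutCost;  // sessions to reconcile later

    static UsageTotals of(List<Sample> samples) {
        UsageTotals totals = new UsageTotals();
        for (Sample s : samples) {
            totals.audioSeconds += s.audioSeconds;
            if (s.costCents == null) {
                totals.sessionsWithoutCost++; // do not treat a missing charge as 0
            } else {
                totals.knownCostCents += s.costCents;
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        UsageTotals t = of(List.of(
                new Sample(10.0, 1.5),
                new Sample(5.0, null),
                new Sample(2.5, 0.5)));
        System.out.println("audio seconds: " + t.audioSeconds);
        System.out.println("known cost (ct): " + t.knownCostCents);
        System.out.println("sessions without cost: " + t.sessionsWithoutCost);
    }
}
```

In real code each `Sample` would be built from `usage.getAudioSeconds()` and `usage.getCostCents()`.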

Multi-Context Sessions

Generate audio for multiple speakers or contexts concurrently over a single WebSocket connection:
import com.kugelaudio.sdk.MultiContextConfig;
import com.kugelaudio.sdk.MultiContextSession;
import com.kugelaudio.sdk.MultiContextCallbacks;
import com.kugelaudio.sdk.CreateContextOptions;
import com.kugelaudio.sdk.AudioChunk;

MultiContextConfig config = MultiContextConfig.builder()
    .language("en")
    .sampleRate(24000)
    .build();

try (MultiContextSession session = client.multiContextSession(config)) {
    session.connect(new MultiContextCallbacks() {
        @Override
        public void onChunk(String contextId, AudioChunk chunk) {
            System.out.printf("[%s] Chunk %d: %d samples%n",
                contextId, chunk.getIndex(), chunk.getSamples());
            playAudio(contextId, chunk.getAudio());
        }

        @Override
        public void onContextClosed(String contextId) {
            // Per-context usage (audio time + amount charged) for this
            // conversation — each context is billed independently.
            var usage = session.getUsageFor(contextId);
            System.out.println("[" + contextId + "] complete; usage: " + usage);
        }

        @Override
        public void onError(String contextId, com.kugelaudio.sdk.KugelAudioException error) {
            System.err.println("[" + contextId + "] Error: " + error.getMessage());
        }
    });

    // Create contexts for different speakers
    session.createContext("speaker-a", CreateContextOptions.builder().voiceId(101).build());
    session.createContext("speaker-b", CreateContextOptions.builder().voiceId(202).build());

    // Send text to each context independently
    session.send("speaker-a", "Hello from speaker A!");
    session.send("speaker-b", "And greetings from speaker B!");

    session.flush("speaker-a");
    session.flush("speaker-b");

    // Barge-in: the user spoke over speaker A — cancel just that context's
    // in-flight generation (speaker B and the connection stay open).
    session.closeContext("speaker-a", true);
}
MultiContextConfig.builder() accepts .sampleRate(int), .outputFormat(String), .normalize(boolean), .language(String), .wordTimestamps(boolean), and .temperature(double). Supported outputFormat tokens are pcm_8000, pcm_16000, pcm_22050, pcm_24000, ulaw_8000, and alaw_8000.

Per-context overrides are set with CreateContextOptions.builder(): .voiceId(int), .cfgScale(double), and .maxNewTokens(int).

MultiContextSession methods: connect(MultiContextCallbacks), createContext(id[, options]), send(id, text[, flush]), flush(id), closeContext(id[, immediate]), keepAlive(id), getSessionId(), getActiveContexts(), isConnected(), and close().

For per-conversation billing, each context is its own conversation:
  • getUsageFor(contextId) returns the SessionUsage (audio time + amount charged) for a closed context, or null until it closes.
  • getContextUsage() returns a snapshot map of contextId → usage for all closed contexts.

MultiContextCallbacks (only onChunk is required):

| Method | Fires when |
| --- | --- |
| onChunk(String contextId, AudioChunk chunk) | Audio arrives for a context (required). |
| onContextCreated(String contextId) | A context is created on the server. |
| onGenerationStarted(String contextId) | Generation starts for a context. |
| onContextClosed(String contextId) | A context finishes and closes. |
| onSessionClosed() | The whole session closes. |
| onWordTimestamps(String contextId, List<WordTimestamp> timestamps) | Word timestamps arrive for a context. |
| onError(String contextId, KugelAudioException error) | An error occurs. |
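
Since chunks for different contexts arrive interleaved on one connection, the `onChunk` callback usually needs to route audio by `contextId`. The following is a minimal sketch of that routing, kept independent of the SDK: `ContextAudioBuffers` is a hypothetical helper, and it assumes the chunk's audio is available as a `byte[]` (the return type of `chunk.getAudio()` is not stated in this guide).

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not an SDK type: keeps one audio buffer per context.
final class ContextAudioBuffers {
    private final Map<String, ByteArrayOutputStream> buffers = new ConcurrentHashMap<>();

    // Call from onChunk(contextId, chunk) with the chunk's raw audio bytes.
    void append(String contextId, byte[] audio) {
        buffers.computeIfAbsent(contextId, id -> new ByteArrayOutputStream())
               .write(audio, 0, audio.length);
    }

    // Call from onContextClosed(contextId): returns everything collected for
    // that context and forgets it. Unknown contexts yield an empty array.
    byte[] drain(String contextId) {
        ByteArrayOutputStream buffer = buffers.remove(contextId);
        return buffer == null ? new byte[0] : buffer.toByteArray();
    }

    public static void main(String[] args) {
        ContextAudioBuffers audio = new ContextAudioBuffers();
        audio.append("speaker-a", new byte[] {1, 2});
        audio.append("speaker-b", new byte[] {9});
        audio.append("speaker-a", new byte[] {3});
        System.out.println("speaker-a bytes: " + audio.drain("speaker-a").length); // prints "speaker-a bytes: 3"
        System.out.println("speaker-b bytes: " + audio.drain("speaker-b").length); // prints "speaker-b bytes: 1"
    }
}
```

Buffering whole contexts suits offline use such as saving each speaker to a file; for live playback you would hand each frame to a per-context player instead of accumulating it.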
