LLM Integration: Streaming Sessions
Use a streaming session for real-time TTS when text arrives token by token from an LLM (GPT-4, Claude, etc.):
import com.kugelaudio.sdk.AudioChunk;
import com.kugelaudio.sdk.AudioResponse;
import com.kugelaudio.sdk.StreamConfig;
import com.kugelaudio.sdk.StreamCallbacks;
import com.kugelaudio.sdk.StreamingSession;

StreamConfig config = StreamConfig.builder()
    .voiceId(1071)
    .modelId("kugel-3")
    .language("en")
    .flushTimeoutMs(500) // Auto-flush after 500ms of no input
    .build();

// Simulate LLM token stream
String[] tokens = {"Hello, ", "this ", "is ", "a ", "streamed ", "response."};

try (StreamingSession session = client.streamingSession(config, new StreamCallbacks() {
    @Override
    public void onChunk(AudioChunk chunk) {
        playAudio(chunk.getAudio());
    }

    @Override
    public void onComplete(AudioResponse response) {
        System.out.printf("Done: %.0fms%n", response.getDurationMs());
    }
})) {
    // Send tokens as they arrive from LLM
    for (String token : tokens) {
        session.send(token);
    }
    // Flush any remaining text to trigger generation
    session.flush();
}
Session Reuse
End a session without closing the WebSocket to avoid reconnection overhead (see Turn lifecycle):
StreamingSession session = client.streamingSession(config, callbacks);
session.connect();
// Session 1
session.send("Hello from voice one.");
session.flush();
session.endSession(); // Keeps WebSocket open
// Session 2 — no reconnection needed
// (send new config on the next send)
session.send("Hello from voice two.");
session.flush();
session.close(); // Closes session + WebSocket
Barge-in (interrupt the current turn)
When the end user speaks over the agent, call cancelCurrent() to stop
generating the current turn immediately and drop any buffered/queued text —
without closing the WebSocket. Unlike endSession(), no remaining text is
flushed; the turn is abandoned. The socket stays open so the next send()
starts the next turn right away. Override onInterrupted() on your
StreamCallbacks to stop local playback at the cancellation point.
StreamCallbacks callbacks = new StreamCallbacks() {
    // Pass the raw audio, matching the onChunk usage in the first example
    @Override public void onChunk(AudioChunk chunk) { playAudio(chunk.getAudio()); }
    @Override public void onInterrupted() { stopLocalPlayback(); }
};
StreamingSession session = client.streamingSession(config, callbacks);
session.connect();
session.send("This is a very long answer the user talks over");
// VAD detected the user speaking — barge in:
session.cancelCurrent();
// Socket still open — next turn starts immediately:
session.send("Sure, what would you like instead?", true);
cancelCurrent() blocks until the server acknowledges (onInterrupted fires),
or up to ~5 seconds if the server goes silent. Stop local playback as soon as
you call it; a few in-flight frames may arrive before the acknowledgement.
See Barge-in for the full protocol.
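Because a few frames can still arrive between the call to cancelCurrent() and the acknowledgement, one way to follow the advice above is to gate playback behind a flag. The sketch below is independent of the SDK: PlaybackGate and its method names are hypothetical, byte[] is an assumed stand-in for whatever chunk.getAudio() returns, and the counter stands in for a real audio output.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the KugelAudio SDK.
final class PlaybackGate {
    private final AtomicBoolean open = new AtomicBoolean(true);
    private final AtomicInteger played = new AtomicInteger();

    /** Call from onChunk: accepts the frame only while the gate is open. */
    boolean offer(byte[] audio) {
        if (!open.get()) {
            return false; // late frame from a cancelled turn, drop it
        }
        played.incrementAndGet(); // stand-in for writing audio to the output device
        return true;
    }

    /** Call immediately before session.cancelCurrent(). */
    void closeForBargeIn() {
        open.set(false);
    }

    /** Call once cancelCurrent() has returned, before the next send(). */
    void reopen() {
        open.set(true);
    }

    int playedCount() {
        return played.get();
    }
}
```

With this shape, onChunk would call gate.offer(chunk.getAudio()), and the barge-in path would be closeForBargeIn(), then cancelCurrent(), then reopen(). Whether late frames are possible after cancelCurrent() returns is not stated above, so treat the reopen point as an assumption.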
Tuning latency with StreamConfig
StreamConfig.builder() exposes the same generation knobs as GenerateRequest
plus session-specific chunking controls:
StreamConfig config = StreamConfig.builder()
    .voiceId(1071)
    .modelId("kugel-3")
    .language("en")
    .temperature(0.5)
    .speed(1.0)
    .flushTimeoutMs(500)                             // server-side auto-flush timeout
    .autoMode(true)                                  // emit at first clean sentence boundary
    .chunkLengthSchedule(List.of(50, 100, 150, 250)) // low-latency schedule
    .build();
| Builder method | Description |
|---|---|
| .flushTimeoutMs(int) | Emit buffered text after this many ms of no new input. Default 500. |
| .chunkLengthSchedule(List<Integer>) | Minimum buffer size (chars) before each successive auto-chunk. Entry i applies to chunk i; the last value repeats. Default [5, 80, 150, 250]. Smaller = lower TTFA; larger = better prosody. |
| .autoMode(boolean) | Start generating at the very first clean sentence boundary (ElevenLabs auto_mode). Lowest TTFA. |
| .dictionaryIds(List<Integer>) | Per-session dictionary selection, applied to every turn. Not set = all active dictionaries (language-filtered); List.of() = none; a list = exactly those dictionaries (including inactive ones), bypassing the language filter. |
| .speed(double) | Playback speed 0.8–1.2 (pitch-preserving WSOLA). |
| .cfgScale, .temperature, .maxNewTokens, .sampleRate, .outputFormat, .normalize, .wordTimestamps | Same meaning as on GenerateRequest. outputFormat accepts pcm_8000, pcm_16000, pcm_22050, pcm_24000, ulaw_8000, or alaw_8000. |
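The chunkLengthSchedule rule ("entry i applies to chunk i; the last value repeats") can be illustrated with a small lookup. This is a sketch of the documented rule only, not the server's implementation; ChunkSchedule and minCharsFor are hypothetical names, and 0-based chunk numbering is an assumption.

```java
import java.util.List;

// Hypothetical illustration of the documented rule; not SDK or server code.
final class ChunkSchedule {
    /**
     * Minimum number of buffered characters before auto-chunk number
     * chunkIndex is emitted (chunkIndex assumed to be 0-based).
     */
    static int minCharsFor(List<Integer> schedule, int chunkIndex) {
        if (schedule.isEmpty()) {
            throw new IllegalArgumentException("schedule must not be empty");
        }
        if (chunkIndex < 0) {
            throw new IllegalArgumentException("chunkIndex must be >= 0");
        }
        // Past the end of the schedule, the last value repeats.
        int i = Math.min(chunkIndex, schedule.size() - 1);
        return schedule.get(i);
    }
}
```

Under these assumptions, the schedule List.of(50, 100, 150, 250) requires 50 characters for the first chunk and 250 for the fourth and every later chunk.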
StreamCallbacks reference
StreamCallbacks is used by both tts().stream(...) and streamingSession(...).
Only onChunk is required; the rest are default no-ops you override as needed:
| Method | Fires when |
|---|---|
| onChunk(AudioChunk chunk) | An audio chunk arrives (required). |
| onComplete(AudioResponse response) | The one-shot stream() request finishes. |
| onGenerationStarted(int chunkId, String text) | The server starts generating a text segment (sessions). |
| onChunkComplete(int chunkId, double audioSeconds, double genMs) | A flushed segment finishes (sessions). |
| onSessionClosed(double totalAudioSeconds, int totalTextChunks, int totalAudioChunks) | The session is fully closed. |
| onWordTimestamps(List<WordTimestamp> timestamps) | Word timestamps arrive (requires wordTimestamps(true)). |
| onInterrupted() | The server acknowledges a barge-in (cancelCurrent()). |
| onError(KugelAudioException error) | Any error occurs. |
tts().stream(request, callbacks, reuseConnection) takes an optional third
argument — pass true to reuse the client’s pooled WebSocket connection
instead of opening a fresh one for the request.
Per-session usage
For billing your own customers per conversation, every closed session and
request reports a SessionUsage (audio time + the actual amount charged):
- session.getLastUsage() on a StreamingSession: usage from the most recently closed session (null before the first close).
- response.getUsage() on the AudioResponse from a one-shot generate() / stream() request: per-request usage.
- session.getUsageFor(contextId) on a MultiContextSession: per-context usage (see Multi-Context Sessions).
import com.kugelaudio.sdk.SessionUsage;

SessionUsage usage = session.getLastUsage();
if (usage != null) {
    System.out.printf("audio: %.1fs%n", usage.getAudioSeconds());
    if (usage.isCostAvailable()) {
        System.out.printf("charged: %.2f %s ct%n",
            usage.getCostCents(), usage.getCurrency());
    }
}
SessionUsage getters: getAudioSeconds() (double, always present),
getCostCents() (Double, the EUR-cents charge or null), getCurrency()
(String), getCharacters() (Integer), getModelId() (String), and
isCostAvailable() (boolean).
getCostCents() is null (and isCostAvailable() is false) when the
charge cannot be determined at session end — e.g. a transient billing error
or an internal session. It is never a misleading 0; getAudioSeconds() is
always reported.
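When rolling usage up into a customer invoice, the null-versus-zero distinction above matters: an unknown charge should be tracked separately, not added as 0. The following is a minimal sketch of that idea. UsageTotals and its nested Usage class are hypothetical stand-ins (Usage mirrors only getAudioSeconds() and getCostCents() from SessionUsage) so that the example runs without the SDK.

```java
import java.util.List;

// Hypothetical aggregation helper, not part of the KugelAudio SDK.
final class UsageTotals {
    /** Stand-in for SessionUsage with only the two fields used here. */
    static final class Usage {
        final double audioSeconds;
        final Double costCents; // null when the charge could not be determined

        Usage(double audioSeconds, Double costCents) {
            this.audioSeconds = audioSeconds;
            this.costCents = costCents;
        }
    }

    double audioSeconds;     // always summed
    double knownCostCents;   // sum of charges that were reported
    int sessionsWithoutCost; // sessions whose charge was null

    static UsageTotals of(List<Usage> usages) {
        UsageTotals totals = new UsageTotals();
        for (Usage u : usages) {
            totals.audioSeconds += u.audioSeconds;
            if (u.costCents == null) {
                totals.sessionsWithoutCost++; // unknown, not free
            } else {
                totals.knownCostCents += u.costCents;
            }
        }
        return totals;
    }
}
```

A non-zero sessionsWithoutCost signals that the invoice is incomplete and those sessions need to be reconciled some other way, for example from their audio seconds.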
Multi-Context Sessions
Generate audio for multiple speakers or contexts concurrently over a single WebSocket connection:
import com.kugelaudio.sdk.MultiContextConfig;
import com.kugelaudio.sdk.MultiContextSession;
import com.kugelaudio.sdk.MultiContextCallbacks;
import com.kugelaudio.sdk.CreateContextOptions;
import com.kugelaudio.sdk.AudioChunk;
import com.kugelaudio.sdk.KugelAudioException;

MultiContextConfig config = MultiContextConfig.builder()
    .language("en")
    .sampleRate(24000)
    .build();

try (MultiContextSession session = client.multiContextSession(config)) {
    session.connect(new MultiContextCallbacks() {
        @Override
        public void onChunk(String contextId, AudioChunk chunk) {
            System.out.printf("[%s] Chunk %d: %d samples%n",
                contextId, chunk.getIndex(), chunk.getSamples());
            playAudio(contextId, chunk.getAudio());
        }

        @Override
        public void onContextClosed(String contextId) {
            // Per-context usage (audio time + amount charged) for this
            // conversation; each context is billed independently.
            var usage = session.getUsageFor(contextId);
            System.out.println("[" + contextId + "] complete; usage: " + usage);
        }

        @Override
        public void onError(String contextId, KugelAudioException error) {
            System.err.println("[" + contextId + "] Error: " + error.getMessage());
        }
    });

    // Create contexts for different speakers
    session.createContext("speaker-a", CreateContextOptions.builder().voiceId(101).build());
    session.createContext("speaker-b", CreateContextOptions.builder().voiceId(202).build());

    // Send text to each context independently
    session.send("speaker-a", "Hello from speaker A!");
    session.send("speaker-b", "And greetings from speaker B!");
    session.flush("speaker-a");
    session.flush("speaker-b");

    // Barge-in: the user spoke over speaker A, so cancel just that context's
    // in-flight generation (speaker B and the connection stay open).
    session.closeContext("speaker-a", true);
}
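Since onChunk delivers audio for every context on one connection, the playAudio(contextId, ...) call in the example above implies some per-speaker routing. One possible shape is sketched below. ContextAudioRouter is a hypothetical helper, not an SDK class, and byte[] is an assumed stand-in for the value returned by chunk.getAudio().

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper, not part of the KugelAudio SDK.
final class ContextAudioRouter {
    private final Map<String, Queue<byte[]>> queues = new ConcurrentHashMap<>();

    /** Call from onChunk(contextId, chunk) with the chunk's audio bytes. */
    void route(String contextId, byte[] audio) {
        queues.computeIfAbsent(contextId, id -> new ConcurrentLinkedQueue<>()).add(audio);
    }

    /** Next queued frame for one speaker's player, or null if none is queued. */
    byte[] poll(String contextId) {
        Queue<byte[]> queue = queues.get(contextId);
        return queue == null ? null : queue.poll();
    }

    /** Discards a context's queued audio; returns how many frames were dropped. */
    int drop(String contextId) {
        Queue<byte[]> queue = queues.remove(contextId);
        return queue == null ? 0 : queue.size();
    }
}
```

On barge-in, calling drop("speaker-a") alongside closeContext("speaker-a", true) would keep already-buffered audio for that speaker from playing while leaving speaker B's queue untouched.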
MultiContextConfig.builder() accepts .sampleRate(int), .outputFormat(String),
.normalize(boolean), .language(String), .wordTimestamps(boolean), and
.temperature(double). Supported outputFormat tokens are pcm_8000,
pcm_16000, pcm_22050, pcm_24000, ulaw_8000, and alaw_8000.
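The outputFormat tokens follow an encoding_sampleRate pattern, which is convenient when sizing playback buffers. The sketch below parses a token and estimates the byte rate. OutputFormat here is a hypothetical helper, not an SDK type, and the byte rate assumes mono audio with 16-bit PCM samples, neither of which is stated above; G.711 u-law and A-law use 8 bits per sample.

```java
import java.util.Set;

// Hypothetical helper, not part of the KugelAudio SDK.
final class OutputFormat {
    private static final Set<String> SUPPORTED = Set.of(
        "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "ulaw_8000", "alaw_8000");

    final String encoding; // "pcm", "ulaw", or "alaw"
    final int sampleRate;  // samples per second

    private OutputFormat(String encoding, int sampleRate) {
        this.encoding = encoding;
        this.sampleRate = sampleRate;
    }

    /** Splits a supported token such as "pcm_24000" into encoding and rate. */
    static OutputFormat parse(String token) {
        if (!SUPPORTED.contains(token)) {
            throw new IllegalArgumentException("Unsupported outputFormat: " + token);
        }
        int sep = token.indexOf('_');
        return new OutputFormat(token.substring(0, sep),
            Integer.parseInt(token.substring(sep + 1)));
    }

    /** Bytes per second, assuming mono and 16-bit PCM (8-bit for ulaw/alaw). */
    int bytesPerSecond() {
        int bytesPerSample = encoding.equals("pcm") ? 2 : 1;
        return sampleRate * bytesPerSample;
    }
}
```

Under those assumptions, pcm_24000 works out to 48,000 bytes per second and ulaw_8000 to 8,000.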
Per-context overrides are set with CreateContextOptions.builder():
.voiceId(int), .cfgScale(double), and .maxNewTokens(int).
MultiContextSession methods: connect(MultiContextCallbacks),
createContext(id[, options]), send(id, text[, flush]), flush(id),
closeContext(id[, immediate]), keepAlive(id), getSessionId(),
getActiveContexts(), isConnected(), and close().
For per-conversation billing, getUsageFor(contextId) returns the
SessionUsage (audio time + amount charged) for a closed
context — each context is its own conversation — or null until it closes;
getContextUsage() returns a snapshot map of contextId → usage for all
closed contexts.
MultiContextCallbacks (only onChunk is required):
| Method | Fires when |
|---|---|
| onChunk(String contextId, AudioChunk chunk) | Audio arrives for a context (required). |
| onContextCreated(String contextId) | A context is created on the server. |
| onGenerationStarted(String contextId) | Generation starts for a context. |
| onContextClosed(String contextId) | A context finishes and closes. |
| onSessionClosed() | The whole session closes. |
| onWordTimestamps(String contextId, List<WordTimestamp> timestamps) | Word timestamps arrive for a context. |
| onError(String contextId, KugelAudioException error) | An error occurs. |