Word timestamps - KugelAudio

When word_timestamps: true is set, the server performs forced alignment on each generated audio chunk and sends a word_timestamps message shortly after the corresponding audio. This is useful for barge-in handling (“which word was the agent on when the user interrupted?”), subtitle synchronization, and lip-sync.

Word timestamps add no extra audio latency. The alignment model runs on the same GPU as TTS, and timestamps arrive roughly 50–200 ms after the corresponding audio chunk.
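
For the barge-in case, the lookup itself can be sketched in a few lines of Python. This is an illustration only: `word_at` is a hypothetical helper, not part of the SDK, and it assumes the timestamps have already been shifted onto a single timeline measured in milliseconds from the start of playback, and that the audio player can report its current position.

```python
from bisect import bisect_right

def word_at(timeline, position_ms):
    """Return the entry whose start_ms is the latest one at or before
    position_ms, or None if playback has not reached the first word.

    `timeline` is a list of dicts sorted by start_ms (hypothetical shape,
    using the word/start_ms/end_ms field names of the timestamp payload).
    """
    starts = [w["start_ms"] for w in timeline]
    i = bisect_right(starts, position_ms) - 1
    return timeline[i] if i >= 0 else None

timeline = [
    {"word": "Hello", "start_ms": 0, "end_ms": 320},
    {"word": "world", "start_ms": 350, "end_ms": 680},
]

# The user interrupts 400 ms into playback.
interrupted = word_at(timeline, 400)
print(interrupted["word"] if interrupted else "nothing spoken yet")
```

A position that falls in the pause between two words resolves to the word that just finished. Because timestamps trail their audio by roughly 50–200 ms, the timeline may not yet cover the newest chunk at the moment of interruption, so a caller may want to wait briefly for the pending message before deciding.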

Streaming with word timestamps

Python

for chunk in client.tts.stream(
    text="Hello, this is streaming with timestamps.",
    model_id="kugel-3",
    word_timestamps=True,
):
    if hasattr(chunk, 'audio'):
        play_audio(chunk.audio)
    elif isinstance(chunk, list):
        # Word timestamps arrive as a list of WordTimestamp objects
        for ts in chunk:
            print(f"{ts.word}: {ts.start_ms}-{ts.end_ms}ms")

JavaScript

await client.tts.stream(
  {
    text: 'Hello, this is streaming with timestamps.',
    modelId: 'kugel-3',
    wordTimestamps: true,
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onWordTimestamps: (timestamps) => {
      for (const ts of timestamps) {
        console.log(`${ts.word}: ${ts.startMs}-${ts.endMs}ms`);
      }
    },
  }
);

Java

client.tts().stream(
    GenerateRequest.builder("Hello, this is streaming with timestamps.")
        .modelId("kugel-3")
        .language("en")
        .wordTimestamps(true)
        .build(),
    new StreamCallbacks() {
        @Override
        public void onChunk(AudioChunk chunk) {
            playAudio(chunk.getAudio());
        }
        @Override
        public void onWordTimestamps(List<WordTimestamp> timestamps) {
            for (WordTimestamp ts : timestamps) {
                System.out.printf("%s: %d-%dms%n",
                    ts.getWord(), ts.getStartMs(), ts.getEndMs());
            }
        }
    }
);

WebSocket (raw)

Word timestamps are only available on the WebSocket endpoints. The REST /v1/tts/generate endpoint does not accept word_timestamps: strict request validation rejects the field with 422 Unprocessable Entity. Without an SDK, connect to a WebSocket endpoint directly:

wscat -c "wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY"
> {"text": "Hello, this is streaming with timestamps.", "voice_id": 978, "model_id": "kugel-3", "word_timestamps": true}

word_timestamps frames arrive interleaved with the audio frames — see the /ws/tts reference and /ws/tts/stream reference.
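
Because the two kinds of frame share one connection, a raw client has to tell them apart. The following Python sketch shows one way to do that, under stated assumptions: `route_frame` is a hypothetical helper, it assumes the frames passed to it are JSON text, and it recognizes timestamp frames only by the presence of the word_timestamps key. The shape of audio frames is defined in the endpoint references and is not assumed here.

```python
import json

def route_frame(raw, on_timestamps, on_other):
    """Dispatch one JSON text frame received from the WebSocket.

    Timestamp frames go to on_timestamps(chunk_id, words); every other
    frame (audio, status, errors) is handed to on_other unchanged.
    """
    msg = json.loads(raw)
    if isinstance(msg, dict) and "word_timestamps" in msg:
        on_timestamps(msg.get("chunk_id"), msg["word_timestamps"])
    else:
        on_other(msg)

# Example with a hand-written timestamp frame.
frame = json.dumps({
    "word_timestamps": [{"word": "Hello", "start_ms": 0, "end_ms": 320}],
    "chunk_id": 0,
})
route_frame(
    frame,
    on_timestamps=lambda chunk_id, words: print(chunk_id, [w["word"] for w in words]),
    on_other=lambda msg: print("other frame"),
)
```

If the endpoint delivers audio as binary frames rather than JSON, only the text frames should be passed to a function like this.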

The timestamp payload

Each word_timestamps message carries the alignments for one audio chunk:

{
  "word_timestamps": [
    {"word": "Hello", "start_ms": 0, "end_ms": 320, "char_start": 0, "char_end": 5, "score": 0.98},
    {"word": "world", "start_ms": 350, "end_ms": 680, "char_start": 7, "char_end": 12, "score": 0.95}
  ],
  "chunk_id": 0
}

Field	Type	Description
`word`	`string`	The aligned word
`start_ms`	`int`	Start time in milliseconds (relative to chunk start)
`end_ms`	`int`	End time in milliseconds (relative to chunk start)
`char_start`	`int`	Start character offset in the original text
`char_end`	`int`	End character offset in the original text
`score`	`float`	Alignment confidence score (0.0 - 1.0)
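
When working with the raw payload instead of an SDK, the fields above map naturally onto a small typed record. The sketch below is illustrative: `AlignedWord`, `parse_timestamps`, and the `min_score` threshold are hypothetical names, not SDK types. Slicing the text with `char_start:char_end` treats `char_end` as exclusive, which is an assumption based on the sample values ("Hello" spans 0 to 5).

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignedWord:
    word: str
    start_ms: int
    end_ms: int
    char_start: int
    char_end: int
    score: float

def parse_timestamps(raw, min_score=0.0):
    """Parse one word_timestamps message into (chunk_id, [AlignedWord]).

    Words whose alignment score is below min_score are dropped.
    Unknown extra fields in the payload are ignored.
    """
    msg = json.loads(raw)
    words = [
        AlignedWord(
            word=item["word"],
            start_ms=item["start_ms"],
            end_ms=item["end_ms"],
            char_start=item["char_start"],
            char_end=item["char_end"],
            score=item["score"],
        )
        for item in msg["word_timestamps"]
    ]
    return msg["chunk_id"], [w for w in words if w.score >= min_score]

raw = json.dumps({
    "word_timestamps": [
        {"word": "Hello", "start_ms": 0, "end_ms": 320,
         "char_start": 0, "char_end": 5, "score": 0.98},
        {"word": "world", "start_ms": 350, "end_ms": 680,
         "char_start": 7, "char_end": 12, "score": 0.95},
    ],
    "chunk_id": 0,
})

chunk_id, words = parse_timestamps(raw)
text = "Hello, world"  # the text that was sent for synthesis
for w in words:
    # Assumes char_end is exclusive, as the sample offsets suggest.
    print(chunk_id, w.word, repr(text[w.char_start:w.char_end]))
```

Filtering on `score` can be useful for subtitles, where a low-confidence alignment may be better skipped than shown at the wrong time. The right threshold depends on the application.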

Timestamps are relative to the start of their chunk. To place a word on a global timeline, add the combined duration of all preceding chunks to its start_ms and end_ms.
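
A minimal Python sketch of that bookkeeping follows. It rests on several assumptions that should be checked against the actual output format: audio is raw 16-bit mono PCM, the sample rate is known (24000 Hz is only a placeholder), and chunks are numbered consecutively from 0 in arrival order so that chunk_id can index the recorded offsets. `GlobalTimeline` is a hypothetical helper, not part of the SDK.

```python
class GlobalTimeline:
    """Convert chunk-relative word timestamps to utterance-relative ones."""

    def __init__(self, sample_rate, bytes_per_sample=2, channels=1):
        # Number of PCM bytes that make up one millisecond of audio.
        self._bytes_per_ms = sample_rate * bytes_per_sample * channels / 1000
        self._offsets = []    # start offset in ms of each chunk, by arrival order
        self._total_ms = 0.0  # duration of all audio received so far

    def add_audio(self, pcm_bytes):
        """Record where this chunk starts, then advance by its duration."""
        self._offsets.append(self._total_ms)
        self._total_ms += len(pcm_bytes) / self._bytes_per_ms

    def to_global(self, chunk_id, words):
        """Return copies of `words` shifted by the chunk's start offset."""
        offset = round(self._offsets[chunk_id])
        return [
            {**w, "start_ms": w["start_ms"] + offset, "end_ms": w["end_ms"] + offset}
            for w in words
        ]

tl = GlobalTimeline(sample_rate=24000)  # placeholder rate, not a documented value
tl.add_audio(bytes(48000))              # chunk 0: 24000 samples = 1000 ms
tl.add_audio(bytes(24000))              # chunk 1: 12000 samples = 500 ms

# Timestamps for chunk 1 arrive after its audio and are shifted by 1000 ms.
shifted = tl.to_global(1, [{"word": "next", "start_ms": 120, "end_ms": 400}])
print(shifted)
```

Offsets are recorded when the audio arrives rather than when the timestamps do, since the word_timestamps message for a chunk follows its audio and later audio chunks may already have been received by then.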

Where timestamps are available

Surface	How they arrive
SDK streaming (`stream`, `streamingSession`)	`onWordTimestamps` callback (JS/Java) or timestamp items in the chunk iterator (Python)
SDK `generate()` (Python)	`AudioResponse.word_timestamps` — the SDK streams over WebSocket internally
`/ws/tts`, `/ws/tts/stream`, and `/ws/tts/multi`	`word_timestamps` frames interleaved with audio frames (see the endpoint references)
REST `/v1/tts/generate`	Not supported — the field is rejected with `422`; use a WebSocket endpoint or an SDK

LiveKit uses these alignments natively for transcript sync — see the LiveKit integration.
