WebSocket
Connection
Connect with your API key:
Request Message
Send a JSON message to start generation. Fields share the meaning and defaults of the Generate Speech parameters:
Enable word-level timestamp alignment. When enabled, a word_timestamps message is sent after the audio chunks with per-word timing data.
Playback speed multiplier. Range: 0.8 (20% slower) to 1.2 (20% faster). Uses pitch-preserving WSOLA.
Per-request dictionary selection. Omitted = all active dictionaries (language-filtered); [] = none; a list applies exactly those dictionaries (including inactive ones), bypassing the language filter. Also accepted in the config of /ws/tts/stream and /ws/tts/multi, where it is sticky for the session.
Prepend an internal speaker prefix to the text for better voice consistency.
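
The request rules above can be sketched as a small client-side helper. This is a minimal illustration, not the official client: `word_timestamps`, `normalize`, and `language` appear on this page, but the field names `text`, `speed`, and `dictionaries` are assumptions made for the example and should be checked against the Generate Speech parameters. The helper leaves unset fields out of the message so server defaults apply, which also keeps "omitted" and `[]` distinct for dictionary selection.

```python
import json
from typing import Any, Dict, List, Optional

SPEED_MIN, SPEED_MAX = 0.8, 1.2  # range stated in the parameter description


def build_request(
    text: str,
    language: str,
    *,
    speed: Optional[float] = None,
    word_timestamps: bool = False,
    normalize: bool = False,
    dictionaries: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a generation request, omitting fields the caller did not set."""
    # "text" is an assumed field name; "language" is named on this page.
    msg: Dict[str, Any] = {"text": text, "language": language}
    if speed is not None:
        if not SPEED_MIN <= speed <= SPEED_MAX:
            raise ValueError(
                f"speed must be within [{SPEED_MIN}, {SPEED_MAX}], got {speed}"
            )
        msg["speed"] = speed  # assumed field name
    if word_timestamps:
        msg["word_timestamps"] = True
    if normalize:
        msg["normalize"] = True
    # None -> key omitted (all active dictionaries, language-filtered)
    # []   -> no dictionaries
    # list -> exactly those dictionaries
    if dictionaries is not None:
        msg["dictionaries"] = list(dictionaries)  # hypothetical field name
    return msg


# Serialize to the JSON text that would be sent over the socket.
payload = json.dumps(build_request("Hello", "en", speed=1.1, word_timestamps=True))
```

Validating `speed` before sending avoids a round trip that would only produce an error frame.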
Text Normalization: Set normalize: true to convert numbers, dates, and symbols to spoken words.
Always specify language to ensure correct normalization; auto-detection may produce incorrect results for short texts.
Response Messages
Audio Chunk
Word Timestamps (when word_timestamps: true)
Final Message
On this endpoint, final is the request-complete message and carries the
request’s stats and usage. (The streaming endpoints emit a lighter
end-of-audio final without usage, followed by session_closed; see
Turn lifecycle.)
| Field | Type | Description |
|---|---|---|
| final | boolean | Indicates that generation is complete |
| chunks | integer | Number of chunks generated |
| total_samples | integer | Total audio samples generated |
| dur_ms | number | Total audio duration in ms |
| gen_ms | number | Total generation time in ms |
| rtf | number | Real-time factor (gen_ms / dur_ms) |
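
As a rough sketch of how a client might consume these stats, the snippet below parses a final message into a typed record and checks that the reported rtf matches gen_ms / dur_ms. The field names come from the table above; the helper names and the sample values are illustrative assumptions, not output captured from the service.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FinalStats:
    chunks: int
    total_samples: int
    dur_ms: float
    gen_ms: float
    rtf: float


def parse_final(msg: Dict[str, Any]) -> FinalStats:
    """Extract the stats fields from a request-complete message."""
    if not msg.get("final"):
        raise ValueError("not a final message")
    return FinalStats(
        chunks=int(msg["chunks"]),
        total_samples=int(msg["total_samples"]),
        dur_ms=float(msg["dur_ms"]),
        gen_ms=float(msg["gen_ms"]),
        rtf=float(msg["rtf"]),
    )


def rtf_is_consistent(stats: FinalStats, tol: float = 1e-3) -> bool:
    """Check the reported rtf against gen_ms / dur_ms."""
    if stats.dur_ms <= 0:
        return False
    return abs(stats.gen_ms / stats.dur_ms - stats.rtf) <= tol


def is_faster_than_real_time(stats: FinalStats) -> bool:
    """An rtf below 1.0 means generation took less time than the audio lasts."""
    return stats.rtf < 1.0


# Illustrative values only.
example = {
    "final": True,
    "chunks": 4,
    "total_samples": 43200,
    "dur_ms": 1800.0,
    "gen_ms": 450.0,
    "rtf": 0.25,
}
stats = parse_final(example)
```
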
The usage object reports what this request consumed and what it was
charged, so you can bill your own customers per request:
| Field | Description |
|---|---|
| audio_seconds | Audio generated for this request (the unit we bill on) |
| characters | Input characters submitted |
| cost_cents | Actual amount charged, in EUR cents. null (with cost_unavailable: true) if the charge could not be determined, never a misleading 0 |
| currency | Currency of cost_cents ("eur"); present only when cost_cents is set |
| model_id | Model that produced the audio |
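
Because cost_cents can be null, per-request billing code should test for null explicitly and not treat it as zero. The sketch below shows one way to aggregate usage objects under that rule; the function name, the return shape, and the sample records are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Tuple


def summarize_usage(usages: Iterable[Dict[str, Any]]) -> Tuple[float, float, int]:
    """Return (billed_cents, audio_seconds, requests_with_unknown_cost)."""
    billed_cents = 0.0
    audio_seconds = 0.0
    unknown_cost = 0
    for usage in usages:
        audio_seconds += usage.get("audio_seconds", 0.0)
        cost = usage.get("cost_cents")
        if cost is None:
            # Charge could not be determined (cost_unavailable: true).
            # Track it separately; do not count it as a zero charge.
            unknown_cost += 1
        else:
            # A value of 0 here is a real, known charge of zero.
            billed_cents += cost
    return billed_cents, audio_seconds, unknown_cost


# Illustrative records only; "example-model" is a placeholder model_id.
records = [
    {"audio_seconds": 2.5, "characters": 40, "cost_cents": 3,
     "currency": "eur", "model_id": "example-model"},
    {"audio_seconds": 1.0, "characters": 12, "cost_cents": None,
     "cost_unavailable": True, "model_id": "example-model"},
    {"audio_seconds": 0.5, "characters": 5, "cost_cents": 0,
     "currency": "eur", "model_id": "example-model"},
]
billed, seconds, unknown = summarize_usage(records)
```

Requests counted in the unknown bucket can be reconciled later instead of being passed on to customers as free.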
Example
Errors
WebSocket error frames use the same JSON error shape as HTTP responses, including the code field. See
Error Codes for the full lookup table.