The `speed` request parameter adjusts the playback rate using pitch-preserving time-stretching (WSOLA), so the voice keeps its natural pitch at every supported rate. It applies uniformly to all audio in the request, except inside `<prosody rate>` spans, which override it.

| `speed` | Effect | Typical use |
|---|---|---|
| `0.8` | 20% slower | Dictation, phone numbers, legal disclaimers |
| `1.0` | Normal (default) | General purpose |
| `1.2` | 20% faster | Notifications, fast-paced UI feedback |

Values outside the 0.8–1.2 range are clamped to the nearest bound (for example, `1.5` is treated as `1.2`).
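
The clamping rule can be illustrated with a small client-side sketch. `clamp_speed` and `approx_duration` are hypothetical helpers, not part of the KugelAudio SDK, and the duration estimate assumes output length scales roughly as 1 / speed (the goal of time-stretching); real output lengths may differ slightly.

```python
# Hypothetical helpers mirroring the documented clamping rule (not SDK functions).
SPEED_MIN, SPEED_MAX = 0.8, 1.2


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed into the supported 0.8-1.2 range."""
    return max(SPEED_MIN, min(SPEED_MAX, speed))


def approx_duration(normal_seconds: float, speed: float) -> float:
    """Rough output length, assuming duration scales as 1 / speed."""
    return normal_seconds / clamp_speed(speed)


print(clamp_speed(1.5))           # 1.2
print(clamp_speed(0.5))           # 0.8
print(approx_duration(6.0, 1.2))  # 5.0
```

Under this assumption, a clip that lasts 6 seconds at normal speed comes out at about 5 seconds with `speed=1.2`, and requesting `speed=2.0` gives the same result because it is clamped to 1.2.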
```python
# `client` is an already-initialized KugelAudio SDK client.
audio = client.tts.generate(
    text="This entire sentence is read 20% faster.",
    speed=1.2,
)
```

Dashboard: The playground in the KugelAudio dashboard includes a
Slow / Normal / Fast speed toggle next to the model selector. Changes
are reflected live in the SDK code snippet shown below the generator.

## Per-span speed with `<prosody rate>`

To change the speed for part of a request, wrap that text in an SSML-style `<prosody rate="...">` tag:
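
```python
# German for: "Our callback number is 0800 5834552. We look forward to your call."
# No `speed` is set, so text outside the span uses the default 1.0.
audio = client.tts.generate(
    text=(
        'Unsere Rückrufnummer lautet '
        '<prosody rate="slow">0800 5834552.</prosody> '
        'Wir freuen uns auf Ihren Anruf.'
    ),
)
```
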
The text inside the span is synthesized at the span's rate; everything outside it keeps the request's global `speed` (1.0 unless set).
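
The override rule can be modeled in a few lines. This is a sketch under stated assumptions: `resolve_rate` and `effective_rates` are hypothetical names, the named values (`slow`, `medium`, `fast` mapping to 0.8, 1.0, 1.2) come from the `rate` table in this section, and the regular expression only handles the plain `rate="..."` form used on this page. The service's actual parser is not documented here and may differ.

```python
import re

# Hypothetical model of the documented behavior, not the service's parser.
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
SPAN = re.compile(r'<prosody rate="([^"]*)">(.*?)</prosody>', re.DOTALL)


def clamp(value: float) -> float:
    """Clamp a rate into the supported 0.8-1.2 range."""
    return max(0.8, min(1.2, value))


def resolve_rate(rate: str) -> float:
    """Map a `rate` attribute value to a playback multiplier."""
    if rate in NAMED_RATES:
        return NAMED_RATES[rate]
    try:
        return clamp(float(rate))  # numeric rates are clamped like `speed`
    except ValueError:
        raise ValueError(f"unknown prosody rate: {rate!r}") from None


def effective_rates(text: str, speed: float = 1.0) -> list[tuple[str, float]]:
    """Split text into (segment, rate) pairs: span rate inside, global speed outside."""
    segments = []
    pos = 0
    for match in SPAN.finditer(text):
        if match.start() > pos:  # plain text before this span
            segments.append((text[pos:match.start()], clamp(speed)))
        segments.append((match.group(2), resolve_rate(match.group(1))))
        pos = match.end()
    if pos < len(text):  # plain text after the last span
        segments.append((text[pos:], clamp(speed)))
    return segments


text = 'Call <prosody rate="slow">0800 5834552.</prosody> today.'
for segment, rate in effective_rates(text, speed=1.2):
    print(rate, repr(segment))
# 1.2 'Call '
# 0.8 '0800 5834552.'
# 1.2 ' today.'
```

With `speed=1.2`, only the phone number drops to 0.8×; the surrounding words stay at the global rate.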

| `rate` | Effect |
|---|---|
| `"slow"` | 0.8× (20% slower) |
| `"medium"` | 1.0× (normal) |
| `"fast"` | 1.2× (20% faster) |
| `"0.8"`–`"1.2"` | Any numeric rate in the supported range |

Rules:

- Inside a span, `rate` overrides the global `speed`; outside it, `speed` applies. Numeric rates outside 0.8–1.2 are clamped, as with `speed`.
- Spans can cover anything from a few words to several sentences, and `<break>` tags still work inside them.
- Spans cannot be nested, and `rate` is the only supported attribute (`pitch` and `volume` are rejected).
- Malformed tags fail loudly: an unclosed `<prosody>`, an unrecognized `rate` value, or a stray `</prosody>` returns a 400 error (REST) or an error frame (WebSocket) instead of being silently ignored.
- On the streaming endpoints, a span must open and
close within a single message.
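
Because malformed markup is rejected rather than ignored, it can help to check text before sending it, especially when assembling streaming messages. The sketch below is a hypothetical pre-flight check (`check_prosody` is not an SDK function) that approximates the rules above; the server's own validation remains authoritative. Running it on each streaming message separately also catches a span split across two messages, since the first half is reported as unclosed and the second as a stray closing tag.

```python
import re

# Hypothetical pre-flight check; the server's validation is authoritative.
TAG = re.compile(r"<(/?)prosody\b([^>]*)>")
ATTR = re.compile(r'(\w+)="([^"]*)"')
NAMED_RATES = {"slow", "medium", "fast"}


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_prosody(message: str) -> list[str]:
    """Return the problems found in one request body or streaming message."""
    problems = []
    inside = False
    for m in TAG.finditer(message):
        if m.group(1) == "/":  # closing tag
            if not inside:
                problems.append(f"stray </prosody> at offset {m.start()}")
            inside = False
            continue
        if inside:  # opening tag while a span is already open
            problems.append(f"nested <prosody> at offset {m.start()}")
        inside = True
        attrs = dict(ATTR.findall(m.group(2)))
        unsupported = sorted(set(attrs) - {"rate"})
        if unsupported:
            problems.append(f"unsupported attribute(s): {unsupported}")
        rate = attrs.get("rate")
        if rate is not None and rate not in NAMED_RATES and not is_number(rate):
            problems.append(f"unknown rate {rate!r}")
    if inside:
        problems.append("unclosed <prosody>")
    return problems


print(check_prosody('Hi <prosody rate="slow">0800</prosody>.'))  # []
print(check_prosody('<prosody rate="slow">cut off'))  # ['unclosed <prosody>']
print(check_prosody('<prosody rate="slow" pitch="high">x</prosody>'))
# ["unsupported attribute(s): ['pitch']"]
```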

For phone numbers and codes, a slow span combined with either digit spacing (`<prosody rate="slow">0 30 12 34 56 78</prosody>`) or `<spell group="2">` reads most naturally.