The speed request parameter adjusts playback rate using pitch-preserving time-stretching (WSOLA), so the voice stays natural across the supported range. It applies uniformly to all audio in the request, except inside <prosody rate> spans, which override it.
| speed | Effect | Typical use |
| --- | --- | --- |
| 0.8 | 20% slower | Dictation, phone numbers, legal disclaimers |
| 1.0 | Normal (default) | General purpose |
| 1.2 | 20% faster | Notifications, fast-paced UI feedback |
Values outside 0.8 to 1.2 are clamped.
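Clamping happens on the service side, but it can be useful to mirror it locally, for example to show users the rate that will actually be applied. The following is a minimal sketch; `clamp_speed` and the two constants are hypothetical names for illustration and are not part of the SDK.

```python
# Hypothetical client-side mirror of the documented clamping behaviour.
MIN_SPEED = 0.8
MAX_SPEED = 1.2


def clamp_speed(value: float) -> float:
    """Limit a requested speed to the documented 0.8 to 1.2 range."""
    return max(MIN_SPEED, min(MAX_SPEED, value))
```

With this helper, a requested speed of 2.0 would be reported as 1.2 and a requested speed of 0.5 as 0.8, while in-range values pass through unchanged.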
audio = client.tts.generate(
    text="This entire sentence is read 20% faster.",
    speed=1.2,
)
Dashboard: The playground in the KugelAudio dashboard includes a Slow / Normal / Fast speed toggle next to the model selector. Changes are reflected live in the SDK code snippet shown below the generator.

Per-span speed with <prosody rate>

To change speed for part of a request, wrap that text in an SSML-style <prosody rate="..."> tag:
audio = client.tts.generate(
    text=(
        'Unsere Rückrufnummer lautet '
        '<prosody rate="slow">0800 5834552.</prosody> '
        'Wir freuen uns auf Ihren Anruf.'
    ),
)
The text inside the span is synthesized at the span’s rate; everything outside keeps the request’s global speed.
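As a rough mental model of that split, the sketch below breaks a request text into segments and pairs each one with the rate it would be read at. It assumes the named rates map to 0.8, 1.0, and 1.2 and that numeric rates are clamped like speed, as described in this section. `effective_speeds` is a hypothetical helper for illustration only; it assumes well-formed input and does not reflect how the service parses text internally.

```python
import re

# Assumed mapping of named rates to multipliers, taken from this section.
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
SPAN_RE = re.compile(r'<prosody rate="([^"]+)">(.*?)</prosody>', re.DOTALL)


def clamp(value: float) -> float:
    """Limit a rate to the supported 0.8 to 1.2 range."""
    return max(0.8, min(1.2, value))


def effective_speeds(text: str, speed: float = 1.0) -> list[tuple[str, float]]:
    """Return (segment, rate) pairs for well-formed, non-nested spans."""
    segments = []
    pos = 0
    for match in SPAN_RE.finditer(text):
        if match.start() > pos:
            # Text before the span keeps the global speed.
            segments.append((text[pos:match.start()], clamp(speed)))
        rate = match.group(1)
        value = NAMED_RATES[rate] if rate in NAMED_RATES else float(rate)
        # Text inside the span uses the span's own rate.
        segments.append((match.group(2), clamp(value)))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], clamp(speed)))
    return segments
```

For a text with a single slow span and a global speed of 1.2, this yields three segments: the leading and trailing text at 1.2 and the span content at 0.8.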
| rate | Effect |
| --- | --- |
| "slow" | 0.8× (20% slower) |
| "medium" | 1.0× (normal) |
| "fast" | 1.2× (20% faster) |
| "0.8" to "1.2" | Any numeric rate in the supported range |
Rules:
  • Inside a span, rate overrides the global speed; outside, speed applies. Numeric rates outside 0.8 to 1.2 are clamped, like speed.
  • Spans can cover anything from a few words to multiple sentences, and <break> tags keep working inside them.
  • Spans cannot be nested, and rate is the only supported attribute (pitch / volume are rejected).
  • Malformed tags fail loudly — an unclosed <prosody>, an unknown rate, or a stray </prosody> returns a 400 (REST) or an error frame (WebSocket) instead of being silently ignored.
  • On the streaming endpoints, a span must open and close within a single message.
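Because malformed tags are rejected by the API, it may help to catch them before sending a request. The sketch below is a hypothetical client-side pre-check that covers the rules above (no nesting, rate as the only attribute, known rate values, balanced tags). `check_prosody` is an illustrative name, not an SDK function, and the service's own validation may differ in detail.

```python
import re

TAG_RE = re.compile(r"<(/?)prosody\b([^>]*)>")
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
NAMED_RATES = {"slow", "medium", "fast"}


def check_prosody(text: str) -> None:
    """Raise ValueError if text breaks one of the span rules listed above."""
    open_span = False
    for match in TAG_RE.finditer(text):
        closing, raw_attrs = match.group(1), match.group(2)
        if closing:
            if not open_span:
                raise ValueError("stray </prosody>")
            open_span = False
            continue
        if open_span:
            raise ValueError("nested <prosody> spans are not supported")
        attrs = dict(ATTR_RE.findall(raw_attrs))
        if set(attrs) != {"rate"}:
            raise ValueError("expected exactly one attribute: rate")
        rate = attrs["rate"]
        if rate not in NAMED_RATES:
            try:
                float(rate)  # numeric rates are allowed (and clamped server-side)
            except ValueError:
                raise ValueError(f"unknown rate: {rate!r}") from None
        open_span = True
    if open_span:
        raise ValueError("unclosed <prosody>")
```

On the streaming endpoints, running this check on each message separately also enforces the rule that a span must open and close within a single message.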
For phone numbers and codes, combining a slow span with digit spacing (<prosody rate="slow">0 30 12 34 56 78</prosody>) or <spell group="2"> reads most naturally.
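The digit-spacing variant can be generated rather than typed by hand. The sketch below is one possible approach: it strips non-digits, groups the digits in pairs from the right, and wraps the result in a slow span. `slow_digits` is a hypothetical helper, and the pairwise grouping is an assumption based on the example above rather than a documented requirement.

```python
def slow_digits(number: str, group: int = 2) -> str:
    """Wrap a digit string in a slow span, spaced into groups from the right."""
    digits = "".join(ch for ch in number if ch.isdigit())
    head = len(digits) % group  # leftover digits form a shorter first group
    parts = [digits[:head]] if head else []
    parts += [digits[i:i + group] for i in range(head, len(digits), group)]
    spaced = " ".join(parts)
    return f'<prosody rate="slow">{spaced}</prosody>'
```

For example, `slow_digits("030 12345678")` produces `<prosody rate="slow">0 30 12 34 56 78</prosody>`, which can be embedded directly in the request text.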