Supported controls at a glance
| Control | Syntax | Page |
|---|---|---|
| Pauses | <break time="300ms"/>, <break strength="medium"/>, <break/> | Breaks |
| Speed | speed request parameter (0.8–1.2, whole request) | Speed |
| Spell out characters | <spell>text</spell>, <spell group="2">…</spell> | Spell tags |
| Custom pronunciation | Inline IPA — /ˈkuːɡl̩/ — or pronunciation dictionaries | Pronunciation & IPA |
| Pacing & intonation | Plain punctuation — see below | this page |
<break> and <spell> are the only tags processed in request text.
Everything else — including SSML — is stripped before synthesis; see
Unsupported tags.
Punctuation as pacing
The model respects natural punctuation cues — no special tags needed:| Technique | Effect |
|---|---|
, comma | Brief pause between clauses |
. period | Sentence-end pause, falling intonation |
… ellipsis | Longer trailing pause |
— em dash | Abrupt pause / interruption feel |
? question mark | Rising intonation |
! exclamation | Energetic delivery |
\n newline | Paragraph-level pause (similar to period) |
<break> tags when you need an explicit silence of a
specific length (e.g. before a verification code).
Writing tips
- Strip markdown before TTS. Asterisks, hashes, and bullet characters are read literally by the model.
- No emoji. They are read out or garbled.
- Write numbers as digits when they should be normalized (“You have 3
messages”) and always set
language— see Text processing. - Keep sentences short and end them with punctuation — this also helps the streaming chunker start generation earlier (why).
!, ALL-CAPS, and?!are prosody cues — the model will deliver them energetically. Use deliberately.
LLM system prompt pattern
When an LLM generates text that feeds directly into TTS, add instructions so it uses the supported controls correctly:Unsupported tags
KugelAudio processes<spell>, <break>, and <prosody rate>. All other
tags are silently stripped — the inner text is kept but the tag itself
has no effect.
| Tag / Attribute | Status | Alternative |
|---|---|---|
<speak> wrapper | Stripped | Omit — plain text is assumed |
<prosody rate="..."> | Supported — per-span speed | See Speed |
<prosody pitch="..."> | Rejected (400) | No pitch control available |
<prosody volume="..."> | Rejected (400) | No volume control available |
<emphasis> | Stripped | Rephrase text for natural emphasis |
<say-as interpret-as="..."> | Stripped | Use <spell> for characters, normalization for numbers |
<sub alias="..."> | Stripped | Write the spoken form directly, or use a dictionary |
<phoneme> | Stripped | Write inline IPA between slashes (/ˈkuːɡl̩/) |
<audio>, <p>, <s>, <w>, <lang> | Stripped | — |
Next steps
Breaks
Explicit pauses with break tags
Speed
The global speed parameter
Spell tags
Character-by-character pronunciation
Pronunciation & IPA
Fix how specific words are spoken