Prompting overview - KugelAudio

KugelAudio generates speech directly from text. There is no voice direction layer — you shape the output by how you write the input. This section covers every mechanism available to control pronunciation, pacing, and emphasis.

Supported controls at a glance

Control	Syntax	Page
Pauses	`<break time="300ms"/>`, `<break strength="medium"/>`, `<break/>`	Breaks
Speed	`speed` request parameter (`0.8`–`1.2`, whole request)	Speed
Spell out characters	`<spell>text</spell>`, `<spell group="2">…</spell>`	Spell tags
Custom pronunciation	Inline IPA — `/ˈkuːɡl̩/` — or pronunciation dictionaries	Pronunciation & IPA
Pacing & intonation	Plain punctuation — see below	this page

<break> and <spell> are the main tags processed in request text; the Unsupported tags section also lists <prosody rate> as supported. Most other tags, including most SSML, are stripped before synthesis; see Unsupported tags.
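As a rough illustration of how the request-level controls fit together, the sketch below validates `speed` against the range in the table above before building a payload. The function name `build_request`, the `text` and `language` field names, and the default speed of `1.0` are assumptions made for this example, not the documented API; check the API reference for the real request shape.

```python
SPEED_MIN, SPEED_MAX = 0.8, 1.2  # range taken from the controls table above


def build_request(text: str, speed: float = 1.0, language: str = "en") -> dict:
    """Build a hypothetical request payload (field names are assumptions)."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(
            f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}"
        )
    # speed applies to the whole request; pauses and spelling live in the text.
    return {"text": text, "speed": speed, "language": language}


payload = build_request(
    'Your code is <break time="300ms"/> <spell>ABC-123</spell>.',
    speed=0.9,
)
print(payload["speed"])
```

Validating on the client side gives an early, readable error instead of relying on however the service handles an out-of-range value.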

Punctuation as pacing

The model respects natural punctuation cues — no special tags needed:

Technique	Effect
`,` comma	Brief pause between clauses
`.` period	Sentence-end pause, falling intonation
`…` ellipsis	Longer trailing pause
`—` em dash	Abrupt pause / interruption feel
`?` question mark	Rising intonation
`!` exclamation	Energetic delivery
`\n` newline	Paragraph-level pause (similar to period)

Punctuation is the recommended way to add natural rhythm; reach for <break> tags when you need an explicit silence of a specific length (e.g. before a verification code).
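The verification-code case could be sketched as follows. `break_tag` and `code_prompt` are hypothetical helper names introduced for this example; only the `<break time="…ms"/>` and `<spell>` syntax comes from this page.

```python
def break_tag(milliseconds: int) -> str:
    """Return a <break> tag requesting an explicit pause of the given length."""
    if milliseconds <= 0:
        raise ValueError("pause length must be positive")
    return f'<break time="{milliseconds}ms"/>'


def code_prompt(code: str, pause_ms: int = 300) -> str:
    """Pause before the code, then have it read character by character."""
    return f"Your verification code is {break_tag(pause_ms)} <spell>{code}</spell>."


print(code_prompt("ABC-123"))
# Your verification code is <break time="300ms"/> <spell>ABC-123</spell>.
```

The rest of the sentence still relies on ordinary punctuation for rhythm; the explicit break is reserved for the one place where a fixed silence matters.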

Writing tips

Strip markdown before TTS. Asterisks, hashes, and bullet characters are read literally by the model.
No emoji. They are read out or garbled.
Write numbers as digits when they should be normalized (“You have 3 messages”) and always set language — see Text processing.
Keep sentences short and end them with punctuation. This also helps the streaming chunker start generation earlier.
!, ALL-CAPS, and ?! are prosody cues — the model will deliver them energetically. Use deliberately.
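The first two tips can be applied as a preprocessing step before text is sent for synthesis. The sketch below is one possible approach, not part of KugelAudio: `clean_for_tts` is a hypothetical helper, the markdown patterns cover only common cases (headings, bullets, emphasis and code markers), and the emoji ranges are approximate rather than exhaustive.

```python
import re

# Approximate emoji ranges plus the variation selector and zero-width joiner.
_EMOJI = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0000FE0F\U0000200D]"
)


def clean_for_tts(text: str) -> str:
    """Remove common markdown markers and emoji so they are not read aloud."""
    # Leading heading hashes, e.g. "## Title" -> "Title"
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Leading bullet characters, e.g. "- item" -> "item"
    text = re.sub(r"^[ \t]*[-*•][ \t]+", "", text, flags=re.MULTILINE)
    # Emphasis and inline-code markers
    text = re.sub(r"\*\*|__|\*|`", "", text)
    text = _EMOJI.sub("", text)
    # Collapse runs of spaces left behind by the removals
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


print(clean_for_tts("# Status\n- **3** new messages 🎉"))
```

Note that the single-asterisk rule also removes asterisks used for other purposes (for example in `3 * 4`), so text containing literal math may need separate handling.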

LLM system prompt pattern

When an LLM generates text that feeds directly into TTS, add instructions so it uses the supported controls correctly:

You are a voice assistant. Format your responses for text-to-speech output:

- For email addresses and codes, use <spell> tags:
    "Your code is <spell>ABC-123</spell>"
- For a deliberate pause, use a break tag:
    "Your total is <break time="400ms"/> forty-two euros."
- Do NOT use markdown formatting (**, *, #, -, bullet points) — it will be read aloud literally.
- Do NOT use emoji.
- Do NOT use SSML tags other than <spell> and <break>. Most are ignored, and some are rejected.
- Keep sentences short. End with punctuation.
- Write numbers as digits when they should be normalized: "You have 3 messages."

For full voice-agent prompt design (turn-taking, error recovery, tool-call acknowledgements), see Voice Agent Prompting.

Unsupported tags

KugelAudio processes <spell>, <break>, and <prosody rate>. Most other tags are silently stripped: the inner text is kept but the tag itself has no effect. The exceptions are <prosody pitch> and <prosody volume>, which are rejected with a 400.

Tag / Attribute	Status	Alternative
`<speak>` wrapper	Stripped	Omit — plain text is assumed
`<prosody rate="...">`	Supported — per-span speed	See Speed
`<prosody pitch="...">`	Rejected (400)	No pitch control available
`<prosody volume="...">`	Rejected (400)	No volume control available
`<emphasis>`	Stripped	Rephrase text for natural emphasis
`<say-as interpret-as="...">`	Stripped	Use `<spell>` for characters, normalization for numbers
`<sub alias="...">`	Stripped	Write the spoken form directly, or use a dictionary
`<phoneme>`	Stripped	Write inline IPA between slashes (`/ˈkuːɡl̩/`)
`<audio>`, `<p>`, `<s>`, `<w>`, `<lang>`	Stripped	—

Stripped tags are not validated at request time. Passing them does not return an error; the tags are removed and the remaining text is synthesized. The exceptions are `<prosody pitch>` and `<prosody volume>`, which the table above lists as rejected with a 400. Test your output when migrating from a full-SSML provider like Google Cloud TTS, Amazon Polly, or Microsoft Azure.
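When migrating existing SSML, it can help to mirror the table above on the client side so surprises show up before a request is sent. The following is a minimal sketch under that assumption: `prepare_markup` is a hypothetical helper, it uses a simple regular expression rather than a real XML parser, and it reflects only the statuses listed on this page, not verified server behavior.

```python
import re

ALLOWED = {"break", "spell"}  # tags this page lists as processed
_TAG = re.compile(r"</?([A-Za-z][\w-]*)\b[^<>]*>")


def prepare_markup(text: str) -> str:
    """Drop tags listed as stripped; fail early on tags listed as rejected."""

    def handle(match: re.Match) -> str:
        tag = match.group(0)
        name = match.group(1).lower()
        if name in ALLOWED:
            return tag
        if name == "prosody":
            # pitch and volume are listed as rejected (400); rate as supported.
            if re.search(r"\b(pitch|volume)\s*=", tag):
                raise ValueError(f"unsupported prosody attribute in {tag!r}")
            return tag
        return ""  # remove the tag itself, keep the inner text

    return _TAG.sub(handle, text)


ssml = '<speak>Hi <emphasis>there</emphasis><break time="300ms"/></speak>'
print(prepare_markup(ssml))
# Hi there<break time="300ms"/>
```

Raising on `pitch` and `volume` locally makes the failure visible in your own logs instead of as an HTTP error at synthesis time.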

Next steps

Breaks: Explicit pauses with break tags

Speed: The global speed parameter

Spell tags: Character-by-character pronunciation

Pronunciation & IPA: Fix how specific words are spoken
