KugelAudio generates speech directly from text. There is no separate voice-direction layer; instead, you shape the output through how you write the input. This page covers every mechanism available for controlling pronunciation, pacing, and emphasis.
Wrapping text in <spell> tags causes each character to be read out individually. This is useful for email addresses, codes, acronyms, and serial numbers.
"Contact us at <spell>hello@kugelaudio.com</spell>" → "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"
Spell tags only work when normalize: true is enabled. Special characters (@, ., -, _) are converted to language-specific spoken words; in English, for example, @ is read as "at" and . as "dot".
Keep sentence-ending punctuation outside the tag. <spell>D8239014.</spell> reads the trailing period as the literal word “Dot” (or “Punkt” in German) and runs it into the next sentence. Write <spell>D8239014</spell>. instead.
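If you insert codes into text programmatically, a small helper can enforce this punctuation rule for you. The sketch below is plain Python string handling; spell is a hypothetical helper name, not part of any KugelAudio SDK, and it assumes that any trailing ., !, ?, ,, ; or : belongs to the surrounding sentence rather than to the value being spelled.

```python
# Hypothetical helper (not part of any KugelAudio SDK): wraps a value in
# <spell> tags and moves trailing sentence punctuation outside the tag,
# so "D8239014." is not spelled out as "... four, Dot".
SENTENCE_PUNCT = ".!?,;:"


def spell(value: str) -> str:
    """Return value wrapped in <spell> tags with trailing punctuation kept outside."""
    value = value.strip()
    core = value.rstrip(SENTENCE_PUNCT)
    if not core:
        raise ValueError("nothing to spell")
    trailing = value[len(core):]  # the punctuation that was stripped off
    return f"<spell>{core}</spell>{trailing}"


print("Your reference is " + spell("D8239014.") + " Please keep it.")
# Your reference is <spell>D8239014</spell>. Please keep it.
```

Because the helper assumes trailing punctuation is never part of the value, a code that genuinely ends in a period would lose it from the spelled portion; handle such values separately.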
Use <prosody rate> to slow down or speed up a specific span of text without affecting the rest of the sentence. The tag is stripped before synthesis, and the inner text is time-stretched after generation.
"Call us at <prosody rate="slow">0 30 12 34 56 78</prosody> during business hours."
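When phone numbers come from another system rather than from hand-written text, a pre-processing step can add the tag automatically. The sketch below uses a deliberately simple regex heuristic (an optional +, a digit, at least six digits or spaces, then a digit); that heuristic is an assumption of this example, not a KugelAudio rule, and slow_phone_numbers is a hypothetical helper name. Spans already inside <spell> or <prosody> tags are copied through unchanged.

```python
import re

# Assumed heuristic for "phone-like" numbers: optional +, then a digit,
# at least six digits or spaces, and a final digit.
PHONE = re.compile(r"\+?\d[\d ]{6,}\d")
# Spans that are already tagged are left exactly as they are.
TAGGED = re.compile(r"<(spell|prosody)\b[^>]*>.*?</\1>", re.DOTALL)


def _wrap_phones(segment: str) -> str:
    return PHONE.sub(lambda m: f'<prosody rate="slow">{m.group(0)}</prosody>', segment)


def slow_phone_numbers(text: str) -> str:
    """Wrap untagged phone-like numbers in <prosody rate="slow"> tags."""
    parts, pos = [], 0
    for m in TAGGED.finditer(text):
        parts.append(_wrap_phones(text[pos:m.start()]))  # untagged text before the span
        parts.append(m.group(0))                          # existing tag, unchanged
        pos = m.end()
    parts.append(_wrap_phones(text[pos:]))
    return "".join(parts)


print(slow_phone_numbers("Call us at 0 30 12 34 56 78 during business hours."))
```

The pattern also matches other long digit runs, such as 100 200 300, so tune it to the data you actually send.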
When normalize: true is set, numbers, dates, times, currencies, and units are converted to spoken words before synthesis.
| Input | Spoken output (English) |
|---|---|
| 3 items | three items |
| €50.99 | fifty euros and ninety-nine cents |
| 01/15/2024 | January fifteenth twenty twenty-four |
| 2:30 PM | two thirty PM |
| 100km/h | one hundred kilometres per hour |
Always set language explicitly when using normalization. Auto-detection can produce incorrect results for short texts or languages with shared vocabulary.
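A thin client-side wrapper can enforce two rules from this page before any request is sent: spell tags need normalize: true, and normalization should always come with an explicit language. This is a sketch only; build_tts_options is hypothetical, and the field names text, normalize, and language, as well as the language code "en", are assumptions based on this page rather than a confirmed request schema.

```python
from typing import Any, Dict, Optional


# Hypothetical pre-flight check. Field names mirror this page and are
# assumptions; confirm the exact request schema in the API reference.
def build_tts_options(text: str, language: Optional[str], normalize: bool = True) -> Dict[str, Any]:
    if "<spell>" in text and not normalize:
        # <spell> tags are only processed when normalize is true.
        raise ValueError("<spell> tags require normalize=True")
    if normalize and not language:
        # Auto-detection can misfire on short texts, so require an explicit language.
        raise ValueError("set language explicitly when normalize=True")
    options: Dict[str, Any] = {"text": text, "normalize": normalize}
    if language:
        options["language"] = language  # "en" below is an assumed code format
    return options


print(build_tts_options("You have 3 messages.", language="en"))
# {'text': 'You have 3 messages.', 'normalize': True, 'language': 'en'}
```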
When an LLM generates text that feeds directly into TTS, add instructions to its system prompt so that it uses the supported tags correctly. For example:
You are a voice assistant. Format your responses for text-to-speech output:
- For email addresses and codes, use <spell> tags: "Your code is <spell>ABC-123</spell>"
- For phone numbers or content that should be read slowly, use prosody tags: "Call <prosody rate="slow">0 800 555 1234</prosody>"
- Do NOT use markdown formatting (**, *, #, -, bullet points); it will be read aloud literally.
- Do NOT use emoji.
- Keep sentences short. End with punctuation.
- Write numbers as digits when they should be normalized: "You have 3 messages."
Strip markdown from LLM output before passing it to TTS. Asterisks, hashes, and bullet characters are read literally by the model.
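A rough cleanup pass for LLM output might look like the following. It is a regex-based sketch rather than a full markdown parser, and strip_markdown is a hypothetical name. It removes headings, list markers, emphasis asterisks, backticks, link syntax, and emoji; it keeps underscores (they appear in email addresses) and any <spell> or <prosody> tags; and it ends each remaining line with a period if it has no punctuation, so that former list items are still read as separate sentences.

```python
import re

# Regex heuristics only; not a complete markdown parser.
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")             # [text](url) -> text
HEADING = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]*", re.MULTILINE)
BULLET = re.compile(r"^[ \t]*(?:[-*+•]|\d+[.)])[ \t]+", re.MULTILINE)
EMPHASIS = re.compile(r"\*+|`+")                         # bold/italic asterisks, code ticks
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
END_PUNCT = (".", "!", "?", ",", ";", ":", "…")


def strip_markdown(text: str) -> str:
    text = LINK.sub(r"\1", text)
    text = HEADING.sub("", text)
    text = BULLET.sub("", text)
    text = EMPHASIS.sub("", text)
    text = EMOJI.sub("", text)
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Former headings and list items often lack punctuation; add a period
        # so each one still ends as a sentence.
        if not line.endswith(END_PUNCT):
            line += "."
        sentences.append(line)
    return " ".join(sentences)


print(strip_markdown("# Your order\n\n- **3** items shipped\n- Code: <spell>ABC-123</spell>"))
# Your order. 3 items shipped. Code: <spell>ABC-123</spell>.
```

Note that the emphasis rule removes every asterisk, including one used as a multiplication sign; spell such expressions out before cleanup if they matter.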
KugelAudio processes only <spell> and <prosody rate>. All other tags, and any prosody attribute other than rate, are silently stripped: the inner text is kept, but the tag itself has no effect.
| Tag / attribute | Status | Alternative |
|---|---|---|
| <speak> wrapper | Stripped | Omit it; plain text is assumed |
| <prosody pitch="..."> | Stripped | None; no pitch control is available |
| <prosody volume="..."> | Stripped | None; no volume control is available |
| <prosody duration="..."> | Stripped | Use the speed parameter instead |
| <emphasis> | Stripped | Rephrase the text for natural emphasis |
| <break time="..."> | Stripped | Use punctuation (periods, commas, ellipses …) |
| <say-as interpret-as="..."> | Stripped | Use <spell> for characters and normalization for numbers |
| <sub alias="..."> | Stripped | Write the spoken form directly in the text |
| <audio>, <p>, <s>, <w>, <lang> | Stripped | None |
Unknown tags are not validated at request time. Passing unsupported tags does not return an error; the tags are removed and the remaining text is synthesized. Test your output when migrating from a full-SSML provider such as Google Cloud TTS, Amazon Polly, or Microsoft Azure.
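Because unsupported tags fail silently, a pre-flight pass can help during migration. The sketch below uses plain Python regexes; migrate_ssml and unsupported_tags are hypothetical helpers. It rewrites three constructs for which the table gives alternatives: <say-as> with interpret-as="characters" or "spell-out" becomes <spell>; <sub> is replaced by its alias (otherwise the written form inside the tag would be spoken, since stripped tags keep their inner text); and <break> becomes a comma, or a space if punctuation already precedes it. It then lists any remaining tags that KugelAudio would strip. It assumes simple, well-formed SSML without nested elements inside these tags.

```python
import re
from typing import Set

SPEAK = re.compile(r"</?speak\b[^>]*>")
# interpret-as values that mean "read character by character" in common SSML dialects.
SAY_AS_CHARS = re.compile(
    r'<say-as\s+interpret-as="(?:characters|spell-out)"\s*>(.*?)</say-as>', re.DOTALL
)
SUB = re.compile(r'<sub\s+alias="([^"]*)"\s*>.*?</sub>', re.DOTALL)
BREAK = re.compile(r"\s*<break\b[^>]*?/?>\s*")
TAG = re.compile(r"</?([A-Za-z][\w-]*)([^>]*)>")


def _break_to_pause(m: "re.Match") -> str:
    # Use a comma as the pause, or just a space if punctuation already precedes it.
    before = m.string[: m.start()].rstrip()
    return " " if before.endswith((".", ",", "!", "?", ";", ":", "…")) else ", "


def migrate_ssml(ssml: str) -> str:
    """Rewrite a few common SSML constructs into text KugelAudio handles."""
    text = SPEAK.sub("", ssml)
    text = SAY_AS_CHARS.sub(r"<spell>\1</spell>", text)
    text = SUB.sub(r"\1", text)  # speak the alias, not the written form
    text = BREAK.sub(_break_to_pause, text)
    return text.strip()


def unsupported_tags(text: str) -> Set[str]:
    """List tags (and prosody attributes) that would be silently stripped."""
    found: Set[str] = set()
    for name, attrs in TAG.findall(text):
        if name == "spell":
            continue
        if name == "prosody":
            extra = set(re.findall(r"([\w-]+)\s*=", attrs)) - {"rate"}
            found.update(f"prosody {attr}" for attr in extra)
            continue
        found.add(name)
    return found


print(migrate_ssml('<speak>Visit the <sub alias="World Wide Web Consortium">W3C</sub> site.<break time="1s"/>Thanks.</speak>'))
# Visit the World Wide Web Consortium site. Thanks.
```

Anything unsupported_tags reports after migration will be dropped without an error, so review those spans by ear.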