KugelAudio generates speech directly from text. There is no voice direction layer — instead you shape the output by how you write the input. This page covers every mechanism available to control pronunciation, pacing, and emphasis.

Supported Tags at a Glance

| Tag | Purpose | Requires `normalize` |
|---|---|---|
| `<spell>text</spell>` | Spell out characters one by one | Yes |
| `<prosody rate="...">text</prosody>` | Adjust the speed of a text span (`slow`, `medium`, `fast`, or 0.8–1.2) | No |
These are the only tags processed. Everything else is stripped before synthesis — see Unsupported Tags below.

<spell> — Character-by-Character Pronunciation

Wrapping text in <spell> tags causes each character to be read out individually. Useful for email addresses, codes, acronyms, and serial numbers.
"Contact us at <spell>hello@kugelaudio.com</spell>"
→  "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"
normalize: true must be enabled for spell tags to work. Special characters (@, ., -, _) are translated to language-specific spoken words.

Character translations by language

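| Character | English | German | French | Spanish |
|---|---|---|---|---|
| `@` | at | ät | arobase | arroba |
| `.` | dot | Punkt | point | punto |
| `-` | dash | Strich | tiret | guión |
| `_` | underscore | Unterstrich | underscore | guión bajo |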
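To check what a `<spell>` span will sound like before sending it, the character table can be turned into a lookup. The sketch below is a hypothetical client-side preview, not KugelAudio's implementation: `SPOKEN_CHARS` and `spell_preview` are illustrative names, and it assumes letters are uppercased, whitespace is skipped, and characters are joined with commas, as in the email example earlier on this page. Characters not listed in the table are passed through as-is.

```python
# Hypothetical client-side preview of <spell> rendering; not the KugelAudio implementation.
# Assumptions: each character is read individually (letters uppercased), whitespace is
# skipped, and the four special characters from the table become spoken words.
SPOKEN_CHARS = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def spell_preview(text: str, language: str = "en") -> str:
    """Return a comma-separated approximation of how `text` is spelled out."""
    table = SPOKEN_CHARS[language]
    parts = []
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is not spoken
        parts.append(table.get(ch, ch.upper()))
    return ", ".join(parts)


print(spell_preview("hello@kugelaudio.com"))
# → H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M
print(spell_preview("A4-B9-XZ", "de"))
# → A, 4, Strich, B, 9, Strich, X, Z
```

The preview only approximates the documented behaviour; listen to the real output for anything customer-facing.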

Examples

```python
# Assumes `client` is an already-initialized KugelAudio client.

# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code
audio = client.tts.generate(
    text="Your code is <spell>A4-B9-XZ</spell>",
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)
```
For clearer letter-by-letter pronunciation, use the `kugel-1` model rather than `kugel-1-turbo`.

<prosody rate> — Inline Speed Control

Slow down or speed up a specific span of text without affecting the rest of the sentence. The tag is stripped before synthesis and the inner text is time-stretched after generation.
"Call us at <prosody rate="slow">0 30 12 34 56 78</prosody> during business hours."

Rate values

| Value | Speed | Alias |
|---|---|---|
| `"slow"` | 0.8× (20% slower) | |
| `"medium"` | 1.0× (normal) | |
| `"fast"` | 1.2× (20% faster) | |
| `"0.8"` to `"1.2"` | exact multiplier | numeric |
Numeric values outside 0.8–1.2 are clamped to the nearest bound. The `speed` request parameter sets a global default; `<prosody rate>` overrides it for the tagged span.
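The rate rules are simple enough to mirror client-side, which helps catch typos in generated markup before a request is sent. This is a minimal sketch; `parse_rate` and `prosody` are hypothetical helpers, not SDK functions, and raising on unrecognized values is a local design choice, since this page does not say how the API treats them.

```python
# Hypothetical client-side mirror of the documented <prosody rate> rules.
# Named rates map to fixed multipliers; numeric rates are clamped to [0.8, 1.2].
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
MIN_RATE, MAX_RATE = 0.8, 1.2


def parse_rate(value: str) -> float:
    """Return the speed multiplier for a rate attribute value."""
    key = value.strip()
    if key in NAMED_RATES:
        return NAMED_RATES[key]
    # float() raises ValueError for anything that is neither named nor numeric;
    # rejecting such values is a local choice, not documented API behaviour.
    return min(max(float(key), MIN_RATE), MAX_RATE)


def prosody(text: str, rate: str) -> str:
    """Wrap text in a <prosody rate> tag after validating the rate locally."""
    parse_rate(rate)
    return f'<prosody rate="{rate}">{text}</prosody>'


print(parse_rate("1.5"))  # → 1.2 (clamped)
print(prosody("0 800 123 456", "slow"))
```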

Examples

```python
# Assumes `client` is an already-initialized KugelAudio client.

# Slow down a phone number
audio = client.tts.generate(
    text='Call us at <prosody rate="slow">0 800 123 456</prosody> any time.',
    language="en",
)

# Mix speeds in one sentence
audio = client.tts.generate(
    text='<prosody rate="fast">Limited time offer!</prosody> '
         'Your confirmation code is <prosody rate="slow">X7-K2-9P</prosody>.',
    normalize=True,
    language="en",
)

# Numeric rate
audio = client.tts.generate(
    text='Details: <prosody rate="0.85">Article number 4 dash 0 0 7.</prosody>',
    language="en",
)
```

Global Speed Parameter

The speed request parameter applies a uniform speed to the entire synthesis. Use it when you want all output at a consistent rate.
```python
audio = client.tts.generate(
    text="This entire sentence is read 20% faster.",
    speed=1.2,
)
```
| `speed` | Effect | Typical use |
|---|---|---|
| 0.8 | 20% slower | Dictation, phone numbers, legal disclaimers |
| 1.0 | Normal (default) | General purpose |
| 1.2 | 20% faster | Notifications, fast-paced UI feedback |
<prosody rate> tags take precedence over the global speed for their spans.
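One way to reason about precedence is to split the input into spans and assign each an effective rate: tagged spans use their own rate, and everything else uses the global `speed`. The sketch below is a hypothetical model of that rule, not how KugelAudio processes requests internally. It assumes non-nested tags with a double-quoted `rate` attribute, and it does not clamp the global `speed`, since this page only documents clamping for tag values.

```python
import re

# Hypothetical model of rate precedence: text inside <prosody rate="..."> uses
# that rate; all other text uses the global `speed` request parameter.
PROSODY_RE = re.compile(r'<prosody rate="([^"]+)">(.*?)</prosody>', re.DOTALL)
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}


def tag_rate(value: str) -> float:
    """Named rates map to fixed multipliers; numeric rates are clamped to 0.8-1.2."""
    if value in NAMED_RATES:
        return NAMED_RATES[value]
    return min(max(float(value), 0.8), 1.2)


def effective_rates(text: str, speed: float = 1.0) -> list[tuple[str, float]]:
    """Split text into (segment, rate) pairs; tagged spans override `speed`."""
    segments = []
    pos = 0
    for match in PROSODY_RE.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], speed))  # untagged text
        segments.append((match.group(2), tag_rate(match.group(1))))  # tagged span
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], speed))  # trailing untagged text
    return segments


text = (
    '<prosody rate="fast">Limited time offer!</prosody> '
    'Your confirmation code is <prosody rate="slow">X7-K2-9P</prosody>.'
)
for segment, rate in effective_rates(text, speed=1.0):
    print(f"{rate:.2f}x  {segment!r}")
```

Applied to the mixed-speed example, this yields `("Limited time offer!", 1.2)`, `(" Your confirmation code is ", 1.0)`, `("X7-K2-9P", 0.8)` and `(".", 1.0)`.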

Text Normalization

When `normalize` is `true`, numbers, dates, times, currencies, and units are converted to spoken words before synthesis.
| Input | Spoken output (English) |
|---|---|
| 3 items | three items |
| €50.99 | fifty euros and ninety-nine cents |
| 01/15/2024 | January fifteenth twenty twenty-four |
| 2:30 PM | two thirty PM |
| 100km/h | one hundred kilometres per hour |
Always set language explicitly when using normalization. Auto-detection can produce incorrect results for short texts or languages with shared vocabulary.
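Because `<spell>` depends on `normalize`, and normalization works best with an explicit `language`, a small pre-flight check can catch both mistakes before a request is sent. This is a hypothetical helper (`check_request` is not an SDK function) that checks only those two rules from this page.

```python
from typing import Optional


# Hypothetical pre-flight check covering two rules from this page:
# <spell> is only processed with normalize=True, and normalization should
# always be given an explicit language.
def check_request(text: str, normalize: bool = False, language: Optional[str] = None) -> list[str]:
    """Return warnings for a planned request; an empty list means no issues found."""
    warnings = []
    if "<spell>" in text and not normalize:
        warnings.append("<spell> tags are only processed when normalize=True.")
    if normalize and not language:
        warnings.append("Set language explicitly when normalize=True.")
    return warnings


print(check_request("Your code is <spell>A4-B9-XZ</spell>"))
print(check_request("Total: €50.99", normalize=True, language="en"))
```

A natural place to call it is just before `client.tts.generate(...)`, logging or raising on any returned warning.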

Punctuation as Pacing

The model respects natural punctuation cues — no special tags needed:
| Technique | Effect |
|---|---|
| `,` comma | Brief pause between clauses |
| `.` period | Sentence-end pause, falling intonation |
| Ellipsis | Longer trailing pause |
| Em dash | Abrupt pause / interruption feel |
| `?` question mark | Rising intonation |
| `!` exclamation mark | Energetic delivery |
| `\n` newline | Paragraph-level pause (similar to period) |
These are the recommended way to add natural rhythm. There is no <break> tag support.
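Text assembled from fragments, such as UI strings or template slots, often lacks sentence-final punctuation, which removes the pause cues listed above. A minimal sketch of a cleanup pass, assuming a period is an acceptable default ending; `punctuate_lines` is a hypothetical helper, and joining with `\n` relies on the paragraph-level pause described in the table.

```python
# Hypothetical helper: give every line terminal punctuation so the model gets a
# clear pause cue, then join lines with "\n" for paragraph-level pauses.
TERMINAL = (".", "!", "?", "…")


def punctuate_lines(lines: list[str]) -> str:
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines instead of emitting stray newlines
        if not line.endswith(TERMINAL):
            line += "."  # assumption: a period is a safe default ending
        cleaned.append(line)
    return "\n".join(cleaned)


print(punctuate_lines(["Your order has shipped", "It arrives Monday!", "Questions?"]))
```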

LLM System Prompt Patterns

When an LLM generates text that feeds directly into TTS, add instructions so it uses supported tags correctly:
```text
You are a voice assistant. Format your responses for text-to-speech output:

- For email addresses and codes, use <spell> tags:
    "Your code is <spell>ABC-123</spell>"
- For phone numbers or content that should be read slowly, use prosody tags:
    "Call <prosody rate="slow">0 800 555 1234</prosody>"
- Do NOT use markdown formatting (**, *, #, -, bullet points) — it will be read aloud literally.
- Do NOT use emoji.
- Keep sentences short. End with punctuation.
- Write numbers as digits when they should be normalized: "You have 3 messages."
```
Strip markdown from LLM output before passing it to TTS. Asterisks, hashes, and bullet characters are read literally by the model.
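A minimal sketch of such a cleanup pass using Python's `re` module. `strip_markdown` is a hypothetical helper, not part of any KugelAudio SDK. It removes heading markers, bullet and numbered-list markers, and asterisk emphasis, then drops any stray `*` or `#`. It is not a full markdown parser: links, tables, and code fences are left as-is, and underscores are deliberately kept because they occur in codes and email addresses inside `<spell>` tags.

```python
import re

# Hypothetical cleanup pass for LLM output before it is sent to TTS.
HEADING = re.compile(r"^[ \t]*#{1,6}[ \t]*", re.MULTILINE)  # "## Title" -> "Title"
LIST_MARKER = re.compile(r"^[ \t]*(?:[-*+•]|\d+\.)[ \t]+", re.MULTILINE)  # "- item", "1. item"
EMPHASIS = re.compile(r"\*{1,2}([^*\n]+?)\*{1,2}")  # "**bold**" / "*italic*" -> inner text


def strip_markdown(text: str) -> str:
    text = HEADING.sub("", text)
    text = LIST_MARKER.sub("", text)
    text = EMPHASIS.sub(r"\1", text)
    # Remove any remaining markers that would otherwise be read aloud.
    return text.replace("*", "").replace("#", "")


print(strip_markdown("## Summary\n- You have **3** new messages\n* Call *now*"))
```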

Unsupported Tags

KugelAudio processes <spell> and <prosody rate> only. All other tags are silently stripped — the inner text is kept but the tag itself has no effect.
| Tag / Attribute | Status | Alternative |
|---|---|---|
| `<speak>` wrapper | Stripped | Omit it; plain text is assumed |
| `<prosody pitch="...">` | Stripped | No pitch control available |
| `<prosody volume="...">` | Stripped | No volume control available |
| `<prosody duration="...">` | Stripped | Use the `speed` parameter instead |
| `<emphasis>` | Stripped | Rephrase the text for natural emphasis |
| `<break time="...">` | Stripped | Use punctuation (`.`, `,`, and the other marks under Punctuation as Pacing) |
| `<say-as interpret-as="...">` | Stripped | Use `<spell>` for characters, normalization for numbers |
| `<sub alias="...">` | Stripped | Write the spoken form directly in the text |
| `<audio>`, `<p>`, `<s>`, `<w>`, `<lang>` | Stripped | |
Unknown tags are not validated at request time. Passing unsupported tags will not return an error — the tags are removed and the remaining text is synthesized. Test your output when migrating from a full-SSML provider like Google Cloud TTS, Amazon Polly, or Microsoft Azure.
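When migrating, a conversion pass can map the SSML constructs that have a KugelAudio alternative before anything is sent. The sketch below is hypothetical (`migrate_ssml` is not an SDK function) and follows the Alternative column of the table: `<say-as interpret-as="characters">` or `"spell-out"` becomes `<spell>`, `<sub>` is replaced by its alias, `<prosody>` keeps only its `rate` attribute, `<break>` becomes a comma (or just a space when punctuation already precedes it), and every other tag is unwrapped. It assumes double-quoted attributes and no nesting of same-name tags. SSML rate values such as `x-slow` or `80%` are copied through unchanged and need converting by hand to the values documented above.

```python
import re

# Hypothetical SSML-to-KugelAudio migration pass; not part of any KugelAudio SDK.
# Assumptions: attribute values use double quotes, same-name tags are not nested,
# and a comma is an acceptable stand-in for <break> when no punctuation precedes it.
SAY_AS_CHARS = re.compile(
    r'<say-as\s+interpret-as="(?:characters|spell-out)"\s*>(.*?)</say-as>', re.DOTALL
)
SUB = re.compile(r'<sub\s+alias="([^"]*)"\s*>.*?</sub>', re.DOTALL)
PROSODY = re.compile(r"<prosody\b([^>]*)>(.*?)</prosody>", re.DOTALL)
RATE_ATTR = re.compile(r'\brate="([^"]+)"')
BREAK_AFTER_PUNCT = re.compile(r"(?<=[.!?,;:…])\s*<break\b[^>]*>\s*")
BREAK = re.compile(r"\s*<break\b[^>]*>\s*")
# Any tag except <spell>, </spell>, <prosody ...> and </prosody>.
OTHER_TAG = re.compile(r"<(?!/?spell>)(?!/?prosody\b)[^>]+>")


def _keep_rate_only(match: re.Match) -> str:
    """Keep a <prosody> span with only its rate attribute, or unwrap it if it has none."""
    rate = RATE_ATTR.search(match.group(1))
    inner = match.group(2)
    return f'<prosody rate="{rate.group(1)}">{inner}</prosody>' if rate else inner


def migrate_ssml(ssml: str) -> str:
    text = SAY_AS_CHARS.sub(r"<spell>\1</spell>", ssml)  # say-as characters -> <spell>
    text = SUB.sub(r"\1", text)  # <sub alias="X">Y</sub> -> X
    text = PROSODY.sub(_keep_rate_only, text)  # drop pitch/volume/duration attributes
    text = BREAK_AFTER_PUNCT.sub(" ", text)  # existing punctuation already gives a pause
    text = BREAK.sub(", ", text)  # otherwise use a comma for a short pause
    text = OTHER_TAG.sub("", text)  # <speak>, <emphasis>, <p>, <s>, ... are unwrapped
    return text.strip()


ssml = (
    '<speak>Hello<break time="300ms"/> world. <break/>Your code is '
    '<say-as interpret-as="characters">AB12</say-as>. '
    '<prosody rate="slow" pitch="low">Call 0 800 123 456</prosody> '
    'or visit <sub alias="kugel audio">KA</sub>.</speak>'
)
print(migrate_ssml(ssml))
```

Listening to the converted output is still necessary; the pass only handles the patterns it names.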
