
KugelAudio generates speech directly from text. There is no voice direction layer — instead you shape the output by how you write the input. This page covers every mechanism available to control pronunciation, pacing, and emphasis.

Supported Tags at a Glance

| Tag | Purpose | Requires normalize |
| --- | --- | --- |
| <spell>text</spell> | Spell out characters one by one | Yes |
| <prosody rate="...">text</prosody> | Adjust the speed of a text span (rate: slow, medium, fast, or 0.8–1.2) | No |
These are the only tags processed. Everything else is stripped before synthesis — see Unsupported Tags below.

<spell> — Character-by-Character Pronunciation

Wrapping text in <spell> tags causes each character to be read out individually. Useful for email addresses, codes, acronyms, and serial numbers.
"Contact us at <spell>hello@kugelaudio.com</spell>"
→  "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"
normalize: true must be enabled for spell tags to work. Special characters (@, ., -, _) are translated to language-specific spoken words.
Keep sentence-ending punctuation outside the tag. <spell>D8239014.</spell> reads the trailing period as the literal word “Dot” (or “Punkt” in German) and runs it into the next sentence. Write <spell>D8239014</spell>. instead.
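Codes often arrive from templates or from an LLM, and the trailing-period mistake is easy to fix before the request goes out. The sketch below uses a hypothetical helper, `move_punctuation_out_of_spell`, which is not part of the KugelAudio SDK. It assumes that sentence-ending punctuation (., !, ?) at the very end of a spell span is never meant to be spelled.

```python
import re

# A <spell> span whose content ends in sentence-ending punctuation,
# e.g. "<spell>D8239014.</spell>". Dots in the middle (".com") are not affected.
_TRAILING_PUNCT = re.compile(r"<spell>([^<]*?)([.!?]+)</spell>")


def move_punctuation_out_of_spell(text: str) -> str:
    """Move trailing sentence punctuation from inside <spell> tags to just after them."""
    return _TRAILING_PUNCT.sub(r"<spell>\1</spell>\2", text)


print(move_punctuation_out_of_spell("Your ID is <spell>D8239014.</spell> Thanks."))
```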

Character translations by language

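| Character | English | German | French | Spanish |
| --- | --- | --- | --- | --- |
| @ | at | ät | arobase | arroba |
| . | dot | Punkt | point | punto |
| - | dash | Strich | tiret | guión |
| _ | underscore | Unterstrich | underscore | guión bajo |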
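You can preview how a value will be spelled before you synthesize it by turning the table into a lookup. This is a local approximation, not KugelAudio's normalizer. `SPELL_WORDS` and `preview_spelling` are hypothetical names, and the "fr" and "es" language codes are assumed. Characters that are not in the table are simply uppercased.

```python
# Spoken words for special characters, copied from the table above.
SPELL_WORDS = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def preview_spelling(value: str, language: str = "en") -> str:
    """Approximate how a <spell> span is read: one spoken item per character."""
    words = SPELL_WORDS[language]
    return ", ".join(words.get(ch, ch.upper()) for ch in value)


print(preview_spelling("hello@kugelaudio.com"))
```

For the example address, the output matches the expected reading shown earlier: "H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M".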

Examples

# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code
audio = client.tts.generate(
    text="Your code is <spell>A4-B9-XZ</spell>",
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)
For clearer letter-by-letter pronunciation, use kugel-1 rather than kugel-1-turbo.

<prosody rate> — Inline Speed Control

Slow down or speed up a specific span of text without affecting the rest of the sentence. The tag is stripped before synthesis, and the audio for the enclosed text is time-stretched after generation.
"Call us at <prosody rate="slow">0 30 12 34 56 78</prosody> during business hours."

Rate values

| Value | Speed | Alias |
| --- | --- | --- |
| "slow" | 0.8× (20% slower) | |
| "medium" | 1.0× (normal) | |
| "fast" | 1.2× (20% faster) | |
| "0.8"–"1.2" | exact multiplier | numeric |
Numeric values outside 0.8–1.2 are clamped to that range. The speed request parameter sets a global default, and <prosody rate> overrides it for each tagged span.
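The rate rules reduce to a small lookup plus a clamp. The sketch below is a client-side approximation, and its helper names are hypothetical. It also assumes that time-stretching by a multiplier r changes a span's duration by a factor of 1/r, so a 2-second span at "slow" would come out at about 2.5 seconds.

```python
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
MIN_RATE, MAX_RATE = 0.8, 1.2


def resolve_rate(rate: str) -> float:
    """Map a <prosody rate> value to a speed multiplier, clamped to 0.8..1.2."""
    if rate in NAMED_RATES:
        return NAMED_RATES[rate]
    value = float(rate)  # this sketch raises ValueError for unrecognized strings
    return min(MAX_RATE, max(MIN_RATE, value))


def stretched_duration(seconds: float, rate: str) -> float:
    """Estimate span length after time-stretching (assumption: duration / rate)."""
    return seconds / resolve_rate(rate)


print(resolve_rate("1.5"), stretched_duration(2.0, "slow"))
```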

Examples

# Slow down a phone number
audio = client.tts.generate(
    text='Call us at <prosody rate="slow">0 800 123 456</prosody> any time.',
    language="de",
)

# Mix speeds in one sentence
audio = client.tts.generate(
    text='<prosody rate="fast">Limited time offer!</prosody> '
         'Your confirmation code is <prosody rate="slow">X7-K2-9P</prosody>.',
    normalize=True,
    language="en",
)

# Numeric rate
audio = client.tts.generate(
    text='Details: <prosody rate="0.85">Article number 4 dash 0 0 7.</prosody>',
    language="en",
)

Global Speed Parameter

The speed request parameter applies a uniform speed to the entire synthesis. Use it when you want all output at a consistent rate.
audio = client.tts.generate(
    text="This entire sentence is read 20% faster.",
    speed=1.2,
)
| speed | Effect | Typical use |
| --- | --- | --- |
| 0.8 | 20% slower | Dictation, phone numbers, legal disclaimers |
| 1.0 | Normal (default) | General purpose |
| 1.2 | 20% faster | Notifications, fast-paced UI feedback |
<prosody rate> tags take precedence over the global speed for their spans.
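One way to picture this precedence is to split the input into spans, each with its own speed. `segment_speeds` is a hypothetical helper, not an SDK function. It assumes a prosody rate replaces the global speed for its span rather than multiplying it, which is how "take precedence" reads here. It handles only the exact rate="..." form shown on this page and does not support nested tags.

```python
import re

# Named rates from the rate table; numeric rates are clamped to 0.8..1.2.
NAMED_RATES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
PROSODY = re.compile(r'<prosody rate="([^"]+)">(.*?)</prosody>', re.DOTALL)


def _rate(value: str) -> float:
    if value in NAMED_RATES:
        return NAMED_RATES[value]
    return min(1.2, max(0.8, float(value)))


def segment_speeds(text: str, speed: float = 1.0) -> list[tuple[str, float]]:
    """Split text into (span, speed) pairs; prosody spans override the global speed."""
    segments: list[tuple[str, float]] = []
    pos = 0
    for match in PROSODY.finditer(text):
        if match.start() > pos:  # untagged text before this span uses the global speed
            segments.append((text[pos:match.start()], speed))
        segments.append((match.group(2), _rate(match.group(1))))
        pos = match.end()
    if pos < len(text):  # trailing untagged text
        segments.append((text[pos:], speed))
    return segments


print(segment_speeds('Call us at <prosody rate="slow">0 800 123 456</prosody> any time.', speed=1.2))
```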

Text Normalization

When normalize: true, numbers, dates, times, currencies, and units are converted to spoken words before synthesis.
| Input | Spoken output (English) |
| --- | --- |
| 3 items | three items |
| €50.99 | fifty euros and ninety-nine cents |
| 01/15/2024 | January fifteenth twenty twenty-four |
| 2:30 PM | two thirty PM |
| 100km/h | one hundred kilometres per hour |
Always set language explicitly when using normalization. Auto-detection can produce incorrect results for short texts or languages with shared vocabulary.
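This page sets two requirements that you can check before calling client.tts.generate: spell tags need normalize: true, and normalization should have an explicit language. `check_tts_request` is a hypothetical pre-flight helper and is not part of the SDK.

```python
from typing import List, Optional


def check_tts_request(text: str, normalize: bool, language: Optional[str]) -> List[str]:
    """Return warnings for settings this page says are required (client-side only)."""
    warnings = []
    if "<spell>" in text and not normalize:
        warnings.append("<spell> tags need normalize=True")
    if normalize and not language:
        warnings.append("set language explicitly when normalize=True")
    return warnings


print(check_tts_request("Your code is <spell>A4-B9-XZ</spell>", normalize=False, language="en"))
```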

Punctuation as Pacing

The model respects natural punctuation cues — no special tags needed:
| Technique | Effect |
| --- | --- |
| Comma (,) | Brief pause between clauses |
| Period (.) | Sentence-end pause, falling intonation |
| Ellipsis | Longer trailing pause |
| Em dash | Abrupt pause / interruption feel |
| Question mark (?) | Rising intonation |
| Exclamation mark (!) | Energetic delivery |
| Newline (\n) | Paragraph-level pause (similar to a period) |
These are the recommended way to add natural rhythm. There is no <break> tag support.
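If you are migrating SSML that relies on <break>, one option is to translate each break into punctuation before sending. `breaks_to_punctuation` is a hypothetical helper, and its 700 ms threshold for choosing a period over a comma is an arbitrary assumption. Tune both by listening to the output.

```python
import re

# SSML breaks such as <break time="500ms"/> or <break time="1.5s"/>,
# including surrounding whitespace so the replacement controls spacing.
BREAK = re.compile(r'\s*<break\s+time="(\d+(?:\.\d+)?)(ms|s)"\s*/?>\s*')


def breaks_to_punctuation(text: str, long_pause_ms: float = 700) -> str:
    """Replace <break> tags with a comma (short pause) or a period (long pause)."""
    def replace(match: re.Match) -> str:
        amount, unit = float(match.group(1)), match.group(2)
        ms = amount if unit == "ms" else amount * 1000
        return ". " if ms >= long_pause_ms else ", "

    return BREAK.sub(replace, text).strip()


print(breaks_to_punctuation('Done <break time="1s"/> Next step'))
```

The sketch does not fix capitalization after an inserted period or collapse doubled punctuation, so review the converted text.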

LLM System Prompt Patterns

When an LLM generates text that feeds directly into TTS, add instructions so it uses supported tags correctly:
You are a voice assistant. Format your responses for text-to-speech output:

- For email addresses and codes, use <spell> tags:
    "Your code is <spell>ABC-123</spell>"
- For phone numbers or content that should be read slowly, use prosody tags:
    "Call <prosody rate="slow">0 800 555 1234</prosody>"
- Do NOT use markdown formatting (**, *, #, -, bullet points) — it will be read aloud literally.
- Do NOT use emoji.
- Keep sentences short. End with punctuation.
- Write numbers as digits when they should be normalized: "You have 3 messages."
Strip markdown from LLM output before passing it to TTS. Asterisks, hashes, and bullet characters are read literally by the model.
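A minimal way to strip markdown is to apply a few regular expressions line by line. `strip_markdown` is a hypothetical helper and is deliberately naive: for example, it also removes a literal * used for multiplication. It never touches angle brackets, so <spell> and <prosody> tags pass through intact.

```python
import re


def strip_markdown(text: str) -> str:
    """Remove common markdown syntax that a TTS model would read aloud literally."""
    # Heading markers at the start of a line: "## Title" -> "Title"
    text = re.sub(r"^[ \t]*#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    # Bullet and numbered-list markers: "- item", "* item", "1. item" -> "item"
    text = re.sub(r"^[ \t]*(?:[-*+•]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # Bold, italic and inline-code markers anywhere in the text
    text = re.sub(r"\*\*|__|\*|`", "", text)
    return text


print(strip_markdown("## Your order\n- **Item:** 3 books\n- Code: <spell>A4-B9</spell>"))
```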

Unsupported Tags

KugelAudio processes <spell> and <prosody rate> only. All other tags are silently stripped — the inner text is kept but the tag itself has no effect.
| Tag / Attribute | Status | Alternative |
| --- | --- | --- |
| <speak> wrapper | Stripped | Omit it; plain text is assumed |
| <prosody pitch="..."> | Stripped | No pitch control available |
| <prosody volume="..."> | Stripped | No volume control available |
| <prosody duration="..."> | Stripped | Use the speed parameter instead |
| <emphasis> | Stripped | Rephrase the text for natural emphasis |
| <break time="..."> | Stripped | Use punctuation (see Punctuation as Pacing) |
| <say-as interpret-as="..."> | Stripped | Use <spell> for characters and normalization for numbers |
| <sub alias="..."> | Stripped | Write the spoken form directly in the text |
| <audio>, <p>, <s>, <w>, <lang> | Stripped | |
Unknown tags are not validated at request time. Passing unsupported tags will not return an error — the tags are removed and the remaining text is synthesized. Test your output when migrating from a full-SSML provider like Google Cloud TTS, Amazon Polly, or Microsoft Azure.
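Unsupported tags are dropped without an error, so scanning the input is a quick way to catch them during a migration. `find_unsupported_markup` is a hypothetical helper based only on the rules on this page. It allows <spell>, allows <prosody> only with a rate attribute, and reports every other tag name.

```python
import re

# Opening or self-closing tags; closing tags ("</...>") never match.
TAG = re.compile(r"<\s*([A-Za-z][\w:-]*)([^>]*)>")
# Attribute names inside a tag, e.g. rate="slow" -> "rate"
ATTR = re.compile(r"([A-Za-z][\w:-]*)\s*=")


def find_unsupported_markup(text: str) -> list[str]:
    """List tags (and prosody attributes) that would be silently stripped."""
    found: list[str] = []
    for name, attrs in TAG.findall(text):
        if name == "spell":
            continue
        if name == "prosody":
            issues = [f"prosody {a}" for a in ATTR.findall(attrs) if a != "rate"]
        else:
            issues = [name]
        for issue in issues:
            if issue not in found:  # report each problem once, in order of appearance
                found.append(issue)
    return found


ssml = '<speak>Call <prosody rate="slow" pitch="high">0 800</prosody><break time="1s"/></speak>'
print(find_unsupported_markup(ssml))
```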

Next Steps

- Text Normalization: numbers, dates, currencies, and spell tags in depth
- Speed Control: global speed parameter and prosody tag reference
- LLM Integration: system prompt patterns for voice agents
- Streaming: real-time audio from token-by-token LLM output