Prompting Guide

KugelAudio generates speech directly from text. There is no separate voice-direction layer; you shape the output through how you write the input. This page covers every mechanism available to control pronunciation, pacing, and emphasis.

Supported Tags at a Glance

Tag	Purpose	Requires `normalize`
`<spell>text</spell>`	Spell out characters one by one	Yes
`<prosody rate="slow|medium|fast|0.8–1.2">text</prosody>`	Adjust speed of a text span	No

These are the only tags processed. Everything else is stripped before synthesis — see Unsupported Tags below.

`<spell>` — Character-by-Character Pronunciation

Wrapping text in <spell> tags causes each character to be read out individually. Useful for email addresses, codes, acronyms, and serial numbers.

"Contact us at <spell>hello@kugelaudio.com</spell>"
→  "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

normalize: true must be enabled for spell tags to work. Special characters (@, ., -, _) are translated to language-specific spoken words.

Keep sentence-ending punctuation outside the tag. <spell>D8239014.</spell> reads the trailing period as the literal word “Dot” (or “Punkt” in German) and runs it into the next sentence. Write <spell>D8239014</spell>. instead.
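If the input text comes from templates or an LLM, the punctuation rule above can be enforced before the request is sent. The following is a minimal sketch, not part of the KugelAudio SDK: `move_punctuation_outside_spell` is a hypothetical helper name, and it only handles `.`, `!`, and `?` directly before the closing tag.

```python
import re

# Matches a <spell> span whose content ends in sentence punctuation.
# [^<]+? keeps the match inside a single tag pair.
_TRAILING_PUNCT = re.compile(r"<spell>([^<]+?)([.!?]+)</spell>")


def move_punctuation_outside_spell(text: str) -> str:
    """Rewrite <spell>ABC.</spell> as <spell>ABC</spell>. (hypothetical helper)."""
    return _TRAILING_PUNCT.sub(r"<spell>\1</spell>\2", text)


if __name__ == "__main__":
    print(move_punctuation_outside_spell(
        "Your ID is <spell>D8239014.</spell> Keep it safe."
    ))
```

Dots inside the span, such as the one in an email address, are left alone because only punctuation immediately before `</spell>` is moved.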

Character translations by language

Character	English	German	French	Spanish
`@`	at	ät	arobase	arroba
`.`	dot	Punkt	point	punto
`-`	dash	Strich	tiret	guión
`_`	underscore	Unterstrich	underscore	guión bajo
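To check what a spelled span should roughly sound like before calling the API, the table can be mirrored in a small local lookup. This is an illustrative approximation only: the real expansion happens server-side, `preview_spell` is a hypothetical helper, and the language codes `"fr"` and `"es"` are assumed by analogy with the `"en"` and `"de"` codes used elsewhere on this page.

```python
# Spoken forms copied from the table above. The "fr" and "es" keys are
# assumed language codes; only "en" and "de" appear in this guide.
SPOKEN_FORMS = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def preview_spell(text: str, language: str = "en") -> str:
    """Approximate the spoken form of a <spell> span (local preview only)."""
    words = SPOKEN_FORMS[language]
    # Letters and digits are upper-cased; listed symbols become words.
    return ", ".join(
        words.get(ch, ch.upper()) for ch in text if not ch.isspace()
    )


if __name__ == "__main__":
    print(preview_spell("hello@kugelaudio.com"))
```

A preview like this is handy in unit tests for prompt templates, where the goal is to catch a stray symbol before it reaches synthesis.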

Examples

Python

# Assumes `client` is an already-initialized KugelAudio SDK client.

# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code
audio = client.tts.generate(
    text="Your code is <spell>A4-B9-XZ</spell>",
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)

JavaScript

// Assumes `client` is an already-initialized KugelAudio SDK client.

// Email address
const audio = await client.tts.generate({
  text: 'Email us at <spell>hello@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});

// Verification code
const audio2 = await client.tts.generate({
  text: 'Your code is <spell>A4-B9-XZ</spell>',
  normalize: true,
  language: 'en',
});

cURL

curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your code is <spell>A4-B9-XZ</spell>",
    "normalize": true,
    "language": "en"
  }' --output output.pcm

For clearer letter-by-letter pronunciation, use kugel-1 rather than kugel-1-turbo.

`<prosody rate>` — Inline Speed Control

Slow down or speed up a specific span of text without affecting the rest of the sentence. The tag is stripped before synthesis and the inner text is time-stretched after generation.

"Call us at <prosody rate="slow">0 30 12 34 56 78</prosody> during business hours."

Rate values

Value	Speed	Alias
`"slow"`	0.8× (20% slower)	—
`"medium"`	1.0× (normal)	—
`"fast"`	1.2× (20% faster)	—
`"0.8"` – `"1.2"`	exact multiplier	numeric

Values outside 0.8–1.2 are clamped. The speed request parameter sets a global default; <prosody rate> overrides it per-span.
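The alias, clamping, and override rules can be summarized in a few lines. This is a sketch of the documented behavior, not the service's implementation: `resolve_rate` and `effective_rate` are hypothetical names, and raising `ValueError` for an unrecognized string is an assumption, since the guide does not say how invalid values are handled.

```python
from typing import Optional

# Named values and limits taken from the table above.
RATE_ALIASES = {"slow": 0.8, "medium": 1.0, "fast": 1.2}
MIN_RATE, MAX_RATE = 0.8, 1.2


def resolve_rate(value: str) -> float:
    """Turn a rate attribute into a multiplier, clamped to 0.8-1.2."""
    if value in RATE_ALIASES:
        return RATE_ALIASES[value]
    # float() raises ValueError for anything that is not numeric (assumed
    # handling; the real service may behave differently).
    return min(MAX_RATE, max(MIN_RATE, float(value)))


def effective_rate(global_speed: float, span_rate: Optional[str] = None) -> float:
    """A span's rate attribute, when present, overrides the global speed."""
    if span_rate is not None:
        return resolve_rate(span_rate)
    return global_speed


if __name__ == "__main__":
    print(resolve_rate("1.5"))          # clamped to the upper limit
    print(effective_rate(1.2, "slow"))  # span rate wins over global speed
```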

Examples

Python

# Assumes `client` is an already-initialized KugelAudio SDK client.

# Slow down a phone number
audio = client.tts.generate(
    text='Call us at <prosody rate="slow">0 800 123 456</prosody> any time.',
    language="en",
)

# Mix speeds in one sentence
audio = client.tts.generate(
    text='<prosody rate="fast">Limited time offer!</prosody> '
         'Your confirmation code is <prosody rate="slow">X7-K2-9P</prosody>.',
    normalize=True,
    language="en",
)

# Numeric rate
audio = client.tts.generate(
    text='Details: <prosody rate="0.85">Article number 4 dash 0 0 7.</prosody>',
    language="en",
)

JavaScript

// Assumes `client` is an already-initialized KugelAudio SDK client.

// Slow down a phone number
const audio = await client.tts.generate({
  text: 'Call us at <prosody rate="slow">0 800 123 456</prosody> any time.',
  language: 'en',
});

// Mix speeds in one sentence
const audio2 = await client.tts.generate({
  text: '<prosody rate="fast">Limited time offer!</prosody> '
      + 'Your confirmation code is <prosody rate="slow">X7-K2-9P</prosody>.',
  normalize: true,
  language: 'en',
});

cURL

curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Call us at <prosody rate=\"slow\">0 800 123 456</prosody> any time.",
    "language": "en"
  }' --output output.pcm

Global Speed Parameter

The speed request parameter applies a uniform speed to the entire synthesis. Use it when you want all output at a consistent rate.

audio = client.tts.generate(
    text="This entire sentence is read 20% faster.",
    speed=1.2,
)

`speed`	Effect	Typical use
`0.8`	20% slower	Dictation, phone numbers, legal disclaimers
`1.0`	Normal (default)	General purpose
`1.2`	20% faster	Notifications, fast-paced UI feedback

<prosody rate> tags take precedence over the global speed for their spans.

Text Normalization

When normalize: true is set, numbers, dates, times, currencies, and units are converted to spoken words before synthesis.

Input	Spoken output (English)
`3 items`	three items
`€50.99`	fifty euros and ninety-nine cents
`01/15/2024`	January fifteenth twenty twenty-four
`2:30 PM`	two thirty PM
`100km/h`	one hundred kilometres per hour

Always set language explicitly when using normalization. Auto-detection can produce incorrect results for short texts or languages with shared vocabulary.
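One way to make these rules hard to forget is to build the request body through a small guard function. The sketch below uses only the `text`, `normalize`, and `language` fields shown in the cURL examples on this page; `build_tts_payload` is a hypothetical helper, and rejecting the request locally is a design choice of this sketch, not something the API is documented to do.

```python
import json
from typing import Optional


def build_tts_payload(
    text: str,
    language: Optional[str] = None,
    normalize: bool = False,
) -> dict:
    """Build a request body and fail early on avoidable mistakes."""
    if "<spell>" in text and not normalize:
        # <spell> only works when normalization is enabled.
        raise ValueError("<spell> tags require normalize=True")
    if normalize and not language:
        # Avoid relying on auto-detection for short or ambiguous text.
        raise ValueError("set language explicitly when normalize=True")
    payload = {"text": text, "normalize": normalize}
    if language:
        payload["language"] = language
    return payload


if __name__ == "__main__":
    body = build_tts_payload("You have 3 items.", language="en", normalize=True)
    print(json.dumps(body))
```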

Punctuation as Pacing

The model respects natural punctuation cues — no special tags needed:

Technique	Effect
`,` comma	Brief pause between clauses
`.` period	Sentence-end pause, falling intonation
`…` ellipsis	Longer trailing pause
`—` em dash	Abrupt pause / interruption feel
`?` question mark	Rising intonation
`!` exclamation	Energetic delivery
`\n` newline	Paragraph-level pause (similar to period)

Punctuation is the recommended way to add natural rhythm. The <break> tag is not supported.
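Text written for an SSML engine often carries `<break>` tags where a pause is wanted. A rough pre-processing sketch that swaps them for punctuation is shown below. `breaks_to_punctuation` is a hypothetical helper, and using an ellipsis for every break regardless of its `time` value is a simplification chosen for this example.

```python
import re

# Matches <break>, <break/>, and <break time="..."/> plus surrounding spaces.
_BREAK = re.compile(r"\s*<break\b[^>]*>\s*", re.IGNORECASE)


def breaks_to_punctuation(text: str, mark: str = "…") -> str:
    """Replace each <break> tag with a pause-producing punctuation mark."""

    def replace(match: re.Match) -> str:
        before = text[: match.start()].rstrip()
        # If the text already pauses here, a plain space is enough.
        if not before or before[-1] in ".,!?…:;":
            return " "
        return f"{mark} "

    return _BREAK.sub(replace, text).strip()


if __name__ == "__main__":
    print(breaks_to_punctuation(
        'Please hold<break time="500ms"/> we are connecting you.'
    ))
```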

LLM System Prompt Patterns

When an LLM generates text that feeds directly into TTS, add instructions so it uses supported tags correctly:

You are a voice assistant. Format your responses for text-to-speech output:

- For email addresses and codes, use <spell> tags:
    "Your code is <spell>ABC-123</spell>"
- For phone numbers or content that should be read slowly, use prosody tags:
    "Call <prosody rate="slow">0 800 555 1234</prosody>"
- Do NOT use markdown formatting (**, *, #, -, bullet points) — it will be read aloud literally.
- Do NOT use emoji.
- Keep sentences short. End with punctuation.
- Write numbers as digits when they should be normalized: "You have 3 messages."

Strip markdown from LLM output before passing it to TTS. Asterisks, hashes, and bullet characters are read literally by the model.
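A minimal sketch of such a stripping step follows. It covers only the characters named above (asterisks, hashes, and list bullets), `strip_markdown` is a hypothetical helper, and a real pipeline may prefer a proper markdown parser. Note that a literal expression such as `2 * 3 * 4` would also lose its asterisks.

```python
import re

# Line-start patterns use [ \t] so they never swallow newlines.
_BULLET = re.compile(r"^[ \t]*[-*+][ \t]+", re.MULTILINE)
_HEADING = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+", re.MULTILINE)
# *text*, **text**, or ***text*** on a single line.
_EMPHASIS = re.compile(r"\*{1,3}([^*\n]+)\*{1,3}")


def strip_markdown(text: str) -> str:
    """Remove common markdown markers while keeping <spell>/<prosody> tags."""
    text = _BULLET.sub("", text)     # "- item" or "* item" -> "item"
    text = _HEADING.sub("", text)    # "## Title" -> "Title"
    text = _EMPHASIS.sub(r"\1", text)
    return text


if __name__ == "__main__":
    llm_output = "## Summary\n- **Total**: 3 items\n* Code: <spell>A4-B9-XZ</spell>"
    print(strip_markdown(llm_output))
```

Hyphens inside codes such as `A4-B9-XZ` are untouched because the bullet pattern only matches at the start of a line.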

Unsupported Tags

KugelAudio processes <spell> and <prosody rate> only. All other tags are silently stripped — the inner text is kept but the tag itself has no effect.

Tag / Attribute	Status	Alternative
`<speak>` wrapper	Stripped	Omit — plain text is assumed
`<prosody pitch="...">`	Stripped	No pitch control available
`<prosody volume="...">`	Stripped	No volume control available
`<prosody duration="...">`	Stripped	Use `speed` parameter instead
`<emphasis>`	Stripped	Rephrase text for natural emphasis
`<break time="...">`	Stripped	Use punctuation (`.`, `,`, `…`)
`<say-as interpret-as="...">`	Stripped	Use `<spell>` for characters, normalization for numbers
`<sub alias="...">`	Stripped	Write the spoken form directly in the text
`<audio>`, `<p>`, `<s>`, `<w>`, `<lang>`	Stripped	—

Unknown tags are not validated at request time. Passing unsupported tags will not return an error — the tags are removed and the remaining text is synthesized. Test your output when migrating from a full-SSML provider like Google Cloud TTS, Amazon Polly, or Microsoft Azure.
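When migrating existing SSML, it can help to preview locally what text remains once unsupported tags are dropped. The sketch below approximates the behavior described in this section and is not the service's actual parser. `preview_stripped` is a hypothetical helper, and keeping a `<prosody>` tag unchanged whenever it carries a `rate` attribute (even alongside other attributes) is an assumption.

```python
import re

# Opening, closing, or self-closing tag: </name>, <name attrs>, <name attrs/>
_TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)([^<>]*)>")


def _is_supported(name: str, attrs: str) -> bool:
    if name == "spell":
        return True
    return name == "prosody" and re.search(r"\brate\s*=", attrs) is not None


def preview_stripped(text: str) -> str:
    """Approximate the text left after unsupported tags are removed."""
    out = []
    stack = []  # (tag name, kept?) for each currently open tag
    pos = 0
    for match in _TAG.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        closing, name, attrs = match.group(1), match.group(2).lower(), match.group(3)
        if attrs.rstrip().endswith("/"):
            continue  # self-closing tags such as <break/> are never kept
        if closing:
            kept = False
            # Close the nearest matching open tag and reuse its decision.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == name:
                    kept = stack[i][1]
                    del stack[i:]
                    break
        else:
            kept = _is_supported(name, attrs)
            stack.append((name, kept))
        if kept:
            out.append(match.group(0))
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    ssml = ('<speak>Hello <emphasis>world</emphasis>. '
            '<prosody rate="slow">0 800</prosody></speak>')
    print(preview_stripped(ssml))
```

Comparing this preview with the original SSML gives a quick list of places where emphasis, pitch, or pauses need to be re-expressed through wording or punctuation.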

Next Steps

Text Normalization

Numbers, dates, currencies, and spell tags in depth

Speed Control

Global speed parameter and prosody tag reference

LLM Integration

System prompt patterns for voice agents

Streaming

Real-time audio from token-by-token LLM output
