Wrapping text in <spell> tags causes each character to be read out individually. This is useful for email addresses, verification codes, acronyms, and serial numbers.
"Contact us at <spell>hello@kugelaudio.com</spell>"
→  "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"
Spell tags require normalize: true. Always set language as well: special characters (@, ., -, _) are translated to language-specific spoken words.

Character translations by language

Character   English      German        French       Spanish
@           at           ät            arobase      arroba
.           dot          Punkt         point        punto
-           dash         Strich        tiret        guión
_           underscore   Unterstrich   underscore   guión bajo
Letters are spelled with their phonetic names, digits with their spoken names. Whitespace inside a spell block is read as the word “space” (or the language’s equivalent).
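
To preview roughly what a spell block will sound like before sending a request, the translation table can be mirrored in a small local helper. This is only a sketch of the documented behavior, not the service's implementation: `preview_spell` and `SPOKEN_SYMBOLS` are hypothetical names, the language codes other than "en" are assumed, and whitespace handling is left out.

```python
# Illustrative local approximation; the real expansion happens server-side.
# Symbol words are copied from the translation table above. The language
# codes "de", "fr", and "es" are assumptions (only "en" appears in the docs).
SPOKEN_SYMBOLS = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def preview_spell(content: str, language: str = "en") -> str:
    """Return a comma-separated preview of how `content` might be spelled."""
    symbols = SPOKEN_SYMBOLS[language]
    parts = []
    for ch in content:
        if ch.isspace():
            # Whitespace is not modeled in this sketch, so it is skipped.
            continue
        # Known symbols become words; everything else is shown uppercased.
        parts.append(symbols.get(ch, ch.upper()))
    return ", ".join(parts)


print(preview_spell("hello@kugelaudio.com", "en"))
print(preview_spell("max_m@web.de", "de"))
```

Running the same input under two language codes makes it easy to see why setting language matters for addresses that contain symbols.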

Grouping

For long codes, add group="N" to insert a beat every N characters — the way a human would read a code aloud:
"Your code is <spell group="2">A4B9XZ</spell>"
→  "A 4,  B 9,  X Z"
Grouping applies only when the spell content has no whitespace — if you’ve already spaced the content yourself, it is read as written.
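
The grouping rule can be approximated locally to check where the beats will fall. The helper below is a hypothetical sketch (`preview_grouping` is not part of any SDK) that reproduces the documented output format and the no-whitespace condition; how the service actually inserts pauses is not specified here.

```python
def preview_grouping(content: str, group: int) -> str:
    """Approximate where beats fall for <spell group="N">content</spell>."""
    if group < 1:
        raise ValueError("group must be a positive integer")
    # Documented rule: grouping applies only when the content has no
    # whitespace; otherwise the content is read as written.
    if any(ch.isspace() for ch in content):
        return content
    # Split into chunks of `group` characters; the last one may be shorter.
    chunks = [content[i:i + group] for i in range(0, len(content), group)]
    # Characters inside a chunk are separated by a space, chunks by a beat.
    return ",  ".join(" ".join(chunk) for chunk in chunks)


print(preview_grouping("A4B9XZ", 2))    # A 4,  B 9,  X Z
print(preview_grouping("A4B 9XZ", 2))   # A4B 9XZ (already spaced, unchanged)
```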

Examples

# The examples below assume `client` is an already-initialized SDK client.

# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code, grouped in pairs
audio = client.tts.generate(
    text='Your code is <spell group="2">A4B9XZ</spell>',
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)

Pitfalls

  • Keep sentence-ending punctuation outside the tag. <spell>D8239014.</spell> reads the trailing period as the literal word “Dot” (or “Punkt” in German) and runs it into the next sentence. Write <spell>D8239014</spell>. instead.
  • No nesting. A <spell> tag inside another spell block is read as literal characters.
  • No break tags inside spell blocks — use grouping for pacing instead.
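
One way to avoid these pitfalls is to build spell tags through a small helper instead of by hand. The function below is a hypothetical sketch (`spell_tag` is not an SDK function): it moves trailing sentence punctuation outside the tag and rejects any markup inside the value, which covers both nested spell tags and break tags without assuming their exact syntax.

```python
from typing import Optional


def spell_tag(value: str, group: Optional[int] = None) -> str:
    """Wrap `value` in a <spell> tag while avoiding the pitfalls above."""
    # Nested tags and break tags are not supported inside spell blocks,
    # so this sketch rejects any angle bracket in the value.
    if "<" in value or ">" in value:
        raise ValueError("markup is not allowed inside a spell block")
    # Move sentence-ending punctuation outside the tag so it is not
    # read aloud as a literal word.
    core = value.rstrip(".!?")
    trailing = value[len(core):]
    if not core:
        raise ValueError("nothing left to spell")
    attr = f' group="{group}"' if group else ""
    return f"<spell{attr}>{core}</spell>{trailing}"


print("Your ID is " + spell_tag("D8239014."))
print("Your code is " + spell_tag("A4B9XZ", group=2))
```

The first call produces <spell>D8239014</spell>. with the period outside the tag, matching the recommended form.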

Spell tags in streaming

When streaming text token-by-token, spell tags that span multiple chunks are handled automatically: the server buffers text until the closing </spell> arrives before generating audio, and auto-closes incomplete tags if the stream ends unexpectedly. See Streaming overview.
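
The buffering described above happens on the server, so no client code is needed. Purely to illustrate the idea, the following sketch applies the same rule to a local stream of text chunks: hold text back while a spell block is open, and close the tag if the stream ends early. `buffer_spell_tags` is a hypothetical helper, and the service's actual implementation may differ.

```python
from typing import Iterable, Iterator

OPEN, CLOSE = "<spell", "</spell>"


def _partial_suffix(text: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that `text` ends with."""
    for n in range(len(tag) - 1, 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


def buffer_spell_tags(chunks: Iterable[str]) -> Iterator[str]:
    """Yield text only once every opened spell block has been closed."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        inside_block = pending.count(OPEN) > pending.count(CLOSE)
        # Also wait if the text ends with a cut-off opening tag like "<sp".
        if inside_block or _partial_suffix(pending, OPEN):
            continue
        yield pending
        pending = ""
    # Stream ended: auto-close a spell block that never got its closing tag.
    if pending.count(OPEN) > pending.count(CLOSE):
        cut = _partial_suffix(pending, CLOSE)
        if cut:
            pending = pending[:-cut]  # drop a cut-off closing tag like "</sp"
        pending += CLOSE
    if pending:
        yield pending


tokens = ["Code: <sp", "ell>A4", "B9</spell>", " thanks"]
print(list(buffer_spell_tags(tokens)))
```

With the tokens above, the first three chunks are released together as one complete spell block, and the trailing text follows as its own chunk.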

When spelling isn’t enough

If a brand name or domain term is pronounced wrong (rather than needing to be spelled out), use a pronunciation dictionary instead — it rewrites or IPA-annotates the word without changing your request text.