Wrapping text in <spell> tags causes each character to be read out
individually. This is useful for email addresses, verification codes,
acronyms, and serial numbers.
"Contact us at <spell>hello@kugelaudio.com</spell>"
→ "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"
Spell tags only work when normalize: true is enabled. Always set language
as well, because special characters (@, ., -, _) are translated to
language-specific spoken words.
Character translations by language
| Character | English | German | French | Spanish |
|---|---|---|---|---|
| @ | at | ät | arobase | arroba |
| . | dot | Punkt | point | punto |
| - | dash | Strich | tiret | guión |
| _ | underscore | Unterstrich | underscore | guión bajo |
Letters are spelled with their phonetic names, digits with their spoken
names. Whitespace inside a spell block is read as the word “space” (or the
language’s equivalent).
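To preview roughly what a spell block will sound like before sending a request, the rules above can be approximated locally. This is a minimal sketch, not the server's implementation: `preview_spell` is a hypothetical helper, the language codes other than `"en"` are assumed, and the whitespace word is only known here for English.

```python
# Spoken words for special characters, copied from the translation table.
# The keys "de", "fr" and "es" are assumed language codes.
SPECIAL = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def preview_spell(content: str, language: str = "en") -> str:
    """Approximate how the content of a spell block is read aloud."""
    words = SPECIAL[language]
    parts = []
    for ch in content:
        if ch in words:
            parts.append(words[ch])   # special character -> spoken word
        elif ch.isspace():
            parts.append("space")     # English word; other languages differ
        else:
            parts.append(ch.upper())  # letters and digits are read one by one
    return ", ".join(parts)


print(preview_spell("hello@kugelaudio.com"))
# H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M
```

A preview like this is handy in tests or logs, but the audio itself is always produced by the server-side normalizer.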
Grouping
For long codes, add group="N" to insert a beat every N characters — the way
a human would read a code aloud:
"Your code is <spell group="2">A4B9XZ</spell>"
→ "A 4, B 9, X Z"
Grouping applies only when the spell content has no whitespace — if you’ve
already spaced the content yourself, it is read as written.
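The grouping rule can be illustrated with a few lines of Python. This is a sketch of the behaviour described above, assuming the hypothetical helper name `preview_grouping`; the real grouping happens on the server.

```python
def preview_grouping(content: str, group: int) -> str:
    """Show where the beats fall when group=N is applied to spell content."""
    if group < 1:
        raise ValueError("group must be a positive integer")
    # Content that already contains whitespace is read as written.
    if any(ch.isspace() for ch in content):
        return content
    # Split into runs of `group` characters, then separate the runs with commas.
    chunks = [content[i:i + group] for i in range(0, len(content), group)]
    return ", ".join(" ".join(chunk) for chunk in chunks)


print(preview_grouping("A4B9XZ", 2))   # A 4, B 9, X Z
print(preview_grouping("A4B 9XZ", 2))  # A4B 9XZ (unchanged, already spaced)
```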
Examples
# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code, grouped in pairs
audio = client.tts.generate(
    text='Your code is <spell group="2">A4B9XZ</spell>',
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)
// Email address
const audio = await client.tts.generate({
  text: 'Email us at <spell>hello@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});

// Verification code, grouped in pairs
const audio2 = await client.tts.generate({
  text: 'Your code is <spell group="2">A4B9XZ</spell>',
  normalize: true,
  language: 'en',
});
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your code is <spell group=\"2\">A4B9XZ</spell>",
    "normalize": true,
    "language": "en"
  }' --output output.pcm
Pitfalls
- Keep sentence-ending punctuation outside the tag.
  <spell>D8239014.</spell> reads the trailing period as the literal word
  “Dot” (or “Punkt” in German) and runs it into the next sentence. Write
  <spell>D8239014</spell>. instead.
- No nesting. A <spell> tag inside another spell block is read as
  literal characters.
- No break tags inside spell blocks. Use grouping for pacing instead.
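The pitfalls above are easy to catch before a request is sent. The following is a minimal sketch of such a check. `lint_spell_tags` is a hypothetical helper, and it assumes break tags start with `<break`, which this page does not confirm.

```python
import re

# Matches one spell block and captures its content (non-greedy).
SPELL_BLOCK = re.compile(r"<spell\b[^>]*>(.*?)</spell>", re.DOTALL)


def lint_spell_tags(text: str) -> list[str]:
    """Return a warning for each spell-tag pitfall found in the text."""
    warnings = []
    for match in SPELL_BLOCK.finditer(text):
        inner = match.group(1)
        if inner.rstrip().endswith((".", "!", "?")):
            warnings.append(f"move trailing punctuation outside the tag: {inner!r}")
        if "<spell" in inner:
            warnings.append(f"nested spell tag: {inner!r}")
        if "<break" in inner:  # assumed break-tag syntax
            warnings.append(f"break tag inside spell block: {inner!r}")
    return warnings


print(lint_spell_tags("Ref <spell>D8239014.</spell> Thanks."))
print(lint_spell_tags("Ref <spell>D8239014</spell>. Thanks."))
```

The first call reports one warning about the trailing period and the second returns an empty list.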
When streaming text token-by-token, spell tags that span multiple chunks are
handled automatically: the server buffers text until the closing </spell>
arrives before generating audio, and auto-closes incomplete tags if the
stream ends unexpectedly. See Streaming overview.
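To make the buffering behaviour concrete, here is a small sketch of the idea in Python. It illustrates the described logic only and is not the server's code. `buffer_spell_chunks` is a hypothetical name, and edge cases such as a stream that ends in the middle of a closing tag are ignored.

```python
from typing import Iterable, Iterator

OPEN, CLOSE = "<spell", "</spell>"


def _safe_length(buffer: str) -> int:
    """Number of leading characters that can be released for synthesis."""
    start = buffer.rfind(OPEN)
    if start != -1 and buffer.find(CLOSE, start) == -1:
        return start  # a spell block is open but not yet closed
    lt = buffer.rfind("<")
    if lt != -1 and buffer.find(">", lt) == -1:
        return lt     # a tag itself is split across chunks
    return len(buffer)


def buffer_spell_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield text only once every spell block in it is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        safe = _safe_length(buffer)
        if safe:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        start = buffer.rfind(OPEN)
        if start != -1 and buffer.find(CLOSE, start) == -1:
            buffer += CLOSE  # stream ended early: auto-close the open tag
        yield buffer


print(list(buffer_spell_chunks(["Code <spell>A4", "B9</spell> ok"])))
# ['Code ', '<spell>A4B9</spell> ok']
```

Because the server already does this, clients can forward tokens as they arrive without splitting or re-joining spell blocks themselves.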
When spelling isn’t enough
If a brand name or domain term is pronounced wrong (rather than needing to
be spelled out), use a pronunciation dictionary
instead — it rewrites or IPA-annotates the word without changing your
request text.