Spell tags - KugelAudio

Wrapping text in <spell> tags causes each character to be read out individually. Useful for email addresses, verification codes, acronyms, and serial numbers.

"Contact us at <spell>hello@kugelaudio.com</spell>"
→  "Contact us at H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

Spell tags require normalize: true. Always set language as well, because special characters (@, ., -, _) are translated to language-specific spoken words.
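As a minimal sketch of that requirement, the helper below builds a request body that always enables normalization and refuses to run without a language. The function name `build_tts_payload` is hypothetical and not part of the KugelAudio SDK; the field names (`text`, `normalize`, `language`) are the ones used in the request examples on this page.

```python
def build_tts_payload(text: str, language: str) -> dict:
    """Build a request body for text that may contain <spell> tags.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    if not language:
        # Special characters are spoken differently per language,
        # so an explicit language code is required here.
        raise ValueError("language must be set, e.g. 'en'")
    return {
        "text": text,
        "normalize": True,  # spell tags need normalization enabled
        "language": language,
    }


payload = build_tts_payload(
    "Contact us at <spell>hello@kugelaudio.com</spell>", "en"
)
```

The resulting dict can be passed as keyword arguments to a client call or serialized as the JSON body of an HTTP request.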

Character translations by language

Character	English	German	French	Spanish
`@`	at	ät	arobase	arroba
`.`	dot	Punkt	point	punto
`-`	dash	Strich	tiret	guión
`_`	underscore	Unterstrich	underscore	guión bajo

Letters are spelled with their phonetic names, digits with their spoken names. Whitespace inside a spell block is read as the word “space” (or the language’s equivalent).
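To preview roughly what a spell block will sound like before sending it, the table above can be mirrored client-side. This is an approximation, not the server's normalizer: the function name `spell_preview` is hypothetical, the language codes other than `en` are assumed to be ISO 639-1 codes, digits are left as digits, and only the English word for whitespace is known from this page.

```python
# Spoken words for special characters, copied from the table above.
# Language codes other than "en" are an assumption.
SPOKEN = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def spell_preview(content: str, language: str = "en") -> str:
    """Approximate the spoken form of the content of a <spell> block."""
    words = SPOKEN[language]
    parts = []
    for ch in content:
        if ch in words:
            parts.append(words[ch])
        elif ch.isspace():
            # Only the English word is documented; use a placeholder otherwise.
            parts.append("space" if language == "en" else "[space]")
        else:
            parts.append(ch.upper())
    return ", ".join(parts)


print(spell_preview("hello@kugelaudio.com"))
# H, E, L, L, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M
```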

Grouping

For long codes, add group="N" to insert a beat every N characters — the way a human would read a code aloud:

"Your code is <spell group="2">A4B9XZ</spell>"
→  "A 4,  B 9,  X Z"

Grouping applies only when the spell content has no whitespace — if you’ve already spaced the content yourself, it is read as written.
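The grouping rule can be sketched as a small tag builder. Both function names are hypothetical. `spell_tag` rejects the combination of `group` and whitespace, since grouping would not apply in that case, and `chunk` shows how a code splits into groups of N characters; the actual pause between groups is produced by the server.

```python
from typing import List, Optional


def spell_tag(content: str, group: Optional[int] = None) -> str:
    """Wrap content in a <spell> tag, optionally with group="N"."""
    if group is None:
        return f"<spell>{content}</spell>"
    if group < 1:
        raise ValueError("group must be a positive integer")
    if any(ch.isspace() for ch in content):
        # Pre-spaced content is read as written, so grouping has no effect.
        raise ValueError("remove whitespace from content or drop group")
    return f'<spell group="{group}">{content}</spell>'


def chunk(content: str, size: int) -> List[str]:
    """Split content into consecutive groups of `size` characters."""
    return [content[i:i + size] for i in range(0, len(content), size)]


print(spell_tag("A4B9XZ", group=2))  # <spell group="2">A4B9XZ</spell>
print(chunk("A4B9XZ", 2))            # ['A4', 'B9', 'XZ']
```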

Examples

Python

# Email address
audio = client.tts.generate(
    text="Email us at <spell>hello@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)

# Verification code, grouped in pairs
audio = client.tts.generate(
    text='Your code is <spell group="2">A4B9XZ</spell>',
    normalize=True,
    language="en",
)

# Acronym with context
audio = client.tts.generate(
    text="We use <spell>TTS</spell>, text-to-speech, for audio output.",
    normalize=True,
    language="en",
)

JavaScript

// Email address
const audio = await client.tts.generate({
  text: 'Email us at <spell>hello@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});

// Verification code, grouped in pairs
const audio2 = await client.tts.generate({
  text: 'Your code is <spell group="2">A4B9XZ</spell>',
  normalize: true,
  language: 'en',
});

cURL

curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your code is <spell group=\"2\">A4B9XZ</spell>",
    "normalize": true,
    "language": "en"
  }' --output output.pcm

Pitfalls

Keep sentence-ending punctuation outside the tag. <spell>D8239014.</spell> reads the trailing period as the literal word “Dot” (or “Punkt” in German) and runs it into the next sentence. Write <spell>D8239014</spell>. instead.

No nesting. A <spell> tag inside another spell block is read as literal characters.
No break tags inside spell blocks — use grouping for pacing instead.
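These pitfalls can be guarded against before the text is sent. The sketch below uses a hypothetical helper, `safe_spell`, that moves trailing sentence punctuation outside the tag and rejects nested tags. It assumes break tags are written as `<break ...>`, which this page does not specify.

```python
import re

# Punctuation that normally ends or separates sentences.
TRAILING = ".!?,;:"

# Assumption: break tags look like <break ...>; adjust if the syntax differs.
FORBIDDEN_TAG = re.compile(r"</?(spell|break)\b")


def safe_spell(content: str) -> str:
    """Wrap content in <spell>, keeping trailing punctuation outside."""
    if FORBIDDEN_TAG.search(content):
        raise ValueError("spell content must not contain spell or break tags")
    core = content.rstrip(TRAILING)
    tail = content[len(core):]  # punctuation that was stripped, if any
    return f"<spell>{core}</spell>{tail}"


print(safe_spell("D8239014."))  # <spell>D8239014</spell>.
```

Content that must end in a spoken "dot" would need to bypass this helper, since it always treats a final period as sentence punctuation.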

Spell tags in streaming

When streaming text token-by-token, spell tags that span multiple chunks are handled automatically: the server buffers text until the closing </spell> arrives before generating audio, and auto-closes incomplete tags if the stream ends unexpectedly. See Streaming overview.
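To make the buffering behavior concrete, here is an illustrative model of it in Python. It is not KugelAudio's server code, and the names are hypothetical. It holds back text from the first unclosed `<spell ...>` tag (or a trailing fragment that might be the start of a tag), releases it once `</spell>` arrives, and appends the closing tag if the stream ends early.

```python
import re
from typing import Iterable, Iterator

OPEN_RE = re.compile(r"<spell\b[^>]*>")
CLOSE = "</spell>"


def _safe_length(buffer: str) -> int:
    """Return how many leading characters can be released safely."""
    pos = 0
    while True:
        match = OPEN_RE.search(buffer, pos)
        if match is None:
            break
        end = buffer.find(CLOSE, match.end())
        if end == -1:
            return match.start()  # open tag without its closing tag yet
        pos = end + len(CLOSE)
    # Hold back a trailing fragment such as "<sp" that may become a tag.
    lt = buffer.rfind("<", pos)
    if lt != -1 and ">" not in buffer[lt:]:
        return lt
    return len(buffer)


def buffer_spell_tags(chunks: Iterable[str]) -> Iterator[str]:
    """Yield text pieces in which every spell tag is complete."""
    buffer = ""
    for piece in chunks:
        buffer += piece
        cut = _safe_length(buffer)
        if cut:
            yield buffer[:cut]
            buffer = buffer[cut:]
    if buffer:
        if OPEN_RE.match(buffer):
            buffer += CLOSE  # auto-close a tag left open at end of stream
        yield buffer


print(list(buffer_spell_tags(["Code: <sp", "ell>A4", "B9</spell> ok"])))
# ['Code: ', '<spell>A4B9</spell> ok']
```

With the real API no client-side buffering is needed; the sketch only shows why audio for a spell block starts after its closing tag has arrived.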

When spelling isn’t enough

If a brand name or domain term is pronounced wrong (rather than needing to be spelled out), use a pronunciation dictionary instead — it rewrites or IPA-annotates the word without changing your request text.
