Text normalization converts numbers, dates, times, and other non-verbal text into spoken words:
- “I have 3 apples” → “I have three apples”
- “The meeting is at 2:30 PM” → “The meeting is at two thirty PM”
- “€50.99” → “fifty euros and ninety-nine cents”
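The rewrites listed above can be illustrated with a toy, English-only sketch. It makes several simplifying assumptions: integers from 0 to 99, amounts written as `€<euros>.<cents>`, and times written as `H:MM AM/PM`. The helpers `number_to_words`, `euros_to_words`, and `time_to_words` are hypothetical. They are not part of the SDK and do not reflect how the service's normalizer is implemented.

```python
# Toy English normalizer: an illustration only, not the service's implementation.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("this toy only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def euros_to_words(amount: str) -> str:
    """Turn a string like '€50.99' into 'fifty euros and ninety-nine cents'."""
    euros, cents = (int(part) for part in amount.lstrip("€").split("."))
    euro_word = "euro" if euros == 1 else "euros"
    cent_word = "cent" if cents == 1 else "cents"
    return f"{number_to_words(euros)} {euro_word} and {number_to_words(cents)} {cent_word}"


def time_to_words(clock: str) -> str:
    """Turn a string like '2:30 PM' into 'two thirty PM'."""
    hm, suffix = clock.split()
    hours, minutes = (int(part) for part in hm.split(":"))
    spoken = number_to_words(hours)
    if minutes:
        # Minutes below ten are read as "oh five", "oh nine", and so on
        spoken += " " + (number_to_words(minutes) if minutes >= 10
                         else "oh " + number_to_words(minutes))
    return f"{spoken} {suffix}"
```

For example, `euros_to_words("€50.99")` yields the same wording as the currency example in the list above.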
Enable normalization by passing `normalize=True` to `client.tts.generate()`:

```python
# With an explicit language (recommended and fastest)
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",  # Specify the language for best performance
)

# With auto-detection (may cause incorrect normalizations)
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
    # No language given: the language is auto-detected
)
```
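Number formats are language-specific, which is one reason auto-detection can misfire. German writes `50,99` with a decimal comma, while in English `1,500` groups thousands. The toy parser below shows the same string producing different values depending on the language. `parse_amount` and its separator table are hypothetical and cover only English and German. They are not how the service works internally.

```python
from decimal import Decimal

# (thousands separator, decimal separator) per language: illustrative subset only
SEPARATORS = {
    "en": (",", "."),
    "de": (".", ","),
}


def parse_amount(text: str, language: str) -> Decimal:
    """Parse a plain number string using the given language's separators."""
    thousands, decimal_sep = SEPARATORS[language]
    # Drop grouping separators, then convert the decimal separator to "."
    normalized = text.replace(thousands, "").replace(decimal_sep, ".")
    return Decimal(normalized)
```

Here `parse_amount("1,500", "en")` gives 1500, but `parse_amount("1,500", "de")` gives 1.5. A normalizer that guesses the wrong language would read the amount aloud incorrectly.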
Supported Languages
| Code | Language | Code | Language |
|---|---|---|---|
| de | German | nl | Dutch |
| en | English | pl | Polish |
| fr | French | sv | Swedish |
| es | Spanish | da | Danish |
| it | Italian | no | Norwegian |
| pt | Portuguese | fi | Finnish |
| cs | Czech | hu | Hungarian |
| ro | Romanian | el | Greek |
| uk | Ukrainian | bg | Bulgarian |
| tr | Turkish | vi | Vietnamese |
| ar | Arabic | hi | Hindi |
| zh | Chinese | ja | Japanese |
| ko | Korean | | |
Using `normalize=True` without specifying `language` may cause incorrect normalizations, especially for short texts or for languages that share similar vocabulary. Always specify `language` when you know it.
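To avoid falling back to auto-detection by accident, you could validate the language code against the table above before calling `generate()`. This is a minimal sketch. `SUPPORTED_LANGUAGES` is copied from the table, and `build_tts_kwargs` is a hypothetical helper, not part of the SDK.

```python
# Language codes from the "Supported Languages" table above
SUPPORTED_LANGUAGES = frozenset({
    "de", "en", "fr", "es", "it", "pt", "cs", "ro", "uk", "tr", "ar", "zh", "ko",
    "nl", "pl", "sv", "da", "no", "fi", "hu", "el", "bg", "vi", "hi", "ja",
})


def build_tts_kwargs(text: str, language: str) -> dict:
    """Return keyword arguments for a normalized generate() call with an explicit language."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        # Fail loudly instead of silently relying on auto-detection
        raise ValueError(f"unsupported language code: {language!r}")
    return {"text": text, "normalize": True, "language": code}
```

Usage: `audio = client.tts.generate(**build_tts_kwargs("Ich habe 3 Artikel gekauft.", "DE"))`.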
Use `<spell>` tags to spell out text letter by letter. This is useful for email addresses, codes, acronyms, or any other text that should be pronounced character by character:
```python
# Spell out an email address
audio = client.tts.generate(
    text="Contact me at <spell>kajo@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)
# Spoken as: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

# Spell out an acronym
audio = client.tts.generate(
    text="The <spell>API</spell> is easy to use.",
    normalize=True,
    language="en",
)
# Spoken as: "The A, P, I is easy to use."

# German example: symbols are spoken with German words
audio = client.tts.generate(
    text="Meine E-Mail ist <spell>test@beispiel.de</spell>",
    normalize=True,
    language="de",
)
# Spoken as: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"
```
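If your input contains many email addresses, you could wrap them in `<spell>` tags automatically before sending the text, rather than tagging each one by hand as in the examples above. This is a minimal sketch using a deliberately simple pattern. `wrap_emails` is a hypothetical helper, not part of the SDK, and the regex will not match every valid address.

```python
import re

# Simple pattern: local part, "@", then one or more dot-separated domain labels.
# A trailing sentence period is not captured because each label needs a character after the dot.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def wrap_emails(text: str) -> str:
    """Wrap every email-like substring in <spell> tags."""
    return EMAIL_PATTERN.sub(lambda m: f"<spell>{m.group(0)}</spell>", text)
```

Apply it only to untagged text. Running it on input that already contains `<spell>` tags would nest them.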
Spell tags also work with streaming:
```python
import asyncio


async def main():
    # Streaming with spell tags: tags spanning chunks are handled automatically
    async with client.tts.streaming_session(
        voice_id=1071,
        normalize=True,
        language="en",
    ) as session:
        # The tag is split across two send() calls, and it still works correctly
        async for chunk in session.send("My code is <spell>"):
            play_audio(chunk.audio)  # play_audio: placeholder for your own playback code
        async for chunk in session.send("ABC123</spell>"):
            play_audio(chunk.audio)
        # Flush the session to receive the remaining audio
        async for chunk in session.flush():
            play_audio(chunk.audio)


asyncio.run(main())
```
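The streaming session handles split tags for you. The sketch below illustrates one way a client *could* hold back text so that a `<spell>` tag split across chunks is never synthesized half-open. It is not the SDK's implementation. `SpellTagBuffer` is hypothetical. It releases text only up to the start of an unclosed `<spell>` tag, or up to a trailing fragment that might be the beginning of a tag.

```python
class SpellTagBuffer:
    """Hypothetical sketch: buffer streamed text until any open <spell> tag is closed."""

    OPEN, CLOSE = "<spell>", "</spell>"

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> str:
        """Add a chunk and return the prefix that is safe to synthesize now."""
        self._pending += chunk
        cut = self._safe_prefix_length(self._pending)
        ready, self._pending = self._pending[:cut], self._pending[cut:]
        return ready

    def flush(self) -> str:
        """Return whatever is still buffered, even if a tag was never closed."""
        ready, self._pending = self._pending, ""
        return ready

    def _safe_prefix_length(self, text: str) -> int:
        # Hold back everything from the last opening tag that has no closing tag yet
        open_idx = text.rfind(self.OPEN)
        if open_idx != -1 and text.find(self.CLOSE, open_idx) == -1:
            return open_idx
        # Hold back a trailing fragment such as "<sp" that may start a tag
        for k in range(min(len(text), len(self.CLOSE) - 1), 0, -1):
            tail = text[-k:]
            if self.OPEN.startswith(tail) or self.CLOSE.startswith(tail):
                return len(text) - k
        return len(text)
```

With the chunks from the streaming example, the first `feed()` releases only `"My code is "`, and the second releases `"<spell>ABC123</spell>"` in one piece.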
Special characters: symbols such as `@`, `.`, and `-` are spoken as language-specific words. For example, `@` becomes “at” in English, “ät” in German, and “arobase” in French.
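To make the letter-by-letter expansion concrete, this toy sketch reproduces the spoken forms shown in the spell examples on this page. `spell_out` and `SYMBOL_WORDS` are hypothetical. The mapping contains only the symbol words that appear on this page, and characters without a mapping (such as `-`) are passed through unchanged. The service's real mapping is larger and may differ.

```python
# Only the symbol words shown on this page; the real service covers more.
SYMBOL_WORDS = {
    "en": {"@": "at", ".": "dot"},
    "de": {"@": "ät", ".": "Punkt"},
    "fr": {"@": "arobase"},
}


def spell_out(text: str, language: str) -> str:
    """Expand text letter by letter, naming known symbols in the given language."""
    words = SYMBOL_WORDS.get(language, {})
    parts = []
    for ch in text:
        if ch.isalnum():
            parts.append(ch.upper())       # letters and digits are read individually
        else:
            parts.append(words.get(ch, ch))  # unknown symbols pass through as-is
    return ", ".join(parts)
```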
Model recommendation: use kugel-3 for the cleanest letter-by-letter pronunciation of spelled-out text.
Next steps
- Dictionaries: per-project pronunciation and replacement lists
- Generate Speech: generation parameters, including `normalize` and `language`