Text Normalization - KugelAudio

Text normalization converts numbers, dates, times, currency amounts, and other non-verbal text into spoken words:

“I have 3 apples” → “I have three apples”
“The meeting is at 2:30 PM” → “The meeting is at two thirty PM”
“€50.99” → “fifty euros and ninety-nine cents”
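The following toy sketch shows the kind of rewriting normalization performs. It reproduces the first two conversions above, but it only handles English integers from 0 to 99 and H:MM times. It is purely illustrative: it is not how KugelAudio implements normalization, and the function names are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English (toy example)."""
    if not 0 <= n <= 99:
        raise ValueError("this toy example only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _time_to_words(match) -> str:
    """Read an H:MM time such as 2:30 as 'two thirty'."""
    hour, minute = int(match.group(1)), int(match.group(2))
    if minute == 0:
        return number_to_words(hour)                                    # 3:00 -> "three"
    if minute < 10:
        return f"{number_to_words(hour)} oh {number_to_words(minute)}"  # 9:05 -> "nine oh five"
    return f"{number_to_words(hour)} {number_to_words(minute)}"         # 2:30 -> "two thirty"


def normalize(text: str) -> str:
    """Rewrite H:MM times first, then any remaining standalone 1-2 digit numbers."""
    text = re.sub(r"\b(\d{1,2}):(\d{2})\b", _time_to_words, text)
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


print(normalize("I have 3 apples"))            # I have three apples
print(normalize("The meeting is at 2:30 PM"))  # The meeting is at two thirty PM
```

A production normalizer also needs language-specific rules for currencies, dates, ordinals, and ambiguous formats such as 01/15/2024, which is one reason an explicit language helps.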

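To enable normalization, pass normalize=True to client.tts.generate, ideally together with an explicit language code:

# With explicit language (recommended: fastest and most reliable)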
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",  # Specify language for best performance
)

# With auto-detection (may cause incorrect normalizations)
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
    # language not specified - will auto-detect
)

Supported Languages

Code	Language	Code	Language
`de`	German	`nl`	Dutch
`en`	English	`pl`	Polish
`fr`	French	`sv`	Swedish
`es`	Spanish	`da`	Danish
`it`	Italian	`no`	Norwegian
`pt`	Portuguese	`fi`	Finnish
`cs`	Czech	`hu`	Hungarian
`ro`	Romanian	`el`	Greek
`uk`	Ukrainian	`bg`	Bulgarian
`tr`	Turkish	`vi`	Vietnamese
`ar`	Arabic	`hi`	Hindi
`zh`	Chinese	`ja`	Japanese
`ko`	Korean

Using normalize=True without specifying language relies on auto-detection, which can pick the wrong language and produce incorrect normalizations, especially for short texts or for closely related languages with overlapping vocabulary. Pass language whenever you know it.
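If your application handles several languages, a small helper can enforce this advice by always passing an explicit, supported code. The sketch below relies only on the normalize and language parameters shown on this page; the helper name normalization_kwargs is hypothetical, and the check is purely client-side (this page does not say how the API responds to an unsupported code).

```python
from typing import Optional

# Codes from the Supported Languages table above.
SUPPORTED_LANGUAGES = frozenset({
    "de", "en", "fr", "es", "it", "pt", "cs", "ro", "uk", "tr", "ar", "zh", "ko",
    "nl", "pl", "sv", "da", "no", "fi", "hu", "el", "bg", "vi", "hi", "ja",
})


def normalization_kwargs(language: Optional[str]) -> dict:
    """Build keyword arguments for client.tts.generate with normalization enabled.

    Passing None deliberately falls back to auto-detection.
    """
    kwargs = {"normalize": True}
    if language is None:
        return kwargs
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported normalization language: {language!r}")
    kwargs["language"] = code
    return kwargs


print(normalization_kwargs("DE"))  # {'normalize': True, 'language': 'de'}
```

With a client configured as in the earlier examples, you would call client.tts.generate(text="Ich habe 3 Artikel für 50,99€ gekauft.", **normalization_kwargs("de")). Failing fast on an unknown code surfaces typos before any audio is generated.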

Spell Tags

Use <spell> tags to spell out text letter by letter. This is useful for email addresses, codes, acronyms, or any text that should be pronounced character by character:
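When the text comes from user data, you may want to add the tags automatically. A minimal sketch, assuming a hypothetical helper spell_emails and a deliberately loose regular expression that will not match every valid address:

```python
import re

# Loose email pattern: local part, "@", then at least one dotted domain label.
# Real-world address validation is far more involved.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def spell_emails(text: str) -> str:
    """Wrap every email-like substring in <spell>...</spell> tags."""
    return EMAIL_PATTERN.sub(lambda m: f"<spell>{m.group(0)}</spell>", text)


print(spell_emails("Contact me at kajo@kugelaudio.com"))
# Contact me at <spell>kajo@kugelaudio.com</spell>
```

The tagged text can then be sent exactly like the examples that follow, with normalize=True and a language code.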

# Spell out an email address
audio = client.tts.generate(
    text="Contact me at <spell>kajo@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)
# Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

# Spell out an acronym
audio = client.tts.generate(
    text="The <spell>API</spell> is easy to use.",
    normalize=True,
    language="en",
)
# Output: "The A, P, I is easy to use."

# German example: symbols inside the tag are read as German words
audio = client.tts.generate(
    text="Meine E-Mail ist <spell>test@beispiel.de</spell>",
    normalize=True,
    language="de",
)
# Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

Spell tags also work with streaming:

# Streaming with spell tags: a tag that spans several send() calls is handled automatically.
# Run inside an async function; play_audio is a placeholder for your own playback code.
async with client.tts.streaming_session(
    voice_id=1071,
    normalize=True,
    language="en",
) as session:
    # The opening and closing tags arrive in separate chunks; "ABC123" is still spelled out
    async for chunk in session.send("My code is <spell>"):
        play_audio(chunk.audio)
    async for chunk in session.send("ABC123</spell>"):
        play_audio(chunk.audio)
    async for chunk in session.flush():
        play_audio(chunk.audio)
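This page does not describe how tags that span chunks are handled internally. One plausible approach, sketched here purely as an illustration, is to hold back streamed text while a <spell> tag is open or only partially received, and to release it once the tag closes. SpellTagBuffer is a hypothetical class, not part of the KugelAudio SDK.

```python
OPEN, CLOSE = "<spell>", "</spell>"


class SpellTagBuffer:
    """Accumulates streamed text and releases it only where no <spell> tag is cut off."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> str:
        """Add a chunk and return the prefix that is safe to process now."""
        self._pending += chunk
        cut = self._safe_boundary(self._pending)
        ready, self._pending = self._pending[:cut], self._pending[cut:]
        return ready

    def flush(self) -> str:
        """Return whatever is left at the end of the stream."""
        ready, self._pending = self._pending, ""
        return ready

    @staticmethod
    def _safe_boundary(text: str) -> int:
        # An opened but not yet closed tag: hold back everything from the opening tag on.
        last_open = text.rfind(OPEN)
        if last_open != -1 and text.find(CLOSE, last_open) == -1:
            return last_open
        # A trailing fragment that may become "<spell>" (e.g. "<sp"): hold it back too.
        for k in range(len(OPEN) - 1, 0, -1):
            if text.endswith(OPEN[:k]):
                return len(text) - k
        return len(text)


buf = SpellTagBuffer()
print(repr(buf.feed("My code is <spell>")))  # 'My code is '
print(repr(buf.feed("ABC123</spell>")))      # '<spell>ABC123</spell>'
print(repr(buf.flush()))                     # ''
```

Releasing text only at safe boundaries keeps each spelled segment intact, at the cost of a little extra latency while a tag is open.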

Special characters: inside <spell> tags, symbols such as @, . and - are read as language-specific words. For example, @ becomes “at” in English, “ät” in German, and “arobase” in French.
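To check roughly how a spelled segment will be read before sending it, you can approximate the expansion locally. The sketch below is a hypothetical preview helper, not the service's actual logic. Its symbol words come only from the examples on this page, and any other symbol (including -, whose spoken form this page does not give) is passed through unchanged.

```python
import re

# Symbol words taken from the examples on this page; other symbols and languages are not covered.
SYMBOL_WORDS = {
    "en": {"@": "at", ".": "dot"},
    "de": {"@": "ät", ".": "Punkt"},
    "fr": {"@": "arobase"},
}

SPELL_TAG = re.compile(r"<spell>(.*?)</spell>", re.DOTALL)


def preview_spelling(text: str, language: str = "en") -> str:
    """Approximate the reading of <spell> content: one uppercase letter or symbol word per item."""
    symbols = SYMBOL_WORDS.get(language, {})

    def expand(match) -> str:
        # Whitespace inside the tag is skipped (an assumption of this sketch).
        items = [symbols.get(ch, ch.upper()) for ch in match.group(1) if not ch.isspace()]
        return ", ".join(items)

    return SPELL_TAG.sub(expand, text)


print(preview_spelling("The <spell>API</spell> is easy to use."))
# The A, P, I is easy to use.
```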

Model recommendation: use kugel-3 for the cleanest letter-by-letter pronunciation of spelled-out text.

Next steps

Dictionaries — per-project pronunciation and replacement lists
Generate Speech — generation parameters including normalize and language
