Text Normalization & Spelling

KugelAudio provides text processing features to ensure your text is spoken naturally. This includes automatic normalization of numbers, dates, and currencies, as well as the ability to spell out text letter by letter.

Text Normalization

Text normalization converts numbers, dates, times, and other non-verbal text into spoken words:

“I have 3 apples” → “I have three apples”
“The meeting is at 2:30 PM” → “The meeting is at two thirty PM”
“€50.99” → “fifty euros and ninety-nine cents”
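
To build intuition for what normalization does, here is a toy sketch that rewrites standalone integers from 0 to 99 as English words. It is not KugelAudio's implementation: the real normalizer also handles dates, times, currencies, and many languages. The names number_to_words and toy_normalize are made up for this illustration.

```python
import re

# Illustrative only: a toy normalizer for integers 0-99 in English.
# These helpers are hypothetical and not part of the KugelAudio SDK.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell an integer between 0 and 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def toy_normalize(text: str) -> str:
    """Replace standalone one- or two-digit integers with words."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)


print(toy_normalize("I have 3 apples"))  # I have three apples
```

Longer numbers such as 100 are left unchanged by this sketch, which is one reason to rely on the built-in normalization rather than client-side rewriting.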

Enable normalization by setting normalize=True (Python) or normalize: true (JavaScript):

Python
JavaScript

# With explicit language (recommended - fastest)
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",
)

# With auto-detection (adds ~150ms latency)
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
    # language not specified - will auto-detect
)

// With explicit language (recommended - fastest)
const audio = await client.tts.generate({
  text: 'I bought 3 items for €50.99 on 01/15/2024.',
  normalize: true,
  language: 'en',
});

// With auto-detection (adds ~150ms latency)
const audioAutoDetected = await client.tts.generate({
  text: 'Ich habe 3 Artikel für 50,99€ gekauft.',
  normalize: true,
  // language not specified - will auto-detect
});

Using normalize without specifying language adds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify the language parameter.

Supported Languages

Code	Language	Code	Language
`de`	German	`nl`	Dutch
`en`	English	`pl`	Polish
`fr`	French	`sv`	Swedish
`es`	Spanish	`da`	Danish
`it`	Italian	`no`	Norwegian
`pt`	Portuguese	`fi`	Finnish
`cs`	Czech	`hu`	Hungarian
`ro`	Romanian	`el`	Greek
`uk`	Ukrainian	`bg`	Bulgarian
`tr`	Turkish	`vi`	Vietnamese
`ar`	Arabic	`hi`	Hindi
`zh`	Chinese	`ja`	Japanese
`ko`	Korean
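
One way to use this table is to check a language code before passing it along, and fall back to auto-detection when the code is not listed. The sketch below assumes nothing beyond the normalize and language parameters shown earlier; the helper name normalization_kwargs is hypothetical.

```python
# Language codes copied from the Supported Languages table above.
SUPPORTED_LANGUAGES = {
    "de", "en", "fr", "es", "it", "pt", "cs", "ro", "uk", "tr", "ar", "zh",
    "ko", "nl", "pl", "sv", "da", "no", "fi", "hu", "el", "bg", "vi", "hi",
    "ja",
}


def normalization_kwargs(language=None):
    """Build keyword arguments for a normalized request (hypothetical helper).

    A listed code is passed through as `language`; anything else is omitted
    so that the request falls back to auto-detection.
    """
    kwargs = {"normalize": True}
    if language is not None:
        code = language.strip().lower()
        if code in SUPPORTED_LANGUAGES:
            kwargs["language"] = code
    return kwargs


print(normalization_kwargs("DE"))  # {'normalize': True, 'language': 'de'}
print(normalization_kwargs("xx"))  # {'normalize': True}
```

The result can be unpacked into a call such as client.tts.generate(text=..., **normalization_kwargs("en")).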

Spell Tags

Use <spell> tags to spell out text letter by letter. This is useful for email addresses, codes, acronyms, or any text that should be pronounced character by character.

Spell tags require normalize to be enabled.

Python
JavaScript

# Spell out an email address
audio = client.tts.generate(
    text="Contact me at <spell>kajo@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)
# Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

# Spell out an acronym
audio = client.tts.generate(
    text="The <spell>API</spell> is easy to use.",
    normalize=True,
    language="en",
)
# Output: "The A, P, I is easy to use."

# German example with language-specific translations
audio = client.tts.generate(
    text="Meine E-Mail ist <spell>test@beispiel.de</spell>",
    normalize=True,
    language="de",
)
# Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

// Spell out an email address
const audio = await client.tts.generate({
  text: 'Contact me at <spell>kajo@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});
// Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

// Spell out an acronym
const audio2 = await client.tts.generate({
  text: 'The <spell>API</spell> is easy to use.',
  normalize: true,
  language: 'en',
});
// Output: "The A, P, I is easy to use."

// German example with language-specific translations
const audio3 = await client.tts.generate({
  text: 'Meine E-Mail ist <spell>test@beispiel.de</spell>',
  normalize: true,
  language: 'de',
});
// Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

Language-Specific Character Translations

Special characters within <spell> tags are translated based on the language:

Character	English	German	French	Spanish
`@`	at	ät	arobase	arroba
`.`	dot	Punkt	point	punto
`-`	dash	Strich	tiret	guión
`_`	underscore	Unterstrich	underscore	guión bajo
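
The table can be read as a per-language lookup applied character by character. The following is a local sketch of that idea, useful for previewing roughly what a spelled-out string will sound like. It only mirrors the table and the "Output" comments in the examples; it is not the service's implementation, and spell_out is a hypothetical name.

```python
# Symbol names copied from the table above, keyed by language code.
SYMBOL_NAMES = {
    "en": {"@": "at", ".": "dot", "-": "dash", "_": "underscore"},
    "de": {"@": "ät", ".": "Punkt", "-": "Strich", "_": "Unterstrich"},
    "fr": {"@": "arobase", ".": "point", "-": "tiret", "_": "underscore"},
    "es": {"@": "arroba", ".": "punto", "-": "guión", "_": "guión bajo"},
}


def spell_out(text: str, language: str = "en") -> str:
    """Preview a letter-by-letter reading (hypothetical helper).

    Listed symbols are replaced by their spoken name; every other
    character is uppercased and kept as-is.
    """
    names = SYMBOL_NAMES[language]
    return ", ".join(names.get(ch, ch.upper()) for ch in text)


print(spell_out("API"))         # A, P, I
print(spell_out("a_b", "es"))   # A, guión bajo, B
```

How the service reads digits or symbols outside the table is not covered here, so the sketch simply passes them through.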

Spell Tags with Streaming

Spell tags work seamlessly with streaming. When streaming text token-by-token (e.g., from an LLM), tags that span multiple chunks are automatically handled:

Python
JavaScript

async with client.tts.streaming_session(
    voice_id=123,
    normalize=True,
    language="en",
) as session:
    # Even if the tag is split across tokens, it works correctly
    async for chunk in session.send("My code is <spell>"):
        play_audio(chunk.audio)
    async for chunk in session.send("ABC123</spell>"):
        play_audio(chunk.audio)
    async for chunk in session.flush():
        play_audio(chunk.audio)

await client.tts.stream(
  {
    text: 'My verification code is <spell>ABC-123-XYZ</spell>.',
    normalize: true,
    language: 'en',
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
  }
);

Streaming Safety: The system buffers text until the closing </spell> tag arrives before generating audio. If the stream ends unexpectedly, incomplete tags are auto-closed so the content still gets spelled out.
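
The buffering behavior described above can be pictured with a small client-side simulation. This is a minimal sketch under assumptions, not the service's actual logic: SpellTagBuffer is a hypothetical class that releases text only when no spell tag is open, holds back a trailing fragment that might be the start of an opening tag, and auto-closes an unterminated tag on flush.

```python
OPEN_TAG = "<spell>"
CLOSE_TAG = "</spell>"


class SpellTagBuffer:
    """Hypothetical illustration of holding text until </spell> arrives."""

    def __init__(self) -> None:
        self._pending = ""

    @staticmethod
    def _safe_end(text: str) -> int:
        """Return the index up to which text can be released."""
        pos = 0
        while True:
            start = text.find(OPEN_TAG, pos)
            if start == -1:
                break
            end = text.find(CLOSE_TAG, start + len(OPEN_TAG))
            if end == -1:
                return start  # tag still open: hold back from "<spell>"
            pos = end + len(CLOSE_TAG)
        # Hold back a trailing fragment that could begin an opening tag.
        for size in range(min(len(OPEN_TAG) - 1, len(text) - pos), 0, -1):
            if text.endswith(OPEN_TAG[:size]):
                return len(text) - size
        return len(text)

    def feed(self, chunk: str) -> str:
        """Add a chunk and return the text that is safe to synthesize."""
        self._pending += chunk
        cut = self._safe_end(self._pending)
        ready, self._pending = self._pending[:cut], self._pending[cut:]
        return ready

    def flush(self) -> str:
        """End of stream: auto-close an open tag and return what is left."""
        rest, self._pending = self._pending, ""
        if rest.startswith(OPEN_TAG):
            rest += CLOSE_TAG
        return rest


buffer = SpellTagBuffer()
print(repr(buffer.feed("My code is <spell>")))  # 'My code is '
print(repr(buffer.feed("ABC123</spell>")))      # '<spell>ABC123</spell>'
print(repr(buffer.flush()))                     # ''
```

If the stream had ended after "<spell>ABC", flush would return "<spell>ABC</spell>", matching the auto-close behavior described above.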

Model recommendation: For clearer letter-by-letter pronunciation, use kugel-1 instead of kugel-1-turbo.

Using Spell Tags with LLMs

When integrating with language models, add instructions to your system prompt so the LLM wraps appropriate text in spell tags:

SYSTEM_PROMPT = """You are a helpful assistant. When you need to spell out text 
(like email addresses, codes, or acronyms), wrap it in <spell> tags.

Examples:
- "My email is <spell>[email protected]</spell>"
- "The code is <spell>ABC123</spell>"
- "That stands for <spell>API</spell>, Application Programming Interface"
"""

For more details, see the LLM Integration guide.
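
Because LLM output is not guaranteed to follow the prompt, it can help to check the tags before sending non-streamed text for synthesis. The sketch below is an optional client-side check, not part of the SDK; spell_tags_balanced is a hypothetical name, and treating nested tags as invalid is an assumption, since nesting is not described here.

```python
import re

TAG_PATTERN = re.compile(r"</?spell>")


def spell_tags_balanced(text: str) -> bool:
    """Return True if every <spell> has a matching </spell>.

    Assumption: nested spell tags are treated as invalid.
    """
    depth = 0
    for match in TAG_PATTERN.finditer(text):
        depth += 1 if match.group() == "<spell>" else -1
        if depth not in (0, 1):
            return False  # stray closing tag or nested opening tag
    return depth == 0


print(spell_tags_balanced("The code is <spell>ABC123</spell>"))  # True
print(spell_tags_balanced("The code is <spell>ABC123"))          # False
```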

Next Steps

Generate Speech: Basic speech generation
Streaming: Real-time audio streaming
LLM Integration: Integrate with GPT-4, Claude, and more
