Control how text is processed before speech synthesis
KugelAudio provides text processing features to ensure your text is spoken naturally. This includes automatic normalization of numbers, dates, and currencies, as well as the ability to spell out text letter by letter.
Text normalization converts numbers, dates, times, and other non-verbal text into spoken words:
“I have 3 apples” → “I have three apples”
“The meeting is at 2:30 PM” → “The meeting is at two thirty PM”
“€50.99” → “fifty euros and ninety-nine cents”
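To make the idea concrete, here is a minimal sketch of what a normalizer does for the first and third examples above. It is not KugelAudio's implementation: the helper names (`number_to_words`, `normalize_euros`, `normalize_text`) are hypothetical, it only handles English integers from 0 to 99 and euro amounts written as `€NN.NN`, and it ignores singular forms such as "one euro".

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English (illustrative only)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize_euros(text: str) -> str:
    """Rewrite amounts like €50.99 as spoken euros and cents."""
    def repl(match: re.Match) -> str:
        euros, cents = int(match.group(1)), int(match.group(2))
        return f"{number_to_words(euros)} euros and {number_to_words(cents)} cents"
    return re.sub(r"€(\d{1,2})\.(\d{2})", repl, text)


def normalize_numbers(text: str) -> str:
    """Rewrite standalone one- or two-digit integers as words."""
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


def normalize_text(text: str) -> str:
    # Currency first, so its digits are not consumed by the plain-number rule.
    return normalize_numbers(normalize_euros(text))
```

The ordering matters: a real normalizer has many such rules (dates, times, units), and more specific patterns generally need to run before generic ones.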
Enable normalization by setting normalize=True (Python), normalize: true (JavaScript), .normalize(true) (Java), or "normalize": true (JSON):
Python

# With explicit language (recommended - fastest)
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",
)

# With auto-detection (may cause incorrect normalizations)
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
    # language not specified - will auto-detect
)

JavaScript

// With explicit language (recommended - fastest)
const audio = await client.tts.generate({
  text: 'I bought 3 items for €50.99 on 01/15/2024.',
  normalize: true,
  language: 'en',
});

// With auto-detection (may cause incorrect normalizations)
const audio = await client.tts.generate({
  text: 'Ich habe 3 Artikel für 50,99€ gekauft.',
  normalize: true,
  // language not specified - will auto-detect
});

Java

// With explicit language (recommended - fastest)
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("I bought 3 items for €50.99 on 01/15/2024.")
        .normalize(true)
        .language("en")
        .build());

// With auto-detection (may cause incorrect normalizations)
AudioResponse audio2 = client.tts().generate(
    GenerateRequest.builder("Ich habe 3 Artikel für 50,99€ gekauft.")
        .normalize(true)
        // language not specified - will auto-detect
        .build());

cURL

# With explicit language (recommended - fastest)
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "I bought 3 items for €50.99 on 01/15/2024.",
    "normalize": true,
    "language": "en"
  }' \
  --output output.pcm

# With auto-detection (may cause incorrect normalizations)
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Ich habe 3 Artikel für 50,99€ gekauft.",
    "normalize": true
  }' \
  --output output.pcm
Using normalize without specifying language may cause incorrect normalizations, especially for short texts or languages that share similar vocabulary. Always specify language when you know it.
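One reason the language matters is that the same characters can mean different numbers. The sketch below illustrates this with decimal and thousands separators; the `SEPARATORS` table and `parse_number` helper are assumptions made for illustration and are not part of the KugelAudio SDK.

```python
from decimal import Decimal

# Hypothetical per-language conventions: (thousands separator, decimal separator).
SEPARATORS = {
    "en": (",", "."),
    "de": (".", ","),
}


def parse_number(raw: str, language: str) -> Decimal:
    """Interpret a numeric string using the conventions of the given language."""
    thousands, decimal_sep = SEPARATORS[language]
    cleaned = raw.replace(thousands, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same input is read differently depending on the language:
german = parse_number("50,99", "de")   # fifty point ninety-nine
english = parse_number("50,99", "en")  # five thousand ninety-nine
```

If auto-detection picks the wrong language for a short text, the spoken amount can be off by orders of magnitude, which is why passing language explicitly is the safer choice.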
Use <spell> tags to spell out text letter by letter. This is useful for email addresses, codes, acronyms, or any text that should be pronounced character by character.
Spell tags require normalize to be enabled.
Python

# Spell out an email address
audio = client.tts.generate(
    text="Contact me at <spell>kajo@kugelaudio.com</spell>",
    normalize=True,
    language="en",
)
# Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

# Spell out an acronym
audio = client.tts.generate(
    text="The <spell>API</spell> is easy to use.",
    normalize=True,
    language="en",
)
# Output: "The A, P, I is easy to use."

# German example with language-specific translations
audio = client.tts.generate(
    text="Meine E-Mail ist <spell>test@beispiel.de</spell>",
    normalize=True,
    language="de",
)
# Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

JavaScript

// Spell out an email address
const audio = await client.tts.generate({
  text: 'Contact me at <spell>kajo@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});
// Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

// Spell out an acronym
const audio2 = await client.tts.generate({
  text: 'The <spell>API</spell> is easy to use.',
  normalize: true,
  language: 'en',
});
// Output: "The A, P, I is easy to use."

// German example with language-specific translations
const audio3 = await client.tts.generate({
  text: 'Meine E-Mail ist <spell>test@beispiel.de</spell>',
  normalize: true,
  language: 'de',
});
// Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

Java

// Spell out an email address
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Contact me at <spell>kajo@kugelaudio.com</spell>")
        .normalize(true)
        .language("en")
        .build());
// Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

// Spell out an acronym
AudioResponse audio2 = client.tts().generate(
    GenerateRequest.builder("The <spell>API</spell> is easy to use.")
        .normalize(true)
        .language("en")
        .build());
// Output: "The A, P, I is easy to use."

// German example with language-specific translations
AudioResponse audio3 = client.tts().generate(
    GenerateRequest.builder("Meine E-Mail ist <spell>test@beispiel.de</spell>")
        .normalize(true)
        .language("de")
        .build());
// Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"

cURL

# Spell out an email address
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Contact me at <spell>kajo@kugelaudio.com</spell>",
    "normalize": true,
    "language": "en"
  }' \
  --output output.pcm
# Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

# Spell out an acronym
curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The <spell>API</spell> is easy to use.",
    "normalize": true,
    "language": "en"
  }' \
  --output output.pcm
# Output: "The A, P, I is easy to use."
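If you want to preview what a spell tag will turn into before sending a request, the expansion shown in the output comments can be approximated locally. This is a rough sketch, not the server's logic: `SYMBOLS`, `spell`, and `expand_spell_tags` are hypothetical names, and the symbol table only covers the two characters that appear in the examples.

```python
import re

# Spoken names for symbols, taken from the example outputs in this section.
SYMBOLS = {
    "en": {"@": "at", ".": "dot"},
    "de": {"@": "ät", ".": "Punkt"},
}


def spell(content: str, language: str = "en") -> str:
    """Spell content character by character, naming known symbols."""
    names = SYMBOLS[language]
    return ", ".join(
        names.get(ch, ch.upper()) for ch in content if not ch.isspace()
    )


def expand_spell_tags(text: str, language: str = "en") -> str:
    """Replace every <spell>...</spell> span with its spelled-out form."""
    return re.sub(
        r"<spell>(.*?)</spell>",
        lambda m: spell(m.group(1), language),
        text,
        flags=re.DOTALL,
    )


preview = expand_spell_tags("The <spell>API</spell> is easy to use.")
```

Here `preview` holds the same string as the acronym example's output comment. The actual pronunciation is decided by the service, so treat this only as a quick sanity check of your tag placement.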
Spell tags work seamlessly with streaming. When streaming text token-by-token (e.g., from an LLM), tags that span multiple chunks are automatically handled:
Python

async with client.tts.streaming_session(
    voice_id=123,
    normalize=True,
    language="en",
) as session:
    # Even if the tag is split across tokens, it works correctly
    async for chunk in session.send("My code is <spell>"):
        play_audio(chunk.audio)
    async for chunk in session.send("ABC123</spell>"):
        play_audio(chunk.audio)
    async for chunk in session.flush():
        play_audio(chunk.audio)

Java

import com.kugelaudio.sdk.StreamingSession;
import com.kugelaudio.sdk.StreamConfig;

try (StreamingSession session = client.tts().streamingSession(
        StreamConfig.builder()
            .normalize(true)
            .language("en")
            .build())) {
    // Even if the tag is split across sends, it works correctly
    session.send("My code is <spell>", false);
    session.send("ABC123</spell>", true);
}
Streaming Safety: The system buffers text until the closing </spell> tag arrives before generating audio. If the stream ends unexpectedly, incomplete tags are auto-closed so the content still gets spelled out.
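The buffering behavior described above can be pictured with a small client-side model. This is only a sketch of the idea under simple assumptions (no nested tags, and a closing tag that is cut off mid-way at the end of the stream is not repaired); `SpellTagBuffer` is a hypothetical class, not part of the SDK, and the real buffering happens on the server.

```python
OPEN, CLOSE = "<spell>", "</spell>"


class SpellTagBuffer:
    """Hold back text that belongs to an unfinished <spell> tag."""

    def __init__(self) -> None:
        self._pending = ""

    def _safe_prefix_length(self) -> int:
        text = self._pending
        # An opening tag without a closing tag after it: hold from the tag on.
        open_idx = text.rfind(OPEN)
        if open_idx != -1 and text.find(CLOSE, open_idx) == -1:
            return open_idx
        # The text may end in the middle of an opening tag, e.g. "<sp".
        for k in range(len(OPEN) - 1, 0, -1):
            if text.endswith(OPEN[:k]):
                return len(text) - k
        return len(text)

    def feed(self, chunk: str) -> str:
        """Add a chunk and return the text that is safe to synthesize now."""
        self._pending += chunk
        ready = self._safe_prefix_length()
        out, self._pending = self._pending[:ready], self._pending[ready:]
        return out

    def flush(self) -> str:
        """End of stream: release the rest, auto-closing an open tag."""
        out, self._pending = self._pending, ""
        if out.rfind(OPEN) > out.rfind(CLOSE):
            out += CLOSE
        return out


buf = SpellTagBuffer()
first = buf.feed("My code is <spell>")   # text before the tag is released
second = buf.feed("ABC123</spell>")      # the completed tag is released whole
```

With the two chunks from the streaming example, the first call releases only the text before the tag and the second releases the full `<spell>ABC123</spell>` span, so the spelled content is never synthesized in pieces.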
Model recommendation: For clearer letter-by-letter pronunciation, use kugel-1 instead of kugel-1-turbo.
When integrating with language models, add instructions to your system prompt so the LLM wraps appropriate text in spell tags:
SYSTEM_PROMPT = """You are a helpful assistant. When you need to spell out text
(like email addresses, codes, or acronyms), wrap it in <spell> tags.

Examples:
- "My email is <spell>kajo@kugelaudio.com</spell>"
- "The code is <spell>ABC123</spell>"
- "That stands for <spell>API</spell>, Application Programming Interface"
"""