Generate Speech
Generate audio from text with POST https://api.kugelaudio.com/v1/tts/generate. The complete audio is returned in a single response once generation finishes.
Request Body
- text (string): The text to convert to speech. The maximum length depends on the model.
- model (string, default "kugel-1-turbo"): The model to use. Options: kugel-1-turbo, kugel-1.
- voice_id (integer): The voice ID to use. If omitted, the default voice is used.
- cfg_scale (number): Classifier-free guidance scale, from 1.0 to 5.0. Higher values produce more expressive speech.
- Maximum tokens (integer): The maximum number of tokens to generate. Limits the output length.
- sample_rate (integer): Output sample rate in Hz. Options: 16000, 22050, 24000, 44100.
- Speaker prefix: Adds a speaker prefix for better voice consistency.
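The documented limits can also be checked on the client before a request is sent. The following is a minimal Python sketch. It assumes the field names text, model, voice_id, cfg_scale and sample_rate, which are taken from the examples on this page. build_generate_body is a hypothetical helper. The maximum-token and speaker-prefix options are left out because their field names are not given here. The server remains the authority on validation, including the model-dependent text length limit.

```python
from typing import Any, Dict, Optional

# Limits as documented for the request body above.
ALLOWED_MODELS = {"kugel-1-turbo", "kugel-1"}
ALLOWED_SAMPLE_RATES = {16000, 22050, 24000, 44100}
CFG_SCALE_MIN, CFG_SCALE_MAX = 1.0, 5.0


def build_generate_body(
    text: str,
    model: str = "kugel-1-turbo",
    voice_id: Optional[int] = None,
    cfg_scale: Optional[float] = None,
    sample_rate: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body and check documented limits client-side."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    body: Dict[str, Any] = {"text": text, "model": model}
    if voice_id is not None:
        body["voice_id"] = voice_id  # omit to use the default voice
    if cfg_scale is not None:
        if not CFG_SCALE_MIN <= cfg_scale <= CFG_SCALE_MAX:
            raise ValueError(f"cfg_scale must be between {CFG_SCALE_MIN} and {CFG_SCALE_MAX}")
        body["cfg_scale"] = cfg_scale
    if sample_rate is not None:
        if sample_rate not in ALLOWED_SAMPLE_RATES:
            raise ValueError(f"unsupported sample_rate: {sample_rate}")
        body["sample_rate"] = sample_rate
    return body
```

Optional fields are added only when set, so the server defaults (such as the default voice) apply otherwise.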
Response
Returns a JSON object containing the base64-encoded 16-bit PCM audio and generation metadata.
```json
{
  "audio": "base64_encoded_pcm16_data",
  "encoding": "pcm_s16le",
  "sample_rate": 24000,
  "samples": 48000,
  "duration_ms": 2000,
  "generation_ms": 150,
  "rtf": 0.075
}
```
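The audio is base64-encoded 16-bit little-endian PCM, so you can wrap it in a WAV container with Python's standard wave module. This sketch assumes mono audio. The reference does not state a channel count, but 48000 samples at 24000 Hz matching a duration_ms of 2000 is consistent with one channel. response_to_wav and the other helpers are hypothetical names.

```python
import base64
import io
import wave


def response_to_wav(resp: dict) -> bytes:
    """Wrap the base64 PCM16 payload of a generate response in a WAV container."""
    if resp["encoding"] != "pcm_s16le":
        raise ValueError(f"unexpected encoding: {resp['encoding']}")
    pcm = base64.b64decode(resp["audio"])
    # Assumption: mono audio, so each sample is exactly 2 bytes.
    if len(pcm) != resp["samples"] * 2:
        raise ValueError("payload size does not match the 'samples' field")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)                    # assumed mono
        wav.setsampwidth(2)                    # 16-bit samples
        wav.setframerate(resp["sample_rate"])
        wav.writeframes(pcm)                   # WAV PCM is little-endian, like pcm_s16le
    return buf.getvalue()


def expected_duration_ms(resp: dict) -> float:
    """Duration implied by samples / sample_rate; should match duration_ms."""
    return resp["samples"] / resp["sample_rate"] * 1000


def real_time_factor(resp: dict) -> float:
    """rtf = generation_ms / duration_ms; below 1.0 means faster than real time."""
    return resp["generation_ms"] / resp["duration_ms"]
```

For the sample response, 48000 / 24000 * 1000 gives 2000 ms, and 150 / 2000 gives the reported rtf of 0.075.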
Example
```bash
curl -X POST "https://api.kugelaudio.com/v1/tts/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the KugelAudio API.",
    "model": "kugel-1-turbo",
    "voice_id": 123,
    "cfg_scale": 2.0
  }'
```
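You can make the same request from Python using only the standard library. This is a minimal sketch. build_request and generate are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.kugelaudio.com/v1/tts/generate"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent yet)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response.

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    req = build_request(api_key, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, generate("YOUR_API_KEY", {"text": "Hello", "model": "kugel-1-turbo"}) returns a dictionary with the audio, encoding, sample_rate, samples, duration_ms, generation_ms and rtf fields.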
Stream Speech (WebSocket)
Stream audio chunks as they are generated, so playback can start before the complete audio is ready.
Connection
Connect with your API key:
wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY
Request Message
Send a JSON message to start generation:
```json
{
  "text": "Hello, this is streaming audio.",
  "model": "kugel-1-turbo",
  "voice_id": 123,
  "cfg_scale": 2.0
}
```
Response Messages
Audio Chunk
```json
{
  "audio": "base64_encoded_pcm16_data",
  "encoding": "pcm_s16le",
  "idx": 0,
  "sample_rate": 24000,
  "samples": 4800
}
```
Final Message
```json
{
  "final": true,
  "dur_ms": 2000,
  "gen_ms": 150,
  "ttfa_ms": 120,
  "rtf": 0.075,
  "chunks": 10
}
```
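Each chunk carries its own idx, sample_rate and samples, so a client can reassemble the stream and check it against the final message. In the examples, 10 chunks of 4800 samples at 24000 Hz give 48000 samples, or 2000 ms, which matches dur_ms. The sketch below shows that bookkeeping. assemble_stream and pcm_duration_ms are hypothetical helpers that take already-parsed messages, and mono audio is assumed.

```python
import base64
from typing import Dict, List, Tuple


def assemble_stream(messages: List[dict]) -> Tuple[bytes, dict]:
    """Join decoded audio chunks in idx order; return (pcm_bytes, final_message)."""
    chunks: Dict[int, bytes] = {}
    final = None
    for msg in messages:
        if "audio" in msg:
            chunks[msg["idx"]] = base64.b64decode(msg["audio"])
        if msg.get("final"):
            final = msg
    if final is None:
        raise ValueError("stream ended without a final message")
    if len(chunks) != final["chunks"]:
        raise ValueError(f"expected {final['chunks']} chunks, received {len(chunks)}")
    pcm = b"".join(chunks[i] for i in sorted(chunks))
    return pcm, final


def pcm_duration_ms(pcm: bytes, sample_rate: int) -> float:
    """Duration of 16-bit PCM, assuming mono (2 bytes per sample)."""
    return len(pcm) / 2 / sample_rate * 1000
```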
Example
```python
import asyncio
import base64
import json

import websockets  # third-party package: pip install websockets


async def stream_tts():
    uri = "wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY"
    async with websockets.connect(uri) as ws:
        # Send the generation request
        await ws.send(json.dumps({
            "text": "Hello, this is streaming audio.",
            "model": "kugel-1-turbo",
        }))
        # Receive audio chunks until the final stats message arrives
        async for message in ws:
            data = json.loads(message)
            if "audio" in data:
                audio_bytes = base64.b64decode(data["audio"])
                print(f"Chunk {data['idx']}: {len(audio_bytes)} bytes")
            if data.get("final"):
                print(f"Complete: {data['dur_ms']}ms audio in {data['gen_ms']}ms")
                break


asyncio.run(stream_tts())
```
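The example above only prints chunk sizes. To keep the audio and measure latency on the client side, you can move the receive loop into a function that accepts any async iterable of raw messages. A websockets connection is one such iterable, and this structure also lets you test the loop without a network. This is a sketch with hypothetical helper names. The client-side time to first audio includes network latency, so it is likely to be higher than the server-reported ttfa_ms. The 30-second overall timeout is an arbitrary guard.

```python
import asyncio
import base64
import json
import time
from typing import AsyncIterable, Optional


async def collect_audio(messages: AsyncIterable) -> dict:
    """Decode audio chunks until the final message; record client-side time to first audio."""
    start = time.perf_counter()
    first_audio_ms: Optional[float] = None
    pcm = bytearray()
    stats: dict = {}
    async for raw in messages:
        data = json.loads(raw)
        if "audio" in data:
            if first_audio_ms is None:
                first_audio_ms = (time.perf_counter() - start) * 1000
            pcm += base64.b64decode(data["audio"])
        if data.get("final"):
            stats = data
            break
    return {"pcm": bytes(pcm), "client_ttfa_ms": first_audio_ms, "server_stats": stats}


async def collect_audio_with_timeout(messages: AsyncIterable, timeout_s: float = 30.0) -> dict:
    """Guard against a stream that never sends its final message."""
    return await asyncio.wait_for(collect_audio(messages), timeout=timeout_s)
```

To use it, call result = await collect_audio_with_timeout(ws) inside the async with block, right after ws.send(...). This way the timer starts when the request has gone out.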
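Stream Text Input (WebSocket)
Stream text to the server token by token, for example as an LLM produces it, and receive audio while the text is still arriving.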
Connection
wss://api.kugelaudio.com/ws/tts/stream?api_key=YOUR_API_KEY
Protocol
- Send config: an initial configuration message (voice, model, generation settings).
- Send text: text chunks as they arrive.
- Send flush: force generation of any buffered text.
- Send close: end the session.
- Receive audio: audio chunks as they are generated.
Messages
Config Message
```json
{
  "voice_id": 123,
  "model": "kugel-1-turbo",
  "cfg_scale": 2.0,
  "sample_rate": 24000
}
```
Text Message
```json
{
  "text": "chunk of text"
}
```
Flush Message
```json
{
  "flush": true
}
```
Close Message
```json
{
  "close": true
}
```
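On the client side, the protocol is a fixed sequence: one config message, any number of text messages, then flush and close. The following minimal sketch produces that sequence as JSON strings. protocol_messages is a hypothetical helper. In Python, pass True (not true) to json.dumps, which serialises it as JSON true.

```python
import json
from typing import Iterable, Iterator


def protocol_messages(config: dict, text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield client-to-server messages in protocol order: config, text..., flush, close."""
    yield json.dumps(config)
    for chunk in text_chunks:
        if chunk:  # skipping empty chunks is a client-side choice, not a documented rule
            yield json.dumps({"text": chunk})
    yield json.dumps({"flush": True})  # serialised as {"flush": true}
    yield json.dumps({"close": True})  # serialised as {"close": true}
```

Each yielded string can be passed directly to ws.send().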
Example
```python
import asyncio
import base64
import json

import websockets  # third-party package: pip install websockets


def play_audio(audio_b64: str) -> None:
    """Placeholder: replace with real playback of the decoded PCM16 audio."""
    pcm = base64.b64decode(audio_b64)
    print(f"Received {len(pcm)} bytes of audio")


async def stream_from_llm(llm_tokens):
    uri = "wss://api.kugelaudio.com/ws/tts/stream?api_key=YOUR_API_KEY"
    async with websockets.connect(uri) as ws:
        # Send config
        await ws.send(json.dumps({
            "voice_id": 123,
            "model": "kugel-1-turbo",
            "cfg_scale": 2.0,
        }))
        # Stream tokens
        for token in llm_tokens:
            await ws.send(json.dumps({"text": token}))
            # Poll briefly for audio so playback can start while text is still arriving
            try:
                message = await asyncio.wait_for(ws.recv(), timeout=0.01)
                data = json.loads(message)
                if "audio" in data:
                    play_audio(data["audio"])
            except asyncio.TimeoutError:
                pass
        # Flush buffered text, then close (Python True serialises to JSON true)
        await ws.send(json.dumps({"flush": True}))
        await ws.send(json.dumps({"close": True}))
        # Receive remaining audio until the server reports the session is closed
        async for message in ws:
            data = json.loads(message)
            if "audio" in data:
                play_audio(data["audio"])
            if data.get("session_closed"):
                break


# Example usage
tokens = ["Hello, ", "this ", "is ", "streaming ", "from ", "an ", "LLM."]
asyncio.run(stream_from_llm(tokens))
```
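The example above reads at most one message after each token, using a 10 ms timeout. If the server produces several chunks between tokens, a backlog builds up and playback falls behind. An alternative is to run the sender and receiver concurrently with asyncio.gather, using the same message format. This sketch relies on the session_closed field from the example above, which the message reference does not list. stream_concurrently is a hypothetical helper. It works with any object that offers an async send() and async iteration, such as a websockets connection.

```python
import asyncio
import base64
import json
from typing import Callable, Iterable


async def stream_concurrently(ws, tokens: Iterable[str], config: dict,
                              on_audio: Callable[[bytes], None]) -> None:
    """Send text and receive audio at the same time instead of polling recv()."""

    async def sender() -> None:
        await ws.send(json.dumps(config))
        for token in tokens:
            await ws.send(json.dumps({"text": token}))
        await ws.send(json.dumps({"flush": True}))
        await ws.send(json.dumps({"close": True}))

    async def receiver() -> None:
        async for message in ws:
            data = json.loads(message)
            if "audio" in data:
                on_audio(base64.b64decode(data["audio"]))
            if data.get("session_closed"):  # field taken from the example above
                break

    await asyncio.gather(sender(), receiver())
```

If one coroutine fails, gather propagates the error but leaves the other running. On Python 3.11 and later, asyncio.TaskGroup cancels the remaining task automatically.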
Response Fields
Audio Response
| Field | Type | Description |
|---|---|---|
| audio | string | Base64-encoded PCM16 audio data |
| encoding | string | Audio encoding (always pcm_s16le) |
| sample_rate | integer | Sample rate in Hz |
| samples | integer | Total number of samples |
| duration_ms | number | Audio duration in milliseconds |
| generation_ms | number | Generation time in milliseconds |
| rtf | number | Real-time factor (generation_ms / duration_ms); values below 1 mean faster than real time |
Streaming Stats
| Field | Type | Description |
|---|---|---|
| final | boolean | true when generation is complete |
| dur_ms | number | Total audio duration in milliseconds |
| gen_ms | number | Total generation time in milliseconds |
| ttfa_ms | number | Time to first audio in milliseconds |
| rtf | number | Real-time factor (gen_ms / dur_ms) |
| chunks | integer | Number of audio chunks generated |
Error Responses
Validation Error
```json
{
  "error": {
    "code": "invalid_request",
    "message": "Text exceeds maximum length",
    "details": {
      "max_length": 4096,
      "provided_length": 5000
    }
  }
}
```
Voice Not Found
```json
{
  "error": {
    "code": "not_found",
    "message": "Voice not found",
    "details": {
      "voice_id": 999
    }
  }
}
```
Rate Limited
```json
{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests",
    "details": {
      "retry_after": 60
    }
  }
}
```
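All three error bodies share the same shape (error.code, error.message, error.details), so a client can branch on code. The sketch below retries only rate_limited errors and waits details.retry_after before retrying. That value is assumed to be in seconds, because the unit is not stated. KugelAudioError, parse_error and call_with_retry are hypothetical names. The sketch does not map HTTP status codes to these bodies, because this reference does not document the codes. With urllib, you can read the error body from the raised HTTPError using its read() method.

```python
import json
import time
from typing import Callable


class KugelAudioError(Exception):
    """Hypothetical exception type wrapping the documented error body."""

    def __init__(self, code: str, message: str, details: dict):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details


def parse_error(body: str) -> KugelAudioError:
    """Turn an error response body into a KugelAudioError."""
    err = json.loads(body)["error"]
    return KugelAudioError(err["code"], err["message"], err.get("details", {}))


def call_with_retry(call: Callable[[], dict], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry only on rate_limited, waiting details.retry_after (assumed seconds)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return call()
        except KugelAudioError as exc:
            # invalid_request and not_found will not succeed on retry, so re-raise them
            if exc.code != "rate_limited" or attempt >= max_attempts:
                raise
            sleep(float(exc.details.get("retry_after", 1)))  # 1 s fallback is an assumption
```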