Generate Speech

Generate audio from text. Returns the complete audio after generation finishes.
POST /v1/tts/generate

Request Body

text
string
required
The text to convert to speech. Maximum length depends on the model.
model
string
default:"kugel-1-turbo"
The model to use. Options: kugel-1-turbo, kugel-1
voice_id
integer
The voice ID to use. If not specified, uses the default voice.
cfg_scale
number
default:"2.0"
Classifier-free guidance scale. Range: 1.0-5.0. Higher values produce more expressive speech.
max_new_tokens
integer
default:"2048"
Maximum tokens to generate. Limits output length.
sample_rate
integer
default:"24000"
Output sample rate in Hz. Options: 16000, 22050, 24000, 44100
speaker_prefix
boolean
default:"true"
Add speaker prefix for better voice consistency.

Response

Returns audio data with metadata.
{
  "audio": "base64_encoded_pcm16_data",
  "encoding": "pcm_s16le",
  "sample_rate": 24000,
  "samples": 48000,
  "duration_ms": 2000,
  "generation_ms": 150,
  "rtf": 0.075
}
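
The audio field is raw PCM16 (little-endian) encoded as base64, so it can be turned into a playable WAV file with the standard library alone. A minimal sketch, assuming mono output (the response does not state a channel count):

```python
import base64
import io
import wave

def response_to_wav(resp: dict) -> bytes:
    """Convert a generate-speech response body into WAV file bytes."""
    pcm = base64.b64decode(resp["audio"])
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)                    # assumption: mono output
        wav.setsampwidth(2)                    # 16-bit samples (pcm_s16le)
        wav.setframerate(resp["sample_rate"])
        wav.writeframes(pcm)
    return buf.getvalue()

# Synthetic stand-in response: 1 second of silence at 24 kHz
resp = {
    "audio": base64.b64encode(b"\x00\x00" * 24000).decode(),
    "sample_rate": 24000,
}
wav_bytes = response_to_wav(resp)
```
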

Example

curl -X POST "https://api.kugelaudio.com/v1/tts/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the KugelAudio API.",
    "model": "kugel-1-turbo",
    "voice_id": 123,
    "cfg_scale": 2.0
  }'
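
The same request can be built in Python with only the standard library; a sketch using urllib (YOUR_API_KEY is a placeholder, and the actual send is left commented out):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

def build_request(payload: dict) -> urllib.request.Request:
    """Build the POST request for the generate endpoint."""
    return urllib.request.Request(
        "https://api.kugelaudio.com/v1/tts/generate",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({
    "text": "Hello, this is a test of the KugelAudio API.",
    "model": "kugel-1-turbo",
    "voice_id": 123,
    "cfg_scale": 2.0,
})
# resp = json.load(urllib.request.urlopen(req))  # sends the request
```
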

Stream Speech (WebSocket)

Stream audio chunks as they’re generated for lower latency.
WebSocket

Connection

Connect with your API key:
wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY

Request Message

Send a JSON message to start generation:
{
  "text": "Hello, this is streaming audio.",
  "model": "kugel-1-turbo",
  "voice_id": 123,
  "cfg_scale": 2.0
}

Response Messages

Audio Chunk

{
  "audio": "base64_encoded_pcm16_data",
  "encoding": "pcm_s16le",
  "idx": 0,
  "sample_rate": 24000,
  "samples": 4800
}

Final Message

{
  "final": true,
  "dur_ms": 2000,
  "gen_ms": 150,
  "ttfa_ms": 120,
  "rtf": 0.075,
  "chunks": 10
}
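
Each chunk carries an idx, so a client can reassemble the stream into one PCM buffer and sanity-check it against the final message. A minimal sketch using the field names shown above:

```python
import base64

def assemble(messages: list[dict]) -> bytes:
    """Reassemble streamed audio chunks, in idx order, into one PCM16 buffer."""
    chunks = {}
    final = None
    for msg in messages:
        if msg.get("final"):
            final = msg
        elif "audio" in msg:
            chunks[msg["idx"]] = base64.b64decode(msg["audio"])
    pcm = b"".join(chunks[i] for i in sorted(chunks))
    if final is not None:
        assert final["chunks"] == len(chunks)  # all chunks accounted for
    return pcm

# Synthetic example: two 4800-sample chunks of silence, then the final message
msgs = [
    {"audio": base64.b64encode(b"\x00\x00" * 4800).decode(), "idx": 0, "samples": 4800},
    {"audio": base64.b64encode(b"\x00\x00" * 4800).decode(), "idx": 1, "samples": 4800},
    {"final": True, "chunks": 2},
]
pcm = assemble(msgs)
```
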

Example

import asyncio
import websockets
import json
import base64

async def stream_tts():
    uri = "wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY"
    
    async with websockets.connect(uri) as ws:
        # Send request
        await ws.send(json.dumps({
            "text": "Hello, this is streaming audio.",
            "model": "kugel-1-turbo",
        }))
        
        # Receive chunks
        async for message in ws:
            data = json.loads(message)
            
            if "audio" in data:
                audio_bytes = base64.b64decode(data["audio"])
                print(f"Chunk {data['idx']}: {len(audio_bytes)} bytes")
            
            if data.get("final"):
                print(f"Complete: {data['dur_ms']}ms audio in {data['gen_ms']}ms")
                break

asyncio.run(stream_tts())

Stream Input (WebSocket)

Stream text input token-by-token for LLM integration.
WebSocket

Connection

wss://api.kugelaudio.com/ws/tts/stream?api_key=YOUR_API_KEY

Protocol

  1. Send config: Initial configuration message
  2. Send text: Text chunks as they arrive
  3. Send flush: Force generation of buffered text
  4. Send close: End the session
  5. Receive audio: Audio chunks as they’re generated
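
The first four steps above form an ordered sequence of client-sent messages (audio arrives concurrently and is not shown). A minimal sketch of building that sequence:

```python
import json

def session_messages(config: dict, text_chunks: list[str]) -> list[str]:
    """Build the ordered JSON messages a client sends in one session:
    config first, then each text chunk, then flush, then close."""
    msgs = [config]
    msgs += [{"text": chunk} for chunk in text_chunks]
    msgs.append({"flush": True})
    msgs.append({"close": True})
    return [json.dumps(m) for m in msgs]

out = session_messages(
    {"voice_id": 123, "model": "kugel-1-turbo"},
    ["Hello, ", "world."],
)
```
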

Messages

Config Message

{
  "voice_id": 123,
  "model": "kugel-1-turbo",
  "cfg_scale": 2.0,
  "sample_rate": 24000
}

Text Message

{
  "text": "chunk of text"
}

Flush Message

{
  "flush": true
}

Close Message

{
  "close": true
}

Example

import asyncio
import websockets
import json

async def stream_from_llm(llm_tokens):
    uri = "wss://api.kugelaudio.com/ws/tts/stream?api_key=YOUR_API_KEY"
    
    async with websockets.connect(uri) as ws:
        # Send config
        await ws.send(json.dumps({
            "voice_id": 123,
            "model": "kugel-1-turbo",
            "cfg_scale": 2.0,
        }))
        
        # Stream tokens
        for token in llm_tokens:
            await ws.send(json.dumps({"text": token}))
            
            # Check for audio (non-blocking)
            try:
                message = await asyncio.wait_for(ws.recv(), timeout=0.01)
                data = json.loads(message)
                if "audio" in data:
                    # play_audio is a placeholder for your own playback function
                    play_audio(data["audio"])
            except asyncio.TimeoutError:
                pass
        
        # Flush and close
        await ws.send(json.dumps({"flush": True}))
        await ws.send(json.dumps({"close": True}))
        
        # Receive remaining audio
        async for message in ws:
            data = json.loads(message)
            if "audio" in data:
                play_audio(data["audio"])
            if data.get("session_closed"):
                break

# Example usage
tokens = ["Hello, ", "this ", "is ", "streaming ", "from ", "an ", "LLM."]
asyncio.run(stream_from_llm(tokens))

Response Fields

Audio Response

| Field | Type | Description |
| --- | --- | --- |
| audio | string | Base64-encoded PCM16 audio data |
| encoding | string | Audio encoding (always pcm_s16le) |
| sample_rate | integer | Sample rate in Hz |
| samples | integer | Total number of samples |
| duration_ms | number | Audio duration in milliseconds |
| generation_ms | number | Generation time in milliseconds |
| rtf | number | Real-time factor (gen_time / audio_duration) |
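
The real-time factor is simply generation time divided by audio duration, so values below 1.0 mean generation runs faster than playback. Checked against the numbers in the sample response above:

```python
def rtf(generation_ms: float, duration_ms: float) -> float:
    """Real-time factor: generation time / audio duration."""
    return generation_ms / duration_ms

# 150 ms of generation for 2000 ms of audio, as in the sample response
value = rtf(150, 2000)
```
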

Streaming Stats

| Field | Type | Description |
| --- | --- | --- |
| final | boolean | Indicates generation complete |
| dur_ms | number | Total audio duration in ms |
| gen_ms | number | Total generation time in ms |
| ttfa_ms | number | Time to first audio in ms |
| rtf | number | Real-time factor |
| chunks | integer | Number of chunks generated |

Error Responses

Validation Error

{
  "error": {
    "code": "invalid_request",
    "message": "Text exceeds maximum length",
    "details": {
      "max_length": 4096,
      "provided_length": 5000
    }
  }
}

Voice Not Found

{
  "error": {
    "code": "not_found",
    "message": "Voice not found",
    "details": {
      "voice_id": 999
    }
  }
}

Rate Limited

{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests",
    "details": {
      "retry_after": 60
    }
  }
}
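
The retry_after detail gives a wait in seconds, so a client can honor it before retrying. A minimal sketch of picking the delay from an error body (the fallback of 1 second is an arbitrary choice, not specified by the API):

```python
def retry_delay(error_response: dict, default: float = 1.0) -> float:
    """Pick a sleep duration in seconds from a rate-limit error body,
    falling back to `default` when no retry_after is present."""
    err = error_response.get("error", {})
    if err.get("code") == "rate_limited":
        return float(err.get("details", {}).get("retry_after", default))
    return default

body = {
    "error": {
        "code": "rate_limited",
        "message": "Too many requests",
        "details": {"retry_after": 60},
    }
}
delay = retry_delay(body)
# time.sleep(delay) before retrying the request
```
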