This guide shows how to use the KugelAudio API directly with cURL, Python, or JavaScript without our SDKs.

Base URL

https://api.kugelaudio.com

Authentication

Include your API key in every request, using one of the following methods:
# HTTP Authorization header
Authorization: Bearer YOUR_API_KEY

# Or as an x-api-key header
x-api-key: YOUR_API_KEY

# WebSocket query parameter
wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY
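
As a minimal sketch of the three options above, the helpers below build the HTTP headers and the WebSocket URL with only the Python standard library. The names `auth_headers` and `ws_tts_url` are illustrative and not part of the KugelAudio API, and URL-encoding the key is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import urlencode

BASE_WS_URL = "wss://api.kugelaudio.com/ws/tts"


def auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return HTTP headers carrying the API key (hypothetical helper)."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}


def ws_tts_url(api_key: str) -> str:
    """Return the WebSocket URL with the key as a query parameter."""
    return f"{BASE_WS_URL}?{urlencode({'api_key': api_key})}"
```

Either header form can be passed to an HTTP client; the query-parameter form is for WebSocket clients that cannot set custom headers.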

REST API Examples

List Models

curl -s "https://api.kugelaudio.com/v1/models" \
  -H "x-api-key: YOUR_API_KEY"
Response:
{
  "models": [
    {
      "id": "kugel-1-turbo",
      "name": "Kugel 1 Turbo",
      "description": "Fast, high-quality TTS model optimized for low latency.",
      "max_input_length": 5000,
      "sample_rate": 24000
    },
    {
      "id": "kugel-1",
      "name": "Kugel 1",
      "description": "Premium quality TTS model with exceptional voice quality.",
      "max_input_length": 5000,
      "sample_rate": 24000
    }
  ]
}

List Voices

curl -s "https://api.kugelaudio.com/v1/voices?limit=10" \
  -H "x-api-key: YOUR_API_KEY"
Response:
{
  "voices": [
    {
      "id": 268,
      "name": "Hans",
      "description": "A calm, intellectual German voice.",
      "category": "narrative_story",
      "sex": "male",
      "age": "old",
      "supported_languages": ["de"],
      "is_public": true
    }
  ],
  "total": 83,
  "limit": 10,
  "offset": 0
}
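
The `total`, `limit`, and `offset` fields in the response suggest offset-based pagination. The sketch below walks all pages under that assumption; note that the `offset` query parameter is inferred from the response shape and is not confirmed here, and `fetch_voices_page` and `iter_voices` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.kugelaudio.com"


def fetch_voices_page(limit: int, offset: int) -> dict:
    """GET one page of /v1/voices. The `offset` parameter is an assumption."""
    query = urlencode({"limit": limit, "offset": offset})
    req = urllib.request.Request(
        f"{BASE_URL}/v1/voices?{query}",
        headers={"x-api-key": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_voices(
    fetch_page: Callable[[int, int], dict] = fetch_voices_page,
    limit: int = 10,
) -> Iterator[dict]:
    """Yield every voice by advancing `offset` until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        voices = page.get("voices", [])
        yield from voices
        offset += len(voices)
        # Stop on an empty page too, so a wrong `total` cannot loop forever.
        if not voices or offset >= page.get("total", 0):
            break
```

Passing the page fetcher as an argument keeps the loop testable without network access, for example `for voice in iter_voices(limit=50): print(voice["name"])`.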

Get Single Voice

curl -s "https://api.kugelaudio.com/v1/voices/1071" \
  -H "x-api-key: YOUR_API_KEY"

WebSocket TTS Streaming

The primary way to generate speech is over a WebSocket connection, which streams the audio back in chunks.

Connection URL

wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY

Request Format

Send a JSON message:
{
  "text": "Hello, this is a test.",
  "model_id": "kugel-1-turbo",
  "voice_id": 1071,
  "cfg_scale": 2.0,
  "sample_rate": 24000,
  "normalize": true,
  "language": "en"
}

Response Format

Audio chunks:
{
  "audio": "BASE64_PCM16_DATA",
  "enc": "pcm_s16le",
  "idx": 0,
  "sr": 24000,
  "samples": 4800
}
Final message:
{
  "final": true,
  "chunks": 10,
  "total_samples": 48000,
  "dur_ms": 2000,
  "gen_ms": 150,
  "rtf": 0.075
}
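
The following sketch shows one way to interpret these two message types. It assumes that `samples` counts mono 16-bit samples (so the decoded payload is `samples * 2` bytes) and that `rtf` is generation time divided by audio duration, which matches the example values above (150 / 2000 = 0.075). The function names are illustrative.

```python
import base64


def decode_chunk(msg: dict) -> bytes:
    """Decode the base64 `audio` field of a chunk message into raw PCM bytes."""
    pcm = base64.b64decode(msg["audio"])
    # Assumption: pcm_s16le mono uses 2 bytes per sample, so lengths should agree.
    if len(pcm) != msg["samples"] * 2:
        raise ValueError("decoded length does not match `samples`")
    return pcm


def chunk_duration_ms(msg: dict) -> float:
    """Duration of one chunk in milliseconds: samples / sample rate."""
    return msg["samples"] * 1000 / msg["sr"]


def real_time_factor(final: dict) -> float:
    """Generation time divided by audio duration (lower means faster)."""
    return final["gen_ms"] / final["dur_ms"]
```

With the example chunk above, 4800 samples at 24000 Hz correspond to 200 ms of audio.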

Complete Example

import asyncio
import base64
import json
import wave
import websockets

API_KEY = "YOUR_API_KEY"
WS_URL = "wss://api.kugelaudio.com"

async def generate_speech(text: str, voice_id: int = 268):
    """Generate speech via WebSocket and save to WAV file."""
    ws_url = f"{WS_URL}/ws/tts?api_key={API_KEY}"
    audio_chunks = []
    
    async with websockets.connect(ws_url) as ws:
        # Send TTS request
        await ws.send(json.dumps({
            "text": text,
            "model_id": "kugel-1-turbo",
            "voice_id": voice_id,
            "cfg_scale": 2.0,
            "sample_rate": 24000,
        }))
        
        # Receive audio chunks
        async for msg in ws:
            data = json.loads(msg)
            
            if data.get("error"):
                # Surface server-side errors as a specific exception type
                raise RuntimeError(data["error"])
            
            if data.get("audio"):
                audio_chunks.append(base64.b64decode(data["audio"]))
                print(f"Chunk {data['idx']}: {data['samples']} samples")
            
            if data.get("final"):
                print(f"Done: {data['dur_ms']:.0f}ms audio, generated in {data['gen_ms']:.0f}ms")
                break
    
    # Save to WAV
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(24000)
        wf.writeframes(b"".join(audio_chunks))
    
    print("Saved to output.wav")

# Run
asyncio.run(generate_speech("Hello, this is a test of the raw API."))

Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize (max 10,000 chars) |
| model_id | string | kugel-1-turbo | Model ID |
| voice_id | integer | - | Voice ID from /v1/voices |
| cfg_scale | number | 2.0 | Expressiveness (0.0-10.0) |
| temperature | number | 0.4 | Sampling variance (0.0-1.0). 0 = most stable, 1 = most variance. See Temperature guidance. |
| max_new_tokens | integer | 2048 | Maximum tokens to generate (1-8192) |
| sample_rate | integer | 24000 | Output sample rate. Options: 8000, 16000, 22050, 24000 |
| normalize | boolean | true | Normalize numbers/dates |
| language | string | - | ISO 639-1 code for normalization |
| word_timestamps | boolean | false | Enable word-level timestamp alignment |
| speaker_prefix | boolean | true | Prepend internal speaker prefix |
| speed | number | 1.0 | Playback speed multiplier (0.8-1.2) |
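
Checking these ranges on the client before sending can save a round trip. The sketch below validates a request against the limits listed in the table and returns the JSON message; `build_tts_request` is a hypothetical helper, and the server may apply different or additional checks.

```python
import json
from typing import Optional

ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}

# (min, max) bounds copied from the parameter table.
RANGES = {
    "cfg_scale": (0.0, 10.0),
    "temperature": (0.0, 1.0),
    "max_new_tokens": (1, 8192),
    "speed": (0.8, 1.2),
}


def build_tts_request(text: str, voice_id: Optional[int] = None, **options) -> str:
    """Validate options against the documented ranges and return a JSON message."""
    if not text:
        raise ValueError("text is required")
    if len(text) > 10_000:
        raise ValueError("text exceeds 10,000 characters")

    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    if options.get("sample_rate", 24000) not in ALLOWED_SAMPLE_RATES:
        raise ValueError("unsupported sample_rate")

    payload = {"text": text, **options}
    if voice_id is not None:
        payload["voice_id"] = voice_id
    return json.dumps(payload)
```

The returned string can be sent as-is over the WebSocket, for example `await ws.send(build_tts_request("Hello", voice_id=268, temperature=0.4))`.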

Temperature guidance

temperature controls how much the sampler varies across regenerations of the same text. Lower values are closer to greedy decoding (stable, repeatable reads); higher values are more expressive but less consistent.

| Use case | Suggested range |
| --- | --- |
| E-learning, IVR prompts, compliance reads | 0.0-0.3 |
| General voiceover, conversational UX | 0.4-0.6 (default 0.4) |
| Expressive narration, ads, character voices | 0.7-1.0 |

The default of 0.4 tracks the TTS Studio natural preset. It was lowered from 0.5 to reduce intermittent word-drop on short trailing sentences with kugel-2.
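
One way to apply this guidance in client code is a small lookup of the suggested ranges. In the sketch below the keys (`stable`, `general`, `expressive`) are hypothetical labels, and returning the low end of each range is an arbitrary, conservative choice rather than a recommendation from the API.

```python
# Suggested ranges copied from the guidance table; the keys are hypothetical labels.
TEMPERATURE_RANGES = {
    "stable": (0.0, 0.3),      # e-learning, IVR prompts, compliance reads
    "general": (0.4, 0.6),     # general voiceover, conversational UX
    "expressive": (0.7, 1.0),  # expressive narration, ads, character voices
}


def suggested_temperature(use_case: str) -> float:
    """Return the low end of the suggested range for a use case."""
    low, _high = TEMPERATURE_RANGES[use_case]
    return low
```

For the `general` label this returns 0.4, which coincides with the documented default.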

Error Codes

| WebSocket Code | Meaning |
| --- | --- |
| 4001 | Authentication failed |
| 4003 | Insufficient credits |
| 4029 | Rate limit exceeded |
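
A client can map these close codes to readable messages. The sketch below is a plain lookup; how the close code is obtained depends on the WebSocket library in use, and treating only 4029 as retryable is an assumption, not documented behavior.

```python
# Close codes and meanings copied from the table of WebSocket error codes.
CLOSE_CODE_MEANINGS = {
    4001: "Authentication failed",
    4003: "Insufficient credits",
    4029: "Rate limit exceeded",
}


def describe_close_code(code: int) -> str:
    """Return the documented meaning, or a generic message for other codes."""
    return CLOSE_CODE_MEANINGS.get(code, f"Unexpected close code {code}")


def is_retryable(code: int) -> bool:
    """Assumption: only rate limiting (4029) is worth retrying after a delay."""
    return code == 4029
```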

Audio Format

  • Encoding: PCM 16-bit signed little-endian (pcm_s16le)
  • Channels: Mono (1 channel)
  • Sample rate: 24000 Hz (default)
  • Byte order: Little-endian
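
Given this format, raw audio bytes can be turned into sample values with the standard library alone. The sketch below assumes mono `pcm_s16le` data as described above; the function names are illustrative.

```python
import struct


def pcm_to_samples(pcm: bytes) -> list:
    """Unpack pcm_s16le mono bytes into signed 16-bit integers."""
    if len(pcm) % 2:
        raise ValueError("PCM16 data must have an even number of bytes")
    count = len(pcm) // 2
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", pcm))


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 24000) -> float:
    """Duration of mono 16-bit PCM: bytes / 2 bytes per sample / sample rate."""
    return len(pcm) / 2 / sample_rate
```

At the default 24000 Hz, one second of audio is 48,000 bytes.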