This guide shows how to use the KugelAudio API directly with cURL, Python, or JavaScript without our SDKs.

Base URL

https://api.kugelaudio.com

Authentication

Include your API key in every request, using one of the following:
# HTTP Authorization header
Authorization: Bearer YOUR_API_KEY

# Or the x-api-key header
x-api-key: YOUR_API_KEY

# WebSocket query parameter
wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY
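
The three authentication options above can be sketched in Python as two small helpers. This is a minimal sketch, not part of any official client: the function names `rest_headers` and `tts_ws_url` are hypothetical, and URL-encoding the key is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import urlencode

WS_BASE_URL = "wss://api.kugelaudio.com"


def rest_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Return the header that carries the API key for REST calls (hypothetical helper)."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}


def tts_ws_url(api_key: str) -> str:
    """Return the WebSocket TTS URL with the key as a query parameter (hypothetical helper)."""
    # urlencode escapes any characters that are not safe inside a query string.
    return f"{WS_BASE_URL}/ws/tts?{urlencode({'api_key': api_key})}"
```

Either header form should work for REST calls; the query parameter is the form shown for WebSocket connections.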

REST API Examples

List Models

curl -s "https://api.kugelaudio.com/v1/models" \
  -H "x-api-key: YOUR_API_KEY"
Response:
{
  "models": [
    {
      "id": "kugel-1-turbo",
      "name": "Kugel 1 Turbo",
      "description": "Fast, high-quality TTS model optimized for low latency.",
      "parameters": "1.5B",
      "max_input_length": 5000,
      "sample_rate": 24000
    },
    {
      "id": "kugel-1",
      "name": "Kugel 1",
      "description": "Premium quality TTS model with exceptional voice quality.",
      "parameters": "7B",
      "max_input_length": 5000,
      "sample_rate": 24000
    }
  ]
}

List Voices

curl -s "https://api.kugelaudio.com/v1/voices?limit=10" \
  -H "x-api-key: YOUR_API_KEY"
Response:
{
  "voices": [
    {
      "id": 268,
      "name": "Hans",
      "description": "A calm, intellectual German voice.",
      "category": "narrative_story",
      "sex": "male",
      "age": "old",
      "supported_languages": ["de"],
      "is_public": true
    }
  ]
}

Get Single Voice

curl -s "https://api.kugelaudio.com/v1/voices/268" \
  -H "x-api-key: YOUR_API_KEY"
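
The same REST calls can be made from Python with only the standard library. The sketch below assumes the endpoints behave as in the cURL examples above; `build_get`, `fetch_json`, and `list_voices` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.kugelaudio.com"


def build_get(path: str, api_key: str, **query) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    url = f"{BASE_URL}{path}"
    if query:
        url += "?" + urlencode(query)
    return urllib.request.Request(url, headers={"x-api-key": api_key})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def list_voices(api_key: str, limit: int = 10) -> list:
    """Return the "voices" array from GET /v1/voices."""
    return fetch_json(build_get("/v1/voices", api_key, limit=limit))["voices"]
```

For example, `list_voices("YOUR_API_KEY")` would return a list of voice objects shaped like the response shown under List Voices, and `fetch_json(build_get("/v1/voices/268", "YOUR_API_KEY"))` would fetch a single voice.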

WebSocket TTS Streaming

The primary way to generate speech is over a WebSocket connection, which streams the audio back in chunks.

Connection URL

wss://api.kugelaudio.com/ws/tts?api_key=YOUR_API_KEY

Request Format

Send a JSON message:
{
  "text": "Hello, this is a test.",
  "model_id": "kugel-1-turbo",
  "voice_id": 268,
  "cfg_scale": 2.0,
  "sample_rate": 24000,
  "normalize": true,
  "language": "en"
}

Response Format

Audio chunks, one JSON message per chunk:
{
  "audio": "BASE64_PCM16_DATA",
  "enc": "pcm_s16le",
  "idx": 0,
  "sr": 24000,
  "samples": 4800
}
Final message, sent after the last audio chunk:
{
  "final": true,
  "chunks": 10,
  "total_samples": 48000,
  "dur_ms": 2000,
  "gen_ms": 150,
  "rtf": 0.075
}
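
The two message shapes above can be handled with a pair of small functions. This is a sketch under stated assumptions: that `samples` counts 16-bit mono samples (so each sample is 2 bytes), that `dur_ms` is `total_samples` divided by the sample rate, and that `rtf` is `gen_ms / dur_ms`. These relationships are inferred from the example values, not from a specification. `decode_chunk` and `summarize_final` are hypothetical names.

```python
import base64
import json


def decode_chunk(message: str) -> bytes:
    """Decode one audio message into raw PCM bytes and sanity-check its size."""
    data = json.loads(message)
    pcm = base64.b64decode(data["audio"])
    # Assumption: pcm_s16le mono means 2 bytes per sample.
    expected = data["samples"] * 2
    if len(pcm) != expected:
        raise ValueError(f"chunk {data['idx']}: got {len(pcm)} bytes, expected {expected}")
    return pcm


def summarize_final(final: dict, sample_rate: int = 24000) -> tuple:
    """Recompute (duration in ms, real-time factor) from a final message."""
    dur_ms = final["total_samples"] / sample_rate * 1000
    # Assumption: rtf is generation time divided by audio duration.
    rtf = final["gen_ms"] / dur_ms
    return dur_ms, rtf
```

With the example final message, 48000 samples at 24000 Hz gives 2000 ms of audio, and 150 ms of generation time gives a real-time factor of 0.075, matching the `dur_ms` and `rtf` fields shown.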

Complete Example

import asyncio
import base64
import json
import wave
import websockets

API_KEY = "YOUR_API_KEY"
WS_URL = "wss://api.kugelaudio.com"

async def generate_speech(text: str, voice_id: int = 268, sample_rate: int = 24000):
    """Generate speech via WebSocket and save it to a WAV file."""
    ws_url = f"{WS_URL}/ws/tts?api_key={API_KEY}"
    audio_chunks = []

    async with websockets.connect(ws_url) as ws:
        # Send the TTS request
        await ws.send(json.dumps({
            "text": text,
            "model_id": "kugel-1-turbo",
            "voice_id": voice_id,
            "cfg_scale": 2.0,
            "sample_rate": sample_rate,
        }))

        # Receive messages until the final one arrives
        async for msg in ws:
            data = json.loads(msg)

            if data.get("error"):
                raise RuntimeError(data["error"])

            if data.get("audio"):
                # Each chunk is base64-encoded 16-bit PCM
                audio_chunks.append(base64.b64decode(data["audio"]))
                print(f"Chunk {data['idx']}: {data['samples']} samples")

            if data.get("final"):
                print(f"Done: {data['dur_ms']:.0f}ms audio, generated in {data['gen_ms']:.0f}ms")
                break

    # Save to WAV, using the same sample rate that was requested
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(audio_chunks))
    
    print("Saved to output.wav")

# Run
asyncio.run(generate_speech("Hello, this is a test of the raw API."))

Request Parameters

Parameter | Type | Default | Description
text | string | required | Text to synthesize
model_id | string | kugel-1-turbo | Model ID
voice_id | integer | - | Voice ID from /v1/voices
cfg_scale | number | 2.0 | Expressiveness (1.0-5.0)
sample_rate | integer | 24000 | Output sample rate
normalize | boolean | true | Normalize numbers/dates
language | string | - | ISO 639-1 code for normalization
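
The parameter table can be turned into a small request builder that applies the listed defaults. This is only a sketch: `build_tts_request` is a hypothetical helper, the client-side range check on `cfg_scale` is an assumption based on the documented 1.0-5.0 range, and omitting unset optional fields is a design choice rather than a documented requirement.

```python
import json
from typing import Optional


def build_tts_request(
    text: str,
    voice_id: Optional[int] = None,
    model_id: str = "kugel-1-turbo",
    cfg_scale: float = 2.0,
    sample_rate: int = 24000,
    normalize: bool = True,
    language: Optional[str] = None,
) -> str:
    """Return the JSON message to send over the TTS WebSocket."""
    if not text:
        raise ValueError("text is required")
    if not 1.0 <= cfg_scale <= 5.0:
        raise ValueError("cfg_scale must be between 1.0 and 5.0")

    payload = {
        "text": text,
        "model_id": model_id,
        "cfg_scale": cfg_scale,
        "sample_rate": sample_rate,
        "normalize": normalize,
    }
    # Fields without a documented default are left out when unset.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if language is not None:
        payload["language"] = language
    return json.dumps(payload)
```

The returned string can be passed directly to `ws.send(...)` in place of the inline `json.dumps` call in the complete example.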

Error Codes

WebSocket Code | Meaning
4001 | Authentication failed
4003 | Insufficient credits
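
A client can map these codes to readable messages when a connection closes. The sketch below assumes the codes arrive as WebSocket close codes (the 4000-4999 range is reserved for application use by RFC 6455); `describe_close_code` is a hypothetical helper, and how you obtain the code depends on your WebSocket library.

```python
CLOSE_CODE_MEANINGS = {
    4001: "Authentication failed",
    4003: "Insufficient credits",
}


def describe_close_code(code: int) -> str:
    """Return a human-readable description for a WebSocket close code."""
    meaning = CLOSE_CODE_MEANINGS.get(code)
    if meaning is not None:
        return f"{code}: {meaning}"
    if 4000 <= code <= 4999:
        # Private-use range, but not one of the two codes listed above.
        return f"{code}: unrecognized application close code"
    return f"{code}: not an application-specific code (see RFC 6455)"
```

A 4001 usually means the key is missing or wrong, so retrying without changing the key is unlikely to help.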

Audio Format

  • Encoding: PCM 16-bit signed little-endian (pcm_s16le)
  • Channels: Mono (1 channel)
  • Sample rate: 24000 Hz (default)
  • Byte order: Little-endian
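
Given this format, raw chunk bytes can be converted to samples and a duration with the standard library alone. This is a minimal sketch; `pcm_s16le_to_floats` and `duration_seconds` are hypothetical helpers, and scaling by 32768 to reach the range [-1.0, 1.0) is a common convention, not something the API prescribes.

```python
import struct


def pcm_s16le_to_floats(pcm: bytes) -> list:
    """Convert 16-bit signed little-endian mono PCM to floats in [-1.0, 1.0)."""
    if len(pcm) % 2:
        raise ValueError("PCM data must contain a whole number of 16-bit samples")
    count = len(pcm) // 2
    # "<" forces little-endian regardless of the host byte order; "h" is int16.
    samples = struct.unpack(f"<{count}h", pcm)
    return [s / 32768.0 for s in samples]


def duration_seconds(pcm: bytes, sample_rate: int = 24000) -> float:
    """Return the playback duration of mono 16-bit PCM data."""
    return len(pcm) / 2 / sample_rate
```

At the default 24000 Hz, one second of audio is 48000 bytes, so the 4800-sample chunk in the example response corresponds to 0.2 seconds.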