KugelAudio TTS is available as a self-contained Docker image that runs entirely on your own hardware. No audio data leaves your network. The container contacts KugelAudio’s license server on first start to activate the license and download the encrypted model weights, and afterwards to fetch voice metadata (see Voices).
Prerequisites
- Docker Engine 24+ and Docker Compose v2+
- NVIDIA Container Toolkit installed and configured
- A supported NVIDIA GPU (A10G, A100, H100 or equivalent with ≥ 24 GB VRAM)
- A valid self-hosted license key (contact hello@kugelaudio.com)
Verify GPU access is working before proceeding:
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Quick Start
1. Create the environment file
cp .env.selfhosted.example .env.selfhosted
Open .env.selfhosted and fill in the two required values:
KUGEL_LICENSE_KEY=kgl_live_<your-key-here>
KUGEL_INSTANCE_ID=prod-dc1 # any unique string for this deployment
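Before starting the container, it can help to confirm that both required values are actually filled in. The following is a minimal sketch and not part of KugelAudio's tooling: `parse_env`, `missing_required`, and `missing_required_in_file` are hypothetical helpers, and the parsing assumes simple `KEY=value` lines with optional trailing `#` comments, as in the example above.

```python
from pathlib import Path

REQUIRED_KEYS = ("KUGEL_LICENSE_KEY", "KUGEL_INSTANCE_ID")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "prod-dc1 # any unique string".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(text: str) -> list:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env(text)
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or "<" in value:
            missing.append(key)
    return missing


def missing_required_in_file(path: str = ".env.selfhosted") -> list:
    """Convenience wrapper that reads the environment file from disk."""
    return missing_required(Path(path).read_text())
```

Calling `missing_required_in_file()` from the directory that holds `.env.selfhosted` returns an empty list when both values are set.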
2. Start the container
docker compose -f docker-compose.selfhosted.yml up -d
On first start the container will:
- Activate your license key against the KugelAudio license server
- Download the encrypted model weights (~5 GB) and store them in the weights_cache volume
- Decrypt the weights into GPU memory and run warmup batches
- Begin serving requests on port 8000
Startup takes 2–4 minutes on first run (weight download). Subsequent restarts load from the local cache and are ready in ~90 seconds.
3. Verify it is healthy
curl http://localhost:8000/health
# {"status":"healthy"}
curl http://localhost:8000/v1/models
4. Generate speech
curl -X POST http://localhost:8000/v1/tts/generate \
-H "Content-Type: application/json" \
-d '{
"text": "Hello from KugelAudio self-hosted!",
"voice_id": "af_heart",
"model_id": "kugel-1-turbo"
}' \
--output hello.pcm
The response is raw 16-bit signed PCM at 24 kHz (little-endian). Pass "format": "wav" in the request body to receive a WAV file instead.
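If you already have a raw `.pcm` file on disk, you can wrap it in a WAV header locally. The sketch below uses Python's standard `wave` module with the sample rate and sample width stated above. The channel count is not stated on this page, so mono is an assumption; adjust `CHANNELS` if your output differs. The function names are illustrative.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, as stated above
SAMPLE_WIDTH_BYTES = 2    # 16-bit signed samples
CHANNELS = 1              # assumption: mono (not stated in this page)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback length implied by the byte count."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


def convert_file(src: str, dst: str) -> None:
    """Read a raw PCM file and write the WAV equivalent."""
    with open(src, "rb") as f:
        pcm = f.read()
    with open(dst, "wb") as f:
        f.write(pcm_to_wav_bytes(pcm))
```

For example, `convert_file("hello.pcm", "hello.wav")` converts the file produced by the request above.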
Configuration
All configuration is done via environment variables. Set them in .env.selfhosted or pass them with docker run -e.
Required
| Variable | Description |
|---|---|
| KUGEL_LICENSE_KEY | License key provided by KugelAudio |
| KUGEL_INSTANCE_ID | Unique identifier for this deployment (e.g. prod-eu-1) |
Model selection
| Variable | Default | Description |
|---|---|---|
| KUGELAUDIO_DEPLOY_MODELS | 1.5b | Model variant: 1.5b (faster, ~6 GB VRAM) or 7b (higher quality, ~18 GB VRAM) |

| Variable | Default | Description |
|---|---|---|
| KUGELAUDIO_OPTIMIZATION | continuous_compiled_cudagraph | Optimization level; leave at default for best throughput |
| KUGELAUDIO_DDPM_STEPS | 10 | Diffusion steps. Lower = faster but slightly lower quality (min 4) |
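As a rough aid for choosing these values, the sketch below picks the largest model variant whose approximate VRAM requirement fits the GPU and emits the matching environment lines. The VRAM figures and the minimum of 4 diffusion steps come from the tables above. `pick_variant`, `env_lines`, and the 2 GB headroom default are assumptions made for illustration, not KugelAudio recommendations.

```python
from typing import Optional

# Approximate VRAM needs in GB, taken from the model selection table.
VARIANT_VRAM_GB = {"1.5b": 6, "7b": 18}
MIN_DDPM_STEPS = 4


def pick_variant(free_vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest variant that fits, or None if neither does."""
    usable = free_vram_gb - headroom_gb
    fitting = [name for name, need in VARIANT_VRAM_GB.items() if need <= usable]
    if not fitting:
        return None
    return max(fitting, key=VARIANT_VRAM_GB.__getitem__)


def env_lines(variant: str, ddpm_steps: int = 10) -> list:
    """Build the lines to place in .env.selfhosted for the chosen settings."""
    if variant not in VARIANT_VRAM_GB:
        raise ValueError(f"unknown variant: {variant!r}")
    if ddpm_steps < MIN_DDPM_STEPS:
        raise ValueError(f"KUGELAUDIO_DDPM_STEPS must be at least {MIN_DDPM_STEPS}")
    return [
        f"KUGELAUDIO_DEPLOY_MODELS={variant}",
        f"KUGELAUDIO_DDPM_STEPS={ddpm_steps}",
    ]
```

The headroom parameter is there because the VRAM figures are approximate; set it to 0 to compare against the table values directly.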
Storage
| Variable | Default | Description |
|---|---|---|
| KUGEL_WEIGHTS_CACHE_DIR | (set in compose file) | Path inside the container where encrypted weights are cached. Mapped to the weights_cache Docker volume in the compose file. Do not change unless you know what you are doing |
Observability
| Variable | Default | Description |
|---|---|---|
| SENTRY_DSN | (empty) | Optional Sentry DSN for error reporting |
API Compatibility
The self-hosted container exposes the same HTTP API as the KugelAudio cloud service.
| Endpoint | Description |
|---|---|
| GET /health | Liveness check |
| GET /ready | Readiness check (returns 503 until warmup is complete) |
| GET /v1/models | List available models |
| GET /v1/voices | List available voices |
| POST /v1/tts/generate | Generate speech (streaming or non-streaming) |
| POST /11labs/v1/text-to-speech/{voice_id} | ElevenLabs-compatible endpoint |
| WS /ws/tts | WebSocket streaming |
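Because `/ready` returns 503 until warmup is complete, a deployment script can poll it before sending traffic. The following is a minimal sketch using only the Python standard library. It assumes that `/ready` answers with HTTP 200 once warmup has finished, which this page implies but does not state; the function names, attempt count, and delay are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while warmup is still running
    except (urllib.error.URLError, OSError):
        return 0  # container is not accepting connections yet


def wait_until_ready(
    url: str = "http://localhost:8000/ready",
    attempts: int = 60,
    delay_s: float = 5.0,
    get_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the readiness endpoint; True once it returns 200 (assumed ready)."""
    for _ in range(attempts):
        if get_status(url) == 200:
            return True
        sleep(delay_s)
    return False
```

The `get_status` and `sleep` parameters exist so the loop can be exercised without a running container; in normal use, call `wait_until_ready()` with the defaults.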
You can point any KugelAudio SDK at your self-hosted instance by overriding the base URL:
from kugelaudio import KugelAudio
client = KugelAudio(
api_key="not-required-for-self-hosted",
base_url="http://your-host:8000",
)
import { KugelAudio } from "kugelaudio";
const client = new KugelAudio({
apiKey: "not-required-for-self-hosted",
baseUrl: "http://your-host:8000",
});
# Point requests at your self-hosted instance
curl -X POST http://your-host:8000/v1/tts/generate \
-H "Content-Type: application/json" \
-d '{
"text": "Hello from self-hosted KugelAudio!",
"model_id": "kugel-1-turbo"
}' \
--output output.pcm
Voices
The self-hosted container has access to all voices your organisation is entitled to use — including both KugelAudio’s public voice library and any private voices you have created.
Voices are fetched from the KugelAudio license server using your license key. The container does not need storage credentials or database access; audio reference files are delivered as short-lived pre-signed URLs. This means:
- No Supabase or S3 credentials are required in the container.
- Voice metadata and audio URLs are refreshed on demand.
- Newly added or cloned voices are available immediately without restarting the container.
Listing voices
curl http://localhost:8000/v1/voices
Example response:
{
"voices": [
{
"id": 1,
"name": "Sarah",
"category": "cloned",
"sex": "female",
"supported_languages": ["en"],
"sample_url": "https://..."
}
]
}
Using a voice
Pass the numeric id from the listing as voice_id in any synthesis request:
curl -X POST http://localhost:8000/v1/tts/generate \
-H "Content-Type: application/json" \
-d '{"text": "Hello!", "voice_id": 1}' \
--output out.pcm
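The listing and synthesis steps can be combined in client code. The sketch below looks up a voice by name in a `/v1/voices` response and builds the JSON body for `/v1/tts/generate`. The field names are taken from the example response above; `find_voice_id` and `build_request` are hypothetical helpers, and matching names case-insensitively is an assumption for convenience.

```python
import json
from typing import Optional


def find_voice_id(listing: dict, name: str, language: Optional[str] = None) -> Optional[int]:
    """Return the id of the first voice matching name (and language, if given)."""
    for voice in listing.get("voices", []):
        if voice.get("name", "").lower() != name.lower():
            continue
        if language and language not in voice.get("supported_languages", []):
            continue
        return voice["id"]
    return None


def build_request(text: str, voice_id: int) -> str:
    """Serialize the request body for POST /v1/tts/generate."""
    return json.dumps({"text": text, "voice_id": voice_id})
```

With the example response above parsed into a dictionary, `find_voice_id(listing, "Sarah")` yields the id to pass to `build_request`.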
Volumes
The compose file creates two named Docker volumes. Do not delete them.
| Volume | Mount path | Contents |
|---|---|---|
| license_state | /app/tts/.license | Activation token and usage ledger. Back this up; losing it forces re-activation |
| weights_cache | /cache/weights | Encrypted model weights (~5 GB). Re-downloaded automatically if missing |
Upgrading
Pull the latest image and recreate the container. Volumes are preserved.
docker compose -f docker-compose.selfhosted.yml pull
docker compose -f docker-compose.selfhosted.yml up -d
Troubleshooting
Check the logs:
docker compose -f docker-compose.selfhosted.yml logs tts
Common causes:
- KUGEL_LICENSE_KEY not set: the container refuses to start without a valid key.
- GPU not accessible: run the nvidia-smi Docker check from the prerequisites section.
- License already active on another instance: each license key is tied to one KUGEL_INSTANCE_ID. Use a different ID or contact support.
Container is unhealthy after 3 minutes
The healthcheck allows 3 minutes for startup. If it is still unhealthy:
docker compose -f docker-compose.selfhosted.yml logs --tail 100 tts
Look for errors during model loading. The most common cause is insufficient GPU VRAM for the selected model variant.
Weights re-downloaded on every restart
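The weights_cache volume must be mounted and writable. Check that the volume exists and is attached. Docker Compose normally prefixes volume names with the project name, so if the inspect command reports no such volume, run docker volume ls to find the full name: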
docker volume inspect weights_cache
docker compose -f docker-compose.selfhosted.yml ps
Port already in use
Change the host port in .env.selfhosted:
Then restart the container.