Prerequisites
- Docker Engine 24+ and Docker Compose v2+
- NVIDIA Container Toolkit installed and configured
- A supported NVIDIA GPU (A10G, A100, H100 or equivalent with ≥ 24 GB VRAM)
- A valid self-hosted license key (contact hello@kugelaudio.com)
Quick Start
1. Create the environment file
Create .env.selfhosted and fill in the two required values, KUGEL_LICENSE_KEY and KUGEL_INSTANCE_ID (see Required under Configuration).
2. Start the container
On startup, the container will:
- Activate your license key against the KugelAudio license server
- Download the encrypted model weights (~5 GB) and store them in the weights_cache volume
- Decrypt the weights into GPU memory and run warmup batches
- Begin serving requests on port 8000
3. Verify it is healthy
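A minimal Python sketch of this check, assuming the container is reachable at http://localhost:8000 (port 8000 comes from the startup steps; the host, the 5-second polling interval, and the function names are assumptions). It polls GET /ready, which returns 503 until warmup is complete, and gives up after 3 minutes to match the healthcheck window. The probe is injectable so the loop can be exercised without a running container.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the model is still warming up
    except (urllib.error.URLError, OSError):
        return 0  # nothing is listening yet, or the request timed out


def wait_until_ready(base_url="http://localhost:8000", timeout_s=180.0,
                     interval_s=5.0, probe=http_status, sleep=time.sleep):
    """Poll GET /ready until it returns 200 or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(f"{base_url}/ready") == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` returns `True` once the container reports ready and `False` if the timeout expires first. `http_status("http://localhost:8000/health")` can be used the same way for the liveness check.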
4. Generate speech
Send a POST request to /v1/tts/generate. Set "format": "wav" in the request body to receive a WAV file instead of the default output format.
Configuration
All configuration is done via environment variables. Set them in .env.selfhosted or pass them with docker run -e.
Required
| Variable | Description |
|---|---|
| KUGEL_LICENSE_KEY | License key provided by KugelAudio |
| KUGEL_INSTANCE_ID | Unique identifier for this deployment (e.g. prod-eu-1) |
Model selection
| Variable | Default | Description |
|---|---|---|
| KUGELAUDIO_DEPLOY_MODELS | 1.5b | Model variant: 1.5b (faster, ~6 GB VRAM) or 7b (higher quality, ~18 GB VRAM) |
Performance
| Variable | Default | Description |
|---|---|---|
| KUGELAUDIO_OPTIMIZATION | continuous_compiled_cudagraph | Optimization level; leave at default for best throughput |
| KUGELAUDIO_DDPM_STEPS | 10 | Diffusion steps. Lower = faster but slightly lower quality (min 4) |
Storage
| Variable | Default | Description |
|---|---|---|
| KUGEL_WEIGHTS_CACHE_DIR | (set in compose file) | Path inside the container where encrypted weights are cached. Mapped to the weights_cache Docker volume in the compose file; do not change unless you know what you are doing |
Observability
| Variable | Default | Description |
|---|---|---|
| SENTRY_DSN | (empty) | Optional Sentry DSN for error reporting |
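As a pre-flight check, the rules in the tables above can be encoded in a few lines. The sketch below assumes .env.selfhosted uses plain KEY=VALUE lines without quoting, and that KUGELAUDIO_DEPLOY_MODELS holds a single variant; the function names are illustrative and not part of any KugelAudio tooling.

```python
REQUIRED = ("KUGEL_LICENSE_KEY", "KUGEL_INSTANCE_ID")
MODEL_VARIANTS = {"1.5b", "7b"}
MIN_DDPM_STEPS = 4


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate(values):
    """Return a list of problems; an empty list means the settings look OK."""
    problems = [f"{name} is required" for name in REQUIRED if not values.get(name)]
    # ASSUMPTION: a single variant per deployment, defaulting to 1.5b.
    model = values.get("KUGELAUDIO_DEPLOY_MODELS", "1.5b")
    if model not in MODEL_VARIANTS:
        problems.append(f"KUGELAUDIO_DEPLOY_MODELS must be 1.5b or 7b, got {model!r}")
    steps = values.get("KUGELAUDIO_DDPM_STEPS", "10")
    if not steps.isdigit() or int(steps) < MIN_DDPM_STEPS:
        problems.append(f"KUGELAUDIO_DDPM_STEPS must be an integer >= {MIN_DDPM_STEPS}")
    return problems
```

Running `validate(parse_env(open(".env.selfhosted").read()))` before starting the container surfaces a missing key or an out-of-range value early, instead of after the container exits.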
API Compatibility
The self-hosted container exposes the same HTTP API as the KugelAudio cloud service.

| Endpoint | Description |
|---|---|
| GET /health | Liveness check |
| GET /ready | Readiness check (returns 503 until warmup is complete) |
| GET /v1/models | List available models |
| GET /v1/voices | List available voices |
| POST /v1/tts/generate | Generate speech (streaming or non-streaming) |
| POST /11labs/v1/text-to-speech/{voice_id} | ElevenLabs-compatible endpoint |
| WS /ws/tts | WebSocket streaming |
Voices
The self-hosted container has access to all voices your organisation is entitled to use, including both KugelAudio’s public voice library and any private voices you have created. Voices are fetched from the KugelAudio license server using your license key. The container does not need storage credentials or database access; audio reference files are delivered as short-lived pre-signed URLs. This means:
- No Supabase or S3 credentials are required in the container.
- Voice metadata and audio URLs are refreshed on demand.
- Newly added or cloned voices are available immediately without restarting the container.
Listing voices
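A minimal sketch of a listing call against GET /v1/voices, assuming the container is reachable at http://localhost:8000 and that the endpoint returns JSON. The response schema is not described on this page, so the sketch returns the parsed body unchanged rather than assuming field names; the function names are illustrative.

```python
import json
import urllib.request


def voices_url(base_url="http://localhost:8000"):
    """Return the URL of the voice listing endpoint."""
    return base_url.rstrip("/") + "/v1/voices"


def list_voices(base_url="http://localhost:8000", opener=urllib.request.urlopen):
    """Fetch GET /v1/voices and return the decoded JSON body as-is."""
    with opener(voices_url(base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Printing the result of `list_voices()` once is the quickest way to see which fields the listing actually carries.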
Using a voice
Pass the numeric id from the listing as voice_id in any synthesis request:
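As an illustration, the sketch below builds a POST /v1/tts/generate request that carries voice_id. The endpoint, voice_id, and the optional format field are named on this page; the text field name, the base URL, and the helper names are assumptions, and whether the endpoint expects an authentication header is not stated here.

```python
import json
import urllib.request


def build_generate_request(text, voice_id, audio_format=None,
                           base_url="http://localhost:8000"):
    """Build (but do not send) a POST /v1/tts/generate request."""
    payload = {
        "text": text,          # ASSUMPTION: field name is not confirmed by this page
        "voice_id": voice_id,  # numeric id taken from the voice listing
    }
    if audio_format is not None:
        payload["format"] = audio_format  # e.g. "wav"
    return urllib.request.Request(
        f"{base_url}/v1/tts/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, out_path, **kwargs):
    """Send the request and write the raw response body to out_path."""
    req = build_generate_request(text, voice_id, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before any audio is generated.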
Volumes
The compose file creates two named Docker volumes. Do not delete them.

| Volume | Mount path | Contents |
|---|---|---|
| license_state | /app/tts/.license | Activation token and usage ledger. Back this up; losing it forces re-activation |
| weights_cache | /cache/weights | Encrypted model weights (~5 GB); re-downloaded automatically if missing |
Upgrading
Pull the latest image and recreate the container. Volumes are preserved.
Troubleshooting
Container exits immediately
Check the logs for one of these common causes:
- KUGEL_LICENSE_KEY not set: the container refuses to start without a valid key.
- GPU not accessible: run the nvidia-smi Docker check from the prerequisites section.
- License already active on another instance: each license key is tied to one KUGEL_INSTANCE_ID. Use a different ID or contact support.
Container is unhealthy after 3 minutes
The healthcheck allows 3 minutes for startup. If it is still unhealthy:
Weights re-downloaded on every restart
The weights_cache volume must be mounted and writable. Check that the volume exists and is attached:
Port already in use
Change the host port in .env.selfhosted:
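Before picking a replacement, it can help to confirm which host ports are actually free. The sketch below is a generic TCP check, not KugelAudio tooling: it treats a port as free when nothing accepts a connection on it, and the 8000 to 8100 search range is an arbitrary assumption. The name of the variable that sets the host port is not shown on this page, so it is not reproduced here.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success, i.e. something is listening.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=8000, end=8100):
    """Return the first port in [start, end) that looks free, or None."""
    for port in range(start, end):
        if port_is_free(port):
            return port
    return None
```

`first_free_port()` returns a candidate host port; the container itself keeps listening on port 8000 internally.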