KugelAudio TTS is available as a self-contained Docker image that runs entirely on your own hardware. No audio data leaves your network. On first start the container contacts KugelAudio’s license server once to activate the license and download the encrypted model weights; after that, it still fetches voice metadata from the license server on demand (see Voices).

Prerequisites

  • Docker Engine 24+ and Docker Compose v2+
  • NVIDIA Container Toolkit installed and configured
  • A supported NVIDIA GPU (A10G, A100, H100 or equivalent with ≥ 24 GB VRAM)
  • A valid self-hosted license key (contact hello@kugelaudio.com)
Verify GPU access is working before proceeding:
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Quick Start

1. Create the environment file

cp .env.selfhosted.example .env.selfhosted
Open .env.selfhosted and fill in the two required values:
KUGEL_LICENSE_KEY=kgl_live_<your-key-here>
KUGEL_INSTANCE_ID=prod-dc1          # any unique string for this deployment
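Before starting the container, it can help to confirm that both required values are actually filled in. The following is a minimal Python sketch, not part of the KugelAudio tooling: the helper names `parse_env` and `missing_required` are hypothetical, and the parser only handles simple KEY=VALUE lines with an optional trailing comment.

```python
REQUIRED_KEYS = ("KUGEL_LICENSE_KEY", "KUGEL_INSTANCE_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and full-line comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and any surrounding whitespace.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still hold a <placeholder>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or "<" in value:
            missing.append(key)
    return missing
```

Reading `.env.selfhosted` and passing its contents through `missing_required(parse_env(...))` should return an empty list once both values are set.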

2. Start the container

docker compose -f docker-compose.selfhosted.yml up -d
On first start the container will:
  1. Activate your license key against the KugelAudio license server
  2. Download the encrypted model weights (~5 GB) and store them in the weights_cache volume
  3. Decrypt the weights into GPU memory and run warmup batches
  4. Begin serving requests on port 8000
Startup takes 2–4 minutes on first run (weight download). Subsequent restarts load from the local cache and are ready in ~90 seconds.

3. Verify it is healthy

curl http://localhost:8000/health
# {"status":"healthy"}

curl http://localhost:8000/v1/models
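Because warmup takes a while, deployment scripts may prefer to poll the readiness endpoint (GET /ready, which returns 503 until warmup is complete) rather than sleep for a fixed time. A minimal sketch using only the Python standard library follows; the function names, the 300-second timeout, and the 5-second interval are assumptions chosen for illustration, not values taken from the product.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str = "http://localhost:8000") -> bool:
    """Return True when GET /ready answers 200; a 503 or a refused connection means not yet."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. 503) is a URLError; timeouts and resets are OSErrors.
        return False


def wait_until_ready(check=is_ready, timeout_s=300.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` blocks until the container reports ready or the timeout passes. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.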

4. Generate speech

curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from KugelAudio self-hosted!",
    "voice_id": "af_heart",
    "model_id": "kugel-1-turbo"
  }' \
  --output hello.pcm
The response is raw 16-bit signed PCM at 24 kHz (little-endian). Pass "format": "wav" in the request body to receive a WAV file instead.
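If you already have raw PCM on disk, it can be wrapped in a WAV container locally instead of re-requesting it. The sketch below uses Python's standard `wave` module and the format stated above (16-bit signed, little-endian, 24 kHz). The channel count is not stated in this guide, so mono is an assumption here, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)     # assumption: mono output
        wav.setsampwidth(2)            # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)  # 24 kHz, per the response format above
        wav.writeframes(pcm)
    return buf.getvalue()
```

For example, reading `hello.pcm` as bytes and writing `pcm_to_wav(data)` to `hello.wav` should produce a file that ordinary audio players can open.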

Configuration

All configuration is done via environment variables. Set them in .env.selfhosted or pass them with docker run -e.

Required

| Variable | Description |
| --- | --- |
| KUGEL_LICENSE_KEY | License key provided by KugelAudio |
| KUGEL_INSTANCE_ID | Unique identifier for this deployment (e.g. prod-eu-1) |

Model selection

| Variable | Default | Description |
| --- | --- | --- |
| KUGELAUDIO_DEPLOY_MODELS | 1.5b | Model variant: 1.5b (faster, ~6 GB VRAM) or 7b (higher quality, ~18 GB VRAM) |

Performance

| Variable | Default | Description |
| --- | --- | --- |
| KUGELAUDIO_OPTIMIZATION | continuous_compiled_cudagraph | Optimization level; leave at default for best throughput |
| KUGELAUDIO_DDPM_STEPS | 10 | Diffusion steps. Lower = faster but slightly lower quality (min 4) |

Storage

| Variable | Default | Description |
| --- | --- | --- |
| KUGEL_WEIGHTS_CACHE_DIR | (set in compose file) | Path inside the container where encrypted weights are cached. Mapped to the weights_cache Docker volume in the compose file; do not change unless you know what you are doing |

Observability

| Variable | Default | Description |
| --- | --- | --- |
| SENTRY_DSN | (empty) | Optional Sentry DSN for error reporting |

API Compatibility

The self-hosted container exposes the same HTTP API as the KugelAudio cloud service.
| Endpoint | Description |
| --- | --- |
| GET /health | Liveness check |
| GET /ready | Readiness check (returns 503 until warmup is complete) |
| GET /v1/models | List available models |
| GET /v1/voices | List available voices |
| POST /v1/tts/generate | Generate speech (streaming or non-streaming) |
| POST /11labs/v1/text-to-speech/{voice_id} | ElevenLabs-compatible endpoint |
| WS /ws/tts | WebSocket streaming |
You can point any KugelAudio SDK at your self-hosted instance by overriding the base URL:
from kugelaudio import KugelAudio

client = KugelAudio(
    api_key="not-required-for-self-hosted",
    base_url="http://your-host:8000",
)

Voices

The self-hosted container has access to all voices your organisation is entitled to use — including both KugelAudio’s public voice library and any private voices you have created. Voices are fetched from the KugelAudio license server using your license key. The container does not need storage credentials or database access; audio reference files are delivered as short-lived pre-signed URLs. This means:
  • No Supabase or S3 credentials are required in the container.
  • Voice metadata and audio URLs are refreshed on demand.
  • Newly added or cloned voices are available immediately without restarting the container.

Listing voices

curl http://localhost:8000/v1/voices
Example response:
{
  "voices": [
    {
      "id": 1,
      "name": "Sarah",
      "category": "cloned",
      "sex": "female",
      "supported_languages": ["en"],
      "sample_url": "https://..."
    }
  ]
}
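A listing in this shape is straightforward to filter on the client side. The sketch below assumes the response fields shown in the example above (`voices`, `id`, `name`, `supported_languages`); the helper names are hypothetical and not part of any KugelAudio SDK.

```python
import json


def load_listing(body) -> dict:
    """Decode the JSON body returned by GET /v1/voices (str or bytes)."""
    return json.loads(body)


def voices_for_language(listing: dict, language: str) -> list[dict]:
    """Return the voices whose supported_languages contains `language`."""
    return [
        voice for voice in listing.get("voices", [])
        if language in voice.get("supported_languages", [])
    ]


def voice_id_by_name(listing: dict, name: str):
    """Return the id of the first voice with a matching name (case-insensitive), else None."""
    for voice in listing.get("voices", []):
        if voice.get("name", "").lower() == name.lower():
            return voice["id"]
    return None
```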

Using a voice

Pass the numeric id from the listing as voice_id in any synthesis request:
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice_id": 1}' \
  --output out.pcm
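The same request can be issued from Python without an SDK. This is a minimal standard-library sketch that mirrors the curl call above; `build_tts_request` and `synthesize` are hypothetical helper names, the 60-second timeout is an arbitrary assumption, and the optional `format` field is the one mentioned in the Quick Start.

```python
import json
import urllib.request


def build_tts_request(text, voice_id, base_url="http://localhost:8000", audio_format=None):
    """Build a POST /v1/tts/generate request with a JSON body."""
    payload = {"text": text, "voice_id": voice_id}
    if audio_format is not None:
        payload["format"] = audio_format  # e.g. "wav", per the Quick Start note
    return urllib.request.Request(
        f"{base_url}/v1/tts/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, **kwargs) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    request = build_tts_request(text, voice_id, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return resp.read()
```

Splitting request construction from sending keeps the payload easy to inspect or log before any audio is generated.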

Volumes

The compose file creates two named Docker volumes. Do not delete them.
| Volume | Mount path | Contents |
| --- | --- | --- |
| license_state | /app/tts/.license | Activation token and usage ledger. Back this up; losing it forces re-activation |
| weights_cache | /cache/weights | Encrypted model weights (~5 GB). Re-downloaded automatically if missing |
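One common way to back up a named volume is to mount it read-only into a throwaway container and archive its contents to a host directory. The sketch below only builds that docker command; it is a generic Docker pattern rather than a KugelAudio-provided procedure. The `alpine` image, the archive name, and the helper name `backup_command` are assumptions, and the volume name you pass may carry a Compose project prefix.

```python
from pathlib import Path


def backup_command(volume: str, backup_dir: str,
                   archive_name: str = "license_state.tar.gz") -> list[str]:
    """Build a docker command that archives `volume` into `backup_dir` on the host."""
    backup_path = Path(backup_dir).resolve()  # bind mounts need an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",       # the volume, mounted read-only
        "-v", f"{backup_path}:/backup",   # host directory that receives the archive
        "alpine",
        "tar", "czf", f"/backup/{archive_name}", "-C", "/data", ".",
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the backup. Stopping the container first is the cautious choice, since the ledger may be written to while the service is running.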

Upgrading

Pull the latest image and recreate the container. Volumes are preserved.
docker compose -f docker-compose.selfhosted.yml pull
docker compose -f docker-compose.selfhosted.yml up -d

Troubleshooting

Container exits immediately

Check the logs:
docker compose -f docker-compose.selfhosted.yml logs tts
Common causes:
  • KUGEL_LICENSE_KEY not set — the container refuses to start without a valid key.
  • GPU not accessible — run the nvidia-smi Docker check from the prerequisites section.
  • License already active on another instance — each license key is tied to one KUGEL_INSTANCE_ID. Use a different ID or contact support.

Container is unhealthy after 3 minutes

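The healthcheck allows 3 minutes for startup. A first run can take up to 4 minutes because of the weight download (see Quick Start), so give it a little longer before investigating. If it is still unhealthy after that: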
docker compose -f docker-compose.selfhosted.yml logs --tail 100 tts
Look for errors during model loading. The most common cause is insufficient GPU VRAM for the selected model variant.
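To rule out the VRAM cause quickly, the GPU's total memory can be compared against the approximate needs listed under Model selection. The sketch below parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports one MiB value per GPU. The helper names are hypothetical, and the thresholds are the rough figures from this guide (~6 GB and ~18 GB), not exact requirements.

```python
import subprocess

# Approximate VRAM needs in GB, taken from the Model selection table.
VRAM_NEEDED_GB = {"1.5b": 6, "7b": 18}


def parse_total_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one MiB value per non-empty line (one line per GPU)."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_total_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_total_vram_mib(result.stdout)


def fits(variant: str, total_mib: int) -> bool:
    """Rough check: does a GPU with `total_mib` MiB meet the variant's approximate need?"""
    return total_mib >= VRAM_NEEDED_GB[variant] * 1024
```

This compares total memory only. Other processes that already hold VRAM on the same GPU reduce what is actually available to the container.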

Weights re-downloaded on every restart

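The weights_cache volume must be mounted and writable. Check that the volume exists and is attached. Docker Compose normally prefixes volume names with the project name, so list matching volumes first and inspect the full name it reports:
docker volume ls --filter name=weights_cache
docker volume inspect <project>_weights_cache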
docker compose -f docker-compose.selfhosted.yml ps

Port already in use

Change the host port in .env.selfhosted:
TTS_PORT=9000
Then restart the container.