Self-Hosted Deployment

KugelAudio TTS is available as a self-contained Docker image that runs entirely on your own hardware. No audio data leaves your network. The container contacts KugelAudio’s license server on first start to activate the license and download the encrypted model weights. It also fetches voice metadata from the license server on demand (see Voices).

Prerequisites

Docker Engine 24+ and Docker Compose v2+
NVIDIA Container Toolkit installed and configured
A supported NVIDIA GPU (A10G, A100, H100 or equivalent with ≥ 24 GB VRAM)
A valid self-hosted license key (contact hello@kugelaudio.com)

Verify GPU access is working before proceeding:

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Quick Start

1. Create the environment file

cp .env.selfhosted.example .env.selfhosted

Open .env.selfhosted and fill in the two required values:

KUGEL_LICENSE_KEY=kgl_live_<your-key-here>
KUGEL_INSTANCE_ID=prod-dc1          # any unique string for this deployment
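
Before starting the container, it can help to confirm that both values are actually filled in. The following is a minimal sketch, assuming the file uses plain KEY=value lines with optional # comments; the helper names parse_env and missing_required are illustrative and not part of any KugelAudio tooling.

```python
REQUIRED_KEYS = ("KUGEL_LICENSE_KEY", "KUGEL_INSTANCE_ID")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "prod-dc1   # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(text: str) -> list:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    env = parse_env(text)
    return [k for k in REQUIRED_KEYS if not env.get(k) or "<" in env[k]]
```

Calling missing_required(open(".env.selfhosted").read()) returns an empty list when both required values are set.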

2. Start the container

docker compose -f docker-compose.selfhosted.yml up -d

On first start the container will:

Activate your license key against the KugelAudio license server
Download the encrypted model weights (~5 GB) and store them in the weights_cache volume
Decrypt the weights into GPU memory and run warmup batches
Begin serving requests on port 8000

Startup takes 2–4 minutes on first run (weight download). Subsequent restarts load from the local cache and are ready in ~90 seconds.

3. Verify it is healthy

curl http://localhost:8000/health
# {"status":"healthy"}

curl http://localhost:8000/v1/models
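
For scripted deployments, polling the readiness endpoint may be more convenient than checking by hand. The sketch below assumes GET /ready (listed under API Compatibility) returns 200 once warmup is complete and 503 before that. The function names, the 300-second timeout, and the 5-second interval are illustrative choices, not documented values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str = "http://localhost:8000/ready") -> bool:
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused and the 503 returned during warmup.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_ready,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing the probe and sleep functions as parameters keeps the loop easy to exercise without a running container.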

4. Generate speech

curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from KugelAudio self-hosted!",
    "voice_id": "af_heart",
    "model_id": "kugel-1-turbo"
  }' \
  --output hello.pcm

The response is raw 16-bit signed PCM at 24 kHz (little-endian). Pass "format": "wav" in the request body to receive a WAV file instead.
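
Raw PCM has no header, so most players cannot open hello.pcm directly. If you already have PCM output, it can be wrapped in a WAV container locally. This is a minimal sketch using Python's standard wave module; it assumes the audio is mono, which the text above does not state, and the function name is illustrative.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24_000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)   # assumption: mono output
        wav.setsampwidth(2)          # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For example, open("hello.wav", "wb").write(pcm_to_wav_bytes(open("hello.pcm", "rb").read())) produces a playable file from the Quick Start output.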

Configuration

All configuration is done via environment variables. Set them in .env.selfhosted or pass them with docker run -e.

Required

Variable	Description
`KUGEL_LICENSE_KEY`	License key provided by KugelAudio
`KUGEL_INSTANCE_ID`	Unique identifier for this deployment (e.g. `prod-eu-1`)

Model selection

Variable	Default	Description
`KUGELAUDIO_DEPLOY_MODELS`	`1.5b`	Model variant: `1.5b` (faster, ~6 GB VRAM) or `7b` (higher quality, ~18 GB VRAM)
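
As a rough illustration of how the VRAM figures in the table could guide the choice, the sketch below picks the largest variant that fits the free GPU memory. The 2 GB headroom and the function name are hypothetical, and the VRAM numbers are the approximate values quoted above rather than measured requirements.

```python
from typing import Optional

# Approximate VRAM needs in GB, taken from the table above.
VARIANT_VRAM_GB = {"1.5b": 6, "7b": 18}


def pick_variant(free_vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest variant that fits, or None if neither does.

    headroom_gb is a hypothetical safety margin, not a documented value.
    """
    fitting = [
        name for name, need in VARIANT_VRAM_GB.items()
        if free_vram_gb >= need + headroom_gb
    ]
    # Prefer the variant with the larger footprint (higher quality).
    return max(fitting, key=VARIANT_VRAM_GB.get, default=None)
```

The returned string can be used as the value of KUGELAUDIO_DEPLOY_MODELS.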

Performance

Variable	Default	Description
`KUGELAUDIO_OPTIMIZATION`	`continuous_compiled_cudagraph`	Optimization level — leave at default for best throughput
`KUGELAUDIO_DDPM_STEPS`	`10`	Diffusion steps. Lower = faster but slightly lower quality (min `4`)

Storage

Variable	Default	Description
`KUGEL_WEIGHTS_CACHE_DIR`	(set in compose file)	Path inside the container where encrypted weights are cached. Mapped to the `weights_cache` Docker volume in the compose file. Change it only if you also update the volume mount to match

Observability

Variable	Default	Description
`SENTRY_DSN`	(empty)	Optional Sentry DSN for error reporting

API Compatibility

The self-hosted container exposes the same HTTP API as the KugelAudio cloud service.

Endpoint	Description
`GET /health`	Liveness check
`GET /ready`	Readiness check (returns 503 until warmup is complete)
`GET /v1/models`	List available models
`GET /v1/voices`	List available voices
`POST /v1/tts/generate`	Generate speech (streaming or non-streaming)
`POST /11labs/v1/text-to-speech/{voice_id}`	ElevenLabs-compatible endpoint
`WS /ws/tts`	WebSocket streaming

You can point any KugelAudio SDK at your self-hosted instance by overriding the base URL:

Python:

from kugelaudio import KugelAudio

client = KugelAudio(
    api_key="not-required-for-self-hosted",
    base_url="http://your-host:8000",
)

JavaScript:

import { KugelAudio } from "kugelaudio";

const client = new KugelAudio({
  apiKey: "not-required-for-self-hosted",
  baseUrl: "http://your-host:8000",
});

cURL:

# Point requests at your self-hosted instance
curl -X POST http://your-host:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from self-hosted KugelAudio!",
    "model_id": "kugel-1-turbo"
  }' \
  -o output.pcm
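
If you prefer not to install an SDK, the same request can be built with Python's standard library. This sketch mirrors the cURL call above; build_tts_request is an illustrative helper, and only the fields shown in this document (text, model_id, voice_id, format) are known to be accepted.

```python
import json
import urllib.request


def build_tts_request(base_url: str, text: str, model_id: str = "kugel-1-turbo",
                      **extra) -> urllib.request.Request:
    """Build a POST to /v1/tts/generate; extra fields (e.g. voice_id) go in the body."""
    body = {"text": text, "model_id": model_id, **extra}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/tts/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req).read() returns the audio bytes in the format described under Generate speech.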

Voices

The self-hosted container has access to all voices your organisation is entitled to use — including both KugelAudio’s public voice library and any private voices you have created. Voices are fetched from the KugelAudio license server using your license key. The container does not need storage credentials or database access; audio reference files are delivered as short-lived pre-signed URLs. This means:

No Supabase or S3 credentials are required in the container.
Voice metadata and audio URLs are refreshed on demand.
Newly added or cloned voices are available immediately without restarting the container.

Listing voices

curl http://localhost:8000/v1/voices

Example response:

{
  "voices": [
    {
      "id": 1,
      "name": "Sarah",
      "category": "cloned",
      "sex": "female",
      "supported_languages": ["en"],
      "sample_url": "https://..."
    }
  ]
}

Using a voice

Pass the numeric id from the listing as voice_id in any synthesis request:

curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice_id": 1}' \
  --output out.pcm
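
To look up an id programmatically, the listing can be filtered before building a synthesis request. The sketch below assumes the response has the shape of the example above (a "voices" array with id, name, and supported_languages); the function name and filter parameters are illustrative.

```python
from typing import Optional


def find_voice_ids(listing: dict, language: Optional[str] = None,
                   name: Optional[str] = None) -> list:
    """Return ids of voices matching the optional language and name filters."""
    ids = []
    for voice in listing.get("voices", []):
        if language and language not in voice.get("supported_languages", []):
            continue
        if name and voice.get("name", "").lower() != name.lower():
            continue
        ids.append(voice["id"])
    return ids
```

The first returned id can then be passed as voice_id in the request body.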

Volumes

The compose file creates two named Docker volumes. Do not delete them.

Volume	Mount path	Contents
`license_state`	`/app/tts/.license`	Activation token and usage ledger — back this up; losing it forces re-activation
`weights_cache`	`/cache/weights`	Encrypted model weights (~5 GB) — re-downloaded automatically if missing

Upgrading

Pull the latest image and recreate the container. Volumes are preserved.

docker compose -f docker-compose.selfhosted.yml pull
docker compose -f docker-compose.selfhosted.yml up -d

Troubleshooting

Container exits immediately

Check the logs:

docker compose -f docker-compose.selfhosted.yml logs tts

Common causes:

KUGEL_LICENSE_KEY not set — the container refuses to start without a valid key.
GPU not accessible — run the nvidia-smi Docker check from the prerequisites section.
License already active on another instance — each license key is tied to one KUGEL_INSTANCE_ID. Use a different ID or contact support.

Container is unhealthy after 3 minutes

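The healthcheck allows 3 minutes for startup. On a first run, the weight download can take up to 4 minutes, so the container may briefly report unhealthy while it is still downloading. If it stays unhealthy: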

docker compose -f docker-compose.selfhosted.yml logs --tail 100 tts

Look for errors during model loading. The most common cause is insufficient GPU VRAM for the selected model variant.

Weights re-downloaded on every restart

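The weights_cache volume must be mounted and writable. Docker Compose normally prefixes volume names with the project name unless the compose file sets an explicit name, so list the volumes first to find the exact name, then check that the volume exists and is attached:

docker volume ls --filter name=weights_cache
docker volume inspect <project>_weights_cache
docker compose -f docker-compose.selfhosted.yml ps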

Port already in use

Change the host port in .env.selfhosted:

TTS_PORT=9000

Then restart the container.
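
To confirm that a port is occupied before choosing a new one, a quick TCP probe is enough. This is a minimal sketch using Python's socket module; port_is_free is an illustrative helper that only checks for a listener on the given host, by default the local loopback address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepts the connection.
        return sock.connect_ex((host, port)) != 0
```

For example, port_is_free(8000) returning False suggests another process already holds the default port.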
