Self-hosted migration: monolithic image to compose stack

KugelAudio self-hosted deployments previously ran as a single monolithic container. That image is retired. The new bundle is a Docker Compose stack: ingress, normalizer, and tts-turbo start by default, and tts-standard is available as an opt-in high-quality engine via the with-standard profile. The stack is defined in backend/docker-compose.selfhosted.yml. The migration is a one-time cutover: the old container and the new stack cannot run on the same host at the same time on the default ports.
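Because the old container and the new stack contend for the same default ports, checking that nothing is still listening on :8000 before step 7 can save a confusing startup failure. The following is a minimal sketch, assuming bash (it relies on bash's /dev/tcp pseudo-device) and the coreutils timeout command; port_in_use is a hypothetical helper name, not part of the bundle.

```shell
#!/usr/bin/env bash
# port_in_use [HOST] [PORT]: succeed if something accepts TCP connections there.
# Hypothetical helper. Relies on bash's /dev/tcp pseudo-device; a refused or
# timed-out connect is treated as "free".
port_in_use() {
  local host="${1:-127.0.0.1}" port="${2:-8000}"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, port_in_use 127.0.0.1 8000 && echo "port 8000 is still taken; stop the old container first". Asking Docker directly with docker ps --filter publish=8000 is an alternative that only sees containers, not other host processes.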

What’s changing

| Old | New |
|---|---|
| kugelaudio/kugelaudio-tts-selfhosted:<version> | kugelaudio/ingress + kugelaudio/normalizer + kugelaudio/tts |
| Single container, single process tree | Multiple services orchestrated by Docker Compose |
| Ray Serve as the in-process router | gRPC between ingress and TTS engines |
| One KUGEL_* env block | Per-service env blocks; missing required values fail fast |
| One model cache mount | Per-engine cache volumes plus a shared voice store |
| Restart with docker restart <id> | docker compose -f <file> up -d / down |

The wire protocol on :8000 (HTTP and WebSocket) is unchanged. Existing SDK and client integrations continue to work without code changes.
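The old container's single KUGEL_* env block usually already holds the license key and instance ID that the new .env needs. Below is a minimal sketch for carrying them across, assuming the old container received them as environment variables (not through a mounted file) and still exists, so it has to run before step 2 removes it; extract_license_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# extract_license_env: read NAME=value lines on stdin and print only the two
# license variables the new .env needs. Hypothetical helper; fails if either
# variable is missing or has an empty value.
extract_license_env() {
  local lines
  lines="$(grep -E '^(KUGEL_LICENSE_KEY|KUGEL_INSTANCE_ID)=.+')" || return 1
  printf '%s\n' "$lines"
  # Both values are required by the new stack.
  printf '%s\n' "$lines" | grep -q '^KUGEL_LICENSE_KEY=' &&
    printf '%s\n' "$lines" | grep -q '^KUGEL_INSTANCE_ID='
}
```

For example: docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' kugelaudio-tts-selfhosted | extract_license_env >> .env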

Prerequisites

  • Docker Engine 24+ with the Compose plugin (verify with docker compose version).
  • For the TTS engines: an NVIDIA GPU host with the NVIDIA Container Toolkit installed.
  • KUGEL_LICENSE_KEY and KUGEL_INSTANCE_ID issued by KugelAudio.
  • A Hugging Face access token (HF_TOKEN) with read access to the KugelAudio model repositories.
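The prerequisites above can be checked up front on the target host. This is a minimal sketch, assuming bash; version_ge and preflight are hypothetical helper names, and looking for nvidia-ctk on PATH is only a heuristic for whether the NVIDIA Container Toolkit is installed.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for the prerequisites; not shipped with the bundle.

# version_ge A B: succeed when dotted version A >= B, comparing numeric fields left to right.
version_ge() {
  local -a a b
  local i x y
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x="${a[i]:-0}"; y="${b[i]:-0}"
    x="${x%%[!0-9]*}"; y="${y%%[!0-9]*}"   # drop suffixes such as "-ce"
    x=$((10#${x:-0})); y=$((10#${y:-0}))   # force base 10 so "08" is not read as octal
    if ((x > y)); then return 0; fi
    if ((x < y)); then return 1; fi
  done
  return 0
}

# preflight: report missing prerequisites; returns non-zero if a hard requirement fails.
preflight() {
  local rc=0 engine
  engine="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || engine=""
  if ! version_ge "${engine:-0}" 24; then
    echo "Docker Engine 24+ required (found: ${engine:-none})" >&2
    rc=1
  fi
  if ! docker compose version >/dev/null 2>&1; then
    echo "Docker Compose plugin not found" >&2
    rc=1
  fi
  # Heuristic only: nvidia-ctk is the NVIDIA Container Toolkit CLI.
  if ! command -v nvidia-ctk >/dev/null 2>&1; then
    echo "warning: nvidia-ctk not on PATH; is the NVIDIA Container Toolkit installed?" >&2
  fi
  return "$rc"
}
```

Calling preflight prints one line per problem and returns non-zero only for the Docker Engine and Compose checks, since a missing toolkit binary is not conclusive.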

Migration steps

  1. Back up customer state.
    • Voice references uploaded to the old container’s voice store.
    • Any local config or .env files mounted into the old container.
    • License key and instance ID — required by both stacks.
  2. Stop the old container.
    docker stop kugelaudio-tts-selfhosted
    docker rm   kugelaudio-tts-selfhosted
    
  3. Fetch the new compose file. It lives in the repo at backend/docker-compose.selfhosted.yml. Copy it to a working directory on the host, e.g. /opt/kugelaudio/.
  4. Resolve image tags. The compose file ships with :TBD-<service> placeholders. Replace each one with the published per-service tag you were given by KugelAudio support, for example:
    image: kugelaudio/ingress:1.0.0
    image: kugelaudio/normalizer:1.0.0
    image: kugelaudio/tts:1.0.0
    
  5. Create the .env file next to the compose file. Minimum:
    KUGEL_LICENSE_KEY=<your license key>
    KUGEL_INSTANCE_ID=<your instance id>
    HF_TOKEN=<your hugging face token>
    TTS_MASTER_API_KEY=<a strong random shared secret>
    
    Optional knobs (license-server URL for support-directed staging/private deployments, Sentry DSN, neural TN, port overrides) are documented inline at the top of the compose file.
  6. Restore the voice store. Create the named volume that the compose file mounts as the voice store and copy the previous voice references into /data/voices inside it. The TTS services (tts-turbo, plus tts-standard if enabled) mount this same volume read-write.
  7. Bring up the stack.
    docker compose -f docker-compose.selfhosted.yml up -d
    docker compose -f docker-compose.selfhosted.yml ps
    
    The ingress service waits for normalizer and tts-turbo to become healthy before accepting traffic, so the first start can take several minutes while the TTS engine downloads model weights. On multi-GPU hosts, pass --profile with-standard before the subcommand (docker compose -f docker-compose.selfhosted.yml --profile with-standard up -d) to also start the optional tts-standard engine.
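Because the stack fails fast on missing required values, it can be worth validating the two files edited in steps 4 and 5 before running up. This is a minimal sketch, assuming the .env holds plain KEY=value lines and that unresolved tags look like :TBD-<service> as described in step 4; check_env_file and check_no_placeholders are hypothetical helper names.

```shell
#!/usr/bin/env bash
# check_env_file FILE: every required key must be present with a real value.
# Hypothetical helper. A value starting with "<" or whitespace is treated as an
# unfilled template entry such as "<your license key>".
check_env_file() {
  local file="$1" key rc=0
  for key in KUGEL_LICENSE_KEY KUGEL_INSTANCE_ID HF_TOKEN TTS_MASTER_API_KEY; do
    if ! grep -Eq "^${key}=[^<[:space:]]" "$file"; then
      echo "missing, empty, or placeholder value: ${key}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# check_no_placeholders FILE: fail while any :TBD-<service> image tag remains.
check_no_placeholders() {
  if grep -n ':TBD-' "$1" >&2; then
    echo "unresolved image tags in $1 (see step 4)" >&2
    return 1
  fi
  return 0
}
```

For example: check_no_placeholders docker-compose.selfhosted.yml && check_env_file .env && docker compose -f docker-compose.selfhosted.yml config --quiet. The last command asks Compose itself to validate the resolved file without printing it.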

How to verify

Run each of these against the freshly started stack:
  1. Ingress health:
    curl -fsS http://localhost:8000/health
    
    Expect a 200 response.
  2. Per-service health:
    docker compose -f docker-compose.selfhosted.yml ps
    
    The default services should report healthy. If a TTS engine is still starting after 5 minutes, check docker compose -f docker-compose.selfhosted.yml logs tts-turbo; it is most likely still downloading model weights.
  3. End-to-end TTS request:
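    # TTS_MASTER_API_KEY is defined in .env, not in your shell, so export it first
    export TTS_MASTER_API_KEY='<same value as in .env>'
    curl -fsS -X POST http://localhost:8000/v1/tts/generate \
         -H "Authorization: Bearer $TTS_MASTER_API_KEY" \
         -H "Content-Type: application/json; charset=utf-8" \
         -d '{"text": "Hallo Welt", "model_id": "kugel-2-turbo", "voice_id": 1071, "language": "de"}' \
         --output /tmp/hello.pcm
    # raw PCM has no header, so the file command would only report "data"; check the size instead
    wc -c /tmp/hello.pcm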
    
    You should get a non-empty audio/pcm response. If the request returns 502, check the ingress logs (docker compose -f docker-compose.selfhosted.yml logs ingress) for the upstream error from tts-turbo.
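For scripted cutovers, for example in CI or a maintenance runbook, the ingress health check can be turned into a polling loop that tolerates the multi-minute first start while weights download. This is a minimal sketch, assuming bash and curl; wait_for_health and its default 300-second budget (matching the 5-minute guidance in the per-service health check) are illustrative choices, not part of the bundle.

```shell
#!/usr/bin/env bash
# wait_for_health URL [TIMEOUT_S] [INTERVAL_S]: poll URL until curl gets a
# non-error HTTP status (curl -f treats 4xx/5xx as failure) or the timeout elapses.
# Hypothetical helper.
wait_for_health() {
  local url="$1" timeout="${2:-300}" interval="${3:-5}"
  local start=$SECONDS
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if ((SECONDS - start >= timeout)); then
      echo "no healthy response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "healthy: $url"
}
```

For example: wait_for_health http://localhost:8000/health 600 && docker compose -f docker-compose.selfhosted.yml ps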

Rollback

If the new stack misbehaves, the old monolithic image is still pullable from Docker Hub until the deprecation window closes. To roll back:
docker compose -f docker-compose.selfhosted.yml down
docker run -d --name kugelaudio-tts-selfhosted \
  --gpus all \
  -e KUGEL_LICENSE_KEY \
  -e KUGEL_INSTANCE_ID \
  -p 8000:8000 \
  kugelaudio/kugelaudio-tts-selfhosted:<your previous version>
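The -e flags without values pass KUGEL_LICENSE_KEY and KUGEL_INSTANCE_ID through from your current shell, so export both before running the command, and add back any volume mounts (voice store, model cache, config) your original run command used. Keep the previous-version tag handy in your runbook until the new stack has been verified through at least one production cycle.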
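One way to have the previous tag on hand is to read it from the old container before step 2 removes it. This is a minimal sketch, assuming the container was started from a name:tag reference rather than an @sha256 digest; image_tag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# image_tag REF: print the tag of an image reference such as repo/name:1.2.3.
# Hypothetical helper. Assumes name:tag references (no @sha256 digests);
# an untagged reference implies "latest".
image_tag() {
  local last="${1##*/}"   # drop registry and namespace, which may contain ':' (ports)
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   printf 'latest\n' ;;
  esac
}
```

For example, before docker rm: image_tag "$(docker inspect --format '{{.Config.Image}}' kugelaudio-tts-selfhosted)", then record the printed tag in your runbook for the rollback command.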

Internal references

  • Teardown plan: .claude/plans/rayserve-teardown.md
  • Compose file: backend/docker-compose.selfhosted.yml
  • Per-service Dockerfiles: backend/ingress/Dockerfile, backend/normalizer/Dockerfile, backend/tts/Dockerfile