Self-hosted migration: monolithic image to compose stack

KugelAudio self-hosted deployments previously ran as a single monolithic container. That image is retired. The new bundle is a Docker Compose stack defined in `backend/docker-compose.selfhosted.yml`: `ingress`, `normalizer`, and `tts-turbo` start by default, and `tts-standard` is available as an opt-in high-quality engine via the `with-standard` profile. The migration is a one-time cutover: the old container and the new stack cannot run on the same host at the same time on the default ports.

What’s changing

| Old | New |
| --- | --- |
| `kugelaudio/kugelaudio-tts-selfhosted:<version>` | `kugelaudio/ingress` + `kugelaudio/normalizer` + `kugelaudio/tts` |
| Single container, single process tree | Compose services orchestrated by Docker Compose |
| Ray Serve as the in-process router | gRPC between ingress and TTS engines |
| One `KUGEL_*` env block | Per-service env block, required values fail-fast |
| One model cache mount | Per-engine cache volumes plus a shared voice store |
| Restart by `docker restart <id>` | `docker compose -f <file> up -d` / `docker compose -f <file> down` |

The wire protocol on :8000 (HTTP and WebSocket) is unchanged. Existing SDK and client integrations continue to work without code changes.

Prerequisites

- Docker Engine 24+ with the Compose plugin (verify with `docker compose version`).
- For the TTS engines: an NVIDIA GPU host with the NVIDIA Container Toolkit installed.
- `KUGEL_LICENSE_KEY` and `KUGEL_INSTANCE_ID` issued by KugelAudio.
- A Hugging Face access token (`HF_TOKEN`) with read access to the KugelAudio model repositories.

Migration steps

Back up customer state.
- Voice references uploaded to the old container’s voice store.
- Any local config or .env files mounted into the old container.
- License key and instance ID, which both stacks require.
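
The backup can be scripted roughly as follows. This is a sketch under assumptions, not a supported tool: `backup_old_container` is an illustrative helper name, the container name is taken from the stop command in the next step, and `OLD_VOICE_DIR` is a hypothetical placeholder for wherever your old container kept its voice references. The mount listing the helper writes to `mounts.json` should help locate the real path.

```shell
# Sketch: save rollback metadata and voice references from the old container.
# Run this BEFORE removing the container. The voice-store path is an
# assumption you must supply; it is not defined by this guide.
backup_old_container() {
  container="$1"      # e.g. kugelaudio-tts-selfhosted
  old_voice_dir="$2"  # voice-store path inside the old container (assumption)
  dest="$3"           # backup directory on the host

  mkdir -p "$dest/voices" || return 1
  # Record the exact image reference in case a rollback is needed.
  docker inspect --format '{{.Config.Image}}' "$container" > "$dest/image.txt" || return 1
  # Record the mounts so voice-store and config paths can be located.
  docker inspect --format '{{json .Mounts}}' "$container" > "$dest/mounts.json" || return 1
  # Copy the directory contents out; docker cp also works on stopped containers.
  docker cp "$container:$old_voice_dir/." "$dest/voices"
}

# Example (set OLD_VOICE_DIR to the real path first):
# backup_old_container kugelaudio-tts-selfhosted "$OLD_VOICE_DIR" /opt/kugelaudio/backup
```

Saving the image reference alongside the voices keeps everything a rollback needs in one directory.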

Stop the old container.

```
docker stop kugelaudio-tts-selfhosted
docker rm kugelaudio-tts-selfhosted
```

Fetch the new compose file. It lives in the repo at backend/docker-compose.selfhosted.yml. Copy it to a working directory on the host, e.g. /opt/kugelaudio/.
Resolve image tags. The compose file ships with :TBD-<service> placeholders. Replace each one with the published per-service tag you were given by KugelAudio support, for example:
```
image: kugelaudio/ingress:1.0.0
image: kugelaudio/normalizer:1.0.0
image: kugelaudio/tts:1.0.0
```
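
Before starting the stack, it may be worth confirming that no placeholder survived the edit. The helper below is a minimal sketch: `check_image_tags` is an illustrative name, and it only assumes that every unresolved tag still contains the literal string `TBD-`, as described above.

```shell
# Sketch: fail if any "TBD-" placeholder is still present in the compose file.
check_image_tags() {
  compose_file="$1"
  if [ ! -r "$compose_file" ]; then
    echo "cannot read $compose_file" >&2
    return 2
  fi
  # grep -n prints each offending line with its line number.
  if grep -n 'TBD-' "$compose_file"; then
    echo "unresolved image tags in $compose_file" >&2
    return 1
  fi
  echo "all image tags resolved"
}

# Example:
# check_image_tags docker-compose.selfhosted.yml
```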
Create the .env file next to the compose file. Minimum:
```
KUGEL_LICENSE_KEY=<your license key>
KUGEL_INSTANCE_ID=<your instance id>
HF_TOKEN=<your hugging face token>
TTS_MASTER_API_KEY=<a strong random shared secret>
```
Optional knobs (license-server URL for support-directed staging/private deployments, Sentry DSN, neural TN, port overrides) are documented inline at the top of the compose file.
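
One way to produce the shared secret for `TTS_MASTER_API_KEY` is sketched below. `gen_secret` is an illustrative helper name, and the 32-byte length is an assumption rather than a documented requirement; `openssl rand -hex 32` is an equivalent alternative where OpenSSL is installed.

```shell
# Sketch: print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  # od dumps the bytes as hex pairs; tr strips the spaces and newlines.
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Example: append the secret to .env and restrict the file to its owner.
# printf 'TTS_MASTER_API_KEY=%s\n' "$(gen_secret)" >> .env
# chmod 600 .env
```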
Restore the voice store. Create a named volume and copy the previous voice references into /data/voices inside it. Both TTS services mount this same volume read-write.
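
The restore step could look like the sketch below. Several names here are assumptions: `restore_voice_store` is an illustrative helper, the volume name in the example is hypothetical (use the name declared under `volumes:` in the compose file, and note that Compose prefixes volume names with the project name unless the volume sets an explicit `name:` or is marked external), and `busybox` is just one small image that provides `cp`. It also assumes the volume itself is mounted at `/data/voices` in the TTS services.

```shell
# Sketch: copy backed-up voice references into a named Docker volume.
restore_voice_store() {
  volume="$1"      # named volume the TTS services mount (name is an assumption)
  backup_dir="$2"  # absolute host path holding the backed-up voice references

  docker volume create "$volume" >/dev/null || return 1
  # A throwaway container mounts both locations and copies the files across.
  docker run --rm \
    -v "$volume:/data/voices" \
    -v "$backup_dir:/backup:ro" \
    busybox cp -a /backup/. /data/voices/
}

# Example (hypothetical volume name):
# restore_voice_store kugelaudio_voices /opt/kugelaudio/backup/voices
```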
Bring up the stack.
```
docker compose -f docker-compose.selfhosted.yml up -d
docker compose -f docker-compose.selfhosted.yml ps
```
The ingress service waits on normalizer and tts-turbo to be healthy before accepting traffic, so the first start can take several minutes while the TTS engine downloads model weights. On multi-GPU hosts, add --profile with-standard to start the optional tts-standard engine.
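
For reference, the profile flag goes between `docker compose` and the subcommand. The wrapper below is a small sketch (`start_stack` is an illustrative name) that starts the default services, or the default services plus `tts-standard` when a profile name is passed.

```shell
# Sketch: start the stack; pass "with-standard" to include tts-standard.
start_stack() {
  compose_file="docker-compose.selfhosted.yml"
  if [ -n "${1:-}" ]; then
    # --profile is a top-level compose option and must precede "up".
    docker compose -f "$compose_file" --profile "$1" up -d
  else
    docker compose -f "$compose_file" up -d
  fi
}

# Examples:
# start_stack                 # ingress, normalizer, tts-turbo
# start_stack with-standard   # additionally starts tts-standard
```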

How to verify

Run each of these against the freshly-started stack:

Ingress health:
```
curl -fsS http://localhost:8000/health
```
Expect a 200 response.
Per-service health:
```
docker compose -f docker-compose.selfhosted.yml ps
```
The default services should report healthy. If a TTS engine is still starting after 5 minutes, check `docker compose -f docker-compose.selfhosted.yml logs tts-turbo`; it is most likely still pulling weights.
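
Because the first start can take a while, polling the ingress health endpoint may be more convenient than re-running the check by hand. The loop below is a sketch: `wait_for_health` is an illustrative name, and the 600-second deadline and 5-second interval are assumed defaults, not values prescribed by KugelAudio.

```shell
# Sketch: poll the ingress health endpoint until it answers or a deadline passes.
wait_for_health() {
  url="${1:-http://localhost:8000/health}"
  timeout_s="${2:-600}"  # assumed overall deadline in seconds
  interval_s="${3:-5}"   # assumed pause between attempts (must be > 0)
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s keeps the polling quiet.
    if curl -fs -o /dev/null "$url"; then
      echo "ingress healthy after ~${elapsed}s"
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "ingress not healthy after ${timeout_s}s" >&2
  return 1
}

# Example:
# wait_for_health http://localhost:8000/health 600 5
```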

End-to-end TTS request:

```
curl -fsS -X POST http://localhost:8000/v1/tts/generate \
     -H "Authorization: Bearer $TTS_MASTER_API_KEY" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d '{"text": "Hallo Welt", "model_id": "kugel-2-turbo", "voice_id": 1071, "language": "de"}' \
     --output /tmp/hello.pcm
file /tmp/hello.pcm
```

You should get a non-empty audio/pcm response. If the request returns a 502, check the ingress logs for the upstream error from tts-turbo.
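
The "non-empty" check can be made explicit with a few lines of shell. This is a sketch with an illustrative helper name (`check_audio_file`); it only verifies that the output file exists and has a non-zero size, not that the audio is valid.

```shell
# Sketch: confirm the smoke-test output exists and is non-empty.
check_audio_file() {
  f="${1:-/tmp/hello.pcm}"
  if [ -s "$f" ]; then
    # wc -c counts bytes; tr strips the padding some wc variants add.
    echo "ok: $f is $(wc -c < "$f" | tr -d ' ') bytes"
  else
    echo "empty or missing: $f" >&2
    return 1
  fi
}

# Example:
# check_audio_file /tmp/hello.pcm
```

Raw PCM carries no header, so `file` will typically describe the output only as `data`; that alone does not indicate a problem.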

Rollback

If the new stack misbehaves, the old monolithic image is still pullable from Docker Hub until the deprecation window closes. To roll back:

```
docker compose -f docker-compose.selfhosted.yml down
docker run -d --name kugelaudio-tts-selfhosted \
  --gpus all \
  -e KUGEL_LICENSE_KEY \
  -e KUGEL_INSTANCE_ID \
  -p 8000:8000 \
  kugelaudio/kugelaudio-tts-selfhosted:<your previous version>
```

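Keep the previous-version tag in your runbook until the new stack has been verified through at least one production cycle. The `-e KUGEL_LICENSE_KEY` and `-e KUGEL_INSTANCE_ID` flags carry no value, so Docker passes each variable through from the current shell; export both before running the command.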

Internal references

Teardown plan: .claude/plans/rayserve-teardown.md
Compose file: backend/docker-compose.selfhosted.yml
Per-service Dockerfiles: backend/ingress/Dockerfile, backend/normalizer/Dockerfile, backend/tts/Dockerfile
