KugelAudio has a built-in Vapi custom TTS endpoint — no proxy server needed. Just point your Vapi assistant at our API and you’re done.
## Setup

### 1. Get your KugelAudio API key and a voice ID
### 2. Configure your Vapi assistant

Custom TTS must be configured through the Vapi API; the Vapi dashboard doesn't yet support custom voice providers. Use `PATCH /assistant/{id}` to update an existing assistant, or include the `voice` field when creating a new one with `POST /assistant`:
```json
{
  "voice": {
    "provider": "custom-voice",
    "server": {
      "url": "https://api.kugelaudio.com/vapi/synthesize?voice_id=YOUR_VOICE_ID",
      "secret": "YOUR_KUGELAUDIO_API_KEY",
      "timeoutSeconds": 30
    }
  }
}
```
- `secret` — your KugelAudio API key. Vapi sends it as the `x-vapi-secret` header on every request; KugelAudio authenticates it automatically.
- `voice_id` — the numeric voice ID from Step 1.
That’s it. No code, no extra server, no proxy.
To select a specific model, append `&model_id=kugel-1-turbo` (the default) or `&model_id=kugel-1` to the URL.
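If you prefer to script the update rather than call the API by hand, the PATCH request can be sketched as below. The helper names are ours, and the `https://api.vapi.ai` base URL and Bearer authentication follow Vapi's public REST API — verify both against Vapi's current docs:

```typescript
// Builds the PATCH body for Vapi's /assistant/{id} endpoint.
// The voice config shape matches the JSON shown above.
function buildVoiceConfig(voiceId: string, kugelApiKey: string) {
  return {
    voice: {
      provider: 'custom-voice',
      server: {
        url: `https://api.kugelaudio.com/vapi/synthesize?voice_id=${voiceId}`,
        secret: kugelApiKey,
        timeoutSeconds: 30,
      },
    },
  };
}

// Applies the config via Vapi's REST API (Bearer auth assumed).
async function updateAssistantVoice(
  assistantId: string,
  vapiApiKey: string,
  body: object,
): Promise<void> {
  const res = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${vapiApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Vapi update failed: ${res.status}`);
}
```

Call it once per assistant, e.g. `updateAssistantVoice(id, vapiKey, buildVoiceConfig('42', kugelKey))`.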
## How it works
Vapi sends one `POST` per phrase to `/vapi/synthesize`:
```json
{
  "message": {
    "type": "voice-request",
    "text": "Hello, how can I help you today?",
    "sampleRate": 24000
  }
}
```
KugelAudio streams back raw PCM16 at the requested sample rate — exactly what Vapi expects.
| Parameter | Value |
|---|---|
| Format | Raw PCM (no WAV/container header) |
| Bit depth | 16-bit signed, little-endian |
| Channels | 1 (mono) |
| Sample rate | Matches `message.sampleRate` from Vapi's request |
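Because the stream is headerless PCM16 mono, audio duration can be derived from byte count alone — a useful sanity check when debugging the response. A minimal sketch (the helper name is ours):

```typescript
// Raw PCM16 mono: 2 bytes per sample, 1 channel, so
// duration (seconds) = byteLength / (sampleRate * 2).
function pcmDurationSeconds(byteLength: number, sampleRate: number): number {
  const bytesPerSample = 2; // 16-bit
  return byteLength / (sampleRate * bytesPerSample);
}
```

For example, 48,000 bytes at 24 kHz is exactly one second of audio.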
## Custom server (optional)
If you need custom logic — language routing, per-user voices, SSML preprocessing — you can run a lightweight proxy using our JS SDK.
### Prerequisites
- Node.js 18+
- KugelAudio API key
- `kugelaudio` npm package (`npm install kugelaudio`)
### 1. Create the TTS endpoint
```typescript
import express from 'express';
import { KugelAudio } from 'kugelaudio';

const app = express();
app.use(express.json());

// Pre-connect at startup — eliminates ~300-500ms cold start on the first phrase.
const client = await KugelAudio.create({
  apiKey: process.env.KUGELAUDIO_API_KEY!,
});

app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;

  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');

  // toReadable() returns a Node.js Readable before any audio arrives,
  // so pipe() can be called immediately without a race condition.
  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-1-turbo',
    sampleRate, // must match what Vapi requests (typically 16000 or 24000 Hz)
    language: 'en',
  });

  readable.pipe(res);
});

app.listen(3000);
```
### 2. Point your assistant at your server

Set the voice override in your assistant configuration to point at your endpoint:
```json
{
  "voice": {
    "provider": "custom-voice",
    "server": {
      "url": "https://your-server.example.com/synthesize",
      "secret": "your-webhook-secret",
      "timeoutSeconds": 30
    }
  }
}
```
### Pre-connecting for lowest latency
`KugelAudio.create()` (or `await client.connect()`) pre-establishes the WebSocket connection at startup. Without this, the first phrase of every new server process incurs ~300-500ms extra latency while the WebSocket handshake completes.
```typescript
// At app startup — WebSocket connects here
const client = await KugelAudio.create({ apiKey: process.env.KUGELAUDIO_API_KEY! });

// All subsequent toReadable() calls are fast (~100-150ms TTFA)
```
Because Vapi sends one HTTP request per phrase, each request reuses the pooled WebSocket connection automatically. This keeps latency consistent across all phrases, not just the first one.
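If the pooled WebSocket does drop between phrases (network blip, server idle timeout), you can guard each synthesis with a small retry wrapper. This is a generic sketch of ours, not part of the SDK; it assumes connection failures surface as rejected promises and that `client.connect()` re-establishes the socket:

```typescript
// Runs `op`; on the first failure, runs `recover` (e.g. () => client.connect())
// and retries `op` exactly once. Any second failure propagates to the caller.
async function withReconnect<T>(
  op: () => Promise<T>,
  recover: () => Promise<void>,
): Promise<T> {
  try {
    return await op();
  } catch {
    await recover();
    return await op();
  }
}
```

In a handler you might write `await withReconnect(() => synthesize(text), () => client.connect())`, where `synthesize` is whatever promise-returning wrapper you put around the stream.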
### Error handling
Handle synthesis errors to prevent Vapi from receiving a broken response:
```typescript
app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;

  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');

  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-1-turbo',
    sampleRate,
    language: 'en',
  });

  readable.on('error', (err) => {
    console.error('TTS synthesis error:', err);
    if (!res.headersSent) {
      res.status(500).end();
    } else {
      res.destroy(err);
    }
  });

  readable.pipe(res);
});
```
### Webhook authentication
Validate the `x-vapi-secret` header to ensure requests come from Vapi:
```typescript
app.post('/synthesize', (req, res) => {
  const secret = req.headers['x-vapi-secret'];
  if (secret !== process.env.VAPI_WEBHOOK_SECRET) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  // ... rest of handler
});
```
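A plain `!==` comparison leaks timing information about the secret. Node's built-in `crypto.timingSafeEqual` gives a constant-time check; a hardened variant of the guard above (the helper name is ours, the API is standard Node):

```typescript
import { timingSafeEqual } from 'node:crypto';

// Constant-time secret comparison. The length check comes first because
// timingSafeEqual throws when the buffers differ in length.
function secretsMatch(provided: string | string[] | undefined, expected: string): boolean {
  if (typeof provided !== 'string') return false;
  const a = Buffer.from(provided);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

In the handler, replace the `!==` check with `if (!secretsMatch(req.headers['x-vapi-secret'], process.env.VAPI_WEBHOOK_SECRET!))`.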
### Why `toReadable()` instead of manual `onChunk`?
A common pattern users try is:
```typescript
// ❌ Prone to race conditions — avoid this pattern
import { Readable } from 'node:stream';

const readable = new Readable({ read() {} });

client.tts.stream(options, {
  onChunk: (chunk) => readable.push(Buffer.from(chunk.audio, 'base64')),
  onFinal: () => readable.push(null),
});

readable.pipe(res);
```
The problem: `stream()` is async, so with very short texts the Readable may receive `push(null)` before `pipe()` is attached. `toReadable()` creates the stream synchronously and starts streaming in the background, guaranteeing the stream is ready before any data arrives.
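The same principle can be demonstrated with a plain Node `Readable`, independent of the SDK: create the stream synchronously, defer all pushes to a later tick, and any consumer attached in between sees every chunk. A generic illustration (the function is ours):

```typescript
import { Readable } from 'node:stream';

// Returns a Readable synchronously, so callers can attach pipe()
// or listeners before any data exists.
function makeStream(chunks: string[]): Readable {
  const readable = new Readable({ read() {} });

  // Data only starts flowing on a later tick, after the caller
  // has had a chance to wire up consumers.
  setImmediate(() => {
    for (const c of chunks) readable.push(Buffer.from(c));
    readable.push(null); // end of stream
  });

  return readable;
}
```

This mirrors what `toReadable()` does for TTS audio: the stream object exists immediately, and the asynchronous producer fills it afterwards.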