KugelAudio has a built-in Vapi custom TTS endpoint — no proxy server needed. Just point your Vapi assistant at our API and you’re done.

Setup

1. Get your KugelAudio API key and a voice ID

2. Configure your Vapi assistant

Custom TTS must be configured via the Vapi API — the Vapi dashboard doesn’t support custom voice providers yet. Use PATCH /assistant/{id} to update an existing assistant, or include the voice field when creating a new one with POST /assistant:
{
  "voice": {
    "provider": "custom-voice",
    "server": {
      "url": "https://api.kugelaudio.com/vapi/synthesize?voice_id=YOUR_VOICE_ID",
      "secret": "YOUR_KUGELAUDIO_API_KEY",
      "timeoutSeconds": 30
    }
  }
}
  • secret — your KugelAudio API key. Vapi sends it as x-vapi-secret on every request; KugelAudio authenticates it automatically.
  • voice_id — the numeric voice ID from Step 1.
That’s it. No code, no extra server, no proxy.
To select a specific model, add &model_id=kugel-1-turbo (default) or &model_id=kugel-1 to the URL.
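For example, the same voice config from above pinned to the non-turbo model only changes the server URL:

```json
{
  "voice": {
    "provider": "custom-voice",
    "server": {
      "url": "https://api.kugelaudio.com/vapi/synthesize?voice_id=YOUR_VOICE_ID&model_id=kugel-1",
      "secret": "YOUR_KUGELAUDIO_API_KEY",
      "timeoutSeconds": 30
    }
  }
}
```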

How it works

Vapi sends one POST per phrase to /vapi/synthesize:
{
  "message": {
    "type": "voice-request",
    "text": "Hello, how can I help you today?",
    "sampleRate": 24000
  }
}
KugelAudio streams back raw PCM16 at the requested sample rate — exactly what Vapi expects.
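If you ever terminate this request yourself (for instance in the optional proxy described below), the body can be validated before opening a synthesis stream. A minimal TypeScript sketch; the field names come from the example payload above, and the parseVoiceRequest helper is illustrative, not part of any SDK:

```typescript
// Shape of Vapi's custom-voice request, per the example payload above.
interface VoiceRequest {
  message: {
    type: 'voice-request';
    text: string;
    sampleRate: number;
  };
}

// Illustrative guard: reject anything that isn't a well-formed
// voice-request before doing any synthesis work.
function parseVoiceRequest(body: unknown): VoiceRequest['message'] {
  const msg = (body as Partial<VoiceRequest>)?.message;
  if (
    !msg ||
    msg.type !== 'voice-request' ||
    typeof msg.text !== 'string' ||
    typeof msg.sampleRate !== 'number'
  ) {
    throw new Error('unexpected payload: not a voice-request');
  }
  return msg;
}
```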

Audio format

  • Format: raw PCM (no WAV/container header)
  • Bit depth: 16-bit signed, little-endian
  • Channels: 1 (mono)
  • Sample rate: matches message.sampleRate from Vapi's request
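Because the stream is headerless PCM16 mono, playback duration follows directly from byte count: 2 bytes per sample, one channel. A quick sanity-check helper (illustrative, not part of the SDK):

```typescript
// Raw PCM16 mono: 2 bytes per sample, so
// duration_ms = (byteLength / 2) / sampleRate * 1000.
function pcm16DurationMs(byteLength: number, sampleRate: number): number {
  const samples = byteLength / 2; // 16-bit signed = 2 bytes per sample
  return (samples / sampleRate) * 1000;
}
```

Useful when debugging: if a 48,000-byte response at 24 kHz doesn't play for roughly one second, something upstream is mangling the stream.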

Custom server (optional)

If you need custom logic — language routing, per-user voices, SSML preprocessing — you can run a lightweight proxy using our JS SDK.

Prerequisites

  • Node.js 18+
  • KugelAudio API key
  • kugelaudio npm package (npm install kugelaudio)

1. Create the TTS endpoint

import express from 'express';
import { KugelAudio } from 'kugelaudio';

const app = express();
app.use(express.json());

// Pre-connect at startup — eliminates ~300-500ms cold start on the first phrase.
const client = await KugelAudio.create({
  apiKey: process.env.KUGELAUDIO_API_KEY!,
});

app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;

  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');

  // toReadable() returns a Node.js Readable before any audio arrives,
  // so pipe() can be called immediately without a race condition.
  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-1-turbo',
    sampleRate,   // must match what Vapi requests (typically 16000 or 24000 Hz)
    language: 'en',
  });

  readable.pipe(res);
});

app.listen(3000);

2. Configure Vapi

Set the voice override in your assistant configuration to point at your endpoint:
{
  "voice": {
    "provider": "custom-voice",
    "server": {
      "url": "https://your-server.example.com/synthesize",
      "secret": "your-webhook-secret",
      "timeoutSeconds": 30
    }
  }
}

Pre-connecting for lowest latency

KugelAudio.create() (or await client.connect()) pre-establishes the WebSocket connection at startup. Without this, the first phrase of every new server process incurs ~300-500ms extra latency while the WebSocket handshake completes.
// At app startup — WebSocket connects here
const client = await KugelAudio.create({ apiKey: process.env.KUGELAUDIO_API_KEY! });

// All subsequent toReadable() calls are fast (~100-150ms TTFA)
Because Vapi sends one HTTP request per phrase, each request reuses the pooled WebSocket connection automatically. This keeps latency consistent across all phrases, not just the first one.

Error handling

Handle synthesis errors to prevent Vapi from receiving a broken response:
app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;

  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');

  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-1-turbo',
    sampleRate,
    language: 'en',
  });

  readable.on('error', (err) => {
    console.error('TTS synthesis error:', err);
    if (!res.headersSent) {
      res.status(500).end();
    } else {
      res.destroy(err);
    }
  });

  readable.pipe(res);
});

Webhook authentication

Validate the x-vapi-secret header to ensure requests come from Vapi:
app.post('/synthesize', (req, res) => {
  const secret = req.headers['x-vapi-secret'];
  if (secret !== process.env.VAPI_WEBHOOK_SECRET) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // ... rest of handler
});
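A plain !== comparison is adequate for most deployments. If you want to rule out timing side channels as well, Node's crypto.timingSafeEqual can replace it once lengths are checked; the secretsMatch helper below is illustrative:

```typescript
import { timingSafeEqual } from 'node:crypto';

// Constant-time comparison so response timing doesn't leak how many
// leading characters of the secret an attacker has guessed correctly.
function secretsMatch(received: string, expected: string): boolean {
  const a = Buffer.from(received);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```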

Why toReadable() instead of manual onChunk?

A common pattern users try is:
// ❌ Prone to race conditions — avoid this pattern
const readable = new Readable({ read() {} });
client.tts.stream(options, {
  onChunk: (chunk) => readable.push(Buffer.from(chunk.audio, 'base64')),
  onFinal: () => readable.push(null),
});
readable.pipe(res);
The problem: stream() is async, so the Readable object may already receive push(null) before you attach a pipe() listener — especially if the text is very short. toReadable() creates the stream synchronously and starts streaming in the background, guaranteeing the stream is ready before any data arrives.