Basic Generation
Generate complete audio and receive it all at once:
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-3', // Canonical production model (see /models)
  voiceId: 1071, // Optional: specific voice ID
  cfgScale: 2.0, // Guidance scale (1.0-5.0)
  temperature: undefined, // Sampling variance 0.0-1.0; omit for server default (~0.5)
  maxNewTokens: 2048, // Maximum tokens to generate
  sampleRate: 24000, // Output sample rate
  normalize: true, // Enable text normalization (default)
  language: 'en', // Language for normalization (see below)
  wordTimestamps: false, // Request word-level timestamps (default: false)
  speed: 1.0, // Playback speed 0.8-1.2 (pitch-preserving WSOLA)
});
// Audio properties
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`); // Real-time factor
// audio.audio is an ArrayBuffer with PCM16 data
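The reported properties are related by simple arithmetic, which can be handy for sanity checks. The sketch below recomputes duration and RTF from a buffer size. It assumes the PCM16 data is mono (2 bytes per sample) and that RTF means generation time divided by audio duration, which is the usual convention but is not confirmed on this page. Both helper names are hypothetical and not part of the kugelaudio package.

```typescript
// Hypothetical helpers, not part of the kugelaudio package.
// Assumption: mono PCM16, so one sample occupies 2 bytes.
const BYTES_PER_SAMPLE = 2;

function pcm16DurationMs(byteLength: number, sampleRate: number): number {
  const samples = byteLength / BYTES_PER_SAMPLE;
  return (samples / sampleRate) * 1000;
}

// Assumed definition: RTF = generation time / audio duration.
// Values below 1 mean audio is generated faster than it plays back.
function realTimeFactor(generationMs: number, durationMs: number): number {
  return generationMs / durationMs;
}

// 96,000 bytes at 24 kHz is 48,000 samples, i.e. 2 seconds of audio
const exampleDurationMs = pcm16DurationMs(96000, 24000);
console.log(exampleDurationMs, realTimeFactor(500, exampleDurationMs));
```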
Playing Audio in Browser
import { createWavBlob } from 'kugelaudio';
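const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-3',
});
// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);
// Play with Audio element. play() returns a promise that rejects if the
// browser blocks autoplay, so call it from a user gesture and handle the rejection.
const audioElement = new Audio(url);
audioElement.play().catch((err) => console.error('Playback failed:', err));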
// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
Streaming Audio
Receive audio chunks as they are generated for lower latency:
await client.tts.stream(
  {
    text: 'Hello, this is streaming audio.',
    modelId: 'kugel-3',
  },
  {
    onOpen: () => {
      console.log('WebSocket connected');
    },
    onChunk: (chunk) => {
      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
      // chunk.audio is base64-encoded PCM16 data
      playAudioChunk(chunk); // your own playback helper (see Processing Audio Chunks)
    },
    onFinal: (stats) => {
      console.log(`Total duration: ${stats.durationMs}ms`);
      console.log(`Generation time: ${stats.generationMs}ms`);
      console.log(`RTF: ${stats.rtf}`);
    },
    onError: (error) => {
      console.error('TTS error:', error);
    },
    onClose: () => {
      console.log('WebSocket closed');
    },
  }
);
Streaming to a Node.js Readable (Vapi / HTTP endpoints)
For server-side integrations that expect a Node.js Readable stream — such as Vapi custom TTS endpoints or Express/Fastify handlers — use client.tts.toReadable() instead of wiring onChunk manually.
toReadable() avoids a common race condition: the stream object is returned before any audio arrives, so you can safely pipe() or attach listeners immediately.
import express from 'express';
import { KugelAudio } from 'kugelaudio';
const app = express();
app.use(express.json());
// Pre-connect at startup for fastest TTFA
const client = await KugelAudio.create({ apiKey: process.env.KUGELAUDIO_API_KEY! });
// Vapi custom TTS endpoint
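app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;
  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');
  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-3',
    sampleRate, // honour the sample rate Vapi requests
    language: 'en',
  });
  // pipe() does not forward errors, so tear down the response if the stream fails
  readable.on('error', (err) => {
    console.error('TTS stream error:', err);
    res.destroy(err);
  });
  readable.pipe(res);
});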
toReadable() is Node.js only. It requires the built-in stream module and will throw in browser environments. Use the callback-based stream() API for browser code.
Call await KugelAudio.create(...) (or await client.connect()) at application startup. This pre-establishes the WebSocket connection so that subsequent toReadable() calls skip the connection overhead and start streaming audio immediately (see Latency).
Processing Audio Chunks
import { base64ToArrayBuffer, decodePCM16 } from 'kugelaudio';
// In streaming callback (assumes an existing AudioContext named audioContext):
onChunk: (chunk) => {
  // Decode base64 to an ArrayBuffer if you need the raw PCM16 bytes (e.g. to save them)
  const pcmBuffer = base64ToArrayBuffer(chunk.audio);
  // Convert base64 PCM16 to Float32 for the Web Audio API
  const float32Data = decodePCM16(chunk.audio);
  // Play with Web Audio API
  const audioBuffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
  audioBuffer.copyToChannel(float32Data, 0);
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  // start() with no argument plays immediately; pass a start time to queue chunks back to back
  source.start();
}
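Because chunks can arrive faster than real time, starting each one immediately may make them overlap. A common Web Audio pattern is to keep a running "next start time" and pass it to source.start(). The sketch below isolates that bookkeeping in a pure function so it does not depend on an AudioContext. createChunkScheduler and its leadTimeSec parameter are hypothetical names, not part of the kugelaudio package.

```typescript
// Hypothetical helper, not part of the kugelaudio package.
// Returns a function that computes when each chunk should start so that
// consecutive chunks play back to back without overlapping.
function createChunkScheduler(leadTimeSec = 0.05) {
  let nextStartTime = 0; // end time of the most recently scheduled chunk

  // now: the current audio clock (audioContext.currentTime in a browser)
  return function schedule(now: number, samples: number, sampleRate: number): number {
    // On the first chunk, or after an underrun, restart slightly ahead of "now"
    const startAt = Math.max(nextStartTime, now + leadTimeSec);
    nextStartTime = startAt + samples / sampleRate;
    return startAt;
  };
}

// Example with a simulated clock: two chunks arrive early, a third arrives late
const schedule = createChunkScheduler(0.25);
console.log(schedule(0, 24000, 24000));   // first chunk, starts after the lead time
console.log(schedule(0.5, 12000, 24000)); // queued right after the first chunk ends
console.log(schedule(5, 24000, 24000));   // late chunk, playback restarts near "now"
```

In the callback above, this would be used as source.start(schedule(audioContext.currentTime, float32Data.length, chunk.sampleRate)) in place of the bare source.start().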
Word Timestamps
Request word-level time alignments alongside audio. Useful for subtitle synchronization, lip-sync, and barge-in handling.
With Generate
const audio = await client.tts.generate({
  text: 'Hello, how are you today?',
  modelId: 'kugel-3',
  wordTimestamps: true,
});
// Access word timestamps from the response
for (const ts of audio.wordTimestamps) {
  console.log(`${ts.word}: ${ts.startMs}ms - ${ts.endMs}ms (score: ${ts.score.toFixed(2)})`);
}
// Example output:
// Hello: 0ms - 320ms (score: 0.98)
// how: 350ms - 480ms (score: 0.95)
// are: 500ms - 580ms (score: 0.97)
// you: 600ms - 720ms (score: 0.96)
// today: 750ms - 1100ms (score: 0.94)
With Streaming
await client.tts.stream(
  {
    text: 'Hello, how are you today?',
    modelId: 'kugel-3',
    wordTimestamps: true,
  },
  {
    onChunk: (chunk) => {
      playAudio(chunk.audio); // your own playback helper
    },
    onWordTimestamps: (timestamps) => {
      for (const ts of timestamps) {
        console.log(`${ts.word}: ${ts.startMs}-${ts.endMs}ms`);
      }
    },
    onFinal: (stats) => {
      console.log(`Done: ${stats.durationMs}ms`);
    },
  }
);
Word timestamps add no extra audio latency. They arrive shortly after the corresponding audio chunk — see Latency for typical numbers.
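For subtitle synchronization and barge-in handling, the timestamps usually need to be looked up by playback position. The following is a minimal sketch under two assumptions: timestamps are sorted by startMs without overlaps, and they are measured from the start of the utterance. The WordTimestamp interface mirrors the fields logged above; wordAt and wordsSpokenBy are hypothetical helpers, not part of the kugelaudio package.

```typescript
// Shape based on the fields logged in the examples above (score omitted).
interface WordTimestamp {
  word: string;
  startMs: number;
  endMs: number;
}

// Hypothetical helper: the word being spoken at positionMs, or null during a gap.
// Binary search; assumes timestamps are sorted by startMs and do not overlap.
function wordAt(timestamps: WordTimestamp[], positionMs: number): WordTimestamp | null {
  let lo = 0;
  let hi = timestamps.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const ts = timestamps[mid];
    if (positionMs < ts.startMs) hi = mid - 1;
    else if (positionMs >= ts.endMs) lo = mid + 1;
    else return ts;
  }
  return null;
}

// Hypothetical helper for barge-in: the text fully spoken before the interruption.
function wordsSpokenBy(timestamps: WordTimestamp[], positionMs: number): string {
  return timestamps
    .filter((ts) => ts.endMs <= positionMs)
    .map((ts) => ts.word)
    .join(' ');
}

// Values taken from the example output above
const words: WordTimestamp[] = [
  { word: 'Hello', startMs: 0, endMs: 320 },
  { word: 'how', startMs: 350, endMs: 480 },
  { word: 'are', startMs: 500, endMs: 580 },
  { word: 'you', startMs: 600, endMs: 720 },
  { word: 'today', startMs: 750, endMs: 1100 },
];
console.log(wordAt(words, 400)?.word, wordsSpokenBy(words, 600));
```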
Models
List Available Models
const models = await client.models.list();
for (const model of models) {
  console.log(`${model.id}: ${model.name}`);
  console.log(`  Description: ${model.description}`);
  console.log(`  Max Input: ${model.maxInputLength} characters`);
  console.log(`  Sample Rate: ${model.sampleRate} Hz`);
}
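The maxInputLength value can be used to pre-split long text before sending it. The sketch below is one possible approach, not a documented SDK feature: it breaks after sentence-ending punctuation where it can, then at whitespace, and only then mid-word. splitForModel is a hypothetical helper. It measures length with JavaScript's string length (UTF-16 code units), and whether the server counts characters the same way is an assumption.

```typescript
// Hypothetical helper, not part of the kugelaudio package.
// Splits text into pieces of at most maxLength characters.
function splitForModel(text: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new RangeError('maxLength must be positive');
  const pieces: string[] = [];
  let rest = text.trim();

  while (rest.length > maxLength) {
    const head = rest.slice(0, maxLength);
    // Prefer the last sentence end inside the window
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    // Otherwise fall back to the last space, and finally to a hard cut
    let cut = sentenceEnd >= 0 ? sentenceEnd + 1 : head.lastIndexOf(' ');
    if (cut <= 0) cut = maxLength;
    pieces.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }

  if (rest.length > 0) pieces.push(rest);
  return pieces;
}

console.log(splitForModel('One two. Three four five. Six.', 12));
```

Each piece could then be passed to generate() in turn, using the model's maxInputLength as the limit.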
Utility Functions
base64ToArrayBuffer
Convert base64 string to ArrayBuffer:
import { base64ToArrayBuffer } from 'kugelaudio';
const buffer = base64ToArrayBuffer(chunk.audio);
decodePCM16
Convert base64 PCM16 to Float32Array for Web Audio API:
import { decodePCM16 } from 'kugelaudio';
const floatData = decodePCM16(chunk.audio);
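To illustrate the conversion itself, here is a minimal sketch that works on already-decoded bytes rather than base64. It is not the library's implementation. pcm16ToFloat32 is a hypothetical name, and little-endian byte order is an assumption (it is the usual layout for PCM16, but this page does not state it).

```typescript
// Hypothetical illustration, not the library's decodePCM16.
// Converts signed 16-bit PCM bytes to floats in the range [-1, 1).
function pcm16ToFloat32(pcm: ArrayBuffer): Float32Array {
  const view = new DataView(pcm);
  const sampleCount = Math.floor(pcm.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // true = little-endian (assumed); 32768 is the magnitude of the most negative sample
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Three samples: silence, half scale, and full negative scale
const demoPcm = new ArrayBuffer(6);
const demoView = new DataView(demoPcm);
demoView.setInt16(0, 0, true);
demoView.setInt16(2, 16384, true);
demoView.setInt16(4, -32768, true);
console.log(Array.from(pcm16ToFloat32(demoPcm)));
```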
createWavFile
Create a WAV file from PCM16 data:
import { createWavFile } from 'kugelaudio';
const wavBuffer = createWavFile(pcmArrayBuffer, 24000);
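For readers curious what wrapping PCM16 in a WAV container involves, the sketch below prepends the standard 44-byte RIFF/WAVE header to the raw samples. It is an illustration only and not the library's createWavFile. buildWav and its channels parameter are hypothetical, with mono assumed by default.

```typescript
// Hypothetical illustration, not the library's createWavFile.
// Prepends a canonical 44-byte WAV header to raw PCM16 samples.
function buildWav(pcm: ArrayBuffer, sampleRate: number, channels = 1): ArrayBuffer {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const wav = new ArrayBuffer(44 + pcm.byteLength);
  const view = new DataView(wav);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };

  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + pcm.byteLength, true); // file size minus the first 8 bytes
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bytesPerSample * 8, true); // bits per sample
  writeTag(36, 'data');
  view.setUint32(40, pcm.byteLength, true);
  new Uint8Array(wav, 44).set(new Uint8Array(pcm)); // copy samples after the header
  return wav;
}

const demoWav = buildWav(new ArrayBuffer(4), 24000);
console.log(demoWav.byteLength);
```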
createWavBlob
Create a playable Blob from PCM16 data:
import { createWavBlob } from 'kugelaudio';
const blob = createWavBlob(pcmArrayBuffer, 24000);
const url = URL.createObjectURL(blob);
client.tts.toReadable (Node.js only)
Convert a TTS stream directly to a Node.js Readable for use in HTTP handlers, pipelines, and server-side integrations. See Streaming to a Node.js Readable for a full example.
// Returns a Node.js Readable that emits raw PCM16 binary chunks
const readable = client.tts.toReadable({
  text: 'Hello, world!',
  modelId: 'kugel-3',
  sampleRate: 24000,
  language: 'en',
});
readable.pipe(res); // pipe directly to an HTTP response or any writable
For real-time LLM pipelines, use Streaming Sessions instead of one-shot stream(). Input text handling is covered in Text Normalization.