The official JavaScript/TypeScript SDK for KugelAudio provides a modern, type-safe interface for text-to-speech generation in Node.js and browsers.

Installation

npm install kugelaudio
Or with yarn/pnpm:
yarn add kugelaudio
# or
pnpm add kugelaudio

Quick Start

import { KugelAudio } from 'kugelaudio';

// Initialize the client
const client = new KugelAudio({ apiKey: 'your_api_key' });

// Generate speech
const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});

// audio.audio is an ArrayBuffer with PCM16 data
console.log(`Duration: ${audio.durationMs}ms`);

Pre-connecting for Low Latency

For latency-sensitive applications, pre-establish the WebSocket connection at startup to eliminate cold start latency (~300-500ms) from your first TTS request.
import { KugelAudio } from 'kugelaudio';

// Create a pre-connected client (~500ms happens here)
const client = await KugelAudio.create({ apiKey: 'your_api_key' });

// First request is now fast (~100-150ms TTFA instead of ~500ms)
await client.tts.stream(
  { text: 'Hello, world!', modelId: 'kugel-1-turbo' },
  { onChunk: (chunk) => playAudio(chunk.audio) }
);

Manual Connection

import { KugelAudio } from 'kugelaudio';

// Initialize client
const client = new KugelAudio({ apiKey: 'your_api_key' });

// Pre-connect at startup (~500ms happens here)
await client.connect();

// Check connection status
console.log(`Connected: ${client.isConnected()}`);

// First request is now fast
await client.tts.stream(
  { text: 'Hello, world!' },
  { onChunk: (chunk) => playAudio(chunk.audio) }
);
Without pre-connecting, the first TTS request includes WebSocket connection setup (~300-500ms). Subsequent requests reuse the connection and are fast (~100-150ms time to first audio, TTFA). Pre-connecting moves this overhead to application startup.

Client Configuration

import { KugelAudio } from 'kugelaudio';

// Simple setup
const client = new KugelAudio({ apiKey: 'your_api_key' });

// With custom options
const client = new KugelAudio({
  apiKey: 'your_api_key',           // Required: Your API key
  apiUrl: 'https://api.kugelaudio.com',  // Optional: API base URL
  timeout: 60000,                    // Optional: Request timeout in ms
});

Local Development

For local development, point directly to your TTS server:
const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8000',
});
Or with separate backend and TTS servers:
const client = new KugelAudio({
  apiKey: 'your_api_key',
  apiUrl: 'http://localhost:8001',   // Backend for REST API
  ttsUrl: 'http://localhost:8000',   // TTS server for WebSocket streaming
});

Text-to-Speech

Basic Generation

Generate complete audio and receive it all at once:
const audio = await client.tts.generate({
  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
  modelId: 'kugel-1-turbo',  // 'kugel-1-turbo' (fast) or 'kugel-1' (quality)
  voiceId: 123,               // Optional: specific voice ID
  cfgScale: 2.0,              // Guidance scale (1.0-5.0)
  maxNewTokens: 2048,         // Maximum tokens to generate
  sampleRate: 24000,          // Output sample rate
  normalize: true,            // Enable text normalization (default)
  language: 'en',             // Language for normalization (see below)
  wordTimestamps: false,      // Request word-level timestamps (default: false)
});

// Audio properties
console.log(`Duration: ${audio.durationMs}ms`);
console.log(`Samples: ${audio.samples}`);
console.log(`Sample rate: ${audio.sampleRate} Hz`);
console.log(`Generation time: ${audio.generationMs}ms`);
console.log(`RTF: ${audio.rtf}`);  // Real-time factor

// audio.audio is an ArrayBuffer with PCM16 data

Playing Audio in Browser

import { createWavBlob } from 'kugelaudio';

const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();

Streaming Audio

Receive audio chunks as they are generated for lower latency:
await client.tts.stream(
  {
    text: 'Hello, this is streaming audio.',
    modelId: 'kugel-1-turbo',
  },
  {
    onOpen: () => {
      console.log('WebSocket connected');
    },
    onChunk: (chunk) => {
      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
      // chunk.audio is base64-encoded PCM16 data
      playAudioChunk(chunk);
    },
    onFinal: (stats) => {
      console.log(`Total duration: ${stats.durationMs}ms`);
      console.log(`Generation time: ${stats.generationMs}ms`);
      console.log(`RTF: ${stats.rtf}`);
    },
    onError: (error) => {
      console.error('TTS error:', error);
    },
    onClose: () => {
      console.log('WebSocket closed');
    },
  }
);

Streaming to a Node.js Readable (Vapi / HTTP endpoints)

For server-side integrations that expect a Node.js Readable stream — such as Vapi custom TTS endpoints or Express/Fastify handlers — use client.tts.toReadable() instead of wiring onChunk manually. toReadable() avoids a common race condition: the stream object is returned before any audio arrives, so you can safely pipe() or attach listeners immediately.
import express from 'express';
import { KugelAudio } from 'kugelaudio';

const app = express();
app.use(express.json());

// Pre-connect at startup for fastest TTFA
const client = await KugelAudio.create({ apiKey: process.env.KUGELAUDIO_API_KEY! });

// Vapi custom TTS endpoint
app.post('/synthesize', (req, res) => {
  const { text, sampleRate } = req.body.message;

  res.setHeader('Content-Type', 'audio/pcm');
  res.setHeader('Transfer-Encoding', 'chunked');

  const readable = client.tts.toReadable({
    text,
    modelId: 'kugel-1-turbo',
    sampleRate,   // honour the sample rate Vapi requests
    language: 'en',
  });

  readable.pipe(res);
});
toReadable() is Node.js only. It requires the built-in stream module and will throw in browser environments. Use the callback-based stream() API for browser code.
Call await KugelAudio.create(...) (or await client.connect()) at application startup. This pre-establishes the WebSocket connection so that subsequent toReadable() calls skip the connection overhead (~300-500ms) and start streaming audio immediately.

Processing Audio Chunks

import { base64ToArrayBuffer, decodePCM16 } from 'kugelaudio';

// In streaming callback:
onChunk: (chunk) => {
  // Decode base64 to ArrayBuffer
  const pcmBuffer = base64ToArrayBuffer(chunk.audio);
  
  // Convert PCM16 to Float32 for Web Audio API
  const float32Data = decodePCM16(chunk.audio);
  
  // Play with Web Audio API (assumes an AudioContext created elsewhere)
  const audioBuffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
  audioBuffer.copyToChannel(float32Data, 0);
  
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
}
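The snippet above creates a new buffer source per chunk and starts each one immediately, which can leave audible gaps or overlaps between chunks. One way to get gapless playback is to schedule each chunk to start exactly when the previous one ends. A minimal sketch, where ChunkScheduler is an illustrative helper and not an SDK export:

```typescript
// Illustrative helper (not part of the SDK): computes back-to-back start
// times so consecutive chunks play without gaps or overlap.
class ChunkScheduler {
  private nextStart = 0;

  constructor(private sampleRate: number) {}

  // `now` is AudioContext.currentTime in seconds. Returns when to start
  // this chunk and advances the internal cursor by the chunk's duration.
  schedule(samples: number, now: number): number {
    const startAt = Math.max(now, this.nextStart);
    this.nextStart = startAt + samples / this.sampleRate;
    return startAt;
  }
}

// Usage inside the streaming callback (browser):
// const scheduler = new ChunkScheduler(24000);
// onChunk: (chunk) => {
//   const float32Data = decodePCM16(chunk.audio);
//   const buffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
//   buffer.copyToChannel(float32Data, 0);
//   const source = audioContext.createBufferSource();
//   source.buffer = buffer;
//   source.connect(audioContext.destination);
//   source.start(scheduler.schedule(chunk.samples, audioContext.currentTime));
// }
```

Passing the computed time to source.start() lets the browser's audio clock handle the precise hand-off between chunks, even when chunks arrive faster or slower than real time.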

Text Normalization

Text normalization converts numbers, dates, times, and other non-verbal text into spoken words:
  • “I have 3 apples” → “I have three apples”
  • “The meeting is at 2:30 PM” → “The meeting is at two thirty PM”
  • “€50.99” → “fifty euros and ninety-nine cents”
// With explicit language (recommended - fastest)
const audio = await client.tts.generate({
  text: 'I bought 3 items for €50.99 on 01/15/2024.',
  normalize: true,
  language: 'en',  // Specify language for best performance
});

// With auto-detection (may cause incorrect normalizations)
const audio = await client.tts.generate({
  text: 'Ich habe 3 Artikel für 50,99€ gekauft.',
  normalize: true,
  // language not specified - will auto-detect
});

Supported Languages

Code  Language     Code  Language
de    German       nl    Dutch
en    English      pl    Polish
fr    French       sv    Swedish
es    Spanish      da    Danish
it    Italian      no    Norwegian
pt    Portuguese   fi    Finnish
cs    Czech        hu    Hungarian
ro    Romanian     el    Greek
uk    Ukrainian    bg    Bulgarian
tr    Turkish      vi    Vietnamese
ar    Arabic       hi    Hindi
zh    Chinese      ja    Japanese
ko    Korean
Using normalize: true without specifying language may cause incorrect normalizations, especially for short texts or languages that share similar vocabulary. Always specify language when you know it.

Spell Tags

Use <spell> tags to spell out text letter by letter. This is useful for email addresses, codes, acronyms, or any text that should be pronounced character by character:
// Spell out an email address
const audio = await client.tts.generate({
  text: 'Contact me at <spell>kajo@kugelaudio.com</spell>',
  normalize: true,
  language: 'en',
});
// Output: "Contact me at K, A, J, O, at, K, U, G, E, L, A, U, D, I, O, dot, C, O, M"

// Spell out an acronym
const audio = await client.tts.generate({
  text: 'The <spell>API</spell> is easy to use.',
  normalize: true,
  language: 'en',
});
// Output: "The A, P, I is easy to use."

// German example with language-specific translations
const audio = await client.tts.generate({
  text: 'Meine E-Mail ist <spell>test@beispiel.de</spell>',
  normalize: true,
  language: 'de',
});
// Output: "Meine E-Mail ist T, E, S, T, ät, B, E, I, S, P, I, E, L, Punkt, D, E"
Spell tags also work with streaming:
await client.tts.stream(
  {
    // Even if the tag is split across the stream, it works correctly
    text: 'My verification code is <spell>ABC-123-XYZ</spell>.',
    normalize: true,
    language: 'en',
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
  }
);
Special Characters: Characters like @, ., - are translated to language-specific words. For example, @ becomes “at” in English, “ät” in German, and “arobase” in French.
Model recommendation: For clearer letter-by-letter pronunciation, use modelId: 'kugel-1' instead of kugel-1-turbo.

Word Timestamps

Request word-level time alignments alongside audio. Useful for subtitle synchronization, lip-sync, and barge-in handling.

With Generate

const audio = await client.tts.generate({
  text: 'Hello, how are you today?',
  modelId: 'kugel-1-turbo',
  wordTimestamps: true,
});

// Access word timestamps from the response
for (const ts of audio.wordTimestamps) {
  console.log(`${ts.word}: ${ts.startMs}ms - ${ts.endMs}ms (score: ${ts.score.toFixed(2)})`);
}

// Example output:
// Hello: 0ms - 320ms (score: 0.98)
// how: 350ms - 480ms (score: 0.95)
// are: 500ms - 580ms (score: 0.97)
// you: 600ms - 720ms (score: 0.96)
// today: 750ms - 1100ms (score: 0.94)
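The subtitle use case mentioned above maps directly onto SRT: one cue per word, with startMs/endMs formatted as SRT timecodes. A minimal sketch (toSrt and msToSrtTime are illustrative helpers, not SDK exports; only the timestamp fields the helpers need are declared here):

```typescript
// Only the fields the helpers below need (the SDK's WordTimestamp has more).
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

// Illustrative: format milliseconds as an SRT timecode (HH:MM:SS,mmm).
function msToSrtTime(ms: number): string {
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const rem = Math.floor(ms % 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(rem, 3)}`;
}

// Illustrative: turn per-word timings into one SRT cue per word.
function toSrt(timestamps: WordTiming[]): string {
  return timestamps
    .map((ts, i) =>
      `${i + 1}\n${msToSrtTime(ts.startMs)} --> ${msToSrtTime(ts.endMs)}\n${ts.word}\n`)
    .join('\n');
}
```

Feed it audio.wordTimestamps from the generate call above; for coarser subtitles, group consecutive words into phrases before formatting.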

With Streaming

await client.tts.stream(
  {
    text: 'Hello, how are you today?',
    modelId: 'kugel-1-turbo',
    wordTimestamps: true,
  },
  {
    onChunk: (chunk) => {
      playAudio(chunk.audio);
    },
    onWordTimestamps: (timestamps) => {
      for (const ts of timestamps) {
        console.log(`${ts.word}: ${ts.startMs}-${ts.endMs}ms`);
      }
    },
    onFinal: (stats) => {
      console.log(`Done: ${stats.durationMs}ms`);
    },
  }
);
Word timestamps add no extra audio latency. They arrive ~50-200ms after the corresponding audio chunk.

Voices

List Available Voices

// List all available voices
const voices = await client.voices.list();

for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name}`);
  console.log(`  Category: ${voice.category}`);
  console.log(`  Languages: ${voice.supportedLanguages.join(', ')}`);
}

// Filter by language
const germanVoices = await client.voices.list({ language: 'de' });

// Get only public voices
const publicVoices = await client.voices.list({ includePublic: true });

// Limit results
const first10 = await client.voices.list({ limit: 10 });

Get a Specific Voice

const voice = await client.voices.get(123);
console.log(`Voice: ${voice.name}`);
console.log(`Category: ${voice.category}`);

Create a Voice

Create a new voice with optional reference audio files:
// Browser — use File objects from an <input type="file">
const fileInput = document.getElementById('audio-upload') as HTMLInputElement;
const files = Array.from(fileInput.files!);

const voice = await client.voices.create({
  name: 'My Custom Voice',
  sex: 'female',
  description: 'A warm, conversational voice',
  category: 'cloned',
  referenceFiles: files,
});
console.log(`Created voice: ${voice.id}`);
// Node.js — use Blob constructed from a Buffer
import { readFileSync } from 'fs';

const buf = readFileSync('reference.wav');
const blob = new Blob([buf], { type: 'audio/wav' });

const voice = await client.voices.create({
  name: 'My Custom Voice',
  sex: 'female',
  referenceFiles: [blob],
});

Update a Voice

const voice = await client.voices.update(123, {
  name: 'Updated Name',
  description: 'New description',
});

Delete a Voice

await client.voices.delete(123);

Manage Reference Audio

// List references for a voice
const refs = await client.voices.listReferences(123);
for (const ref of refs) {
  console.log(`${ref.id}: ${ref.name}`);
}

// Add a new reference
const ref = await client.voices.addReference(123, audioFile, 'Optional transcript.');

// Delete a reference
await client.voices.deleteReference(123, 456);

Publish a Voice

Request that your voice be made publicly available. An admin will verify it before it becomes visible to others.
const voice = await client.voices.publish(123);
console.log(`Pending verification: ${voice.pendingVerification}`);

Generate Voice Sample

Trigger sample audio generation for a voice:
const voice = await client.voices.generateSample(123);
console.log(`Sample URL: ${voice.sampleUrl}`);

LLM Integration — Streaming Sessions

For real-time LLM pipelines, use client.tts.streamingSession() instead of client.tts.stream(). The session endpoint (/ws/tts/stream) keeps a persistent WebSocket connection and accumulates LLM tokens server-side, starting generation at natural sentence boundaries.

Why not flush per sentence?

Calling send(token, flush=true) on every sentence feels intuitive, but it actually increases latency:
  • Each flush triggers a full model prefill (the fixed cost of loading context into the model).
  • The server’s KV cache cannot be reused across separate flushes, so each segment is cold.
  • Per-sentence flushing can add 200–500 ms per sentence compared to letting the server batch.
Let the server handle chunking via chunkLengthSchedule and autoMode.

Basic usage

const session = client.tts.streamingSession(
  {
    voiceId: 123,
    modelId: 'kugel-1-turbo',
    // autoMode: emit at first sentence boundary (lowest TTFA)
    autoMode: true,
    chunkLengthSchedule: [50, 100, 150, 250],
  },
  {
    onChunk: (chunk) => playAudio(chunk.audio),
    onChunkComplete: (chunkId, audioSecs, genMs) => {
      console.log(`Chunk ${chunkId}: ${audioSecs.toFixed(2)}s audio in ${genMs}ms`);
    },
    onSessionClosed: (totalSecs) => {
      console.log(`Session complete: ${totalSecs.toFixed(2)}s total audio`);
    },
    onError: (err) => console.error('TTS error:', err),
  }
);

session.connect();

// Feed LLM tokens as they arrive
for await (const delta of openai.chat.completions.stream(...)) {
  const text = delta.choices[0]?.delta?.content;
  if (text) session.send(text);
}

// Flush remaining buffer and close
session.close();

Chunking presets

Preset        Config                                                     Best for
Low-latency   autoMode: true, chunkLengthSchedule: [50, 100, 150, 250]   Voice assistants, chat bots
Balanced      chunkLengthSchedule: [80, 150, 250] (default)              General LLM streaming
High-quality  chunkLengthSchedule: [120, 200, 300]                       Narration, long-form audio
autoMode: true and small chunkLengthSchedule values minimise time-to-first-audio. Use larger values when prosody quality matters more than TTFA.
Avoid calling send(text, true) (flush=true) on every sentence. This bypasses server-side semantic chunking, forces a cold model prefill per segment, and degrades both latency and audio quality.

Models

List Available Models

const models = await client.models.list();

for (const model of models) {
  console.log(`${model.id}: ${model.name}`);
  console.log(`  Description: ${model.description}`);
  console.log(`  Parameters: ${model.parameters}`);
  console.log(`  Max Input: ${model.maxInputLength} characters`);
  console.log(`  Sample Rate: ${model.sampleRate} Hz`);
}

Error Handling

import { KugelAudio } from 'kugelaudio';
import {
  KugelAudioError,
  AuthenticationError,
  RateLimitError,
  InsufficientCreditsError,
  ValidationError,
  ConnectionError,
} from 'kugelaudio';

try {
  const audio = await client.tts.generate({ text: 'Hello!' });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Invalid API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit exceeded, please wait');
  } else if (error instanceof InsufficientCreditsError) {
    console.error('Not enough credits, please top up');
  } else if (error instanceof ValidationError) {
    console.error(`Invalid request: ${error.message}`);
  } else if (error instanceof ConnectionError) {
    console.error('Failed to connect to server');
  } else if (error instanceof KugelAudioError) {
    console.error(`API error: ${error.message}`);
  }
}
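Errors like RateLimitError and ConnectionError are often transient, so a retry with exponential backoff is a natural companion to the handling above. A minimal generic sketch; withRetry and its isRetryable predicate are illustrative helpers, not part of the SDK:

```typescript
// Illustrative helper: retry an async operation with exponential backoff.
// `isRetryable` decides which errors are worth retrying, e.g.
// (e) => e instanceof RateLimitError || e instanceof ConnectionError.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (e: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt >= maxAttempts || !isRetryable(e)) throw e;
      // Wait 500ms, 1000ms, 2000ms, ... between attempts
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage (assuming a configured client):
// const audio = await withRetry(
//   () => client.tts.generate({ text: 'Hello!' }),
//   (e) => e instanceof RateLimitError,
// );
```

Validation and authentication errors should not be retried, since repeating the same request will fail the same way; the predicate keeps that decision explicit.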

TypeScript Types

KugelAudioOptions

interface KugelAudioOptions {
  apiKey: string;      // Required
  apiUrl?: string;     // Default: 'https://api.kugelaudio.com'
  ttsUrl?: string;     // Default: same as apiUrl
  timeout?: number;    // Default: 60000 (ms)
}

GenerateOptions

interface GenerateOptions {
  text: string;              // Required: Text to synthesize
  modelId?: string;          // Default: 'kugel-1-turbo'
  voiceId?: number;          // Optional: Voice ID
  cfgScale?: number;         // Default: 2.0
  maxNewTokens?: number;     // Default: 2048
  sampleRate?: number;       // Default: 24000
  normalize?: boolean;       // Default: true - Enable text normalization
  language?: string;         // ISO 639-1 code for normalization (e.g., 'en', 'de')
  wordTimestamps?: boolean;  // Default: false - Request word-level timestamps
}
Using normalize: true without language may cause incorrect normalizations. Always specify language when you know it.

AudioChunk

interface AudioChunk {
  audio: string;       // Base64-encoded PCM16 audio
  encoding: string;    // 'pcm_s16le'
  index: number;       // Chunk index (0-based)
  sampleRate: number;  // Sample rate (24000)
  samples: number;     // Number of samples in chunk
}

WordTimestamp

interface WordTimestamp {
  word: string;      // The word
  startMs: number;   // Start time in milliseconds
  endMs: number;     // End time in milliseconds
  charStart: number; // Character start index in original text
  charEnd: number;   // Character end index in original text
  score: number;     // Alignment confidence score (0.0 - 1.0)
}

AudioResponse

interface AudioResponse {
  audio: ArrayBuffer;              // Complete PCM16 audio
  sampleRate: number;              // Sample rate (24000)
  samples: number;                 // Total samples
  durationMs: number;              // Duration in milliseconds
  generationMs: number;            // Generation time in milliseconds
  rtf: number;                     // Real-time factor
  wordTimestamps: WordTimestamp[];  // Per-word timing (when wordTimestamps: true)
}

GenerationStats

interface GenerationStats {
  final: true;
  chunks: number;         // Number of chunks generated
  totalSamples: number;   // Total samples generated
  durationMs: number;     // Audio duration in ms
  generationMs: number;   // Generation time in ms
  rtf: number;           // Real-time factor
}

StreamCallbacks

Used with the one-shot client.tts.stream() endpoint:
interface StreamCallbacks {
  onOpen?: () => void;
  onChunk?: (chunk: AudioChunk) => void;
  onWordTimestamps?: (timestamps: WordTimestamp[]) => void;
  onFinal?: (stats: GenerationStats) => void;
  onError?: (error: Error) => void;
  onClose?: () => void;
}

StreamConfig

Configuration for client.tts.streamingSession() (LLM integration endpoint):
interface StreamConfig {
  voiceId?: number;
  modelId?: string;           // Default: 'kugel-1-turbo'
  cfgScale?: number;
  maxNewTokens?: number;
  sampleRate?: number;
  flushTimeoutMs?: number;
  maxBufferLength?: number;
  normalize?: boolean;
  language?: string;          // ISO 639-1 code — specify to avoid auto-detect latency
  wordTimestamps?: boolean;
  /**
   * Minimum buffer sizes (chars) before each successive chunk is auto-emitted.
   * Smaller = lower TTFA; larger = better prosody context.
   * Default: [5, 80, 150, 250]
   */
  chunkLengthSchedule?: number[];
  /**
   * When true, start generating at the very first clean sentence boundary.
   * Equivalent to ElevenLabs auto_mode=true. Lowest possible TTFA.
   */
  autoMode?: boolean;
}

StreamingSessionCallbacks

interface StreamingSessionCallbacks {
  onChunk?: (chunk: AudioChunk) => void;
  onChunkComplete?: (chunkId: number, audioSeconds: number, genMs: number) => void;
  onSessionClosed?: (totalAudioSeconds: number, totalTextChunks: number, totalAudioChunks: number) => void;
  onGenerationStarted?: (chunkId: number, text: string) => void;
  onWordTimestamps?: (timestamps: WordTimestamp[]) => void;
  onError?: (error: Error) => void;
}

Model

interface Model {
  id: string;             // 'kugel-1-turbo' or 'kugel-1'
  name: string;           // Human-readable name
  description: string;    // Model description
  parameters: string;     // Model parameter count
  maxInputLength: number; // Maximum input characters
  sampleRate: number;     // Output sample rate
}

Voice

interface Voice {
  id: number;                    // Voice ID
  voiceId: number;               // Same as id (backward compat)
  name: string;                  // Voice name
  description?: string;          // Description
  category?: VoiceCategory;      // 'premade' | 'cloned' | 'designed' | 'conversational' | 'narrative' | 'narrative_story' | 'characters'
  sex?: string;                  // 'male' | 'female' | 'neutral'
  age?: string;                  // 'young' | 'middle_aged' | 'old'
  supportedLanguages: string[];  // ['en', 'de', ...]
  avatarUrl?: string;            // Avatar image URL
  sampleUrl?: string;            // Sample audio URL
}

VoiceDetail

Extended voice information (returned by create, update, get, publish, generateSample):
interface VoiceDetail {
  id: number;
  name: string;
  description: string;
  generativeVoiceDescription: string;
  supportedLanguages: string[];
  category: string;
  age?: string;
  sex?: string;
  quality: string;                 // 'low' | 'mid' | 'high'
  isPublic: boolean;
  verified: boolean;
  pendingVerification: boolean;
  sampleUrl?: string;
  avatarUrl?: string;
  sampleText: string;
}

VoiceReference

interface VoiceReference {
  id: number;
  voiceId: number;
  name: string;
  referenceText: string;
  s3Path: string;
  audioUrl?: string;
  isGenerated: boolean;
}

CreateVoiceOptions

interface CreateVoiceOptions {
  name: string;
  sex: string;
  description?: string;
  category?: string;
  age?: string;
  quality?: string;
  supportedLanguages?: string[];
  isPublic?: boolean;
  sampleText?: string;
  referenceFiles?: Array<File | Blob>;
}

UpdateVoiceOptions

interface UpdateVoiceOptions {
  name?: string;
  description?: string;
  category?: string;
  age?: string;
  sex?: string;
  quality?: string;
  supportedLanguages?: string[];
  isPublic?: boolean;
  sampleText?: string;
}

Utility Functions

base64ToArrayBuffer

Convert base64 string to ArrayBuffer:
import { base64ToArrayBuffer } from 'kugelaudio';

const buffer = base64ToArrayBuffer(chunk.audio);

decodePCM16

Convert base64 PCM16 to Float32Array for Web Audio API:
import { decodePCM16 } from 'kugelaudio';

const floatData = decodePCM16(chunk.audio);
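Conceptually the conversion just reads little-endian 16-bit samples and scales them into [-1, 1). A rough plain-TypeScript equivalent of that step (pcm16ToFloat32 is illustrative; in real code use the SDK's decodePCM16, which also handles the base64 decode):

```typescript
// Illustrative: convert raw PCM16 (little-endian) bytes to Float32 samples.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(buffer.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // true => little-endian; divide by 32768 to map int16 range into [-1, 1)
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```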

createWavFile

Create a WAV file from PCM16 data:
import { createWavFile } from 'kugelaudio';

const wavBuffer = createWavFile(pcmArrayBuffer, 24000);

createWavBlob

Create a playable Blob from PCM16 data:
import { createWavBlob } from 'kugelaudio';

const blob = createWavBlob(pcmArrayBuffer, 24000);
const url = URL.createObjectURL(blob);

client.tts.toReadable (Node.js only)

Convert a TTS stream directly to a Node.js Readable for use in HTTP handlers, pipelines, and server-side integrations. See Streaming to a Node.js Readable for a full example.
// Returns a Node.js Readable that emits raw PCM16 binary chunks
const readable = client.tts.toReadable({
  text: 'Hello, world!',
  modelId: 'kugel-1-turbo',
  sampleRate: 24000,
  language: 'en',
});

readable.pipe(res); // pipe directly to an HTTP response or any writable

Complete Example

import { KugelAudio, createWavBlob, base64ToArrayBuffer } from 'kugelaudio';

async function main() {
  // Initialize client
  const client = new KugelAudio({ apiKey: 'your_api_key' });

  // List available models
  console.log('Available Models:');
  const models = await client.models.list();
  for (const model of models) {
    console.log(`  - ${model.id}: ${model.name} (${model.parameters})`);
  }

  // List available voices
  console.log('\nAvailable Voices:');
  const voices = await client.voices.list({ limit: 5 });
  for (const voice of voices) {
    console.log(`  - ${voice.id}: ${voice.name}`);
  }

  // Generate audio with streaming
  console.log('\nGenerating audio (streaming)...');
  const chunks: ArrayBuffer[] = [];
  let ttfa: number | undefined;
  const startTime = Date.now();

  await client.tts.stream(
    {
      text: 'Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.',
      modelId: 'kugel-1-turbo',
    },
    {
      onChunk: (chunk) => {
        if (!ttfa) {
          ttfa = Date.now() - startTime;
          console.log(`Time to first audio: ${ttfa}ms`);
        }
        chunks.push(base64ToArrayBuffer(chunk.audio));
      },
      onFinal: (stats) => {
        console.log(`Generated ${stats.durationMs}ms of audio`);
        console.log(`Generation time: ${stats.generationMs}ms`);
        console.log(`RTF: ${stats.rtf}x`);
      },
    }
  );
}

main();

Browser Support

The SDK works in modern browsers with WebSocket support. For Node.js, ensure you have a WebSocket implementation available.