The official Java SDK for KugelAudio provides a simple, type-safe interface for text-to-speech generation. Requires Java 17+.

Installation

Add the dependency to your pom.xml:
<dependency>
  <groupId>com.kugelaudio</groupId>
  <artifactId>kugelaudio</artifactId>
  <version>0.1.0</version>
</dependency>
Or with Gradle:
implementation 'com.kugelaudio:kugelaudio:0.1.0'

Quick Start

import com.kugelaudio.sdk.KugelAudio;
import com.kugelaudio.sdk.KugelAudioOptions;
import com.kugelaudio.sdk.GenerateRequest;
import com.kugelaudio.sdk.AudioResponse;
import java.nio.file.Path;

// Create the client; replace the placeholder with your API key.
KugelAudio client = new KugelAudio(
    KugelAudioOptions.builder("your_api_key").build()
);

// Generate speech for a short text in one call.
AudioResponse audio = client.tts().generate(
    GenerateRequest.builder("Hello, world!")
        .modelId("kugel-3")
        .language("en")
        .build()
);

// Write the audio to a WAV file, then release the client's resources.
audio.saveWav(Path.of("output.wav"));
client.close();

Pre-connecting for Low Latency

By default, new KugelAudio(options) immediately starts a WebSocket connection in the background. This means the connection handshake is absorbed at startup rather than on the first request — see Latency.
// Connection starts in background automatically (autoConnect = true by default)
KugelAudio client = new KugelAudio(
    KugelAudioOptions.builder("your_api_key").build()
);

// If you need to guarantee the connection is ready before the first request:
client.connect();
System.out.println("Connected: " + client.isConnected());

// Or, instead of the constructor plus connect(), use the blocking factory method:
KugelAudio connectedClient = KugelAudio.createConnected(
    KugelAudioOptions.builder("your_api_key").build()
);
If the connection is not yet established when the first TTS request is made, that request also pays for WebSocket connection setup; subsequent requests reuse the connection. The default autoConnect = true moves this overhead to client construction. See Latency for typical numbers.
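
To make the trade-off concrete, here is a minimal, SDK-independent sketch of the eager-connection pattern described above. It does not use or reflect the KugelAudio implementation: the class EagerConnectSketch, the constant HANDSHAKE_MS, and the 200 ms simulated handshake are all hypothetical, chosen only for illustration. The sketch compares how long the first request waits when the handshake starts at construction with how long it waits when the handshake starts on demand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a client with an autoConnect option; not part of the KugelAudio SDK. */
public class EagerConnectSketch {

    /** Simulated handshake duration in milliseconds (an assumed figure, not a measurement). */
    public static final long HANDSHAKE_MS = 200;

    /** Null until a connection attempt has been started. */
    private volatile CompletableFuture<Void> connection;

    public EagerConnectSketch(boolean autoConnect) {
        if (autoConnect) {
            // Start the handshake on a background thread without blocking the constructor.
            startConnecting();
        }
    }

    /** Starts the simulated handshake once and returns the shared future. */
    private synchronized CompletableFuture<Void> startConnecting() {
        if (connection == null) {
            connection = CompletableFuture.runAsync(EagerConnectSketch::handshake);
        }
        return connection;
    }

    /** Simulates the connection setup by sleeping. */
    private static void handshake() {
        try {
            TimeUnit.MILLISECONDS.sleep(HANDSHAKE_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Blocks until the connection is ready, starting it first if necessary. */
    public void connect() {
        startConnecting().join();
    }

    public boolean isConnected() {
        CompletableFuture<Void> c = connection;
        return c != null && c.isDone() && !c.isCompletedExceptionally();
    }

    /** Simulates a request and returns how long it waited for the connection, in milliseconds. */
    public long request() {
        long start = System.nanoTime();
        connect();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        EagerConnectSketch eager = new EagerConnectSketch(true);
        TimeUnit.MILLISECONDS.sleep(HANDSHAKE_MS * 2); // stand-in for other start-up work
        System.out.println("eager: first request waited " + eager.request() + " ms");

        EagerConnectSketch lazy = new EagerConnectSketch(false);
        System.out.println("lazy: first request waited " + lazy.request() + " ms");
    }
}
```

In this sketch the eager client's first request should wait close to zero milliseconds, because the simulated handshake finished during start-up, while the lazy client's first request waits for roughly the full handshake. Exact timings depend on the machine and thread scheduling.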

Explore the SDK

  • Configuration — client options, authentication modes, regions
  • Generate & Stream — one-shot generation, streaming, normalization, word timestamps
  • LLM Sessions — streaming sessions, barge-in, multi-context sessions
  • Voices — list, create, and manage voices
  • Dictionaries — per-project pronunciation and replacement lists
  • Types — data models, audio utilities, and a complete example