# Generate Speech

> Generate high-quality speech from text using KugelAudio

Generate complete audio from text. This is the simplest way to get started: provide text and receive audio back.

## Basic Generation

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from kugelaudio import KugelAudio

    client = KugelAudio(api_key="your_api_key")

    audio = client.tts.generate(
        text="Hello, this is a test of the KugelAudio text-to-speech system.",
        model_id="kugel-3",
    )

    # Save to file
    audio.save("output.wav")

    # Or get WAV bytes
    wav_bytes = audio.to_wav_bytes()
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    import { KugelAudio } from 'kugelaudio';

    const client = new KugelAudio({ apiKey: 'your_api_key' });

    const audio = await client.tts.generate({
      text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
      modelId: 'kugel-3',
    });

    // audio.audio is an ArrayBuffer with PCM16 data
    console.log(`Duration: ${audio.durationMs}ms`);
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    import com.kugelaudio.sdk.KugelAudio;
    import com.kugelaudio.sdk.KugelAudioOptions;
    import com.kugelaudio.sdk.GenerateRequest;
    import com.kugelaudio.sdk.AudioResponse;

    KugelAudio client = new KugelAudio(
        KugelAudioOptions.builder("your_api_key").build()
    );

    AudioResponse audio = client.tts().generate(
        GenerateRequest.builder("Hello, this is a test of the KugelAudio text-to-speech system.")
            .modelId("kugel-3")
            .language("en")
            .build()
    );

    // Save to WAV file
    audio.saveWav(java.nio.file.Path.of("output.wav"));

    // Or get raw PCM bytes
    byte[] pcmData = audio.getAudio();
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST https://api.kugelaudio.com/v1/tts/generate \
      -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello, this is a test of the KugelAudio text-to-speech system.",
        "model_id": "kugel-3"
      }' \
      --output output.pcm

    # The response is raw PCM16 audio (signed 16-bit LE, mono, 24kHz)
    # Convert to WAV for playback:
    ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
    ```
  </Tab>
</Tabs>
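
If ffmpeg is not available, the same PCM-to-WAV conversion can be sketched with Python's standard `wave` module. This assumes the response body is exactly what the cURL comment describes (signed 16-bit little-endian, mono, 24 kHz). The helper name `pcm16_to_wav` is illustrative and not part of the SDK.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence stands in for a real response body here
wav_bytes = pcm16_to_wav(b"\x00\x00" * 24000)
print(len(wav_bytes))
```

For a real response, read `output.pcm` in binary mode, pass the bytes to the helper, and write the result to `output.wav`.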

## Generation Parameters

The parameters you'll touch most often (Python and REST use `snake_case`; JavaScript and Java use `camelCase`):

* `text` (required) and `model_id` — use `kugel-3`
* `voice_id` — the voice to speak with ([Using voices](/features/voices))
* `cfg_scale` — expressiveness (see the guide below)
* `normalize` + `language` — [text normalization](/features/text-processing); always set the language when you know it
* `word_timestamps` — [word-level timestamps](/streaming/word-timestamps)
* `speed` — playback speed (see Speed Control below)

The complete table — every field with type, default, range, and error behavior — lives in the [Generate Speech API reference](/api-reference/tts/generate#request-body).

### CFG Scale Guide

The `cfg_scale` parameter controls how closely the model follows the voice characteristics: lower values sound more relaxed, higher values more expressive. Accepted range: **`1.2`–`2.5`** (inclusive). Values outside this range are clamped into it.

| Range   | Style              | Best For                               |
| ------- | ------------------ | -------------------------------------- |
| 1.2-1.5 | Relaxed, natural   | Conversational AI, long-form narration |
| 2.0     | Balanced (default) | General purpose                        |
| 2.5     | Expressive         | Storytelling, emphasis-heavy content   |
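
The clamping rule can be mirrored on the client, for example to log when a requested value will not be used as-is. This is a local sketch of the documented range, not the server's code, and `clamp_cfg_scale` is a hypothetical helper.

```python
CFG_SCALE_MIN = 1.2
CFG_SCALE_MAX = 2.5


def clamp_cfg_scale(value: float) -> float:
    """Return the value that the documented range would clamp `value` to."""
    return max(CFG_SCALE_MIN, min(CFG_SCALE_MAX, value))


for requested in (0.5, 1.2, 2.0, 3.0):
    effective = clamp_cfg_scale(requested)
    note = "" if effective == requested else " (clamped)"
    print(f"requested={requested} effective={effective}{note}")
```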

### Speed Control

The `speed` parameter adjusts playback rate using pitch-preserving time-stretching (WSOLA), so the voice pitch stays natural even at different speeds. Range: `0.8` (20% slower) to `1.2` (20% faster).

<Tip>
  **Dashboard**: The playground in the KugelAudio dashboard includes a **Slow / Normal / Fast** speed toggle next to the model selector. Changes are reflected live in the SDK code snippet shown below the generator.
</Tip>

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Global speed — whole request at 80% speed
    audio = client.tts.generate(
        text="Bitte rufen Sie uns an unter: 0 30 12 34 56 78.",
        language="de",
        speed=0.8,
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    // Global speed
    const audio = await client.tts.generate({
      text: 'Bitte rufen Sie uns an unter: 0 30 12 34 56 78.',
      language: 'de',
      speed: 0.8,
    });
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    // Global speed
    AudioResponse audio = client.tts().generate(
        GenerateRequest.builder("Bitte rufen Sie uns an unter: 0 30 12 34 56 78.")
            .language("de")
            .speed(0.8)
            .build()
    );
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST https://api.kugelaudio.com/v1/tts/generate \
      -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Bitte rufen Sie uns an unter: 0 30 12 34 56 78.",
        "language": "de",
        "speed": 0.8
      }' \
      --output output.pcm
    ```
  </Tab>
</Tabs>

| `speed` value | Rate             | Typical use                             |
| ------------- | ---------------- | --------------------------------------- |
| `0.8`         | 20% slower       | Phone numbers, addresses, medical terms |
| `1.0`         | Normal (default) | General purpose                         |
| `1.2`         | 20% faster       | Notifications, fast-paced content       |
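
A rough sketch of what a `speed` value means for clip length. It assumes `speed` acts as a plain rate multiplier, so duration scales as `1 / speed`. This page does not say how the API treats values outside `0.8` to `1.2`, so the range check below is a client-side choice; both helper names are hypothetical.

```python
SPEED_MIN = 0.8
SPEED_MAX = 1.2


def check_speed(speed: float) -> float:
    """Raise early if `speed` is outside the documented 0.8 to 1.2 range."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(
            f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}"
        )
    return speed


def estimated_duration_s(normal_duration_s: float, speed: float) -> float:
    """Estimate clip length, assuming duration scales as 1 / speed."""
    return normal_duration_s / check_speed(speed)


# A clip that lasts 10 s at normal speed, slowed to speed=0.8
print(f"{estimated_duration_s(10.0, 0.8):.2f} s")
```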

Speed applies to the whole request; to change the rate for just part of it,
wrap that text in
[`<prosody rate="...">`](/prompting/speed#per-span-speed-with-prosody-rate):

```text theme={null}
Unsere Rückrufnummer lautet <prosody rate="slow">0800 5834552.</prosody> Danke!
```
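
When the slow span is assembled in code, a small string helper keeps the tag well-formed. This is a sketch: `with_rate` is a hypothetical helper, and `"slow"` is the only rate value shown on this page, so any other value should be checked against the linked guide first.

```python
def with_rate(text: str, rate: str = "slow") -> str:
    """Wrap `text` in a <prosody rate="..."> span."""
    return f'<prosody rate="{rate}">{text}</prosody>'


number = "0800 5834552."
prompt = f"Unsere Rückrufnummer lautet {with_rate(number)} Danke!"
print(prompt)
```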

For pauses, codes, and pronunciation fixes, see the
[Prompting guide](/prompting/overview): [`<break>` tags](/prompting/breaks),
[`<spell>` tags](/prompting/spell), and the
[unsupported-tags table](/prompting/overview#unsupported-tags).

## Full Example with All Options

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    audio = client.tts.generate(
        text="Hello, this is a test of the KugelAudio text-to-speech system.",
        model_id="kugel-3",
        voice_id=1071,
        cfg_scale=2.0,
        max_new_tokens=2048,
        sample_rate=24000,
        normalize=True,
        language="en",
        word_timestamps=False,
        speed=1.0,
    )

    # Inspect the response
    print(f"Duration: {audio.duration_seconds:.2f}s")
    print(f"Samples: {audio.samples}")
    print(f"Sample rate: {audio.sample_rate} Hz")
    print(f"Generation time: {audio.generation_ms:.0f}ms")
    print(f"RTF: {audio.rtf:.2f}")

    # Save to WAV file
    audio.save("output.wav")

    # Get raw PCM bytes
    pcm_data = audio.audio

    # Get WAV bytes (with header)
    wav_bytes = audio.to_wav_bytes()
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    const audio = await client.tts.generate({
      text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
      modelId: 'kugel-3',
      voiceId: 1071,
      cfgScale: 2.0,
      maxNewTokens: 2048,
      sampleRate: 24000,
      normalize: true,
      language: 'en',
      wordTimestamps: false,
      speed: 1.0,
    });

    // Inspect the response
    console.log(`Duration: ${audio.durationMs}ms`);
    console.log(`Samples: ${audio.samples}`);
    console.log(`Sample rate: ${audio.sampleRate} Hz`);
    console.log(`Generation time: ${audio.generationMs}ms`);
    console.log(`RTF: ${audio.rtf}`);

    // audio.audio is an ArrayBuffer with PCM16 data
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    AudioResponse audio = client.tts().generate(
        GenerateRequest.builder("Hello, this is a test of the KugelAudio text-to-speech system.")
            .modelId("kugel-3")
            .voiceId(1071)
            .cfgScale(2.0)
            .maxNewTokens(2048)
            .sampleRate(24000)
            .normalize(true)
            .language("en")
            .wordTimestamps(false)
            .speed(1.0)
            .build()
    );

    // Inspect the response
    System.out.printf("Duration: %.2fs%n", audio.getDurationSeconds());
    System.out.printf("Samples: %d%n", audio.getTotalSamples());
    System.out.printf("Sample rate: %d Hz%n", audio.getSampleRate());
    System.out.printf("Generation time: %.0fms%n", audio.getGenerationMs());
    System.out.printf("RTF: %.2f%n", audio.getRtf());

    // Save to WAV file
    audio.saveWav(java.nio.file.Path.of("output.wav"));
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST https://api.kugelaudio.com/v1/tts/generate \
      -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello, this is a test of the KugelAudio text-to-speech system.",
        "model_id": "kugel-3",
        "voice_id": 1071,
        "cfg_scale": 2.0,
        "max_new_tokens": 2048,
        "sample_rate": 24000,
        "normalize": true,
        "language": "en",
        "word_timestamps": false,
        "speed": 1.0
      }' \
      --output output.pcm
    ```
  </Tab>
</Tabs>
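
The response fields printed above are related by simple arithmetic. The sketch below assumes `samples` counts samples of mono audio and that `rtf` follows the usual real-time-factor definition (generation time divided by audio duration); the SDK may compute its value differently. The numbers are hypothetical, not measurements.

```python
def duration_seconds(samples: int, sample_rate: int) -> float:
    """Audio length implied by a sample count at a given sample rate."""
    return samples / sample_rate


def real_time_factor(generation_ms: float, samples: int, sample_rate: int) -> float:
    """Generation time divided by audio duration; below 1.0 is faster than real time."""
    return (generation_ms / 1000.0) / duration_seconds(samples, sample_rate)


# Hypothetical response: 72,000 samples at 24 kHz, generated in 750 ms
samples, sample_rate, generation_ms = 72_000, 24_000, 750.0
print(f"Duration: {duration_seconds(samples, sample_rate):.2f}s")
print(f"RTF: {real_time_factor(generation_ms, samples, sample_rate):.2f}")
```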

## Async Generation

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import asyncio

    async def main():
        audio = await client.tts.generate_async(
            text="Async generation example.",
            model_id="kugel-3",
        )
        audio.save("async_output.wav")

    asyncio.run(main())
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    // JavaScript SDK is async by default
    const audio = await client.tts.generate({
      text: 'Async generation example.',
      modelId: 'kugel-3',
    });
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    // Java SDK is synchronous by default.
    // Use an executor for concurrent requests (virtual threads require Java 21+):
    import java.util.concurrent.*;

    ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    Future<AudioResponse> future = executor.submit(() ->
        client.tts().generate(
            GenerateRequest.builder("Async generation example.")
                .modelId("kugel-3")
                .language("en")
                .build()
        )
    );
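    AudioResponse audio = future.get();  // blocks until generation finishes
    audio.saveWav(java.nio.file.Path.of("async_output.wav"));
    executor.shutdown();  // release the executor once no more tasks are submitted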
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    # cURL requests are synchronous by default — the response
    # streams back as the audio is generated.
    curl -X POST https://api.kugelaudio.com/v1/tts/generate \
      -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Async generation example.",
        "model_id": "kugel-3"
      }' \
      --output async_output.pcm
    ```
  </Tab>
</Tabs>
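
An async entry point is mainly useful when several requests should overlap. The sketch below shows the pattern with `asyncio.gather`. `fake_generate` is a stand-in so the example runs without network access; in real code it would be replaced by a call such as `client.tts.generate_async(...)`. Whether a single client can serve overlapping requests is not stated on this page, so treat that as an assumption to verify.

```python
import asyncio


async def fake_generate(text: str) -> str:
    """Stand-in for an awaitable TTS call; returns a label instead of audio."""
    await asyncio.sleep(0.01)  # simulate network and generation time
    return f"audio for: {text}"


async def generate_all(texts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_generate(t) for t in texts))


results = asyncio.run(generate_all(["First line.", "Second line.", "Third line."]))
print(results)
```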

## Playing Audio in the Browser

The JavaScript SDK provides utility functions for audio playback:

```typescript theme={null}
import { KugelAudio, createWavBlob } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'your_api_key' });

const audio = await client.tts.generate({
  text: 'Hello, world!',
  modelId: 'kugel-3',
});

// Create WAV blob for playback
const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
const url = URL.createObjectURL(wavBlob);

// Play with Audio element
const audioElement = new Audio(url);
audioElement.play();

// Or with Web Audio API
const audioContext = new AudioContext();
const arrayBuffer = await wavBlob.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
```

## Pre-connecting for Low Latency

For latency-sensitive applications, pre-establish the WebSocket connection at startup to keep the handshake out of your first request — see [Latency](/latency).

<Tabs>
  <Tab title="Python (Async)">
    ```python theme={null}
    import asyncio
    from kugelaudio import KugelAudio

    async def main():
        # Create a pre-connected client (handshake happens here)
        client = await KugelAudio.create(api_key="your_api_key")
        
        # First request is now fast — no handshake on the hot path
        audio = await client.tts.generate_async(
            text="Hello, world!",
            model_id="kugel-3",
        )
        audio.save("output.wav")
        
        await client.aclose()

    asyncio.run(main())
    ```
  </Tab>

  <Tab title="Python (Sync)">
    ```python theme={null}
    from kugelaudio import KugelAudio

    client = KugelAudio(api_key="your_api_key")

    # Pre-connect at startup (handshake happens here)
    client.connect()

    # First request is now fast
    audio = client.tts.generate(
        text="Hello, world!",
        model_id="kugel-3",
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    import { KugelAudio } from 'kugelaudio';

    // Create a pre-connected client (handshake happens here)
    const client = await KugelAudio.create({ apiKey: 'your_api_key' });

    // First request is now fast — no handshake on the hot path
    const audio = await client.tts.generate({
      text: 'Hello, world!',
      modelId: 'kugel-3',
    });
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    import com.kugelaudio.sdk.KugelAudio;
    import com.kugelaudio.sdk.KugelAudioOptions;
    import com.kugelaudio.sdk.GenerateRequest;
    import com.kugelaudio.sdk.AudioResponse;

    // autoConnect warms the WebSocket in the background during construction
    KugelAudio client = new KugelAudio(
        KugelAudioOptions.builder("your_api_key")
            .autoConnect(true)
            .build()
    );

    // First request is now fast — connection is already established
    AudioResponse audio = client.tts().generate(
        GenerateRequest.builder("Hello, world!")
            .modelId("kugel-3")
            .language("en")
            .build()
    );

    client.close();
    ```
  </Tab>
</Tabs>

<Tip>
  Without pre-connecting, the first TTS request includes WebSocket connection setup;
  subsequent requests reuse the connection. Pre-connecting moves this overhead to
  application startup. See [Latency](/latency) for typical numbers.
</Tip>
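
To check how much pre-connecting helps in a given deployment, time the first and second request separately. The sketch below is a generic timing helper. `fake_request` is a stand-in so the example runs offline; swap in a real SDK call to measure actual latency.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> tuple[T, float]:
    """Run `call` and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = call()
    return result, (time.perf_counter() - start) * 1000.0


def fake_request() -> str:
    """Stand-in for a TTS request; replace with a real SDK call."""
    time.sleep(0.01)
    return "ok"


_, first_ms = timed(fake_request)
_, second_ms = timed(fake_request)
print(f"first: {first_ms:.1f} ms, second: {second_ms:.1f} ms")
```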

## Word Timestamps

Request per-word time alignments alongside the generated audio. Useful for subtitles, karaoke, lip-sync, and barge-in handling.

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    audio = client.tts.generate(
        text="Hello, how are you today?",
        model_id="kugel-3",
        word_timestamps=True,
    )

    for ts in audio.word_timestamps:
        print(f"{ts.word}: {ts.start_ms}ms - {ts.end_ms}ms (score: {ts.score:.2f})")

    # Output:
    # Hello: 0ms - 320ms (score: 0.98)
    # how: 350ms - 480ms (score: 0.95)
    # are: 500ms - 580ms (score: 0.97)
    # you: 600ms - 720ms (score: 0.96)
    # today: 750ms - 1100ms (score: 0.94)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    const audio = await client.tts.generate({
      text: 'Hello, how are you today?',
      modelId: 'kugel-3',
      wordTimestamps: true,
    });

    for (const ts of audio.wordTimestamps) {
      console.log(`${ts.word}: ${ts.startMs}ms - ${ts.endMs}ms (score: ${ts.score.toFixed(2)})`);
    }
    ```
  </Tab>

  <Tab title="Java">
    ```java theme={null}
    import com.kugelaudio.sdk.WordTimestamp;

    AudioResponse audio = client.tts().generate(
        GenerateRequest.builder("Hello, how are you today?")
            .modelId("kugel-3")
            .language("en")
            .wordTimestamps(true)
            .build()
    );

    for (WordTimestamp ts : audio.getWordTimestamps()) {
        System.out.printf("%s: %dms - %dms (score: %.2f)%n",
            ts.getWord(), ts.getStartMs(), ts.getEndMs(), ts.getScore());
    }
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST https://api.kugelaudio.com/v1/tts/generate \
      -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello, how are you today?",
        "model_id": "kugel-3",
        "word_timestamps": true
      }' \
      --output output.pcm
    ```

    <Note>
      Word timestamps are included in the response headers or as a JSON
      preamble before the audio bytes. Use an SDK for convenient access
      to parsed timestamp objects.
    </Note>
  </Tab>
</Tabs>

<Tip>
  Word timestamps add no extra audio latency. For streaming use cases, see the [Streaming Guide](/streaming/word-timestamps).
</Tip>
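
For the subtitle use case, word timestamps map directly onto SRT cues. The sketch below emits one cue per word. The `Word` dataclass is a local stand-in that mirrors the `word`, `start_ms`, and `end_ms` fields shown above, and it assumes the millisecond values are integers.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Local stand-in mirroring the word, start_ms, and end_ms fields."""
    word: str
    start_ms: int
    end_ms: int


def srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(words: list[Word]) -> str:
    """Emit one numbered SRT cue per word, separated by blank lines."""
    cues = []
    for index, w in enumerate(words, start=1):
        cues.append(f"{index}\n{srt_time(w.start_ms)} --> {srt_time(w.end_ms)}\n{w.word}\n")
    return "\n".join(cues)


words = [Word("Hello", 0, 320), Word("how", 350, 480), Word("are", 500, 580)]
print(to_srt(words))
```

In practice, grouping several words into one cue reads better on screen; the per-word version keeps the mapping easy to follow.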

## Next Steps

<CardGroup cols={2}>
  <Card title="Streaming" icon="wave-pulse" href="/streaming/overview">
    Lower latency with real-time audio streaming
  </Card>

  <Card title="Text Processing" icon="spell-check" href="/features/text-processing">
    Text normalization and spell tags
  </Card>

  <Card title="Voices" icon="microphone" href="/features/voices">
    Browse and use different voices
  </Card>

  <Card title="Models" icon="microchip" href="/models">
    Learn about available models
  </Card>
</CardGroup>