Voice Cloning

Voice cloning lets you create a synthetic voice that sounds like a specific person from a short sample of reference audio (10-30 seconds of clean speech).

Voice cloning is available on Business and Enterprise plans.

How It Works

Upload reference audio - Provide 10-30 seconds of clean speech
Processing - Our AI analyzes the voice characteristics
Voice created - Use your new voice in any TTS request

Requirements

Audio Quality

For best results, your reference audio should be:

Duration: 10-30 seconds of speech
Format: WAV, MP3, OGG, M4A, or FLAC
Sample rate: 16 kHz or higher
Channels: Mono preferred
Quality: Clean, no background noise
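The measurable requirements above can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module, so it inspects WAV files only; MP3, OGG, M4A, and FLAC would need a third-party decoder. The helper name `check_reference_wav` is hypothetical and not part of the KugelAudio SDK, and the sketch cannot judge background noise.

```python
import wave
from pathlib import Path

# Thresholds taken from the requirements list in this guide.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".m4a", ".flac"}
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0
MIN_SAMPLE_RATE = 16_000


def check_reference_wav(source, filename="reference.wav"):
    """Return a list of problems; an empty list means the file looks fine.

    `source` is a path string or a binary file-like object holding WAV data.
    `filename` is only used to check the extension.
    """
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix}")

    # Read the header fields we need: sample rate, channel count, frame count.
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate

    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s is outside 10-30s")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below 16 kHz")
    if channels != 1:
        problems.append(f"{channels} channels; mono is preferred")
    return problems
```

Calling `check_reference_wav("reference.wav")` on a suitable recording should return an empty list; each returned string names one requirement the file misses.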

Content Guidelines

✅ Good audio:

Clear speech with natural pacing
Single speaker only
Minimal background noise
Natural emotional range
Free of filler words (um, uh, ah, hmm) unless you want them in the output

❌ Avoid:

Multiple speakers
Background music
Heavy reverb or echo
Whispered or shouted speech
Heavily compressed audio
Recordings with frequent filler sounds or hesitations

Your samples define the voice. The cloned voice reproduces everything present in your reference audio, including filler sounds like “um”, “ah”, and “hmm”, pauses, breathing patterns, and other speech habits. If your reference audio contains these sounds, they will appear in the generated output and cannot be removed after cloning.

For the most controllable results, use clean recordings without fillers. You can then add natural-sounding hesitations through your text prompts when needed (e.g., writing “um” or “…” in the input text).
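If you have a transcript of your reference recording (typed by hand or produced by any speech-to-text tool), a quick scan can flag fillers before you upload. This is a minimal sketch, not an SDK feature: `find_fillers` is a hypothetical helper, and the filler set only covers the sounds named in this guide.

```python
import re

# Filler sounds named in this guide; extend the set for your language or speaker.
FILLERS = {"um", "uh", "ah", "hmm"}


def find_fillers(transcript):
    """Return the filler words found in a transcript, in order of appearance."""
    # Split into lowercase word tokens so "Um," and "um" are treated the same.
    words = re.findall(r"[a-z']+", transcript.lower())
    return [word for word in words if word in FILLERS]
```

A non-empty result suggests re-recording or trimming those passages. Hesitations you do want can be written into the synthesis text instead, for example `"Well, um… let me think about that."`.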

Creating a Voice Clone

Via Dashboard

Go to Dashboard → Voices → Create Voice
Upload your reference audio
Enter a name and description
Click Create Voice
Wait for processing (usually 2-5 minutes)

Via SDK

Python
JavaScript
cURL

from kugelaudio import KugelAudio

client = KugelAudio(api_key="YOUR_API_KEY")

# Create a voice with reference audio
voice = client.voices.create(
    name="My Custom Voice",
    sex="female",
    description="Cloned from reference audio",
    category="cloned",
    reference_files=["reference.wav"],
)

print(f"Created voice: {voice.id}")
print(f"Name: {voice.name}")

import { KugelAudio } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'YOUR_API_KEY' });

// Create a voice with reference audio (browser)
const fileInput = document.getElementById('audio-upload');
const file = fileInput.files[0];

const voice = await client.voices.create({
  name: 'My Custom Voice',
  sex: 'female',
  description: 'Cloned from reference audio',
  category: 'cloned',
  referenceFiles: [file],
});

console.log(`Created voice: ${voice.id}`);

curl -X POST https://api.kugelaudio.com/v1/voices \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -F 'metadata={"name":"My Custom Voice","sex":"female","description":"Cloned from reference audio","category":"cloned"};type=application/json' \
  -F "files=@reference.wav"

Using Cloned Voices

Once created, use your cloned voice like any other:

Python
JavaScript
cURL

from kugelaudio import KugelAudio

client = KugelAudio(api_key="YOUR_API_KEY")

# Use your cloned voice
audio = client.tts.generate(
    text="Hello, this is my cloned voice speaking!",
    model_id="kugel-1-turbo",
    voice_id=YOUR_CLONED_VOICE_ID,  # placeholder: the ID returned as voice.id, e.g. 1071
)

audio.save("cloned_output.wav")

import { KugelAudio } from 'kugelaudio';

const client = new KugelAudio({ apiKey: 'YOUR_API_KEY' });

const audio = await client.tts.generate({
  text: 'Hello, this is my cloned voice speaking!',
  modelId: 'kugel-1-turbo',
  voiceId: YOUR_CLONED_VOICE_ID, // placeholder: the ID returned as voice.id, e.g. 1071
});

curl -X POST https://api.kugelaudio.com/v1/tts/generate \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is my cloned voice speaking!",
    "model_id": "kugel-1-turbo",
    "voice_id": YOUR_CLONED_VOICE_ID
  }' \
  --output cloned_output.pcm
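The cURL request above writes its output to a `.pcm` file, which most audio players will not open directly. The sketch below wraps raw PCM in a WAV container using Python's standard `wave` module. It assumes the response is headerless PCM, and the default sample rate (24 kHz), channel count (mono), and sample width (16-bit) are assumptions rather than documented values; confirm the real output format before relying on them. `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_bytes, destination, sample_rate=24_000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container.

    `destination` is a path string or a writable binary file-like object.
    The defaults (24 kHz, mono, 16-bit) are assumptions, not documented values.
    """
    with wave.open(destination, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```

To convert the file from the cURL example, read `cloned_output.pcm` in binary mode and pass its bytes together with an output path such as `"cloned_output.wav"`. If the result plays too fast or too slow, the assumed sample rate is wrong.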

Best Practices

Optimizing Voice Quality

Use high-quality source audio

The quality of your cloned voice depends heavily on the source audio. Use professional recordings when possible.

Provide diverse samples

Include a range of intonations, emotions, and sentence types in your reference audio for a more natural clone.

Adjust CFG scale

Experiment with different cfg_scale values. Cloned voices often benefit from slightly lower values (1.5-2.0) for more natural output.
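One way to experiment is to generate the same sentence at several values and compare the results by ear. The helper below is a minimal sketch: `sweep_cfg_scale` is a hypothetical name, and it takes the generation function as an argument so it makes no claim about the SDK signature. That `client.tts.generate` accepts a `cfg_scale` keyword argument is an assumption based on the parameter name used in this guide.

```python
def sweep_cfg_scale(generate, text, voice_id, values=(1.5, 1.75, 2.0)):
    """Call `generate` once per cfg_scale value and return {value: result}.

    `generate` is any callable accepting `text`, `voice_id`, and `cfg_scale`
    keyword arguments, for example a thin wrapper around the SDK call.
    """
    return {
        value: generate(text=text, voice_id=voice_id, cfg_scale=value)
        for value in values
    }
```

With the SDK, a wrapper such as `lambda **kw: client.tts.generate(model_id="kugel-1", **kw)` could be passed as `generate`, assuming the call accepts those keywords. Save each result under a name that includes the value so the files are easy to compare.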

Use the right model

The kugel-1 model generally produces better results for voice cloning due to its larger capacity.

Remove filler sounds from samples

If your output contains unwanted “um”s, “ah”s, or hesitations, re-record or edit your reference audio to remove them. The model faithfully reproduces what it hears in the samples — clean input produces clean, controllable output. You can always add fillers via your text prompts later.

Troubleshooting

Issue	Solution
Voice sounds robotic	Use higher quality source audio, try lower CFG scale
Voice sounds different	Ensure source audio is clean, try different text samples
Accent not preserved	Include more diverse samples, use longer reference audio
Inconsistent output	Try different CFG values (2.0–3.0)
Unwanted filler sounds (um, ah, hmm)	Re-record or edit reference audio to remove fillers — see Content Guidelines

Managing Cloned Voices

List Your Voices

Python
JavaScript
cURL

voices = client.voices.list()

for voice in voices:
    print(f"{voice.id}: {voice.name} ({voice.category})")

const voices = await client.voices.list();

for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name} (${voice.category})`);
}

curl "https://api.kugelaudio.com/v1/voices" \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY"
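If the listing includes voices other than your clones, you can filter on the `category` field client-side. This is a minimal sketch using stand-in objects in place of real SDK results: the `"cloned"` value comes from the creation example in this guide, while `cloned_only` and the `"premade"` category are hypothetical and used only for illustration.

```python
from types import SimpleNamespace


def cloned_only(voices):
    """Keep only voices whose category is "cloned"."""
    return [v for v in voices if getattr(v, "category", None) == "cloned"]


# Stand-in objects with the same attributes the listing example prints.
sample = [
    SimpleNamespace(id=1071, name="My Custom Voice", category="cloned"),
    SimpleNamespace(id=12, name="Stock Narrator", category="premade"),  # hypothetical category
]

for voice in cloned_only(sample):
    print(f"{voice.id}: {voice.name}")
```

In practice you would pass the result of `client.voices.list()` instead of `sample`.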

Update Voice

Python
JavaScript
cURL

voice = client.voices.update(
    voice_id=1071,
    name="Updated Name",
    description="Updated description",
)
print(f"Updated: {voice.name}")

const voice = await client.voices.update(1071, {
  name: 'Updated Name',
  description: 'Updated description',
});
console.log(`Updated: ${voice.name}`);

curl -X PATCH https://api.kugelaudio.com/v1/voices/1071 \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Name",
    "description": "Updated description"
  }'

Delete Voice

Python
JavaScript
cURL

client.voices.delete(voice_id=1071)

await client.voices.delete(1071);

curl -X DELETE https://api.kugelaudio.com/v1/voices/1071 \
  -H "Authorization: Bearer $KUGELAUDIO_API_KEY"

Managing Reference Audio

You can add and remove reference audio files after creating a voice.

List References

Python
JavaScript

refs = client.voices.list_references(voice_id=1071)
for ref in refs:
    print(f"{ref.id}: {ref.name}")

const refs = await client.voices.listReferences(1071);
for (const ref of refs) {
  console.log(`${ref.id}: ${ref.name}`);
}

Add Reference

Python
JavaScript

ref = client.voices.add_reference(
    voice_id=1071,
    file_path="new_reference.wav",
    reference_text="Optional transcript of the audio.",
)
print(f"Added reference: {ref.id}")

// audioBuffer is assumed to already hold the WAV bytes (e.g. an ArrayBuffer or Blob)
const file = new File([audioBuffer], 'new_reference.wav', { type: 'audio/wav' });
const ref = await client.voices.addReference(1071, file, 'Optional transcript.');
console.log(`Added reference: ${ref.id}`);

Delete Reference

Python
JavaScript

client.voices.delete_reference(voice_id=1071, reference_id=456)

await client.voices.deleteReference(1071, 456);

Publishing Voices

Request that your voice be made public. It will be marked as pending verification until reviewed by an admin.

Python
JavaScript

voice = client.voices.publish(voice_id=1071)
print(f"Pending verification: {voice.pending_verification}")

const voice = await client.voices.publish(1071);
console.log(`Pending verification: ${voice.pendingVerification}`);

Generating Voice Samples

Trigger sample audio generation for a voice. This is done automatically on creation, but you can re-trigger it manually.

Python
JavaScript

voice = client.voices.generate_sample(voice_id=1071)
print(f"Sample URL: {voice.sample_url}")

const voice = await client.voices.generateSample(1071);
console.log(`Sample URL: ${voice.sampleUrl}`);

AI Transparency & Watermarking

All audio generated by KugelAudio — including voice-cloned output — is automatically watermarked using AudioSeal, an imperceptible neural watermarking technique.

This watermarking is required under EU AI Act Article 50 (Regulation (EU) 2024/1689), which mandates that AI-generated audio content be marked in a machine-detectable way. The watermark is inaudible to humans and survives common post-processing operations (re-encoding, light compression).

The watermark encodes:

A KugelAudio-issued identifier linking the audio to the originating API key
A generation timestamp

This allows KugelAudio and auditors to verify whether a piece of audio was generated by the system, supporting abuse detection and regulatory compliance.

What this means for you as an API customer:

You do not need to do anything — watermarking is applied automatically on every synthesis request.
If you redistribute AI-generated audio, you are responsible for complying with applicable disclosure obligations in your jurisdiction (e.g. labelling synthetic media in advertising or public communications).
The watermark does not affect audio quality at perceptible levels.

Privacy & Ethics

Only clone voices you have permission to use. Misuse of voice cloning technology may violate laws and our Terms of Service.

Guidelines

Get consent - Always obtain permission before cloning someone’s voice
Disclose synthetic speech - Be transparent when using cloned voices in public-facing contexts
No impersonation - Don’t use cloned voices to deceive or defraud
Respect rights - Don’t clone voices of public figures without authorization

Verification

For Business and Enterprise plans, we offer voice verification to ensure ethical use:

Upload proof of consent
Our team reviews the submission
Voice is marked as “verified”
Verified voices have no usage restrictions

Limits

Plan	Cloned Voices	Storage
Free	0	-
Starter	0	-
Business	10	100 MB
Enterprise	Unlimited	Unlimited
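The table above can be encoded as a small pre-flight check that runs before you attempt to create a voice. This is a sketch under stated assumptions: it treats 100 MB as 100 × 1024 × 1024 bytes and assumes storage is counted as the total size of uploaded reference files, neither of which the table specifies. `PLAN_LIMITS` and `can_add_voice` are hypothetical names, not SDK features.

```python
# Limits from the table; None means unlimited.
PLAN_LIMITS = {
    "free": {"voices": 0, "storage_bytes": 0},
    "starter": {"voices": 0, "storage_bytes": 0},
    "business": {"voices": 10, "storage_bytes": 100 * 1024 * 1024},  # assumes MB = MiB
    "enterprise": {"voices": None, "storage_bytes": None},
}


def can_add_voice(plan, current_voices, used_bytes, new_file_bytes):
    """Return True if one more cloned voice fits within the plan's limits."""
    limits = PLAN_LIMITS[plan.lower()]
    max_voices = limits["voices"]
    max_bytes = limits["storage_bytes"]
    if max_voices is not None and current_voices + 1 > max_voices:
        return False
    if max_bytes is not None and used_bytes + new_file_bytes > max_bytes:
        return False
    return True
```

For example, `can_add_voice("Business", 10, 0, 1_000_000)` returns `False` because the plan already holds its ten voices. The API remains the authority on limits; a check like this only avoids a request that is likely to be rejected.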

Next Steps

Using Voices

Browse and use available voices

Generate Speech

Generate audio with your cloned voice

Models

Learn about available models
