Voice cloning lets you create a synthetic voice that sounds like a specific person from a short sample (10-30 seconds) of reference audio.
Voice cloning is available on Business and Enterprise plans.

How It Works

  1. Upload reference audio - Provide 10-30 seconds of clean speech
  2. Processing - Our AI analyzes the voice characteristics
  3. Voice created - Use your new voice in any TTS request

Requirements

Audio Quality

For best results, your reference audio should be:
  • Duration: 10-30 seconds of speech
  • Format: WAV, MP3, or FLAC
  • Sample rate: 16kHz or higher
  • Channels: Mono preferred
  • Quality: Clean, no background noise
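
The measurable items in the list above (duration, sample rate, channel count) can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_reference_wav` is hypothetical and not part of the KugelAudio SDK. It handles uncompressed WAV files only, and it measures total file length rather than the amount of actual speech.

```python
import wave

# Thresholds taken from the requirements list above.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0
MIN_SAMPLE_RATE = 16_000


def check_reference_wav(path):
    """Return a list of warnings for a WAV file; an empty list means it passed."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate

    warnings = []
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(f"duration is {duration:.1f}s, expected 10-30s")
    if sample_rate < MIN_SAMPLE_RATE:
        warnings.append(f"sample rate is {sample_rate} Hz, expected 16000 Hz or higher")
    if channels != 1:
        warnings.append(f"{channels} channels found, mono is preferred")
    return warnings
```

A call such as `check_reference_wav("reference.wav")` returns an empty list when all three checks pass. Background noise and reverb cannot be detected this way and still need to be judged by ear.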

Content Guidelines

Good audio:
  • Clear speech with natural pacing
  • Single speaker only
  • Minimal background noise
  • Natural emotional range
  • Free of filler words (um, uh, ah, hmm) unless you want them in the output
Avoid:
  • Multiple speakers
  • Background music
  • Heavy reverb or echo
  • Whispered or shouted speech
  • Heavily compressed audio
  • Recordings with frequent filler sounds or hesitations
Your samples define the voice. The cloned voice will reproduce everything present in your reference audio, including filler sounds like “um”, “ah”, “hmm”, pauses, breathing patterns, and any other speech habits. If your reference audio contains these sounds, they will appear in the generated output and cannot be removed after cloning.
For the most controllable results, use clean recordings without fillers. You can then add natural-sounding hesitations through your text prompts when needed (e.g., writing “um” or “…” in the input text).

Creating a Voice Clone

Via Dashboard

  1. Go to Dashboard > Voices > Create Voice
  2. Upload your reference audio
  3. Enter a name and description
  4. Click Create Voice
  5. Wait for processing (usually 2-5 minutes)

Via SDK

from kugelaudio import KugelAudio

client = KugelAudio(api_key="YOUR_API_KEY")

# Create a voice with reference audio
voice = client.voices.create(
    name="My Custom Voice",
    sex="female",
    description="Cloned from reference audio",
    category="cloned",
    reference_files=["reference.wav"],
)

print(f"Created voice: {voice.id}")
print(f"Name: {voice.name}")

Using Cloned Voices

Once created, use your cloned voice like any other:
from kugelaudio import KugelAudio

client = KugelAudio(api_key="YOUR_API_KEY")

# Use your cloned voice
audio = client.tts.generate(
    text="Hello, this is my cloned voice speaking!",
    model_id="kugel-1-turbo",
    voice_id=123,  # replace with the ID of your cloned voice
)

audio.save("cloned_output.wav")

Best Practices

Optimizing Voice Quality

The quality of your cloned voice depends heavily on the source audio. Use professional recordings when possible.
Include a range of intonations, emotions, and sentence types in your reference audio for a more natural clone.
Experiment with different cfg_scale values. Cloned voices often benefit from slightly lower values (1.5-2.0) for more natural output.
The kugel-1 model generally produces better results for voice cloning due to its larger capacity.
If your output contains unwanted “um”s, “ah”s, or hesitations, re-record or edit your reference audio to remove them. The model faithfully reproduces what it hears in the samples — clean input produces clean, controllable output. You can always add fillers via your text prompts later.
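
One way to experiment with cfg_scale is to render the same sentence at several values and compare the files by ear. The sketch below is illustrative only: the helper `sweep_cfg_scale` is hypothetical, and it assumes that `client.tts.generate` accepts a `cfg_scale` keyword argument and that "kugel-1" is a valid `model_id`. Neither is confirmed on this page, so check the API reference before relying on it.

```python
def sweep_cfg_scale(generate, text, voice_id, scales=(1.5, 1.75, 2.0)):
    """Generate the same text once per cfg_scale value and save each result.

    `generate` is expected to behave like client.tts.generate. Passing
    cfg_scale as a keyword argument is an assumption, not confirmed API.
    """
    saved = []
    for scale in scales:
        audio = generate(
            text=text,
            model_id="kugel-1",  # assumed model ID for the larger model
            voice_id=voice_id,
            cfg_scale=scale,  # assumed parameter name
        )
        filename = f"cfg_{scale:.2f}.wav"
        audio.save(filename)
        saved.append(filename)
    return saved
```

Under those assumptions, `sweep_cfg_scale(client.tts.generate, "Hello, this is a test.", voice_id=123)` would write one WAV file per value and return the list of filenames.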

Troubleshooting

Issue | Solution
Voice sounds robotic | Use higher quality source audio, try lower CFG scale
Voice sounds different | Ensure source audio is clean, try different text samples
Accent not preserved | Include more diverse samples, use longer reference audio
Inconsistent output | Try different CFG values (2.0–3.0)
Unwanted filler sounds (um, ah, hmm) | Re-record or edit reference audio to remove fillers (see Content Guidelines)

Managing Cloned Voices

List Your Voices

voices = client.voices.list()

for voice in voices:
    print(f"{voice.id}: {voice.name} ({voice.category})")
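
If the list also contains non-cloned voices, it can be narrowed down by category. This is a small sketch, assuming that cloned voices carry the category value "cloned" (the value passed to `client.voices.create` earlier on this page). The helper name `cloned_voices` is hypothetical.

```python
def cloned_voices(voices):
    """Return only the voices whose category is "cloned"."""
    return [voice for voice in voices if voice.category == "cloned"]
```

For example, `cloned_voices(client.voices.list())` would return just the voices you created through cloning.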

Update Voice

voice = client.voices.update(
    voice_id=123,
    name="Updated Name",
    description="Updated description",
)
print(f"Updated: {voice.name}")

Delete Voice

client.voices.delete(voice_id=123)

Managing Reference Audio

You can add and remove reference audio files after creating a voice.

List References

refs = client.voices.list_references(voice_id=123)
for ref in refs:
    print(f"{ref.id}: {ref.name}")

Add Reference

ref = client.voices.add_reference(
    voice_id=123,
    file_path="new_reference.wav",
    reference_text="Optional transcript of the audio.",
)
print(f"Added reference: {ref.id}")

Delete Reference

client.voices.delete_reference(voice_id=123, reference_id=456)
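
When several recordings need to be attached to the same voice, the `add_reference` call shown above can be wrapped in a loop. The following is a sketch rather than an official helper: `add_references` is a hypothetical name, and it assumes that `reference_text` can be omitted, as the "Optional transcript" wording suggests.

```python
from pathlib import Path


def add_references(client, voice_id, paths):
    """Upload each existing file as a reference.

    Returns a tuple (added_ids, missing_paths). Files that do not exist
    locally are skipped and reported instead of being sent to the API.
    """
    added, missing = [], []
    for path in paths:
        if not Path(path).is_file():
            missing.append(path)
            continue
        # reference_text is left out here; it is assumed to be optional.
        ref = client.voices.add_reference(voice_id=voice_id, file_path=str(path))
        added.append(ref.id)
    return added, missing
```

Checking for the file locally first avoids a failed upload halfway through a batch and makes it easy to see which paths were mistyped.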

Publishing Voices

Request that your voice be made public. It will be marked as pending verification until reviewed by an admin.
voice = client.voices.publish(voice_id=123)
print(f"Pending verification: {voice.pending_verification}")

Generating Voice Samples

Trigger sample audio generation for a voice. This is done automatically on creation, but you can re-trigger it manually.
voice = client.voices.generate_sample(voice_id=123)
print(f"Sample URL: {voice.sample_url}")

AI Transparency & Watermarking

All audio generated by KugelAudio — including voice-cloned output — is automatically watermarked using AudioSeal, an imperceptible neural watermarking technique.
This watermarking is required under EU AI Act Article 50 (Regulation (EU) 2024/1689), which mandates that AI-generated audio content be marked in a machine-detectable way. The watermark is inaudible to humans and survives common post-processing operations (re-encoding, light compression).
The watermark encodes:
  • A KugelAudio-issued identifier linking the audio to the originating API key
  • A generation timestamp
This allows KugelAudio and auditors to verify whether a piece of audio was generated by the system, supporting abuse detection and regulatory compliance.
What this means for you as an API customer:
  • You do not need to do anything — watermarking is applied automatically on every synthesis request.
  • If you redistribute AI-generated audio, you are responsible for complying with applicable disclosure obligations in your jurisdiction (e.g. labelling synthetic media in advertising or public communications).
  • The watermark does not affect audio quality at perceptible levels.

Privacy & Ethics

Only clone voices you have permission to use. Misuse of voice cloning technology may violate laws and our Terms of Service.

Guidelines

  1. Get consent - Always obtain permission before cloning someone’s voice
  2. Disclose synthetic speech - Be transparent when using cloned voices in public-facing contexts
  3. No impersonation - Don’t use cloned voices to deceive or defraud
  4. Respect rights - Don’t clone voices of public figures without authorization

Verification

For Business and Enterprise plans, we offer voice verification to ensure ethical use:
  1. Upload proof of consent
  2. Our team reviews the submission
  3. Voice is marked as “verified”
  4. Verified voices have no usage restrictions

Limits

Plan | Cloned Voices | Storage
Free | 0 | -
Starter | 0 | -
Business | 10 | 100 MB
Enterprise | Unlimited | Unlimited
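
The voice counts in the table above can be compared against your current voice list before creating another clone. This is only a sketch: the helper `remaining_clone_slots` is hypothetical, the limits are copied by hand from the table, and it assumes the limit counts voices whose category is "cloned". The service's own accounting may differ.

```python
# Cloned-voice limits per plan, copied from the table above.
# None means unlimited.
PLAN_CLONE_LIMITS = {
    "free": 0,
    "starter": 0,
    "business": 10,
    "enterprise": None,
}


def remaining_clone_slots(voices, plan):
    """Return how many more cloned voices the plan allows, or None if unlimited."""
    limit = PLAN_CLONE_LIMITS[plan.lower()]
    if limit is None:
        return None
    used = sum(1 for voice in voices if voice.category == "cloned")
    return max(limit - used, 0)
```

For example, `remaining_clone_slots(client.voices.list(), "business")` would report how many of the 10 Business slots are still free under these assumptions.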

Next Steps

  • Using Voices - Browse and use available voices
  • Generate Speech - Generate audio with your cloned voice
  • Models - Learn about available models