Voice cloning is available on Business and Enterprise plans.
How It Works
- Upload reference audio - Provide 10-30 seconds of clean speech
- Processing - Our AI analyzes the voice characteristics
- Voice created - Use your new voice in any TTS request
Requirements
Audio Quality
For best results, your reference audio should be:
- Duration: 10-30 seconds of speech
- Format: WAV, MP3, or FLAC
- Sample rate: 16kHz or higher
- Channels: Mono preferred
- Quality: Clean, no background noise
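The measurable requirements above (duration, sample rate, channel count) can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module, so it covers WAV files only; MP3 and FLAC would need a third-party decoder. The function and class names are illustrative and are not part of any KugelAudio SDK. Noise and overall recording quality cannot be judged this way and still need a listening check.

```python
import wave
from dataclasses import dataclass, field

# Thresholds taken from the requirements list above.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0
MIN_SAMPLE_RATE = 16_000


@dataclass
class AudioReport:
    duration_s: float
    sample_rate: int
    channels: int
    problems: list = field(default_factory=list)   # requirements not met
    warnings: list = field(default_factory=list)   # preferences not met

    @property
    def ok(self) -> bool:
        return not self.problems


def check_reference_wav(path: str) -> AudioReport:
    """Check a WAV file against the duration, sample-rate and channel guidance."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / sample_rate

    report = AudioReport(duration_s, sample_rate, channels)
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        report.problems.append(
            f"duration is {duration_s:.1f}s, expected 10-30 seconds"
        )
    if sample_rate < MIN_SAMPLE_RATE:
        report.problems.append(
            f"sample rate is {sample_rate} Hz, expected 16000 Hz or higher"
        )
    if channels != 1:
        # Mono is preferred, not required, so this is only a warning.
        report.warnings.append(f"{channels} channels found, mono is preferred")
    return report
```

Calling `check_reference_wav("sample.wav")` returns a report whose `ok` property is `True` when no hard requirement is violated; `problems` and `warnings` list anything worth fixing before upload.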
Content Guidelines
✅ Good audio:
- Clear speech with natural pacing
- Single speaker only
- Minimal background noise
- Natural emotional range
- Free of filler words (um, uh, ah, hmm) unless you want them in the output
❌ Avoid:
- Multiple speakers
- Background music
- Heavy reverb or echo
- Whispered or shouted speech
- Heavily compressed audio
- Recordings with frequent filler sounds or hesitations
Creating a Voice Clone
Via Dashboard
- Go to Dashboard → Voices → Create Voice
- Upload your reference audio
- Enter a name and description
- Click Create Voice
- Wait for processing (usually 2-5 minutes)
Via SDK
- Python
- JavaScript
- cURL
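The dashboard flow mentions a processing time of usually 2-5 minutes. If voice creation through the SDK is similarly asynchronous (an assumption, not something stated here), a script has to wait before using the new voice. A minimal polling sketch follows; `is_ready` is a hypothetical callable that you would implement with whatever status check your SDK version provides, and the default timeout of 10 minutes is simply twice the upper end of the usual processing time.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed.

    is_ready is a hypothetical hook: wrap your own status check in it.
    sleep and clock are injectable so the loop can be tested without waiting.
    Returns True if the voice became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning `False` on timeout, rather than raising, leaves it to the caller to decide whether to retry, alert, or give up.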
Using Cloned Voices
Once created, use your cloned voice like any other:
- Python
- JavaScript
- cURL
Best Practices
Optimizing Voice Quality
Use high-quality source audio
The quality of your cloned voice depends heavily on the source audio. Use professional recordings when possible.
Provide diverse samples
Include a range of intonations, emotions, and sentence types in your reference audio for a more natural clone.
Adjust CFG scale
Experiment with different cfg_scale values. Cloned voices often benefit from slightly lower values (1.5-2.0) for more natural output.
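One way to experiment systematically is to render the same sentence at several evenly spaced `cfg_scale` values and compare the results by ear. The sketch below assumes a hypothetical `synthesize(text, cfg_scale)` callable that you supply by wrapping your own TTS request; only the `cfg_scale` parameter name and the 1.5-2.0 range come from this page.

```python
from typing import Callable, Dict, List


def cfg_candidates(low: float = 1.5, high: float = 2.0, steps: int = 6) -> List[float]:
    """Return `steps` evenly spaced cfg_scale values from low to high inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    step = (high - low) / (steps - 1)
    # Round to avoid float noise such as 1.8000000000000003.
    return [round(low + i * step, 3) for i in range(steps)]


def sweep_cfg(
    synthesize: Callable[[str, float], bytes],
    text: str,
    values: List[float],
) -> Dict[float, bytes]:
    """Render `text` once per cfg_scale value.

    synthesize is a hypothetical hook: wrap your actual TTS call in it
    and return the audio bytes.
    """
    return {value: synthesize(text, value) for value in values}
```

With the defaults, `cfg_candidates()` yields `[1.5, 1.6, 1.7, 1.8, 1.9, 2.0]`. Saving each result to a file named after its value makes side-by-side listening easier.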
Use the right model
The kugel-1 model generally produces better results for voice cloning due to its larger capacity.
Remove filler sounds from samples
If your output contains unwanted “um”s, “ah”s, or hesitations, re-record or edit your reference audio to remove them. The model faithfully reproduces what it hears in the samples — clean input produces clean, controllable output. You can always add fillers via your text prompts later.
Troubleshooting
| Issue | Solution |
|---|---|
| Voice sounds robotic | Use higher quality source audio, try lower CFG scale |
| Voice sounds different | Ensure source audio is clean, try different text samples |
| Accent not preserved | Include more diverse samples, use longer reference audio |
| Inconsistent output | Try different CFG values (2.0–3.0) |
| Unwanted filler sounds (um, ah, hmm) | Re-record or edit reference audio to remove fillers — see Content Guidelines |
Managing Cloned Voices
List Your Voices
- Python
- JavaScript
- cURL
Update Voice
- Python
- JavaScript
- cURL
Delete Voice
- Python
- JavaScript
- cURL
Managing Reference Audio
You can add and remove reference audio files after creating a voice.
List References
- Python
- JavaScript
Add Reference
- Python
- JavaScript
Delete Reference
- Python
- JavaScript
Publishing Voices
Request that your voice be made public. It will be marked as pending verification until reviewed by an admin.
- Python
- JavaScript
Generating Voice Samples
Trigger sample audio generation for a voice. This is done automatically on creation, but you can re-trigger it manually.
- Python
- JavaScript
AI Transparency & Watermarking
All audio generated by KugelAudio, including voice-cloned output, is automatically watermarked using AudioSeal, an imperceptible neural watermarking technique.
This watermarking is required under EU AI Act Article 50 (Regulation (EU) 2024/1689), which mandates that AI-generated audio content be marked in a machine-detectable way. The watermark is inaudible to humans and survives common post-processing operations (re-encoding, light compression).
Each watermark is associated with:
- A KugelAudio-issued identifier linking the audio to the originating API key
- A generation timestamp
In practice:
- You do not need to do anything. Watermarking is applied automatically on every synthesis request.
- If you redistribute AI-generated audio, you are responsible for complying with applicable disclosure obligations in your jurisdiction (e.g. labelling synthetic media in advertising or public communications).
- The watermark does not affect audio quality at perceptible levels.
Privacy & Ethics
Guidelines
- Get consent - Always obtain permission before cloning someone’s voice
- Disclose synthetic speech - Be transparent when using cloned voices in public-facing contexts
- No impersonation - Don’t use cloned voices to deceive or defraud
- Respect rights - Don’t clone voices of public figures without authorization
Verification
For Business and Enterprise plans, we offer voice verification to ensure ethical use:
- Upload proof of consent
- Our team reviews the submission
- Voice is marked as “verified”
- Verified voices have no usage restrictions
Limits
| Plan | Cloned Voices | Storage |
|---|---|---|
| Free | 0 | - |
| Starter | 0 | - |
| Business | 10 | 100MB |
| Enterprise | Unlimited | Unlimited |
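If you create voices from a script, it can help to check the quota before uploading. The sketch below encodes the table above as data; the helper name and the lowercase plan keys are illustrative, and it assumes the storage limit applies to the combined size of all reference audio, which this page does not spell out. Plans shown with `-` for storage are modelled as 0 MB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanLimits:
    max_voices: Optional[int]       # None means unlimited
    max_storage_mb: Optional[float]  # None means unlimited


# Values copied from the limits table; "-" is represented as 0.
PLAN_LIMITS = {
    "free": PlanLimits(0, 0),
    "starter": PlanLimits(0, 0),
    "business": PlanLimits(10, 100),
    "enterprise": PlanLimits(None, None),
}


def can_create_voice(
    plan: str, current_voices: int, used_mb: float, new_audio_mb: float
) -> bool:
    """Return True if one more voice with new_audio_mb of audio fits the plan."""
    limits = PLAN_LIMITS[plan.lower()]
    if limits.max_voices is not None and current_voices + 1 > limits.max_voices:
        return False
    if (
        limits.max_storage_mb is not None
        and used_mb + new_audio_mb > limits.max_storage_mb
    ):
        return False
    return True
```

For example, a Business account with 9 voices and 90 MB used can add a 5 MB voice, but not once it already holds 10 voices.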
Next Steps
- Using Voices - Browse and use available voices
- Generate Speech - Generate audio with your cloned voice
- Models - Learn about available models