Error Handling
from kugelaudio import KugelAudio
from kugelaudio.exceptions import (
    KugelAudioError,
    AuthenticationError,
    RateLimitError,
    InsufficientCreditsError,
    ValidationError,
    NotFoundError,
)
# ConnectionError is exported from the package root as KugelAudioConnectionError
from kugelaudio import KugelAudioConnectionError

# `client` is assumed to be an already-configured KugelAudio instance.
try:
    audio = client.tts.generate(text="Hello!")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except InsufficientCreditsError:
    print("Not enough credits, please top up")
except NotFoundError:
    print("Voice, model, or dictionary not found")
except ValidationError as e:
    print(f"Invalid request: {e}")
except KugelAudioConnectionError as e:
    print(f"WebSocket/network error: {e}")
except KugelAudioError as e:
    # Catch-all for any other SDK error; keep this handler last.
    print(f"API error: {e}")
All exceptions inherit from KugelAudioError:
| Exception | Raised when |
|---|---|
| AuthenticationError | API key is missing, invalid, or revoked. |
| RateLimitError | Request rate limit exceeded. |
| InsufficientCreditsError | The account/wallet has no remaining credits. |
| NotFoundError | A referenced voice, model, dictionary, or entry doesn’t exist or isn’t visible to the caller (HTTP 404). |
| ValidationError | The request was malformed or a parameter was out of range. |
| ConnectionError | A WebSocket/network error occurred. Exported from the package root as KugelAudioConnectionError to avoid shadowing the built-in. |
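
Because every exception derives from KugelAudioError, a handler for the base class also catches all of its subclasses, which is why the specific handlers come first. The sketch below shows one possible way to retry only on rate limits, with exponential backoff. It uses local stand-in classes so it runs without the SDK; `call_with_retry`, its parameters, and the delay values are illustrative assumptions and are not part of kugelaudio.

```python
import time


# Local stand-ins that mirror the documented hierarchy so the sketch runs
# without the SDK. In real code, import these from kugelaudio.exceptions.
class KugelAudioError(Exception):
    pass


class RateLimitError(KugelAudioError):
    pass


class AuthenticationError(KugelAudioError):
    pass


def call_with_retry(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying only on RateLimitError with exponential backoff.

    Hypothetical helper: other KugelAudioError subclasses propagate at once,
    since retrying an invalid key or a malformed request cannot succeed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, then 1.0 s, ...
```

In application code, `fn` would typically be a small lambda wrapping the generate call, with the remaining `except` clauses placed around `call_with_retry(...)`.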
Data Models
All models are importable from kugelaudio (e.g. from kugelaudio import AudioChunk, StreamConfig).
AudioChunk
Represents a single audio chunk from streaming:
class AudioChunk:
    audio: bytes      # Raw PCM16 audio data
    index: int        # Chunk index (0-based)
    sample_rate: int  # Sample rate (24000)
    samples: int      # Number of samples in chunk

    @property
    def duration_seconds(self) -> float:
        """Duration of this chunk in seconds."""

    def to_float32(self) -> list[float]:
        """Convert PCM16 to float32 samples in [-1.0, 1.0]."""
AudioResponse
Complete audio response from generation:
class AudioResponse:
    audio: bytes                          # Complete PCM16 audio
    sample_rate: int                      # Sample rate (24000)
    samples: int                          # Total samples
    duration_ms: float                    # Duration in milliseconds
    generation_ms: float                  # Generation time in milliseconds
    rtf: float                            # Real-time factor
    word_timestamps: list[WordTimestamp]  # Per-word timing (when word_timestamps=True)
    usage: SessionUsage | None            # Per-request usage (audio time + charge); None if not reported

    @property
    def duration_seconds(self) -> float:
        """Duration in seconds."""

    def to_float32(self) -> list[float]:
        """Convert PCM16 to float32 samples in [-1.0, 1.0]."""

    def save(self, path: str, format: str = "wav") -> None:
        """Save audio to a file. format is 'wav' or 'raw' (headerless PCM)."""

    def to_wav_bytes(self) -> bytes:
        """Get WAV file as bytes."""
WordTimestamp
Word-level time alignment for a generated audio chunk:
class WordTimestamp:
    word: str        # The aligned word
    start_ms: int    # Start time in milliseconds (relative to chunk)
    end_ms: int      # End time in milliseconds (relative to chunk)
    char_start: int  # Start character offset in original text
    char_end: int    # End character offset in original text
    score: float     # Alignment confidence (0.0 - 1.0)

    @property
    def duration_ms(self) -> int:
        """end_ms - start_ms."""
SessionUsage
Per-conversation usage for billing your own customers. Available on
StreamingSession.last_usage (per session), MultiContextSession.usage_for(...)
(per context), and AudioResponse.usage (per generate() request).
class SessionUsage:
    audio_seconds: float       # Audio generated (the unit we bill on)
    cost_cents: float | None   # Actual charge in EUR cents; None if undetermined
    currency: str | None       # Currency of cost_cents ("eur"); set only when cost_cents is
    characters: int | None     # Input characters; omitted on multi-context per-context usage
    model_id: str | None       # Model that produced the audio

    @property
    def cost_available(self) -> bool:
        """True when an authoritative charge was returned (cost_cents is not None)."""
cost_cents is None (and cost_available is False) when the charge
cannot be determined at session end — e.g. a transient billing error or an
internal session. It is never a misleading 0. audio_seconds is always
reported, so you can still reconcile from the audio you received.
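
When aggregating usage for your own billing, the practical consequence is that cost_cents must be checked against None rather than tested for truthiness, because a charge of 0 is a valid, authoritative value. The sketch below illustrates this with a local stand-in class containing only the two fields it needs; `summarize` is a hypothetical helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionUsage:  # local stand-in with only the fields used here
    audio_seconds: float
    cost_cents: Optional[float] = None

    @property
    def cost_available(self) -> bool:
        return self.cost_cents is not None


def summarize(usages):
    """Split usage into charged totals and audio that still needs pricing."""
    priced = [u for u in usages if u.cost_available]
    unpriced = [u for u in usages if not u.cost_available]
    return {
        "cost_cents": sum(u.cost_cents for u in priced),
        "priced_audio_seconds": sum(u.audio_seconds for u in priced),
        "unpriced_audio_seconds": sum(u.audio_seconds for u in unpriced),
    }
```

Sessions that land in the unpriced bucket can then be reconciled later from their audio_seconds.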
Model
TTS model information (returned by client.models.list()):
class Model:
    id: str                # e.g. 'kugel-3'
    name: str              # Human-readable name
    description: str       # Model description
    parameters: str        # Parameter-count label (e.g. '7B')
    max_input_length: int  # Maximum input characters (default 5000)
    sample_rate: int       # Output sample rate (default 24000)
StreamConfig
Configuration object for streaming sessions. Every field is also accepted as a
keyword argument on streaming_session(...) / streaming_session_sync(...),
so you only need StreamConfig directly when calling session.update_config().
class StreamConfig:
    voice_id: int | None = None
    model_id: str | None = None
    cfg_scale: float = 2.0
    output_format: str | None = None  # e.g. "pcm_24000", "ulaw_8000", "alaw_8000"
    temperature: float | None = None
    max_new_tokens: int = 2048
    sample_rate: int = 24000
    flush_timeout_ms: int = 500
    max_buffer_length: int = 1000
    normalize: bool = True
    language: str | None = None
    word_timestamps: bool = False
    chunk_length_schedule: list[int] | None = None  # default [5, 80, 150, 250]
    auto_mode: bool | None = None
    speed: float = 1.0
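
Since the same fields are accepted either as a config object or as keyword arguments, it can be handy to derive one from the other. The sketch below uses a local dataclass stand-in with a subset of the documented fields; whether the real StreamConfig is a dataclass is an assumption, and `as_kwargs` is a hypothetical helper.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class StreamConfig:  # local stand-in with a subset of the documented fields
    voice_id: Optional[int] = None
    model_id: Optional[str] = None
    cfg_scale: float = 2.0
    language: Optional[str] = None
    word_timestamps: bool = False
    speed: float = 1.0


def as_kwargs(config):
    """Return only the fields that are set (non-None) as keyword arguments."""
    return {k: v for k, v in asdict(config).items() if v is not None}


base = StreamConfig(voice_id=42, language="en")
faster = replace(base, speed=1.25)  # copy with one field changed
```

Here `base` is left untouched, so it can keep serving as the default configuration while `faster` is applied to a single session.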
Dictionary, DictionaryEntry & results
class Dictionary:
    id: int
    project_id: int
    name: str
    description: str | None = None
    language: str | None = None
    is_active: bool = True
    created_at: str | None = None
    updated_at: str | None = None

class DictionaryEntry:
    id: int
    dictionary_id: int
    word: str
    replacement: str
    ipa: str | None = None
    case_sensitive: bool = False
    created_at: str | None = None
    updated_at: str | None = None

class DictionaryEntryList:  # paginated response from entries.list()
    entries: list[DictionaryEntry]
    total: int
    limit: int
    offset: int

class BulkReplaceResult:  # returned by entries.replace_all()
    upserted: int
    deleted: int
    total: int
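
The total, limit, and offset fields on a paginated response are enough to walk every page. The sketch below shows the general pattern against a fake in-memory backend. `Page`, `iter_all`, and `fake_fetch` are hypothetical names, and the `limit`/`offset` keyword arguments of the real list call are an assumption based on the response fields.

```python
from dataclasses import dataclass


@dataclass
class Page:  # stand-in with the same shape as DictionaryEntryList
    entries: list
    total: int
    limit: int
    offset: int


def iter_all(fetch_page, limit=100):
    """Yield every item by walking limit/offset pages until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page.entries
        offset += len(page.entries)
        # Stop on an empty page too, in case total shrinks while iterating.
        if not page.entries or offset >= page.total:
            break


# Fake backend with 7 items, used in place of a real list call.
ITEMS = [f"entry-{i}" for i in range(7)]


def fake_fetch(limit, offset):
    return Page(ITEMS[offset : offset + limit], len(ITEMS), limit, offset)
```

With a page size of 3, `iter_all(fake_fetch, limit=3)` issues three requests and yields all seven items in order.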
Enums
category, sex, and age on voice models are string enums (importable from
kugelaudio):
class VoiceCategory(str, Enum):
    PREMADE, CLONED, DESIGNED, CONVERSATIONAL, NARRATIVE, NARRATIVE_STORY, CHARACTERS

class VoiceSex(str, Enum):
    MALE, FEMALE, NEUTRAL

class VoiceAge(str, Enum):
    YOUNG, MIDDLE_AGED, MIDDLE_AGE, OLD
VoiceCategory falls back to CLONED for any value the SDK doesn’t
recognize, so newer server-side categories never raise on deserialization.
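
For readers curious how such a fallback can work, Python's Enum offers the `_missing_` hook, which is called when a value matches no member. The sketch below is one plausible way to get the described behaviour, not necessarily how the SDK implements it. The lowercase member values are an assumption, since only the member names are listed in this reference, and only three members are reproduced.

```python
from enum import Enum


class VoiceCategory(str, Enum):
    # Member values are assumed to be lowercase strings; only the names
    # appear in this reference.
    PREMADE = "premade"
    CLONED = "cloned"
    DESIGNED = "designed"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when no member has this value.
        return cls.CLONED


known = VoiceCategory("premade")
unknown = VoiceCategory("some-future-category")  # falls back instead of raising
```

One consequence worth keeping in mind is that an unrecognized category cannot be told apart from a genuinely cloned voice after deserialization.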
VoiceListResponse
Paginated response from voices.list():
class VoiceListResponse:
    voices: list[Voice]  # Voices on this page
    total: int           # Total number of matching voices
    limit: int           # Page size used
    offset: int          # Offset used
Voice
Voice information (items in voices.list().voices):
class Voice:
    id: int                                # Voice ID
    name: str                              # Voice name
    description: str | None = None
    category: VoiceCategory | None = None  # see Enums
    sex: VoiceSex | None = None
    age: VoiceAge | None = None
    supported_languages: list[str] = []    # ['en', 'de', ...]
    sample_text: str | None = None
    avatar_url: str | None = None          # Avatar image URL
    sample_url: str | None = None          # Sample audio URL
    is_public: bool = False
    verified: bool = False
VoiceDetail
Extended voice information (returned by get, create, update, publish, generate_sample):
class VoiceDetail:
    id: int
    name: str
    description: str = ""
    generative_voice_description: str = ""
    supported_languages: list[str] = []
    category: VoiceCategory | None = None
    age: VoiceAge | None = None
    sex: VoiceSex | None = None
    quality: str = "mid"  # 'low', 'mid', 'high'
    is_public: bool = False
    verified: bool = False
    pending_verification: bool = False
    sample_url: str | None = None
    avatar_url: str | None = None
    sample_text: str = ""
VoiceReference
Voice reference audio metadata:
class VoiceReference:
    id: int
    voice_id: int
    name: str
    reference_text: str
    s3_path: str
    audio_url: str | None
    is_generated: bool
Next steps
- Quickstart: install and first generation
- Streaming: where StreamConfig and SessionUsage are used