Types & Errors - KugelAudio

Error Handling

from kugelaudio import KugelAudio
from kugelaudio.exceptions import (
    KugelAudioError,
    AuthenticationError,
    RateLimitError,
    InsufficientCreditsError,
    ValidationError,
    NotFoundError,
)
# ConnectionError is exported from the package root as KugelAudioConnectionError
from kugelaudio import KugelAudioConnectionError

# `client` is assumed to be an already-constructed KugelAudio instance (see Quickstart).
try:
    audio = client.tts.generate(text="Hello!")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except InsufficientCreditsError:
    print("Not enough credits, please top up")
except NotFoundError:
    print("Voice, model, or dictionary not found")
except ValidationError as e:
    print(f"Invalid request: {e}")
except KugelAudioConnectionError as e:
    print(f"WebSocket/network error: {e}")
except KugelAudioError as e:
    print(f"API error: {e}")

All exceptions inherit from KugelAudioError, so catch it last, after the more specific handlers. The subclasses are:

Exception	Raised when
`AuthenticationError`	API key is missing, invalid, or revoked.
`RateLimitError`	Request rate limit exceeded.
`InsufficientCreditsError`	The account/wallet has no remaining credits.
`NotFoundError`	A referenced voice, model, dictionary, or entry doesn’t exist or isn’t visible to the caller (HTTP 404).
`ValidationError`	The request was malformed or a parameter was out of range.
`ConnectionError`	A WebSocket/network error occurred. Exported from the package root as `KugelAudioConnectionError` to avoid shadowing the built-in.
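
Of the errors listed above, a rate limit is the one most likely to clear up on its own, so it is a natural candidate for a retry. The following is a minimal sketch of exponential backoff, not part of the SDK: the exception classes are local stand-ins so the snippet runs without kugelaudio installed, and `call_with_retries`, `attempts`, `base_delay`, and `sleep` are hypothetical names chosen for illustration.

```python
import time


# Local stand-ins mirroring the documented hierarchy. In real code, import
# these from kugelaudio.exceptions instead of redefining them.
class KugelAudioError(Exception):
    pass


class RateLimitError(KugelAudioError):
    pass


def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying only on RateLimitError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...


# Demo: fail twice with a rate limit, then succeed.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return b"audio"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
```

Errors such as AuthenticationError or ValidationError are deliberately not retried here, since repeating the same request would not be expected to change the outcome.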

Data Models

All models are importable from kugelaudio (e.g. from kugelaudio import AudioChunk, StreamConfig).

AudioChunk

Represents a single audio chunk from streaming:

class AudioChunk:
    audio: bytes          # Raw PCM16 audio data
    index: int            # Chunk index (0-based)
    sample_rate: int      # Sample rate (24000)
    samples: int          # Number of samples in chunk

    @property
    def duration_seconds(self) -> float:
        """Duration of this chunk in seconds."""

    def to_float32(self) -> list[float]:
        """Convert PCM16 to float32 samples in [-1.0, 1.0]."""

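To make the PCM16 fields concrete, here is a rough sketch of what a conversion like to_float32() and a duration calculation might do. It is not the SDK's implementation: it assumes little-endian, mono, signed 16-bit samples and scales by 32768, and the helper names `pcm16_to_float32` and `duration_seconds` are hypothetical.

```python
import struct


def pcm16_to_float32(audio: bytes) -> list[float]:
    """Decode little-endian signed 16-bit PCM into floats in [-1.0, 1.0)."""
    count = len(audio) // 2  # two bytes per sample
    ints = struct.unpack(f"<{count}h", audio[: count * 2])
    return [s / 32768.0 for s in ints]


def duration_seconds(samples: int, sample_rate: int = 24000) -> float:
    """Duration implied by a sample count at the given sample rate."""
    return samples / sample_rate


# Four hand-made samples: silence, half scale, and both extremes.
pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
floats = pcm16_to_float32(pcm)
```

At the documented 24000 Hz sample rate, a chunk of 12000 samples would correspond to half a second of audio.
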
AudioResponse

Complete audio response from generation:

class AudioResponse:
    audio: bytes                          # Complete PCM16 audio
    sample_rate: int                      # Sample rate (24000)
    samples: int                          # Total samples
    duration_ms: float                    # Duration in milliseconds
    generation_ms: float                  # Generation time in milliseconds
    rtf: float                            # Real-time factor
    word_timestamps: list[WordTimestamp]  # Per-word timing (when word_timestamps=True)
    usage: SessionUsage | None            # Per-request usage (audio time + charge); None if not reported

    @property
    def duration_seconds(self) -> float:
        """Duration in seconds."""

    def to_float32(self) -> list[float]:
        """Convert PCM16 to float32 samples in [-1.0, 1.0]."""

    def save(self, path: str, format: str = "wav") -> None:
        """Save audio to a file. format is 'wav' or 'raw' (headerless PCM)."""

    def to_wav_bytes(self) -> bytes:
        """Get WAV file as bytes."""

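If you only have the raw `audio` bytes, a WAV container can be produced with the standard library. The sketch below shows one way to do this and is not the SDK's own to_wav_bytes(): it assumes mono audio, and `pcm16_to_wav_bytes` and `real_time_factor` are hypothetical helpers. The real-time factor formula (generation time divided by audio duration) is the common definition and is an assumption here, since this page does not define `rtf`.

```python
import io
import wave


def pcm16_to_wav_bytes(audio: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap headerless PCM16 audio in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        wav.writeframes(audio)
    return buf.getvalue()


def real_time_factor(generation_ms: float, duration_ms: float) -> float:
    """Assumed definition: values below 1.0 mean faster than real time."""
    return generation_ms / duration_ms


silence = b"\x00\x00" * 2400  # 0.1 s of silence at 24 kHz
wav_bytes = pcm16_to_wav_bytes(silence)

# Read the result back to confirm the header matches what was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    frames, rate = check.getnframes(), check.getframerate()
```
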
WordTimestamp

Word-level time alignment for a generated audio chunk:

class WordTimestamp:
    word: str          # The aligned word
    start_ms: int      # Start time in milliseconds (relative to chunk)
    end_ms: int        # End time in milliseconds (relative to chunk)
    char_start: int    # Start character offset in original text
    char_end: int      # End character offset in original text
    score: float       # Alignment confidence (0.0 - 1.0)

    @property
    def duration_ms(self) -> int:
        """end_ms - start_ms."""

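Because start_ms and end_ms are relative to the chunk, placing words on a single timeline means adding the offset of the chunk they belong to. The following sketch illustrates that under stated assumptions: `WordTimestamp` is a local stand-in dataclass with the documented fields, and `shift_to_stream_time`, `chunk_offset_ms`, and `min_score` are hypothetical names. How you obtain the chunk offset is up to your application.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WordTimestamp:  # stand-in mirroring the documented fields
    word: str
    start_ms: int
    end_ms: int
    char_start: int
    char_end: int
    score: float

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def shift_to_stream_time(words, chunk_offset_ms, min_score=0.0):
    """Return copies shifted by chunk_offset_ms, dropping low-confidence words."""
    return [
        replace(w, start_ms=w.start_ms + chunk_offset_ms, end_ms=w.end_ms + chunk_offset_ms)
        for w in words
        if w.score >= min_score
    ]


words = [
    WordTimestamp("Hello", 0, 320, 0, 5, 0.98),
    WordTimestamp("world", 360, 700, 6, 11, 0.41),
]
# Suppose this chunk starts 1500 ms into the overall audio.
shifted = shift_to_stream_time(words, chunk_offset_ms=1500, min_score=0.5)
```
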
SessionUsage

Per-conversation usage for billing your own customers. Available on StreamingSession.last_usage (per session), MultiContextSession.usage_for(...) (per context), and AudioResponse.usage (per generate() request).

class SessionUsage:
    audio_seconds: float          # Audio generated (the unit we bill on)
    cost_cents: float | None      # Actual charge in EUR cents; None if undetermined
    currency: str | None          # Currency of cost_cents ("eur"); set only when cost_cents is set
    characters: int | None        # Input characters; omitted on multi-context per-context usage
    model_id: str | None          # Model that produced the audio

    @property
    def cost_available(self) -> bool:
        """True when an authoritative charge was returned (cost_cents is not None)."""

cost_cents is None (and cost_available is False) when the charge cannot be determined at session end, for example after a transient billing error or for an internal session. It is never reported as a misleading 0. audio_seconds is always reported, so you can still reconcile from the audio you received.
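
One way to act on this when billing your own customers is to total the known charges and track separately the audio that came back without a price. The sketch below is illustrative only: `SessionUsage` is a local stand-in with the documented fields, and `summarize` and its result keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionUsage:  # stand-in mirroring the documented fields
    audio_seconds: float
    cost_cents: Optional[float] = None
    currency: Optional[str] = None
    characters: Optional[int] = None
    model_id: Optional[str] = None

    @property
    def cost_available(self) -> bool:
        return self.cost_cents is not None


def summarize(usages):
    """Aggregate usage, keeping unpriced audio separate instead of counting it as 0."""
    return {
        "audio_seconds": sum(u.audio_seconds for u in usages),
        "known_cost_cents": sum(u.cost_cents for u in usages if u.cost_available),
        "unpriced_audio_seconds": sum(u.audio_seconds for u in usages if not u.cost_available),
    }


usages = [
    SessionUsage(audio_seconds=12.5, cost_cents=25.0, currency="eur"),
    SessionUsage(audio_seconds=3.0, cost_cents=6.0, currency="eur"),
    SessionUsage(audio_seconds=4.5),  # charge could not be determined
]
summary = summarize(usages)
```

A non-zero `unpriced_audio_seconds` signals sessions that still need reconciling from audio_seconds.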

Model

TTS model information (returned by client.models.list()):

class Model:
    id: str                   # e.g. 'kugel-3'
    name: str                 # Human-readable name
    description: str          # Model description
    parameters: str           # Parameter-count label (e.g. '7B')
    max_input_length: int     # Maximum input characters (default 5000)
    sample_rate: int          # Output sample rate (default 24000)
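
Since max_input_length caps the number of input characters, longer texts have to be split before they are sent. A minimal sketch of one approach follows. `split_for_model` is a hypothetical helper, not an SDK function, and it breaks only on whitespace, so runs of whitespace are normalized to single spaces. Splitting at sentence boundaries would likely sound better but is left out for brevity.

```python
def split_for_model(text: str, max_input_length: int = 5000) -> list[str]:
    """Split text on whitespace into pieces of at most max_input_length characters."""
    pieces, current = [], ""
    for word in text.split():
        if len(word) > max_input_length:
            raise ValueError("a single word exceeds max_input_length")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_input_length:
            current = candidate  # the word still fits in the current piece
        else:
            pieces.append(current)  # close the current piece and start a new one
            current = word
    if current:
        pieces.append(current)
    return pieces


# A tiny limit makes the behaviour easy to see.
pieces = split_for_model("aa bb cc dd", max_input_length=5)
```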

StreamConfig

Configuration object for streaming sessions. Every field is also accepted as a keyword argument on streaming_session(...) / streaming_session_sync(...), so you only need StreamConfig directly when calling session.update_config().

class StreamConfig:
    voice_id: int | None = None
    model_id: str | None = None
    cfg_scale: float = 2.0
    output_format: str | None = None  # e.g. "pcm_24000", "ulaw_8000", "alaw_8000"
    temperature: float | None = None
    max_new_tokens: int = 2048
    sample_rate: int = 24000
    flush_timeout_ms: int = 500
    max_buffer_length: int = 1000
    normalize: bool = True
    language: str | None = None
    word_timestamps: bool = False
    chunk_length_schedule: list[int] | None = None  # default [5, 80, 150, 250]
    auto_mode: bool | None = None
    speed: float = 1.0
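
Because every field doubles as a keyword argument, a configuration can be expressed either as an object or as plain kwargs. The sketch below shows one way to go from the former to the latter, dropping fields left at None. It makes several assumptions: `StreamConfig` here is a local stand-in dataclass with only a subset of the documented fields (the real class may not be a dataclass), and `config_to_kwargs` is a hypothetical helper.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StreamConfig:  # stand-in with a subset of the documented fields
    voice_id: Optional[int] = None
    model_id: Optional[str] = None
    cfg_scale: float = 2.0
    output_format: Optional[str] = None
    sample_rate: int = 24000
    speed: float = 1.0


def config_to_kwargs(config: StreamConfig) -> dict:
    """Turn a config into keyword arguments, omitting fields left unset (None)."""
    return {name: value for name, value in asdict(config).items() if value is not None}


kwargs = config_to_kwargs(StreamConfig(voice_id=42, output_format="ulaw_8000"))
```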

Dictionary, DictionaryEntry & results

class Dictionary:
    id: int
    project_id: int
    name: str
    description: str | None = None
    language: str | None = None
    is_active: bool = True
    created_at: str | None = None
    updated_at: str | None = None

class DictionaryEntry:
    id: int
    dictionary_id: int
    word: str
    replacement: str
    ipa: str | None = None
    case_sensitive: bool = False
    created_at: str | None = None
    updated_at: str | None = None

class DictionaryEntryList:      # paginated response from entries.list()
    entries: list[DictionaryEntry]
    total: int
    limit: int
    offset: int

class BulkReplaceResult:        # returned by entries.replace_all()
    upserted: int
    deleted: int
    total: int

Enums

category, sex, and age on voice models are string enums (importable from kugelaudio):

class VoiceCategory(str, Enum):
    PREMADE, CLONED, DESIGNED, CONVERSATIONAL, NARRATIVE, NARRATIVE_STORY, CHARACTERS

class VoiceSex(str, Enum):
    MALE, FEMALE, NEUTRAL

class VoiceAge(str, Enum):
    YOUNG, MIDDLE_AGED, MIDDLE_AGE, OLD

VoiceCategory falls back to CLONED for any value the SDK doesn’t recognize, so newer server-side categories never raise on deserialization.
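
A fallback like this can be built with the standard library's `_missing_` hook on Enum. The sketch below shows the technique, not the SDK's actual code: the enum is a local stand-in with only three members, and the lowercase string values are assumptions.

```python
from enum import Enum


class VoiceCategory(str, Enum):  # stand-in; the string values are assumed
    PREMADE = "premade"
    CLONED = "cloned"
    DESIGNED = "designed"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when no member matches; fall back instead of raising.
        return cls.CLONED


known = VoiceCategory("premade")
unknown = VoiceCategory("some_future_category")
```

With this in place, `unknown` resolves to VoiceCategory.CLONED rather than raising ValueError, so check the raw server value if you need to tell a genuine cloned voice from an unrecognized category.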

VoiceListResponse

Paginated response from voices.list():

class VoiceListResponse:
    voices: list[Voice]   # Voices on this page
    total: int            # Total number of matching voices
    limit: int            # Page size used
    offset: int           # Offset used
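
The total, limit, and offset fields are enough to walk every page. The following sketch shows the loop with a fake fetch function so it runs without the SDK. `Page`, `iter_all`, and `fake_fetch` are hypothetical, and it is an assumption that voices.list() accepts `limit` and `offset` keyword arguments. DictionaryEntryList exposes the same three fields, so the same pattern should apply there.

```python
from dataclasses import dataclass


@dataclass
class Page:  # stand-in with the same fields as VoiceListResponse
    voices: list
    total: int
    limit: int
    offset: int


def iter_all(fetch_page, limit=2):
    """Yield every item by advancing offset until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page.voices
        offset += len(page.voices)
        # Stop on an empty page too, in case total changes between requests.
        if not page.voices or offset >= page.total:
            break


CATALOG = ["anna", "ben", "cleo", "dev", "eli"]


def fake_fetch(limit, offset):
    """Pretend server: returns one slice of CATALOG per call."""
    return Page(CATALOG[offset:offset + limit], len(CATALOG), limit, offset)


names = list(iter_all(fake_fetch, limit=2))
```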

Voice

Voice information (items in voices.list().voices):

class Voice:
    id: int                                # Voice ID
    name: str                              # Voice name
    description: str | None = None
    category: VoiceCategory | None = None  # see Enums
    sex: VoiceSex | None = None
    age: VoiceAge | None = None
    supported_languages: list[str] = []    # ['en', 'de', ...]
    sample_text: str | None = None
    avatar_url: str | None = None          # Avatar image URL
    sample_url: str | None = None          # Sample audio URL
    is_public: bool = False
    verified: bool = False

VoiceDetail

Extended voice information (returned by get, create, update, publish, generate_sample):

class VoiceDetail:
    id: int
    name: str
    description: str = ""
    generative_voice_description: str = ""
    supported_languages: list[str] = []
    category: VoiceCategory | None = None
    age: VoiceAge | None = None
    sex: VoiceSex | None = None
    quality: str = "mid"                   # 'low', 'mid', 'high'
    is_public: bool = False
    verified: bool = False
    pending_verification: bool = False
    sample_url: str | None = None
    avatar_url: str | None = None
    sample_text: str = ""

VoiceReference

Voice reference audio metadata:

class VoiceReference:
    id: int
    voice_id: int
    name: str
    reference_text: str
    s3_path: str
    audio_url: str | None
    is_generated: bool

Next steps

Quickstart — install and first generation
Streaming — where StreamConfig and SessionUsage are used
