Synthesis

The synthesis module converts text to speech using AI voice models.

Import

from openstackai.voice import Synthesizer
from openstackai.voice.synthesis import SynthesisResult

Synthesizer Class

Constructor

Synthesizer(
    model: str = "tts-1",        # TTS model
    voice: str = "alloy",         # Voice selection
    speed: float = 1.0,           # Speech speed (0.25-4.0)
    response_format: str = "mp3"  # Output format
)
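
Checking arguments before constructing a `Synthesizer` can surface mistakes early. The following is a minimal sketch, not part of the library: `check_synth_args` is a hypothetical helper, and its limits are assumptions taken from the constructor comments above and the formats listed under Output Formats.

```python
# Assumed limits, copied from this page rather than from the library itself.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def check_synth_args(speed: float = 1.0, response_format: str = "mp3") -> dict:
    """Return keyword arguments for Synthesizer, or raise ValueError."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {"speed": speed, "response_format": response_format}
```

The returned dictionary could then be unpacked into the constructor, for example `Synthesizer(voice="nova", **check_synth_args(1.5, "wav"))`.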

Available Voices

Voice	Description
`alloy`	Neutral, balanced
`echo`	Warm, conversational
`fable`	Expressive, British accent
`onyx`	Deep, authoritative
`nova`	Friendly, energetic
`shimmer`	Clear, professional

Basic Usage

Generate Speech

synthesizer = Synthesizer(voice="nova")

# Generate audio
result = synthesizer.speak("Hello, how can I help you today?")

# Save to file
result.save("greeting.mp3")

# Get bytes
audio_bytes = result.audio_data

Stream Speech

# For longer text, stream chunks as they arrive to reduce time to first audio.
# play_audio is a placeholder for your own playback function.
for chunk in synthesizer.stream("This is a longer message that will be streamed..."):
    play_audio(chunk.data)
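
Streamed chunks can also be written to a file or buffer as they arrive instead of being played. This sketch assumes only that each chunk exposes its bytes as `.data`, as in the loop above. `copy_stream` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real chunks so the example runs without the library.

```python
import io
from types import SimpleNamespace
from typing import BinaryIO, Iterable


def copy_stream(chunks: Iterable, sink: BinaryIO) -> int:
    """Write each chunk's bytes to sink as it arrives; return total bytes."""
    total = 0
    for chunk in chunks:
        sink.write(chunk.data)  # assumes chunk.data is a bytes object
        total += len(chunk.data)
    return total


# Stand-in chunks in place of synthesizer.stream(...)
fake_chunks = [SimpleNamespace(data=b"abc"), SimpleNamespace(data=b"de")]
buffer = io.BytesIO()
written = copy_stream(fake_chunks, buffer)
```

With a real synthesizer, the same helper could be called as `copy_stream(synthesizer.stream(text), f)` inside `with open("long.mp3", "wb") as f:`.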

SynthesisResult

The result object contains:

result.audio_data    # Raw audio bytes
result.format        # Audio format (mp3, wav, etc.)
result.duration      # Duration in seconds
result.sample_rate   # Sample rate (Hz)
result.voice         # Voice used
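
For logging, the attributes above can be folded into a single line. This is a sketch under assumptions: `describe` is a hypothetical helper, `sample_rate` is assumed to be in Hz, and the `SimpleNamespace` object only mimics the attribute names of `SynthesisResult` with made-up values.

```python
from types import SimpleNamespace


def describe(result) -> str:
    """Format the documented result attributes as one log line."""
    size_kib = len(result.audio_data) / 1024
    return (
        f"{result.voice}: {result.duration:.1f}s of {result.format} "
        f"at {result.sample_rate} Hz ({size_kib:.1f} KiB)"
    )


# Stand-in object with the same attribute names; the values are invented.
sample = SimpleNamespace(
    audio_data=b"\x00" * 2048,
    format="mp3",
    duration=1.5,
    sample_rate=24000,
    voice="nova",
)
summary = describe(sample)
```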

Save Methods

# Save with format
result.save("output.mp3")
result.save("output.wav", format="wav")
result.save("output.ogg", format="opus")
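
When file names are generated programmatically, it helps to derive the extension from the format. In this sketch, `output_path` is a hypothetical helper. The `opus` to `.ogg` mapping follows the save example above; the other extensions are conventional choices and an assumption here.

```python
from pathlib import Path

# "opus" -> ".ogg" mirrors the save example; the rest are assumed conventions.
EXTENSIONS = {
    "mp3": ".mp3",
    "wav": ".wav",
    "opus": ".ogg",
    "aac": ".aac",
    "flac": ".flac",
    "pcm": ".pcm",
}


def output_path(stem: str, fmt: str) -> Path:
    """Build a file name whose extension matches the requested format."""
    if fmt not in EXTENSIONS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Path(stem + EXTENSIONS[fmt])
```

A call such as `result.save(str(output_path("greeting", "opus")), format="opus")` would then keep the name and the format in sync.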

Output Formats

# MP3 (default, compact and widely supported)
synth = Synthesizer(response_format="mp3")

# Opus (low latency streaming)
synth = Synthesizer(response_format="opus")

# AAC (high quality)
synth = Synthesizer(response_format="aac")

# FLAC (lossless)
synth = Synthesizer(response_format="flac")

# WAV (uncompressed)
synth = Synthesizer(response_format="wav")

# PCM (raw samples, no container header)
synth = Synthesizer(response_format="pcm")
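
Raw PCM has no header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module. The defaults (24 kHz, mono, 16-bit samples) are assumptions, not values confirmed by this page; check `result.sample_rate` and your provider's documentation before relying on them. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap headerless PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of 16-bit mono silence under the assumed layout.
silence = pcm_to_wav(b"\x00\x00" * 24000)
```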

Quality Models

# Standard quality (faster, cheaper)
synth = Synthesizer(model="tts-1")

# HD quality (higher fidelity)
synth = Synthesizer(model="tts-1-hd")

Speed Control

# Slower speech
synth = Synthesizer(speed=0.75)

# Faster speech
synth = Synthesizer(speed=1.5)

# Range: 0.25 to 4.0

Batch Processing

texts = [
    "Welcome to our service.",
    "How can I assist you today?",
    "Thank you for your patience."
]

results = synthesizer.batch_speak(texts)

for i, result in enumerate(results):
    result.save(f"audio_{i}.mp3")
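
For larger batches, zero-padded names keep files in order when sorted alphabetically. This sketch relies only on the `save(path)` call shown above. `save_batch` is a hypothetical helper, and `FakeResult` is a stand-in that records its path so the example runs without the library.

```python
from pathlib import Path


def save_batch(results, directory: str, prefix: str = "audio",
               ext: str = "mp3") -> list:
    """Save each result under a zero-padded name and return the paths."""
    width = max(2, len(str(len(results) - 1)))  # at least two digits
    paths = []
    for i, result in enumerate(results):
        path = Path(directory) / f"{prefix}_{i:0{width}d}.{ext}"
        result.save(str(path))
        paths.append(path)
    return paths


class FakeResult:
    """Stand-in for SynthesisResult that records where it was saved."""

    def __init__(self):
        self.saved_to = None

    def save(self, path):
        self.saved_to = path


fakes = [FakeResult() for _ in range(3)]
names = [p.name for p in save_batch(fakes, "out")]
```

With real results, the target directory must exist before saving, for example via `Path("out").mkdir(exist_ok=True)`.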

SSML Support

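For finer control over emphasis, pauses, and speaking rate, pass SSML markup and set ssml=True. SSML is honored only where the underlying voice model supports it: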

ssml_text = """
<speak>
    <emphasis level="strong">Welcome</emphasis> to our service.
    <break time="500ms"/>
    How may I <prosody rate="slow">assist you</prosody> today?
</speak>
"""

result = synthesizer.speak(ssml_text, ssml=True)
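
User-supplied text placed inside SSML must be XML-escaped, or characters such as `&` and `<` will produce malformed markup. This sketch builds an SSML string with the standard library only. `ssml_sentence` and `build_ssml` are hypothetical helpers, not part of the library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def ssml_sentence(text: str, pause_ms: int = 0) -> str:
    """Escape plain text for SSML and optionally append a pause."""
    fragment = escape(text)  # converts &, < and > to entities
    if pause_ms > 0:
        fragment += f'<break time="{pause_ms}ms"/>'
    return fragment


def build_ssml(*fragments: str) -> str:
    """Wrap fragments in <speak> and check that the result parses as XML."""
    document = "<speak>" + " ".join(fragments) + "</speak>"
    ET.fromstring(document)  # raises ParseError on malformed markup
    return document


ssml_text = build_ssml(
    ssml_sentence("Terms & conditions apply.", pause_ms=500),
    ssml_sentence("Is 2 < 3?"),
)
```

The resulting string can be passed to `synthesizer.speak(ssml_text, ssml=True)` as in the example above.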

Async Usage

import asyncio

async def generate_speech_async():
    synth = Synthesizer()

    # Async generation
    result = await synth.speak_async("Hello, world!")

    # Async streaming; play_audio_async is a placeholder for your own coroutine
    async for chunk in synth.stream_async("Long text here..."):
        await play_audio_async(chunk)

asyncio.run(generate_speech_async())
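
Async generation makes it possible to synthesize several texts concurrently. The sketch below caps concurrency with a semaphore, which is one way to stay under provider rate limits. It assumes only the `speak_async` method shown above. `speak_many` is a hypothetical helper, and `FakeSynth` is a stand-in so the example runs without the library.

```python
import asyncio


async def speak_many(synth, texts, max_concurrent: int = 3) -> list:
    """Call synth.speak_async for every text, max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def one(text):
        async with limit:
            return await synth.speak_async(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(t) for t in texts))


class FakeSynth:
    """Stand-in with the same method name; returns upper-cased text."""

    async def speak_async(self, text):
        await asyncio.sleep(0)
        return text.upper()


results = asyncio.run(speak_many(FakeSynth(), ["hello", "world"]))
```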
