Synthesis

The synthesis module converts text to speech using AI voice models.

Import

from openstackai.voice import Synthesizer
from openstackai.voice.synthesis import SynthesisResult

Synthesizer Class

Constructor

Synthesizer(
    model: str = "tts-1",           # TTS model
    voice: str = "alloy",           # Voice selection
    speed: float = 1.0,             # Speech speed (0.25-4.0)
    response_format: str = "mp3"    # Output format
)

Available Voices

Voice     Description
alloy     Neutral, balanced
echo      Warm, conversational
fable     Expressive, British accent
onyx      Deep, authoritative
nova      Friendly, energetic
shimmer   Clear, professional
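
Because `voice` is passed as a plain string, a typo may only surface once a request is made. The following is a minimal sketch of a caller-side check. `KNOWN_VOICES` and `check_voice` are hypothetical helpers, not part of `openstackai`, and the voice list is simply copied from the table above on the assumption that it is complete.

```python
import difflib

# Voice names copied from the table in this section (assumed to be the full set).
KNOWN_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def check_voice(name: str) -> str:
    """Return the normalized voice name, or raise ValueError with a suggestion."""
    candidate = name.strip().lower()
    if candidate in KNOWN_VOICES:
        return candidate
    # Suggest the closest known voice when one is reasonably similar.
    close = difflib.get_close_matches(candidate, KNOWN_VOICES, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Unknown voice {name!r}.{hint}")
```

The returned value could then be passed as `Synthesizer(voice=check_voice(user_input))`.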

Basic Usage

Generate Speech

synthesizer = Synthesizer(voice="nova")

# Generate audio
result = synthesizer.speak("Hello, how can I help you today?")

# Save to file
result.save("greeting.mp3")

# Get bytes
audio_bytes = result.audio_data

Stream Speech

# For longer text, stream to reduce latency
# play_audio is a placeholder for your own playback function
for chunk in synthesizer.stream("This is a longer message that will be streamed..."):
    play_audio(chunk.data)
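
Streamed chunks can also be written to a file or buffer as they arrive instead of being played. The sketch below shows that pattern without touching the network. `FakeChunk` and `fake_stream` are hypothetical stand-ins for whatever `synthesizer.stream()` yields, and the only assumption carried over from the example above is that each chunk exposes its bytes as `.data`.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `.data` is modeled."""
    data: bytes


def fake_stream(payload: bytes, size: int = 4) -> Iterator[FakeChunk]:
    """Yield `payload` in fixed-size pieces, imitating a chunked stream."""
    for start in range(0, len(payload), size):
        yield FakeChunk(payload[start:start + size])


def write_stream(chunks: Iterable[FakeChunk], sink: BinaryIO) -> int:
    """Write each chunk to `sink` as it arrives; return total bytes written."""
    total = 0
    for chunk in chunks:
        total += sink.write(chunk.data)
    return total


# Demo with an in-memory buffer in place of a real file.
buffer = io.BytesIO()
written = write_stream(fake_stream(b"example audio bytes"), buffer)
```

With a real synthesizer, `sink` would typically be a file opened in `"wb"` mode and `chunks` the iterator returned by `synthesizer.stream(...)`.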

SynthesisResult

The result object contains:

result.audio_data   # Raw audio bytes
result.format       # Audio format (mp3, wav, etc.)
result.duration     # Duration in seconds
result.sample_rate  # Sample rate in Hz
result.voice        # Voice used
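
For uncompressed audio, duration and sample rate are tied together by simple arithmetic, which can serve as a sanity check on these fields. The sketch below assumes 16-bit mono PCM by default. It does not apply to compressed formats such as MP3, and it is not a claim about how the library itself computes `duration`. `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Duration of uncompressed PCM audio, in seconds.

    sample_width is bytes per sample (2 means 16-bit); the defaults are assumptions.
    """
    if sample_rate <= 0 or channels <= 0 or sample_width <= 0:
        raise ValueError("sample_rate, channels and sample_width must be positive")
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second
```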

Save Methods

# Save with format
result.save("output.mp3")
result.save("output.wav", format="wav")
result.save("output.ogg", format="opus")
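
When file names are generated in code, it helps to derive the extension from the format rather than hard-coding it. A minimal sketch follows. `EXTENSIONS` and `output_path` are hypothetical helpers: the `opus` to `.ogg` pairing follows the `save()` example above, and the other extensions are conventional choices rather than anything the library prescribes.

```python
from pathlib import Path

# "opus" -> ".ogg" mirrors the save() example; the rest are conventional guesses.
EXTENSIONS = {
    "mp3": ".mp3",
    "opus": ".ogg",
    "aac": ".aac",
    "flac": ".flac",
    "wav": ".wav",
    "pcm": ".pcm",
}


def output_path(stem: str, fmt: str, directory: str = ".") -> Path:
    """Build `<directory>/<stem><extension>` for the given format name."""
    try:
        suffix = EXTENSIONS[fmt.lower()]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt!r}") from None
    return Path(directory) / f"{stem}{suffix}"
```

The result can be passed along as `result.save(str(output_path("output", "opus")), format="opus")`.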

Output Formats

# MP3 (default, compact and widely supported)
synth = Synthesizer(response_format="mp3")

# Opus (low latency streaming)
synth = Synthesizer(response_format="opus")

# AAC (high quality)
synth = Synthesizer(response_format="aac")

# FLAC (lossless)
synth = Synthesizer(response_format="flac")

# WAV (uncompressed)
synth = Synthesizer(response_format="wav")

# PCM (raw audio)
synth = Synthesizer(response_format="pcm")
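
Raw PCM carries no header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module. The defaults of 24 kHz, mono, 16-bit are assumptions, not values stated in this documentation, so check `result.sample_rate` before relying on them. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    The default rate, channel count and sample width are assumptions.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```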

Quality Models

# Standard quality (faster, cheaper)
synth = Synthesizer(model="tts-1")

# HD quality (higher fidelity)
synth = Synthesizer(model="tts-1-hd")

Speed Control

# Slower speech
synth = Synthesizer(speed=0.75)

# Faster speech
synth = Synthesizer(speed=1.5)

# Range: 0.25 to 4.0
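
This page does not say what happens when `speed` falls outside the documented range, so callers that accept user input may want to guard against it themselves. A minimal sketch, using a hypothetical `clamp_speed` helper that pins values to the documented bounds:

```python
import math

MIN_SPEED, MAX_SPEED = 0.25, 4.0  # range documented in this section


def clamp_speed(speed: float) -> float:
    """Clamp `speed` into the documented [0.25, 4.0] range."""
    value = float(speed)
    if math.isnan(value):
        raise ValueError("speed must be a number, got NaN")
    return max(MIN_SPEED, min(MAX_SPEED, value))
```

Clamping silently changes the request. Raising an error instead is an equally reasonable design when the caller should be told about bad input.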

Batch Processing

texts = [
    "Welcome to our service.",
    "How can I assist you today?",
    "Thank you for your patience."
]

results = synthesizer.batch_speak(texts)

for i, result in enumerate(results):
    result.save(f"audio_{i}.mp3")
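
When the input is one long passage rather than a ready-made list, it first has to be split into pieces before batching. The sketch below does this with a deliberately naive sentence splitter. `split_sentences` is a hypothetical helper, and it will misfire on abbreviations such as "e.g." or "Dr.".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]
```

The returned list has the same shape as `texts` in the example above.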

SSML Support

For finer control over emphasis, pauses, and speaking rate, pass SSML markup with ssml=True (where SSML is supported):

ssml_text = """
<speak>
<emphasis level="strong">Welcome</emphasis> to our service.
<break time="500ms"/>
How may I <prosody rate="slow">assist you</prosody> today?
</speak>
"""

result = synthesizer.speak(ssml_text, ssml=True)
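
SSML is XML, so characters such as `&` and `<` in user-supplied text must be escaped or the markup becomes invalid. The sketch below builds a small SSML document with the standard library's escaping functions. The helper names (`ssml_text`, `ssml_break`, `ssml_emphasis`, `ssml_speak`) are hypothetical and not part of `openstackai`.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_text(text: str) -> str:
    """Escape plain text so it is safe inside SSML."""
    return escape(text)


def ssml_break(ms: int) -> str:
    """A pause of `ms` milliseconds."""
    duration = quoteattr(f"{int(ms)}ms")  # quoteattr adds the surrounding quotes
    return f"<break time={duration}/>"


def ssml_emphasis(text: str, level: str = "strong") -> str:
    """Emphasized text; both the attribute and the content are escaped."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def ssml_speak(*parts: str) -> str:
    """Wrap already-escaped parts in a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"
```

The resulting string would be passed in the same way as `ssml_text` in the example above.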

Async Usage

async def generate_speech_async():
    synth = Synthesizer()

    # Async generation
    result = await synth.speak_async("Hello, world!")

    # Async streaming (play_audio_async is a placeholder for your own coroutine)
    async for chunk in synth.stream_async("Long text here..."):
        await play_audio_async(chunk.data)
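
A common reason to use the async API is to synthesize several texts concurrently while capping the number of requests in flight. The sketch below shows that pattern with `asyncio.Semaphore` and `asyncio.gather`. `speak_many` is a hypothetical helper, and `fake_speak` is a stand-in coroutine so the example runs without the library. With a real synthesizer, something like `synth.speak_async` would be passed as `speak`.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def speak_many(texts: Sequence[str],
                     speak: Callable[[str], Awaitable[T]],
                     limit: int = 3) -> list[T]:
    """Run `speak` for every text, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(text: str) -> T:
        async with semaphore:
            return await speak(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(text) for text in texts))


async def fake_speak(text: str) -> bytes:
    """Stand-in for a real async synthesis call; returns the text as bytes."""
    await asyncio.sleep(0)
    return text.encode("utf-8")
```

A call such as `asyncio.run(speak_many(texts, fake_speak, limit=2))` drives the whole batch from synchronous code.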
