# Voice Module
The Voice module enables real-time voice interactions with AI agents, including speech-to-text, text-to-speech, and streaming audio processing.
## Overview

```python
from openstackai.voice import VoiceSession, Transcriber, Synthesizer
from openstackai.voice.stream import AudioStream, AudioChunk
```
## Key Components
| Component | Description |
|---|---|
| `VoiceSession` | Manages voice conversation sessions |
| `Transcriber` | Speech-to-text conversion |
| `Synthesizer` | Text-to-speech generation |
| `AudioStream` | Real-time audio streaming |
## Quick Start

### Basic Voice Interaction
```python
from openstackai.voice import VoiceSession

# Create voice session
session = VoiceSession(
    model="gpt-4o-realtime",
    voice="alloy"
)

# Start conversation
async with session.connect() as voice:
    # Send audio
    await voice.send_audio(audio_data)

    # Receive response
    async for chunk in voice.receive():
        play_audio(chunk)
```
### Transcription Only
```python
from openstackai.voice import Transcriber

transcriber = Transcriber(model="whisper-1")

# Transcribe audio file
result = transcriber.transcribe("recording.wav")
print(result.text)

# Transcribe with timestamps
result = transcriber.transcribe(
    "meeting.mp3",
    timestamps=True,
    language="en"
)
```
### Text-to-Speech Only
```python
from openstackai.voice import Synthesizer

synth = Synthesizer(voice="nova")

# Generate speech
audio = synth.speak("Hello, how can I help you today?")
audio.save("greeting.mp3")

# Stream speech
for chunk in synth.stream("This is a longer message..."):
    play_audio(chunk)
```
## Audio Formats
Supported formats:

- `PCM16`: Raw PCM audio (16-bit)
- `WAV`: Waveform Audio
- `MP3`: MPEG Audio Layer III
- `OGG`: Ogg Vorbis
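The container formats above can usually be told apart by their magic bytes; raw PCM16 has no header at all, which is why a sample rate must always accompany it. A stdlib-only sketch of that distinction (`sniff_audio_format` is an illustrative helper, not part of the library):

```python
def sniff_audio_format(data: bytes):
    """Guess a container format from its leading magic bytes.

    Raw PCM16 is headerless, so it cannot be detected and
    falls through to None.
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV"
    if data[:4] == b"OggS":
        return "OGG"
    # MP3: either an ID3v2 tag or a bare MPEG frame-sync header
    if data[:3] == b"ID3" or data[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "MP3"
    return None
```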
```python
from openstackai.voice.stream import AudioChunk, AudioFormat

chunk = AudioChunk(
    data=audio_bytes,
    format=AudioFormat.PCM16,
    sample_rate=24000
)
```
## Voice Options
Available voices:

- `alloy` - Neutral, balanced
- `echo` - Warm, conversational
- `fable` - Expressive, narrative
- `onyx` - Deep, authoritative
- `nova` - Friendly, upbeat
- `shimmer` - Clear, professional
## Real-time Streaming
```python
from openstackai.voice import VoiceSession

async def voice_assistant():
    session = VoiceSession()

    async with session.connect() as voice:
        # Enable turn detection
        voice.enable_turn_detection(
            threshold=0.5,
            silence_duration=0.8
        )

        # Continuous conversation
        while True:
            # User speaks
            user_audio = await voice.listen()

            # Agent responds
            await voice.respond(user_audio)
```
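The `threshold` and `silence_duration` parameters map onto a standard voice-activity heuristic: the user's turn ends once audio energy stays below a threshold for a given stretch of time. A minimal stdlib sketch of that idea over raw PCM16 frames (`rms`, `end_of_turn`, and the 0.1 s frame size are illustrative assumptions, not the library's actual implementation):

```python
import math
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a PCM16 frame, normalized to 0..1."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0

def end_of_turn(frames, threshold=0.5, silence_duration=0.8, frame_seconds=0.1):
    """True once the trailing frames have stayed below `threshold`
    for at least `silence_duration` seconds."""
    needed = math.ceil(silence_duration / frame_seconds)
    if len(frames) < needed:
        return False
    return all(rms(f) < threshold for f in frames[-needed:])
```

With the defaults above, eight consecutive quiet 0.1 s frames (0.8 s of silence) end the turn.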
## Integration with Agents
```python
from openstackai import Agent
from openstackai.voice import VoiceSession

agent = Agent(
    name="VoiceAssistant",
    instructions="You are a helpful voice assistant."
)

# Attach voice capabilities
session = VoiceSession(agent=agent)
```
## See Also
- VoiceSession - Session management
- Transcription - Speech-to-text
- Synthesis - Text-to-speech