Voice Module

The Voice module enables real-time voice interactions with AI agents, including speech-to-text, text-to-speech, and streaming audio processing.

Overview

from openstackai.voice import VoiceSession, Transcriber, Synthesizer
from openstackai.voice.stream import AudioStream, AudioChunk

Key Components

Component	Description
VoiceSession	Manages voice conversation sessions
Transcriber	Speech-to-text conversion
Synthesizer	Text-to-speech generation
AudioStream	Real-time audio streaming

Quick Start

Basic Voice Interaction

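import asyncio

from openstackai.voice import VoiceSession

async def main():
    # Create voice session
    session = VoiceSession(
        model="gpt-4o-realtime",
        voice="alloy"
    )

    # Start conversation
    async with session.connect() as voice:
        # Send audio (audio_data is a placeholder for audio captured by your application)
        await voice.send_audio(audio_data)

        # Receive response (play_audio is a placeholder for your playback function)
        async for chunk in voice.receive():
            play_audio(chunk)

# "async with" is only valid inside a coroutine, so run it in an event loop
asyncio.run(main())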
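
The `audio_data` value in the example above is supplied by the application. As a minimal sketch, assuming the session accepts raw 16-bit PCM bytes (the encoding expected by `send_audio` is not stated here), the standard-library `wave` module can load a recording and split it into short chunks. The helper names `read_pcm16` and `split_pcm16` are hypothetical and not part of openstackai.

```python
import wave
from typing import Iterator, Tuple


def read_pcm16(wav_source) -> Tuple[bytes, int]:
    """Return (raw PCM bytes, sample rate) from a 16-bit WAV path or file object."""
    with wave.open(wav_source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def split_pcm16(
    pcm: bytes, sample_rate: int, chunk_ms: int = 100, channels: int = 1
) -> Iterator[bytes]:
    """Yield consecutive chunks of roughly chunk_ms milliseconds each."""
    # 2 bytes per sample per channel for 16-bit audio
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2 * channels
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]
```

If `send_audio` can be called repeatedly within one connection (an assumption), each chunk from `split_pcm16` could be passed to it in turn instead of sending the whole recording at once.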

Transcription Only

from openstackai.voice import Transcriber

transcriber = Transcriber(model="whisper-1")

# Transcribe audio file
result = transcriber.transcribe("recording.wav")
print(result.text)

# Transcribe with timestamps
result = transcriber.transcribe(
    "meeting.mp3",
    timestamps=True,
    language="en"
)

Text-to-Speech Only

from openstackai.voice import Synthesizer

synth = Synthesizer(voice="nova")

# Generate speech
audio = synth.speak("Hello, how can I help you today?")
audio.save("greeting.mp3")

# Stream speech (play_audio is a placeholder for your playback function)
for chunk in synth.stream("This is a longer message..."):
    play_audio(chunk)

Audio Formats

Supported formats:

PCM16: Raw PCM audio (16-bit)
WAV: Waveform Audio
MP3: MPEG Audio Layer III
OGG: Ogg Vorbis

from openstackai.voice.stream import AudioChunk, AudioFormat

chunk = AudioChunk(
    data=audio_bytes,  # raw audio bytes supplied by your application
    format=AudioFormat.PCM16,
    sample_rate=24000
)
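
PCM16 stores each sample as a signed 16-bit integer, so one second of mono audio at 24000 Hz occupies 48000 bytes. The sketch below uses only the standard library to generate a test tone and to compute the duration of a buffer, which can help when sizing the `data` passed to `AudioChunk`. The function names are hypothetical, and little-endian byte order is an assumption.

```python
import math
import struct


def sine_pcm16(
    frequency_hz: float,
    seconds: float,
    sample_rate: int = 24000,
    amplitude: float = 0.5,
) -> bytes:
    """Return a mono sine tone as little-endian 16-bit PCM bytes."""
    n = int(seconds * sample_rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def pcm16_duration(data: bytes, sample_rate: int = 24000, channels: int = 1) -> float:
    """Return the duration in seconds of a PCM16 buffer."""
    return len(data) / (2 * channels * sample_rate)


tone = sine_pcm16(440, 0.5)
print(len(tone))             # 24000 bytes: 12000 samples * 2 bytes each
print(pcm16_duration(tone))  # 0.5
```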

Voice Options

Available voices:

alloy - Neutral, balanced
echo - Warm, conversational
fable - Expressive, narrative
onyx - Deep, authoritative
nova - Friendly, upbeat
shimmer - Clear, professional

Real-time Streaming

import asyncio

from openstackai.voice import VoiceSession

async def voice_assistant():
    session = VoiceSession()

    async with session.connect() as voice:
        # Enable turn detection
        voice.enable_turn_detection(
            threshold=0.5,
            silence_duration=0.8
        )

        # Continuous conversation (runs until the task is cancelled)
        while True:
            # User speaks
            user_audio = await voice.listen()

            # Agent responds
            await voice.respond(user_audio)

asyncio.run(voice_assistant())
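
How `enable_turn_detection` works internally is not documented here. As an illustration only, a simple energy-based detector could treat `threshold` as a normalized loudness level between 0 and 1, and `silence_duration` as the number of seconds of quiet that ends a turn. Both interpretations are assumptions, and the function names below are hypothetical. The library may instead use a model-based voice activity detector.

```python
import math
import struct
from typing import Iterable, Optional


def rms_level(frame: bytes) -> float:
    """Root-mean-square level of a little-endian PCM16 frame, scaled to 0.0-1.0."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768.0


def end_of_turn_index(
    frames: Iterable[bytes],
    frame_ms: int,
    threshold: float = 0.5,
    silence_duration: float = 0.8,
) -> Optional[int]:
    """Return the index of the frame that completes the trailing silence, or None."""
    silence_ms_needed = round(silence_duration * 1000)
    heard_speech = False
    silent_ms = 0
    for index, frame in enumerate(frames):
        if rms_level(frame) >= threshold:
            # Loud frame: the speaker is (still) talking, so reset the silence timer
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            # Quiet frame after speech: accumulate silence in whole milliseconds
            silent_ms += frame_ms
            if silent_ms >= silence_ms_needed:
                return index
    return None
```

Silence is counted in integer milliseconds rather than accumulated as floating-point seconds, which avoids rounding errors such as eight additions of 0.1 falling just short of 0.8. Leading silence before any speech is ignored, so a turn ends only after the speaker has been heard at least once.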

Integration with Agents

from openstackai import Agent
from openstackai.voice import VoiceSession

agent = Agent(
    name="VoiceAssistant",
    instructions="You are a helpful voice assistant."
)

# Attach voice capabilities
session = VoiceSession(agent=agent)
