VoiceSession

The VoiceSession class manages real-time voice conversations with AI agents.

Import

from openstackai.voice import VoiceSession
from openstackai.voice.session import SessionState

Constructor

VoiceSession(
    model: str = "gpt-4o-realtime",   # Voice model
    voice: str = "alloy",              # Voice selection
    agent: Optional[Agent] = None,     # Associated agent (optional)
    language: str = "en",              # Language code
    sample_rate: int = 24000,          # Audio sample rate in Hz
    turn_detection: bool = True        # Auto turn detection
)

Session States

State	Description
`IDLE`	Session created but not connected
`CONNECTING`	Establishing connection
`CONNECTED`	Active session
`LISTENING`	Receiving user audio
`PROCESSING`	AI processing input
`SPEAKING`	AI generating response
`DISCONNECTED`	Session ended
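
The states in the table read like a simple lifecycle. The sketch below models them with a stand-in enum and a transition check. The enum values and the `ALLOWED` transition map are assumptions inferred from the descriptions; the real `SessionState` may define different values or permit other transitions.

```python
from enum import Enum


class SessionState(Enum):
    """Stand-in for the documented states; the library's values may differ."""
    IDLE = "idle"
    CONNECTING = "connecting"
    CONNECTED = "connected"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"
    DISCONNECTED = "disconnected"


# Assumed transitions, inferred from the state descriptions only.
ALLOWED = {
    SessionState.IDLE: {SessionState.CONNECTING},
    SessionState.CONNECTING: {SessionState.CONNECTED, SessionState.DISCONNECTED},
    SessionState.CONNECTED: {SessionState.LISTENING, SessionState.DISCONNECTED},
    SessionState.LISTENING: {SessionState.PROCESSING, SessionState.DISCONNECTED},
    SessionState.PROCESSING: {SessionState.SPEAKING, SessionState.DISCONNECTED},
    SessionState.SPEAKING: {SessionState.LISTENING, SessionState.DISCONNECTED},
    SessionState.DISCONNECTED: set(),
}


def can_transition(current: SessionState, target: SessionState) -> bool:
    """Return True if the assumed lifecycle allows moving to `target`."""
    return target in ALLOWED[current]
```

A check like this can be handy in tests that watch the `state` property and want to flag unexpected jumps.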

Basic Usage

Async Context Manager

session = VoiceSession()

async with session.connect() as voice:
    # Session is active here
    await voice.send_audio(audio_data)
    response = await voice.receive()

Manual Connection

session = VoiceSession()

await session.connect()
try:
    # Use session...
    pass
finally:
    await session.disconnect()
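
The manual pattern above can be wrapped once so callers do not have to repeat the try/finally. The following is a minimal sketch, assuming only that a session exposes awaitable `connect()` and `disconnect()` methods. `FakeSession` and `connected` are hypothetical names used for illustration and are not part of the library.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in exposing only connect() and disconnect()."""

    def __init__(self):
        self.is_connected = False

    async def connect(self):
        self.is_connected = True

    async def disconnect(self):
        self.is_connected = False


@asynccontextmanager
async def connected(session):
    """Connect on entry and always disconnect on exit, even after an error."""
    await session.connect()
    try:
        yield session
    finally:
        await session.disconnect()


async def main():
    session = FakeSession()
    try:
        async with connected(session) as voice:
            assert voice.is_connected
            raise RuntimeError("simulated failure mid-session")
    except RuntimeError:
        pass
    # The finally block ran, so the session is no longer connected.
    return session.is_connected


if __name__ == "__main__":
    print(asyncio.run(main()))
```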

Methods

send_audio()

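Send audio data to the session. `audio_data` is a `bytes` object; `commit` is a `bool` that defaults to `True` and marks the end of the utterance:

await session.send_audio(
    audio_data,   # bytes: audio to send
    commit=True   # bool: end of utterance
)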
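
Long recordings are often sent in small frames rather than one large buffer. The sketch below splits a buffer into fixed-size frames and sets `commit=True` only on the last one, on the assumption that `commit=False` signals that more audio follows. `RecordingSession`, `send_in_frames`, and the 20 ms frame length are hypothetical; the frame size also assumes 16-bit mono PCM at the default 24000 Hz.

```python
import asyncio

SAMPLE_RATE = 24000    # Hz, the documented default
BYTES_PER_SAMPLE = 2   # assumes 16-bit mono PCM
FRAME_MS = 20          # hypothetical frame length

# 24000 samples/s * 0.020 s * 2 bytes = 960 bytes per frame
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE


class RecordingSession:
    """Hypothetical stand-in that records send_audio() calls."""

    def __init__(self):
        self.calls = []

    async def send_audio(self, audio_data, commit=True):
        self.calls.append((len(audio_data), commit))


async def send_in_frames(session, audio, frame_bytes=FRAME_BYTES):
    """Send `audio` frame by frame, committing only the final frame."""
    frames = [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]
    for index, frame in enumerate(frames):
        is_last = index == len(frames) - 1
        await session.send_audio(frame, commit=is_last)
    return len(frames)
```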

receive()

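Receive the audio response, either as one complete result or as a stream of chunks:

# Get complete response
response = await session.receive()

# Stream response (play_audio is a placeholder for your own playback function)
async for chunk in session.stream_receive():
    play_audio(chunk.data)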
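
When playback is not needed right away, streamed chunks can be gathered into a single buffer. This sketch assumes only what the streaming example shows: `stream_receive()` is an async iterator whose items carry a `data` attribute holding bytes. `Chunk`, `FakeStreamingSession`, and `collect_audio` are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk type with the `data` attribute used above."""
    data: bytes


class FakeStreamingSession:
    """Hypothetical stand-in that yields pre-set chunks."""

    def __init__(self, parts):
        self._parts = parts

    async def stream_receive(self):
        for part in self._parts:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield Chunk(part)


async def collect_audio(session) -> bytes:
    """Concatenate every streamed chunk into one bytes object."""
    buffer = bytearray()
    async for chunk in session.stream_receive():
        buffer.extend(chunk.data)
    return bytes(buffer)
```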

listen()

Listen for user input:

# Auto turn detection
user_audio = await session.listen()

# With timeout
user_audio = await session.listen(timeout=10.0)
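
This page does not say what `listen()` does when its timeout elapses. If a defined outcome is needed, one option is to guard the call with the standard library's `asyncio.wait_for`, as sketched below. `SlowSession` and `listen_or_none` are hypothetical, and returning `None` on timeout is a design choice of this sketch rather than documented library behavior.

```python
import asyncio


class SlowSession:
    """Hypothetical stand-in whose listen() takes `delay` seconds."""

    def __init__(self, delay):
        self._delay = delay

    async def listen(self):
        await asyncio.sleep(self._delay)
        return b"audio"


async def listen_or_none(session, timeout):
    """Return the captured audio, or None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(session.listen(), timeout=timeout)
    except asyncio.TimeoutError:
        return None
```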

respond()

Send user audio and wait for the response in a single call:

response = await session.respond(user_audio)

interrupt()

Interrupt the response that is currently in progress:

await session.interrupt()

Configuration

Turn Detection

session.configure_turn_detection(
    enabled=True,
    threshold=0.5,        # Sensitivity
    prefix_padding=300,   # ms before speech
    silence_duration=800  # ms of silence to end turn
)
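
To give a feel for how `threshold` and `silence_duration` might interact, here is a toy end-of-turn detector over per-frame loudness levels. It is not the library's algorithm, which this page does not describe. The 0-to-1 level scale, the 100 ms frame length, and the function name are assumptions, and `prefix_padding` is not modeled.

```python
def find_turn_end(levels, threshold=0.5, silence_duration=800, frame_ms=100):
    """Return the index of the frame that ends the turn, or None.

    A turn ends once speech has been heard and is then followed by at
    least `silence_duration` ms of frames below `threshold`.
    """
    needed = -(-silence_duration // frame_ms)  # ceiling division
    speech_seen = False
    quiet_run = 0
    for index, level in enumerate(levels):
        if level >= threshold:
            speech_seen = True
            quiet_run = 0  # speech resets the silence counter
        elif speech_seen:
            quiet_run += 1
            if quiet_run >= needed:
                return index
    return None
```

In this model, raising `silence_duration` tolerates longer pauses mid-sentence at the cost of a slower handoff to the AI.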

Audio Settings

session.configure_audio(
    sample_rate=24000,
    channels=1,
    format="pcm16"
)
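
For testing without a microphone, it can help to synthesize a buffer that matches these settings. The sketch below assumes that `"pcm16"` means signed 16-bit little-endian samples, which is the common convention but is not confirmed on this page. Both helper names are hypothetical.

```python
import math
import struct


def sine_pcm16(frequency_hz, duration_s, sample_rate=24000, amplitude=0.5):
    """Return a mono sine tone as signed 16-bit little-endian PCM bytes."""
    count = int(sample_rate * duration_s)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    )
    return struct.pack(f"<{count}h", *samples)


def duration_ms(audio, sample_rate=24000, channels=1):
    """Length of a 16-bit PCM buffer in milliseconds."""
    return len(audio) * 1000 // (sample_rate * channels * 2)
```

At the settings shown, one second of audio occupies 24000 × 1 × 2 = 48000 bytes.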

Events

@session.on("speech_started")
async def on_speech_started():
    print("User started speaking")

@session.on("speech_ended")
async def on_speech_ended():
    print("User stopped speaking")

@session.on("response_started")
async def on_response_started():
    print("AI started responding")

@session.on("transcript_available")
async def on_transcript(text: str):
    print(f"Transcript: {text}")
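
Decorator-based registration like `@session.on(...)` is commonly built on a small handler registry. The following sketch shows one way such a mechanism could work. `EventHub` and its `emit` method are hypothetical and say nothing about how the library dispatches events internally.

```python
import asyncio
from collections import defaultdict


class EventHub:
    """Hypothetical registry mapping event names to async handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event):
        """Return a decorator that registers the handler for `event`."""
        def register(handler):
            self._handlers[event].append(handler)
            return handler  # keep the function usable under its own name
        return register

    async def emit(self, event, *args):
        """Await each handler registered for `event`, in registration order."""
        for handler in self._handlers.get(event, []):
            await handler(*args)


hub = EventHub()
seen = []


@hub.on("transcript_available")
async def on_transcript(text: str):
    seen.append(text)
```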

Properties

Property	Type	Description
`state`	SessionState	Current state
`model`	str	Model being used
`voice`	str	Voice selection
`is_connected`	bool	Connection status
`session_id`	str	Unique session ID
