VoiceSession

The VoiceSession class manages real-time voice conversations with AI agents.

Import

from openstackai.voice import VoiceSession
from openstackai.voice.session import SessionState

Constructor

VoiceSession(
    model: str = "gpt-4o-realtime",  # Voice model
    voice: str = "alloy",            # Voice selection
    agent: Agent = None,             # Associated agent
    language: str = "en",            # Language code
    sample_rate: int = 24000,        # Audio sample rate
    turn_detection: bool = True      # Auto turn detection
)

Session States

| State | Description |
|---|---|
| IDLE | Session created but not connected |
| CONNECTING | Establishing connection |
| CONNECTED | Active session |
| LISTENING | Receiving user audio |
| PROCESSING | AI processing input |
| SPEAKING | AI generating response |
| DISCONNECTED | Session ended |
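The lifecycle above can be sketched as a small state machine. This is only an illustration of the transitions the table implies; the `SessionState` enum members match the docs, but the transition map and `can_transition` helper are assumptions, not the library's internals:

```python
from enum import Enum, auto

class SessionState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    LISTENING = auto()
    PROCESSING = auto()
    SPEAKING = auto()
    DISCONNECTED = auto()

# Plausible transitions implied by the state table (an assumption;
# the real library may allow others).
TRANSITIONS = {
    SessionState.IDLE: {SessionState.CONNECTING},
    SessionState.CONNECTING: {SessionState.CONNECTED, SessionState.DISCONNECTED},
    SessionState.CONNECTED: {SessionState.LISTENING, SessionState.DISCONNECTED},
    SessionState.LISTENING: {SessionState.PROCESSING, SessionState.DISCONNECTED},
    SessionState.PROCESSING: {SessionState.SPEAKING, SessionState.DISCONNECTED},
    SessionState.SPEAKING: {SessionState.LISTENING, SessionState.DISCONNECTED},
    SessionState.DISCONNECTED: set(),
}

def can_transition(src: SessionState, dst: SessionState) -> bool:
    """Return True if dst is a legal next state after src."""
    return dst in TRANSITIONS[src]
```

Note that every connected state can drop to DISCONNECTED, which is why the manual-connection pattern below wraps session use in try/finally.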

Basic Usage

Async Context Manager

session = VoiceSession()

async with session.connect() as voice:
    # Session is active here
    await voice.send_audio(audio_data)
    response = await voice.receive()

Manual Connection

session = VoiceSession()

await session.connect()
try:
    # Use session...
    pass
finally:
    await session.disconnect()

Methods

send_audio()

Send audio data to the session:

await session.send_audio(
    audio_data: bytes,
    commit: bool = True  # End of utterance
)
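When streaming microphone input, it is common to split raw PCM into fixed-size frames and commit only the final one. The helper below is a hypothetical sketch (`chunk_pcm` is not part of the library) showing one way to prepare audio before the send loop:

```python
def chunk_pcm(audio: bytes, frame_bytes: int = 4800) -> list[bytes]:
    """Split raw PCM bytes into fixed-size frames; the last frame may be shorter.

    4800 bytes is 100 ms of 24 kHz mono 16-bit audio (24000 * 2 bytes * 0.1 s).
    """
    return [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]

frames = chunk_pcm(b"\x00" * 10000, frame_bytes=4800)
# → three frames of 4800, 4800, and 400 bytes

# Sketch of the send loop (assumes an active session):
#   for frame in frames[:-1]:
#       await session.send_audio(frame, commit=False)
#   await session.send_audio(frames[-1], commit=True)  # end of utterance
```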

receive()

Receive audio response:

# Get complete response
response = await session.receive()

# Stream response
async for chunk in session.stream_receive():
    play_audio(chunk.data)

listen()

Listen for user input:

# Auto turn detection
user_audio = await session.listen()

# With timeout
user_audio = await session.listen(timeout=10.0)

respond()

Send audio and get response:

response = await session.respond(user_audio)

interrupt()

Interrupt current response:

await session.interrupt()

Configuration

Turn Detection

session.configure_turn_detection(
    enabled=True,
    threshold=0.5,        # Sensitivity
    prefix_padding=300,   # ms before speech
    silence_duration=800  # ms of silence to end turn
)
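To build intuition for `threshold` and `silence_duration`, here is a minimal energy-based sketch of turn detection: a turn ends after enough consecutive quiet frames. This is an illustration only; `rms` and `turn_ended` are hypothetical helpers, and the library's detector may use a different algorithm entirely:

```python
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a frame of little-endian 16-bit PCM."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def turn_ended(frames, threshold: float = 500.0, silence_frames: int = 8) -> bool:
    """True once `silence_frames` consecutive frames fall below `threshold`.

    With 100 ms frames, silence_frames=8 roughly mirrors silence_duration=800.
    """
    quiet = 0
    for frame in frames:
        quiet = quiet + 1 if rms(frame) < threshold else 0
        if quiet >= silence_frames:
            return True
    return False
```

A higher `threshold` treats more audio as silence (ending turns sooner in noisy rooms); a longer `silence_duration` tolerates pauses mid-sentence.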

Audio Settings

session.configure_audio(
    sample_rate=24000,
    channels=1,
    format="pcm16"
)
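These settings determine the raw bandwidth of the audio stream, which is useful when sizing buffers. A quick sketch of the arithmetic (the helper name is made up for illustration):

```python
def bytes_per_second(sample_rate: int = 24000, channels: int = 1, bits: int = 16) -> int:
    """Raw PCM data rate: samples/s * channels * bytes per sample."""
    return sample_rate * channels * bits // 8

bytes_per_second()  # 24 kHz mono pcm16 → 48000 bytes of audio per second
```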

Events

Register event handlers:

@session.on("speech_started")
async def on_speech_started():
    print("User started speaking")

@session.on("speech_ended")
async def on_speech_ended():
    print("User stopped speaking")

@session.on("response_started")
async def on_response_started():
    print("AI started responding")

@session.on("transcript_available")
async def on_transcript(text: str):
    print(f"Transcript: {text}")
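The `@session.on(...)` pattern is a decorator-based event registry. The self-contained `EventEmitter` below is a sketch of how such a registry can work; it is not the library's implementation:

```python
import asyncio

class EventEmitter:
    """Minimal async event registry mirroring the @session.on(...) style."""

    def __init__(self):
        self._handlers: dict[str, list] = {}

    def on(self, event: str):
        # Returns a decorator that registers the handler for `event`.
        def register(fn):
            self._handlers.setdefault(event, []).append(fn)
            return fn
        return register

    async def emit(self, event: str, *args):
        # Await every handler registered for this event, in order.
        for fn in self._handlers.get(event, []):
            await fn(*args)

emitter = EventEmitter()
transcripts = []

@emitter.on("transcript_available")
async def on_transcript(text: str):
    transcripts.append(text)

asyncio.run(emitter.emit("transcript_available", "hello"))
# transcripts is now ["hello"]
```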

Properties

| Property | Type | Description |
|---|---|---|
| state | SessionState | Current state |
| model | str | Model being used |
| voice | str | Voice selection |
| is_connected | bool | Connection status |
| session_id | str | Unique session ID |

See Also