Transcription

The transcription module converts speech audio to text using a speech-recognition model (whisper-1 by default).

Import

from openstackai.voice import Transcriber
from openstackai.voice.transcription import TranscriptionResult

Transcriber Class

Constructor

Transcriber(
    model: str = "whisper-1",       # Transcription model
    language: str = None,           # Language hint (auto-detect if None)
    response_format: str = "json"   # Output format
)

Basic Usage

Transcribe File

transcriber = Transcriber()

# Simple transcription
result = transcriber.transcribe("audio.wav")
print(result.text)

With Options

result = transcriber.transcribe(
    "meeting.mp3",
    language="en",
    timestamps=True,
    word_timestamps=True
)

# Access segments
for segment in result.segments:
    print(f"[{segment.start:.2f}s] {segment.text}")

Transcribe Bytes

with open("audio.wav", "rb") as f:
    audio_data = f.read()

result = transcriber.transcribe_bytes(
    audio_data,
    format="wav"
)
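When the bytes come from a file on disk, the format argument can usually be taken from the file extension. The helper below is a minimal sketch of that idea; load_audio_bytes is a hypothetical name, not part of the library, and it assumes the extension matches the actual encoding.

```python
from pathlib import Path
from typing import Tuple


def load_audio_bytes(path: str) -> Tuple[bytes, str]:
    """Read an audio file and guess its format from the file extension.

    Hypothetical helper: assumes the extension reflects the real encoding.
    """
    p = Path(path)
    # "clip.WAV" -> "wav"; a file with no extension yields an empty string
    audio_format = p.suffix.lstrip(".").lower()
    if not audio_format:
        raise ValueError(f"cannot infer audio format from {path!r}")
    return p.read_bytes(), audio_format
```

The returned pair can then be passed on as transcriber.transcribe_bytes(audio_data, format=audio_format).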

TranscriptionResult

The result object contains:

result.text        # Full transcription text
result.language    # Detected language
result.confidence  # Overall confidence score
result.duration    # Audio duration in seconds
result.segments    # List of segments with timestamps
result.words       # Word-level timestamps (if requested)

Segment Structure

segment.id          # Segment index
segment.start       # Start time (seconds)
segment.end         # End time (seconds)
segment.text        # Segment text
segment.confidence  # Segment confidence
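To illustrate how the start, end, and text fields fit together, here is a rough sketch that renders segments as SRT cues by hand. The Segment dataclass is a stand-in that mirrors the fields listed above, not the library's own class, and srt_timestamp and segments_to_srt are hypothetical helper names. Requesting response_format="srt" (see Output Formats) is likely the simpler route; this is only useful when segments are post-processed first.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Stand-in mirroring the documented segment fields (hypothetical class)."""
    id: int
    start: float
    end: float
    text: str
    confidence: float = 1.0


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

With real results, segments_to_srt(result.segments) should work as long as each segment exposes start, end, and text as described above.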

Streaming Transcription

For real-time transcription:

async def stream_transcribe(audio_stream):
    transcriber = Transcriber()

    async for result in transcriber.stream(audio_stream):
        # Report each result once: final results replace earlier partials
        if result.is_final:
            print(f"Final: {result.text}")
        else:
            print(f"Partial: {result.text}")
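A common follow-up is to keep only the final results and join them into one transcript. The sketch below shows that pattern against a simulated stream so it can run without audio input. StreamResult, collect_final_text, and fake_stream are hypothetical names, and the assumption that each final result supersedes the partials before it is mine, not something this page states.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StreamResult:
    """Stand-in with the two fields used above: text and is_final (hypothetical)."""
    text: str
    is_final: bool


async def collect_final_text(results: AsyncIterator[StreamResult]) -> str:
    """Join the text of final results, ignoring partial updates."""
    finals: List[str] = []
    async for result in results:
        if result.is_final:
            finals.append(result.text.strip())
    return " ".join(finals)


async def fake_stream() -> AsyncIterator[StreamResult]:
    """Simulated stream: partials are refined until a final result arrives."""
    for text, is_final in [
        ("hel", False),
        ("hello", False),
        ("hello world", True),
        ("how are", False),
        ("how are you", True),
    ]:
        yield StreamResult(text, is_final)
```

Running asyncio.run(collect_final_text(fake_stream())) returns "hello world how are you". In real use, transcriber.stream(audio_stream) would take the place of fake_stream(), provided its results carry the same two fields.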

Language Support

Supported languages include:

  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Chinese (zh)
  • Japanese (ja)
  • Korean (ko)
  • And 50+ more...

# Force language
result = transcriber.transcribe(
    "audio.wav",
    language="es"  # Spanish
)

# Auto-detect
result = transcriber.transcribe("audio.wav")
print(f"Detected: {result.language}")

Translation

Translate audio to English:

# Transcribe + translate
result = transcriber.translate("french_audio.wav")
# Output is in English regardless of source language

Batch Processing

files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = transcriber.batch_transcribe(files)

for file, result in zip(files, results):
    print(f"{file}: {result.text}")
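This page does not say what batch_transcribe does when one file fails. If one bad file should not abort the rest, a per-file loop is one option. The sketch below assumes only a callable that takes a path and returns a result; transcribe_each is a hypothetical helper, not a library function.

```python
from typing import Callable, Dict, Iterable, Tuple


def transcribe_each(
    files: Iterable[str],
    transcribe: Callable[[str], object],
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Transcribe files one by one, keeping failures separate from successes."""
    results: Dict[str, object] = {}
    errors: Dict[str, Exception] = {}
    for path in files:
        try:
            results[path] = transcribe(path)
        except Exception as exc:
            # Record the failure and continue with the next file
            errors[path] = exc
    return results, errors
```

It could be called as results, errors = transcribe_each(files, transcriber.transcribe), after which errors maps each failed path to the exception it raised.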

Output Formats

# JSON (default)
result = transcriber.transcribe("audio.wav", response_format="json")

# Plain text
text = transcriber.transcribe("audio.wav", response_format="text")

# SRT subtitles
srt = transcriber.transcribe("audio.wav", response_format="srt")

# VTT subtitles
vtt = transcriber.transcribe("audio.wav", response_format="vtt")
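When the output is written to disk, the response_format value can be derived from the target file name. The mapping below is a small sketch covering only the four formats listed above; FORMAT_BY_SUFFIX and response_format_for are hypothetical names, and the pairing of .txt with "text" is my own convention.

```python
from pathlib import Path

# File extension -> response_format value, limited to the formats listed above
FORMAT_BY_SUFFIX = {
    ".json": "json",
    ".txt": "text",
    ".srt": "srt",
    ".vtt": "vtt",
}


def response_format_for(output_path: str) -> str:
    """Pick a response_format from the output file's extension."""
    suffix = Path(output_path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no response_format known for {suffix!r}") from None
```

For example, transcriber.transcribe("audio.wav", response_format=response_format_for("subtitles.srt")) would request SRT output.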
