Transcription

The transcription module converts speech audio to text using a speech-recognition model ("whisper-1" by default).

Import

from openstackai.voice import Transcriber
from openstackai.voice.transcription import TranscriptionResult

Transcriber Class

Constructor

Transcriber(
    model: str = "whisper-1",        # Transcription model
    language: Optional[str] = None,  # Language hint (auto-detect if None)
    response_format: str = "json"    # Output format
)

Basic Usage

Transcribe File

transcriber = Transcriber()

# Simple transcription
result = transcriber.transcribe("audio.wav")
print(result.text)

With Options

result = transcriber.transcribe(
    "meeting.mp3",
    language="en",
    timestamps=True,
    word_timestamps=True
)

# Access segments
for segment in result.segments:
    print(f"[{segment.start:.2f}s] {segment.text}")

Transcribe Bytes

with open("audio.wav", "rb") as f:
    audio_data = f.read()

result = transcriber.transcribe_bytes(
    audio_data,
    format="wav"
)
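When the bytes come from an upload or a network stream, the container type may not be known in advance. The sketch below guesses it from the leading magic bytes. The helper name `guess_audio_format` is hypothetical and not part of the library, and whether `transcribe_bytes` accepts values such as "flac" or "ogg" for `format` is an assumption.

```python
from typing import Optional


def guess_audio_format(data: bytes) -> Optional[str]:
    """Guess an audio container from its magic bytes (hypothetical helper)."""
    # WAV: RIFF header with a WAVE form type at offset 8
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # FLAC and Ogg streams start with fixed four-byte markers
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    # MP3: either an ID3v2 tag or a raw MPEG frame (11-bit sync word)
    if data[:3] == b"ID3":
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return None  # Unknown: let the caller decide how to proceed
```

The returned string could then be passed as the `format` argument, falling back to an explicit value when the helper returns None.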

TranscriptionResult

The result object contains:

result.text          # Full transcription text
result.language      # Detected language
result.confidence    # Overall confidence score
result.duration      # Audio duration in seconds
result.segments      # List of segments with timestamps
result.words         # Word-level timestamps (if requested)

Segment Structure

segment.id        # Segment index
segment.start     # Start time (seconds)
segment.end       # End time (seconds)
segment.text      # Segment text
segment.confidence # Segment confidence
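One use for the per-segment confidence is flagging passages for manual review. The sketch below uses a stand-in `Segment` dataclass that mirrors the fields listed above; it is not the library's own class. The 0.0 to 1.0 confidence scale and the 0.6 threshold are assumptions, since the scale is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Stand-in mirroring the documented fields (not the library's class)."""
    id: int
    start: float
    end: float
    text: str
    confidence: float  # Assumed to lie in [0.0, 1.0]


def low_confidence(segments: Iterable[Segment], threshold: float = 0.6) -> List[Segment]:
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


# Illustrative data only
segments = [
    Segment(0, 0.0, 2.5, "Hello everyone.", 0.95),
    Segment(1, 2.5, 4.0, "Lets get startled.", 0.41),
]

for seg in low_confidence(segments):
    print(f"Review [{seg.start:.2f}s-{seg.end:.2f}s]: {seg.text}")
```

With a real result, `result.segments` would take the place of the hand-built list, provided its items expose the same attributes.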

Streaming Transcription

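For real-time transcription, iterate over results as the audio arrives:

async def stream_transcribe(audio_stream):
    transcriber = Transcriber()

    async for result in transcriber.stream(audio_stream):
        if result.is_final:
            # The text for this stretch of audio is settled
            print(f"Final: {result.text}")
        else:
            # Interim text that may still be revised
            print(f"Partial: {result.text}")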
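To build a complete transcript from a stream, a common pattern is to keep only the final results and discard the partials. The sketch below runs without the library: `StreamResult` and `fake_stream` are hypothetical stand-ins for the streamed result and for `transcriber.stream(audio_stream)`. It assumes each final result supersedes the partials before it, which is not confirmed here.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StreamResult:
    """Stand-in result; only `text` and `is_final` come from the docs."""
    text: str
    is_final: bool


async def fake_stream() -> AsyncIterator[StreamResult]:
    """Hypothetical stand-in for `transcriber.stream(audio_stream)`."""
    updates = [
        ("hello", False),
        ("hello world", True),
        ("how", False),
        ("how are you", True),
    ]
    for text, final in updates:
        await asyncio.sleep(0)  # Yield control, as a real stream would
        yield StreamResult(text, final)


async def collect_final(results: AsyncIterator[StreamResult]) -> str:
    """Join the text of final results, ignoring partial updates."""
    finals: List[str] = []
    async for result in results:
        if result.is_final:
            finals.append(result.text)
    return " ".join(finals)


transcript = asyncio.run(collect_final(fake_stream()))
print(transcript)
```

Passing the library's stream to `collect_final` in place of `fake_stream()` should work the same way, as long as its results expose `text` and `is_final`.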

Language Support

Supported languages include:

English (en)
Spanish (es)
French (fr)
German (de)
Chinese (zh)
Japanese (ja)
Korean (ko)
And 50+ more...

# Force language
result = transcriber.transcribe(
    "audio.wav",
    language="es"  # Spanish
)

# Auto-detect
result = transcriber.transcribe("audio.wav")
print(f"Detected: {result.language}")

Translation

Translate audio to English:

# Transcribe + translate
result = transcriber.translate("french_audio.wav")
# Output is in English regardless of source language

Batch Processing

files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = transcriber.batch_transcribe(files)

for file, result in zip(files, results):
    print(f"{file}: {result.text}")
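How `batch_transcribe` behaves when one file fails is not documented here. If a single bad file should not abort the run, one option is to loop over the files and record failures separately. In the sketch below, `transcribe_each` and `fake_transcribe` are hypothetical names; the callable passed in stands for whatever performs the transcription.

```python
from typing import Callable, Dict, Iterable, Tuple


def transcribe_each(
    files: Iterable[str],
    transcribe: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Transcribe files one by one, collecting texts and errors separately."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in files:
        try:
            texts[path] = transcribe(path)
        except Exception as exc:  # Record the failure and continue
            errors[path] = str(exc)
    return texts, errors


def fake_transcribe(path: str) -> str:
    """Stand-in transcriber that rejects non-audio files."""
    if path.endswith(".txt"):
        raise ValueError("unsupported format")
    return f"transcript of {path}"


texts, errors = transcribe_each(["audio1.wav", "notes.txt"], fake_transcribe)
print(texts)
print(errors)
```

With the real class, the callable might be `lambda p: transcriber.transcribe(p).text`.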

Output Formats

# JSON (default)
result = transcriber.transcribe("audio.wav", response_format="json")

# Plain text
text = transcriber.transcribe("audio.wav", response_format="text")

# SRT subtitles
srt = transcriber.transcribe("audio.wav", response_format="srt")

# VTT subtitles
vtt = transcriber.transcribe("audio.wav", response_format="vtt")
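For reference, an SRT file is a sequence of numbered cues, each with a time range written as HH:MM:SS,mmm and one or more lines of text, separated by blank lines. The sketch below renders that layout from (start, end, text) tuples. It illustrates the format only; `srt_timestamp` and `to_srt` are hypothetical helpers, and the library's own "srt" output may differ in detail.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as SRT cues, numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{text.strip()}\n")
    return "\n".join(cues)  # Blank line between consecutive cues


print(to_srt([(0.0, 2.5, "Hello everyone."), (2.5, 4.0, "Let's get started.")]))
```

The same tuples could be derived from `result.segments` when a JSON result is already in hand and subtitles are needed later.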
