Multimodal

Process images, audio, and video with AI agents.

See [[Multimodal-Module]] for full documentation.

Quick Start

from openstackai.multimodal import ImageContent, AudioContent

# Image analysis
image = ImageContent.from_file("photo.jpg")
description = image.describe()

# Audio transcription
audio = AudioContent.from_file("recording.mp3")
text = audio.transcribe()

Features

Image understanding and analysis
Audio transcription
Video frame analysis
Multimodal conversations
Format conversion

Supported Formats

Type	Formats
Image	PNG, JPG, GIF, WebP
Audio	MP3, WAV, M4A, FLAC
Video	MP4, MOV, AVI
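
As a rough illustration of how the table above might be used, the sketch below maps a file extension to its media type before the file is handed to a content class. The `SUPPORTED_FORMATS` mapping and the `detect_media_type` helper are hypothetical names introduced here for illustration; they are not part of the openstackai API. Only the extensions listed in the table are included, so variants such as `.jpeg` are treated as unsupported.

```python
from pathlib import Path

# Extensions copied from the Supported Formats table.
# This mapping and the helper below are illustrative assumptions,
# not part of openstackai.
SUPPORTED_FORMATS = {
    "image": {".png", ".jpg", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".m4a", ".flac"},
    "video": {".mp4", ".mov", ".avi"},
}


def detect_media_type(path):
    """Return 'image', 'audio', or 'video' for a listed extension, else None."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    for media_type, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return media_type
    return None
```

A caller could use the returned type to decide between `ImageContent.from_file` and `AudioContent.from_file`, and reject the file early when the result is `None`.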

Related Pages

[[Multimodal-Module]] - Full module documentation
[[ImageContent]] - Image processing
[[AudioContent]] - Audio processing
[[VideoContent]] - Video processing
