Multimodal

Process images, audio, and video with AI agents.

See [[Multimodal-Module]] for full documentation.

Quick Start

from openstackai.multimodal import ImageContent, AudioContent

# Image analysis
image = ImageContent.from_file("photo.jpg")
description = image.describe()

# Audio transcription
audio = AudioContent.from_file("recording.mp3")
text = audio.transcribe()

Features

  • Image understanding and analysis
  • Audio transcription
  • Video frame analysis
  • Multi-modal conversations
  • Format conversion

Supported Formats

Type     Formats
Image    PNG, JPG, GIF, WebP
Audio    MP3, WAV, M4A, FLAC
Video    MP4, MOV, AVI

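The format table can be turned into a small pre-check that runs before a file is handed to the module. The sketch below is illustrative only and is not part of openstackai: `SUPPORTED_FORMATS` and `detect_media_type` are hypothetical names, and accepting `.jpeg` alongside `.jpg` is an assumption the table does not state.

```python
from pathlib import Path

# Extension sets copied from the Supported Formats table.
# ".jpeg" is an added assumption; the table lists only JPG.
SUPPORTED_FORMATS = {
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".m4a", ".flac"},
    "video": {".mp4", ".mov", ".avi"},
}


def detect_media_type(path):
    """Return "image", "audio", or "video" for a supported extension, else None."""
    # Compare case-insensitively so "CLIP.MOV" and "clip.mov" match alike.
    suffix = Path(path).suffix.lower()
    for media_type, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return media_type
    return None
```

A caller might use the result to choose between `ImageContent.from_file` and `AudioContent.from_file`, and to reject unsupported files early. The check looks only at the file extension, so it does not confirm that the file contents match the declared format.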
  • [[Multimodal-Module]] - Full module documentation
  • [[ImageContent]] - Image processing
  • [[AudioContent]] - Audio processing
  • [[VideoContent]] - Video processing