VideoContent

The Video class handles video data for multimodal AI interactions.

Import

from openstackai.multimodal import Video

Creating Video

From File

video = Video.from_file("recording.mp4")

From URL

video = Video.from_url("https://example.com/video.mp4")

From Bytes

with open("video.mp4", "rb") as f:
    video = Video.from_bytes(f.read(), format="mp4")

Properties

| Property | Type | Description |
| --- | --- | --- |
| duration | float | Duration in seconds |
| width | int | Frame width in pixels |
| height | int | Frame height in pixels |
| fps | float | Frames per second |
| format | str | Video format |
| size_bytes | int | File size in bytes |
| frame_count | int | Total number of frames |
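
These properties are related to one another, which is useful for sanity checks. The sketch below does not use openstackai: `VideoInfo` is a hypothetical stand-in dataclass that only mirrors the property names from the table, and the frame-count estimate assumes a constant frame rate.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical stand-in mirroring the documented property names."""
    duration: float   # seconds
    fps: float        # frames per second
    size_bytes: int   # file size in bytes


def expected_frame_count(info: VideoInfo) -> int:
    # Assumes a constant frame rate; variable-frame-rate files may differ.
    return round(info.duration * info.fps)


def average_bitrate_kbps(info: VideoInfo) -> float:
    # Total bits divided by duration, expressed in kilobits per second.
    return info.size_bytes * 8 / info.duration / 1000


info = VideoInfo(duration=10.0, fps=30.0, size_bytes=2_500_000)
print(expected_frame_count(info))
print(average_bitrate_kbps(info))
```

If a real `Video` object reports a `frame_count` far from this estimate, the file may use a variable frame rate or carry inaccurate container metadata.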

Methods

extract_frames()

Extract frames from video:

# Extract frames at intervals
frames = video.extract_frames(interval=1.0) # Every 1 second

# Extract specific number of frames
frames = video.extract_frames(count=10) # 10 evenly spaced frames

# Extract at specific timestamps
frames = video.extract_frames(timestamps=[0.0, 5.0, 10.0])
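
To make the three sampling modes concrete, here is one plausible way they could map to timestamps. This is an illustration only: `sample_timestamps` is a hypothetical helper, not part of openstackai, and the library's actual choices (for example whether `count` samples include the first and last frame) are not documented here.

```python
from typing import List, Optional, Sequence


def sample_timestamps(
    duration: float,
    interval: Optional[float] = None,
    count: Optional[int] = None,
    timestamps: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be sampled."""
    chosen = [mode is not None for mode in (interval, count, timestamps)]
    if sum(chosen) != 1:
        raise ValueError("pass exactly one of interval, count, timestamps")

    if timestamps is not None:
        # Assumption: timestamps outside the video are silently dropped.
        return [t for t in timestamps if 0.0 <= t <= duration]

    if interval is not None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        # 0, interval, 2 * interval, ... strictly before the end of the video.
        result = []
        i = 0
        while i * interval < duration:
            result.append(i * interval)
            i += 1
        return result

    if count <= 0:
        raise ValueError("count must be positive")
    # Assumption: sample the midpoint of each of `count` equal segments.
    segment = duration / count
    return [(i + 0.5) * segment for i in range(count)]


print(sample_timestamps(10.0, interval=2.5))
print(sample_timestamps(10.0, count=4))
```

Midpoint sampling avoids the very first and last frames, which are often black or contain fades; a different implementation could just as reasonably include both endpoints.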

extract_audio()

Extract audio track:

audio = video.extract_audio()
audio.save("audio.mp3")

trim()

Trim video:

# Trim to segment
trimmed = video.trim(start=10.0, end=30.0)

# First 60 seconds
trimmed = video.trim(end=60.0)
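
The second call above omits `start`, which suggests that missing bounds default to the beginning or end of the video. A minimal sketch of that resolution logic follows; `resolve_trim_range` is a hypothetical helper, and clamping out-of-range values to the video's duration is an assumption rather than documented library behavior.

```python
from typing import Optional, Tuple


def resolve_trim_range(
    duration: float,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> Tuple[float, float]:
    """Turn optional trim bounds into a concrete (start, end) pair in seconds."""
    # Missing bounds fall back to the beginning or end of the video.
    start = 0.0 if start is None else start
    end = duration if end is None else end

    # Assumption: out-of-range values are clamped rather than rejected.
    start = max(0.0, min(start, duration))
    end = max(0.0, min(end, duration))

    if end <= start:
        raise ValueError("trim range is empty after clamping")
    return start, end


print(resolve_trim_range(120.0, start=10.0, end=30.0))
print(resolve_trim_range(45.0, end=60.0))
```

Validating the range before calling `trim()` makes it easier to report a clear error for a clip that is shorter than expected.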

resize()

Resize video:

resized = video.resize(width=640, height=480)
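
Passing both `width` and `height` can distort the picture if the target does not match the source aspect ratio (this page does not say whether `resize()` preserves it). The sketch below computes target dimensions that fit inside a bounding box while keeping the aspect ratio; `fit_within` is a hypothetical helper, not a library function.

```python
from typing import Tuple


def fit_within(src_w: int, src_h: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Largest size with the source aspect ratio that fits in max_w x max_h."""
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if src_w * max_h >= src_h * max_w:
        # The source is relatively wider than the box: width is the limit.
        w, h = max_w, src_h * max_w // src_w
    else:
        # The source is relatively taller than the box: height is the limit.
        w, h = src_w * max_h // src_h, max_h

    # Round down to even values, which many video encoders require.
    w -= w % 2
    h -= h % 2
    return max(w, 2), max(h, 2)


print(fit_within(1920, 1080, 640, 480))
print(fit_within(1080, 1920, 640, 480))
```

The resulting pair can then be passed on, for example `video.resize(width=w, height=h)`. Note that this helper will also scale up a source smaller than the box.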

save()

Save to file:

video.save("output.mp4")
video.save("output.webm", format="webm")

Using with Agents

Video Analysis

from openstackai import ask
from openstackai.multimodal import Video

video = Video.from_file("presentation.mp4")

# Extract key frames for analysis
frames = video.extract_frames(count=5)

response = ask(
    "Describe what's happening in this video",
    images=frames,
)

With MultimodalContent

from openstackai.multimodal import MultimodalContent, Video

content = MultimodalContent()
content.add_text("Summarize this video lecture:")
content.add_video(Video.from_file("lecture.mp4"))

# `agent` is an agent instance created elsewhere in your application
response = agent.run(content)

Frame-by-Frame Analysis

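from openstackai import ask
from openstackai.multimodal import Video

video = Video.from_file("surveillance.mp4")

# Analyze one frame every 5 seconds
for frame in video.extract_frames(interval=5.0):
    analysis = ask("What do you see?", images=[frame])
    print(f"Frame {frame.timestamp}s: {analysis}")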
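
Because the loop above makes one model call per frame, a fixed interval can become expensive on long videos. One option is to derive the interval from a call budget. The helper below is a hypothetical sketch (the name `interval_for_budget` and its parameters are not part of openstackai).

```python
def interval_for_budget(
    duration: float,
    max_frames: int,
    min_interval: float = 1.0,
) -> float:
    """Sampling interval (seconds) that yields at most `max_frames` frames."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    # Spread the budget evenly over the video, but never sample more
    # densely than `min_interval`.
    return max(min_interval, duration / max_frames)


# A 10-minute video with a budget of 20 calls: one frame every 30 seconds.
print(interval_for_budget(600.0, max_frames=20))
# A 10-second clip: the minimum interval applies instead.
print(interval_for_budget(10.0, max_frames=20))
```

The result could be passed as `video.extract_frames(interval=...)`, using `video.duration` from the Properties table as the first argument.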

Format Support

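| Format | Read | Write | Notes |
| --- | --- | --- | --- |
| MP4 | | | Most common |
| MOV | | | QuickTime |
| WebM | | | Web optimized |
| AVI | | | Legacy format |
| MKV | | | Read only |
| GIF | | | Animated |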

Video Processing

Get Thumbnail

thumbnail = video.get_thumbnail(time=5.0)
thumbnail.save("thumbnail.jpg")

Get Metadata

metadata = video.get_metadata()
print(f"Duration: {metadata['duration']}")
print(f"Codec: {metadata['codec']}")
print(f"Bitrate: {metadata['bitrate']}")

Convert Format

# Convert to web-friendly format
web_video = video.convert(
    format="mp4",
    codec="h264",
    quality="medium",
)

Provider Support

| Provider | Video Input | Notes |
| --- | --- | --- |
| OpenAI GPT-4o | | Via frame extraction |
| Google Gemini | | Native video support |
| Anthropic Claude | ⚠️ | Via frame extraction |
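
For providers that receive video as extracted frames, a long video may produce more images than you want to send in a single request. A simple approach is to split the frame list into batches. The sketch below is generic Python, not an openstackai API, and the batch size is a hypothetical parameter: check each provider's documentation for its actual per-request image limits.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Stand-in for a list of extracted frames.
frames = ["frame0", "frame1", "frame2", "frame3", "frame4", "frame5", "frame6"]
for batch in batched(frames, size=3):
    print(batch)
```

Each batch could then be passed to a separate `ask(..., images=batch)` call, with the per-batch answers combined afterwards.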

See Also