extract

The extract function extracts structured data from unstructured text.

Import

from openstackai import extract

Basic Usage

# Extract with schema
data = extract(text, schema={"name": str, "email": str})

# Extract common entities
entities = extract.entities("John works at Microsoft")

Parameters

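| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| text | str | required | Source text |
| schema | dict | None | Expected data structure, mapping field names to Python types |
| type | str | None | Preset name, e.g. "contact", "date", "product" (see Preset Types) |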

Examples

Extract Contact Info

from openstackai import extract

text = """
Contact John Smith at john@example.com
Phone: 555-123-4567
"""

contact = extract(text, schema={
    "name": str,
    "email": str,
    "phone": str
})

print(contact)
# {"name": "John Smith", "email": "john@example.com", "phone": "555-123-4567"}
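
Because the schema maps field names to Python types, a returned dictionary can be checked against it before use. The helper below is a minimal sketch and not part of openstackai: `check_schema` is a hypothetical name, and it assumes `extract` returns a plain `dict` shaped like the output shown above.

```python
def check_schema(data: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means data matches schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            actual = type(data[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


schema = {"name": str, "email": str, "phone": str}
# Sample value copied from the documented output (no library call is made here)
contact = {"name": "John Smith", "email": "john@example.com", "phone": "555-123-4567"}

print(check_schema(contact, schema))  # []
print(check_schema({"name": "John Smith", "email": None}, schema))
```

A check like this is useful when the source text may not contain every field, since the documentation does not state how missing values are represented.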

Extract Entities

text = "Apple announced iPhone 15 on September 12, 2023"

entities = extract.entities(text)
# {
# "organizations": ["Apple"],
# "products": ["iPhone 15"],
# "dates": ["September 12, 2023"]
# }

Extract from Documents

# Extract invoice data
invoice_data = extract(
    invoice_text,
    schema={
        "invoice_number": str,
        "date": str,
        "total": float,
        "items": list
    }
)

Custom Extraction

# Extract product info
product = extract(
    description,
    schema={
        "name": str,
        "price": float,
        "features": list,
        "specifications": dict
    }
)

Async Usage

import asyncio
from openstackai import extract

async def main():
    data = await extract.async_(
        email_text,
        schema={"sender": str, "subject": str}
    )
    print(data)

asyncio.run(main())
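
The async form is most useful when several texts are processed at once. The sketch below shows that pattern with `asyncio.gather`. To stay runnable without the library, it uses a stand-in coroutine, `fake_extract_async`, which is hypothetical and only mimics the call shape of `extract.async_`. In real code you would presumably await `extract.async_` in its place.

```python
import asyncio


async def fake_extract_async(text: str, schema: dict) -> dict:
    """Stand-in for extract.async_ so this sketch runs without the library."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {field: None for field in schema}


async def extract_many(texts: list, schema: dict) -> list:
    # Create one coroutine per text and wait for all of them;
    # gather returns results in the same order as the inputs.
    tasks = [fake_extract_async(text, schema) for text in texts]
    return await asyncio.gather(*tasks)


emails = ["From: a@example.com", "From: b@example.com"]
results = asyncio.run(extract_many(emails, {"sender": str, "subject": str}))
print(len(results))  # 2
```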

Preset Types

| Type | Extracted Fields |
|------|------------------|
| contact | name, email, phone, address |
| date | dates, times, durations |
| product | name, price, description, features |
| receipt | items, totals, date, vendor |
| invoice | number, date, items, total, tax |
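
A preset is selected through the `type` parameter, for example `extract(text, type="contact")`. The exact shape of the returned value for presets is not documented here, so the sketch below assumes a `dict` keyed by the field names listed above. `PRESET_FIELDS` and `missing_fields` are hypothetical names for illustration, not library API.

```python
# Field names copied from the Preset Types list; this dict is illustrative,
# not something the library is known to export.
PRESET_FIELDS = {
    "contact": ["name", "email", "phone", "address"],
    "date": ["dates", "times", "durations"],
    "product": ["name", "price", "description", "features"],
    "receipt": ["items", "totals", "date", "vendor"],
    "invoice": ["number", "date", "items", "total", "tax"],
}


def missing_fields(result: dict, preset: str) -> list:
    """List the preset's fields that are absent or None in result."""
    return [f for f in PRESET_FIELDS[preset] if result.get(f) is None]


# Hypothetical result of extract(text, type="contact")
result = {"name": "John Smith", "email": "john@example.com", "phone": "555-123-4567"}
print(missing_fields(result, "contact"))  # ['address']
```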

See Also

  • [[ask]] - Question answering
  • [[summarize]] - Summarization
  • [[analyze]] - Data analysis