File upload lets agents process a range of file types, including documents, images, code files, and data formats. When files are passed alongside a text query, an agent can analyze visual content, extract information or structured data from documents, review code, and answer questions that combine file contents with natural language instructions.

Supported File Types

Agents can process a wide variety of file formats:
| Category | Supported Formats | Use Cases |
| --- | --- | --- |
| Documents | PDF, TXT, DOCX, MD, RTF | Reports, articles, documentation |
| Images | PNG, JPG, JPEG, GIF, WEBP | Photos, diagrams, screenshots, charts |
| Audio | MP3, WAV, M4A, OGG | Speech transcription, audio analysis |
| Video | MP4, MOV, AVI, WEBM | Video analysis, frame extraction |
Supported file types vary by model. Check your model provider’s documentation for specific file type support and limitations.
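
Because support depends on the model, it can help to check a file's extension against the table before sending it. The following is a minimal sketch using only the Python standard library; `SUPPORTED_FORMATS` and `file_category` are hypothetical names introduced for illustration and are not part of the Hypertic API. The mapping simply mirrors the table above and says nothing about what a particular model accepts.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension-to-category map copied from the table above (illustrative only).
SUPPORTED_FORMATS = {
    "document": {"pdf", "txt", "docx", "md", "rtf"},
    "image": {"png", "jpg", "jpeg", "gif", "webp"},
    "audio": {"mp3", "wav", "m4a", "ogg"},
    "video": {"mp4", "mov", "avi", "webm"},
}


def file_category(path_or_url: str) -> Optional[str]:
    """Return the table category for a path or URL, or None if unlisted."""
    # For http(s) URLs, inspect only the path so a query string
    # does not hide the extension.
    parsed = urlparse(path_or_url)
    path = parsed.path if parsed.scheme in ("http", "https") else path_or_url
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```

A `None` result only means the extension is not in the table; the model provider's documentation remains the authority on what is accepted.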

Non-Streaming

Upload and process files using the files parameter with run() or arun(). Files can be local paths or URLs:
from hypertic.agents import Agent
from hypertic.models import OpenAIChat

model = OpenAIChat(model="gpt-5.2")

agent = Agent(
    model=model
)

# Non-streaming with files
response = agent.run(
    query="What's in this image and the document?",
    files=[
        "https://yavuzceliker.github.io/sample-images/image-1021.jpg",
        "data/index.pdf"
    ]
)
print(f"Response: {response.content}")
print(f"Metadata: {response.metadata}")
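
Since `files` mixes local paths and URLs, a quick pre-flight check can catch a mistyped local path before the request is made. The sketch below is an assumption-laden helper, not part of Hypertic: `missing_local_files` is a hypothetical name, and it treats any entry with an `http` or `https` scheme as a URL and everything else as a local path.

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def missing_local_files(files: List[str]) -> List[str]:
    """Return entries that look like local paths but do not exist on disk."""
    missing = []
    for entry in files:
        # Entries with an http(s) scheme are treated as URLs and skipped;
        # this helper does not check whether a URL is reachable.
        if urlparse(entry).scheme in ("http", "https"):
            continue
        if not Path(entry).is_file():
            missing.append(entry)
    return missing
```

One way to use it is to call `missing_local_files(files)` before `agent.run(...)` and raise `FileNotFoundError` if the returned list is non-empty.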

Streaming

Stream responses when processing files using stream() or astream():
from hypertic.agents import Agent
from hypertic.models import OpenAIChat

model = OpenAIChat(model="gpt-5.2")

agent = Agent(
    model=model
)

# Streaming with files
for event in agent.stream(
    query="What is in the image and the document?",
    files=["data/image.jpg", "https://www.berkshirehathaway.com/letters/2024ltr.pdf"]
):
    if event.type == "content":
        print(event.content, end="", flush=True)
    elif event.type == "tool_calls":
        print(f"\nTool Calls: {event.tool_calls}")
    elif event.type == "tool_outputs":
        print(f"\nTool Outputs: {event.tool_outputs}")
    elif event.type == "metadata":
        print(f"\nMetadata: {event.metadata}")
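
When the full text is needed after streaming finishes, the content chunks can be accumulated rather than printed. The sketch below assumes only what the loop above shows: each event has a `type` attribute, and `content` or `metadata` attributes for the matching types. `FakeEvent` and `collect_stream` are hypothetical stand-ins so the example runs without the library; they are not Hypertic classes or functions.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional, Tuple


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a stream event; not a Hypertic class."""
    type: str
    content: Optional[str] = None
    metadata: Any = None


def collect_stream(events: Iterable[Any]) -> Tuple[str, Any]:
    """Join all content chunks and return them with the last metadata seen."""
    chunks = []
    metadata = None
    for event in events:
        if event.type == "content":
            chunks.append(event.content)
        elif event.type == "metadata":
            metadata = event.metadata
        # Other event types (tool calls, tool outputs) are ignored here.
    return "".join(chunks), metadata


# Demo with fabricated events, purely to show the shape of the result.
demo = [
    FakeEvent("content", "A red "),
    FakeEvent("content", "bicycle."),
    FakeEvent("metadata", metadata={"chunks": 2}),
]
text, meta = collect_stream(demo)
print(text)  # A red bicycle.
```

If real stream events expose the same attributes, the iterator returned by `agent.stream(...)` could be passed to `collect_stream` in place of `demo`.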