File upload enables agents to process and understand various file types including documents, images, code files, and data formats. By providing files alongside text queries, agents can analyze visual content, extract information from documents, review code, and work with structured data.
This multimodal capability takes agents beyond text-only interactions: they can answer questions about file contents, extract structured data, and combine visual or document context with natural language queries.
Supported File Types
Agents can process a wide variety of file formats:
| Category | Supported Formats | Use Cases |
|---|---|---|
| Documents | PDF, TXT, DOCX, MD, RTF | Reports, articles, documentation |
| Images | PNG, JPG, JPEG, GIF, WEBP | Photos, diagrams, screenshots, charts |
| Audio | MP3, WAV, M4A, OGG | Speech transcription, audio analysis |
| Video | MP4, MOV, AVI, WEBM | Video analysis, frame extraction |
Supported file types can change depending on the model. Check your model provider’s documentation for specific file type support and limitations.
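As a rough illustration of the table above, the sketch below maps a file path or URL to one of the listed categories by its extension. The names SUPPORTED_FORMATS and file_category are hypothetical helpers, not part of the library, and the extension sets are copied from the table only; what a given model actually accepts may differ.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the table above (assumption: real support varies by model).
SUPPORTED_FORMATS = {
    "Documents": {"pdf", "txt", "docx", "md", "rtf"},
    "Images": {"png", "jpg", "jpeg", "gif", "webp"},
    "Audio": {"mp3", "wav", "m4a", "ogg"},
    "Video": {"mp4", "mov", "avi", "webm"},
}


def file_category(file_ref: str) -> Optional[str]:
    """Return the table category for a path or URL, or None if it is not listed."""
    # For URLs, keep only the path component so query strings are ignored.
    path = urlparse(file_ref).path if "://" in file_ref else file_ref
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None


print(file_category("data/index.pdf"))
print(file_category("https://yavuzceliker.github.io/sample-images/image-1021.jpg"))
```

A check like this only inspects the file name, so it can catch obvious mismatches before a request is sent but says nothing about the file's real contents.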
Non-Streaming
Upload and process files using the files parameter with run() or arun(). Files can be local paths or URLs:
from hypertic.agents import Agent
from hypertic.models import OpenAIChat

model = OpenAIChat(model="gpt-5.2")
agent = Agent(
    model=model
)

# Non-streaming with files: one remote image URL and one local PDF
response = agent.run(
    query="What's in this image and the document?",
    files=[
        "https://yavuzceliker.github.io/sample-images/image-1021.jpg",
        "data/index.pdf"
    ]
)
print(f"Response: {response.content}")
print(f"Metadata: {response.metadata}")
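Because files may mix local paths and URLs, it can help to separate them and catch missing local files before calling the agent. The following is a minimal sketch under that assumption; split_file_refs and its exists parameter are hypothetical and not part of the library.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_file_refs(files, exists=lambda p: Path(p).is_file()):
    """Split references into (urls, local_paths, missing_local_paths).

    `exists` is injectable so the check can be replaced in tests.
    """
    urls, local, missing = [], [], []
    for ref in files:
        if urlparse(ref).scheme in ("http", "https"):
            urls.append(ref)      # remote file, not checked locally
        elif exists(ref):
            local.append(ref)     # local file that is present on disk
        else:
            missing.append(ref)   # local path that could not be found
    return urls, local, missing


urls, local, missing = split_file_refs(
    ["https://yavuzceliker.github.io/sample-images/image-1021.jpg", "data/index.pdf"]
)
if missing:
    print(f"Missing local files: {missing}")
```

Only the url and local lists would then be passed on as the files argument; how the library itself reports a missing path is not covered here.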
Streaming
Stream responses when processing files using stream() or astream():
from hypertic.agents import Agent
from hypertic.models import OpenAIChat

model = OpenAIChat(model="gpt-5.2")
agent = Agent(
    model=model
)

# Streaming with files: handle each event as it arrives
for event in agent.stream(
    query="What is in the image and the document?",
    files=["data/image.jpg", "https://www.berkshirehathaway.com/letters/2024ltr.pdf"]
):
    if event.type == "content":
        # Print text chunks without a newline so they read as one response
        print(event.content, end="", flush=True)
    elif event.type == "tool_calls":
        print(f"\nTool Calls: {event.tool_calls}")
    elif event.type == "tool_outputs":
        print(f"\nTool Outputs: {event.tool_outputs}")
    elif event.type == "metadata":
        print(f"\nMetadata: {event.metadata}")
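If the full text is needed after streaming finishes, the content chunks can be accumulated rather than printed. The sketch below assumes only what the loop above shows, namely that events expose a type attribute and that "content" events carry a content string. The collect_content helper and the stand-in events are hypothetical; real events would come from agent.stream(...).

```python
from types import SimpleNamespace


def collect_content(events):
    """Join the text of all "content" events into a single string."""
    parts = []
    for event in events:
        if event.type == "content":
            parts.append(event.content)
    return "".join(parts)


# Stand-in events for illustration only (placeholder data, not real output).
fake_events = [
    SimpleNamespace(type="content", content="The image shows "),
    SimpleNamespace(type="metadata", metadata={"note": "placeholder"}),
    SimpleNamespace(type="content", content="a chart."),
]
print(collect_content(fake_events))
```

Collecting into a list and joining once avoids repeated string concatenation when a response arrives in many small chunks.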