OpenAI Agents Python SDK - Comprehensive Codebase Study
Introduction
This is a comprehensive study of the OpenAI Agents Python SDK codebase. The SDK is a lightweight yet powerful framework for building multi-agent workflows that is provider-agnostic, supporting the OpenAI Responses and Chat Completions APIs, as well as 100+ other LLMs.
Project Structure
openai-agents-python/
├── src/agents/ # Core library implementation
├── tests/ # Test suite
├── examples/ # Sample projects showing SDK usage
├── docs/ # MkDocs documentation source
├── mkdocs.yml # Documentation site configuration
├── pyproject.toml # Python dependencies and tool configuration
└── Makefile # Common developer commands
Core Architecture
The SDK is built around several key concepts:
- Agents - LLMs configured with instructions, tools, guardrails, and handoffs
- Runner - The execution engine that runs agents
- Tools - Functions that agents can call to perform actions
- Items - The unit of work in an agent run (messages, tool calls, etc.)
- Guardrails - Safety checks for input and output validation
- Handoffs - Mechanism for delegating to other agents
- Sessions - Automatic conversation history management
- Tracing - Built-in tracking of agent runs
- Sandbox - Isolated execution environments for agents
Key Design Principles
1. Provider Agnostic
The SDK supports multiple model providers through a unified interface. You can use OpenAI's Responses API, Chat Completions API, or integrate with 100+ other LLMs through the MultiProvider system.
2. Type Safety
The entire SDK is written with Python type hints and uses Pydantic for data validation. The type hints enable static checking and IDE completion, while Pydantic validates data at runtime.
3. Async-First
The SDK is designed with async/await as the primary execution model. This allows for efficient concurrent operations, especially when dealing with multiple tools or handoffs.
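To make the concurrency benefit concrete, here is a minimal sketch using only the standard library. The two tool functions are hypothetical stand-ins for slow I/O and are not part of the SDK; the sketch only shows why awaiting independent calls together is cheaper than awaiting them one after another.

```python
import asyncio
import time


async def fetch_weather(city: str) -> str:
    """Hypothetical tool: simulates a slow weather lookup."""
    await asyncio.sleep(0.1)
    return f"Sunny in {city}"


async def fetch_time(city: str) -> str:
    """Hypothetical tool: simulates a slow time-zone lookup."""
    await asyncio.sleep(0.1)
    return f"12:00 in {city}"


async def run_tools_concurrently(city: str) -> list:
    # gather() starts both coroutines together, so the two sleeps overlap.
    # Results come back in argument order, not completion order.
    return await asyncio.gather(fetch_weather(city), fetch_time(city))


start = time.perf_counter()
results = asyncio.run(run_tools_concurrently("Oslo"))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Awaiting the two calls sequentially would take roughly the sum of their delays; with gather the total is closer to the slowest single call.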
4. Streaming Support
Agent runs can be streamed, so callers receive real-time updates as the agent processes a request.
5. Human-in-the-Loop
The SDK has built-in support for pausing agent runs for human approval, inspection, or intervention through the RunState system.
6. Extensibility
The SDK is designed to be extended through:
- Custom tools
- Custom model providers
- Custom guardrails
- Lifecycle hooks
- Extensions (MCP, memory backends, etc.)
Module Organization
Core Modules (src/agents/)
- agent.py - Agent and AgentBase classes
- run.py - Runner and execution orchestration
- tool.py - Tool system and built-in tools
- items.py - Run items (messages, tool calls, etc.)
- guardrail.py - Input and output guardrails
- handoffs/ - Agent delegation system
- memory/ - Session and conversation persistence
- models/ - Model provider implementations
- sandbox/ - Isolated execution environments
- tracing/ - Observability and debugging
- run_internal/ - Internal runtime implementation
- extensions/ - Additional functionality (MCP, memory, etc.)
Data Flow
Basic Agent Execution Flow
- Input - User provides input (string or list of items)
- Context Setup - RunContextWrapper is created with user context
- Guardrails - Input guardrails run (if configured)
- Model Call - Agent calls LLM with instructions and tools
- Tool Execution - If model calls tools, they are executed
- Output Guardrails - Output guardrails run (if configured)
- Result - Final output is returned
- Session Update - Conversation history is saved (if session is configured)
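The flow above can be sketched as a small loop. This is a toy model under stated assumptions, not the SDK's Runner: the scripted fake_model, the TOOLS table, the string-based history, and the GuardrailTripped exception are all hypothetical, and the context setup step is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelReply:
    """Hypothetical model reply: either a tool request or a final answer."""
    final_text: Optional[str] = None
    tool_name: Optional[str] = None
    tool_arg: str = ""


class GuardrailTripped(Exception):
    """Stand-in for a guardrail violation; not the SDK's exception class."""


def fake_model(history: list) -> ModelReply:
    # Scripted stand-in for an LLM: request a tool once, then answer.
    tool_outputs = [h for h in history if h.startswith("tool:")]
    if not tool_outputs:
        return ModelReply(tool_name="add_one", tool_arg="41")
    value = tool_outputs[-1][len("tool:"):]
    return ModelReply(final_text=f"The answer is {value}")


# Hypothetical tool registry: name -> callable taking and returning a string.
TOOLS = {"add_one": lambda arg: str(int(arg) + 1)}


def run(user_input: str, session: list, max_turns: int = 5) -> str:
    # Input guardrail: reject the request before any model call.
    if "forbidden" in user_input:
        raise GuardrailTripped("input guardrail")
    history = session + [f"user:{user_input}"]
    for _ in range(max_turns):
        reply = fake_model(history)                 # model call
        if reply.tool_name is not None:             # tool execution
            tool: Callable[[str], str] = TOOLS[reply.tool_name]
            history.append(f"tool:{tool(reply.tool_arg)}")
            continue                                # next turn sees the output
        final = reply.final_text or ""
        if not final:                               # output guardrail (toy check)
            raise GuardrailTripped("output guardrail")
        history.append(f"assistant:{final}")
        session[:] = history                        # session update
        return final                                # result
    raise RuntimeError("max turns exceeded")


session = []
answer = run("What is 41 + 1?", session)
print(answer)
print(session)
```

The turn limit is what keeps a model that never stops calling tools from looping forever; the sketch raises a plain RuntimeError where the SDK lists MaxTurnsExceeded.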
Streaming Flow
The streaming flow follows the same path but yields events as they happen:
- Model response chunks
- Tool call events
- Tool output events
- Final result
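One common way to model this kind of flow in Python is an async generator that yields each event as soon as it is available. The sketch below assumes a hypothetical StreamEvent record and a scripted stream_run function; the SDK's own event types and streaming entry points are not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class StreamEvent:
    """Hypothetical event record; the SDK's own event types differ."""
    kind: str      # "text_delta", "tool_call", "tool_output", or "final"
    payload: str


async def stream_run(prompt: str) -> AsyncIterator[StreamEvent]:
    # Model response chunks arrive first, one event per chunk.
    for chunk in ("Looking ", "that ", "up. "):
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield StreamEvent("text_delta", chunk)
    # Then the tool call and its output are reported as separate events.
    yield StreamEvent("tool_call", f"search({prompt!r})")
    yield StreamEvent("tool_output", "3 results")
    # The final result closes the stream.
    yield StreamEvent("final", "Looking that up. Found 3 results.")


async def main() -> list:
    kinds = []
    async for event in stream_run("agents"):
        print(f"{event.kind}: {event.payload}")
        kinds.append(event.kind)
    return kinds


kinds = asyncio.run(main())
```

The consumer decides what to do with each event kind, for example rendering text deltas immediately while logging tool events separately.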
Key Abstractions
Agent
An agent is the main building block. It represents an AI assistant with:
- Instructions (system prompt)
- Tools it can use
- Guardrails for safety
- Handoffs to other agents
- Output schema for structured responses
Runner
The Runner is responsible for executing agents. It handles:
- Turn management (multi-turn conversations)
- Tool execution
- Handoff delegation
- Session persistence
- Error handling
- Streaming
Tool
Tools are functions that agents can call. They can be:
- Function tools (Python functions)
- Hosted tools (OpenAI tools like file search, web search)
- MCP tools (from Model Context Protocol servers)
- Agent tools (other agents exposed as tools)
- Shell tools (command execution)
- Computer tools (computer interaction)
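For function tools, the general idea is that a Python function's signature and docstring are turned into a schema the model can read. The sketch below shows that idea with the standard library only; describe_tool and its output layout are hypothetical, and the SDK's own function-tool machinery handles many more cases (nested types, Pydantic models, async functions).

```python
import inspect
import json
from typing import Any, Callable

# Map a few Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> dict:
    """Build a JSON-Schema-style description from a function signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown annotations fall back to "string" in this toy version.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a short forecast for a city."""
    return f"{days}-day forecast for {city}: sunny"


schema = describe_tool(get_weather)
print(json.dumps(schema, indent=2))

# Executing a tool call: the model supplies JSON arguments,
# and the runtime decodes them into keyword arguments.
call_args = json.loads('{"city": "Lisbon", "days": 2}')
result = get_weather(**call_args)
print(result)
```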
Item
Items represent units of work in an agent run:
- MessageOutputItem - Messages from the LLM
- ToolCallItem - Tool calls made by the LLM
- ToolCallOutputItem - Results from tool execution
- HandoffCallItem - Handoff to another agent
- ReasoningItem - Model reasoning (for reasoning models)
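Because a run produces a mixed sequence of item types, consumers typically dispatch on the type of each item. The sketch below uses simplified local stand-ins that borrow three names from the list above; their fields are assumptions for illustration and do not mirror the SDK's actual classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class MessageOutputItem:
    text: str          # hypothetical field


@dataclass
class ToolCallItem:
    tool_name: str     # hypothetical field
    arguments: str     # hypothetical field


@dataclass
class ToolCallOutputItem:
    tool_name: str     # hypothetical field
    output: str        # hypothetical field


RunItem = Union[MessageOutputItem, ToolCallItem, ToolCallOutputItem]


def summarize(items: list) -> list:
    """Render one line per item, dispatching on the item's type."""
    lines = []
    for item in items:
        if isinstance(item, MessageOutputItem):
            lines.append(f"message: {item.text}")
        elif isinstance(item, ToolCallItem):
            lines.append(f"call: {item.tool_name}({item.arguments})")
        elif isinstance(item, ToolCallOutputItem):
            lines.append(f"output: {item.tool_name} -> {item.output}")
    return lines


run_items = [
    ToolCallItem("get_weather", '{"city": "Lisbon"}'),
    ToolCallOutputItem("get_weather", "sunny"),
    MessageOutputItem("It is sunny in Lisbon."),
]
summary = summarize(run_items)
print("\n".join(summary))
```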
RunState
RunState captures the complete state of an agent run, enabling:
- Pause and resume
- Human-in-the-loop workflows
- State inspection
- Debugging
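The pause-and-resume idea can be illustrated with a small serializable snapshot. This is only a sketch of the pattern: the PausedRun class, its fields, and the resume function are hypothetical and do not reflect how the SDK's RunState is structured or serialized.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PausedRun:
    """Hypothetical snapshot of a run; not the SDK's RunState class."""
    history: list = field(default_factory=list)
    pending_tool: Optional[str] = None   # tool call awaiting approval
    approved: Optional[bool] = None      # None means no decision yet

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "PausedRun":
        return cls(**json.loads(raw))


def resume(state: PausedRun) -> str:
    """Continue a paused run once a human decision has been recorded."""
    if state.pending_tool is None:
        return "nothing to resume"
    if state.approved is None:
        return "still waiting for approval"
    outcome = "executed" if state.approved else "rejected"
    state.history.append(f"{state.pending_tool}: {outcome}")
    state.pending_tool = None
    return outcome


# The run pauses on a sensitive tool call and the state is persisted.
saved = PausedRun(
    history=["user: delete old logs"], pending_tool="delete_files"
).to_json()

# Later, possibly in another process: load, record the decision, resume.
state = PausedRun.from_json(saved)
state.approved = True
outcome = resume(state)
print(outcome, state.history)
```

Keeping the snapshot JSON-serializable is what allows the approval to happen in a different process or at a much later time than the original run.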
Configuration Hierarchy
Configuration flows from most specific to most general:
- Agent-level - Settings on the Agent instance
- RunConfig-level - Settings for a specific run
- Global defaults - Default settings for the SDK
This allows fine-grained control while maintaining sensible defaults.
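A layered lookup of this kind can be sketched with collections.ChainMap. The sketch takes the ordering listed above at face value and uses made-up keys and values; individual settings in the SDK may resolve precedence differently, so treat this as an illustration of the fallback pattern rather than a description of the SDK's behavior.

```python
from collections import ChainMap

# All keys and values below are illustrative, not real SDK settings.
global_defaults = {"model": "default-model", "temperature": 1.0, "max_turns": 10}
run_config = {"temperature": 0.2}
agent_settings = {"model": "agent-model"}

# ChainMap searches its maps left to right, so earlier maps win
# and later maps only fill in keys that are still missing.
effective = ChainMap(agent_settings, run_config, global_defaults)

print(dict(effective))
```

Each layer only needs to state what it overrides; everything else falls through to the next layer.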
Error Handling
The SDK uses a structured error handling approach:
- UserError - Errors due to user input or configuration
- ModelBehaviorError - Errors from unexpected model behavior
- AgentsException - Base exception for SDK errors
- ToolTimeoutError - Tool execution timeout
- MaxTurnsExceeded - Agent exceeded maximum turns
- Guardrail Tripwires - Guardrail violations
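In practice, a hierarchy like this lets callers handle specific failures first and fall back to the base class. The sketch below defines local stand-in classes that reuse names from the list and assumes each specific error subclasses AgentsException, as the "base exception" description suggests; run_agent and the recovery messages are hypothetical.

```python
from typing import Optional


class AgentsException(Exception):
    """Local stand-in for the SDK's base exception."""


class MaxTurnsExceeded(AgentsException):
    """Stand-in: the agent exceeded its maximum number of turns."""


class ModelBehaviorError(AgentsException):
    """Stand-in: the model produced unexpected output."""


class UserError(AgentsException):
    """Stand-in: the caller supplied bad input or configuration."""


def run_agent(fail_with: Optional[Exception] = None) -> str:
    # Hypothetical run function that optionally simulates a failure.
    if fail_with is not None:
        raise fail_with
    return "ok"


def safe_run(fail_with: Optional[Exception] = None) -> str:
    try:
        return run_agent(fail_with)
    except MaxTurnsExceeded:
        return "retry with a higher turn limit"
    except UserError:
        return "fix the configuration"
    except AgentsException:
        # Catch-all for any other SDK error, placed after the specific cases.
        return "generic SDK failure"


print(safe_run())
print(safe_run(MaxTurnsExceeded("too many turns")))
print(safe_run(ModelBehaviorError("malformed tool call")))
```

Ordering matters: if the base-class handler came first, it would shadow the more specific handlers below it.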
Testing Strategy
The test suite uses:
- pytest for test execution
- inline-snapshot for snapshot tests
- pytest-asyncio for async test support
- coverage.py for coverage tracking
Tests are organized by functionality:
- test_agent_*.py - Agent-specific tests
- test_run_*.py - Runner and execution tests
- test_tool_*.py - Tool system tests
- test_guardrails.py - Guardrail tests
- test_handoff_*.py - Handoff tests
- Integration tests in examples/
Documentation
Documentation is built with MkDocs and the Material theme:
- Source in docs/
- Built output in site/
- API reference auto-generated from docstrings
- Multi-language support (ja, ko, zh)
Development Workflow
- Setup - uv sync to install dependencies
- Development - Make changes with uv run python
- Testing - make tests to run test suite
- Linting - make lint and make format
- Type checking - make typecheck
- Documentation - make build-docs
- Coverage - make coverage
Performance Considerations
- Async execution - Tools and guardrails run concurrently when possible
- Connection pooling - Reuse HTTP connections for model calls
- Prompt caching - Support for prompt cache keys
- Session compaction - Intelligent conversation history management
- Tool use tracking - Prevent infinite loops
Security Considerations
- Input validation - Guardrails for input sanitization
- Tool approval - Human approval for sensitive tools
- Sandbox isolation - Isolated execution for file system access
- API key management - Secure handling of API keys
- Tracing controls - Options to exclude sensitive data from traces
Next Steps
This overview provides a high-level understanding of the SDK. The following documents dive deep into each component:
- Agent System - Deep dive into agents
- Runner System - Execution engine
- Tool System - Tools and tool execution
- Items System - Run items and events
- Guardrails - Input/output validation
- Handoffs - Agent delegation
- Memory & Sessions - Conversation persistence
- Model Providers - LLM integration
- Sandbox System - Isolated execution
- Tracing - Observability
- Run State - State management
- Context - Execution context
- Lifecycle Hooks - Event callbacks
- Configuration - Settings and config
- Error Handling - Error management
- Streaming - Real-time updates
- Extensions - Extending the SDK
- MCP Integration - Model Context Protocol
- Best Practices - Usage patterns
- Architecture Patterns - Design patterns