This document compiles the best practices for using the OpenAI Agents Python SDK. These practices are distilled from real-world usage, community feedback, and the core development team's experience. Following these practices will help you build robust, maintainable, and efficient agent-based applications.
General Principles
1. Start Simple
Begin with simple agents and add complexity gradually:
Python
# Good - start simple
agent = Agent(
    name="assistant",
    instructions="You are a helpful assistant",
)
# Avoid - start complex
agent = Agent(
    name="complex",
    instructions="...",
    tools=[...],
    input_guardrails=[...],
    handoffs=[...],
    hooks=...,
    # Too much complexity from the start
)
# Start with basic version (Agent requires a name)
agent = Agent(name="assistant", instructions="Basic instructions")
result = await Runner.run(agent, input)
# Add tools based on needs
agent = Agent(name="assistant", instructions="...", tools=[tool1])
result = await Runner.run(agent, input)
# Add more as needed
agent = Agent(name="assistant", instructions="...", tools=[tool1, tool2])
3. Use Type Hints
Always use type hints for better IDE support and type safety:
Python
# Good
@function_tool
def calculate(a: int, b: int) -> int:
    return a + b
# Avoid
@function_tool
def calculate(a, b):
    return a + b
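One reason type hints matter for tools is that annotations are available at runtime, so a framework can read them when it describes a function's parameters. The sketch below uses only the standard library to show what information the annotations carry. `describe_parameters` is a hypothetical helper written for this illustration, not part of the SDK, and the `@function_tool` decorator is omitted so the snippet runs on its own.

```python
import inspect
from typing import get_type_hints


def calculate(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


def untyped_calculate(a, b):
    """Same logic, but without annotations."""
    return a + b


def describe_parameters(func) -> dict[str, str]:
    """Map each parameter name to its annotated type name, or 'unknown'."""
    hints = get_type_hints(func)
    described = {}
    for name in inspect.signature(func).parameters:
        annotation = hints.get(name)
        described[name] = annotation.__name__ if annotation is not None else "unknown"
    return described


typed_schema = describe_parameters(calculate)
untyped_schema = describe_parameters(untyped_calculate)
```

With annotations, every parameter has a concrete type to report. Without them, nothing can be inferred about `a` or `b`.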
4. Write Clear Instructions
Write clear, specific agent instructions:
Python
# Good
agent = Agent(
    name="summarizer",
    instructions=(
        "You are a summarization expert. Given a text, provide a concise summary "
        "in 3 bullet points. Each bullet point should be under 20 words. "
        "Focus on the main ideas and ignore minor details."
    ),
)
# Avoid
agent = Agent(
    name="vague",
    instructions="Summarize things",  # Too vague
)
5. Test Thoroughly
Test your agents extensively:
Python
@pytest.mark.asyncio
async def test_agent_basic():
    """Test basic agent behavior."""
    result = await Runner.run(agent, "Hello")
    assert result.final_output is not None
@pytest.mark.asyncio
async def test_agent_with_tools():
    """Test agent with tools."""
    result = await Runner.run(agent, "Use the tool")
    assert "tool" in result.final_output.lower()
Agent Design
1. Single Responsibility
Each agent should have a single, clear responsibility:
# Good
agent = Agent(name="customer_support_triage")
# Avoid
agent = Agent(name="agent1") # Not descriptive
3. Appropriate Instructions
Match instructions to the agent's purpose:
Python
# Good
coder = Agent(
    name="coder",
    instructions=(
        "You are a senior software engineer. Write clean, well-documented code "
        "following best practices. Include error handling and type hints."
    ),
)
# Avoid
coder = Agent(
    name="coder",
    instructions="You are helpful",  # Too generic
)
from typing import List, Optional
# Good
@function_tool
def process_data(
    user_id: str,
    limit: int = 10,
    filters: Optional[List[str]] = None,
) -> str:
    ...
# Avoid
@function_tool
def process_data(user_id, limit=10, filters=None):  # No type hints
    ...
3. Docstrings
Write clear docstrings:
Python
# Good
@function_tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """
    Calculate the discounted price.
    Args:
        price: The original price.
        discount_percent: The discount percentage (0-100).
    Returns:
        The discounted price.
    Example:
        calculate_discount(100.0, 20) returns 80.0
    """
    return price * (1 - discount_percent / 100)
# Avoid
@function_tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate discount."""  # Too vague
4. Error Handling
Handle errors gracefully:
Python
# Good
@function_tool
def api_call(endpoint: str) -> str:
    """Call an API endpoint."""
    try:
        response = requests.get(endpoint)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        return f"API error: {str(e)}"
# Avoid
@function_tool
def api_call(endpoint: str) -> str:
    """Call an API endpoint."""
    return requests.get(endpoint).text  # No error handling
5. Input Validation
Validate inputs:
Python
# Good
@function_tool
def divide(a: float, b: float) -> float:
    """Divide two numbers."""
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
# Avoid
@function_tool
def divide(a: float, b: float) -> float:
    """Divide two numbers."""
    return a / b  # Will crash on b=0
Guardrails
1. Use Appropriate Guardrails
Use guardrails at the right level:
Python
# Guardrail functions receive the run context, the agent, and the input or output
# Good - input guardrail for filtering
@input_guardrail
def check_safety(ctx, agent, input) -> GuardrailFunctionOutput:
    text = input if isinstance(input, str) else str(input)
    return GuardrailFunctionOutput(
        output_info=None,
        tripwire_triggered="harmful" in text.lower(),
    )
# Good - output guardrail for validation
@output_guardrail
def check_length(ctx, agent, output) -> GuardrailFunctionOutput:
    return GuardrailFunctionOutput(
        output_info=None,
        tripwire_triggered=len(str(output)) > 1000,
    )
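Keeping the check itself in a plain function makes guardrail logic easy to unit test without running an agent. This is a minimal sketch, assuming the same keyword and length rules as the example above. `contains_blocked_term`, `exceeds_length`, and both constants are hypothetical names, and the SDK decorators are left out so the snippet runs on its own.

```python
BLOCKED_TERMS = ("harmful",)  # illustrative list, mirrors the keyword used above
MAX_OUTPUT_CHARS = 1000       # illustrative limit, mirrors the length check above


def contains_blocked_term(text: str) -> bool:
    """Return True when any blocked term appears, ignoring case."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def exceeds_length(text: str, limit: int = MAX_OUTPUT_CHARS) -> bool:
    """Return True when the text is longer than the allowed limit."""
    return len(text) > limit


flagged_input = contains_blocked_term("This request is HARMFUL")
clean_input = contains_blocked_term("What is the weather today?")
too_long = exceeds_length("a" * 1001)
```

A guardrail function can then call these helpers and pass the boolean result as its tripwire value.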
# Good
api_key = os.getenv("OPENAI_API_KEY")
# Avoid
api_key = "sk-..."  # Never hardcode
4. Validate Configuration
Validate configuration early:
Python
# Good
config = get_config()
if not validate_config(config):
    raise ValueError("Invalid configuration")
result = await Runner.run(agent, input, run_config=config)
# Avoid
result = await Runner.run(agent, input, run_config=get_config())  # Might be invalid
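The text leaves `validate_config` undefined, so here is one possible shape for it. This is a sketch under assumptions: `AppConfig` and its fields are hypothetical application settings rather than an SDK type, and the accepted temperature range of 0 to 2 is an assumption you should adjust to your provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical settings object; field names are illustrative."""

    model: str
    temperature: float
    max_turns: int


def validate_config(config: AppConfig) -> bool:
    """Return True only when every field is within a sensible range."""
    return (
        bool(config.model.strip())              # model name must not be blank
        and 0.0 <= config.temperature <= 2.0    # assumed valid range
        and config.max_turns >= 1               # at least one turn
    )


good = AppConfig(model="gpt-4o", temperature=0.7, max_turns=10)
bad = AppConfig(model="", temperature=0.7, max_turns=10)
good_is_valid = validate_config(good)
bad_is_valid = validate_config(bad)
```

Running the check once at startup means a bad value fails before any agent run begins, not partway through a conversation.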
5. Document Configuration
Document configuration options:
Python
"""
Configuration for agent runs.
Environment variables:
- MODEL: Model to use (default: gpt-4o)
- TEMPERATURE: Temperature (default: 0.7)
- MAX_TURNS: Maximum turns (default: 10)
"""
# Good - focused context
@dataclass
class UserContext:
    user_id: str
    preferences: dict
# Avoid - bloated context
@dataclass
class EverythingContext:
    user_id: str
    preferences: dict
    database_connection: Any  # Don't put heavy resources here
    cache: dict
    logger: Any
3. Use Type Hints
Always use type hints:
Python
# Good
@dataclass
class MyContext:
    user_id: str
    data: dict[str, Any]
# Avoid
@dataclass
class MyContext:
    user_id  # No type hint
    data
4. Document Context Fields
Document context fields:
Python
# Good
@dataclass
class MyContext:
    """Context for user operations."""
    user_id: str
    """Unique identifier for the user."""
    preferences: dict
    """User preferences (theme, language, etc.)."""
# Avoid
@dataclass
class MyContext:
    user_id: str  # What is this?
    preferences: dict  # What keys?
5. Avoid Circular Dependencies
Avoid circular references in context:
Python
# Good - no circular references
@dataclass
class ContextA:
    data: str
@dataclass
class ContextB:
    context_a: ContextA
# Avoid - circular reference
@dataclass
class ContextA:
    context_b: "ContextB"  # Quoted because ContextB is defined below
@dataclass
class ContextB:
    context_a: ContextA
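One practical benefit of one-way references is that the object graph stays a tree, so recursive helpers can walk it safely. The sketch below, using only the standard library, serializes the acyclic layout with `dataclasses.asdict`. A cyclic object graph can send recursive helpers like this into unbounded recursion.

```python
from dataclasses import asdict, dataclass


@dataclass
class ContextA:
    data: str


@dataclass
class ContextB:
    context_a: ContextA  # one-way reference: B knows A, A does not know B


context = ContextB(context_a=ContextA(data="example"))
serialized = asdict(context)  # walks nested dataclasses recursively
```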
Error Handling
1. Catch Specific Exceptions
Catch specific exceptions:
Python
# Good
try:
    result = await Runner.run(agent, input)
except MaxTurnsExceeded:
    handle_max_turns()
except InputGuardrailTripwireTriggered:
    handle_guardrail()
# Avoid
try:
    result = await Runner.run(agent, input)
except Exception:  # Too broad
    handle_all()
# Good
try:
    result = await Runner.run(agent, input)
except AgentsException as e:
    logger.error(f"Agent error: {e}", exc_info=True)
    raise
# Avoid
try:
    result = await Runner.run(agent, input)
except AgentsException as e:
    raise  # Lost logging opportunity
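To see what logging before re-raising preserves, the sketch below captures the log record in memory and checks that the traceback travels with it. It is a standard-library illustration under assumptions: `AgentsException` here is a local stand-in for the SDK's base exception, and `RecordCollector` and `run_and_log` are hypothetical names.

```python
import logging


class AgentsException(Exception):
    """Local stand-in for the SDK's base exception so this snippet runs alone."""


class RecordCollector(logging.Handler):
    """Keeps log records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("agent_app")
logger.propagate = False  # keep the demo output quiet
collector = RecordCollector()
logger.addHandler(collector)


def run_and_log() -> None:
    try:
        raise AgentsException("run failed")  # simulated failing agent run
    except AgentsException as exc:
        logger.error("Agent error: %s", exc, exc_info=True)
        raise  # the caller still sees the original exception


reraised = False
try:
    run_and_log()
except AgentsException:
    reraised = True
```

The record carries both the message and the exception details, while the caller still gets to decide how to handle the failure.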
4. Use Custom Error Types
Define custom error types:
Python
class MyApplicationError(AgentsException):
    """Base error for my application."""
    pass
class InsufficientCreditsError(MyApplicationError):
    """Error when user has insufficient credits."""
    pass
5. Handle Errors Gracefully
Provide graceful fallbacks:
Python
# Good
try:
    result = await Runner.run(primary_agent, input)
except AgentsException:  # Stay specific, as recommended above
    result = await Runner.run(fallback_agent, input)
# Avoid
result = await Runner.run(primary_agent, input)  # Might crash
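The fallback pattern can be wrapped in a small reusable helper. This is a sketch under assumptions: `run_with_fallback`, `PrimaryUnavailable`, and the two coroutine functions are hypothetical stand-ins for real agent runs, so the example executes without the SDK or network access.

```python
import asyncio
from typing import Awaitable, Callable

RunFn = Callable[[str], Awaitable[str]]


class PrimaryUnavailable(Exception):
    """Hypothetical error used to simulate a failing primary run."""


async def run_with_fallback(primary: RunFn, fallback: RunFn, text: str) -> str:
    """Try the primary runner; use the fallback only for the expected failure."""
    try:
        return await primary(text)
    except PrimaryUnavailable:
        return await fallback(text)


async def failing_primary(text: str) -> str:
    raise PrimaryUnavailable("primary agent is down")


async def simple_fallback(text: str) -> str:
    return f"fallback handled: {text}"


answer = asyncio.run(run_with_fallback(failing_primary, simple_fallback, "Hello"))
```

Catching only the expected failure keeps unrelated bugs visible instead of silently routing them to the fallback.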
Performance
1. Use Async
Always use async for production:
Python
# Good
result = await Runner.run(agent, input)
# Avoid in production
result = Runner.run_sync(agent, input)
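The main payoff of async code is that independent runs can overlap while each one waits on I/O. The sketch below illustrates this with `asyncio.gather`. `fake_run` is a hypothetical stand-in that sleeps instead of calling a model, so the snippet runs without the SDK.

```python
import asyncio


async def fake_run(name: str, delay: float) -> str:
    """Stand-in for an awaited agent run; sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_all() -> list[str]:
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(
        fake_run("billing", 0.02),
        fake_run("support", 0.01),
    )


results = asyncio.run(run_all())
```

Both runs wait at the same time, so total latency is close to the slowest run, not the sum of both.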
2. Set Reasonable Limits
Set reasonable limits:
Python
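# Good - explicit limits (max_turns is passed to Runner.run)
config = RunConfig(
    session_settings=SessionSettings(max_items=50),
)
result = await Runner.run(agent, input, max_turns=10, run_config=config)
# Avoid - relying on defaults you have not reviewed
result = await Runner.run(agent, input)  # Might use more turns or tokens than intended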
3. Use Appropriate Models
Choose the right model for the task:
Python
# Good - appropriate model
quick_agent = Agent(name="quick", model="gpt-4o-mini")  # Fast, cheap
complex_agent = Agent(name="complex", model="gpt-4o")  # More capable
# Avoid - overkill
simple_task = Agent(name="simple", model="gpt-4o")  # Unnecessary expense
# Good
session = SQLiteSession("conversation_123", db_path="conversations.db")  # First argument is the session ID
result = await Runner.run(agent, input, session=session)
# Avoid - no session for long conversations
result = await Runner.run(agent, input) # Token waste
# Good
@pytest.mark.asyncio
async def test_with_mock():
    with patch('external_api.call') as mock:
        mock.return_value = "mocked result"
        result = await Runner.run(agent, "Use API")
        assert "mocked" in result.final_output
# Avoid - no mocking
@pytest.mark.asyncio
async def test_without_mock():
    result = await Runner.run(agent, "Use API")  # Calls real API
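When the dependency being replaced is itself a coroutine, `unittest.mock.AsyncMock` from the standard library is a natural fit. The sketch below assumes a hypothetical `answer_with_api` function that awaits an injected client. It is not SDK code, and it runs without pytest so it can be tried directly.

```python
import asyncio
from unittest.mock import AsyncMock


async def answer_with_api(call_api, question: str) -> str:
    """Hypothetical code under test: awaits an injected API client."""
    payload = await call_api(question)
    return f"API said: {payload}"


async def check_with_mock() -> str:
    mock_api = AsyncMock(return_value="mocked result")
    output = await answer_with_api(mock_api, "Use API")
    # Verify the dependency was awaited exactly once with the expected argument
    mock_api.assert_awaited_once_with("Use API")
    return output


output = asyncio.run(check_with_mock())
```

Injecting the client as a parameter is a design choice that makes patching unnecessary; `patch` remains useful when the dependency is imported at module level.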
4. Test Edge Cases
Test edge cases:
Python
# Good
@pytest.mark.asyncio
async def test_empty_input():
    result = await Runner.run(agent, "")
    assert result.final_output is not None
@pytest.mark.asyncio
async def test_very_long_input():
    result = await Runner.run(agent, "a" * 10000)
    assert result.final_output is not None
# Avoid - only test normal cases
@pytest.mark.asyncio
async def test_normal_case():
    result = await Runner.run(agent, "Hello")
    assert result.final_output is not None
5. Test Configuration
Test with different configurations:
Python
# Good
@pytest.mark.asyncio
async def test_with_fast_config():
    config = RunConfig(model="gpt-4o-mini")
    # max_turns is an argument to Runner.run rather than a RunConfig field
    result = await Runner.run(agent, input, run_config=config, max_turns=5)
    assert result.final_output is not None
# Avoid - always use default config
@pytest.mark.asyncio
async def test_with_default():
    result = await Runner.run(agent, input)
    assert result.final_output is not None
Documentation
1. Document Agent Purpose
Document what each agent does:
Python
# Good
"""
Customer Support Triage Agent
This agent triages customer support requests and routes them to
appropriate specialists based on the issue type.
"""
agent = Agent(
    name="triage",
    instructions="Triage customer support requests...",
)
# Avoid - no documentation
agent = Agent(name="triage", instructions="...")
2. Document Tools
Document tool behavior:
Python
# Good
@function_tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """
    Calculate the discounted price.
    This function calculates the price after applying a discount percentage.
    It validates that the discount is between 0 and 100.
    Args:
        price: The original price (must be positive).
        discount_percent: The discount percentage (0-100).
    Returns:
        The discounted price.
    Raises:
        ValueError: If price is negative or discount is out of range.
    Example:
        >>> calculate_discount(100.0, 20)
        80.0
    """
    ...
# Avoid
@function_tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate discount."""
    ...
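A docstring is only useful if the body honors it. The sketch below fills in the elided body so that it matches the documented contract, including the `ValueError` cases. The `@function_tool` decorator is omitted so the snippet runs without the SDK, and the error messages are illustrative.

```python
def calculate_discount(price: float, discount_percent: float) -> float:
    """Return the price after applying a percentage discount.

    Raises:
        ValueError: If price is negative or discount_percent is outside 0-100.
    """
    if price < 0:
        raise ValueError("price must not be negative")
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return price * (1 - discount_percent / 100)


discounted = calculate_discount(100.0, 20)

try:
    calculate_discount(100.0, 150)
    rejected_out_of_range = False
except ValueError:
    rejected_out_of_range = True
```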
3. Document Configuration
Document configuration options:
Python
# Good
"""
Configuration for agent runs.
Environment Variables:
- MODEL: Model to use (default: gpt-4o)
- TEMPERATURE: Temperature (default: 0.7)
- MAX_TURNS: Maximum turns (default: 10)
Configuration Files:
- config.yaml: Main configuration file
"""
# Avoid - no documentation
config = RunConfig(...)
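Documented defaults are easiest to trust when the loading code mirrors them exactly. This is a minimal sketch of reading the three environment variables listed above with the same defaults. `load_settings` is a hypothetical helper, and returning a plain dict is an assumption made to keep the example independent of the SDK.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read MODEL, TEMPERATURE and MAX_TURNS, falling back to the documented defaults."""
    return {
        "model": env.get("MODEL", "gpt-4o"),
        "temperature": float(env.get("TEMPERATURE", "0.7")),
        "max_turns": int(env.get("MAX_TURNS", "10")),
    }


# Passing an explicit mapping keeps the demo independent of the real environment
defaults = load_settings({})
overridden = load_settings({"MODEL": "gpt-4o-mini", "MAX_TURNS": "5"})
```

Accepting the environment as a parameter also makes the loader straightforward to test.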
4. Document Extensions
Document extension behavior:
Python
# Good
"""
Custom Memory Backend
This extension provides custom memory storage using XYZ database.
Configuration:
- connection_string: Database connection string
- table_name: Table name for storage
Usage:
    session = CustomMemorySession(
        connection_string="postgresql://...",
        table_name="agent_sessions",
    )
"""
# Avoid - no documentation
class CustomMemorySession(SessionABC):
    ...
5. Document API
Document public API:
Python
# Good
def run_agent(agent: Agent, input: str) -> str:
    """
    Run an agent with the given input.
    This is a convenience function that creates a runner and executes
    the agent with default configuration.
    Args:
        agent: The agent to run.
        input: The input to provide to the agent.
    Returns:
        The final output from the agent.
    Raises:
        AgentsException: If the agent run fails.
    Example:
        >>> agent = Agent(name="assistant", instructions="You are helpful")
        >>> run_agent(agent, "Hello")
        'Hello! How can I help you?'
    """
    ...
# Avoid - no documentation
def run_agent(agent, input):
    ...
Following these practices will help you:
Maintain code that is easy to understand and modify
Scale applications that perform well under load
Secure systems that protect sensitive data
Monitor applications with good observability
Test code that catches bugs early
Document code that others can understand
Deploy applications reliably
Remember that best practices evolve with experience. Start with these guidelines, learn from your own experience, and adapt them to your specific use cases.