Guardrails are safety checks that validate input and output at various points in an agent run. Think of guardrails as "security checkpoints" or "quality gates" that ensure the agent is operating within safe and acceptable boundaries. They can prevent harmful content, validate data formats, enforce business rules, and provide custom validation logic.
Core Concepts
What are Guardrails?
Guardrails are functions that run at specific points during agent execution to:
Validate - Check that data meets certain criteria
Filter - Remove or modify inappropriate content
Block - Prevent execution when safety thresholds are crossed
Transform - Modify content to meet requirements
Types of Guardrails
Input Guardrails - Run before the agent processes input
Output Guardrails - Run after the agent produces output
Tool Input Guardrails - Run before tool execution
Tool Output Guardrails - Run after tool execution
Guardrail Anatomy
Every guardrail function returns a GuardrailFunctionOutput:
Python
from dataclasses import dataclass
from typing import Any


@dataclass
class GuardrailFunctionOutput:
    output_info: Any
    """Information about the guardrail's check."""

    tripwire_triggered: bool
    """Whether the guardrail was triggered (blocked execution)."""
Tripwire Concept:
When tripwire_triggered = False - Execution continues normally
When tripwire_triggered = True - Execution is halted with an exception
The output_info can contain details about what was checked
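The tripwire behavior can be sketched without the SDK. The following is a minimal, self-contained illustration, not the library's implementation: the dataclass mirrors the two fields shown above, while `TripwireTriggered`, `check_length`, and `run_with_guardrail` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GuardrailFunctionOutput:
    """Local stand-in mirroring the two fields described above."""

    output_info: Any
    tripwire_triggered: bool


class TripwireTriggered(Exception):
    """Hypothetical exception used only in this sketch."""

    def __init__(self, output: GuardrailFunctionOutput):
        super().__init__(f"Guardrail tripped: {output.output_info}")
        self.output = output


def check_length(text: str) -> GuardrailFunctionOutput:
    """Hypothetical guardrail: trip when the text exceeds 20 characters."""
    too_long = len(text) > 20
    return GuardrailFunctionOutput(
        output_info={"length": len(text), "limit": 20},
        tripwire_triggered=too_long,
    )


def run_with_guardrail(
    text: str, guardrail: Callable[[str], GuardrailFunctionOutput]
) -> str:
    """Run the guardrail, then either halt with an exception or continue."""
    result = guardrail(text)
    if result.tripwire_triggered:
        raise TripwireTriggered(result)
    return f"processed: {text}"
```

Note that `output_info` is populated in both cases, so a caller can log what was checked even when execution continues.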
Input Guardrails
Purpose
Input guardrails validate or filter input before it reaches the LLM. They run:
Only on the first turn of a run
Only for the starting agent (not for handoffs)
Optionally in parallel with the agent (default) or before the agent starts
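The difference between running a guardrail in parallel with the agent and running it before the agent starts can be illustrated with plain `asyncio`. This is a sketch under stated assumptions rather than the SDK's scheduling code: `Tripwire`, `guardrail`, `agent`, `run_blocking`, and `run_parallel` are all hypothetical names, and the sleeps stand in for real model latency.

```python
import asyncio


class Tripwire(Exception):
    """Hypothetical exception for this sketch."""


async def guardrail(text: str) -> None:
    """Hypothetical guardrail: a fast check that rejects the word 'politics'."""
    await asyncio.sleep(0.01)
    if "politics" in text.lower():
        raise Tripwire("off-topic input")


async def agent(text: str, log: list[str]) -> str:
    """Hypothetical agent: a slower call that records when it starts."""
    log.append("agent started")
    await asyncio.sleep(0.05)
    log.append("agent finished")
    return f"answer to: {text}"


async def run_blocking(text: str, log: list[str]) -> str:
    """Guardrail first: the agent never starts if the guardrail trips."""
    await guardrail(text)
    return await agent(text, log)


async def run_parallel(text: str, log: list[str]) -> str:
    """Guardrail and agent start together; a trip cancels the agent."""
    agent_task = asyncio.create_task(agent(text, log))
    try:
        await guardrail(text)
    except Tripwire:
        agent_task.cancel()
        raise
    return await agent_task
```

In this sketch the parallel variant hides the guardrail's latency behind the agent call, but the agent may already have begun work by the time the tripwire fires. The blocking variant avoids that wasted work at the cost of added latency on every request.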
Basic Input Guardrail
Python
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    TResponseInputItem,
    input_guardrail,
)


@input_guardrail
def check_off_topic(
    context: RunContextWrapper,
    agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    """Check if input is off-topic."""
    # Only plain-string input is inspected; list input passes through unchecked.
    if isinstance(input, str):
        if "politics" in input.lower():
            return GuardrailFunctionOutput(
                output_info="Political content detected",
                tripwire_triggered=True,
            )
    return GuardrailFunctionOutput(
        output_info="Input is appropriate",
        tripwire_triggered=False,
    )


agent = Agent(
    name="safe_agent",
    instructions="Help with safe topics only",
    input_guardrails=[check_off_topic],
)
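The guardrail above inspects only string input, so a conversation passed as a list of items would skip the keyword check. One way to cover both forms is to flatten the input to text first. The helper below is a hypothetical sketch, not part of the SDK, and it assumes each list item is a dict whose `"content"` is either a string or a list of parts carrying a `"text"` field; real input items may have other shapes.

```python
from typing import Any, Union

InputLike = Union[str, list[dict[str, Any]]]


def extract_text(input: InputLike) -> str:
    """Hypothetical helper: flatten string or list input into one string.

    Assumes each list item is a dict whose "content" is either a string or
    a list of parts with a "text" field; any other shape is ignored.
    """
    if isinstance(input, str):
        return input
    chunks: list[str] = []
    for item in input:
        content = item.get("content")
        if isinstance(content, str):
            chunks.append(content)
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and isinstance(part.get("text"), str):
                    chunks.append(part["text"])
    return "\n".join(chunks)


def mentions_politics(input: InputLike) -> bool:
    """Same keyword check as the guardrail above, applied to either input form."""
    return "politics" in extract_text(input).lower()
```

A guardrail function could call `mentions_politics(input)` in place of the `isinstance` branch and set `tripwire_triggered` from its result.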
Input Guardrail Flow
Python
# Simplified pseudocode (not runnable as-is).

# 1. User provides input
input = "Hello, world!"

# 2. Input guardrails run (first turn, starting agent only)
if is_first_turn and is_starting_agent:
    guardrail_results = []

    # Run guardrails (parallel or sequential)
    for guardrail in input_guardrails:
        result = await guardrail.run(context, agent, input)
        guardrail_results.append(result)

        # Check if tripwire triggered
        if result.output.tripwire_triggered:
            raise InputGuardrailTripwireTriggered(
                guardrail_result=result,
                input=input,
            )

# 3. If no tripwires, continue with agent execution
response = await model.get_response(...)
Output Guardrail Flow
Python
# Simplified pseudocode (not runnable as-is).

# 1. Agent produces output
output = "Final answer"

# 2. Output guardrails run
guardrail_results = []
for guardrail in output_guardrails:
    result = await guardrail.run(context, agent, output)
    guardrail_results.append(result)

    # Check if tripwire triggered
    if result.output.tripwire_triggered:
        raise OutputGuardrailTripwireTriggered(
            guardrail_result=result,
            output=output,
        )

# 3. If no tripwires, return result
return RunResult(final_output=output, ...)
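The output flow can also be exercised end to end with a small stand-alone sketch. Everything here is an assumption made for illustration: `GuardrailOutput`, `OutputBlocked`, `no_email_addresses`, and `finalize` are hypothetical names, and the email pattern is deliberately simple rather than a complete address matcher.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class GuardrailOutput:
    """Local stand-in for the guardrail result fields described earlier."""

    output_info: Any
    tripwire_triggered: bool


class OutputBlocked(Exception):
    """Hypothetical exception raised when an output check trips."""

    def __init__(self, info: Any):
        super().__init__(str(info))
        self.info = info


OutputCheck = Callable[[str], Awaitable[GuardrailOutput]]

# Deliberately loose pattern; good enough for a demonstration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


async def no_email_addresses(output: str) -> GuardrailOutput:
    """Hypothetical check: trip when the output contains an email address."""
    matches = EMAIL_PATTERN.findall(output)
    return GuardrailOutput(
        output_info={"emails_found": len(matches)},
        tripwire_triggered=bool(matches),
    )


async def finalize(output: str, checks: list[OutputCheck]) -> str:
    """Run every check in order; raise on the first trip, else return output."""
    for check in checks:
        result = await check(output)
        if result.tripwire_triggered:
            raise OutputBlocked(result.output_info)
    return output
```

Calling `asyncio.run(finalize("Final answer", [no_email_addresses]))` returns the text unchanged, while an output containing an address raises `OutputBlocked` before anything is returned to the caller.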
Tool Guardrail Flow
Python
# Simplified pseudocode (not runnable as-is).

# 1. Model calls tool
tool_call = ResponseFunctionToolCall(
    name="my_tool",
    arguments='{"param": "value"}',
)

# 2. Run tool input guardrails
for guardrail in tool.input_guardrails:
    result = await guardrail.run(context, tool_name, arguments)
    if result.output.tripwire_triggered:
        handle_guardrail_tripwire(result)
        return  # Skip tool execution

# 3. Execute tool
try:
    output = await tool.execute(arguments, context)
except Exception as e:
    output = handle_error(e)

# 4. Run tool output guardrails
for guardrail in tool.output_guardrails:
    result = await guardrail.run(context, tool_name, output)
    if result.output.tripwire_triggered:
        output = result.output.output_info  # Use guardrail output
        # or raise exception

# 5. Return output to model
return output
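The five steps above can be condensed into a small runnable sketch. It is an illustration under assumptions, not the SDK's tool runner: `call_tool`, `block_large_refunds`, `redact_card_numbers`, `refund`, and `lookup_customer` are hypothetical, and each check here returns `None` to allow or a replacement message to reject, which is a simplification of the tripwire result object.

```python
import json
from typing import Any, Callable, Optional

# Each hypothetical check returns None to allow, or a message to reject.
InputCheck = Callable[[str, dict[str, Any]], Optional[str]]
OutputCheck = Callable[[str, str], Optional[str]]


def block_large_refunds(tool_name: str, args: dict[str, Any]) -> Optional[str]:
    """Hypothetical input check: reject refunds above 100."""
    if tool_name == "refund" and args.get("amount", 0) > 100:
        return "Refund exceeds the 100 limit; tool call skipped."
    return None


def redact_card_numbers(tool_name: str, output: str) -> Optional[str]:
    """Hypothetical output check: withhold output that mentions card data."""
    if "card=" in output:
        return "[tool output withheld: contains card data]"
    return None


def call_tool(
    tool_name: str,
    raw_arguments: str,
    tool: Callable[..., str],
    input_checks: list[InputCheck],
    output_checks: list[OutputCheck],
) -> str:
    """Run input checks, the tool, then output checks; return text for the model."""
    args = json.loads(raw_arguments)
    for check in input_checks:
        message = check(tool_name, args)
        if message is not None:
            return message  # The tool is never executed.
    try:
        output = tool(**args)
    except Exception as exc:  # Report tool failures to the model as text.
        output = f"Tool error: {exc}"
    for check in output_checks:
        message = check(tool_name, output)
        if message is not None:
            return message  # Replace the real output.
    return output


def refund(amount: int) -> str:
    """Hypothetical tool used only for this demonstration."""
    return f"refunded {amount}"


def lookup_customer(customer_id: str) -> str:
    """Hypothetical tool whose raw output contains sensitive data."""
    return f"customer {customer_id} card=4111-xxxx"
```

In this sketch a rejected call still produces a string for the model, so the conversation can continue and the model can try a different approach instead of the whole run failing.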
Guardrail Exceptions
InputGuardrailTripwireTriggered
Raised when an input guardrail tripwire is triggered:
Python
from agents import InputGuardrailTripwireTriggered, Runner

# Assumes an async context with `agent` and `input` already defined.
try:
    result = await Runner.run(agent, input)
except InputGuardrailTripwireTriggered as e:
    print(f"Input blocked: {e.guardrail_result.output.output_info}")
    # Handle the blocked input
OutputGuardrailTripwireTriggered
Raised when an output guardrail tripwire is triggered:
Python
from agents import OutputGuardrailTripwireTriggered, Runner

# Assumes an async context with `agent` and `input` already defined.
try:
    result = await Runner.run(agent, input)
except OutputGuardrailTripwireTriggered as e:
    print(f"Output blocked: {e.guardrail_result.output.output_info}")
    # Handle the blocked output
ToolInputGuardrailTripwireTriggered
Raised when a tool input guardrail tripwire is triggered:
Python
from agents import ToolInputGuardrailTripwireTriggered

# This is handled internally by the SDK.
# The tool is skipped and an error message is sent to the model.
ToolOutputGuardrailTripwireTriggered
Raised when a tool output guardrail tripwire is triggered:
Python
from agents import ToolOutputGuardrailTripwireTriggered

# This is handled internally by the SDK.
# The output is replaced with the guardrail's output_info.