The Sandbox system provides isolated execution environments for agents to perform real work with filesystems, run commands, and maintain state across longer time horizons. Think of a sandbox as a "secure workspace" or "container" where an agent can safely interact with files, execute code, and perform tasks without affecting your actual system.
Core Concepts
What is a Sandbox?
A sandbox is an isolated execution environment that:
Provides a filesystem for the agent to work with
Allows command execution in a controlled manner
Maintains state across multiple agent runs
Isolates the agent's actions from the host system
Supports snapshots for state preservation
Why Sandboxes Matter
Safety - Agents can't accidentally damage your system
Sandbox capabilities define what operations are allowed:
Python
from agents.sandbox.capabilities import Capability

class MyCapability(Capability):
    name = "my_capability"
    description = "My custom capability"

    async def check(self, context) -> bool:
        """Check if capability is available."""
        return True
Built-in Capabilities
The SDK includes several built-in capabilities:
ExecCapability - Execute commands
ReadCapability - Read files
WriteCapability - Write files
NetworkCapability - Network access
BrowserCapability - Browser automation
Capability Configuration
Python
from agents.sandbox.capabilities import (
    ExecCapability,
    ReadCapability,
    WriteCapability,
)

capabilities = [
    ExecCapability(allow=["python", "node"]),  # Allow specific commands
    ReadCapability(allow=["/workspace/**"]),  # Allow reading specific paths
    WriteCapability(allow=["/workspace/**"]),  # Allow writing specific paths
]
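One way to read the allow-lists above is as patterns checked before each operation. The following is a minimal sketch of that idea, not the SDK's implementation: `is_path_allowed` and `is_command_allowed` are hypothetical helpers, and the matching rules (shell-style globs for paths, first token for commands) are assumptions.

```python
import fnmatch
import posixpath
import shlex


def is_path_allowed(path: str, allow: list[str]) -> bool:
    """Return True if the normalized path matches any allow pattern."""
    # Normalize first so "/workspace/../etc/passwd" cannot slip through.
    normalized = posixpath.normpath(path)
    return any(fnmatch.fnmatchcase(normalized, pattern) for pattern in allow)


def is_command_allowed(command: str, allow: list[str]) -> bool:
    """Return True if the executable (first token) is in the allow-list."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in allow
```

Normalizing before matching matters here: without it, a path that starts inside the workspace but climbs out with `..` would still match the pattern.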
Sandbox Session Management
Sandbox Session
Sandbox sessions maintain state across runs:
Python
from agents.sandbox.session import BaseSandboxSession
# Create a session
session = await client.create_session(manifest)
# Use the session
config = SandboxRunConfig(session=session)
result1 = await Runner.run(agent, input1, run_config=config)
result2 = await Runner.run(agent, input2, run_config=config)
# Both runs use the same sandbox session
# Cleanup
await session.close()
Session State
Sandbox sessions can be resumed:
Python
# First run
result1 = await Runner.run(agent, input1, run_config=config)
session_state = result1._sandbox
# Later run
config2 = SandboxRunConfig(
    session_state=session_state,
)
result2 = await Runner.run(agent, input2, run_config=config2)
# Resumes from previous state
Sandbox Snapshots
Creating Snapshots
Snapshots capture sandbox state:
Python
from agents.sandbox.snapshot import LocalSnapshot
# Create a snapshot
snapshot = await client.create_snapshot(session)
# Save snapshot reference
snapshot_id = snapshot.id
from agents.sandbox.memory.rollouts import create_rollout
rollout_id = await create_rollout(
    client=client,
    memory_config=MemoryGenerateConfig(
        instructions="Remember important information",
    ),
)
config = SandboxRunConfig(
    memory_rollout_id=rollout_id,
)
Memory Types
Sandbox supports different memory types:
File-based memory - Store in files
Database memory - Store in database
Custom memory - Implement your own
Sandbox Execution
Executing Commands
Agents can execute commands in the sandbox:
Python
@function_tool
async def run_command(context: ToolContext, command: str) -> str:
    """Execute a command in the sandbox."""
    sandbox = context.run_config.sandbox
    result = await sandbox.client.exec(command)
    return result.stdout
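To make the idea of controlled command execution concrete, here is a minimal sketch of running a command with a time limit using only the standard library. It is not how the SDK's `exec` works internally; `run_with_timeout` is a hypothetical helper, and it runs the command on the host rather than in a sandbox.

```python
import asyncio
import sys


async def run_with_timeout(argv: list[str], timeout: float) -> str:
    """Run argv as a subprocess and return its stdout.

    Raises asyncio.TimeoutError if the command exceeds the timeout.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()  # Do not leave the child running after a timeout.
        await proc.wait()
        raise
    return stdout.decode()


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout([sys.executable, "--version"], timeout=30)))
```

Passing the command as an argument list instead of a shell string avoids shell interpolation of anything the model produced.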
Reading Files
Agents can read files in the sandbox:
Python
@function_tool
async def read_file(context: ToolContext, path: str) -> str:
    """Read a file from the sandbox."""
    sandbox = context.run_config.sandbox
    content = await sandbox.client.read_file(path)
    return content
Writing Files
Agents can write files in the sandbox:
Python
@function_tool
async def write_file(context: ToolContext, path: str, content: str) -> str:
    """Write a file to the sandbox."""
    sandbox = context.run_config.sandbox
    await sandbox.client.write_file(path, content)
    return f"Wrote {path}"
Sandbox Errors
Error Types
Sandbox operations can raise specific errors:
Python
from agents.sandbox.errors import (
    SandboxError,
    ExecTimeoutError,
    ExecTransportError,
    WorkspaceReadNotFoundError,
    WorkspaceWriteTypeError,
)

try:
    result = await sandbox.client.exec(command)
except ExecTimeoutError:
    print("Command timed out")
except ExecTransportError:
    print("Transport error")
except WorkspaceReadNotFoundError:
    print("File not found")
Error Handling
Handle sandbox errors gracefully:
Python
from agents.sandbox.errors import ExecTimeoutError, SandboxError

@function_tool
async def safe_exec(context: ToolContext, command: str) -> str:
    """Execute command with error handling."""
    try:
        result = await context.run_config.sandbox.client.exec(command)
        return result.stdout
    except ExecTimeoutError:
        return f"Command timed out: {command}"
    except SandboxError as e:
        return f"Sandbox error: {str(e)}"
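The order of the handlers above matters if the specific errors derive from `SandboxError`. The sketch below illustrates that with stand-in classes; the inheritance relationship is an assumption about the library, and `describe_failure` is a hypothetical helper.

```python
class SandboxError(Exception):
    """Stand-in base class; the real hierarchy is an assumption."""


class ExecTimeoutError(SandboxError):
    """Stand-in for a timeout raised by command execution."""


def describe_failure(exc: Exception, command: str) -> str:
    """Turn an exception into a short message the model can act on."""
    # Check the most specific type first; the base class would match both.
    if isinstance(exc, ExecTimeoutError):
        return f"Command timed out: {command}"
    if isinstance(exc, SandboxError):
        return f"Sandbox error: {exc}"
    raise exc  # Not a sandbox failure; let it propagate.
```

If the base-class check came first, a timeout would be reported as a generic sandbox error and the more useful message would never be reached.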
Sandbox Best Practices
1. Use Appropriate Sandboxes
Choose the right sandbox for your use case:
Python
# Good - local for development
client = UnixLocalSandboxClient()
# Good - cloud for production
client = E2BSandboxClient()
# Avoid - cloud for simple local testing
client = E2BSandboxClient() # Unnecessary overhead
# Good - reasonable timeout
client = UnixLocalSandboxClient(timeout=300)
# Avoid - no timeout (could hang forever)
client = UnixLocalSandboxClient()
4. Use Snapshots for Reproducibility
Use snapshots for testing:
Python
# Good - use snapshot for testing
config = SandboxRunConfig(snapshot="test-snapshot")
# Avoid - fresh sandbox each time (less reproducible)
config = SandboxRunConfig()
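To show why starting from a snapshot makes runs reproducible, here is a minimal sketch that models a snapshot as a plain directory copy. It says nothing about how the SDK stores snapshots; `take_snapshot` and `restore_snapshot` are hypothetical helpers.

```python
import shutil
import tempfile
from pathlib import Path


def take_snapshot(workspace: Path, snapshot_dir: Path) -> Path:
    """Copy the workspace tree into snapshot_dir and return the copy's path."""
    target = snapshot_dir / "snapshot"
    shutil.copytree(workspace, target)
    return target


def restore_snapshot(snapshot: Path, workspace: Path) -> None:
    """Replace the workspace contents with the snapshot contents."""
    shutil.rmtree(workspace)
    shutil.copytree(snapshot, workspace)


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    (workspace / "data.txt").write_text("original")

    snap = take_snapshot(workspace, Path(tmp) / "snaps")
    (workspace / "data.txt").write_text("modified by a test run")

    # Every test can start from the same known state.
    restore_snapshot(snap, workspace)
    restored = (workspace / "data.txt").read_text()
```

Whatever a previous run changed, the next run sees the same files, which is the property the snapshot-based configuration above is after.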
5. Clean Up Sessions
Always clean up sessions:
Python
# Good - explicit cleanup
session = await client.create_session(manifest)
try:
    ...  # Use session
finally:
    await session.close()
# Avoid - leak sessions
session = await client.create_session(manifest)
# Forgot to close
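The cleanup pattern can also be wrapped in an async context manager so callers cannot forget it. This is a minimal sketch: `FakeClient` and `FakeSession` are hypothetical stand-ins for a real client and session, and `open_session` is not part of the SDK.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for a sandbox session."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeClient:
    """Hypothetical stand-in for a sandbox client."""

    async def create_session(self, manifest: dict) -> FakeSession:
        return FakeSession()


@asynccontextmanager
async def open_session(client, manifest):
    """Create a session and guarantee close() runs, even on errors."""
    session = await client.create_session(manifest)
    try:
        yield session
    finally:
        await session.close()


async def demo() -> FakeSession:
    async with open_session(FakeClient(), manifest={}) as session:
        assert not session.closed  # Session is usable inside the block.
    return session
```

The session is created before the `try`, so a failed creation does not reach a `finally` that refers to a session that never existed.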
Common Sandbox Patterns
1. Code Repository Analysis
Analyze a codebase:
Python
agent = SandboxAgent(
    name="code_analyzer",
    instructions="Analyze the codebase structure",
    default_manifest=Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo", ref="main"),
        },
    ),
)
result = await Runner.run(
    agent,
    "Analyze the repository structure and identify key files",
    run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
2. Data Processing Pipeline
Process data in sandbox:
Python
agent = SandboxAgent(
    name="data_processor",
    instructions="Process the data files",
    default_manifest=Manifest(
        entries={
            "data": LocalDir(path="./data"),
            "scripts": LocalDir(path="./scripts"),
        },
    ),
)
result = await Runner.run(
    agent,
    "Run the processing scripts on the data",
    run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
3. Testing Environment
Run tests in isolated environment:
Python
agent = SandboxAgent(
    name="tester",
    instructions="Run the test suite",
    default_manifest=Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo"),
        },
    ),
)
result = await Runner.run(
    agent,
    "Run the test suite and report results",
    run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
4. Build Process
Build projects in sandbox:
Python
agent = SandboxAgent(
    name="builder",
    instructions="Build the project",
    default_manifest=Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo"),
        },
    ),
)
result = await Runner.run(
    agent,
    "Build the project and report any errors",
    run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
5. Documentation Generation
Generate documentation:
Python
agent = SandboxAgent(
    name="doc_generator",
    instructions="Generate documentation",
    default_manifest=Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo"),
        },
    ),
)
result = await Runner.run(
    agent,
    "Generate API documentation from the code",
    run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
)
Sandbox and Tracing
Sandbox Tracing
Sandbox operations are traced:
Python
result = await Runner.run(
    agent,
    input,
    run_config=SandboxRunConfig(client=client),
)
# Trace includes sandbox operations
print(result.trace)
Sandbox Span Data
Sandbox operations create spans:
Python
from agents.sandbox.tracing import sandbox_span
with sandbox_span(name="file_operation"):
    await sandbox.client.write_file(path, content)
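For intuition about what a span around an operation might capture, here is a minimal sketch of a context manager that records a name, a duration, and whether the block failed. It does not reproduce the SDK's tracing data model; `timed_span` and `recorded_spans` are hypothetical names.

```python
import time
from contextlib import contextmanager

recorded_spans: list[dict] = []


@contextmanager
def timed_span(name: str):
    """Record the name, duration, and outcome of the wrapped block."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = repr(exc)
        raise  # Record the failure, but do not swallow it.
    finally:
        recorded_spans.append(
            {"name": name, "seconds": time.perf_counter() - start, "error": error}
        )


with timed_span("file_operation"):
    pass  # A sandbox file write would go here.
```

The span is appended in `finally`, so an operation that raises still leaves a record behind.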
Summary
The Sandbox system provides isolated execution environments. Key takeaways:
SandboxAgent is designed for sandbox environments
Manifest defines the sandbox workspace
SandboxRunConfig configures sandbox execution
UnixLocalSandboxClient provides local Unix sandbox
DockerSandboxClient provides container isolation
Cloud sandboxes (E2B, Modal, etc.) provide cloud hosting
Entries define what's in the sandbox (GitRepo, LocalFile, etc.)
Capabilities control what operations are allowed
Sessions maintain state across runs
Snapshots enable reproducible environments
Memory rollouts provide memory capabilities
Command execution via sandbox tools
File operations (read/write) in sandbox
Error handling for sandbox failures
Timeouts prevent hanging operations
Cleanup prevents resource leaks
Tracing includes sandbox operations
Isolation protects the host system
Statefulness enables long-running tasks
Reproducibility via snapshots
Sandboxes are essential for agents that need to perform real work with files and commands in a safe, isolated environment.