Sandbox System - Comprehensive Deep Dive

Overview

The Sandbox system provides isolated execution environments for agents to perform real work with filesystems, run commands, and maintain state across longer time horizons. Think of a sandbox as a "secure workspace" or "container" where an agent can safely interact with files, execute code, and perform tasks without affecting your actual system.

Core Concepts

What is a Sandbox?

A sandbox is an isolated execution environment that:

  • Provides a filesystem for the agent to work with
  • Allows command execution in a controlled manner
  • Maintains state across multiple agent runs
  • Isolates the agent's actions from the host system
  • Supports snapshots for state preservation
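The properties listed above can be made concrete with a small stand-in. The sketch below is not the SDK's implementation: `TinyWorkspace` and its methods are hypothetical names, used only to illustrate a workspace rooted in one directory, whose state persists between uses and whose paths cannot escape the root.

```python
import tempfile
from pathlib import Path


class TinyWorkspace:
    """Hypothetical stand-in for a sandbox workspace (not an SDK class)."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _resolve(self, relative: str) -> Path:
        # Resolve the path and refuse anything that lands outside the root.
        target = (self.root / relative).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative!r} escapes the workspace")
        return target

    def write_file(self, relative: str, content: str) -> None:
        target = self._resolve(relative)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")

    def read_file(self, relative: str) -> str:
        return self._resolve(relative).read_text(encoding="utf-8")


def demo_workspace() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        first_run = TinyWorkspace(Path(tmp))
        first_run.write_file("notes/todo.txt", "ship it")
        # A second "run" over the same root sees the earlier state.
        second_run = TinyWorkspace(Path(tmp))
        return second_run.read_file("notes/todo.txt")
```

A path check like this only guards file access. It does nothing to contain commands, which is why container or cloud sandboxes exist.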

Why Sandboxes Matter

  1. Safety - Agents can't accidentally damage your system
  2. Statefulness - Work persists across runs
  3. Real Work - Agents can actually perform tasks (edit files, run code, etc.)
  4. Isolation - Multiple agents can work independently
  5. Reproducibility - Snapshots enable reproducible environments
  6. Long-Running Tasks - Support for extended work sessions

    Sandbox Types

    The SDK supports several sandbox implementations:

    1. UnixLocalSandboxClient - Local Unix-based sandbox
    2. DockerSandboxClient - Docker container sandbox
    3. E2BSandboxClient - E2B cloud sandbox
    4. ModalSandboxClient - Modal cloud sandbox
    5. RunloopSandboxClient - Runloop cloud sandbox
    6. DaytonaSandboxClient - Daytona cloud sandbox
    7. VercelSandboxClient - Vercel cloud sandbox
    8. Custom Clients - Implement your own sandbox client

    SandboxAgent

    SandboxAgent Class

    SandboxAgent is a specialized agent designed for sandbox environments:

    Python
    from agents import Runner, SandboxAgent, Manifest, SandboxRunConfig
    from agents.sandbox.entries import GitRepo
    from agents.sandbox.sandboxes import UnixLocalSandboxClient
    
    agent = SandboxAgent(
        name="workspace_assistant",
        instructions="Inspect the sandbox workspace before answering",
        default_manifest=Manifest(
            entries={
                "repo": GitRepo(repo="openai/openai-agents-python", ref="main"),
            }
        ),
    )
    
    result = await Runner.run(
        agent,
        "Inspect the repo README and summarize what this project does",
        run_config=SandboxRunConfig(
            client=UnixLocalSandboxClient(),
        ),
    )

    Key Differences from Regular Agent:

    • Manifest - Defines what files/resources are available
    • Sandbox-aware - Knows it's running in a sandbox
    • Workspace focus - Designed for file/workspace operations
    • State persistence - Maintains sandbox state across runs

    Manifest

    The manifest defines the sandbox workspace:

    Python
    from agents.sandbox import Manifest
    from agents.sandbox.entries import GitRepo, LocalDir, LocalFile, RemoteFile
    
    manifest = Manifest(
        entries={
            # Git repository
            "repo": GitRepo(repo="openai/openai-agents-python", ref="main"),
            
            # Local file
            "config": LocalFile(path="./config.json"),
            
            # Local directory
            "data": LocalDir(path="./data"),
            
            # Remote file
            "remote": RemoteFile(url="https://example.com/file.txt"),
        },
    )

    Manifest Entry Types:

    • GitRepo - Clone a Git repository
    • LocalFile - Copy a local file
    • LocalDir - Copy a local directory
    • RemoteFile - Download a remote file
    • RemoteDir - Download a remote directory
    • InlineFile - Create a file from inline content
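To illustrate what "materializing" a manifest might involve, here is a minimal sketch that uses only the standard library. `InlineText`, `CopiedFile`, and `materialize` are hypothetical names, not SDK classes, and the sketch assumes each manifest key doubles as the destination path, which may differ from how the SDK maps entries.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class InlineText:
    """Hypothetical entry: create a file from a string."""
    content: str


@dataclass
class CopiedFile:
    """Hypothetical entry: copy an existing file from the host."""
    source: Path


Entry = Union[InlineText, CopiedFile]


def materialize(entries: dict[str, Entry], workspace: Path) -> list[str]:
    """Write every entry under `workspace`, keyed by its manifest name."""
    for name, entry in entries.items():
        target = workspace / name
        target.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(entry, InlineText):
            target.write_text(entry.content, encoding="utf-8")
        else:
            shutil.copyfile(entry.source, target)
    # Report the resulting files as sorted, workspace-relative POSIX paths.
    files = (p for p in workspace.rglob("*") if p.is_file())
    return sorted(p.relative_to(workspace).as_posix() for p in files)


def demo_manifest() -> list[str]:
    with tempfile.TemporaryDirectory() as host, tempfile.TemporaryDirectory() as ws:
        config = Path(host) / "config.json"
        config.write_text("{}", encoding="utf-8")
        entries = {
            "README.md": InlineText("# README"),
            "settings/config.json": CopiedFile(config),
        }
        return materialize(entries, Path(ws))
```

Git and remote entries would follow the same shape, with a clone or download step in place of the copy.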

    SandboxRunConfig

    Configuration for sandbox execution:

    Python
    from agents import SandboxRunConfig
    from agents.sandbox.sandboxes import UnixLocalSandboxClient
    
    config = SandboxRunConfig(
        client=UnixLocalSandboxClient(),
        options={"timeout": 300},  # Client-specific options
        manifest=manifest,  # Override default manifest
        snapshot="snapshot-123",  # Use a snapshot
        concurrency_limits=SandboxConcurrencyLimits(
            manifest_entries=4,
            local_dir_files=10,
        ),
    )
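The `concurrency_limits` values above suggest that setup work is capped at a fixed number of parallel operations. One common way to enforce such a cap is a semaphore, sketched below. This is an assumption about the general technique, not a description of the SDK's internals, and `materialize_all` is a hypothetical helper.

```python
import asyncio


async def materialize_all(entry_names: list[str], limit: int) -> int:
    """Process entries concurrently, never more than `limit` at a time.

    Returns the highest number of entries that were in flight at once.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def materialize(name: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a clone, copy, or download
            in_flight -= 1

    await asyncio.gather(*(materialize(name) for name in entry_names))
    return peak


def demo_limits() -> int:
    names = [f"entry-{i}" for i in range(10)]
    return asyncio.run(materialize_all(names, limit=4))
```

With ten entries and a limit of four, the remaining six wait until a slot frees up.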

    Unix Local Sandbox

    UnixLocalSandboxClient

    Local Unix-based sandbox for development:

    Python
    from agents.sandbox.sandboxes import UnixLocalSandboxClient
    
    client = UnixLocalSandboxClient()
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    How it works:

    • Creates a temporary directory on your local filesystem
    • Provides shell access via subprocess
    • Isolates via chroot (if supported)
    • Good for development and testing

    Limitations:

    • Not truly isolated (on same machine)
    • Requires Unix-like system
    • May require sudo for full isolation
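As a rough illustration of the mechanism described above (a temporary directory plus subprocess-based execution), the following sketch uses only the standard library. It is not the code behind `UnixLocalSandboxClient`. `run_in_temp_workspace` is a hypothetical helper, and it offers no isolation beyond a separate working directory, which is the limitation noted above.

```python
import subprocess
import sys
import tempfile


def run_in_temp_workspace(code: str, timeout: float = 30.0) -> str:
    """Run a Python snippet with a fresh temporary directory as its cwd."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,          # the child sees the temp dir as its workspace
            capture_output=True,
            text=True,
            timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
            check=True,           # raises CalledProcessError on non-zero exit
        )
        return completed.stdout.strip()


def demo_local_run() -> str:
    # The child writes a file, then lists its working directory.
    snippet = (
        "import os\n"
        "with open('hello.txt', 'w') as f: f.write('hi')\n"
        "print(sorted(os.listdir('.')))"
    )
    return run_in_temp_workspace(snippet)
```

The child process can still read and write anywhere the current user can, so this pattern suits development and testing only.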

    Unix Local Sandbox Options

    Python
    client = UnixLocalSandboxClient(
        root_dir="/tmp/sandbox",  # Custom root directory
        timeout=300,  # Command timeout
        env_vars={"MY_VAR": "value"},  # Environment variables
    )

    Docker Sandbox

    DockerSandboxClient

    Docker container-based sandbox:

    Python
    from agents.sandbox.sandboxes import DockerSandboxClient
    
    client = DockerSandboxClient(
        image="python:3.11",
    )
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Benefits:

    • True isolation via containers
    • Reproducible environments
    • Can use any Docker image
    • Network isolation

    Requirements:

    • Docker installed and running
    • Sufficient disk space

    Docker Sandbox Options

    Python
    client = DockerSandboxClient(
        image="python:3.11",
        dockerfile=None,  # Custom Dockerfile
        build_args={},  # Build arguments
        env_vars={},  # Environment variables
        volumes={},  # Volume mounts
        ports={},  # Port mappings
        network="bridge",  # Network mode
    )

    Cloud Sandboxes

    E2B Sandbox

    E2B cloud-based sandbox:

    Python
    from agents.sandbox.sandboxes import E2BSandboxClient
    
    client = E2BSandboxClient(
        api_key="your-api-key",
    )
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Benefits:

    • Cloud-hosted (no local resources)
    • Scalable
    • Pre-configured environments
    • Good for production

    Modal Sandbox

    Modal cloud-based sandbox:

    Python
    from agents.sandbox.sandboxes import ModalSandboxClient
    
    client = ModalSandboxClient()
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Runloop Sandbox

    Runloop cloud-based sandbox:

    Python
    from agents.sandbox.sandboxes import RunloopSandboxClient
    
    client = RunloopSandboxClient(
        api_key="your-api-key",
    )
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Daytona Sandbox

    Daytona cloud-based sandbox:

    Python
    from agents.sandbox.sandboxes import DaytonaSandboxClient
    
    client = DaytonaSandboxClient(
        api_key="your-api-key",
    )
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Vercel Sandbox

    Vercel cloud-based sandbox:

    Python
    from agents.sandbox.sandboxes import VercelSandboxClient
    
    client = VercelSandboxClient()
    
    config = SandboxRunConfig(client=client)
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Sandbox Entries

    GitRepo Entry

    Clone a Git repository:

    Python
    from agents.sandbox.entries import GitRepo
    
    manifest = Manifest(
        entries={
            "repo": GitRepo(
                repo="openai/openai-agents-python",
                ref="main",  # Branch or tag
                depth=1,  # Shallow clone
                sparse_checkout=None,  # Sparse checkout paths
            ),
        },
    )

    LocalFile Entry

    Copy a local file:

    Python
    from agents.sandbox.entries import LocalFile
    
    manifest = Manifest(
        entries={
            "config": LocalFile(
                path="./config.json",
                dest="config.json",  # Destination path
            ),
        },
    )

    LocalDir Entry

    Copy a local directory:

    Python
    from agents.sandbox.entries import LocalDir
    
    manifest = Manifest(
        entries={
            "data": LocalDir(
                path="./data",
                dest="data",  # Destination path
                exclude=["*.tmp"],  # Exclude patterns
            ),
        },
    )

    InlineFile Entry

    Create a file from inline content:

    Python
    from agents.sandbox.entries import InlineFile
    
    manifest = Manifest(
        entries={
            "readme": InlineFile(
                content="# README\n\nThis is a README file.",
                dest="README.md",
            ),
        },
    )

    RemoteFile Entry

    Download a remote file:

    Python
    from agents.sandbox.entries import RemoteFile
    
    manifest = Manifest(
        entries={
            "data": RemoteFile(
                url="https://example.com/data.json",
                dest="data.json",
                headers={"Authorization": "Bearer token"},
            ),
        },
    )

    Sandbox Capabilities

    Capability System

    Sandbox capabilities define what operations are allowed:

    Python
    from agents.sandbox.capabilities import Capability
    
    class MyCapability(Capability):
        name = "my_capability"
        description = "My custom capability"
        
        async def check(self, context) -> bool:
            """Check if capability is available."""
            return True

    Built-in Capabilities

    The SDK includes several built-in capabilities:

    • ExecCapability - Execute commands
    • ReadCapability - Read files
    • WriteCapability - Write files
    • NetworkCapability - Network access
    • BrowserCapability - Browser automation

    Capability Configuration

    Python
    from agents.sandbox.capabilities import ExecCapability, ReadCapability, WriteCapability
    
    capabilities = [
        ExecCapability(allow=["python", "node"]),  # Allow specific commands
        ReadCapability(allow=["/workspace/**"]),  # Allow reading specific paths
        WriteCapability(allow=["/workspace/**"]),  # Allow writing specific paths
    ]
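The allow-lists above imply two kinds of checks: one on the program being run and one on the path being touched. The sketch below shows how such checks could work with the standard library. `command_allowed` and `path_allowed` are hypothetical helpers, and the matching rules are assumptions rather than the SDK's documented semantics.

```python
import posixpath
import shlex
from fnmatch import fnmatchcase


def command_allowed(command: str, allowed_programs: list[str]) -> bool:
    """Allow a command only if its first token is an allowed program."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in allowed_programs


def path_allowed(path: str, allowed_patterns: list[str]) -> bool:
    """Allow a path if, after normalization, it matches any glob pattern.

    Normalizing first collapses ".." segments so they cannot be used to
    slip past a prefix pattern. In fnmatch, "*" also matches "/", so
    "/workspace/**" covers nested paths.
    """
    normalized = posixpath.normpath(path)
    return any(fnmatchcase(normalized, pattern) for pattern in allowed_patterns)


def demo_capabilities() -> dict[str, bool]:
    programs = ["python", "node"]
    patterns = ["/workspace/**"]
    return {
        "python script.py": command_allowed("python script.py", programs),
        "rm -rf /": command_allowed("rm -rf /", programs),
        "/workspace/src/main.py": path_allowed("/workspace/src/main.py", patterns),
        "/workspace/../etc/passwd": path_allowed("/workspace/../etc/passwd", patterns),
    }
```

Checking only the first token is deliberately simple. An allowed interpreter such as `python` can still run arbitrary code, so command allow-lists complement isolation rather than replace it.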

    Sandbox Session Management

    Sandbox Session

    Sandbox sessions maintain state across runs:

    Python
    from agents.sandbox.session import BaseSandboxSession
    
    # Create a session
    session = await client.create_session(manifest)
    
    # Use the session
    config = SandboxRunConfig(session=session)
    
    result1 = await Runner.run(agent, input1, run_config=config)
    result2 = await Runner.run(agent, input2, run_config=config)
    # Both runs use the same sandbox session
    
    # Cleanup
    await session.close()

    Session State

    Sandbox sessions can be resumed:

    Python
    # First run
    result1 = await Runner.run(agent, input1, run_config=config)
    session_state = result1._sandbox
    
    # Later run
    config2 = SandboxRunConfig(
        session_state=session_state,
    )
    
    result2 = await Runner.run(agent, input2, run_config=config2)
    # Resumes from previous state

    Sandbox Snapshots

    Creating Snapshots

    Snapshots capture sandbox state:

    Python
    from agents.sandbox.snapshot import LocalSnapshot
    
    # Create a snapshot
    snapshot = await client.create_snapshot(session)
    
    # Save snapshot reference
    snapshot_id = snapshot.id

    Using Snapshots

    Restore sandbox from a snapshot:

    Python
    config = SandboxRunConfig(
        snapshot="snapshot-123",
    )
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )

    Snapshot Benefits

    • Reproducibility - Exact environment reproduction
    • State Sharing - Share sandbox state across runs
    • Rollback - Return to previous state
    • Testing - Test against known states
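The rollback benefit can be demonstrated with a directory copy. This is a simplified sketch, assuming a snapshot is just a full copy of the workspace tree. Real sandbox clients may snapshot container or VM state instead, and `take_snapshot` and `restore_snapshot` are hypothetical helpers.

```python
import shutil
import tempfile
from pathlib import Path


def take_snapshot(workspace: Path, snapshot_dir: Path) -> None:
    """Copy the whole workspace tree into `snapshot_dir` (must not exist yet)."""
    shutil.copytree(workspace, snapshot_dir)


def restore_snapshot(snapshot_dir: Path, workspace: Path) -> None:
    """Discard the current workspace and replace it with the snapshot."""
    shutil.rmtree(workspace)
    shutil.copytree(snapshot_dir, workspace)


def demo_rollback() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"
        snapshot = Path(tmp) / "snapshot-1"
        workspace.mkdir()
        state = workspace / "state.txt"

        state.write_text("known good", encoding="utf-8")
        take_snapshot(workspace, snapshot)

        # A later run damages the workspace, then we roll back.
        state.write_text("broken by a later run", encoding="utf-8")
        restore_snapshot(snapshot, workspace)
        return state.read_text(encoding="utf-8")
```

Restoring the same snapshot before every test run gives each run an identical starting state.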

    Sandbox Memory

    Memory Rollouts

    Sandbox memory can be rolled out:

    Python
    from agents.sandbox.memory.rollouts import create_rollout
    
    rollout_id = await create_rollout(
        client=client,
        memory_config=MemoryGenerateConfig(
            instructions="Remember important information",
        ),
    )
    
    config = SandboxRunConfig(
        memory_rollout_id=rollout_id,
    )

    Memory Types

    Sandbox supports different memory types:

    • File-based memory - Store in files
    • Database memory - Store in database
    • Custom memory - Implement your own

    Sandbox Execution

    Executing Commands

    Agents can execute commands in the sandbox:

    Python
    @function_tool
    async def run_command(context: ToolContext, command: str) -> str:
        """Execute a command in the sandbox."""
        # The tool must be async because it awaits the sandbox client.
        sandbox = context.run_config.sandbox
        result = await sandbox.client.exec(command)
        return result.stdout

    Reading Files

    Agents can read files in the sandbox:

    Python
    @function_tool
    async def read_file(context: ToolContext, path: str) -> str:
        """Read a file from the sandbox."""
        # The tool must be async because it awaits the sandbox client.
        sandbox = context.run_config.sandbox
        content = await sandbox.client.read_file(path)
        return content

    Writing Files

    Agents can write files in the sandbox:

    Python
    @function_tool
    async def write_file(context: ToolContext, path: str, content: str) -> str:
        """Write a file to the sandbox."""
        # The tool must be async because it awaits the sandbox client.
        sandbox = context.run_config.sandbox
        await sandbox.client.write_file(path, content)
        return f"Wrote {path}"

    Sandbox Errors

    Error Types

    Sandbox operations can raise specific errors:

    Python
    from agents.sandbox.errors import (
        SandboxError,
        ExecTimeoutError,
        ExecTransportError,
        WorkspaceReadNotFoundError,
        WorkspaceWriteTypeError,
    )
    
    try:
        result = await sandbox.client.exec(command)
    except ExecTimeoutError:
        print("Command timed out")
    except ExecTransportError:
        print("Transport error")
    except WorkspaceReadNotFoundError:
        print("File not found")

    Error Handling

    Handle sandbox errors gracefully:

    Python
    from agents.sandbox.errors import ExecTimeoutError, SandboxError

    @function_tool
    async def safe_exec(context: ToolContext, command: str) -> str:
        """Execute command with error handling."""
        try:
            result = await context.run_config.sandbox.client.exec(command)
            return result.stdout
        except ExecTimeoutError:
            # Catch the specific error before the more general SandboxError.
            return f"Command timed out: {command}"
        except SandboxError as e:
            return f"Sandbox error: {e}"

    Sandbox Best Practices

    1. Use Appropriate Sandboxes

    Choose the right sandbox for your use case:

    Python
    # Good - local for development
    client = UnixLocalSandboxClient()
    
    # Good - cloud for production
    client = E2BSandboxClient()
    
    # Avoid - cloud for simple local testing
    client = E2BSandboxClient()  # Unnecessary overhead

    2. Manage Manifest Size

    Keep manifests efficient:

    Python
    # Good - minimal manifest
    manifest = Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo", depth=1),
        },
    )
    
    # Avoid - bloated manifest
    manifest = Manifest(
        entries={
            "repo": GitRepo(repo="owner/repo"),
            "data": LocalDir(path="./huge_data"),  # Too large
            "more_data": LocalDir(path="./more_data"),
        },
    )

    3. Set Timeouts

    Prevent hanging operations:

    Python
    # Good - reasonable timeout
    client = UnixLocalSandboxClient(timeout=300)
    
    # Avoid - no timeout (could hang forever)
    client = UnixLocalSandboxClient()
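To show what a command timeout buys you, here is a standard-library sketch that runs a child process and turns a timeout into a readable message instead of hanging. `run_with_timeout` is a hypothetical helper. It illustrates the general pattern, not how the sandbox clients enforce their `timeout` option.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run a Python snippet, returning its output or a timeout message."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return f"timed out after {timeout}s"
    return completed.stdout.strip()


def demo_timeouts() -> tuple[str, str]:
    fast = run_with_timeout("print('done')", timeout=10.0)
    slow = run_with_timeout("import time; time.sleep(30)", timeout=0.5)
    return fast, slow
```

Returning a message rather than raising lets an agent see the failure and decide whether to retry with a different command.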

    4. Use Snapshots for Reproducibility

    Use snapshots for testing:

    Python
    # Good - use snapshot for testing
    config = SandboxRunConfig(snapshot="test-snapshot")
    
    # Avoid - fresh sandbox each time (less reproducible)
    config = SandboxRunConfig()

    5. Clean Up Sessions

    Always clean up sessions:

    Python
    # Good - explicit cleanup
    # Create the session before `try`: if creation fails, there is nothing to close.
    session = await client.create_session(manifest)
    try:
        ...  # Use session
    finally:
        await session.close()
    
    # Avoid - leak sessions
    session = await client.create_session(manifest)
    # Forgot to close
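One way to make cleanup hard to forget is to wrap session creation in an async context manager. The sketch below uses a hypothetical `FakeSession` stand-in rather than a real sandbox session, and `open_session` is an assumed helper name, so treat it as a pattern rather than SDK API.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for a sandbox session (not an SDK class)."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def open_session():
    # Create the session before entering `try`, so `finally` never sees
    # an unbound name if creation itself fails.
    session = FakeSession()
    try:
        yield session
    finally:
        await session.close()


async def use_and_fail() -> FakeSession:
    captured = None
    try:
        async with open_session() as session:
            captured = session
            raise RuntimeError("agent run failed")
    except RuntimeError:
        pass
    return captured


def demo_cleanup() -> bool:
    # The session is closed even though the body raised.
    return asyncio.run(use_and_fail()).closed
```

With this shape, callers write `async with open_session() as session:` and never call `close()` themselves.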

    Common Sandbox Patterns

    1. Code Repository Analysis

    Analyze a codebase:

    Python
    agent = SandboxAgent(
        name="code_analyzer",
        instructions="Analyze the codebase structure",
        default_manifest=Manifest(
            entries={
                "repo": GitRepo(repo="owner/repo", ref="main"),
            },
        ),
    )
    
    result = await Runner.run(
        agent,
        "Analyze the repository structure and identify key files",
        run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
    )

    2. Data Processing Pipeline

    Process data in sandbox:

    Python
    agent = SandboxAgent(
        name="data_processor",
        instructions="Process the data files",
        default_manifest=Manifest(
            entries={
                "data": LocalDir(path="./data"),
                "scripts": LocalDir(path="./scripts"),
            },
        ),
    )
    
    result = await Runner.run(
        agent,
        "Run the processing scripts on the data",
        run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
    )

    3. Testing Environment

    Run tests in isolated environment:

    Python
    agent = SandboxAgent(
        name="tester",
        instructions="Run the test suite",
        default_manifest=Manifest(
            entries={
                "repo": GitRepo(repo="owner/repo"),
            },
        ),
    )
    
    result = await Runner.run(
        agent,
        "Run the test suite and report results",
        run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
    )

    4. Build Process

    Build projects in sandbox:

    Python
    agent = SandboxAgent(
        name="builder",
        instructions="Build the project",
        default_manifest=Manifest(
            entries={
                "repo": GitRepo(repo="owner/repo"),
            },
        ),
    )
    
    result = await Runner.run(
        agent,
        "Build the project and report any errors",
        run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
    )

    5. Documentation Generation

    Generate documentation:

    Python
    agent = SandboxAgent(
        name="doc_generator",
        instructions="Generate documentation",
        default_manifest=Manifest(
            entries={
                "repo": GitRepo(repo="owner/repo"),
            },
        ),
    )
    
    result = await Runner.run(
        agent,
        "Generate API documentation from the code",
        run_config=SandboxRunConfig(client=UnixLocalSandboxClient()),
    )

    Sandbox and Tracing

    Sandbox Tracing

    Sandbox operations are traced:

    Python
    result = await Runner.run(
        agent,
        input,
        run_config=SandboxRunConfig(client=client),
    )
    
    # Trace includes sandbox operations
    print(result.trace)

    Sandbox Span Data

    Sandbox operations create spans:

    Python
    from agents.sandbox.tracing import sandbox_span
    
    with sandbox_span(name="file_operation"):
        await sandbox.client.write_file(path, content)
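For readers unfamiliar with spans, the idea is to record a named, timed record around an operation, including whether it failed. The following is a generic sketch of that idea. `SpanRecord` and `timed_span` are hypothetical names and do not reflect the SDK's tracing internals.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SpanRecord:
    name: str
    duration_s: float
    error: Optional[str] = None


@contextmanager
def timed_span(name: str, sink: list) -> Iterator[None]:
    """Append a SpanRecord to `sink` when the wrapped block finishes."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = type(exc).__name__  # record the failure, then re-raise
        raise
    finally:
        sink.append(SpanRecord(name, time.perf_counter() - start, error))


def demo_spans() -> list:
    spans: list = []
    with timed_span("file_operation", spans):
        pass  # a sandbox write would go here
    try:
        with timed_span("failing_operation", spans):
            raise ValueError("disk full")
    except ValueError:
        pass
    return [(s.name, s.error) for s in spans]
```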

    Summary

    The Sandbox system provides isolated execution environments. Key takeaways:

    1. SandboxAgent is designed for sandbox environments
    2. Manifest defines the sandbox workspace
    3. SandboxRunConfig configures sandbox execution
    4. UnixLocalSandboxClient provides a local Unix sandbox for development
    5. DockerSandboxClient provides container isolation
    6. Cloud sandboxes (E2B, Modal, etc.) provide cloud hosting
    7. Entries define what's in the sandbox (GitRepo, LocalFile, etc.)
    8. Capabilities control what operations are allowed
    9. Sessions maintain state across runs
    10. Snapshots enable reproducible environments
    11. Memory rollouts provide memory capabilities
    12. Command execution via sandbox tools
    13. File operations (read/write) in sandbox
    14. Error handling for sandbox failures
    15. Timeouts prevent hanging operations
    16. Cleanup prevents resource leaks
    17. Tracing includes sandbox operations
    18. Isolation protects the host system
    19. Statefulness enables long-running tasks
    20. Reproducibility via snapshots

    Sandboxes are essential for agents that need to perform real work with files and commands in a safe, isolated environment.