Shared from "OpenAI Agents Python" on Inkdown

Model Providers - Comprehensive Deep Dive

Overview

Model Providers are the abstraction layer that connects the OpenAI Agents SDK to various Large Language Model (LLM) APIs. Think of Model Providers as "translators" or "adapters" - they translate the SDK's standardized requests into the specific format required by different LLM providers (OpenAI, Anthropic, Google, etc.).

Core Concepts

What is a Model Provider?

A Model Provider is responsible for:

  • Resolving model names to concrete Model instances
  • Managing model connections and resources
  • Providing a consistent interface across different LLM APIs
  • Handling provider-specific features and quirks
Why Model Providers Matter
  1. Provider Agnostic - Switch between LLM providers without changing agent code
  2. Flexibility - Use the best model for each task
  3. Cost Optimization - Use cheaper models for simple tasks
  4. Redundancy - Fall back to alternative providers
  5. Experimentation - Easy A/B testing of different models
Provider Architecture

    Model Interface

    The base Model interface defines what all models must implement:

    Python
    class Model(abc.ABC):
        """The base interface for calling an LLM."""
        
        @abc.abstractmethod
        async def get_response(
            self,
            system_instructions: str | None,
            input: str | list[TResponseInputItem],
            model_settings: ModelSettings,
            tools: list[Tool],
            output_schema: AgentOutputSchemaBase | None,
            handoffs: list[Handoff],
            tracing: ModelTracing,
            *,
            previous_response_id: str | None,
            conversation_id: str | None,
            prompt: ResponsePromptParam | None,
        ) -> ModelResponse:
            """Get a response from the model."""
            pass
        
        @abc.abstractmethod
        def stream_response(
            self,
            system_instructions: str | None,
            input: str | list[TResponseInputItem],
            model_settings: ModelSettings,
            tools: list[Tool],
            output_schema: AgentOutputSchemaBase | None,
            handoffs: list[Handoff],
            tracing: ModelTracing,
            *,
            previous_response_id: str | None,
            conversation_id: str | None,
            prompt: ResponsePromptParam | None,
        ) -> AsyncIterator[TResponseStreamEvent]:
            """Stream a response from the model."""
            pass
    ModelProvider Interface

    The ModelProvider interface defines how models are resolved:

    Python
    class ModelProvider(abc.ABC):
        """The base interface for a model provider."""
        
        @abc.abstractmethod
        def get_model(self, model_name: str | None) -> Model:
            """Get a model by name."""
            pass
        
        async def aclose(self) -> None:
            """Release any resources held by the provider."""
            return None

    OpenAI Provider

    OpenAIProvider

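    The provider for OpenAI models. The Runner's default provider, MultiProvider, hands model names without a prefix to it, so you construct OpenAIProvider yourself mainly to customize options such as the API key or base URL: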

    Python
    from agents import Agent, OpenAIProvider, RunConfig, Runner
    
    provider = OpenAIProvider()
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model="gpt-4o",
    )
    
    # The provider goes through RunConfig; Runner.run has no model_provider argument
    result = await Runner.run(
        agent,
        "Hello",
        run_config=RunConfig(model_provider=provider),
    )
    OpenAI Chat Completions Model

    Uses the OpenAI Chat Completions API:

    Python
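    from openai import AsyncOpenAI
    from agents import Agent, OpenAIChatCompletionsModel, Runner
    
    # Model objects wrap an explicit OpenAI client
    model = OpenAIChatCompletionsModel(model="gpt-4o", openai_client=AsyncOpenAI())
    
    # A Model instance is attached to the agent; Runner.run has no model argument
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model=model,
    )
    
    result = await Runner.run(agent, "Hello")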

    When to use:

    • You want to use the Chat Completions API
    • You need compatibility with older OpenAI integrations
    • You are calling an OpenAI-compatible endpoint from another provider, since many implement only Chat Completions
    OpenAI Responses Model

    Uses the newer OpenAI Responses API:

    Python
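    from openai import AsyncOpenAI
    from agents import Agent, OpenAIResponsesModel, Runner
    
    # Model objects wrap an explicit OpenAI client
    model = OpenAIResponsesModel(model="gpt-4o", openai_client=AsyncOpenAI())
    
    # A Model instance is attached to the agent; Runner.run has no model argument
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model=model,
    )
    
    result = await Runner.run(agent, "Hello")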

    When to use:

    • You want the latest OpenAI features
    • You need hosted tools such as web search, file search, or code interpreter
    • You want server-managed conversation state (previous_response_id or conversation_id)
    OpenAI Responses WebSocket Model

    Uses WebSocket transport for Responses API:

    Python
    from agents import Agent, OpenAIResponsesWSModel, Runner
    
    model = OpenAIResponsesWSModel(model="gpt-4o")
    
    # Attach the model to the agent; Runner.run has no model argument
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model=model,
    )
    
    result = await Runner.run(agent, "Hello")

    When to use:

    • You want real-time streaming
    • You need lower latency
    • You're building real-time applications
    Default Model Selection

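    When neither the agent nor the run configuration names a model, the SDK falls back to a default:

    Python
    from agents.models import get_default_model, get_default_model_settings
    
    # The default is currently gpt-4.1; the OPENAI_DEFAULT_MODEL environment variable overrides it
    print(get_default_model())  # "gpt-4.1" unless OPENAI_DEFAULT_MODEL is set
    
    # ModelSettings has no model field; this returns the settings paired with the default model
    settings = get_default_model_settings()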
    GPT-5 Special Handling

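    GPT-5 models are reasoning models; their reasoning effort is set through the reasoning field of ModelSettings:

    Python
    from openai.types.shared import Reasoning
    from agents import Agent, ModelSettings
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model="gpt-5",
        model_settings=ModelSettings(
            # Effort is wrapped in a Reasoning object, not passed as a flat keyword
            reasoning=Reasoning(effort="high"),
        ),
    )

    When GPT-5 is the default model (set through OPENAI_DEFAULT_MODEL) and you pass no model_settings, the SDK applies GPT-5 defaults, in current releases a low reasoning effort and low verbosity. Settings you pass explicitly take precedence.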

    MultiProvider

    Using Multiple Providers

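    MultiProvider routes each model name to a provider based on its prefix. It is also the Runner's default provider, so prefixed names work without extra setup:

    Python
    from agents import Agent, RunConfig, Runner
    from agents.models.multi_provider import MultiProvider
    
    provider = MultiProvider()  # Same routing the Runner uses by default
    
    openai_agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model="gpt-4o",  # No prefix: resolved by OpenAIProvider
    )
    
    claude_agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        # "litellm/" prefix: resolved through LiteLLM (needs the litellm extra)
        model="litellm/anthropic/claude-3-opus-20240229",
    )
    
    result = await Runner.run(
        claude_agent,
        "Hello",
        run_config=RunConfig(model_provider=provider),
    )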
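    Prefix Routing

    Providers are not tried in order. Each name is split at its first "/": "openai/" or no prefix goes to OpenAIProvider, "litellm/" goes to LiteLLM, and you can register your own prefixes with a MultiProviderMap:

    Python
    from agents.models.multi_provider import MultiProvider, MultiProviderMap
    
    provider_map = MultiProviderMap()
    # CustomProvider is the example class from "Custom Model Providers"
    provider_map.add_provider("custom", CustomProvider(api_key="your-api-key"))
    
    provider = MultiProvider(provider_map=provider_map)
    model = provider.get_model("custom/my-model")  # CustomProvider receives "my-model"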
    Model Name Resolution

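    Each provider expects its own model names, and the prefix decides which provider sees the request:

    Python
    from agents.models.multi_provider import MultiProvider
    
    provider = MultiProvider()
    
    # No prefix: OpenAIProvider resolves "gpt-4o"
    model = provider.get_model("gpt-4o")
    
    # "litellm/" prefix: LiteLLM resolves "gemini/gemini-pro" (needs the litellm extra)
    model = provider.get_model("litellm/gemini/gemini-pro")
    
    # An error from the chosen provider is raised as-is; MultiProvider does not try another one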

    Custom Model Providers

    Creating a Custom Provider

    Implement the ModelProvider interface:

    Python
    from agents import Model, ModelProvider
    
    class CustomProvider(ModelProvider):
        def __init__(self, api_key: str):
            self.api_key = api_key
            # CustomClient stands for your vendor's async client library (hypothetical)
            self.client = CustomClient(api_key)
        
        def get_model(self, model_name: str | None) -> Model:
            """Get a model instance."""
            return CustomModel(
                model_name or "default-model",
                self.client,
            )
        
        async def aclose(self) -> None:
            """Release resources."""
            await self.client.close()
    Creating a Custom Model

    Implement the Model interface:

    Python
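    from collections.abc import AsyncIterator
    
    from openai.types.responses.response_prompt_param import ResponsePromptParam
    
    from agents import (
        AgentOutputSchemaBase,
        Handoff,
        Model,
        ModelResponse,
        ModelSettings,
        ModelTracing,
        Tool,
        TResponseInputItem,
    )
    from agents.items import TResponseStreamEvent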
    
    class CustomModel(Model):
        def __init__(self, model_name: str, client: CustomClient):
            self.model_name = model_name
            self.client = client
        
        async def get_response(
            self,
            system_instructions: str | None,
            input: str | list[TResponseInputItem],
            model_settings: ModelSettings,
            tools: list[Tool],
            output_schema: AgentOutputSchemaBase | None,
            handoffs: list[Handoff],
            tracing: ModelTracing,
            *,
            previous_response_id: str | None,
            conversation_id: str | None,
            prompt: ResponsePromptParam | None,
        ) -> ModelResponse:
            """Get response from custom API."""
            # convert_* helpers are hypothetical: map SDK types to your API's format
            custom_input = self.convert_input(system_instructions, input)
            custom_tools = self.convert_tools(tools)
            
            # Call custom API
            response = await self.client.chat(
                model=self.model_name,
                messages=custom_input,
                tools=custom_tools,
                **self.convert_settings(model_settings),
            )
            
            # Convert back to SDK format
            return self.convert_response(response)
        
        async def stream_response(
            self,
            system_instructions: str | None,
            input: str | list[TResponseInputItem],
            model_settings: ModelSettings,
            tools: list[Tool],
            output_schema: AgentOutputSchemaBase | None,
            handoffs: list[Handoff],
            tracing: ModelTracing,
            *,
            previous_response_id: str | None,
            conversation_id: str | None,
            prompt: ResponsePromptParam | None,
        ) -> AsyncIterator[TResponseStreamEvent]:
            """Stream response from custom API."""
            async for chunk in self.client.chat_stream(...):
                yield self.convert_chunk(chunk)
    Using Custom Provider
    Python
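    from agents import Agent, RunConfig, Runner
    
    provider = CustomProvider(api_key="your-api-key")
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model="custom-model",  # Passed to CustomProvider.get_model
    )
    
    # The provider goes through RunConfig; the model name lives on the agent
    result = await Runner.run(
        agent,
        "Hello",
        run_config=RunConfig(model_provider=provider),
    )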

    Model Settings

    ModelSettings Class

    Configure model-specific parameters:

    Python
    from agents import ModelSettings
    
    settings = ModelSettings(
        temperature=0.7,  # 0.0 - 2.0, higher = more creative
        top_p=0.9,  # 0.0 - 1.0, nucleus sampling
        max_tokens=1000,  # Maximum tokens in response
        presence_penalty=0.0,  # -2.0 - 2.0
        frequency_penalty=0.0,  # -2.0 - 2.0
    )
    Temperature

    Controls randomness:

    Python
    # Low temperature - more deterministic
    settings = ModelSettings(temperature=0.1)
    
    # High temperature - more creative
    settings = ModelSettings(temperature=1.5)

    Use cases:

    • Low (0.0-0.3): Factual responses, code generation
    • Medium (0.4-0.7): General conversation
    • High (0.8-1.5): Creative writing, brainstorming
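    In standard temperature sampling, the model's next-token scores are divided by the temperature and passed through a softmax, so values below 1 sharpen the distribution and values above 1 flatten it (a temperature of 0 is usually handled separately as greedy decoding). A worked sketch with made-up scores; the provider applies this internally, and you only set the number:

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

    At 0.1 nearly all probability lands on the top token; at 1.5 the weaker candidates get a real chance of being sampled.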
    Max Tokens

    Limit response length:

    Python
    # Short responses
    settings = ModelSettings(max_tokens=100)
    
    # Long responses
    settings = ModelSettings(max_tokens=4000)
    Penalties

    Control repetition:

    Python
    # Presence penalty - encourage new topics
    settings = ModelSettings(presence_penalty=0.5)
    
    # Frequency penalty - discourage repetition
    settings = ModelSettings(frequency_penalty=0.5)
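    OpenAI's API reference has described both penalties as one adjustment to each token's score before sampling: subtract the token's count so far times frequency_penalty, and subtract presence_penalty once if the count is nonzero. A sketch of that rule on toy scores, with token strings standing in for token IDs:

```python
from collections import Counter


def apply_penalties(
    logits: dict[str, float],
    generated: list[str],
    presence_penalty: float,
    frequency_penalty: float,
) -> dict[str, float]:
    """Lower the scores of tokens that already appeared in the output."""
    counts = Counter(generated)
    adjusted = {}
    for token, score in logits.items():
        c = counts[token]
        # Frequency penalty grows with each repeat; presence penalty is a flat
        # one-time charge for having appeared at all
        adjusted[token] = score - c * frequency_penalty - (1.0 if c > 0 else 0.0) * presence_penalty
    return adjusted


logits = {"the": 3.0, "cat": 2.0, "sat": 1.5}
generated = ["the", "the", "cat"]
print(apply_penalties(logits, generated, presence_penalty=0.5, frequency_penalty=0.5))
```

    The difference shows in "the": it appeared twice, so the frequency penalty is charged twice while the presence penalty is charged once.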

    Model Tracing

    ModelTracing Enum

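    Controls how much of a model call is recorded in traces. The Runner normally derives this value from RunConfig (tracing_disabled and trace_include_sensitive_data), so you set it directly only when calling a model yourself: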

    Python
    from agents import ModelTracing
    
    # Tracing disabled
    tracing = ModelTracing.DISABLED
    
    # Tracing enabled with data
    tracing = ModelTracing.ENABLED
    
    # Tracing enabled without sensitive data
    tracing = ModelTracing.ENABLED_WITHOUT_DATA
    Tracing in Model Calls
    Python
    await model.get_response(
        ...,
        tracing=ModelTracing.ENABLED,
    )

    Model Retry

    Retry Advice

    Models can provide retry advice:

    Python
    from agents import ModelRetryAdviceRequest, ModelRetryAdvice
    
    class CustomModel(Model):
        def get_retry_advice(
            self,
            request: ModelRetryAdviceRequest,
        ) -> ModelRetryAdvice | None:
            """Provide retry advice for failed requests."""
            if request.error_type == "rate_limit":
                return ModelRetryAdvice(
                    can_retry=True,
                    retry_after=60,  # Retry after 60 seconds
                )
            return None
    Retry Policies

    Configure retry behavior:

    Python
    from agents import retry_policies, RetryPolicy
    
    policy = retry_policies.exponential_backoff(
        max_retries=3,
        initial_delay=1.0,
    )
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model_settings=ModelSettings(
            retry_policy=policy,
        ),
    )
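    Exponential backoff, the schedule the policy above is named after, waits initial_delay, then a multiple of that, and so on. A generic sketch of the delay arithmetic, independent of the SDK's retry_policies module, whose internals may differ; `backoff_delays`, `multiplier`, `max_delay`, and `jitter` belong to this sketch only:

```python
import random


def backoff_delays(
    max_retries: int,
    initial_delay: float,
    multiplier: float = 2.0,
    max_delay: float = 30.0,
    jitter: bool = False,
) -> list[float]:
    """Delay before each retry: initial_delay * multiplier**attempt, capped at max_delay."""
    delays = []
    for attempt in range(max_retries):
        delay = min(initial_delay * multiplier**attempt, max_delay)
        if jitter:
            # "Full jitter": pick uniformly in [0, delay] so clients don't retry in lockstep
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(max_retries=3, initial_delay=1.0))
```

    The cap keeps long retry chains from waiting minutes between attempts, and jitter spreads out retries from many clients hitting the same rate limit.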

    Provider-Specific Features

    OpenAI Features

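    Server-Managed Conversations (use either a conversation ID or a previous response ID, not both):

    Python
    # Continue a server-side conversation object...
    result = await Runner.run(agent, input, conversation_id="conv-123")
    
    # ...or chain onto the last response of an earlier run
    result = await Runner.run(agent, input, previous_response_id=result.last_response_id)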

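    Prompt Caching:

    OpenAI caches repeated prompt prefixes automatically on supported models, so there is no ModelSettings switch to turn it on. Keeping instructions and tool definitions identical across calls lets later requests reuse the cached prefix.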

    Reasoning Models:

    Python
    from openai.types.shared import Reasoning
    
    settings = ModelSettings(
        reasoning=Reasoning(effort="high"),  # For reasoning models such as GPT-5
    )
    Anthropic Features (via LiteLLM)

    Using Anthropic through LiteLLM:

    Python
    # Requires the extra: pip install "openai-agents[litellm]"
    from agents import Agent
    from agents.extensions.models.litellm_model import LitellmModel
    
    model = LitellmModel(model="anthropic/claude-3-opus-20240229")
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model=model,
    )
    Google Features (via LiteLLM)

    Using Google through LiteLLM:

    Python
    model = LitellmModel(model="gemini/gemini-pro")
    
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model=model,
    )

    Model Selection Strategies

    Per-Agent Model Selection

    Different agents can use different models:

    Python
    # Fast model for simple tasks
    quick_agent = Agent(
        name="quick",
        instructions="Quick responses",
        model="gpt-4o-mini",
    )
    
    # Smart model for complex tasks
    smart_agent = Agent(
        name="smart",
        instructions="Complex reasoning",
        model="gpt-4o",
    )
    Run-Level Model Override

    Override model for a specific run:

    Python
    config = RunConfig(
        model="gpt-4o",  # Override all agents
    )
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )
    Dynamic Model Selection

    Select model based on context:

    Python
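    from agents import RunContextWrapper
    
    def select_model(context: RunContextWrapper) -> str:
        """Select model based on context."""
        # Assumes your context object defines a `complexity` attribute
        if context.context.complexity == "high":
            return "gpt-4o"
        return "gpt-4o-mini"
    
    # ModelSettings cannot change the model; apply the choice before the run
    # with Agent(model=...) or RunConfig(model=...)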
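    One way to apply the selector is to copy the agent with a different model. The SDK's `Agent.clone(**kwargs)` returns a copy with the given fields replaced; the sketch below imitates that with `dataclasses.replace` on a hypothetical `AgentSpec` stand-in (and a hypothetical `TaskContext`), so it runs without the SDK installed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentSpec:
    """Stand-in for an agent definition: only the fields this sketch needs."""
    name: str
    instructions: str
    model: str


@dataclass
class TaskContext:
    """Hypothetical app context carrying a complexity label."""
    complexity: str


def select_model(ctx: TaskContext) -> str:
    return "gpt-4o" if ctx.complexity == "high" else "gpt-4o-mini"


base = AgentSpec(name="assistant", instructions="Help the user", model="gpt-4o-mini")

# Copy the definition with only the model swapped, leaving the original untouched
tuned = replace(base, model=select_model(TaskContext(complexity="high")))
print(base.model, tuned.model)
```

    Copying rather than mutating means concurrent runs that share the base agent never see each other's model choice.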

    Model Configuration

    Global Default Configuration

    Set global defaults:

    Python
    from agents import set_default_openai_key, set_default_openai_api
    
    set_default_openai_key("your-api-key")
    set_default_openai_api("responses")  # "responses" (the default) or "chat_completions"
    Agent-Level Configuration

    Configure model on agent:

    Python
    agent = Agent(
        name="assistant",
        instructions="You are a helpful assistant",
        model="gpt-4o",
        model_settings=ModelSettings(temperature=0.7),
    )
    Run-Level Configuration

    Configure model for a run:

    Python
    config = RunConfig(
        model="gpt-4o",
        model_settings=ModelSettings(temperature=0.7),
    )
    
    result = await Runner.run(
        agent,
        input,
        run_config=config,
    )
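    Configuration Priority

    Model priority (highest to lowest):

    1. RunConfig.model
    2. Agent.model
    3. Global default (gpt-4.1 unless OPENAI_DEFAULT_MODEL is set)

    Model settings merge instead of replacing each other: any field set in RunConfig.model_settings overrides the same field in the agent's model_settings, and fields left unset keep the agent's value.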

    Model Best Practices

    1. Choose Appropriate Models

    Select the right model for the task:

    Python
    # Good - appropriate model for task
    quick_response = Agent(
        name="quick",
        instructions="Simple responses",
        model="gpt-4o-mini",  # Fast, cheap
    )
    
    complex_reasoning = Agent(
        name="complex",
        instructions="Complex reasoning",
        model="gpt-4o",  # More capable
    )
    
    # Avoid - overkill for simple tasks
    simple_task = Agent(
        name="simple",
        instructions="Simple task",
        model="gpt-4o",  # Unnecessary expense
    )
    2. Configure Temperature Appropriately

    Set temperature based on task:

    Python
    # Good - factual, low temperature
    code_agent = Agent(
        name="coder",
        instructions="Write code",
        model_settings=ModelSettings(temperature=0.1),
    )
    
    # Good - creative, high temperature
    writer = Agent(
        name="writer",
        instructions="Creative writing",
        model_settings=ModelSettings(temperature=0.9),
    )
    
    # Avoid - wrong temperature for task
    code_agent = Agent(
        name="coder",
        instructions="Write code",
        model_settings=ModelSettings(temperature=1.5),  # Too creative
    )
    3. Set Reasonable Token Limits

    Control response length:

    Python
    # Good - appropriate limits
    summary = Agent(
        name="summarizer",
        instructions="Summarize briefly",
        model_settings=ModelSettings(max_tokens=200),
    )
    
    # Avoid - excessive limits
    summary = Agent(
        name="summarizer",
        instructions="Summarize briefly",
        model_settings=ModelSettings(max_tokens=4000),  # Too long
    )
    4. Handle Model Failures

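    Implement fallback logic yourself. MultiProvider sends each model name to exactly one provider and does not retry elsewhere, so catch the error and rerun on another model:

    Python
    import openai
    from agents import Agent, Runner
    
    primary = Agent(name="assistant", instructions="You are a helpful assistant", model="gpt-4o")
    # Same agent on another provider (needs the litellm extra)
    backup = primary.clone(model="litellm/anthropic/claude-3-opus-20240229")
    
    try:
        result = await Runner.run(primary, input)
    except openai.APIError:
        # The primary provider failed; rerun the same input on the backup
        result = await Runner.run(backup, input)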
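    Fallback across more than two providers is easier with a small helper that walks an ordered list. A sketch, assuming each attempt is a zero-argument coroutine function you build yourself (for example, a lambda that calls `Runner.run` with a given agent); `first_successful`, `flaky`, and `healthy` are hypothetical names:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def first_successful(
    attempts: list[tuple[str, Callable[[], Awaitable[str]]]],
) -> tuple[str, str]:
    """Run attempts in order; return (label, result) from the first that succeeds."""
    errors: list[str] = []
    for label, attempt in attempts:
        try:
            return label, await attempt()
        except Exception as exc:  # in real code, catch your providers' error types
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))


async def flaky() -> str:
    raise ConnectionError("provider unavailable")


async def healthy() -> str:
    return "hello from backup"


print(asyncio.run(first_successful([("primary", flaky), ("backup", healthy)])))
```

    Returning the label alongside the result lets you log which provider actually answered, which matters when comparing quality or cost later.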
    5. Monitor Model Usage

    Track model usage and costs:

    Python
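    result = await Runner.run(agent, input)
    
    # Usage is aggregated on the run context, not on the result itself
    usage = result.context_wrapper.usage
    print(usage.requests, usage.input_tokens, usage.output_tokens, usage.total_tokens)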
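    Token counts become a cost estimate once you multiply by your provider's per-token prices. A sketch with placeholder prices; look up current rates for your models, since the numbers here are illustrative only and `estimate_cost` is a hypothetical helper:

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Dollar cost of one run from token counts and per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices, not any provider's real rates
print(estimate_cost(1_000_000, 500_000, input_price_per_million=1.0, output_price_per_million=4.0))
```

    Input and output are priced separately because most providers charge more for generated tokens, so a verbose model can cost more than its input price suggests.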

    Common Patterns

    1. Tiered Model Usage

    Use different models for different complexity:

    Python
    def get_model_for_task(complexity: str) -> str:
        if complexity == "high":
            return "gpt-4o"
        elif complexity == "medium":
            return "gpt-4o-mini"
        return "gpt-3.5-turbo"
    
    agent = Agent(
        name="adaptive",
        instructions="Adaptive responses",
        model=get_model_for_task("medium"),
    )
    2. Model A/B Testing

    Test different models:

    Python
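    import random
    
    # Both names resolve through the Runner's default MultiProvider;
    # the "litellm/" prefix needs the litellm extra installed
    model = random.choice(["gpt-4o", "litellm/anthropic/claude-3-opus-20240229"])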
    
    agent = Agent(
        name="test",
        instructions="Test agent",
        model=model,
    )
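    A fresh random choice on every call can bounce the same user between variants, which makes results hard to compare. Hashing a stable ID instead gives each user a fixed arm. A sketch; `assign_model` and the salt string are hypothetical:

```python
import hashlib


def assign_model(user_id: str, models: list[str], salt: str = "model-ab-test") -> str:
    """Deterministically map a user to one model so repeat visits stay consistent."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return models[int(digest, 16) % len(models)]


variants = ["gpt-4o", "litellm/anthropic/claude-3-opus-20240229"]
print(assign_model("user-42", variants))
```

    Changing the salt reshuffles everyone into new buckets, which is a simple way to start a fresh experiment without reusing old assignments.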
    3. Cost Optimization

    Use cheaper models when possible:

    Python
    # Use cheap model for classification
    classifier = Agent(
        name="classifier",
        instructions="Classify the input",
        model="gpt-4o-mini",
    )
    
    # Use expensive model only for generation
    generator = Agent(
        name="generator",
        instructions="Generate content",
        model="gpt-4o",
    )
    4. Model-Specific Prompts

    Adjust prompts per model:

    Python
    def get_instructions_for_model(model: str) -> str:
        if "gpt-4" in model:
            return "You are GPT-4, be thorough."
        elif "claude" in model:
            return "You are Claude, be helpful."
        return "You are a helpful assistant."
    
    agent = Agent(
        name="adaptive",
        instructions=get_instructions_for_model("gpt-4o"),
    )
    5. Provider Redundancy

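    Ensure availability with more than one provider. MultiProvider maps each model name to a single provider and does not fail over on its own, so keep an equivalent agent on a second provider ready:

    Python
    primary = Agent(name="assistant", instructions="You are a helpful assistant", model="gpt-4o")
    
    # Same instructions and tools, different provider; switch to it when the primary is down
    backup = primary.clone(model="litellm/anthropic/claude-3-opus-20240229")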

    Model and Tracing

    Model-Level Tracing

    Models can emit traces:

    Python
    from agents import ModelTracing
    
    await model.get_response(
        ...,
        tracing=ModelTracing.ENABLED,
    )
    Sensitive Data Handling

    Control what's traced:

    Python
    # Include all data
    tracing = ModelTracing.ENABLED
    
    # Exclude sensitive data
    tracing = ModelTracing.ENABLED_WITHOUT_DATA
    
    # Disable tracing
    tracing = ModelTracing.DISABLED
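    In a normal run you do not pick a ModelTracing value by hand: RunConfig's tracing_disabled and trace_include_sensitive_data flags decide it. A sketch of that mapping with a stand-in enum; the member names match ModelTracing, but `TracingMode` and `tracing_mode` are illustrations rather than SDK code:

```python
from enum import Enum


class TracingMode(Enum):
    """Stand-in mirroring the three ModelTracing members."""
    DISABLED = 0
    ENABLED = 1
    ENABLED_WITHOUT_DATA = 2


def tracing_mode(tracing_disabled: bool, include_sensitive_data: bool) -> TracingMode:
    """Map the two run-level switches onto one tracing mode."""
    if tracing_disabled:
        return TracingMode.DISABLED  # the disable switch wins over everything else
    if include_sensitive_data:
        return TracingMode.ENABLED
    return TracingMode.ENABLED_WITHOUT_DATA


print(tracing_mode(tracing_disabled=False, include_sensitive_data=False))
```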

    Summary

    Model Providers enable flexible LLM integration. Key takeaways:

    1. Model Providers abstract LLM API differences
    2. Model interface defines the contract for all models
    3. ModelProvider interface defines model resolution
    4. OpenAIProvider resolves OpenAI model names (the default MultiProvider delegates unprefixed names to it)
    5. OpenAIChatCompletionsModel uses the Chat Completions API
    6. OpenAIResponsesModel uses the newer Responses API
    7. OpenAIResponsesWSModel uses WebSocket transport
    8. MultiProvider routes model names to providers by prefix
    9. Custom providers can integrate any LLM API
    10. ModelSettings configures model parameters
    11. Temperature controls randomness
    12. Max tokens limits response length
    13. Penalties control repetition
    14. ModelTracing controls observability
    15. Retry advice provides error recovery hints
    16. Provider-specific features like server-managed conversations
    17. LiteLLM enables using 100+ LLM providers
    18. Per-agent model selection for different tasks
    19. Run-level overrides for specific runs
    20. Configuration priority determines which settings apply

    Model Providers are essential for building flexible, cost-effective, and resilient agent systems.