06-MultiProviderLLM

Multi-Provider LLM Architecture

Overview

The LLM provider system routes requests to multiple AI models through a unified abstraction layer called Rune. It handles model selection, token cost calculation, caching, and provider-specific transformations.

Architecture

Plain text

Request
    ↓
┌─────────────────────────────────────────────────────────────┐
│                      Rune API Layer                          │
│  ┌─────────────────────────────────────────────────────────┐│
│  │ Provider: OpenAI, Anthropic, Google, Mistral, etc.    ││
│  │ Transform: Convert to provider-specific format          ││
│  │ Route: Based on model parameter                        ││
│  └─────────────────────────────────────────────────────────┘│
│                           ↓                                 │
│  ┌─────────────────────────────────────────────────────────┐│
│  │                Token Cost Calculation                  ││
│  │ Base cost × Multipliers:                               ││
│  │ • Web search: ×2                                       ││
│  │ • Data analysis: +15                                   ││
│  │ • RAG: +250 (if Project + ProFinder)                  ││
│  │ • Large context: ×5 (disabled)                        ││
│  └─────────────────────────────────────────────────────────┘│
│                           ↓                                 │
│  ┌─────────────────────────────────────────────────────────┐│
│  │                  Caching Layer                         ││
│  │ Cache last message (prompt caching)                    ││
│  │ Track cached tokens for billing                        ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
    ↓
LLM Response Stream

Request ↓ ┌─────────────────────────────────────────────────────────────┐ │ Rune API Layer │ │ ┌─────────────────────────────────────────────────────────┐│ │ │ Provider: OpenAI, Anthropic, Google, Mistral, etc. ││ │ │ Transform: Convert to provider-specific format ││ │ │ Route: Based on model parameter ││ │ └─────────────────────────────────────────────────────────┘│ │ ↓ │ │ ┌─────────────────────────────────────────────────────────┐│ │ │ Token Cost Calculation ││ │ │ Base cost × Multipliers: ││ │ │ • Web search: ×2 ││ │ │ • Data analysis: +15 ││ │ │ • RAG: +250 (if Project + ProFinder) ││ │ │ • Large context: ×5 (disabled) ││ │ └─────────────────────────────────────────────────────────┘│ │ ↓ │ │ ┌─────────────────────────────────────────────────────────┐│ │ │ Caching Layer ││ │ │ Cache last message (prompt caching) ││ │ │ Track cached tokens for billing ││ │ └─────────────────────────────────────────────────────────┘│ └─────────────────────────────────────────────────────────────┘ ↓ LLM Response Stream

Operation	Base Cost	Multiplier	Final Cost
Standard chat	100	1×	100
With web search	100	2×	200
With data analysis	100	1× + 15	115
RAG (Project + ProFinder)	100	1× + 250	350
Multiple searches (3×)	100	(3+1)×	400

Multi-Provider LLM Architecture

Overview

Architecture