Bonkers — End-to-End System Guide (Knowledge Base)
Confidential Internal Document — Preserves complete architectural context of the Bonkers image generation platform and its backend (Merlin Arcane / Cauldron monorepo).
[User Browser/Extension]
│
▼
┌───────────────────────────────────────────────────────┐
│ Vercel (Edge + Serverless) │
│ ┌─────────────────────────────────────────────┐ │
│ │ Next.js 14 App (bonkers/) │ │
│ │ ├─ App Router (i18n, auth, SSR) │ │
│ │ ├─ React Query (client cache) │ │
│ │ ├─ Zustand (UI state) │ │
│ │ ├─ next-auth v5 (JWT + Firebase) │ │
│ │ └─ next-intl (i18n) │ │
│ └─────────────────────────────────────────────┘ │
└──────────┬────────────────────────────────────────────┘
│ HTTPS + SSE
▼
┌───────────────────────────────────────────────────────┐
│ Google Cloud Run (Arcane Backend) │
│ ┌─────────────────────────────────────────────┐ │
│ │ Express + express-zod-api (merlin-arcane)│ │
│ │ ├─ Auth middleware (Firebase ID Token) │ │
│ │ ├─ Streaming SSE endpoints │ │
│ │ ├─ Tool Orchestrator │ │
│ │ ├─ LLM Provider abstraction (Rune) │ │
│ │ └─ Image Gen providers (FAL, Replicate) │ │
│ └─────────────────────────────────────────────┘ │
└──────────┬────────────────────────────────────────────┘
│
┌─────┼──────────┬──────────────┬──────────────────┐
▼ ▼ ▼ ▼ ▼
┌────────┐┌────────┐┌──────────┐┌───────────────┐┌────────────┐
│Firebase││ Redis ││LlamaIndex││ 3rd-party AI ││SendGrid │
│Firestore││(ioredis)││(RAG API) ││ OpenAI,Anthro-││Mailchimp │
│+ Auth ││ ││ ││ pic,GoogleAI ││ │
└────────┘└────────┘└──────────┘└───────────────┘└────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ Image Generation Providers │
│ ├─ fal.ai (Flux, Ideogram, Recraft, Bria, etc.) │
│ ├─ Replicate (Flux, Ideogram, HiDream) │
│ ├─ OpenAI (GPT Image 1, DALL-E) │
│ ├─ Google Vertex AI (Imagen, Gemini) │
│ ├─ Ideogram API │
│ └─ Midjourney (via GoAPI) │
└───────────────────────────────────────────────────────┘

Bonkers (bonkers/ — codename "cauldron"):
bonkers/ # pnpm monorepo
├── apps/
│ ├── website/ # Next.js 14 (App Router) — main app
│ ├── extension/ # Chrome extension (git submodule)
│ └── session-manager/ # Express server for cookie chunking
├── packages/
│ ├── types/ # Shared TS types
│ ├── utils/ # Shared utilities
│ ├── hooks/ # Shared React hooks
│ ├── components/ # Shared UI (CircularTimer, etc.)
│ ├── assets/ # Static assets
│ └── config/ # ESLint, Prettier, TypeScript configs
├── app-config/ # Feature flags, prompts (git submodule)
├── patches/ # Patched deps (react-arborist)
└── deploy.sh # Vercel deploy gate

Merlin Arcane (merlin-arcane/):
merlin-arcane/ # TypeScript (Express 5)
├── src/
│ ├── index.ts # Server entry point (express-zod-api)
│ ├── config/
│ │ ├── config.ts # Server config (CORS, routes, ports)
│ │ ├── routing.ts # All API route definitions (557 lines)
│ │ └── logger.ts # Pino logger w/ cloud severity mapping
│ └── server/
│ ├── factories/ # Endpoint factory (authStreamEndpointsFactory)
│ ├── middlewares/
│ │ ├── auth/ # Firebase ID Token verification
│ │ ├── initContext/ # Request context initialization
│ │ ├── threadPreware/ # Pre-chat processing (loads Thread, User)
│ │ ├── threadPostware/ # Post-chat processing (runs ToolOrchestrator)
│ │ ├── usageLimits/ # Quota enforcement
│ │ └── usageAnalytics/ # Usage tracking
│ ├── endpoints/
│ │ ├── unified/ # Core chat ML pipeline
│ │ ├── wallflower/ # Bonkers image generation
│ │ ├── projects/ # Collaboration system
│ │ ├── tools/ # Text/image tools
│ │ ├── user/ # User data endpoints
│ │ ├── chatbots/ # Chatbot marketplace
│ │ └── v2/ # V2 API endpoints
│ ├── models/
│ │ ├── thread.ts # Thread model (Firestore-backed, 1251 lines)
│ │ ├── message.ts # Message model
│ │ └── user.ts # User model (Firebase claims)
│ ├── repositories/
│ │ ├── engine/ # Context window management + trimming
│ │ ├── provider/ # LLM provider abstraction (Rune)
│ │ ├── streamer/ # SSE streaming engine
│ │ ├── irc/ # Inter-request communication (Redis)
│ │ └── sideActions/ # Concurrent side-action runner
│ ├── services/
│ │ ├── firebase.ts # Firebase Admin SDK (merlindb)
│ │ ├── redis.ts # ioredis (pub/sub, cache)
│ │ ├── llamaindex.ts # RAG vector store client
│ │ └── axios.ts # Shared axios instance
│ └── utilities/
│ ├── usage.ts # Usage increment logic (818 lines)
│ └── llm.ts # LLM model resolution

The website uses Next.js 14 App Router with:

- next-intl with [lang] prefix (27 languages)
- next-auth v5 (beta) with Credentials provider
- @tanstack/react-query v5 with async persistence (24h gc)
- Zustand v5 (auth UI, SSE, attachments)
- class strategy)

| Route | Purpose |
|---|---|
| /{lang}/bonkers | Bonkers image generation — canvas, model selection, generation |
| /{lang}/old-bonkers | Legacy Bonkers |
| /{lang}/chat/[[...chatId]] | Main chat interface |
| /{lang}/creations/[iid] | Shared creations (public gallery) |
| /{lang}/pricing | Subscription plans |
| /{lang}/old-profile | User profile, settings, subscription |
| /{lang}/old-vault | Knowledge vault (file management) |
| /{lang}/updates | Changelog |
| /{lang}/user/[id] | Public user profile |
| /{lang}/templates/[id] | Templates |
| /{lang}/ai-tools | AI tools directory |
| /{lang}/ai-detection | AI content detection |
| /{lang}/plagiarism-checker | Plagiarism checker |
| /{lang}/ai-humanizer | AI text humanizer |
| /{lang}/new-old-chat/history | Chat history |
| /{lang}/new-old-chat/projects | Projects workspace |
| /{lang}/new-old-chat/share/[chatId] | Shared chat view |
Multi-layer authentication:
- __Secure-merlin-session_0..3)
- merlin-auth) — cross-tab sync
- chrome.runtime.sendMessage

Zustand session store (userSessionStore.ts):

- isFree, isPaid, isOwner, isPro, isBonkersBasic, isBonkersPro, isAppSumo, etc.

React Query setup:

- QueryClient with 24-hour garbage collection
- v2.99) forces cache reset on deploy
- waitForAuthInit)

Axios instances: Two typed instances with interceptors:

- ArcaneAxiosInstance → .../arcane/api (attaches Firebase token)
- UAMAxiosInstance → uam.getmerlin.in
- x-merlin-version header

The frontend uses @microsoft/fetch-event-source for SSE connections. The sseBaseSecureStore (Zustand) handles:

- message (text/reasoning/progress), attachments, references, usage, metadata

Express 5 + express-zod-api:

- authEndpointsFactory, authStreamEndpointsFactory, usageLimitsStreamEndpointsFactory

The end-to-end flow for a chat message:
HTTP POST /v1/thread/unified
│
├─ 1. authMiddleware
│ └─ Verify Firebase ID Token from Authorization header
│ └─ Load User from Firestore (plan, features, usage)
│ └─ Set requestContext.user
│
├─ 2. usageLimitsMiddleware
│ └─ Check daily/monthly usage against plan limits
│ └─ Throw if over limit
│
├─ 3. threadPreware (Middleware)
│ ├─ Load Thread from Firestore (or create new)
│ ├─ Load user settings V3
│ ├─ Load personalization + profile memories
│ ├─ Process attachments (LlamaIndex embeddings)
│ ├─ Check project permissions (if projectId)
│ ├─ Initialize SSE stream (writeHead 200, text/event-stream)
│ ├─ Create UserMessage + AssistantMessage (pending)
│ ├─ Fetch active thread with embeddings (RAG)
│ ├─ Initialize Schema (model, context window, max tokens)
│ └─ Fire side actions (content moderation, chatbot loading)
│
├─ 4. providerConfigOverrideMiddleware
│ └─ Override LLM provider config if needed
│
├─ 5. unifiedController (Middleware)
│ ├─ Wait for content moderation result
│ ├─ ChatStateManager — conflict resolution for mode combinations
│ ├─ Merlin Magic → model selection engine
│ ├─ Deep Research → handleDeepResearch (for Mobile)
│ ├─ MCP Plugin → getMCPResults
│ └─ Attach PROGRESS events to assistantMessage
│
├─ 6. threadPostware (Middleware) — MAIN EXECUTION
│ ├─ If Deep Research → spawn deepResearchAgent
│ ├─ Else → create ToolRegistry → ToolOrchestrator.run()
│ ├─ Orchestrator loop:
│ │ ├─ Build messages (engine trimming)
│ │ ├─ Call LLM provider (via Rune)
│ │ ├─ Stream response via SSE
│ │ ├─ Process tool calls → execute tools
│ │ ├─ Repeat until done or max iterations
│ │ └─ Engine trims context window after each iteration
│ ├─ Attach references + attachments to response
│ ├─ Set chat title from first user message
│ ├─ Update shortcut preferences
│ └─ Increment user usage
│
├─ 7. usageAnalyticsMiddleware
│ └─ Log usage to BigQuery
│
└─ Response: Empty 200 (all data streamed via SSE)

The entire request shares a requestContext, an AsyncLocalStorage-based store that carries:

- user (plan, uid, email, usage)
- chatNode (Thread model instance)
- userMessageNode, assistantMessageNode
- schema (Schema model — controls prompt/messages/model)
- chatStateManager (mode resolution)
- eventManager (SSE progress events)
- response (raw Express res for SSE streaming)
- executionContext (USER vs TASK)
- decisionLog (tool orchestration debug log)
- logFields (structured logging context)

Authentication flow:

- firebase.auth().currentUser.getIdToken())
- Authorization: Bearer <token>
- authMiddleware verifies via firebase.auth().verifyIdToken(token)
- User model instance with computed properties

models/user.ts)

- customers/{uid} Firestore document
- userPlan (free, pro, teams, bonkers_basic, bonkers_pro, apprentice_sumo, etc.)
- userType (owner, member, etc.)

The user.features map contains per-feature objects:
{
"chat": { "usage": 42, "resetsAt": 1717000000000, "limit": 500 },
"webSearch": { "usage": 10, "resetsAt": ..., "limit": 50 },
"imageGeneration": { "usage": 5, "resetsAt": ..., "limit": 100 },
"webPdfChat": { "usage": 3, "resetsAt": ..., "limit": 50 },
"deepResearch": { "usage": 1, "resetsAt": ..., "limit": 20 },
...
}

repositories/engine/)

The engine manages the LLM context window — the critical part that decides what fits in the prompt:

Context trimming strategies:

- FULL — include everything (when context window is large enough)
- TOOL_PROVIDED_SUMMARY_IF_POSSIBLE — use tool-provided summaries for tool results
- SUMMARY — LLM-generated summary of history

Three sections of the context window:

- HISTORY — past conversation turns
- IN_LOOP — current tool call iteration messages
- CURRENT_MESSAGE — the latest assistant response + tool results

Layout optimization:

- chooseOptimalLayout picks cheapest valid layout within context limit
- PREFERRED_TRIMMING_LAYOUTS until one fits

repositories/provider/)

- rune.siddhartha-5c5.workers.dev

Payload structure sent to Rune:
{
"config": { "messages": [...] },
"mode": "CHAT",
"params": { "model": "claude-3.5-sonnet", "tools": [...] },
"metaData": { "systemPrompt": "...", "apiKey": "..." }
}

repositories/streamer/)

Two streaming versions:

- streamAsToolResult flag

SSE Event types:

| Event | Purpose |
|---|---|
| message | Text content, reasoning content, tool results |
| progress | Progress step lifecycle (init → in_progress → done) |
| attachments | Generated images, web search links, citations |
| references | Citation references with document index mapping |
| usage | Token usage after completion |
| metadata | Model info, MCP context |
| init_message_content | Signals start of content streaming |
| features | Identifies active features (e.g., "AGENTIC_RESEARCH") |
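To make the event table concrete, here is a minimal sketch of how a client could turn raw `text/event-stream` text into typed events. The event names come from the table above. The `SseEvent` shape and the `parseSseFrames` helper are hypothetical, and payloads are kept as `unknown` because their real schemas are not documented in this guide.

```typescript
// Event names copied from the table above. Payload schemas are not
// documented here, so `data` stays `unknown`.
const SSE_EVENT_NAMES = [
  "message", "progress", "attachments", "references",
  "usage", "metadata", "init_message_content", "features",
] as const;

type SseEventName = (typeof SSE_EVENT_NAMES)[number];

interface SseEvent {
  event: SseEventName;
  data: unknown;
}

function isSseEventName(name: string): name is SseEventName {
  return (SSE_EVENT_NAMES as readonly string[]).includes(name);
}

// Hypothetical helper: frames are separated by a blank line and carry
// `event:` and `data:` fields. Field values are trimmed, which is a
// simplification of the SSE parsing rules.
function parseSseFrames(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let name = "message"; // SSE default when no `event:` field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) name = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    // Skip empty frames and event names that are not in the table.
    if (dataLines.length === 0 || !isSseEventName(name)) continue;
    const text = dataLines.join("\n");
    let data: unknown = text;
    try {
      data = JSON.parse(text);
    } catch {
      // Not JSON: keep the raw text.
    }
    events.push({ event: name, data });
  }
  return events;
}
```

In the real frontend this parsing is handled by @microsoft/fetch-event-source; the sketch only illustrates the shape of the stream.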
Content Indexing (V2):
Each tool result in the response has a unique contentIndex. The streamer maps text, reasoning, and progress events to the correct index so the frontend can render them in the right position within the message tree.
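The index-based routing described above can be sketched as a small reducer. Only `contentIndex` is named in this guide; the `kind` and `value` fields, the `ContentSlot` shape, and both helper functions are assumptions made for illustration.

```typescript
type ChunkKind = "text" | "reasoning" | "progress";

interface IndexedChunk {
  contentIndex: number; // named in the guide
  kind: ChunkKind;      // hypothetical field
  value: string;        // hypothetical field
}

interface ContentSlot {
  text: string;
  reasoning: string;
  progress: string[];
}

// Route one streamed chunk to the slot that matches its contentIndex,
// creating the slot on first use.
function applyChunk(slots: Map<number, ContentSlot>, chunk: IndexedChunk): void {
  let slot = slots.get(chunk.contentIndex);
  if (!slot) {
    slot = { text: "", reasoning: "", progress: [] };
    slots.set(chunk.contentIndex, slot);
  }
  const kind = chunk.kind;
  if (kind === "progress") slot.progress.push(chunk.value);
  else slot[kind] += chunk.value;
}

// Return slots in render order, regardless of arrival order.
function orderedSlots(slots: Map<number, ContentSlot>): ContentSlot[] {
  return Array.from(slots.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}
```

Because chunks carry their own index, they can arrive interleaved and still be rendered in the right position.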
The ChatStateManager is the central decision-maker for a chat request. It:
- ToolRegistry tools to enable

Mode compatibility rules:

- IMAGE mode cannot combine with RAG, DEEP_RESEARCH, or MCP
- MERLIN_MAGIC auto-selects best model
- DEEP_RESEARCH takes priority and runs agent sub-pipeline
- MCP injects plugin results before LLM call
- LARGE_CONTEXT gives full context window (no trimming)

repositories/engine/schema.ts)

The Schema is a prompt builder that:

Frontend (bonkers/website):

- /bonkers route with full image generation canvas

Backend (Wallflower System — /v1/wallflower/):
| Model ID | Provider | Type |
|---|---|---|
| black-forest-labs/flux-schnell | Replicate / FAL | Text-to-Image |
| black-forest-labs/flux-1.1-pro | Replicate / FAL | Text-to-Image |
| black-forest-labs/flux-1.1-pro-ultra | Replicate / FAL | Text-to-Image |
| black-forest-labs/flux-pro | Replicate / FAL | Text-to-Image |
| recraft-ai/recraft-v3 | Replicate / FAL | Text-to-Image |
| ideogram-ai/ideogram-v2-turbo | Replicate | Text-to-Image |
| ideogram-ai/ideogram-v3-turbo | Replicate / FAL | Text-to-Image |
| prunaai/hidream-l1-fast | Replicate / FAL | Text-to-Image |
| fal-ai/flux-pro/v1/fill | FAL | Inpainting |
| fal-ai/bria/eraser | FAL | Erasing |
| fal-ai/clarity-upscaler | FAL | Upscaling |
| fal-ai/bria/background/replace | FAL | Background Edit |
| fal-ai/ghiblify | FAL | Style Transfer |
| 851-labs/background-remover | Replicate | Background Removal |
| google/imagen-4 | Replicate / FAL | Text-to-Image |
| gemini-2.0-flash-exp | Google Vertex AI | Text-to-Image |
| gpt-image-1-high/medium/low | OpenAI | Text-to-Image |
| midjourney-v6.1-relax | GoAPI | Text-to-Image |
| fal-ai/bytedance/seedream/v3 | FAL | Text-to-Image |
Bonkers presents simplified model names to users, mapped to concrete providers:
| Abstract Name | Maps To | Type |
|---|---|---|
| bonkers-lite | prunaai/hidream-l1-fast | Fast generation |
| bonkers-advance | fal-ai/ideogram/v3 | High quality |
| bonkers-magic-fill | fal-ai/ideogram/v3/edit | Inpainting |
| bonkers-remix | gpt-image-1-medium | Image remixing |
| bonkers-upscale | fal-ai/clarity-upscaler | Upscaling |
| bonkers-magic-erase | fal-ai/bria/eraser | Object removal |
| bonkers-bg-edit | fal-ai/bria/background/replace | Background change |
| bonkers-bg-erase | 851-labs/background-remover | Background removal |
| bonkers-omni-edit | fal-ai/flux-pro/kontext/multi | Multi-image editing |
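As a rough illustration of the mapping in the table above, the lookup could be expressed as a constant map plus a resolver. The map contents are copied from the table; the names `BONKERS_MODEL_MAP` and `resolveModelId` are hypothetical and may not match the actual backend code.

```typescript
// Abstract name -> concrete model ID, copied from the table above.
const BONKERS_MODEL_MAP = {
  "bonkers-lite": "prunaai/hidream-l1-fast",
  "bonkers-advance": "fal-ai/ideogram/v3",
  "bonkers-magic-fill": "fal-ai/ideogram/v3/edit",
  "bonkers-remix": "gpt-image-1-medium",
  "bonkers-upscale": "fal-ai/clarity-upscaler",
  "bonkers-magic-erase": "fal-ai/bria/eraser",
  "bonkers-bg-edit": "fal-ai/bria/background/replace",
  "bonkers-bg-erase": "851-labs/background-remover",
  "bonkers-omni-edit": "fal-ai/flux-pro/kontext/multi",
} as const;

type BonkersModelName = keyof typeof BONKERS_MODEL_MAP;

function isBonkersModelName(name: string): name is BonkersModelName {
  // hasOwnProperty avoids false positives such as "toString".
  return Object.prototype.hasOwnProperty.call(BONKERS_MODEL_MAP, name);
}

// Hypothetical resolver: abstract names are translated, anything else is
// assumed to already be a concrete model ID and is passed through.
function resolveModelId(name: string): string {
  return isBonkersModelName(name) ? BONKERS_MODEL_MAP[name] : name;
}
```

Keeping the mapping in one place would let the provider behind a user-facing name change without touching the frontend.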
Client POST /v1/wallflower/image-generation
│
├─ authMiddleware (Firebase token)
├─ usageLimitsStreamEndpointsFactory
│
├─ wallflowerImageGenerationController:
│ ├─ Validate input (prompt, modelConfig, style, isPublic)
│ ├─ Check if user exists in wallflower collection
│ ├─ Check prompt for NSFW/flagged content
│ ├─ Apply style modifiers to prompt
│ │ └─ PRESET_STYLES_MAP: Auto, Anime, Realistic, Vintage, etc.
│ ├─ Route to provider handler:
│ │ ├─ Replicate: handleReplicateGeneration()
│ │ └─ FAL AI: handleFalAIImageGeneration()
│ │
│ ├─ Each handler:
│ │ ├─ Calls provider API (REST)
│ │ ├─ Waits for webhook callback or polls
│ │ ├─ Downloads generated images
│ │ ├─ Uploads to GCS (wallflower-images bucket)
│ │ └─ Returns formatted TImageGenerationPost
│ │
│ ├─ Emit SSE events: attachments, usage
│ ├─ Calculate usage cost:
│ │ └─ queryCost * numberOfImages (from MODEL_COSTS)
│ ├─ Save image post to Firestore
│ └─ Increment user usage
│
└─ Streams generated images back to client

The Wallflower system supports 9 image editing features:

Each feature type can use a Magic Prompt system:

- MAGIC_PROMPT_MODELS map
- wallflower-images
- wallflower/{userId}/images/ — user's image documents
- /v1/wallflower/images endpoint

ToolOrchestrator.run()
│
├─ 1. Build messages from chat history + system prompt
├─ 2. Get tool definitions from ToolRegistry
├─ 3. Call LLM → get response (streaming)
│
├─ 4. If response has tool_calls:
│ ├─ Decide which tools to execute (filterToolCallsByPolicy)
│ ├─ Execute tools (parallel or sequential)
│ │ ├─ Web Search (Tavily/SerpAPI/Firecrawl)
│ │ ├─ RAG (knowledge vault)
│ │ ├─ Image Generation
│ │ ├─ Data Analysis (E2B sandbox)
│ │ ├─ MCP (external plugins)
│ │ ├─ Memory (mem0)
│ │ ├─ Chatbot (marketplace bots)
│ │ └─ Think (internal reasoning)
│ ├─ Format tool results
│ ├─ Engine: trim context window
│ └─ Go to step 3 (loop)
│
└─ 5. No more tool calls → finalize response

Available tools are registered in a ToolRegistry instance, keyed by function name:

| Tool | Description |
|---|---|
| webSearch | Internet search via Tavily/SerpAPI/Firecrawl with academic/social/YouTube focus modes |
| rag | Knowledge vault retrieval (LlamaIndex) |
| imageGen | Image generation via Wallflower |
| dataAnalysis | Code execution in E2B sandbox |
| mcp | Model Context Protocol (external plugins) |
| memory | User memory (mem0) |
| think | Internal chain-of-thought reasoning |
| craft | Canvas/craft creation |
| chatbot | Marketplace chatbot queries |
| deepResearchWebSearch | Web search for deep research |
| createTodo / getTodo / updateTodo / markTodo / dumpFinding / feedbackQuestions / getSearchHistory / reportGeneration / researcherAgent | Deep research agent tools |
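The orchestrator loop and the name-keyed registry described above can be sketched as follows. This is a simplified illustration, not the actual ToolOrchestrator: the `ToolRegistrySketch` class, the `CallLlm` signature, and the message shapes are all assumptions, and policy filtering, streaming, and context trimming are left out.

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LlmReply { text: string; toolCalls: ToolCall[]; }
interface ChatMessage { role: "system" | "user" | "assistant" | "tool"; content: string; }

type Tool = (args: Record<string, unknown>) => Promise<string>;
type CallLlm = (messages: ChatMessage[], toolNames: string[]) => Promise<LlmReply>;

// Hypothetical registry keyed by function name.
class ToolRegistrySketch {
  private tools = new Map<string, Tool>();
  register(name: string, tool: Tool): void { this.tools.set(name, tool); }
  get(name: string): Tool | undefined { return this.tools.get(name); }
  names(): string[] { return Array.from(this.tools.keys()); }
}

async function runOrchestrator(
  callLlm: CallLlm,
  registry: ToolRegistrySketch,
  messages: ChatMessage[],
  maxIterations = 5,
): Promise<string> {
  const history = [...messages];
  for (let i = 0; i < maxIterations; i++) {
    // Step 3: call the LLM with the current history and tool names.
    const reply = await callLlm(history, registry.names());
    history.push({ role: "assistant", content: reply.text });
    // Step 5: no tool calls means the response is final.
    if (reply.toolCalls.length === 0) return reply.text;
    // Step 4: execute requested tools in parallel; unknown tools yield
    // an error string instead of throwing.
    const results = await Promise.all(
      reply.toolCalls.map(async (call) => {
        const tool = registry.get(call.name);
        return tool ? tool(call.args) : `unknown tool: ${call.name}`;
      }),
    );
    for (const result of results) history.push({ role: "tool", content: result });
    // The real engine would trim the context window here before looping.
  }
  return "max iterations reached";
}
```

The `maxIterations` guard mirrors the "repeat until done or max iterations" rule; the default of 5 is an arbitrary value chosen for the sketch.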
Different agent configurations determine behavior:
- TOOL_RESULTS_CONTEXT_SUMMARY_TOKENS

Database: Firebase project foyer-work, database merlindb
Collections:
| Collection | Path | Purpose |
|---|---|---|
| customers/{uid} | customers/{uid} | User profile, plan, features, usage, settings |
| customers/{uid}/chats/{chatId} | Chats per user | Chat metadata |
| customers/{uid}/chats/{chatId}/thread/{docId} | Thread messages | Individual messages |
| global_chats/{chatId} | global_chats/{chatId} | Project chats (global namespace) |
| sharedChatsV2/{chatId} | sharedChatsV2/{chatId} | Publicly shared chats |
| projects/{projectId} | projects/{projectId} | Project workspace |
| projects/{projectId}/members/{uid} | Membership | Project members + roles |
| wallflower/{userId}/images/{imageId} | Images | Generated image posts |
| chatbots/{chatbotId} | chatbots/{chatbotId} | Marketplace chatbot definitions |
| notifications/{uid}/tokens/{token} | Push tokens | FCM notification tokens |
| vault/{uid}/items/{itemId} | vault/{uid}/items/{itemId} | Knowledge vault items |
| attachments/{attachmentId} | attachments/{attachmentId} | File attachments metadata |
| canvas/{canvasId} | canvas/{canvasId} | Craft canvas content |
| surveys/{surveyId} | surveys/{surveyId} | User survey responses |
| connectedApps/{appId} | connectedApps/{appId} | OAuth connected apps |
| mcp/{connectionId} | mcp/{connectionId} | MCP server connections |
| memories/{uid}/memories/{memoryId} | Memories | User memories (mem0 backup) |
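One way to keep these document paths consistent is a small set of typed path builders. The sketch below covers a few of the collections from the table; the `firestorePaths` object and the `segment` guard are hypothetical helpers, not part of the documented codebase.

```typescript
// Reject IDs that would change the shape of the path.
function segment(id: string): string {
  if (id.length === 0 || id.includes("/")) {
    throw new Error(`invalid Firestore ID: "${id}"`);
  }
  return id;
}

// Path templates copied from the collections table above.
const firestorePaths = {
  customer: (uid: string) => `customers/${segment(uid)}`,
  chat: (uid: string, chatId: string) =>
    `customers/${segment(uid)}/chats/${segment(chatId)}`,
  threadMessage: (uid: string, chatId: string, docId: string) =>
    `customers/${segment(uid)}/chats/${segment(chatId)}/thread/${segment(docId)}`,
  projectMember: (projectId: string, uid: string) =>
    `projects/${segment(projectId)}/members/${segment(uid)}`,
  wallflowerImage: (userId: string, imageId: string) =>
    `wallflower/${segment(userId)}/images/${segment(imageId)}`,
};
```

For example, `firestorePaths.threadMessage("u1", "c1", "m1")` yields `customers/u1/chats/c1/thread/m1`.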
Purpose: Caching, pub/sub inter-request communication, rate limiting
Channels:
- STOP_GENERATING — stop generation signal (chatId + messageId)
- ARCANE_MCP_CHANNEL:{ircId} — MCP inter-request communication
- IMPORT_CHATS — chat import queue
- ATTACHMENT_QUEUE — attachment processing queue
- GOOGLE_API_KEY — Google API key pool

Usage:

- rate-limiter-flexible

Buckets:

- wallflower-images

Dedicated Cloud Run service for vector embeddings and semantic search:

- text-embedding-3-small and text-embedding-3-large

Usage analytics logging:

| Provider | Services Used | Auth Method |
|---|---|---|
| OpenAI | GPT-4, GPT-4o, GPT-4o-mini, o1, o3, DALL-E, GPT Image 1, Embeddings | API key (embedded) |
| Anthropic | Claude 3.5 Sonnet, Claude 3.7 Sonnet | API key via Rune |
| Google AI | Gemini 2.0 Flash, Gemini 2.5 Pro, Vertex AI Imagen | API key pool |
| Fireworks AI | DeepSeek V3, Llama 3, Qwen, Mistral | API key via Rune |
| FAL AI | Flux, Ideogram, Recraft, Bria, HiDream, Seedream, Ghiblify | API key |
| Replicate | Flux, Ideogram, Recraft, HiDream, Background Remover | API key |
| Ideogram | Ideogram V2/V3, inpainting | API key |
| GoAPI | Midjourney v6.1 | API key |
| Azure | Merlin Magic (custom ML models for image/web classification) | API key |

| Service | Usage |
|---|---|
| Firebase Auth | User authentication (email, Google, anonymous) |
| Firebase Cloud Messaging | Web push notifications |
| Firebase Cloud Functions | Stripe portal, reCAPTCHA, email update |
| Redis (ioredis) | Cache, pub/sub, rate limiting |
| SendGrid | Transactional emails |
| Mailchimp | Email marketing + transactional |
| Stripe | Subscription billing |
| TawkTo | Live chat support |
| PostHog | Product analytics (opt-in) |
| Google Tag Manager | GA4 events, TikTok/Facebook pixels |
| Sentry | Error tracking |
| BigQuery | Usage analytics warehouse |
| Google Cloud Tasks | Async task queue |
| Firecrawl | Web scraping for deep research |
| SerpAPI | Google search results API |
| Tavily | AI-optimized web search API |
| E2B | Code interpreter sandbox (data analysis) |
| mem0 | User memory/profile |
| Composio | External app integrations (2-way sync) |
| Raindrop AI | Analytics/signals platform |
| Undetectable AI | AI text humanization |
| Copyleaks | Plagiarism detection |
| ImageKit | Image CDN optimization |
| Photon | Image metadata service |

Several companion services run alongside Arcane:

| Service | URL | Purpose |
|---|---|---|
| LlamaIndex | merlin-llama-index-*.run.app | Vector embeddings + RAG |
| Tokenizer | merlin-tokenizer-*.run.app | Token counting |
| Session Manager | session.getmerlin.in | Cookie chunked sessions |
| UAM | uam.getmerlin.in | User account management |
| File Processor | file-processor-*.run.app | Document text extraction |
| Readable Text | merlin-readable-text-*.run.app | HTML→clean text |
| Scribe | scribe.siddhartha-5c5.workers.dev | HTML parsing |
| Whisper | merlin-backend-whisper-*.run.app | Speech-to-text |
| MCP Servers | mcp-servers-*.run.app | MCP server registry |
| Spells | spells-*.run.app | Spell/utility service |
| Rune | rune.siddhartha-5c5.workers.dev | LLM proxy (Cloudflare Worker) |

| Plan | Key Features |
|---|---|
| Free | Limited chat, basic models, web search, 102 queries/month |
| Pro | Full access, advanced models, deep research, attachments |
| Teams | Pro features + team collaboration, admin controls |
| Bonkers Basic | Image generation only (limited) |
| Bonkers Pro | Image generation full access |
| AppSumo | Lifetime deal variants |
| Apprentice Sumo | Limited lifetime |
| Friend/Family | Discounted internal plans |
| Starter | Entry-level paid |

Each user has a features map in Firestore with per-feature usage objects:

- usage: current count
- limit: max allowed
- resetsAt: timestamp for reset
- resetInterval: daily/monthly
- topUps: temporary bonus allocations with expiry

Usage limits middleware checks:

LLM costs calculated from MODEL_COSTS in LLMConstants.ts:

- dollarCost (input/output per token), queryCost (abstract units)
- queryCost * numberOfImages
- incrementUserUsage() in utilities/usage.ts (818 lines of business logic)

MCP is the plugin system that allows external tools to be used within Merlin:
1. User connects MCP server → stored in Firestore `mcp` collection
2. Chat request with MCP mode → threadPreware loads MCP config
3. unifiedController → getMCPResults() → calls MCP servers in parallel
4. MCP results injected into LLM prompt
5. LLM response may include MCP tool calls → executed via IRC
6. Results streamed back via SSE

InterRequestCommunicator class:

- addSubscription(ircId, callback) — subscribe to Redis channel
- sendMessage(ircId, message) — publish to Redis channel
- sendError(ircId, error) — publish error back
- ARCANE_MCP_CHANNEL:{ircId}

Projects are workspaces that group chats and members:

- projects/{projectId} — Project document with name, visibility, settings
- projects/{projectId}/members/{uid} — Member role (owner, admin, member, viewer)
- global_chats collection with projectId metadata

Enforced via projectPermissionService:

- CREATE_CHATS — member+
- VIEW_CHATS — all members
- EDIT_CHATS — admin+
- DELETE_CHATS — admin+
- INVITE_MEMBERS — admin+
- MANAGE_PROJECT — owner only
- getProjectUsingSecret)

Deep Research is a multi-agent research system:
Supervisor Agent (deepResearchAgent):
Researcher Agents (researcherAgent):
Tool Set:
- createTodo / getTodo / updateTodo / markTodo — task tracking
- dumpFinding — stores research findings
- feedbackQuestions — clarification questions
- getSearchHistory — avoid duplicate searches
- reportGeneration — final report compilation
- researcherAgent — spawn sub-researchers

User Query
→ Generate Research Plan (LLM)
→ Generate SERP Queries (LLM)
→ Execute Searches (Tavily/SerpAPI/Firecrawl)
→ Evaluate Result Relevance (LLM)
→ Extract Learnings (LLM)
→ Generate Follow-up Queries
→ Repeat until depth satisfied
→ Generate Final Report (LLM)

- deploy.sh — only develop and review branches deploy
- turbo run build (Turborepo orchestration)
- main
- us-west1
- gcloud run deploy arcane --source .
- arcane-dev for staging
- --max-old-space-size=6144)

Critical vars (40+ total):

- NEXT_PUBLIC_ARCANE_BASE_URL — API endpoint
- NEXT_PUBLIC_UAM_BASE_URL — User account management
- NEXT_PUBLIC_CLOUD_FUNCTION_BASE — Firebase functions
- NEXT_PUBLIC_SESSION_MANAGER — Session cookie server
- NEXT_PUBLIC_GTM_ID — Google Tag Manager
- NEXT_PUBLIC_POSTHOG_KEY — PostHog analytics
- SENTRY_AUTH_TOKEN / SENTRY_DSN — Error tracking
- NEXTAUTH_SECRET / NEXTAUTH_URL — Auth.js config
- FIREBASE_* — Firebase config
- ARCANE_* — Arcane backend config
- CMS_* — CMS (Strapi) config
- REDISHOST / REDISPORT — Redis connection
- K_REVISION — Cloud Run revision identifier

- @antfu/eslint-config + perfectionist sort-imports
- @trivago/prettier-plugin-sort-imports + Tailwind plugin
- shamefully-hoist=true

express-zod-api over tRPC/GraphQL: Auto-generated OpenAPI docs + TypeScript types from Zod schemas. Enables both internal use and external API clients.
SSE over WebSockets: Simpler infrastructure (no sticky sessions needed on Cloud Run), HTTP/2 compatible, works through proxies. Reconnection handled client-side via @microsoft/fetch-event-source.
AsyncLocalStorage for request context: Avoids passing req/res through every function. Each request gets its own isolated store, even across async boundaries. Downside: makes code harder to unit test.
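A minimal sketch of the AsyncLocalStorage pattern, using Node's built-in `node:async_hooks` module. The `RequestContextSketch` interface is a trimmed-down, hypothetical stand-in for the real requestContext, and the helper names are invented for illustration.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical, trimmed-down context; the real store carries many more fields.
interface RequestContextSketch {
  uid: string;
  logFields: Record<string, string>;
}

const requestContextStore = new AsyncLocalStorage<RequestContextSketch>();

// Read the context of the request that is currently executing.
function currentContext(): RequestContextSketch {
  const ctx = requestContextStore.getStore();
  if (!ctx) throw new Error("requestContext accessed outside of a request");
  return ctx;
}

// A deeply nested helper that needs the user but never receives req/res.
function describeCaller(): string {
  return `uid=${currentContext().uid}`;
}

// Everything called inside run() sees the same store.
function handleRequest(uid: string): string {
  return requestContextStore.run({ uid, logFields: {} }, () => describeCaller());
}
```

The explicit error in `currentContext` hints at the testing downside: any unit test that calls `describeCaller` directly has to wrap the call in `run()` first.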
Rune (Cloudflare Worker) as LLM proxy: Centralizes API key management, enables consistent streaming format across providers, handles rate limiting and fallbacks.
Firestore over PostgreSQL/other relational: Serverless DB with real-time sync capabilities. Schema-on-read design enables rapid iteration. Subcollections provide natural hierarchy (chat→thread→messages).
LlamaIndex as separate microservice: Dedicated RAG service with GPU/TPU availability for embeddings. Decouples vector infrastructure from main API.
Redis pub/sub for IRC: Cloud Run instances can't communicate directly. Redis pub/sub provides cross-instance messaging for MCP responses without sticky sessions.
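The IRC pattern can be sketched with an in-memory bus standing in for Redis so the example runs without a server. The method names and the `ARCANE_MCP_CHANNEL:{ircId}` channel format come from this guide; the `InMemoryPubSub` class, the `IrcMessage` shape, and the return values are assumptions.

```typescript
type IrcMessage = { ok: true; payload: unknown } | { ok: false; error: string };
type IrcCallback = (message: IrcMessage) => void;

// In-memory stand-in for Redis pub/sub (assumption for this sketch).
class InMemoryPubSub {
  private subscribers = new Map<string, Set<IrcCallback>>();

  subscribe(channel: string, callback: IrcCallback): () => void {
    const set = this.subscribers.get(channel) ?? new Set<IrcCallback>();
    set.add(callback);
    this.subscribers.set(channel, set);
    return () => { set.delete(callback); };
  }

  // Returns the number of subscribers that received the message.
  publish(channel: string, message: IrcMessage): number {
    const set = this.subscribers.get(channel);
    if (!set) return 0;
    set.forEach((callback) => callback(message));
    return set.size;
  }
}

const channelFor = (ircId: string): string => `ARCANE_MCP_CHANNEL:${ircId}`;

class InterRequestCommunicatorSketch {
  private bus: InMemoryPubSub;
  constructor(bus: InMemoryPubSub) { this.bus = bus; }

  addSubscription(ircId: string, callback: IrcCallback): () => void {
    return this.bus.subscribe(channelFor(ircId), callback);
  }
  sendMessage(ircId: string, payload: unknown): number {
    return this.bus.publish(channelFor(ircId), { ok: true, payload });
  }
  sendError(ircId: string, error: string): number {
    return this.bus.publish(channelFor(ircId), { ok: false, error });
  }
}
```

With real Redis the subscriber and the publisher would typically live in different Cloud Run instances, which is the case this design is meant to handle.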
Cookie chunking: Browsers limit cookies to ~4KB. Merlin's JWT exceeds this, so it's split across 4 __Secure-merlin-session cookies.
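The chunking idea can be sketched as a split/join pair. The cookie name prefix and the limit of four chunks follow this guide (`__Secure-merlin-session_0..3`); the per-cookie budget of 3800 characters and the function names are assumptions, since the real limit also has to leave room for the cookie name and attributes.

```typescript
const COOKIE_PREFIX = "__Secure-merlin-session_";
const MAX_CHUNKS = 4;     // cookies _0 through _3
const CHUNK_SIZE = 3800;  // assumed budget per cookie value, under the ~4KB limit

// Split a token into numbered cookie values.
function splitIntoCookies(token: string, chunkSize = CHUNK_SIZE): Record<string, string> {
  const count = Math.ceil(token.length / chunkSize);
  if (count > MAX_CHUNKS) {
    throw new Error(`token needs ${count} chunks, max is ${MAX_CHUNKS}`);
  }
  const cookies: Record<string, string> = {};
  for (let i = 0; i < count; i++) {
    cookies[`${COOKIE_PREFIX}${i}`] = token.slice(i * chunkSize, (i + 1) * chunkSize);
  }
  return cookies;
}

// Reassemble the token by reading chunks in order until one is missing.
function joinCookies(cookies: Record<string, string>): string {
  let token = "";
  for (let i = 0; i < MAX_CHUNKS; i++) {
    const part = cookies[`${COOKIE_PREFIX}${i}`];
    if (part === undefined) break;
    token += part;
  }
  return token;
}
```

A writer would also need to clear any stale higher-numbered cookies when a shorter token is stored, otherwise `joinCookies` could append leftover data.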
Context window engine: Proactive context management (trimming + summarization) rather than reactive truncation. Ensures optimal LLM response quality within token limits.
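As a rough sketch of proactive layout selection, the code below walks a preference-ordered list of layouts and returns the first one whose estimated token count fits the limit. The strategy and section names come from this guide; the candidate list, the token estimates, and the function names are assumptions, and the real chooseOptimalLayout may rank layouts differently.

```typescript
type Strategy = "FULL" | "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" | "SUMMARY";
type Section = "HISTORY" | "IN_LOOP" | "CURRENT_MESSAGE";
type Layout = Record<Section, Strategy>;

// Hypothetical token estimate for each section under each strategy.
type TokenEstimates = Record<Section, Record<Strategy, number>>;

// Assumed preference order: least lossy layout first.
const CANDIDATE_LAYOUTS: Layout[] = [
  { HISTORY: "FULL", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "SUMMARY", CURRENT_MESSAGE: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" },
];

function layoutTokens(layout: Layout, estimates: TokenEstimates): number {
  return (Object.keys(layout) as Section[]).reduce(
    (sum, section) => sum + estimates[section][layout[section]],
    0,
  );
}

// Return the first candidate that fits, or undefined if none does.
function pickFirstFittingLayout(
  estimates: TokenEstimates,
  contextLimit: number,
): Layout | undefined {
  return CANDIDATE_LAYOUTS.find((layout) => layoutTokens(layout, estimates) <= contextLimit);
}
```

Deciding the layout before the LLM call is what distinguishes this from reactive truncation, where content is cut only after the prompt is already too long.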
Tool orchestrator pattern: Separates LLM calling from tool execution, enabling recursive tool use, parallel execution, and agent behavior without framework lock-in.