Bonkers — End-to-End System Guide (Knowledge Base)

Confidential Internal Document — Preserves complete architectural context of the Bonkers image generation platform and its backend (Merlin Arcane / Cauldron monorepo).


Table of Contents

  1. System Architecture Overview
  2. Frontend Architecture (bonkers/)
  3. Backend Architecture (merlin-arcane/)
  4. Authentication & Authorization
  5. Chat & LLM Streaming Pipeline
  6. Bonkers Image Generation Pipeline
  7. Tool Orchestration System
  8. Database & Storage
  9. External Integrations
  10. Pricing & Usage Limits
  11. MCP (Model Context Protocol) System
  12. Projects & Collaboration
  13. Deep Research
  14. Deployment & Infrastructure

    1. System Architecture Overview

    High-Level Topology
    Plain text
    [User Browser/Extension]
            │
            ▼
    ┌───────────────────────────────────────────────────────┐
    │            Vercel (Edge + Serverless)                  │
    │  ┌─────────────────────────────────────────────┐      │
    │  │       Next.js 14 App (bonkers/)              │      │
    │  │  ├─ App Router (i18n, auth, SSR)            │      │
    │  │  ├─ React Query (client cache)              │      │
    │  │  ├─ Zustand (UI state)                      │      │
    │  │  ├─ next-auth v5 (JWT + Firebase)           │      │
    │  │  └─ next-intl (i18n)                        │      │
    │  └─────────────────────────────────────────────┘      │
    └──────────┬────────────────────────────────────────────┘
               │ HTTPS + SSE
               ▼
    ┌───────────────────────────────────────────────────────┐
    │            Google Cloud Run (Arcane Backend)            │
    │  ┌─────────────────────────────────────────────┐      │
    │  │     Express + express-zod-api (merlin-arcane)│      │
    │  │  ├─ Auth middleware (Firebase ID Token)     │      │
    │  │  ├─ Streaming SSE endpoints                 │      │
    │  │  ├─ Tool Orchestrator                       │      │
    │  │  ├─ LLM Provider abstraction (Rune)         │      │
    │  │  └─ Image Gen providers (FAL, Replicate)    │      │
    │  └─────────────────────────────────────────────┘      │
    └──────────┬────────────────────────────────────────────┘
               │
         ┌─────┼──────────┬──────────────┬──────────────────┐
         ▼     ▼          ▼              ▼                  ▼
    ┌────────┐┌────────┐┌──────────┐┌───────────────┐┌────────────┐
    │Firebase││ Redis  ││LlamaIndex││ 3rd-party AI  ││SendGrid    │
    │Firestore││(ioredis)││(RAG API) ││ OpenAI,Anthro-││Mailchimp   │
    │+ Auth  ││        ││          ││ pic,GoogleAI  ││            │
    └────────┘└────────┘└──────────┘└───────────────┘└────────────┘
               │
               ▼
    ┌───────────────────────────────────────────────────────┐
    │   Image Generation Providers                           │
    │  ├─ fal.ai (Flux, Ideogram, Recraft, Bria, etc.)     │
    │  ├─ Replicate (Flux, Ideogram, HiDream)               │
    │  ├─ OpenAI (GPT Image 1, DALL-E)                      │
    │  ├─ Google Vertex AI (Imagen, Gemini)                  │
    │  ├─ Ideogram API                                       │
    │  └─ Midjourney (via GoAPI)                             │
    └───────────────────────────────────────────────────────┘
    Monorepo Structure

    Bonkers (bonkers/ — codename "cauldron"):

    Plain text
    bonkers/                              # pnpm monorepo
    ├── apps/
    │   ├── website/                      # Next.js 14 (App Router) — main app
    │   ├── extension/                    # Chrome extension (git submodule)
    │   └── session-manager/              # Express server for cookie chunking
    ├── packages/
    │   ├── types/                        # Shared TS types
    │   ├── utils/                        # Shared utilities
    │   ├── hooks/                        # Shared React hooks
    │   ├── components/                   # Shared UI (CircularTimer, etc.)
    │   ├── assets/                       # Static assets
    │   └── config/                       # ESLint, Prettier, TypeScript configs
    ├── app-config/                       # Feature flags, prompts (git submodule)
    ├── patches/                          # Patched deps (react-arborist)
    └── deploy.sh                         # Vercel deploy gate

    Merlin Arcane (merlin-arcane/):

    Plain text
    merlin-arcane/                        # TypeScript (Express 5)
    ├── src/
    │   ├── index.ts                      # Server entry point (express-zod-api)
    │   ├── config/
    │   │   ├── config.ts                 # Server config (CORS, routes, ports)
    │   │   ├── routing.ts                # All API route definitions (557 lines)
    │   │   └── logger.ts                 # Pino logger w/ cloud severity mapping
    │   └── server/
    │       ├── factories/                # Endpoint factory (authStreamEndpointsFactory)
    │       ├── middlewares/
    │       │   ├── auth/                 # Firebase ID Token verification
    │       │   ├── initContext/          # Request context initialization
    │       │   ├── threadPreware/        # Pre-chat processing (loads Thread, User)
    │       │   ├── threadPostware/       # Post-chat processing (runs ToolOrchestrator)
    │       │   ├── usageLimits/          # Quota enforcement
    │       │   └── usageAnalytics/       # Usage tracking
    │       ├── endpoints/
    │       │   ├── unified/              # Core chat ML pipeline
    │       │   ├── wallflower/           # Bonkers image generation
    │       │   ├── projects/             # Collaboration system
    │       │   ├── tools/                # Text/image tools
    │       │   ├── user/                 # User data endpoints
    │       │   ├── chatbots/            # Chatbot marketplace
    │       │   └── v2/                   # V2 API endpoints
    │       ├── models/
    │       │   ├── thread.ts             # Thread model (Firestore-backed, 1251 lines)
    │       │   ├── message.ts            # Message model
    │       │   └── user.ts              # User model (Firebase claims)
    │       ├── repositories/
    │       │   ├── engine/               # Context window management + trimming
    │       │   ├── provider/             # LLM provider abstraction (Rune)
    │       │   ├── streamer/             # SSE streaming engine
    │       │   ├── irc/                  # Inter-request communication (Redis)
    │       │   └── sideActions/          # Concurrent side-action runner
    │       ├── services/
    │       │   ├── firebase.ts           # Firebase Admin SDK (merlindb)
    │       │   ├── redis.ts              # ioredis (pub/sub, cache)
    │       │   ├── llamaindex.ts         # RAG vector store client
    │       │   └── axios.ts             # Shared axios instance
    │       └── utilities/
    │           ├── usage.ts              # Usage increment logic (818 lines)
    │           └── llm.ts               # LLM model resolution

    2. Frontend Architecture

    2.1 Next.js App Structure

    The website uses Next.js 14 App Router with:

    • i18n: next-intl with [lang] prefix (27 languages)
    • Auth: next-auth v5 (beta) with Credentials provider
    • Data Fetching: @tanstack/react-query v5 with async persistence (24h gc)
    • UI State: Zustand v5 (auth UI, SSE, attachments)
    • Styling: Tailwind CSS 3.4 with dark mode (class strategy)
    • UI Kit: shadcn/ui (54 components, New York style, Zinc base)
    • Animation: Framer Motion 11
    • Rich Text: Plate.js editor (60+ component files)
    • Icons: Lucide React
    2.2 Route Groups

    | Route | Purpose |
    |---|---|
    | /{lang}/bonkers | Bonkers image generation: canvas, model selection, generation |
    | /{lang}/old-bonkers | Legacy Bonkers |
    | /{lang}/chat/[[...chatId]] | Main chat interface |
    | /{lang}/creations/[iid] | Shared creations (public gallery) |
    | /{lang}/pricing | Subscription plans |
    | /{lang}/old-profile | User profile, settings, subscription |
    | /{lang}/old-vault | Knowledge vault (file management) |
    | /{lang}/updates | Changelog |
    | /{lang}/user/[id] | Public user profile |
    | /{lang}/templates/[id] | Templates |
    | /{lang}/ai-tools | AI tools directory |
    | /{lang}/ai-detection | AI content detection |
    | /{lang}/plagiarism-checker | Plagiarism checker |
    | /{lang}/ai-humanizer | AI text humanizer |
    | /{lang}/new-old-chat/history | Chat history |
    | /{lang}/new-old-chat/projects | Projects workspace |
    | /{lang}/new-old-chat/share/[chatId] | Shared chat view |

    2.3 Auth System

    Multi-layer authentication:

    1. Firebase Auth — actual identity (email/password, Google, anonymous)
    2. next-auth v5 — JWT session with Firebase custom claims merged in
    3. Session Manager — Express on Cloud Run for cookie chunking (4KB browser limit → __Secure-merlin-session_0..3)
    4. BroadcastChannel (merlin-auth) — cross-tab sync
    5. Chrome Extension — login state broadcast via chrome.runtime.sendMessage
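The cookie chunking in layer 3 can be pictured with a minimal sketch. It assumes ASCII session values (so string length equals byte length), a chunk budget just under the 4KB limit, and at most four chunks following the `__Secure-merlin-session_0..3` naming above. The helper names and `CHUNK_SIZE` are illustrative and not taken from the Session Manager source.

```typescript
// Hypothetical helpers; names and CHUNK_SIZE are assumptions for illustration.
const COOKIE_PREFIX = "__Secure-merlin-session_";
const MAX_CHUNKS = 4; // matches the _0.._3 suffixes
const CHUNK_SIZE = 3800; // assumed budget that leaves room for cookie attributes

// Splits a session string into numbered cookies of at most CHUNK_SIZE characters.
function chunkSession(value: string): Record<string, string> {
  const chunkCount = Math.ceil(value.length / CHUNK_SIZE);
  if (chunkCount > MAX_CHUNKS) {
    throw new Error(`Session needs ${chunkCount} chunks, max is ${MAX_CHUNKS}`);
  }
  const cookies: Record<string, string> = {};
  for (let i = 0; i < chunkCount; i++) {
    cookies[`${COOKIE_PREFIX}${i}`] = value.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
  }
  return cookies;
}

// Reassembles the session by reading chunks in order until the first gap.
function joinSession(cookies: Record<string, string>): string {
  let value = "";
  for (let i = 0; i < MAX_CHUNKS; i++) {
    const part = cookies[`${COOKIE_PREFIX}${i}`];
    if (part === undefined) break;
    value += part;
  }
  return value;
}
```

A 9,000-character session would be stored as three cookies under this budget and read back by concatenating them in index order.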

    Zustand session store (userSessionStore.ts):

    • 24 computed boolean flags: isFree, isPaid, isOwner, isPro, isBonkersBasic, isBonkersPro, isAppSumo, etc.
    • Auth UI store for modals (login, pricing, upsells)
    2.4 Data Fetching Layer

    React Query setup:

    • QueryClient with 24-hour garbage collection
    • Async persister (extension storage or localStorage)
    • Version buster (v2.99) forces cache reset on deploy
    • Auth-aware queries (waitForAuthInit)
    • ~30 query definitions: user session, settings, usage, bots, chats, folders, tools, canvas, MCP, etc.

    Axios instances: Two typed instances:

    • ArcaneAxiosInstance → .../arcane/api (attaches Firebase token)
    • UAMAxiosInstance → uam.getmerlin.in

    Interceptors:

    • 401 handling → token refresh
    • x-merlin-version header
    2.5 SSE Event Handling

    The frontend uses @microsoft/fetch-event-source for SSE connections. The sseBaseSecureStore (Zustand) handles:

    • Connection lifecycle (abort, retry, stop)
    • Event types: message (text/reasoning/progress), attachments, references, usage, metadata
    • Progress events rendered as animated step indicators
    • Tool call events trigger client-side UI updates
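To make the event types concrete, here is a minimal sketch of how raw SSE text breaks down into named events. In the app this parsing is handled by @microsoft/fetch-event-source; the standalone `parseSse` function and the sample payload shape are assumptions for illustration only.

```typescript
interface SseEvent {
  event: string; // e.g. "message", "progress", "attachments"
  data: string;
}

// Parses a complete SSE payload; frames are separated by a blank line.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE format, one leading space after the colon is dropped.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

A store like sseBaseSecureStore would then switch on `event` to decide whether to append text, update a progress step, or attach references.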

    3. Backend Architecture

    3.1 Server Framework

    Express 5 + express-zod-api:

    • Self-documenting API with Zod schema validation
    • Auto-generates OpenAPI docs + TypeScript client types
    • Middleware chain pattern (like tRPC but for REST)
    • Custom factories: authEndpointsFactory, authStreamEndpointsFactory, usageLimitsStreamEndpointsFactory
    3.2 Request Pipeline (Unified Chat)

    The end-to-end flow for a chat message:

    Plain text
    HTTP POST /v1/thread/unified
      │
      ├─ 1. authMiddleware
      │     └─ Verify Firebase ID Token from Authorization header
      │     └─ Load User from Firestore (plan, features, usage)
      │     └─ Set requestContext.user
      │
      ├─ 2. usageLimitsMiddleware
      │     └─ Check daily/monthly usage against plan limits
      │     └─ Throw if over limit
      │
      ├─ 3. threadPreware (Middleware)
      │     ├─ Load Thread from Firestore (or create new)
      │     ├─ Load user settings V3
      │     ├─ Load personalization + profile memories
      │     ├─ Process attachments (LlamaIndex embeddings)
      │     ├─ Check project permissions (if projectId)
      │     ├─ Initialize SSE stream (writeHead 200, text/event-stream)
      │     ├─ Create UserMessage + AssistantMessage (pending)
      │     ├─ Fetch active thread with embeddings (RAG)
      │     ├─ Initialize Schema (model, context window, max tokens)
      │     └─ Fire side actions (content moderation, chatbot loading)
      │
      ├─ 4. providerConfigOverrideMiddleware
      │     └─ Override LLM provider config if needed
      │
      ├─ 5. unifiedController (Middleware)
      │     ├─ Wait for content moderation result
      │     ├─ ChatStateManager — conflict resolution for mode combinations
      │     ├─ Merlin Magic → model selection engine
      │     ├─ Deep Research → handleDeepResearch (for Mobile)
      │     ├─ MCP Plugin → getMCPResults
      │     └─ Attach PROGRESS events to assistantMessage
      │
      ├─ 6. threadPostware (Middleware) — MAIN EXECUTION
      │     ├─ If Deep Research → spawn deepResearchAgent
      │     ├─ Else → create ToolRegistry → ToolOrchestrator.run()
      │     ├─ Orchestrator loop:
      │     │    ├─ Build messages (engine trimming)
      │     │    ├─ Call LLM provider (via Rune)
      │     │    ├─ Stream response via SSE
      │     │    ├─ Process tool calls → execute tools
      │     │    ├─ Repeat until done or max iterations
      │     │    └─ Engine trims context window after each iteration
      │     ├─ Attach references + attachments to response
      │     ├─ Set chat title from first user message
      │     ├─ Update shortcut preferences
      │     └─ Increment user usage
      │
      ├─ 7. usageAnalyticsMiddleware
      │     └─ Log usage to BigQuery
      │
      └─ Response: Empty 200 (all data streamed via SSE)
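The chain above can be sketched as a list of async steps that share one mutable context. This is a simplified model, not the express-zod-api wiring: the `Ctx` fields, the three stand-in middlewares, and the 401/429 error strings are assumptions chosen for illustration.

```typescript
// Simplified stand-in for the request context; field names are illustrative.
interface Ctx {
  token?: string;
  userId?: string;
  usage: number;
  limit: number;
  trace: string[];
}

type Middleware = (ctx: Ctx) => Promise<void>;

const authMiddleware: Middleware = async (ctx) => {
  if (!ctx.token) throw new Error("401: missing token");
  ctx.userId = `user-for-${ctx.token}`; // real code verifies a Firebase ID token
  ctx.trace.push("auth");
};

const usageLimitsMiddleware: Middleware = async (ctx) => {
  if (ctx.usage >= ctx.limit) throw new Error("429: usage limit reached");
  ctx.trace.push("usageLimits");
};

const threadPostware: Middleware = async (ctx) => {
  ctx.trace.push("threadPostware"); // the orchestrator would run here
  ctx.usage += 1; // usage is incremented only after a successful run
};

// Runs steps in order; the first thrown error aborts the rest of the chain.
async function runPipeline(ctx: Ctx, steps: Middleware[]): Promise<Ctx> {
  for (const step of steps) {
    await step(ctx);
  }
  return ctx;
}
```

Because a throw stops the loop, a request that fails the usage check never reaches the postware step and is never billed.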
    3.3 Request Context

    The entire request shares a requestContext, an AsyncLocalStorage-based store that carries:

    • user (plan, uid, email, usage)
    • chatNode (Thread model instance)
    • userMessageNode, assistantMessageNode
    • schema (Schema model — controls prompt/messages/model)
    • chatStateManager (mode resolution)
    • eventManager (SSE progress events)
    • response (raw Express res for SSE streaming)
    • executionContext (USER vs TASK)
    • decisionLog (tool orchestration debug log)
    • logFields (structured logging context)
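A minimal sketch of the AsyncLocalStorage pattern, using Node's built-in `node:async_hooks` module. The trimmed `RequestContext` shape and the function names are assumptions; the real store carries the full field list above.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Trimmed-down context; the real store carries many more fields.
interface RequestContext {
  user: { uid: string; plan: string };
  logFields: Record<string, string>;
}

const requestContextStorage = new AsyncLocalStorage<RequestContext>();

// Any code running inside `run` can read the context without it being passed down.
function getRequestContext(): RequestContext {
  const ctx = requestContextStorage.getStore();
  if (!ctx) throw new Error("No request context: called outside a request");
  return ctx;
}

function deepHelper(): string {
  return getRequestContext().user.uid;
}

async function handleRequest(uid: string): Promise<string> {
  const ctx: RequestContext = { user: { uid, plan: "free" }, logFields: {} };
  return requestContextStorage.run(ctx, async () => {
    await Promise.resolve(); // the context survives async boundaries
    return deepHelper();
  });
}
```

Two concurrent calls to `handleRequest` each see their own `user`, which is what lets middlewares and deeply nested helpers share state without threading arguments through every call.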

    4. Authentication & Authorization

    4.1 Client → Backend Flow
    1. Client obtains Firebase ID Token (via firebase.auth().currentUser.getIdToken())
    2. Axios interceptor attaches as Authorization: Bearer <token>
    3. Backend authMiddleware verifies via firebase.auth().verifyIdToken(token)
    4. On verification, loads full user document from Firestore
    5. Creates User model instance with computed properties
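Step 2 and the start of step 3 hinge on reading the token out of the Authorization header. A small sketch of that parsing step follows; `extractBearerToken` is a hypothetical helper, and the actual verification call stays with the Firebase Admin SDK as described above.

```typescript
// Returns the token from "Authorization: Bearer <token>", or null if malformed.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}
```

A null result would typically short-circuit the request before any Firestore read happens.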
    4.2 User Model (models/user.ts)
    • Reads from customers/{uid} Firestore document
    • Computes: userPlan (free, pro, teams, bonkers_basic, bonkers_pro, apprentice_sumo, etc.)
    • Computes: userType (owner, member, etc.)
    • Manages: temporary pro, top-ups, feature limits
    • Updates: last active timestamp, login count
    • Handles: expired subscriptions grace period
    4.3 Plan/Feature Resolution

    The user.features map contains per-feature objects:

    JSON
    {
      "chat": { "usage": 42, "resetsAt": 1717000000000, "limit": 500 },
      "webSearch": { "usage": 10, "resetsAt": ..., "limit": 50 },
      "imageGeneration": { "usage": 5, "resetsAt": ..., "limit": 100 },
      "webPdfChat": { "usage": 3, "resetsAt": ..., "limit": 50 },
      "deepResearch": { "usage": 1, "resetsAt": ..., "limit": 20 },
      ...
    }
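Given a feature object of this shape, a quota check could look like the sketch below. It assumes `resetsAt` is in epoch milliseconds and that a window whose reset time has passed counts as unused; `hasQuota` is an illustrative name, not the middleware's actual function.

```typescript
interface FeatureUsage {
  usage: number;
  limit: number;
  resetsAt: number; // epoch milliseconds
}

// Returns true when the feature still has quota at time `now`.
// Assumption: once resetsAt has passed, the stored usage no longer counts.
function hasQuota(feature: FeatureUsage, now: number): boolean {
  const effectiveUsage = now >= feature.resetsAt ? 0 : feature.usage;
  return effectiveUsage < feature.limit;
}
```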

    5. Chat & LLM Streaming Pipeline

    5.1 The Engine (repositories/engine/)

    The engine manages the LLM context window — the critical part that decides what fits in the prompt:

    Context trimming strategies:

    • FULL — include everything (when context window is large enough)
    • TOOL_PROVIDED_SUMMARY_IF_POSSIBLE — use tool-provided summaries for tool results
    • SUMMARY — LLM-generated summary of history

    Three sections of the context window:

    1. HISTORY — past conversation turns
    2. IN_LOOP — current tool call iteration messages
    3. CURRENT_MESSAGE — the latest assistant response + tool results

    Layout optimization:

    • Each section has 3 handler modes (full, summary-if-possible, summary)
    • Token tables computed for each combination
    • chooseOptimalLayout picks cheapest valid layout within context limit
    • Default summary size: 1024 tokens
    • Falls back through PREFERRED_TRIMMING_LAYOUTS until one fits
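The fallback step can be sketched as a walk through an ordered list of layouts, returning the first whose token total fits. The section and mode names come from the text above; the specific ordering inside `PREFERRED_TRIMMING_LAYOUTS` and the function name `chooseLayoutSketch` are assumptions, and the real chooseOptimalLayout may rank candidates differently.

```typescript
type Section = "HISTORY" | "IN_LOOP" | "CURRENT_MESSAGE";
type Mode = "FULL" | "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" | "SUMMARY";
type Layout = Record<Section, Mode>;
type TokenTable = Record<Section, Record<Mode, number>>;

// Assumed preference order: keep as much detail as possible, trim history first.
const PREFERRED_TRIMMING_LAYOUTS: Layout[] = [
  { HISTORY: "FULL", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "SUMMARY", CURRENT_MESSAGE: "FULL" },
  { HISTORY: "SUMMARY", IN_LOOP: "SUMMARY", CURRENT_MESSAGE: "SUMMARY" },
];

// Sums the token cost of each section under the mode the layout assigns to it.
function layoutTokens(layout: Layout, table: TokenTable): number {
  const sections = Object.keys(layout) as Section[];
  return sections.reduce((sum, section) => sum + table[section][layout[section]], 0);
}

// Returns the first preferred layout that fits, or null if none does.
function chooseLayoutSketch(table: TokenTable, contextLimit: number): Layout | null {
  for (const layout of PREFERRED_TRIMMING_LAYOUTS) {
    if (layoutTokens(layout, table) <= contextLimit) return layout;
  }
  return null;
}
```

With a history that costs 6,000 tokens in full but 1,024 as a summary, a 5,000-token limit skips the first two layouts and lands on the one that summarizes history while keeping the in-loop and current messages intact.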
    5.2 LLM Provider (repositories/provider/)
    • Primary provider: Rune — a custom proxy/worker at rune.siddhartha-5c5.workers.dev
    • Rune accepts normalized payload and routes to: OpenAI, Anthropic, Google, etc.
    • Supports streaming, tools, reasoning content, citations
    • Fallback logic: if primary model fails, fallback models from config are tried
    • Token counting via dedicated tokenizer service
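The fallback behaviour can be sketched as a loop over an ordered model list. The `callModel` parameter is a stand-in for the actual Rune request, and `callWithFallback` is a hypothetical name; how the real provider classifies retryable failures is not covered here.

```typescript
type CallModel = (model: string) => Promise<string>;

// Tries the primary model first, then each fallback in order.
async function callWithFallback(
  models: string[],
  callModel: CallModel,
): Promise<{ model: string; output: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      return { model, output: await callModel(model) };
    } catch (err) {
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Returning the model that actually answered lets the caller record which model served the request, which matters for usage accounting.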

    Payload structure sent to Rune:

    JSON
    {
      "config": { "messages": [...] },
      "mode": "CHAT",
      "params": { "model": "claude-3.5-sonnet", "tools": [...] },
      "metaData": { "systemPrompt": "...", "apiKey": "..." }
    }
    5.3 SSE Streaming (repositories/streamer/)

    Two streaming versions:

    • V1 (legacy): Chunks content in variable sizes (2-5 chars) with adaptive delay based on buffer length
    • V2 (new): Separate content indices per tool result, supports streamAsToolResult flag
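The V1 behaviour can be approximated with the sketch below. The 2-5 character chunk size comes from the description above; the delay thresholds in `adaptiveDelayMs` and both function names are assumptions, since the actual curve is not documented here.

```typescript
// Splits text into chunks of 2-5 characters; `pickSize` is injectable so the
// behaviour can be made deterministic in tests.
function chunkContent(
  text: string,
  pickSize: () => number = () => 2 + Math.floor(Math.random() * 4),
): string[] {
  const chunks: string[] = [];
  let offset = 0;
  while (offset < text.length) {
    const size = Math.min(5, Math.max(2, pickSize()));
    chunks.push(text.slice(offset, offset + size));
    offset += size;
  }
  return chunks;
}

// Assumed delay curve: the longer the backlog, the shorter the pause.
function adaptiveDelayMs(bufferLength: number): number {
  if (bufferLength > 200) return 0;
  if (bufferLength > 50) return 5;
  return 15;
}
```

Shrinking the delay as the buffer grows keeps the typing effect smooth while preventing the stream from falling far behind the model's output.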

    SSE Event types:

    | Event | Purpose |
    |---|---|
    | message | Text content, reasoning content, tool results |
    | progress | Progress step lifecycle (init → in_progress → done) |
    | attachments | Generated images, web search links, citations |
    | references | Citation references with document index mapping |
    | usage | Token usage after completion |
    | metadata | Model info, MCP context |
    | init_message_content | Signals start of content streaming |
    | features | Identifies active features (e.g., "AGENTIC_RESEARCH") |

    Content Indexing (V2): Each tool result in the response has a unique contentIndex. The streamer maps text, reasoning, and progress events to the correct index so the frontend can render them in the right position within the message tree.

    5.4 Chat State Manager

    The ChatStateManager is the central decision-maker for a chat request. It:

    • Takes input modes from the user (web search, deep research, RAG, image, MCP, etc.)
    • Resolves conflicts between incompatible mode combinations
    • Determines which ToolRegistry tools to enable
    • Tracks which modes were actually used vs requested

    Mode compatibility rules:

    • IMAGE mode cannot combine with RAG, DEEP_RESEARCH, or MCP
    • MERLIN_MAGIC auto-selects best model
    • DEEP_RESEARCH takes priority and runs agent sub-pipeline
    • MCP injects plugin results before LLM call
    • LARGE_CONTEXT gives full context window (no trimming)
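The first rule can be expressed as a lookup table of incompatible pairs. This sketch only detects conflicts; how ChatStateManager then resolves them (which mode wins) is not specified above, so it is left out. The `ChatMode` identifiers, including `WEB_SEARCH`, and the function name are assumptions.

```typescript
type ChatMode =
  | "IMAGE"
  | "RAG"
  | "DEEP_RESEARCH"
  | "MCP"
  | "WEB_SEARCH"
  | "MERLIN_MAGIC"
  | "LARGE_CONTEXT";

// Only the IMAGE rule is stated in the text; other entries would be added here.
const INCOMPATIBLE: Partial<Record<ChatMode, ChatMode[]>> = {
  IMAGE: ["RAG", "DEEP_RESEARCH", "MCP"],
};

// Returns every [mode, conflictingMode] pair present in the requested set.
function findModeConflicts(requested: ChatMode[]): Array<[ChatMode, ChatMode]> {
  const active = new Set<ChatMode>(requested);
  const conflicts: Array<[ChatMode, ChatMode]> = [];
  for (const mode of Array.from(active)) {
    for (const other of INCOMPATIBLE[mode] ?? []) {
      if (active.has(other)) conflicts.push([mode, other]);
    }
  }
  return conflicts;
}
```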
    5.5 Schema (repositories/engine/schema.ts)

    The Schema is a prompt builder that:

    • Compiles system prompt from provider config + user personalization
    • Adds user message with citations + image attachments
    • Adds assistant message with reasoning, tool calls, tool results
    • Supports cache control (ephemeral marking) for Anthropic
    • Handles tool call ID mapping for multi-turn tool use
    • Serializes to JSON for LLM consumption

    6. Bonkers Image Generation Pipeline

    6.1 Architecture

    Frontend (bonkers/website):

    • /bonkers route with full image generation canvas
    • Model selection UI, prompt input, style presets, image editing tools
    • Generations stored in "creations" gallery
    • SSE streaming for real-time generation progress

    Backend (Wallflower System — /v1/wallflower/):

    • Named "Wallflower" internally (codename for Bonkers backend)
    • Separate endpoint group with its own auth + usage limits
    6.2 Supported Models

    | Model ID | Provider | Type |
    |---|---|---|
    | black-forest-labs/flux-schnell | Replicate / FAL | Text-to-Image |
    | black-forest-labs/flux-1.1-pro | Replicate / FAL | Text-to-Image |
    | black-forest-labs/flux-1.1-pro-ultra | Replicate / FAL | Text-to-Image |
    | black-forest-labs/flux-pro | Replicate / FAL | Text-to-Image |
    | recraft-ai/recraft-v3 | Replicate / FAL | Text-to-Image |
    | ideogram-ai/ideogram-v2-turbo | Replicate | Text-to-Image |
    | ideogram-ai/ideogram-v3-turbo | Replicate / FAL | Text-to-Image |
    | prunaai/hidream-l1-fast | Replicate / FAL | Text-to-Image |
    | fal-ai/flux-pro/v1/fill | FAL | Inpainting |
    | fal-ai/bria/eraser | FAL | Erasing |
    | fal-ai/clarity-upscaler | FAL | Upscaling |
    | fal-ai/bria/background/replace | FAL | Background Edit |
    | fal-ai/ghiblify | FAL | Style Transfer |
    | 851-labs/background-remover | Replicate | Background Removal |
    | google/imagen-4 | Replicate / FAL | Text-to-Image |
    | gemini-2.0-flash-exp | Google Vertex AI | Text-to-Image |
    | gpt-image-1-high/medium/low | OpenAI | Text-to-Image |
    | midjourney-v6.1-relax | GoAPI | Text-to-Image |
    | fal-ai/bytedance/seedream/v3 | FAL | Text-to-Image |

    6.3 Bonkers Abstract Models

    Bonkers presents simplified model names to users, mapped to concrete providers:

    | Abstract Name | Maps To | Type |
    |---|---|---|
    | bonkers-lite | prunaai/hidream-l1-fast | Fast generation |
    | bonkers-advance | fal-ai/ideogram/v3 | High quality |
    | bonkers-magic-fill | fal-ai/ideogram/v3/edit | Inpainting |
    | bonkers-remix | gpt-image-1-medium | Image remixing |
    | bonkers-upscale | fal-ai/clarity-upscaler | Upscaling |
    | bonkers-magic-erase | fal-ai/bria/eraser | Object removal |
    | bonkers-bg-edit | fal-ai/bria/background/replace | Background change |
    | bonkers-bg-erase | 851-labs/background-remover | Background removal |
    | bonkers-omni-edit | fal-ai/flux-pro/kontext/multi | Multi-image editing |
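The mapping lends itself to a typed lookup. The entries below are copied from the table; the constant and function names are hypothetical, and the real resolver may also pick between providers, which this sketch ignores.

```typescript
// Abstract Bonkers model names mapped to concrete provider model IDs.
const BONKERS_MODEL_MAP = {
  "bonkers-lite": "prunaai/hidream-l1-fast",
  "bonkers-advance": "fal-ai/ideogram/v3",
  "bonkers-magic-fill": "fal-ai/ideogram/v3/edit",
  "bonkers-remix": "gpt-image-1-medium",
  "bonkers-upscale": "fal-ai/clarity-upscaler",
  "bonkers-magic-erase": "fal-ai/bria/eraser",
  "bonkers-bg-edit": "fal-ai/bria/background/replace",
  "bonkers-bg-erase": "851-labs/background-remover",
  "bonkers-omni-edit": "fal-ai/flux-pro/kontext/multi",
} as const;

type BonkersModel = keyof typeof BONKERS_MODEL_MAP;

// Type guard so unknown strings from the client are rejected early.
function isBonkersModel(name: string): name is BonkersModel {
  return Object.prototype.hasOwnProperty.call(BONKERS_MODEL_MAP, name);
}

function resolveModel(name: string): string {
  if (!isBonkersModel(name)) throw new Error(`Unknown Bonkers model: ${name}`);
  return BONKERS_MODEL_MAP[name];
}
```

Keeping the map in one place means a provider swap (for example, moving bonkers-lite to a different backend) needs no client change.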
    6.4 Generation Flow
    Plain text
    Client POST /v1/wallflower/image-generation
      │
      ├─ authMiddleware (Firebase token)
      ├─ usageLimitsStreamEndpointsFactory
      │
      ├─ wallflowerImageGenerationController:
      │     ├─ Validate input (prompt, modelConfig, style, isPublic)
      │     ├─ Check if user exists in wallflower collection
      │     ├─ Check prompt for NSFW/flagged content
      │     ├─ Apply style modifiers to prompt
      │     │   └─ PRESET_STYLES_MAP: Auto, Anime, Realistic, Vintage, etc.
      │     ├─ Route to provider handler:
      │     │   ├─ Replicate: handleReplicateGeneration()
      │     │   └─ FAL AI: handleFalAIImageGeneration()
      │     │
      │     ├─ Each handler:
      │     │     ├─ Calls provider API (REST)
      │     │     ├─ Waits for webhook callback or polls
      │     │     ├─ Downloads generated images
      │     │     ├─ Uploads to GCS (wallflower-images bucket)
      │     │     └─ Returns formatted TImageGenerationPost
      │     │
      │     ├─ Emit SSE events: attachments, usage
      │     ├─ Calculate usage cost:
      │     │     └─ queryCost * numberOfImages (from MODEL_COSTS)
      │     ├─ Save image post to Firestore
      │     └─ Increment user usage
      │
      └─ Streams generated images back to client
    6.5 Image Editing Features

    The Wallflower system supports 9 image editing features:

    1. GENERATE — Text-to-image (core)
    2. INPAINT — Mask-based inpainting (magic fill)
    3. UPSCALE — Resolution enhancement
    4. ERASE — Object removal
    5. EDIT_BG — Background replacement
    6. OMNI_EDIT — Multi-modal editing
    7. REMIX_WITH_MULTI_IMAGE — Image remixing
    8. PRODUCT_PHOTOGRAPHY — Product-focused generation
    9. LOGO_WIZARD — Logo design
    6.6 Magic Prompt Enhancement

    Each feature type can use a Magic Prompt system:

    • Before generating, an LLM (GPT-4o-mini or GPT-4o) enhances the user's prompt
    • Models designated: MAGIC_PROMPT_MODELS map
    • System prompts defined per feature type (generate, remix, inpainting, bg-edit, ghibli, product-photography, logo-wizard)
    • The enhanced prompt is used as input to the image model
    6.7 Image Storage & Feed
    • GCS Bucket: wallflower-images
    • Firestore Collections:
      • wallflower/{userId}/images/ — user's image documents
      • Each image document: prompt, model, style, GCS URL, likes, visibility, metadata
    • Public Feed: Discoverable via /v1/wallflower/images endpoint
    • Likes System: Users can like images, tracked per image
    • Daily Questions: Generated prompts users can respond to

    7. Tool Orchestration System

    7.1 Architecture
    Plain text
    ToolOrchestrator.run()
      │
      ├─ 1. Build messages from chat history + system prompt
      ├─ 2. Get tool definitions from ToolRegistry
      ├─ 3. Call LLM → get response (streaming)
      │
      ├─ 4. If response has tool_calls:
      │     ├─ Decide which tools to execute (filterToolCallsByPolicy)
      │     ├─ Execute tools (parallel or sequential)
      │     │   ├─ Web Search (Tavily/SerpAPI/Firecrawl)
      │     │   ├─ RAG (knowledge vault)
      │     │   ├─ Image Generation
      │     │   ├─ Data Analysis (E2B sandbox)
      │     │   ├─ MCP (external plugins)
      │     │   ├─ Memory (mem0)
      │     │   ├─ Chatbot (marketplace bots)
      │     │   └─ Think (internal reasoning)
      │     ├─ Format tool results
      │     ├─ Engine: trim context window
      │     └─ Go to step 3 (loop)
      │
      └─ 5. No more tool calls → finalize response
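The loop can be sketched with a fake LLM and plain functions as tools. This is a simplified model under stated assumptions: messages are plain strings, policy filtering (filterToolCallsByPolicy) and context trimming are omitted, and all type and function names are illustrative.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LlmReply {
  text: string;
  toolCalls: ToolCall[];
}

type Tool = (args: Record<string, unknown>) => Promise<string>;
type Llm = (messages: string[]) => Promise<LlmReply>;

async function runOrchestrator(
  llm: Llm,
  tools: Record<string, Tool>,
  userMessage: string,
  maxIterations = 5,
): Promise<string> {
  const messages: string[] = [`user: ${userMessage}`];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await llm(messages); // step 3: call the LLM
    if (reply.toolCalls.length === 0) return reply.text; // step 5: finalize
    // Step 4: run the requested tools in parallel and feed results back.
    const results = await Promise.all(
      reply.toolCalls.map(async (call) => {
        const tool = tools[call.name];
        if (!tool) return `tool ${call.name}: not registered`;
        return `tool ${call.name}: ${await tool(call.args)}`;
      }),
    );
    messages.push(...results);
  }
  throw new Error(`No final answer after ${maxIterations} iterations`);
}
```

The iteration cap is what keeps a model that never stops requesting tools from looping indefinitely.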
    7.2 Tool Registry

    Available tools are registered in a ToolRegistry instance, keyed by function name:

    | Tool | Description |
    |---|---|
    | webSearch | Internet search via Tavily/SerpAPI/Firecrawl with academic/social/YouTube focus modes |
    | rag | Knowledge vault retrieval (LlamaIndex) |
    | imageGen | Image generation via Wallflower |
    | dataAnalysis | Code execution in E2B sandbox |
    | mcp | Model Context Protocol (external plugins) |
    | memory | User memory (mem0) |
    | think | Internal chain-of-thought reasoning |
    | craft | Canvas/craft creation |
    | chatbot | Marketplace chatbot queries |
    | deepResearchWebSearch | Web search for deep research |
    | createTodo / getTodo / updateTodo / markTodo / dumpFinding / feedbackQuestions / getSearchHistory / reportGeneration / researcherAgent | Deep research agent tools |

    7.3 Agent Configurations

    Different agent configurations determine behavior:

    • MAIN_THREAD: Standard chat with web search, RAG, image gen, etc.
    • RESEARCHER: Deep research agent with recursive search + report generation
    • DEEP_RESEARCH_SUPERVISOR: Oversees multiple researcher agents
    • Each config specifies: max tool calls, max iterations, parallel tool call support, model, tool choice policy
    7.4 Tool Call Policies
    • Tool calls filtered based on user plan (free users may get limited tools)
    • Parallel tool call support depends on the model (some models support parallel function calling)
    • Non-iterative tools (like image gen) can run in parallel
    • Error handling: max 3 retries per tool call, exponential backoff
    • Tool results summarized if they exceed TOOL_RESULTS_CONTEXT_SUMMARY_TOKENS
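The retry rule (max 3 retries, exponential backoff) can be sketched as a generic wrapper. The base delay of 200 ms, the doubling factor, and the `withRetries` name are assumptions; `sleep` is injectable so the delays can be observed without waiting.

```typescript
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Delay doubles each time: base, 2x base, 4x base.
      if (attempt < maxRetries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```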

    8. Database & Storage

    8.1 Firebase Firestore (merlindb)

    Database: Firebase project foyer-work, database merlindb

    Collections:

    | Collection | Path | Purpose |
    |---|---|---|
    | customers/{uid} | customers/{uid} | User profile, plan, features, usage, settings |
    | customers/{uid}/chats/{chatId} | Chats per user | Chat metadata |
    | customers/{uid}/chats/{chatId}/thread/{docId} | Thread messages | Individual messages |
    | global_chats/{chatId} | global_chats/{chatId} | Project chats (global namespace) |
    | sharedChatsV2/{chatId} | sharedChatsV2/{chatId} | Publicly shared chats |
    | projects/{projectId} | projects/{projectId} | Project workspace |
    | projects/{projectId}/members/{uid} | Membership | Project members + roles |
    | wallflower/{userId}/images/{imageId} | Images | Generated image posts |
    | chatbots/{chatbotId} | chatbots/{chatbotId} | Marketplace chatbot definitions |
    | notifications/{uid}/tokens/{token} | Push tokens | FCM notification tokens |
    | vault/{uid}/items/{itemId} | vault/{uid}/items/{itemId} | Knowledge vault items |
    | attachments/{attachmentId} | attachments/{attachmentId} | File attachments metadata |
    | canvas/{canvasId} | canvas/{canvasId} | Craft canvas content |
    | surveys/{surveyId} | surveys/{surveyId} | User survey responses |
    | connectedApps/{appId} | connectedApps/{appId} | OAuth connected apps |
    | mcp/{connectionId} | mcp/{connectionId} | MCP server connections |
    | memories/{uid}/memories/{memoryId} | Memories | User memories (mem0 backup) |

    8.2 Redis

    Purpose: Caching, pub/sub inter-request communication, rate limiting

    Channels:

    • STOP_GENERATING — stop generation signal (chatId + messageId)
    • ARCANE_MCP_CHANNEL:{ircId} — MCP inter-request communication
    • IMPORT_CHATS — chat import queue
    • ATTACHMENT_QUEUE — attachment processing queue
    • GOOGLE_API_KEY — Google API key pool

    Usage:

    • Rate limiting via rate-limiter-flexible
    • TTL cache for stop-generation flags
    • Pub/sub for cross-instance MCP communication
    8.3 Google Cloud Storage

    Buckets:

    • Wallflower images: wallflower-images
    • File attachments: default bucket for user uploads
    • PDF processing: temporary storage for PDF chat
    8.4 LlamaIndex (RAG Service)

    Dedicated Cloud Run service for vector embeddings and semantic search:

    • Upsert text/GCS files → vector store
    • Query similar text with BM25 + embeddings
    • Process file attachments (PDF, DOCX, etc.)
    • Namespace per user + type (chats, vault, attachments)
    • Uses OpenAI text-embedding-3-small and text-embedding-3-large
    • Fallback from large to small embedding model on failure
    8.5 BigQuery

    Usage analytics logging:

    • Per-request usage events
    • Model usage breakdowns
    • Feature adoption metrics
    • Written via Google Cloud Tasks batched writes

    9. External Integrations

    9.1 AI Providers

    | Provider | Services Used | Auth Method |
    |---|---|---|
    | OpenAI | GPT-4, GPT-4o, GPT-4o-mini, o1, o3, DALL-E, GPT Image 1, Embeddings | API key (embedded) |
    | Anthropic | Claude 3.5 Sonnet, Claude 3.7 Sonnet | API key via Rune |
    | Google AI | Gemini 2.0 Flash, Gemini 2.5 Pro, Vertex AI Imagen | API key pool |
    | Fireworks AI | DeepSeek V3, Llama 3, Qwen, Mistral | API key via Rune |
    | FAL AI | Flux, Ideogram, Recraft, Bria, HiDream, Seedream, Ghiblify | API key |
    | Replicate | Flux, Ideogram, Recraft, HiDream, Background Remover | API key |
    | Ideogram | Ideogram V2/V3, inpainting | API key |
    | GoAPI | Midjourney v6.1 | API key |
    | Azure | Merlin Magic (custom ML models for image/web classification) | API key |

    9.2 Infrastructure Services

    | Service | Usage |
    |---|---|
    | Firebase Auth | User authentication (email, Google, anonymous) |
    | Firebase Cloud Messaging | Web push notifications |
    | Firebase Cloud Functions | Stripe portal, reCAPTCHA, email update |
    | Redis (ioredis) | Cache, pub/sub, rate limiting |
    | SendGrid | Transactional emails |
    | Mailchimp | Email marketing + transactional |
    | Stripe | Subscription billing |
    | TawkTo | Live chat support |
    | PostHog | Product analytics (opt-in) |
    | Google Tag Manager | GA4 events, TikTok/Facebook pixels |
    | Sentry | Error tracking |
    | BigQuery | Usage analytics warehouse |
    | Google Cloud Tasks | Async task queue |
    | Firecrawl | Web scraping for deep research |
    | SerpAPI | Google search results API |
    | Tavily | AI-optimized web search API |
    | E2B | Code interpreter sandbox (data analysis) |
    | mem0 | User memory/profile |
    | Composio | External app integrations (2-way sync) |
    | Raindrop AI | Analytics/signals platform |
    | Undetectable AI | AI text humanization |
    | Copyleaks | Plagiarism detection |
    | ImageKit | Image CDN optimization |
    | Photon | Image metadata service |

    9.3 Microservices Architecture

    Several companion services run alongside Arcane:

    | Service | URL | Purpose |
    |---|---|---|
    | LlamaIndex | merlin-llama-index-*.run.app | Vector embeddings + RAG |
    | Tokenizer | merlin-tokenizer-*.run.app | Token counting |
    | Session Manager | session.getmerlin.in | Cookie chunked sessions |
    | UAM | uam.getmerlin.in | User account management |
    | File Processor | file-processor-*.run.app | Document text extraction |
    | Readable Text | merlin-readable-text-*.run.app | HTML→clean text |
    | Scribe | scribe.siddhartha-5c5.workers.dev | HTML parsing |
    | Whisper | merlin-backend-whisper-*.run.app | Speech-to-text |
    | MCP Servers | mcp-servers-*.run.app | MCP server registry |
    | Spells | spells-*.run.app | Spell/utility service |
    | Rune | rune.siddhartha-5c5.workers.dev | LLM proxy (Cloudflare Worker) |

    10. Pricing & Usage Limits

    10.1 Plan Tiers

    | Plan | Key Features |
    |---|---|
    | Free | Limited chat, basic models, web search, 102 queries/month |
    | Pro | Full access, advanced models, deep research, attachments |
    | Teams | Pro features + team collaboration, admin controls |
    | Bonkers Basic | Image generation only (limited) |
    | Bonkers Pro | Image generation full access |
    | AppSumo | Lifetime deal variants |
    | Apprentice Sumo | Limited lifetime |
    | Friend/Family | Discounted internal plans |
    | Starter | Entry-level paid |

    10.2 Usage Tracking System

    Each user has a features map in Firestore with per-feature usage objects:

    • usage: current count
    • limit: max allowed
    • resetsAt: timestamp for reset
    • resetInterval: daily/monthly
    • topUps: temporary bonus allocations with expiry

    Usage limits middleware checks:

    1. Daily limits (resets at end of day)
    2. Monthly limits (resets at end of month)
    3. Feature-specific limits (e.g., image generation has different limits than chat)
    4. Top-up usage (consumed before regular quota)
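
The checks above can be sketched as a single pure function. This is a minimal illustration, not the actual middleware: the `FeatureUsage` and `TopUp` shapes, the `consume` and `nextReset` names, and the choice of UTC day and month boundaries are all assumptions made for the example.

```typescript
// Hypothetical shape of one entry in a user's `features` map. Field names
// follow the list above; the exact types are assumptions.
type ResetInterval = "daily" | "monthly";

interface TopUp {
  remaining: number; // bonus units still unspent
  expiresAt: number; // epoch milliseconds
}

interface FeatureUsage {
  usage: number;
  limit: number;
  resetsAt: number; // epoch milliseconds
  resetInterval: ResetInterval;
  topUps: TopUp[];
}

// Start of the next UTC day or month after `now` (UTC is an assumption).
function nextReset(now: number, interval: ResetInterval): number {
  const d = new Date(now);
  return interval === "daily"
    ? Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + 1)
    : Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 1);
}

// Returns whether the request is allowed, plus the updated record.
function consume(
  feature: FeatureUsage,
  cost: number,
  now: number,
): { allowed: boolean; feature: FeatureUsage } {
  // Copy so the caller's record is not mutated.
  const next: FeatureUsage = {
    ...feature,
    topUps: feature.topUps.map((t) => ({ ...t })),
  };

  // Roll the window over if the reset time has passed.
  if (now >= next.resetsAt) {
    next.usage = 0;
    next.resetsAt = nextReset(now, next.resetInterval);
  }

  // Drop expired or exhausted top-ups.
  next.topUps = next.topUps.filter((t) => t.expiresAt > now && t.remaining > 0);

  const topUpTotal = next.topUps.reduce((sum, t) => sum + t.remaining, 0);
  const regularLeft = Math.max(0, next.limit - next.usage);
  if (cost > topUpTotal + regularLeft) {
    return { allowed: false, feature: next };
  }

  // Top-ups are consumed before the regular quota.
  let remainingCost = cost;
  for (const t of next.topUps) {
    const take = Math.min(t.remaining, remainingCost);
    t.remaining -= take;
    remainingCost -= take;
    if (remainingCost === 0) break;
  }
  next.usage += remainingCost;
  return { allowed: true, feature: next };
}
```

Keeping the function pure (it returns a new record rather than writing to Firestore) makes the quota rules easy to test in isolation; a real implementation would presumably wrap the read and write in a transaction.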
    10.3 Cost Calculation

    LLM costs are calculated from MODEL_COSTS in LLMConstants.ts:

    • Each model has: dollarCost (input/output per token), queryCost (abstract units)
    • Image generation: queryCost * numberOfImages
    • Token costs tracked separately for: input, output (cached), output (reasoning), output (non-reasoning)
    • Usage incremented via incrementUserUsage() in utilities/usage.ts (818 lines of business logic)
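
As a rough illustration of the two cost dimensions, the sketch below computes a dollar cost from token counts and a query cost for image generation. The `ModelCost` shape, the `EXAMPLE_MODEL_COSTS` table, the model names, and every number in it are invented for the example; the real `MODEL_COSTS` structure may differ.

```typescript
// Hypothetical entry shape; the real MODEL_COSTS in LLMConstants.ts may differ.
interface ModelCost {
  dollarCost: { inputPerToken: number; outputPerToken: number };
  queryCost: number; // abstract units charged against the user's quota
}

// Illustrative values only, not real prices or real model names.
const EXAMPLE_MODEL_COSTS: Record<string, ModelCost> = {
  "example-chat-model": {
    dollarCost: { inputPerToken: 0.000001, outputPerToken: 0.000002 },
    queryCost: 1,
  },
  "example-image-model": {
    dollarCost: { inputPerToken: 0, outputPerToken: 0 },
    queryCost: 4,
  },
};

function lookupCost(model: string): ModelCost {
  const cost = EXAMPLE_MODEL_COSTS[model];
  if (!cost) throw new Error(`Unknown model: ${model}`);
  return cost;
}

// Dollar cost of one LLM call from its token counts.
function dollarCostFor(model: string, inputTokens: number, outputTokens: number): number {
  const { dollarCost } = lookupCost(model);
  return inputTokens * dollarCost.inputPerToken + outputTokens * dollarCost.outputPerToken;
}

// Image generation: queryCost * numberOfImages, as stated above.
function imageQueryCost(model: string, numberOfImages: number): number {
  return lookupCost(model).queryCost * numberOfImages;
}
```

For instance, `imageQueryCost("example-image-model", 3)` charges 12 units against the quota under these made-up values.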

    11. MCP (Model Context Protocol) System

    11.1 Architecture

    MCP is the plugin system that allows external tools to be used within Merlin:

    1. MCP Server Registry — Cloud Run service listing available MCP servers
    2. User Connections — Each user can connect to multiple MCP servers
    3. Inter-Request Communication (IRC) — Redis pub/sub for cross-instance MCP communication
    4. IRC Protocol — JSON-RPC 2.0 messages over Redis pub/sub channels
    11.2 MCP Flow
    1. User connects MCP server → stored in Firestore `mcp` collection
    2. Chat request with MCP mode → threadPreware loads MCP config
    3. unifiedController → getMCPResults() → calls MCP servers in parallel
    4. MCP results injected into LLM prompt
    5. LLM response may include MCP tool calls → executed via IRC
    6. Results streamed back via SSE
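
Steps 3 and 4 of the flow above (parallel calls, then prompt injection) might look roughly like the following. The `McpServer` and `McpResult` types and both function names are hypothetical; the real `getMCPResults()` signature is not shown in this guide.

```typescript
// Hypothetical types: neither the server shape nor these helpers come from
// the Arcane codebase.
interface McpServer {
  name: string;
  call: (query: string) => Promise<string>;
}

interface McpResult {
  server: string;
  ok: boolean;
  output: string;
}

// Calls every connected server in parallel. Each call catches its own
// failure, so one broken server does not sink the others.
async function collectMcpResults(servers: McpServer[], query: string): Promise<McpResult[]> {
  return Promise.all(
    servers.map(async (s): Promise<McpResult> => {
      try {
        return { server: s.name, ok: true, output: await s.call(query) };
      } catch (err) {
        return { server: s.name, ok: false, output: String(err) };
      }
    }),
  );
}

// Folds the successful results into the prompt sent to the LLM.
function injectIntoPrompt(prompt: string, results: McpResult[]): string {
  const context = results
    .filter((r) => r.ok)
    .map((r) => `[${r.server}] ${r.output}`)
    .join("\n");
  return context ? `${prompt}\n\nTool context:\n${context}` : prompt;
}
```

Catching per call rather than around `Promise.all` is the design point here: a chat request should still proceed with partial tool context when a single MCP server is unavailable.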
    11.3 IRC Implementation

    InterRequestCommunicator class:

    • addSubscription(ircId, callback) — subscribe to Redis channel
    • sendMessage(ircId, message) — publish to Redis channel
    • sendError(ircId, error) — publish error back
    • Channel format: ARCANE_MCP_CHANNEL:{ircId}
    • Required for multi-instance Cloud Run deployments
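
To make the message flow concrete, here is a small sketch of the three methods listed above, using an in-memory pub/sub in place of Redis so it runs standalone. Only the method names and the channel format come from this guide. The `PubSub` interface, the `InMemoryPubSub` class, the reuse of `ircId` as the JSON-RPC `id`, and the `-32000` error code are assumptions.

```typescript
type Handler = (payload: string) => void;

// Minimal pub/sub interface; in production this role is played by Redis.
interface PubSub {
  subscribe(channel: string, handler: Handler): void;
  publish(channel: string, payload: string): void;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryPubSub implements PubSub {
  private handlers = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  publish(channel: string, payload: string): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(payload);
  }
}

// JSON-RPC 2.0 response envelopes (success or error).
type JsonRpcMessage =
  | { jsonrpc: "2.0"; id: string; result: unknown }
  | { jsonrpc: "2.0"; id: string; error: { code: number; message: string } };

const channelFor = (ircId: string): string => `ARCANE_MCP_CHANNEL:${ircId}`;

class InterRequestCommunicatorSketch {
  private readonly bus: PubSub;

  constructor(bus: PubSub) {
    this.bus = bus;
  }

  addSubscription(ircId: string, callback: (message: JsonRpcMessage) => void): void {
    this.bus.subscribe(channelFor(ircId), (payload) => {
      callback(JSON.parse(payload) as JsonRpcMessage);
    });
  }

  sendMessage(ircId: string, result: unknown): void {
    // Assumption: the ircId doubles as the JSON-RPC id.
    const message: JsonRpcMessage = { jsonrpc: "2.0", id: ircId, result };
    this.bus.publish(channelFor(ircId), JSON.stringify(message));
  }

  sendError(ircId: string, error: Error): void {
    // -32000 sits in the range JSON-RPC 2.0 reserves for server-defined errors.
    const message: JsonRpcMessage = {
      jsonrpc: "2.0",
      id: ircId,
      error: { code: -32000, message: error.message },
    };
    this.bus.publish(channelFor(ircId), JSON.stringify(message));
  }
}
```

Two communicator objects sharing one bus model two Cloud Run instances sharing one Redis: the instance holding the SSE connection subscribes on the request's channel, and whichever instance finishes the tool call publishes to it.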

    12. Projects & Collaboration

    12.1 Project Structure

    Projects are workspaces that group chats and members:

    • projects/{projectId} — Project document with name, visibility, settings
    • projects/{projectId}/members/{uid} — Member role (owner, admin, member, viewer)
    • Chats stored in global_chats collection with projectId metadata
    • Project sharing: public, team-only, or private with secret link
    12.2 Permissions System

    Enforced via projectPermissionService:

    • CREATE_CHATS — member+
    • VIEW_CHATS — all members
    • EDIT_CHATS — admin+
    • DELETE_CHATS — admin+
    • INVITE_MEMBERS — admin+
    • MANAGE_PROJECT — owner only
    • 15 permission actions total
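
One simple way to express the "role or higher" rules above is a rank table plus a minimum role per action. The sketch covers only the six actions listed here; the `can` helper, the rank numbers, and the table layout are illustrative and not taken from `projectPermissionService`.

```typescript
type Role = "viewer" | "member" | "admin" | "owner";

type Action =
  | "CREATE_CHATS"
  | "VIEW_CHATS"
  | "EDIT_CHATS"
  | "DELETE_CHATS"
  | "INVITE_MEMBERS"
  | "MANAGE_PROJECT";

// Higher rank means more privileges (ordering assumed from the role list).
const ROLE_RANK: Record<Role, number> = { viewer: 0, member: 1, admin: 2, owner: 3 };

// Minimum role per action, mirroring the list above.
const MINIMUM_ROLE: Record<Action, Role> = {
  CREATE_CHATS: "member",
  VIEW_CHATS: "viewer",
  EDIT_CHATS: "admin",
  DELETE_CHATS: "admin",
  INVITE_MEMBERS: "admin",
  MANAGE_PROJECT: "owner",
};

// `role` is undefined when the user is not a member of the project.
function can(role: Role | undefined, action: Action): boolean {
  if (role === undefined) return false;
  return ROLE_RANK[role] >= ROLE_RANK[MINIMUM_ROLE[action]];
}
```

For example, `can("member", "CREATE_CHATS")` is true while `can("admin", "MANAGE_PROJECT")` is false.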
    12.3 Sharing
    • Public projects: Accessible via secret token (getProjectUsingSecret)
    • Publishing: Auto-generate SEO content from project chats
    • Forking: Fork entire projects or individual chats
    • Chat sharing: Individual chats shareable via link
    • Project invitations: Email-based with accept/dismiss flow

    13. Deep Research

    13.1 Architecture

    Deep Research is a multi-agent research system:

    1. Supervisor Agent (deepResearchAgent):

      • Takes user query
      • Generates research plan with sub-queries
      • Spawns researcher agents
      • Collects findings
      • Generates comprehensive report
    2. Researcher Agents (researcherAgent):

      • Execute individual search queries
      • Use web search, Firecrawl, SerpAPI
      • Extract learnings from each source
      • Rate relevance of results
      • Store findings
    3. Tool Set:

      • createTodo / getTodo / updateTodo / markTodo — task tracking
      • dumpFinding — stores research findings
      • feedbackQuestions — clarification questions
      • getSearchHistory — avoid duplicate searches
      • reportGeneration — final report compilation
      • researcherAgent — spawn sub-researchers
    13.2 Pipeline
    User Query
      → Generate Research Plan (LLM)
      → Generate SERP Queries (LLM)
      → Execute Searches (Tavily/SerpAPI/Firecrawl)
      → Evaluate Result Relevance (LLM)
      → Extract Learnings (LLM)
      → Generate Follow-up Queries
      → Repeat until depth satisfied
      → Generate Final Report (LLM)
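
The loop structure of this pipeline can be sketched with the LLM and search calls injected as dependencies. Everything named here (`ResearchDeps`, `deepResearch`, `maxDepth`) is hypothetical, and the calls run sequentially for readability; the actual agents may well run searches concurrently.

```typescript
// All five dependencies are hypothetical stand-ins for the LLM and search calls.
interface ResearchDeps {
  generateQueries: (topic: string, learnings: string[]) => Promise<string[]>;
  search: (query: string) => Promise<string[]>;
  isRelevant: (topic: string, result: string) => Promise<boolean>;
  extractLearning: (result: string) => Promise<string>;
  writeReport: (topic: string, learnings: string[]) => Promise<string>;
}

async function deepResearch(
  topic: string,
  maxDepth: number,
  deps: ResearchDeps,
): Promise<string> {
  const learnings: string[] = [];
  const seenQueries = new Set<string>();

  for (let depth = 0; depth < maxDepth; depth += 1) {
    // Follow-up queries are generated from what has been learned so far;
    // queries that already ran are skipped.
    const proposed = await deps.generateQueries(topic, learnings);
    const queries = proposed.filter((q) => !seenQueries.has(q));
    if (queries.length === 0) break;

    for (const query of queries) {
      seenQueries.add(query);
      const results = await deps.search(query);
      for (const result of results) {
        // Only relevant results contribute a learning.
        if (await deps.isRelevant(topic, result)) {
          learnings.push(await deps.extractLearning(result));
        }
      }
    }
  }

  return deps.writeReport(topic, learnings);
}
```

Injecting the dependencies keeps the control flow testable with stubs, which matters because each real step is a paid LLM or search API call.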
    13.3 Similarity & Deduplication
    • Search results rated for relevance by LLM
    • Deduplication via URL and content similarity
    • BM25 + embedding-based retrieval for existing findings
    • Learning extraction with structured output
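
A small sketch of the URL and content deduplication idea follows. It swaps in word-set Jaccard similarity as a cheap stand-in for content similarity; it does not implement BM25 or embedding retrieval. The `Finding` shape, the `0.8` threshold, and the normalization rules are assumptions.

```typescript
interface Finding {
  url: string;
  content: string;
}

// Canonical form: lowercase host, no fragment, no utm_* params, no trailing slash.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  for (const key of Array.from(u.searchParams.keys())) {
    if (key.startsWith("utm_")) u.searchParams.delete(key);
  }
  const path = u.pathname.replace(/\/+$/, "");
  const query = u.searchParams.toString();
  return `${u.protocol}//${u.host}${path}${query ? `?${query}` : ""}`;
}

function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Keeps the first finding for each URL and drops near-identical content.
function dedupeFindings(findings: Finding[], threshold = 0.8): Finding[] {
  const kept: Array<{ finding: Finding; url: string; words: Set<string> }> = [];
  for (const finding of findings) {
    const url = normalizeUrl(finding.url);
    const words = wordSet(finding.content);
    const duplicate = kept.some(
      (k) => k.url === url || jaccard(k.words, words) >= threshold,
    );
    if (!duplicate) kept.push({ finding, url, words });
  }
  return kept.map((k) => k.finding);
}
```

The pairwise comparison is quadratic in the number of findings, which is acceptable for a single research run but would need an index for larger collections.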

    14. Deployment & Infrastructure

    14.1 Bonkers Frontend
    • Platform: Vercel
    • Deploy gate: deploy.sh — only develop and review branches deploy
    • Build: turbo run build (Turborepo orchestration)
    • Preview: Vercel preview deployments for PRs
    • Production: Manual deploy from main
    14.2 Merlin Arcane Backend
    • Platform: Google Cloud Run (serverless container)
    • Region: us-west1
    • Deployment: gcloud run deploy arcane --source .
    • Dev instance: arcane-dev for staging
    • Dockerfile: Multi-stage build with tsup bundling
    • Scaling: Cloud Run autoscaling (0 to N instances)
    • Memory: Node.js heap capped at 6144 MB (--max-old-space-size=6144)
    14.3 Environment Variables

    Critical vars (40+ total):

    • NEXT_PUBLIC_ARCANE_BASE_URL — API endpoint
    • NEXT_PUBLIC_UAM_BASE_URL — User account management
    • NEXT_PUBLIC_CLOUD_FUNCTION_BASE — Firebase functions
    • NEXT_PUBLIC_SESSION_MANAGER — Session cookie server
    • NEXT_PUBLIC_GTM_ID — Google Tag Manager
    • NEXT_PUBLIC_POSTHOG_KEY — PostHog analytics
    • SENTRY_AUTH_TOKEN / SENTRY_DSN — Error tracking
    • NEXTAUTH_SECRET / NEXTAUTH_URL — Auth.js config
    • FIREBASE_* — Firebase config
    • ARCANE_* — Arcane backend config
    • CMS_* — CMS (Strapi) config
    • REDISHOST / REDISPORT — Redis connection
    • K_REVISION — Cloud Run revision identifier
    14.4 CI/CD
    • Husky: Pre-commit hooks
    • ESLint: @antfu/eslint-config + perfectionist sort-imports
    • Prettier: With @trivago/prettier-plugin-sort-imports + Tailwind plugin
    • syncpack: Dependency version alignment across monorepo
    • pnpm: 9.15.5 with shamefully-hoist=true

    Key Architectural Decisions (Why It's Built This Way)

    1. express-zod-api over tRPC/GraphQL: Auto-generated OpenAPI docs + TypeScript types from Zod schemas. Enables both internal use and external API clients.

    2. SSE over WebSockets: Simpler infrastructure (no sticky sessions needed on Cloud Run), HTTP/2 compatible, works through proxies. Reconnection handled client-side via @microsoft/fetch-event-source.

    3. AsyncLocalStorage for request context: Avoids passing req/res through every function. Each request gets its own isolated store that follows it across async calls. Downside: makes code harder to unit test.

    4. Rune (Cloudflare Worker) as LLM proxy: Centralizes API key management, enables consistent streaming format across providers, handles rate limiting and fallbacks.

    5. Firestore over PostgreSQL or another relational database: Serverless DB with real-time sync capabilities. Schema-on-read design enables rapid iteration. Subcollections provide natural hierarchy (chat→thread→messages).

    6. LlamaIndex as separate microservice: Dedicated RAG service with GPU/TPU availability for embeddings. Decouples vector infrastructure from main API.

    7. Redis pub/sub for IRC: Cloud Run instances can't communicate directly. Redis pub/sub provides cross-instance messaging for MCP responses without sticky sessions.

    8. Cookie chunking: Browsers limit each cookie to ~4KB. Merlin's JWT exceeds this, so it's split across 4 __Secure-merlin-session cookies.

    9. Context window engine: Proactive context management (trimming + summarization) rather than reactive truncation. Ensures optimal LLM response quality within token limits.

    10. Tool orchestrator pattern: Separates LLM calling from tool execution, enabling recursive tool use, parallel execution, and agent behavior without framework lock-in.
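
Decision 8 (cookie chunking) is easy to illustrate in isolation. The sketch below splits a long token into numbered cookies and reassembles it. The `.0`, `.1`, ... suffix scheme and the 3800-character chunk size are assumptions for illustration; the Session Manager's real naming and sizing are not documented here.

```typescript
const COOKIE_BASE_NAME = "__Secure-merlin-session";

// Assumption: leaves headroom under ~4 KB for the cookie name and attributes.
// A JWT is plain ASCII, so string length equals byte length.
const CHUNK_SIZE = 3800;

// Splits a token into numbered cookies (hypothetical ".N" suffix).
function splitIntoCookies(value: string, chunkSize: number = CHUNK_SIZE): Map<string, string> {
  const cookies = new Map<string, string>();
  for (let offset = 0, n = 0; offset < value.length; offset += chunkSize, n += 1) {
    cookies.set(`${COOKIE_BASE_NAME}.${n}`, value.slice(offset, offset + chunkSize));
  }
  return cookies;
}

// Reassembles the token by reading chunks in order until one is missing.
function joinCookies(cookies: Map<string, string>): string {
  const parts: string[] = [];
  for (let n = 0; cookies.has(`${COOKIE_BASE_NAME}.${n}`); n += 1) {
    parts.push(cookies.get(`${COOKIE_BASE_NAME}.${n}`) ?? "");
  }
  return parts.join("");
}
```

With these assumed sizes, a 15,000-character token lands in four cookies, which is consistent with the four-cookie split described above.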