Bonkers (Wallflower) — End-to-End Project Walkthrough Script

Purpose: Script for when a senior engineer/recruiter asks you to walk through the Bonkers image generation platform. Speak naturally — think of it as your director's commentary. You built every piece. Here's how to tell that story without showing code (NDA).


Opening Frame

"Let me start with the big picture, then drill into every layer.

Bonkers is the AI image generation platform at getmerlin.in. It was originally a standalone product that got consolidated into the Merlin monorepo. Internally, the image generation system is codenamed Wallflower — that's the name you'll see across the backend.

On the surface, Bonkers lets users:

  • Generate images from text prompts using 20+ models
  • Edit images — inpainting, background replacement, object removal, upscaling
  • Apply style presets and templates (Ghibli style, Minecraft style, product photography, etc.)
  • Browse a public gallery and like/share creations

    Under the hood, it's a cross-provider image generation engine that abstracts 5 different AI providers behind a unified API, with automatic failover, content moderation, magic prompt enhancement, and a GCS-based storage pipeline.

    The system handles 50K+ image generations per month, with 8 feature types and 20+ supported models."


    Architecture Overview

    "The Bonkers system splits across the monorepo:

    Frontend (bonkers/apps/website/):

    • The /bonkers route group — canvas UI, model selection, prompt input, gallery, style presets
    • SSE streaming for real-time generation progress
    • Zustand store for generation state (pending, streaming, complete, error)

    Backend (merlin-arcane/src/server/):

    • The Wallflower endpoint group at /v1/wallflower/
    • unified-generation.controller.ts — the main controller that routes to providers
    • Helper files for each provider: replicate/, fal-ai/, ideogram/, goapi/, leonardo/, dalle.ts, google-image-gen/, openai-image-gen.ts
    • Shared utilities: helpers/common.ts for prompt enhancement, image formatting, moderation
    • Firestore collections: wallflower/{userId}/images/{imageId} for image posts
    • GCS bucket: wallflower-images for all generated images

    The architecture follows a unified pipeline pattern: all 8 feature types go through the same request flow, with different routing based on the model and feature type."


    The Abstract Model Mapping — The Core Architectural Pattern

    "This is the most important design decision in the system.

    Users never see provider-specific model names like black-forest-labs/flux-schnell or fal-ai/ideogram/v3. Instead, they see abstract model IDs:

    | User-Facing Name | Abstract Model ID | Resolves To | Cost |
    |---|---|---|---|
    | Bonkers Lite | bonkers-lite | prunaai/hidream-l1-fast | 20 queries |
    | Bonkers Advance | bonkers-advance | fal-ai/ideogram/v3 | 120 queries |
    | Bonkers Upscale | bonkers-upscale | fal-ai/clarity-upscaler | 10 queries |
    | Bonkers Magic Fill | bonkers-magic-fill | fal-ai/ideogram/v3/edit | 75 queries |
    | Bonkers Magic Erase | bonkers-magic-erase | fal-ai/bria/eraser | 10 queries |
    | Bonkers BG Edit | bonkers-bg-edit | fal-ai/bria/background/replace | 10 queries |

    This resolution happens at the Zod schema layer via .transform(). When the request body comes in with abstractModelId: "bonkers-advance", the schema transforms it to modelConfig.modelId: "fal-ai/ideogram/v3" before the controller ever sees it.

    The benefits of this abstraction:

    1. Provider swaps without UI changes — change the Zod .transform() map, zero frontend changes
    2. Model deprecation resilience — when a provider kills a model, just update the map entry, users never notice
    3. A/B testing — could route 10% of bonkers-advance traffic to a new model by changing the map
    4. Unified pricing — bonkers-advance is priced at 120 queries regardless of what model it resolves to, decoupling cost from provider pricing

    The one tricky part: the pricing override. bonkers-advance costs 120 queries even though Ideogram V3 costs 75. There's an explicit check in the controller: if abstractModelId === "bonkers-advance", override usageConfig.queries = numberOfImages * 120. This intentionally decouples abstract model pricing from underlying provider cost — it's a premium-tier pricing mechanism that also acts as a rate limiter."
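A whiteboard-style sketch of the mapping idea, not the production code. The script says the real resolution lives in a Zod `.transform()`; to stay dependency-free this sketch uses a plain lookup function instead. The names `ABSTRACT_MODELS`, `resolveAbstractModel`, and `usageQueries` are hypothetical, and the map stores the user-facing price directly rather than reproducing the controller's explicit override. Values are copied from the table above.

```typescript
// Hypothetical sketch: abstract model ID -> concrete provider model + user-facing price.
interface AbstractModel {
  modelId: string;         // concrete provider model the request is routed to
  queriesPerImage: number; // price charged to the user, independent of provider cost
}

const ABSTRACT_MODELS = new Map<string, AbstractModel>([
  ["bonkers-lite", { modelId: "prunaai/hidream-l1-fast", queriesPerImage: 20 }],
  ["bonkers-advance", { modelId: "fal-ai/ideogram/v3", queriesPerImage: 120 }],
  ["bonkers-upscale", { modelId: "fal-ai/clarity-upscaler", queriesPerImage: 10 }],
  ["bonkers-magic-fill", { modelId: "fal-ai/ideogram/v3/edit", queriesPerImage: 75 }],
  ["bonkers-magic-erase", { modelId: "fal-ai/bria/eraser", queriesPerImage: 10 }],
  ["bonkers-bg-edit", { modelId: "fal-ai/bria/background/replace", queriesPerImage: 10 }],
]);

// Swapping providers means editing one map entry; callers keep using the abstract ID.
function resolveAbstractModel(abstractModelId: string): AbstractModel {
  const entry = ABSTRACT_MODELS.get(abstractModelId);
  if (entry === undefined) {
    throw new Error(`Unknown abstract model: ${abstractModelId}`);
  }
  return entry;
}

// Usage is charged at the abstract model's price times the number of images.
function usageQueries(abstractModelId: string, numberOfImages: number): number {
  return resolveAbstractModel(abstractModelId).queriesPerImage * numberOfImages;
}
```

With this shape, two bonkers-advance images cost 240 queries regardless of which concrete model the entry points to.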


    The 8 Feature Types

    "Bonkers supports 8 distinct feature types, all routed through the same unified pipeline:

    1. GENERATE — Standard text-to-image. The core feature. Routes to any model based on selection.

    2. INPAINT — Mask-based inpainting (magic fill). The user draws a mask over an area and describes what should replace it. Providers handle this differently: FAL expects a mask_url parameter, Replicate needs an inverted mask (I built getInvertedMaskUrl() using the Photon microservice to invert the colors because Replicate interprets the mask opposite to FAL).

    3. UPSCALE — Resolution enhancement. Takes an existing image and increases its resolution using fal-ai/clarity-upscaler.

    4. ERASE — Object removal. User selects an object, and fal-ai/bria/eraser removes it with AI-powered inpainting.

    5. EDIT_BG — Background replacement. fal-ai/bria/background/replace detects the subject and replaces the background based on a prompt.

    6. OMNI_EDIT — Multi-image editing. Uses fal-ai/flux-pro/kontext/multi to edit multiple images in one request.

    7. TEMPLATE — 9 pre-styled generation templates: Ghibli style, Minecraft style, Simpson style, Pixar style, Humanize My Pet, Watermark Remover, Make Me Bald, Product Photography, Logo Wizard. Each routes to a specific provider and model with fixed system prompts.

    8. REMIX_WITH_MULTI_IMAGE — Image remixing. Takes an input image and generates variations with GPT Image 1.

    The hardest challenge unifying these: each provider has a different API shape. Some expect images as base64, others as URLs, others as multipart uploads. The GCS-based normalization was the key — upload the image to GCS once, and all providers reference it by URL. Mask handling for inpainting was another divergence point: FAL uses mask_url, Replicate needs an inverted mask. The Photon microservice handles the color inversion."
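To make the mask inversion concrete, here is a minimal sketch of what "inverting the colors" means at the pixel level. The script only says Photon performs the inversion; the RGBA byte layout and the function name `invertMaskRgba` are assumptions for illustration.

```typescript
// Hypothetical sketch: invert a mask stored as RGBA bytes.
// White (255) becomes black (0) and vice versa; the alpha channel is left untouched.
function invertMaskRgba(pixels: Uint8ClampedArray): Uint8ClampedArray {
  return pixels.map((value, index) => (index % 4 === 3 ? value : 255 - value));
}
```

For example, a white pixel followed by a black pixel comes back as a black pixel followed by a white pixel, which is the flip needed when two providers disagree on whether white marks the region to repaint or the region to keep.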


    The Generation Pipeline — End to End

    "Here's what happens when a user clicks Generate. Follow the data:

    Client POST /v1/wallflower/unified-generation

    Step 1: Validation. The Zod schema validates the entire request body: prompt, modelConfig, feature type, style preset, number of images, and the isPublic flag. Invalid requests are rejected before any processing happens.

    Step 2: Auth & Usage Check. The Firebase ID token is verified, then the usage limits middleware checks whether the user's plan allows Bonkers. Guest users are blocked entirely at this stage: FEATURE_LIMITS.bonkers has a GUEST block that prevents them from reaching the controller.

    Step 3: Prompt Moderation. checkPromptFlagged() calls GPT-4o-mini with a strict NSFW classification system prompt. This is blocking: it is awaited before any provider API call. If the prompt is flagged, it throws a ClientError(400) and the prompt is never sent to the provider. This saves significant API costs on rejected prompts.

    Step 4: Magic Prompt Enhancement. If magic prompt is enabled, the user's prompt goes through improvePrompt(). This calls GPT-4o-mini (or GPT-4o for INPAINT) with a feature-specific system prompt. Each feature type has a different system prompt and a Zod JSON schema for structured output:

    • GENERATE: synthesizes a cohesive prompt ≤1000 chars
    • INPAINT: determines isRemoveOnly boolean + enhanced prompt
    • EDIT_BG: describes new backgrounds without referencing the subject
    • Templates: style-specific enhancement

    Step 5: Abstract Model Resolution. The Zod schema .transform() converts the abstract model ID to a concrete provider model ID. This is where bonkers-advance becomes fal-ai/ideogram/v3. Because the transform is part of the schema, this has already happened during the Step 1 parse; the controller only ever sees the concrete ID.

    Step 6: Provider Dispatch. The controller routes to the correct provider handler based on the model ID prefix:

    • black-forest-labs/*, recraft-ai/*, ideogram-ai/* → Replicate handler
    • fal-ai/* → FAL AI handler
    • gpt-image-1-* → OpenAI handler
    • gemini-2.0-flash-exp → Google Vertex AI handler
    • midjourney-* → GoAPI handler

    Step 7: Provider Call with Fallback. The primary provider is called via callWithFallback(). If the primary throws any error (API error, 5xx, network failure, NSFW detection at the provider level), the fallback handler is invoked using the FALLBACK_MODELS_MAP. This map is bidirectional between Replicate and FAL AI equivalents.

    IMPORTANT CAVEAT: The fallback only covers GENERATE models. ERASE, UPSCALE, EDIT_BG, and OMNI_EDIT use models that have no equivalent on the other provider (fal-ai/bria/eraser, for example, has no Replicate alternative). If FAL AI goes down, those 4 of the 8 features become completely unavailable.

    Step 8: Post-Processing. After generation, formatImageGenerations() runs:

    1. Aspect ratio resolution — auto-detect via Photon microservice or use the ratio from the provider config
    2. SEO description — GPT-4o-mini generates keyword-rich third-person descriptions for social sharing
    3. GCS upload — the image is downloaded from the provider's URL (which could be temporary) and uploaded to wallflower-images/{uid}/{iid}.png. The GCS URL becomes the canonical reference
    4. Parent image lineage — if this is an edit/remix, the original image metadata is attached

    Step 9: Firestore Persistence. The image document is saved to Firestore with: prompt, enhanced prompt, model ID, style, GCS URL, seed, aspect ratio, likes (empty array), visibility, metadata.

    Step 10: SSE Events. Two SSE events are emitted: attachments with the formatted image data, and usage with the updated quota information.

    Step 11: Usage Increment. incrementUserUsage() increments the user's feature counter based on the abstract model's query cost. For bonkers-advance, this is hardcoded to 120 queries per image regardless of the underlying model's cost."
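The prefix-based dispatch in Step 6 can be sketched as a small routing table. This is an illustration under assumptions, not the real controller: the `Provider` labels, `PREFIX_ROUTES`, and `routeProvider` are hypothetical names, and only the prefixes listed in Step 6 are included.

```typescript
// Hypothetical sketch of Step 6: pick a provider handler from the model ID prefix.
type Provider = "replicate" | "fal" | "openai" | "google" | "goapi";

const PREFIX_ROUTES: ReadonlyArray<readonly [string, Provider]> = [
  ["black-forest-labs/", "replicate"],
  ["recraft-ai/", "replicate"],
  ["ideogram-ai/", "replicate"],
  ["fal-ai/", "fal"],
  ["gpt-image-1-", "openai"],
  ["gemini-2.0-flash-exp", "google"],
  ["midjourney-", "goapi"],
];

function routeProvider(modelId: string): Provider {
  const match = PREFIX_ROUTES.find(([prefix]) => modelId.startsWith(prefix));
  if (match === undefined) {
    // Failing loudly beats silently sending a request to the wrong provider.
    throw new Error(`No provider handler for model: ${modelId}`);
  }
  return match[1];
}
```

Keeping the routes in data rather than in a chain of if statements means adding a provider is a one-line change to the table.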


    Cross-Provider Fallback — How It Actually Works

    "The fallback system is based on FALLBACK_MODELS_MAP — a bidirectional map of 10+ model equivalents between Replicate and FAL AI:

    FALLBACK_MODELS_MAP = {
      "black-forest-labs/flux-schnell": "fal-ai/flux/schnell",
      "fal-ai/flux/schnell": "black-forest-labs/flux-schnell",
      "prunaai/hidream-l1-fast": "fal-ai/hidream-i1-fast",
      "fal-ai/hidream-i1-fast": "prunaai/hidream-l1-fast",
      ...
    }

    callWithFallback() is sequential — it tries the primary provider first, and only on error tries the fallback. It's NOT parallel (which would double API costs on every request).

    The routing is model-prefix-based in helpers/generate.ts:

    • Replicate-model-prefixed IDs use handleReplicateWithFalAIFallback() — Replicate first, FAL as backup
    • FAL-model-prefixed IDs use handleFalWithReplicateFallback() — FAL first, Replicate as backup

    Some mappings are inexact — ideogram-ai/ideogram-v2-turbo falls back to fal-ai/flux-pro/v1.1, which is a completely different model. The fallback gives the user something rather than nothing, but it won't be the same quality or style.

    The known gap: ERASE, UPSCALE, EDIT_BG, and OMNI_EDIT have no fallback. Their models (fal-ai/bria/eraser, fal-ai/clarity-upscaler, etc.) don't have equivalents on Replicate. A FAL AI outage takes down those 4 of 8 features completely."
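A minimal sketch of the sequential try-then-fallback behaviour described above. The signature of the real callWithFallback() is not given in the script, so `generateWithFallback` and the `Generate` callback type are hypothetical; the map holds only the four pairs quoted above.

```typescript
// Hypothetical sketch: sequential fallback driven by a bidirectional model map.
const FALLBACK_MODELS = new Map<string, string>([
  ["black-forest-labs/flux-schnell", "fal-ai/flux/schnell"],
  ["fal-ai/flux/schnell", "black-forest-labs/flux-schnell"],
  ["prunaai/hidream-l1-fast", "fal-ai/hidream-i1-fast"],
  ["fal-ai/hidream-i1-fast", "prunaai/hidream-l1-fast"],
]);

type Generate<T> = (modelId: string) => Promise<T>;

async function generateWithFallback<T>(modelId: string, generate: Generate<T>): Promise<T> {
  try {
    // The primary is always tried first; the fallback is never called in parallel.
    return await generate(modelId);
  } catch (primaryError) {
    const fallbackId = FALLBACK_MODELS.get(modelId);
    if (fallbackId === undefined) {
      // Models without an equivalent (for example fal-ai/bria/eraser) surface the original error.
      throw primaryError;
    }
    return generate(fallbackId);
  }
}
```

The second provider is only billed when the first one has already failed, which is the cost argument for sequential over parallel calls.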


    Magic Prompt System — Per-Feature Enhancement

    "The magic prompt system isn't one-size-fits-all. Each feature type has a dedicated LLM system prompt with structured output schemas.

    GENERATE — System prompt: 'Synthesize a cohesive single prompt ≤1000 chars that captures the user's intent.' Uses GPT-4o-mini. The Zod schema returns { enhancedPrompt: string }.

    INPAINT — Uses GPT-4o (more capable model because inpainting is harder). System prompt has elaborate rules for interpreting ambiguous removal vs replacement requests — with examples like 'remove the car' vs 'replace the car with a tree.' The Zod schema returns { enhancedPrompt: string, isRemoveOnly: boolean }.

    EDIT_BG — System prompt: 'Describe the new background naturally, without referencing the original subject or using phrases like "the subject" or "the person."' This prevents artifacts where the background model tries to regenerate the subject.

    GHIBLI Template — System prompt: 'Transform the scene to Studio Ghibli style. Use soft, painterly aesthetics, muted pastels, magical lighting, floating dust motes, hand-drawn linework, whimsical clouds, lush greenery, detailed skies, and Studio Ghibli's signature warmth and emotional depth.' This level of detail in the system prompt produces dramatically better style transfer results.

    PRODUCT_PHOTOGRAPHY — System prompt: 'Create a detailed photoshoot scene for professional product photography, describing lighting, composition, background, and atmosphere.'

    LOGO_WIZARD — System prompt: 'Generate professional logo design specifications with typography, brand identity, and visual style guidelines.'

    The system prompt library is defined in MAGIC_PROMPT_SYSTEM_PROMPT and the model routing for magic prompts in MAGIC_PROMPT_MODELS.

    When magic prompt is OFF, default values per feature are returned — isRemoveOnly: false for INPAINT, imageStrength: 0.8 for GHIBLI, etc."
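The per-feature model routing and the "magic prompt off" defaults could look roughly like this. It is a sketch under assumptions: the function names and the `MagicPromptResult` shape are hypothetical, only four feature keys are shown, and returning the user's prompt unchanged as `enhancedPrompt` when the feature is off is an assumption the script does not confirm.

```typescript
// Hypothetical sketch: choose the LLM per feature and return defaults when magic prompt is off.
type Feature = "GENERATE" | "INPAINT" | "EDIT_BG" | "GHIBLI";

interface MagicPromptResult {
  enhancedPrompt: string;
  isRemoveOnly?: boolean;  // only meaningful for INPAINT
  imageStrength?: number;  // only meaningful for style templates such as GHIBLI
}

// INPAINT gets the more capable model; everything else uses the cheaper one.
function magicPromptModel(feature: Feature): string {
  return feature === "INPAINT" ? "gpt-4o" : "gpt-4o-mini";
}

// Assumption: with magic prompt off, the prompt passes through untouched.
function defaultsWhenOff(feature: Feature, prompt: string): MagicPromptResult {
  switch (feature) {
    case "INPAINT":
      return { enhancedPrompt: prompt, isRemoveOnly: false };
    case "GHIBLI":
      return { enhancedPrompt: prompt, imageStrength: 0.8 };
    default:
      return { enhancedPrompt: prompt };
  }
}
```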


    Content Moderation — Two Layers

    "Pre-generation: checkPromptFlagged()

    • Uses GPT-4o-mini with a strict NSFW/safety classification system prompt
    • Returns a JSON object: { flagged: boolean, reason: string }
    • Blocking: the prompt is never sent to the provider if flagged
    • Saves API cost on rejected prompts

    Post-generation: isImageFlagged()

    • Uses OpenAI's omni-moderation-latest model to check the actual generated image
    • Currently only used by template features (Ghibli, Minecraft, Simpson, Pixar, Humanize My Pet)
    • Non-template features skip this check — this is a gap

    Known issue: If the LLM returns malformed JSON or throws during checkPromptFlagged(), the code currently fails open — returns flagged: false, meaning the prompt passes through unchecked. The JSON.parse() call for the LLM response isn't wrapped in try/catch. This should be fail-closed: default to flagged: true on any error."
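The proposed fail-closed fix can be sketched as a parser that treats anything it cannot validate as flagged. This is an illustration of the idea, not the existing checkPromptFlagged() code; `parseModerationVerdict` and the fallback reason string are hypothetical.

```typescript
// Hypothetical sketch: parse the LLM's moderation JSON and fail closed on any problem.
interface ModerationVerdict {
  flagged: boolean;
  reason: string;
}

function parseModerationVerdict(raw: string): ModerationVerdict {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { flagged, reason } = parsed as { flagged?: unknown; reason?: unknown };
      if (typeof flagged === "boolean") {
        return { flagged, reason: typeof reason === "string" ? reason : "" };
      }
    }
  } catch (_err) {
    // Malformed JSON: fall through to the fail-closed default below.
  }
  // Anything we could not validate is treated as flagged.
  return { flagged: true, reason: "moderation response could not be parsed" };
}
```

The trade-off is a few false rejections when the LLM misbehaves, in exchange for never letting an unchecked prompt reach a provider.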


    Image Storage & Post-Processing Pipeline

    "The post-processing pipeline in formatImageGenerations() handles several transformations between provider response and user delivery:

    1. Aspect Ratio Resolution. If aspectRatio is 'auto-detect', we call the Photon microservice (getDimensionsFromImage()) to fetch the actual image dimensions. If Photon fails, we fall back to { width: 1024, height: 1024 }, which is not ideal but prevents the whole request from failing. If a ratio was explicitly set, we use getDimensionsFromAspectRatio() to calculate width/height.

    2. SEO Description Generation. Each variation gets a call to GPT-4o-mini via describeImage() to generate a keyword-rich third-person description. These calls run in parallel across all variations via Promise.allSettled(). The description is used for social sharing OG tags and discoverability.

    3. GCS Upload (The Normalization Layer). getAttachmentWithGBucketUrl() downloads the image from the provider's URL (usually a temporary signed URL that expires), uploads it to wallflower-images/{uid}/{iid}.png with a public-read ACL, and replaces the URL with the permanent GCS URL. This is the key unification layer: all provider URLs, regardless of format or expiry policy, are normalized through GCS.

    4. Parent Image Lineage. If this generation was an edit, remix, or template application, the parentImage/parentImages objects are attached to track provenance. This includes the original URL, IID, and public status.

    5. Cost Calculation. Usage cost is queryCost * numberOfImages, where queryCost comes from IMAGE_MODELS_INFO, a static map of model display names to query costs (e.g., FLUX.1 Schnell = 10 queries, FLUX.1 Pro = 140 queries).

    GCS → Firestore sequencing: The image is uploaded to GCS first, then saved to Firestore. If Firestore save fails after GCS upload, the image is orphaned — exists in GCS with no metadata. The proposed fix: write-ahead — save Firestore first with a 'pending' status, then update to 'complete' after GCS upload."
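The proposed write-ahead ordering can be sketched with two small interfaces standing in for Firestore and GCS. Everything here is hypothetical (`MetadataStore`, `BlobStore`, `saveImageWriteAhead`); real clients would replace the interfaces. Only the object path shape `{uid}/{iid}.png` comes from the script.

```typescript
// Hypothetical sketch: record metadata as 'pending' first, upload, then mark 'complete'.
type Status = "pending" | "complete";

interface ImageDoc {
  iid: string;
  status: Status;
  url?: string;
}

interface MetadataStore {
  create(doc: ImageDoc): Promise<void>;
  update(iid: string, patch: Partial<ImageDoc>): Promise<void>;
}

interface BlobStore {
  upload(path: string, data: Uint8Array): Promise<string>; // resolves to the stored URL
}

async function saveImageWriteAhead(
  meta: MetadataStore,
  blobs: BlobStore,
  uid: string,
  iid: string,
  data: Uint8Array,
): Promise<string> {
  // 1. Metadata first, so every stored blob always has a document pointing at it.
  await meta.create({ iid, status: "pending" });
  // 2. If the upload throws, the 'pending' document remains as a marker for cleanup.
  const url = await blobs.upload(`${uid}/${iid}.png`, data);
  // 3. Only a finished upload flips the document to 'complete'.
  await meta.update(iid, { status: "complete", url });
  return url;
}
```

A failure now leaves a 'pending' document rather than an untracked file, so a periodic sweep over stale 'pending' entries can retry or delete them.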


    Style System — Translation Across Providers

    "We support 9 preset styles: Auto, Anime, Realistic, Digital Art, Vintage, Cinematic, Fantasy, Neon Noir, and Minimalist. These are defined in PRESET_STYLES_MAP.

    Each provider has a different way of applying styles:

    • Ideogram: Native style parameter — ideogramStyle can be General, Anime, Realistic, Design, or Render 3D
    • Recraft: recraftStyle parameter — values like realistic_image/natural_light, digital_illustration
    • Other models: The style is appended to the prompt as 'in {style} style' via getStyleModifiedPrompt()
    • Ideogram V2 Turbo: Returns the prompt unmodified — it has native style support, and modifying the prompt actually hurts quality

    There are exceptions wired in: Recraft V3 skips the prompt append for Realistic style because it uses the API's style param instead."
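The style rules above reduce to a short decision function. This is a sketch, not the real getStyleModifiedPrompt(): the function name is changed to mark it as hypothetical, matching Recraft by the `recraft-ai/` prefix is an assumption, and leaving the prompt untouched for the Auto style is also an assumption.

```typescript
// Hypothetical sketch of the prompt-level style rules described above.
function styleModifiedPromptSketch(prompt: string, style: string, modelId: string): string {
  // Assumption: "Auto" means no style hint is added.
  if (style === "Auto") return prompt;
  // Ideogram V2 Turbo has native style support, so the prompt is left alone.
  if (modelId === "ideogram-ai/ideogram-v2-turbo") return prompt;
  // Recraft uses the API's style parameter for Realistic instead of a prompt suffix.
  if (modelId.startsWith("recraft-ai/") && style === "Realistic") return prompt;
  // Every other model gets the style appended to the prompt.
  return `${prompt} in ${style} style`;
}
```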


    Templates — 9 Pre-Styled Generations

    "Templates are Firestore-stored configuration objects with a prompt template, model config, style, thumbnail, and category. Each template routes to a specific provider:

    | Template | Provider |
    |---|---|
    | Ghibli Style | OpenAI GPT Image 1 |
    | Minecraft Style | OpenAI GPT Image 1 |
    | Simpson Style | OpenAI GPT Image 1 |
    | Pixar Style | OpenAI GPT Image 1 |
    | Humanize My Pet | OpenAI GPT Image 1 |
    | Watermark Remover | Google Gemini 2.0 Flash |
    | Make Me Bald | Google Gemini 2.0 Flash |
    | Product Photography | OpenAI (specific handler) |
    | Logo Wizard | OpenAI (specific handler) |

    Templates are versioned via a version field. When a template is updated, a new version document is created; the frontend always requests the latest version. Model deprecation is handled through the abstract model map — the template itself doesn't change, only the mapping layer.

    In the first month after launch, templates generated 10,000 images — about 20% of total generation volume at that time."
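"The frontend always requests the latest version" can be sketched as picking the highest version among a template's version documents. The `TemplateVersion` shape and the numeric `version` field are assumptions; the script only says a version field exists.

```typescript
// Hypothetical sketch: choose the newest version document for a template.
interface TemplateVersion {
  templateId: string;
  version: number; // assumption: versions are comparable numbers
  promptTemplate: string;
}

function latestVersion(versions: TemplateVersion[]): TemplateVersion | undefined {
  return versions.reduce<TemplateVersion | undefined>(
    (best, candidate) => (best === undefined || candidate.version > best.version ? candidate : best),
    undefined,
  );
}
```

Because old version documents are never overwritten, an image generated last month can still be traced back to the exact template text that produced it.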


    The Unified Controller — How It All Ties Together

    "The unified-generation.controller.ts is the central orchestrator. Here's the exact sequence:

    1. Validate input — Zod schema validation
    2. Check user exists in wallflower collection
    3. Check prompt for NSFW checkPromptFlagged() — blocking
    4. Apply style modifiers to prompt via getStyleModifiedPrompt()
    5. Enhance prompt if magic prompt is enabled via improvePrompt()
    6. Route to provider handler based on model ID prefix
    7. Provider executes — calls API, waits for prediction, downloads images
    8. Format results — formatImageGenerations() does GCS upload, SEO, aspect ratio
    9. Emit SSE events — attachments and usage
    10. Calculate usage cost — queryCost × numberOfImages
    11. Save image post to Firestore
    12. Increment user usage via incrementUserUsage()

    The key design: it's all middleware-based, sharing a request context via AsyncLocalStorage. The controller's job is coordination — the actual provider work happens in dedicated handler files."
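For the SSE step, a named event on the wire is just an `event:` line, a `data:` line, and a blank line. The sketch below shows that framing; `formatSseEvent` and the payload field in the usage example are hypothetical, and the payload is assumed to be JSON-serializable.

```typescript
// Hypothetical sketch: serialize one named Server-Sent Event frame.
// JSON.stringify escapes newlines, so the payload always fits on a single data line.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

On the client, an EventSource listener registered for "attachments" or "usage" receives the JSON string in the event's data field.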


    Known Gaps & Failure Modes

    "I've documented these explicitly because they represent the boundary between 'works now' and 'works at scale':

    1. No fallback for non-GENERATE features. ERASE, UPSCALE, EDIT_BG, and OMNI_EDIT have a single provider. If FAL AI goes down, those 4 features break completely. GENERATE features degrade gracefully via the fallback map.

    2. Orphaned images in GCS. If saveImagesToFirestore() fails after the GCS upload, the image is orphaned: it exists in storage with no Firestore metadata. No recovery mechanism exists. Fix: write Firestore first with a 'pending' status.

    3. Prompt moderation fails open. If the LLM in checkPromptFlagged() returns malformed JSON, JSON.parse() throws and there's no try/catch. The function returns false (not flagged), and the prompt passes through unchecked. Fix: wrap in try/catch, default to flagged: true.

    4. Seed reproducibility across fallback. Seeds are provider-specific. If the fallback fires, the same seed produces a completely different image because it's a different model. The concrete modelId isn't stored alongside the seed, so regeneration can silently change providers.

    5. SEO description cost at scale. Each generation calls GPT-4o-mini for SEO descriptions. At 100K DAU generating 2 images each, that's 200K LLM calls/day ($700/month). The descriptions delay the response by 500ms–2s. Fix: make them async via Cloud Tasks.

    6. Photon dependency. The Photon microservice handles image dimension detection and mask inversion. If Photon is down, INPAINT and ERASE on Replicate fail completely (mask inversion is required). Dimension detection falls back to 1024×1024.

    7. Firestore hotspot on usage counters. Every generation increments the customers/{uid} document. Firestore's sustained limit of about 1 write/sec per document is a scalability bottleneck. Fix: distributed counter shards with Redis buffering.

    8. No decision logging. The chat system has a decisionLog that records every tool selection and model choice. The image generation pipeline has zero decision logging. Debugging 'my image looks wrong' requires correlating browser logs, Cloud Run logs, Firestore, provider dashboards, and GCS, which takes 20-30 minutes per incident."
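The counter-shard fix for gap 7 can be illustrated in memory: writes are spread over N shards and reads sum them. This sketch covers only the sharding half of the proposed fix (no Redis buffering), and the array stands in for what would be N separate shard documents; `ShardedCounter` is a hypothetical name.

```typescript
// Hypothetical sketch: a sharded counter, with an array standing in for N shard documents.
class ShardedCounter {
  private readonly shards: number[];

  constructor(numShards: number) {
    if (!Number.isInteger(numShards) || numShards < 1) {
      throw new RangeError("numShards must be a positive integer");
    }
    this.shards = new Array<number>(numShards).fill(0);
  }

  // Each write lands on one randomly chosen shard, spreading load across documents.
  increment(by: number, pick: () => number = Math.random): void {
    const index = Math.floor(pick() * this.shards.length);
    this.shards[index] = (this.shards[index] ?? 0) + by;
  }

  // Reads pay the cost instead: the total is the sum over all shards.
  total(): number {
    return this.shards.reduce((sum, value) => sum + value, 0);
  }
}
```

With 10 shards, the per-document write rate drops to roughly a tenth of the user's generation rate, at the price of reading 10 documents to show the quota.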


    Deployment & Infrastructure

    "Frontend: Deployed on Vercel as part of the Next.js app. The /bonkers route group is part of the same build.

    Backend: Deployed on Google Cloud Run as part of the Arcane service. The Wallflower endpoints share the same container as the chat endpoints — they're route groups within the same Express application.

    Providers: API keys for FAL AI, Replicate, OpenAI, Google Vertex AI, GoAPI, Ideogram — all embedded as environment variables in Cloud Run.

    Storage:

    • GCS bucket wallflower-images — all generated images
    • Firestore wallflower/{userId}/images/ — image metadata
    • Firestore customers/{userId}/features/bonkers — usage tracking

    Usage Tracking: Bonkers has its own feature key (bonkers) separate from Merlin chat (merlin). BONKERS_PRO and BONKERS_BASIC plans consume from bonkers. Free/PRO/TEAMS/ELITE users who use image generation consume from merlin — meaning one FLUX Pro image (140 queries) can exhaust their entire daily chat quota."


    Key Numbers

    • 20+ supported models across 5 providers
    • 8 feature types unified under one pipeline
    • 9 template types
    • ~50K+ image generations per month
    • ~700 tokens per magic prompt call
    • 120 queries per bonkers-advance image (premium pricing)
    • 10 queries per bonkers-lite image (budget tier)
    • GCS bucket: wallflower-images
    • No fallback for 4 of 8 feature types (current gap)

    Closing

    "If I had to explain Bonkers in one sentence: it's a cross-provider image generation engine that abstracts away 5 AI providers behind a unified API with automatic failover, content moderation, and GCS-based storage normalization.

    The hardest problems were:

    1. Provider API divergence — each provider has different payload shapes, auth methods, image formats, and error models. GCS normalization was the unification layer.
    2. Cross-provider fallback — making GENERATE features resilient to provider outages
    3. Per-feature magic prompts — each feature type needs a different enhancement strategy and system prompt
    4. The mask inversion problem — FAL and Replicate interpret inpainting masks oppositely, requiring the Photon microservice
    5. Pricing decoupling — abstract model costs intentionally diverge from provider costs for business reasons

    The system works reliably at scale, but I've documented 8 specific gaps that represent the next iteration's roadmap."

    Pro tip: If they ask about a specific area — fallback, magic prompts, moderation, the unified controller, templates — you can drill into any of those sections above. Each is written with enough detail to answer follow-ups without needing the actual code.