Purpose: Script for when a senior engineer/recruiter asks you to walk through the Bonkers image generation platform. Speak naturally — think of it as your director's commentary. You built every piece. Here's how to tell that story without showing code (NDA).
Opening Frame
"Let me start with the big picture, then drill into every layer.
Bonkers is the AI image generation platform at getmerlin.in. It was originally a standalone product that got consolidated into the Merlin monorepo. Internally, the image generation system is codenamed Wallflower — that's the name you'll see across the backend.
On the surface, Bonkers lets users generate images from text prompts using 20+ models.
Under the hood, it's a cross-provider image generation engine that abstracts away 5 different AI providers behind a unified API, with automatic failover, content moderation, magic prompt enhancement, and a GCS-based storage pipeline.
The system does about 50K+ image generations per month, with 8 feature types and 20+ supported models."
Architecture Overview
"The Bonkers system splits across the monorepo:
Frontend (bonkers/apps/website/):
The /bonkers route group — canvas UI, model selection, prompt input, gallery, style presets
SSE streaming for real-time generation progress
Zustand store for generation state (pending, streaming, complete, error)
Backend (merlin-arcane/src/server/):
The Wallflower endpoint group at /v1/wallflower/
unified-generation.controller.ts — the main controller that routes to providers
Helper files for each provider: replicate/, fal-ai/, ideogram/, goapi/, leonardo/, dalle.ts, google-image-gen/, openai-image-gen.ts
Shared utilities: helpers/common.ts for prompt enhancement, image formatting, moderation
Firestore collections: wallflower/{userId}/images/{imageId} for image posts
GCS bucket: wallflower-images for all generated images
The architecture follows a unified pipeline pattern: all 8 feature types go through the same request flow, with different routing based on the model and feature type."
The Abstract Model Mapping — The Core Architectural Pattern
"This is the most important design decision in the system.
Users never see provider-specific model names like black-forest-labs/flux-schnell or fal-ai/ideogram/v3. Instead, they see abstract model IDs:
| User-Facing Name | Abstract Model ID | Resolves To | Cost |
| --- | --- | --- | --- |
| Bonkers Lite | bonkers-lite | prunaai/hidream-l1-fast | 20 queries |
| Bonkers Advance | bonkers-advance | fal-ai/ideogram/v3 | 120 queries |
| Bonkers Upscale | bonkers-upscale | fal-ai/clarity-upscaler | 10 queries |
| Bonkers Magic Fill | bonkers-magic-fill | fal-ai/ideogram/v3/edit | 75 queries |
| Bonkers Magic Erase | bonkers-magic-erase | fal-ai/bria/eraser | 10 queries |
| Bonkers BG Edit | bonkers-bg-edit | fal-ai/bria/background/replace | 10 queries |
This resolution happens at the Zod schema layer via .transform(). When the request body comes in with abstractModelId: "bonkers-advance", the schema transforms it to modelConfig.modelId: "fal-ai/ideogram/v3" before the controller ever sees it.
The benefits of this abstraction:
Provider swaps without UI changes — change the Zod .transform() map, zero frontend changes
Model deprecation resilience — when a provider kills a model, just update the map entry, users never notice
A/B testing — could route 10% of bonkers-advance traffic to a new model by changing the map
Unified pricing — bonkers-advance is priced at 120 queries regardless of what model it resolves to, decoupling cost from provider pricing
The one tricky part: the pricing override. bonkers-advance costs 120 queries even though Ideogram V3 costs 75. There's an explicit check in the controller: if abstractModelId === "bonkers-advance", override usageConfig.queries = numberOfImages * 120. This intentionally decouples abstract model pricing from underlying provider cost — it's a premium-tier pricing mechanism that also acts as a rate limiter."
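A minimal sketch of the resolution step, written as a plain function so it runs without dependencies. It stands in for the Zod .transform() described above and is not the production schema. The map entries are copied from the table; the names ABSTRACT_MODEL_MAP and resolveAbstractModel are hypothetical, and folding the price into the map (rather than overriding it in the controller) is a simplification.

```typescript
// Sketch only: a dependency-free stand-in for the schema-level transform.
// Entries come from the table above; identifiers are hypothetical.
type ModelEntry = { modelId: string; queriesPerImage: number };

const ABSTRACT_MODEL_MAP: Record<string, ModelEntry> = {
  "bonkers-lite": { modelId: "prunaai/hidream-l1-fast", queriesPerImage: 20 },
  "bonkers-advance": { modelId: "fal-ai/ideogram/v3", queriesPerImage: 120 },
  "bonkers-upscale": { modelId: "fal-ai/clarity-upscaler", queriesPerImage: 10 },
  "bonkers-magic-fill": { modelId: "fal-ai/ideogram/v3/edit", queriesPerImage: 75 },
  "bonkers-magic-erase": { modelId: "fal-ai/bria/eraser", queriesPerImage: 10 },
  "bonkers-bg-edit": { modelId: "fal-ai/bria/background/replace", queriesPerImage: 10 },
};

// Resolve an abstract ID to a concrete provider model and a total query cost.
// The cost belongs to the abstract tier, not to the provider model behind it.
function resolveAbstractModel(abstractModelId: string, numberOfImages: number) {
  const entry = ABSTRACT_MODEL_MAP[abstractModelId];
  if (!entry) {
    throw new Error(`Unknown abstract model: ${abstractModelId}`);
  }
  return {
    modelId: entry.modelId,
    queries: numberOfImages * entry.queriesPerImage,
  };
}
```

Swapping a provider then means editing one map entry; the abstract ID and its price stay the same for the frontend.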
The 8 Feature Types
"Bonkers supports 8 distinct feature types, all routed through the same unified pipeline:
1. GENERATE — Standard text-to-image. The core feature. Routes to any model based on selection.
2. INPAINT — Mask-based inpainting (magic fill). The user draws a mask over an area and describes what should replace it. Providers handle this differently: FAL expects a mask_url parameter, Replicate needs an inverted mask (I built getInvertedMaskUrl() using the Photon microservice to invert the colors because Replicate interprets the mask opposite to FAL).
3. UPSCALE — Resolution enhancement. Takes an existing image and increases its resolution using fal-ai/clarity-upscaler.
4. ERASE — Object removal. User selects an object, and fal-ai/bria/eraser removes it with AI-powered inpainting.
5. EDIT_BG — Background replacement. fal-ai/bria/background/replace detects the subject and replaces the background based on a prompt.
6. OMNI_EDIT — Multi-image editing. Uses fal-ai/flux-pro/kontext/multi to edit multiple images in one request.
7. TEMPLATE — 9 pre-styled generation templates: Ghibli style, Minecraft style, Simpson style, Pixar style, Humanize My Pet, Watermark Remover, Make Me Bald, Product Photography, Logo Wizard. Each routes to a specific provider and model with fixed system prompts.
8. REMIX_WITH_MULTI_IMAGE — Image remixing. Takes an input image and generates variations with GPT Image 1.
The hardest challenge unifying these: each provider has a different API shape. Some expect images as base64, others as URLs, others as multipart uploads. The GCS-based normalization was the key — upload the image to GCS once, and all providers reference it by URL. Mask handling for inpainting was another divergence point: FAL uses mask_url, Replicate needs an inverted mask. The Photon microservice handles the color inversion."
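To make the mask divergence concrete, here is a sketch of the pixel-level operation behind a mask inversion. It is not the Photon API, which the source does not show. It assumes a single-channel grayscale mask with one byte per pixel, and invertMask is a hypothetical name.

```typescript
// Assumption: the mask is a grayscale buffer, one byte per pixel (0 = black, 255 = white).
// Inverting swaps the "edit here" and "keep this" regions, which is what is needed
// when two providers read the same mask with opposite conventions.
function invertMask(pixels: Uint8Array): Uint8Array {
  const inverted = new Uint8Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    inverted[i] = 255 - pixels[i];
  }
  return inverted;
}
```

For an RGBA mask the same idea applies per colour channel while leaving alpha untouched.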
The Generation Pipeline — End to End
"Here's what happens when a user clicks Generate. Follow the data:
Client POST /v1/wallflower/unified-generation
Step 1: Validation
The Zod schema validates the entire request body — prompt, modelConfig, feature type, style preset, number of images, isPublic flag. Invalid requests are rejected before any processing happens.
Step 2: Auth & Usage Check
Firebase ID token verification. Then the usage limits middleware checks: is the user on a plan that allows Bonkers? Free users are blocked entirely at this stage — FEATURE_LIMITS.bonkers has a GUEST block that prevents them from reaching the controller.
Step 3: Prompt Moderation
checkPromptFlagged() calls GPT-4o-mini with a strict NSFW classification system prompt. This is blocking: it is awaited before any provider API call. If the prompt is flagged, it throws a ClientError(400) and the prompt is never sent to the provider. This saves significant API costs on rejected prompts.
Step 4: Magic Prompt Enhancement
If magic prompt is enabled, the user's prompt goes through improvePrompt(). This calls GPT-4o-mini (or GPT-4o for INPAINT) with a feature-specific system prompt. Each feature type has a different system prompt and a Zod JSON schema for structured output:
GENERATE: synthesizes a cohesive prompt ≤1000 chars
EDIT_BG: describes new backgrounds without referencing the subject
Templates: style-specific enhancement
Step 5: Abstract Model Resolution
The Zod schema .transform() converts the abstract model ID to a concrete provider model ID. This is where bonkers-advance becomes fal-ai/ideogram/v3.
Step 6: Provider Dispatch
The controller routes to the correct provider handler based on the model ID prefix. FAL-prefixed IDs go to the FAL handler and Replicate-prefixed IDs go to the Replicate handler.
Step 7: Provider Call with Fallback
The primary provider is called via callWithFallback(). If the primary throws ANY error (API error, 5xx, network failure, NSFW detection at the provider level), the fallback handler is invoked using the FALLBACK_MODELS_MAP. This map is bidirectional between Replicate and FAL AI equivalents.
IMPORTANT CAVEAT: The fallback only covers GENERATE models. ERASE, UPSCALE, EDIT_BG, and OMNI_EDIT use models that have no equivalent on the other provider (for example, fal-ai/bria/eraser has no Replicate alternative). If FAL AI goes down, those 4 of the 8 features become completely unavailable.
Step 8: Post-Processing
After generation, formatImageGenerations() runs:
Aspect ratio resolution — auto-detect via Photon microservice or use the ratio from the provider config
SEO description — GPT-4o-mini generates keyword-rich third-person descriptions for social sharing
GCS upload — the image is downloaded from the provider's URL (which could be temporary) and uploaded to wallflower-images/{uid}/{iid}.png. The GCS URL becomes the canonical reference
Parent image lineage — if this is an edit/remix, the original image metadata is attached
Step 9: Firestore Persistence
The image document is saved to Firestore with: prompt, enhanced prompt, model ID, style, GCS URL, seed, aspect ratio, likes (empty array), visibility, metadata.
Step 10: SSE Events
Two SSE events are emitted: attachments with the formatted image data, and usage with the updated quota information.
Step 11: Usage Increment
incrementUserUsage() increments the user's feature counter based on the abstract model's query cost. For bonkers-advance, this is hardcoded to 120 queries per image regardless of the underlying model's cost."
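The two events from Step 10 can be pictured on the wire with a short sketch of Server-Sent Events framing. The event names attachments and usage come from the text; the payload fields and the formatSseEvent helper are hypothetical.

```typescript
// One SSE frame: an "event:" line, a "data:" line, then a blank line as terminator.
// JSON.stringify emits a single line here, so one "data:" line is enough.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical payload shapes, for illustration only.
const frames: string = [
  formatSseEvent("attachments", {
    images: [{ iid: "img_1", url: "https://example.com/img_1.png" }],
  }),
  formatSseEvent("usage", { used: 120 }),
].join("");
```

On the client, an EventSource-style consumer would register one listener per event name and update generation state as each frame arrives.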
Cross-Provider Fallback — How It Actually Works
"The fallback system is based on FALLBACK_MODELS_MAP — a bidirectional map of 10+ model equivalents between Replicate and FAL AI:
callWithFallback() is sequential — it tries the primary provider first, and only on error tries the fallback. It's NOT parallel (which would double API costs on every request).
The routing is model-prefix-based in helpers/generate.ts:
Replicate-model-prefixed IDs use handleReplicateWithFalAIFallback() — Replicate first, FAL as backup
FAL-model-prefixed IDs use handleFalWithReplicateFallback() — FAL first, Replicate as backup
Some mappings are inexact — ideogram-ai/ideogram-v2-turbo falls back to fal-ai/flux-pro/v1.1, which is a completely different model. The fallback gives the user something rather than nothing, but it won't be the same quality or style.
The known gap: ERASE, UPSCALE, EDIT_BG, and OMNI_EDIT have NO fallback. Their models (fal-ai/bria/eraser, fal-ai/clarity-upscaler, etc.) don't have equivalents on Replicate. A FAL AI outage takes down 4 of 8 features completely."
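A sketch of the sequential fallback described above, under stated assumptions. FALLBACK_MODELS_MAP and callWithFallback reuse the names from the text, but their bodies here are guesses. Only the ideogram-to-flux-pro mapping is taken from the source; the flux-schnell pair is illustrative, and treating the inexact mapping as one-way is also an assumption. The generate callback is a hypothetical stand-in for the provider handlers.

```typescript
// Illustrative pair (assumption): the same model family on both providers.
const BIDIRECTIONAL_PAIRS: Array<[string, string]> = [
  ["black-forest-labs/flux-schnell", "fal-ai/flux/schnell"],
];

// Register both directions so a lookup works from either provider's model ID.
const FALLBACK_MODELS_MAP = new Map<string, string>();
for (const [replicateId, falId] of BIDIRECTIONAL_PAIRS) {
  FALLBACK_MODELS_MAP.set(replicateId, falId);
  FALLBACK_MODELS_MAP.set(falId, replicateId);
}
// Inexact mapping named in the text; modelled as one-way here (assumption).
FALLBACK_MODELS_MAP.set("ideogram-ai/ideogram-v2-turbo", "fal-ai/flux-pro/v1.1");

type Provider = "fal" | "replicate";

// Prefix-based routing: FAL model IDs start with "fal-ai/".
function providerFor(modelId: string): Provider {
  return modelId.startsWith("fal-ai/") ? "fal" : "replicate";
}

// Hypothetical provider call: resolves to generated image URLs.
type Generate = (provider: Provider, modelId: string) => Promise<string[]>;

// Sequential, not parallel: the fallback runs only after the primary fails.
async function callWithFallback(modelId: string, generate: Generate): Promise<string[]> {
  try {
    return await generate(providerFor(modelId), modelId);
  } catch (primaryError) {
    const fallbackId = FALLBACK_MODELS_MAP.get(modelId);
    // No mapped equivalent (for example fal-ai/bria/eraser): surface the original error.
    if (fallbackId === undefined) throw primaryError;
    return generate(providerFor(fallbackId), fallbackId);
  }
}
```

Models without a map entry simply rethrow, which is the single-provider gap described above.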
Magic Prompt System — Per-Feature Enhancement
"The magic prompt system isn't one-size-fits-all. Each feature type has a dedicated LLM system prompt with structured output schemas.
GENERATE — System prompt: 'Synthesize a cohesive single prompt ≤1000 chars that captures the user's intent.' Uses GPT-4o-mini. The Zod schema returns { enhancedPrompt: string }.
INPAINT — Uses GPT-4o (more capable model because inpainting is harder). System prompt has elaborate rules for interpreting ambiguous removal vs replacement requests — with examples like 'remove the car' vs 'replace the car with a tree.' The Zod schema returns { enhancedPrompt: string, isRemoveOnly: boolean }.
EDIT_BG — System prompt: 'Describe the new background naturally, without referencing the original subject or using phrases like "the subject" or "the person."' This prevents artifacts where the background model tries to regenerate the subject.
GHIBLI Template — System prompt: 'Transform the scene to Studio Ghibli style. Use soft, painterly aesthetics, muted pastels, magical lighting, floating dust motes, hand-drawn linework, whimsical clouds, lush greenery, detailed skies, and Studio Ghibli's signature warmth and emotional depth.' This level of detail in the system prompt produces dramatically better style transfer results.
PRODUCT_PHOTOGRAPHY — System prompt: 'Create a detailed photoshoot scene for professional product photography, describing lighting, composition, background, and atmosphere.'
LOGO_WIZARD — System prompt: 'Generate professional logo design specifications with typography, brand identity, and visual style guidelines.'
The system prompt library is defined in MAGIC_PROMPT_SYSTEM_PROMPT and the model routing for magic prompts in MAGIC_PROMPT_MODELS.
When magic prompt is OFF, default values per feature are returned — isRemoveOnly: false for INPAINT, imageStrength: 0.8 for GHIBLI, etc."
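A small sketch of the per-feature routing and the "magic prompt OFF" defaults. MAGIC_PROMPT_MODELS is named in the text, but its shape here is assumed. magicPromptDefaults is a hypothetical helper, and returning the original prompt as enhancedPrompt is also an assumption. Only four features are shown.

```typescript
type Feature = "GENERATE" | "INPAINT" | "EDIT_BG" | "GHIBLI";

// Model routing as described: GPT-4o for INPAINT, GPT-4o-mini elsewhere.
const MAGIC_PROMPT_MODELS: Record<Feature, string> = {
  GENERATE: "gpt-4o-mini",
  INPAINT: "gpt-4o",
  EDIT_BG: "gpt-4o-mini",
  GHIBLI: "gpt-4o-mini",
};

type MagicPromptResult = {
  enhancedPrompt: string;
  isRemoveOnly?: boolean;
  imageStrength?: number;
};

// When magic prompt is off, skip the LLM call and return per-feature defaults.
// Assumption: the user's prompt is passed through unchanged.
function magicPromptDefaults(feature: Feature, prompt: string): MagicPromptResult {
  switch (feature) {
    case "INPAINT":
      return { enhancedPrompt: prompt, isRemoveOnly: false };
    case "GHIBLI":
      return { enhancedPrompt: prompt, imageStrength: 0.8 };
    default:
      return { enhancedPrompt: prompt };
  }
}
```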
Content Moderation — Two Layers
"Pre-generation: checkPromptFlagged()
Uses GPT-4o-mini with a strict NSFW/safety classification system prompt
Returns a JSON object: { flagged: boolean, reason: string }
Blocking — the image is never sent to the provider if flagged
Saves API cost on rejected prompts
Post-generation: isImageFlagged()
Uses OpenAI's omni-moderation-latest model to check the actual generated image
Currently only used by template features (Ghibli, Minecraft, Simpson, Pixar, Humanize My Pet)
Non-template features skip this check — this is a gap
Known issue: If the LLM returns malformed JSON or throws during checkPromptFlagged(), the code currently fails open — returns flagged: false, meaning the prompt passes through unchecked. The JSON.parse() call for the LLM response isn't wrapped in try/catch. This should be fail-closed: default to flagged: true on any error."
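The fail-closed behaviour could look roughly like the sketch below. It covers only the parsing step, not the LLM call, and parseModerationVerdict is a hypothetical helper rather than the existing checkPromptFlagged(). The { flagged, reason } shape is taken from the text.

```typescript
type ModerationVerdict = { flagged: boolean; reason: string };

// Fail closed: anything that is not a well-formed verdict counts as flagged.
function parseModerationVerdict(raw: string): ModerationVerdict {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { flagged?: unknown }).flagged === "boolean"
    ) {
      const { flagged, reason } = parsed as { flagged: boolean; reason?: unknown };
      return { flagged, reason: typeof reason === "string" ? reason : "" };
    }
    // Valid JSON, but not the expected shape.
    return { flagged: true, reason: "moderation response had an unexpected shape" };
  } catch {
    // JSON.parse threw: treat the prompt as unsafe rather than letting it through.
    return { flagged: true, reason: "moderation response was not valid JSON" };
  }
}
```

The trade-off is that a moderation outage now blocks generation instead of silently disabling the safety check.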
Image Storage & Post-Processing Pipeline
"The post-processing pipeline in formatImageGenerations() handles several transformations between provider response and user delivery:
1. Aspect Ratio Resolution
If aspectRatio is 'auto-detect', we call the Photon microservice (getDimensionsFromImage()) to fetch actual image dimensions. If Photon fails, we fall back to { width: 1024, height: 1024 } — not ideal but prevents the whole request from failing. If a ratio was explicitly set, we use getDimensionsFromAspectRatio() to calculate width/height.
2. SEO Description Generation
Each variation gets a call to GPT-4o-mini via describeImage() to generate a keyword-rich third-person description. This runs in parallel across all variations via Promise.allSettled(). The description is used for social sharing OG tags and discoverability.
3. GCS Upload (The Normalization Layer)
getAttachmentWithGBucketUrl() downloads the image from the provider's URL (which is usually a temporary signed URL that expires), uploads it to wallflower-images/{uid}/{iid}.png with public-read ACL, and replaces the URL with the permanent GCS URL. This is the key unification layer: all provider URLs, regardless of format or expiry policy, are normalized through GCS.
4. Parent Image Lineage
If this generation was an edit, remix, or template application, the parentImage/parentImages objects are attached to track provenance. This includes the original URL, IID, and public status.
5. Cost Calculation
Usage cost is calculated: queryCost * numberOfImages where queryCost comes from IMAGE_MODELS_INFO — a static map of model display names to query costs (e.g., FLUX.1 Schnell = 10 queries, FLUX.1 Pro = 140 queries).
GCS → Firestore sequencing: The image is uploaded to GCS first, then saved to Firestore. If Firestore save fails after GCS upload, the image is orphaned — exists in GCS with no metadata. The proposed fix: write-ahead — save Firestore first with a 'pending' status, then update to 'complete' after GCS upload."
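The proposed write-ahead ordering can be sketched against two small interfaces. MetadataStore and BlobStore are hypothetical stand-ins for Firestore and GCS, and saveWithWriteAhead is an illustrative name; none of this is the existing code.

```typescript
type ImageDoc = { iid: string; status: "pending" | "complete"; url?: string };

// Hypothetical stand-ins for the Firestore and GCS clients.
interface MetadataStore {
  set(iid: string, doc: ImageDoc): Promise<void>;
}
interface BlobStore {
  // Resolves to the stored object's URL.
  upload(path: string, bytes: Uint8Array): Promise<string>;
}

async function saveWithWriteAhead(
  meta: MetadataStore,
  blobs: BlobStore,
  uid: string,
  iid: string,
  bytes: Uint8Array,
): Promise<ImageDoc> {
  // 1. Record intent first, so any later failure leaves a findable "pending" row.
  await meta.set(iid, { iid, status: "pending" });
  // 2. Upload the image bytes.
  const url = await blobs.upload(`${uid}/${iid}.png`, bytes);
  // 3. Flip the row to "complete" with the permanent URL.
  const completed: ImageDoc = { iid, status: "complete", url };
  await meta.set(iid, completed);
  return completed;
}
```

With this ordering a failed upload leaves a pending row instead of an untracked object, so a periodic sweep over stale pending rows can retry or clean up.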
Style System — Translation Across Providers
"We support 9 preset styles: Auto, Anime, Realistic, Digital Art, Vintage, Cinematic, Fantasy, Neon Noir, and Minimalist. These are defined in PRESET_STYLES_MAP.
Each provider has a different way of applying styles:
Ideogram: Native style parameter — ideogramStyle can be General, Anime, Realistic, Design, or Render 3D
Recraft: recraftStyle parameter — values like realistic_image/natural_light, digital_illustration
Other models: The style is appended to the prompt as 'in {style} style' via getStyleModifiedPrompt()
Ideogram V2 Turbo: Returns the prompt unmodified — it has native style support, and modifying the prompt actually hurts quality
There are exceptions wired in: Recraft V3 skips the prompt append for Realistic style because it uses the API's style param instead."
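A sketch of the style rules above as one function. getStyleModifiedPrompt is the name from the text, but this body is a guess. The Recraft V3 model ID, the style string casing, and treating Auto as "no modification" are all assumptions.

```typescript
// Assumed model IDs; only the Ideogram one appears in the source.
const IDEOGRAM_V2_TURBO = "ideogram-ai/ideogram-v2-turbo";
const RECRAFT_V3 = "recraft-ai/recraft-v3"; // hypothetical ID for illustration

function getStyleModifiedPrompt(prompt: string, style: string, modelId: string): string {
  // Assumption: "Auto" means no style hint at all.
  if (style === "Auto") return prompt;
  // Ideogram V2 Turbo applies style natively, so the prompt is left untouched.
  if (modelId === IDEOGRAM_V2_TURBO) return prompt;
  // Recraft V3 passes Realistic through the API's style parameter instead.
  if (modelId === RECRAFT_V3 && style === "Realistic") return prompt;
  // Everything else gets the style appended to the prompt text.
  return `${prompt} in ${style} style`;
}
```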
Templates — 9 Pre-Styled Generations
"Templates are Firestore-stored configuration objects with a prompt template, model config, style, thumbnail, and category. Each template routes to a specific provider:
| Template | Provider |
| --- | --- |
| Ghibli Style | OpenAI GPT Image 1 |
| Minecraft Style | OpenAI GPT Image 1 |
| Simpson Style | OpenAI GPT Image 1 |
| Pixar Style | OpenAI GPT Image 1 |
| Humanize My Pet | OpenAI GPT Image 1 |
| Watermark Remover | Google Gemini 2.0 Flash |
| Make Me Bald | Google Gemini 2.0 Flash |
| Product Photography | OpenAI (specific handler) |
| Logo Wizard | OpenAI (specific handler) |
Templates are versioned via a version field. When a template is updated, a new version document is created; the frontend always requests the latest version. Model deprecation is handled through the abstract model map — the template itself doesn't change, only the mapping layer.
In the first month after launch, templates generated 10,000 images — about 20% of total generation volume at that time."
The Unified Controller — How It All Ties Together
"The unified-generation.controller.ts is the central orchestrator. Here's the exact sequence:
1. Validate input: Zod schema validation
2. Check user exists in wallflower collection
3. Check prompt for NSFW via checkPromptFlagged() (blocking)
4. Apply style modifiers to prompt via getStyleModifiedPrompt()
5. Enhance prompt if magic prompt is enabled via improvePrompt()
6. Route to provider handler based on model ID prefix
7. Provider executes: calls API, waits for prediction, downloads images
8. Format results: formatImageGenerations() does GCS upload, SEO, aspect ratio
9. Emit SSE events: attachments and usage
10. Calculate usage cost: queryCost × numberOfImages
11. Save image post to Firestore
12. Increment user usage via incrementUserUsage()
The key design: it's all middleware-based, sharing a request context via AsyncLocalStorage. The controller's job is coordination — the actual provider work happens in dedicated handler files."
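For the shared request context, a minimal AsyncLocalStorage sketch from Node's async_hooks module. The RequestContext fields and both helper names are hypothetical; the source does not show what the real context carries.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the real fields are not shown in the source.
type RequestContext = { uid: string; requestId: string };

const requestContext = new AsyncLocalStorage<RequestContext>();

// A middleware would call this once per request and run the rest of the chain inside it.
function withRequestContext<T>(ctx: RequestContext, handler: () => T): T {
  return requestContext.run(ctx, handler);
}

// Helpers deeper in the call stack read the context without it being passed as an argument.
function currentUid(): string {
  const ctx = requestContext.getStore();
  if (!ctx) {
    throw new Error("No request context: called outside withRequestContext()");
  }
  return ctx.uid;
}
```

This is what lets provider handlers and helpers stay free of threaded-through user or request parameters.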
Known Gaps & Failure Modes
"I've documented these explicitly because they represent the boundary between 'works now' and 'works at scale':
1. No fallback for non-GENERATE features
ERASE, UPSCALE, EDIT_BG, OMNI_EDIT have a single provider. If FAL AI goes down, those 4 features break completely. GENERATE features degrade gracefully via the fallback map.
2. Orphaned images in GCS
If saveImagesToFirestore() fails after GCS upload, the image is orphaned — exists in storage with no Firestore metadata. No recovery mechanism exists. Fix: write Firestore first with 'pending' status.
3. Prompt moderation fails open
If the LLM in checkPromptFlagged() returns malformed JSON, JSON.parse() throws — there's no try/catch. The function returns false (not flagged), and the prompt passes through unchecked. Fix: wrap in try/catch, default to flagged: true.
4. Seed reproducibility across fallback
Seeds are provider-specific. If the fallback fires, the same seed produces a completely different image because it's a different model. The concrete modelId isn't stored alongside the seed — regeneration can silently change providers.
5. SEO description cost at scale
Each generation calls GPT-4o-mini for SEO descriptions. At 100K DAU generating 2 images each, that's 200K LLM calls/day ($700/month). The descriptions block the response by 500ms–2s. Fix: make async via Cloud Tasks.
6. Photon dependency
The Photon microservice handles image dimension detection and mask inversion. If Photon is down, INPAINT and ERASE on Replicate fail completely (mask inversion is required). Dimension detection falls back to 1024×1024.
7. Firestore hotspot on usage counters
Every generation increments the customers/{uid} document. Firestore's 1 write/sec per document limit is a scalability bottleneck. Fix: distributed counter shards with Redis buffering.
8. No decision logging
The chat system has a decisionLog that records every tool selection and model choice. The image generation pipeline has zero decision logging. Debugging 'my image looks wrong' requires correlating browser logs, Cloud Run logs, Firestore, provider dashboards, and GCS — 20-30 minutes per incident."
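The sharded-counter fix from gap 7 can be illustrated in memory. Each array slot stands in for one shard document, and ShardedCounter is a hypothetical class rather than a Firestore API. The Redis buffering part of the proposal is not shown.

```typescript
// In-memory illustration of a distributed counter: N shards instead of one hot document.
class ShardedCounter {
  private readonly shards: number[];

  constructor(shardCount: number) {
    this.shards = new Array<number>(shardCount).fill(0);
  }

  // Each write lands on one randomly chosen shard, so no single shard absorbs every increment.
  increment(amount: number): void {
    const index = Math.floor(Math.random() * this.shards.length);
    this.shards[index] += amount;
  }

  // Reads pay the cost instead: the total is the sum over all shards.
  total(): number {
    return this.shards.reduce((sum, value) => sum + value, 0);
  }
}
```

With N shards the sustainable write rate scales roughly N times, at the price of an N-document read to compute the current usage.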
Deployment & Infrastructure
"Frontend: Deployed on Vercel as part of the Next.js app. The /bonkers route group is part of the same build.
Backend: Deployed on Google Cloud Run as part of the Arcane service. The Wallflower endpoints share the same container as the chat endpoints — they're route groups within the same Express application.
Providers: API keys for FAL AI, Replicate, OpenAI, Google Vertex AI, GoAPI, Ideogram — all embedded as environment variables in Cloud Run.
Storage:
GCS bucket wallflower-images — all generated images
Usage Tracking: Bonkers has its own feature key (bonkers) separate from Merlin chat (merlin). BONKERS_PRO and BONKERS_BASIC plans consume from bonkers. Free/PRO/TEAMS/ELITE users who use image generation consume from merlin — meaning one FLUX Pro image (140 queries) can exhaust their entire daily chat quota."
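The quota routing can be written down in a few lines. The plan names come from the text, but their literal spelling as a TypeScript union and both function names are assumptions.

```typescript
type Plan = "FREE" | "PRO" | "TEAMS" | "ELITE" | "BONKERS_BASIC" | "BONKERS_PRO";
type FeatureKey = "bonkers" | "merlin";

// Bonkers plans draw from the bonkers quota; every other plan draws from the chat quota.
function usageFeatureKey(plan: Plan): FeatureKey {
  return plan === "BONKERS_BASIC" || plan === "BONKERS_PRO" ? "bonkers" : "merlin";
}

// Queries charged against that key for one request.
function queriesCharged(queryCost: number, numberOfImages: number): number {
  return queryCost * numberOfImages;
}
```

For a PRO user, a single FLUX Pro image charges queriesCharged(140, 1) against the merlin key, which is how one image can eat into the daily chat quota.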
Key Numbers
20+ supported models across 5 providers
8 feature types unified under one pipeline
9 template types
~50K+ image generations per month
~700 tokens per magic prompt call
120 queries per bonkers-advance image (premium pricing)
20 queries per bonkers-lite image (budget tier)
GCS bucket: wallflower-images
No fallback for 4 of 8 feature types (current gap)
Closing
"If I had to explain Bonkers in one sentence: it's a cross-provider image generation engine that abstracts away 5 AI providers behind a unified API with automatic failover, content moderation, and GCS-based storage normalization.
The hardest problems were:
Provider API divergence — each provider has different payload shapes, auth methods, image formats, and error models. GCS normalization was the unification layer.
Cross-provider fallback — making GENERATE features resilient to provider outages
Per-feature magic prompts — each feature type needs a different enhancement strategy and system prompt
The mask inversion problem — FAL and Replicate interpret inpainting masks oppositely, requiring the Photon microservice
Pricing decoupling — abstract model costs intentionally diverge from provider costs for business reasons
The system works reliably at scale, but I've documented 8 specific gaps that represent the next iteration's roadmap."
Pro tip: If they ask about a specific area — fallback, magic prompts, moderation, the unified controller, templates — you can drill into any of those sections above. Each is written with enough detail to answer follow-ups without needing the actual code.