Q: What happens when Docker daemon is unavailable during provisioning?
Answer: When the Docker daemon is unavailable during sandbox provisioning, the system handles it through error handling and retry logic. The provisionSandbox function wraps Docker operations in try-catch blocks. If createContainer fails (e.g., Docker daemon not responding), the error is caught and logged. The function transitions the sandbox lifecycle state to FAILED with the error reason. The distributed lock is released in a finally block. The error propagates up to the caller (workflow engine or message orchestrator). The workflow engine's retry logic will retry the step up to the configured max times. If all retries fail, the workflow transitions to FAILED. The user sees an error message indicating the sandbox couldn't be provisioned. The system is designed to fail fast rather than hang - if Docker is unavailable, each attempt fails immediately instead of blocking, and the configured retry limit bounds the total time before the workflow is marked FAILED.
Q: How does the system handle Redis connection failures?
Answer: Redis connection failures are handled at multiple levels with graceful degradation. The ioredis client has automatic reconnection logic with exponential backoff. For rate limiting, if the Redis store is unavailable, the system falls back to allowing requests (fail-open) rather than blocking all traffic. This is implemented with try-catch blocks that log errors and call next(). For distributed locks, if Redis is unavailable, lock acquisition fails immediately and the operation proceeds without the lock. For pub/sub operations, if publishing fails, the system logs the error but doesn't fail the operation. For workflow persistence, if Redis is unavailable, the workflow can't be saved, which would fail the operation. The queue workers have Redis connection as a hard dependency - if Redis is down, workers can't process jobs. The system uses separate Redis clients for different purposes (rate limiting, locks, pub/sub, queues), so a connection-level problem on one client, such as a stalled or reconnecting subscriber connection, doesn't block the others; a full Redis outage still affects every client.
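The fail-open behaviour for rate limiting can be sketched as follows. This is a minimal illustration, not the project's middleware: the Req, Res, and CounterStore types are stand-ins invented for the example, and the store call is synchronous for brevity where a real Redis call would be awaited.

```typescript
// Minimal stand-ins for Express types so the sketch is self-contained.
type Next = (err?: unknown) => void;
interface Req { ip: string }
interface Res { statusCode: number; status(code: number): Res; end(): void }

// Hypothetical store interface: increment() returns the hit count for a key
// and throws when the backing store (Redis) cannot be reached.
interface CounterStore { increment(key: string): number }

function failOpenRateLimiter(store: CounterStore, max: number, log: (msg: string) => void) {
  return (req: Req, res: Res, next: Next): void => {
    let hits: number;
    try {
      hits = store.increment(`rl:${req.ip}`);
    } catch (err) {
      // Fail open: a store outage must not block all traffic.
      log(`rate limit store unavailable: ${String(err)}`);
      next();
      return;
    }
    if (hits > max) {
      res.status(429).end();
      return;
    }
    next();
  };
}
```

The trade-off is deliberate: during an outage the limiter stops protecting the API, but legitimate users are not locked out.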
Q: What happens when the LLM API key is invalid?
Answer: When the LLM API key is invalid, the system detects this early and rejects the request before any LLM calls. In unifiedSendMessage, the system retrieves the user's API key, decrypts it, and uses getProviderFromKey to determine the provider by testing the key against regex patterns. If the key doesn't match any provider pattern, the system returns a 400 error with a message "Unable to determine provider from the saved API key. Please re-save it in settings." This happens before the run is created or queued, so no tokens are consumed. If the provider is determined but the key is invalid (expired, revoked), the actual LLM call will fail with a 401 authentication error. This error is caught in the stream session or worker, and the run is marked as FAILED with an appropriate error message.
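A rough sketch of pattern-based provider detection is shown below. The provider names and regex patterns are assumptions based on commonly seen key prefixes; the actual patterns used by getProviderFromKey are not given here, and detectProvider and validateSavedKey are hypothetical names.

```typescript
type Provider = "openai" | "anthropic" | "gemini";

// Illustrative patterns only (assumption). Order matters: the more specific
// "sk-ant-" prefix is tested before the generic "sk-" prefix.
const PROVIDER_PATTERNS: ReadonlyArray<[Provider, RegExp]> = [
  ["anthropic", /^sk-ant-[A-Za-z0-9_-]+$/],
  ["openai", /^sk-[A-Za-z0-9_-]+$/],
  ["gemini", /^AIza[A-Za-z0-9_-]+$/],
];

function detectProvider(apiKey: string): Provider | null {
  const key = apiKey.trim();
  for (const [provider, pattern] of PROVIDER_PATTERNS) {
    if (pattern.test(key)) return provider;
  }
  return null;
}

// Returns the status the caller would send before any run is created or queued.
function validateSavedKey(apiKey: string): { status: number; provider?: Provider; error?: string } {
  const provider = detectProvider(apiKey);
  if (provider === null) {
    return {
      status: 400,
      error: "Unable to determine provider from the saved API key. Please re-save it in settings.",
    };
  }
  return { status: 200, provider };
}
```

A pattern match only establishes the key's shape; an expired or revoked key still passes this check and fails later with a 401 from the provider.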
Q: How does the system handle context window overflow mid-stream?
Answer: Context window overflow is prevented by validation before each LLM call, not mid-stream. In runAgentLoop, before each turn, the system calls computeTokenUsage to calculate total tokens and isOverContextLimit to check if this exceeds the model's context window. If it would overflow, the loop stops immediately with AgentLoopStopReason.CONTEXT_LIMIT_EXCEEDED. The system emits a SESSION_COMPLETE event with an error message explaining the context limit details. The run is marked as FAILED, and the user sees an error in the UI. The system doesn't attempt to continue with reduced context because that would likely produce poor results. This pre-check is critical because context window errors from LLM providers are often opaque and don't provide actionable information.
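The pre-turn check can be illustrated with a small loop. This is a sketch under stated assumptions: estimateTokens uses a rough four-characters-per-token heuristic rather than a real tokenizer, the reservedForOutput budget is an invented parameter, and runLoop is a simplified stand-in for runAgentLoop.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string }

// Rough heuristic (assumption): about 4 characters per token. A real
// implementation would use the provider's tokenizer or reported usage.
function estimateTokens(messages: Message[]): number {
  let chars = 0;
  for (const m of messages) chars += m.content.length;
  return Math.ceil(chars / 4);
}

function isOverLimit(usedTokens: number, contextWindow: number, reservedForOutput: number): boolean {
  return usedTokens + reservedForOutput > contextWindow;
}

type StopReason = "CONTEXT_LIMIT_EXCEEDED" | "MAX_TURNS";

// Checks the token budget before every model call, never mid-stream.
function runLoop(
  history: Message[],
  callModel: (h: Message[]) => string,
  opts: { contextWindow: number; reservedForOutput: number; maxTurns: number },
): { stopReason: StopReason; turns: number } {
  for (let turn = 0; turn < opts.maxTurns; turn++) {
    if (isOverLimit(estimateTokens(history), opts.contextWindow, opts.reservedForOutput)) {
      return { stopReason: "CONTEXT_LIMIT_EXCEEDED", turns: turn };
    }
    history.push({ role: "assistant", content: callModel(history) });
  }
  return { stopReason: "MAX_TURNS", turns: opts.maxTurns };
}
```

Because the check runs before the call, the loop can report exactly how many tokens were in play, which a provider-side overflow error usually does not.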
Q: What happens when the sandbox container crashes during build?
Answer: When a sandbox container crashes during build, the system handles it through error handling and status updates. The build job is running in the worker, which executes build commands via Docker exec. If the container crashes during a command, the exec call will fail with an error. The build orchestrator catches this error and returns a failure result with error details. The worker's processBuildJob catches the error and calls finalizeBuildFailure, which updates the build status to FAILED in the database and publishes the failure status via Redis pub/sub. The error report includes the stderr/stdout from the failed command. The network disconnect is called in a finally block. The container status cache is updated to mark the container as dead. The user sees a build failed message in the UI with error details.
Q: How does the system handle duplicate build job execution?
Answer: Duplicate build job execution is prevented through the claim mechanism and terminal status checks. In processBuildJob, the function first resolves the build record. If the build already has a terminal status (SUCCESS or FAILED), it skips execution and just publishes the existing status. If the build is in QUEUED status, it attempts to claim it by updating the status to BUILDING with a conditional WHERE clause. If this update affects 0 rows, it means another worker already claimed the build or it's already terminal. The function then checks the current status - if terminal, it publishes that status; if BUILDING, it skips as another worker is handling it. This optimistic locking ensures that even if multiple workers process the same job simultaneously, only one will actually execute the build.
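The claim step boils down to a conditional update whose affected-row count decides who runs the build. The sketch below uses an in-memory Map as a stand-in for the builds table; claimBuild and processBuildJobSketch are hypothetical names, and the SQL in the comment is an assumed shape of the real query.

```typescript
type BuildStatus = "QUEUED" | "BUILDING" | "SUCCESS" | "FAILED";

// In-memory stand-in for the builds table (assumption for illustration).
const builds = new Map<string, BuildStatus>();

// Mirrors: UPDATE builds SET status = 'BUILDING' WHERE id = $1 AND status = 'QUEUED'
// and returns the number of affected rows.
function claimBuild(id: string): number {
  if (builds.get(id) !== "QUEUED") return 0;
  builds.set(id, "BUILDING");
  return 1;
}

type Outcome = "executed" | "skipped-terminal" | "skipped-claimed" | "not-found";

function processBuildJobSketch(id: string, execute: () => BuildStatus): Outcome {
  const status = builds.get(id);
  if (status === undefined) return "not-found";
  if (status === "SUCCESS" || status === "FAILED") return "skipped-terminal";
  if (claimBuild(id) === 0) {
    // Zero rows affected: another worker claimed it, or it became terminal.
    return builds.get(id) === "BUILDING" ? "skipped-claimed" : "skipped-terminal";
  }
  builds.set(id, execute());
  return "executed";
}
```

In a real database the conditional UPDATE is atomic, so two workers racing on the same row cannot both see one affected row.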
Q: What happens when the user disconnects during stream?
Answer: When a user disconnects during a stream (closes the tab, loses network), the server continues processing the run to completion. The HTTP connection will close, but the run execution happens in the background. Events are still persisted to the database via persistRunEvent in the worker. The user can reconnect by refreshing the page - the frontend will detect the active run and resume streaming from the last event ID using the streamRunEventsFromPersistence function. The system uses the isResponseWritable check in unifiedSendMessage to detect if the client disconnected - if the response is no longer writable, it logs that the client disconnected but continues processing. This ensures that even if the user navigates away, the work completes and can be viewed later.
Q: How does the system handle Postgres constraint violations?
Answer: Postgres constraint violations are handled with retry logic in specific cases. In createBuild, if a foreign key constraint violation occurs (code 23503), the system retries up to 3 times with linear backoff (500ms * attempt number, i.e. 500ms, 1000ms, 1500ms). This handles the case where a chat or message is being created concurrently. For other constraint violations, the error is propagated immediately. The createRunWithUserLimit function uses advisory locks to prevent constraint violations during run admission. The upsertRunToolCall uses onConflictDoUpdate to handle unique constraint violations gracefully by updating instead of failing. The system logs all constraint violations for monitoring. Retry logic is only used for transient violations (foreign keys due to concurrent inserts), not for data integrity violations.
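A minimal sketch of the retry-on-23503 pattern follows. It assumes "up to 3 times" means three attempts in total, and it takes the sleep function as a parameter so the example stays free of timers; withForeignKeyRetry and its helpers are hypothetical names, not the code in createBuild.

```typescript
// Postgres SQLSTATE for foreign_key_violation.
const FOREIGN_KEY_VIOLATION = "23503";

function isForeignKeyViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null
    && (err as { code?: unknown }).code === FOREIGN_KEY_VIOLATION;
}

// Delay grows with the attempt number: 500 ms, 1000 ms, 1500 ms, ...
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * attempt;
}

async function withForeignKeyRetry<T>(
  operation: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Only transient FK violations are retried; everything else propagates at once.
      if (!isForeignKeyViolation(err) || attempt >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

In production code the sleep argument would typically wrap setTimeout in a promise.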
Monitoring & Observability
Q: What metrics are tracked during agent runs?
Answer: The system tracks multiple metrics during agent runs for monitoring and analytics. In the worker's processor.events.ts, the trackRunEventProgress function tracks: turn count, first token latency (time from run start to first text event), loop stop reason, termination reason, tool call counts, token usage, and error messages. These metrics are persisted in the run record's metadata and emitted in SESSION_COMPLETE events. First token latency is measured by capturing the timestamp when the first TEXT event arrives and comparing it to the run start time. Token usage is tracked from the LLM provider (Anthropic provides token counts in stream deltas, others are estimated). Tool call counts are incremented each time a tool is executed. These metrics are used for performance monitoring, cost tracking, and debugging slow runs.
Q: How does request ID propagation work?
Answer: Request ID propagation ensures that all logs and events for a single request can be correlated. The security telemetry middleware generates or propagates a unique x-request-id header. If the incoming request has an X-Request-Id header, that value is used; otherwise, a new UUID is generated. This ID is set in the response header so clients can include it in support requests. The ID is logged in all log statements via the requestId field in the log context. In the API, the getRequestId function extracts the ID from the request. In the worker, run IDs serve a similar purpose for correlating all events associated with a single agent run. This propagation is critical for debugging distributed systems where a single request spans multiple services (API, worker, LLM provider).
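The generate-or-propagate rule can be sketched as a pure function. The header map with lower-case keys and the injected generate callback are assumptions made to keep the example self-contained; a real middleware would read the Express request and use a UUID generator.

```typescript
// HTTP header names are case-insensitive; this sketch assumes the header map
// already uses lower-case keys, as Node's http module provides them.
type Headers = Record<string, string | undefined>;

function resolveRequestId(incoming: Headers, generate: () => string): string {
  const provided = incoming["x-request-id"]?.trim();
  return provided ? provided : generate();
}

// Returns the ID plus the response headers and log context that carry it.
function attachRequestId(incoming: Headers, generate: () => string) {
  const requestId = resolveRequestId(incoming, generate);
  return {
    requestId,
    responseHeaders: { "x-request-id": requestId },
    logContext: { requestId },
  };
}
```

Echoing the ID in the response header is what lets a client quote it in a support request.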
Q: What's the purpose of the rate limit headers?
Answer: Rate limit headers inform clients of their current quota status. The headers include: RateLimit-Limit (the maximum number of requests allowed in the window), RateLimit-Remaining (how many requests remain in the current window), RateLimit-Reset (Unix timestamp when the window resets), and RateLimit-Scope (which rate limit policy is being applied). These headers allow clients to implement client-side rate limiting, show users their remaining quota, and avoid unnecessary retries when they've hit the limit. The headers are set by the rate limit middleware for each scope. For the daily chat quota, the headers are set based on the actual successful chat completions tracked in Redis, not just request counting.
Q: How does Sentry integration work?
Answer: Sentry integration provides error tracking and performance monitoring. In utils/sentry.ts, Sentry is initialized with the DSN from environment variables. The integration captures uncaught exceptions and unhandled promise rejections. The API server wraps the bootstrap in error handling that calls captureException. The worker also has Sentry initialization for background job errors. Request IDs are attached to Sentry events for correlation. Source maps are uploaded during the build process for readable stack traces. User context (userId) is attached to events when available. Sentry provides dashboards for error rates, performance metrics, and alerting on critical errors. The integration is configured to filter out certain expected errors (like rate limit violations) to reduce noise.
Deployment & Infrastructure
Q: How does the deployment type affect preview routing?
Answer: Deployment type (configured via EDWARD_DEPLOYMENT_TYPE) determines how previews are accessed. In PATH mode (default), previews are accessed via URLs like https://assets.example.com/userId/chatId/preview/index.html. The preview URL is constructed using the S3/CloudFront URL and the path prefix. No additional infrastructure is needed. In SUBDOMAIN mode, previews are accessed via subdomains like https://chatId-userId.preview.example.com. This requires Cloudflare integration - the registerPreviewSubdomain function writes routing rules to Cloudflare KV, mapping subdomains to S3 paths. A CloudFront Function or Cloudflare Worker rewrites requests from the subdomain to the S3 path. The resolveDeploymentType function automatically chooses subdomain mode if all required Cloudflare credentials are present.
Q: Why use trust proxy configuration?
Answer: The trust proxy configuration is critical for correct IP detection and protocol handling when the API is behind a load balancer or reverse proxy. The parseTrustProxy function in app.config.ts parses the TRUST_PROXY environment variable, which can be a boolean, a number (hop count), or a comma-separated list of trusted proxy addresses. When enabled, Express trusts the X-Forwarded-For header for client IP addresses and X-Forwarded-Proto for the protocol (http/https). This is necessary for: (1) correct rate limiting per IP, (2) security telemetry logging actual client IPs, (3) HTTPS enforcement middleware to work correctly behind SSL termination, (4) CORS origin checking to work with the original protocol. The configuration defaults to true in production and false in development.
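A parser for the three accepted formats might look like the sketch below. It is an assumption about how such a parser could behave, not the actual parseTrustProxy; in particular, the handling of an unset variable and of mixed-case booleans is a guess.

```typescript
type TrustProxy = boolean | number | string[];

function parseTrustProxySketch(raw: string | undefined, isProd: boolean): TrustProxy {
  const value = (raw ?? "").trim();
  if (value === "") return isProd; // unset: true in production, false otherwise
  const lower = value.toLowerCase();
  if (lower === "true") return true;
  if (lower === "false") return false;
  if (/^\d+$/.test(value)) return Number(value); // hop count
  // Otherwise treat the value as a comma-separated list of trusted addresses.
  return value.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}
```

The result can be passed to Express via app.set("trust proxy", value), which accepts a boolean, a hop count, or a list of addresses.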
Q: How does the CloudFront Function rewrite work?
Answer: The CloudFront Function rewrite handles subdomain-based preview routing. In subdomain mode, when a user accesses https://chatId-userId.preview.example.com, the request hits CloudFront which has a viewer request function configured. This function extracts the subdomain, parses the chatId and userId, and rewrites the request path to /userId/chatId/preview/index.html (or the specific file being requested). This rewrite happens at the edge, so the request is routed to the correct S3 path without needing complex origin configuration. The function also handles SPA routing by rewriting requests for paths without file extensions to index.html. This approach is more performant than Lambda@Edge functions and doesn't require additional infrastructure beyond CloudFront.
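The path logic of such a rewrite can be sketched as a pure function. This is not the deployed CloudFront Function (which would be JavaScript with a handler(event) entry point); it assumes hosts of the form chatId-userId.preview.example.com and that neither ID contains a hyphen, and rewritePreviewRequest is a hypothetical name.

```typescript
// Assumption: preview hosts look like "<chatId>-<userId>.<rootDomain>"
// and neither ID contains a hyphen, so the first hyphen separates them.
function rewritePreviewRequest(host: string, uri: string, rootDomain: string): string | null {
  const suffix = "." + rootDomain;
  const hostname = host.split(":")[0] ?? "";
  if (!hostname.endsWith(suffix)) return null;
  const label = hostname.slice(0, -suffix.length);
  const dash = label.indexOf("-");
  if (dash <= 0 || dash === label.length - 1 || label.indexOf(".") !== -1) return null;
  const chatId = label.slice(0, dash);
  const userId = label.slice(dash + 1);
  // SPA fallback: a path whose last segment has no file extension serves index.html.
  const lastSegment = uri.slice(uri.lastIndexOf("/") + 1);
  const path = lastSegment.indexOf(".") !== -1 ? uri : "/index.html";
  return `/${userId}/${chatId}/preview${path}`;
}
```

Returning null for unrecognized hosts lets the caller pass the request through unchanged.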
Q: What's the purpose of the prewarm sandbox image?
Answer: The prewarm sandbox image strategy reduces cold start latency for sandbox provisioning. During API server startup, the system can pull a base Docker image (configured via PREWARM_SANDBOX_IMAGE) so it's cached locally. When a sandbox needs to be provisioned, the container can be created quickly from the cached image instead of waiting for the image pull. This is particularly important for the first user request after a deployment or server restart. The prewarm image is typically the base Node.js image used for all sandboxes, with common dependencies pre-installed. The image pull happens during the initSandboxService call in server bootstrap. This strategy significantly reduces the time from user request to sandbox ready from potentially 30+ seconds to a few seconds.
Testing & Quality
Q: What are the quality gates in the API?
Answer: The API has several quality gates enforced in the CI pipeline. Type checking is required - TypeScript must compile without errors. Linting with ESLint has a max-warnings threshold of 0, meaning no linting warnings are allowed. Function length limits prevent functions from becoming too complex (enforced by ESLint rules). Code duplication detection uses jscpd with a 70-token threshold and 10-line minimum - duplicate code blocks cause CI failure. Architecture boundary checks ensure imports follow the layered structure (controllers/services/lib) and prevent circular dependencies. The CI script runs pnpm ci:local which executes typecheck, lint, and architecture checks in parallel using Turbo. These gates ensure code quality before merging to main.
Q: How does the code duplication check work?
Answer: The code duplication check uses jscpd (JavaScript Copy/Paste Detector) configured in the root package.json. It scans the codebase for duplicated code blocks of at least 10 lines and 70 tokens. It ignores test files, dist directories, node_modules, and generated files. If duplicate code exceeding these minimums is found, the CI fails. The check is important because code duplication leads to maintenance burden - bugs need to be fixed in multiple places. The configuration allows for some legitimate duplication (e.g., similar error handling patterns) while flagging problematic duplication. The check runs as part of the CI pipeline and can be run locally with pnpm run jscpd.
Q: What's the architecture boundary check?
Answer: The architecture boundary check enforces the layered architecture of the API. The check is implemented as a custom script that analyzes import statements and ensures: (1) Controllers only import from services and lib, (2) Services only import from other services and lib, (3) Lib has no dependencies on controllers or services, (4) No circular dependencies exist between modules. This prevents architectural drift where business logic creeps into controllers or where tight coupling develops between layers. The check runs as part of the CI pipeline and can be run locally. Violations cause CI failure with a detailed report of which imports violate the boundaries. This check is critical for maintaining a clean, testable architecture as the codebase grows.
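Rules (1) to (3) can be expressed as a small allow-list over import edges. The sketch below is an illustration only: it assumes each file's layer is the first directory under src/, takes pre-extracted import edges as input, and leaves out circular dependency detection; the real script's implementation is not shown here.

```typescript
type Layer = "controllers" | "services" | "lib";

// Allowed import targets per layer, following the rules described above.
const ALLOWED: Record<Layer, ReadonlyArray<Layer>> = {
  controllers: ["services", "lib"],
  services: ["services", "lib"],
  lib: ["lib"],
};

// Assumption: the layer is the first path segment under src/.
function layerOf(file: string): Layer | null {
  const match = /^src\/(controllers|services|lib)\//.exec(file);
  return match ? (match[1] as Layer) : null;
}

interface ImportEdge { from: string; to: string }

function findBoundaryViolations(edges: ImportEdge[]): string[] {
  const violations: string[] = [];
  for (const { from, to } of edges) {
    const source = layerOf(from);
    const target = layerOf(to);
    if (source === null || target === null) continue; // outside the layered tree
    if (ALLOWED[source].indexOf(target) === -1) {
      violations.push(`${from} (${source}) must not import ${to} (${target})`);
    }
  }
  return violations;
}
```

A CI step would fail the build whenever the returned list is non-empty and print it as the report.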
API Server Architecture
Q: How is the Express app composed?
Answer: The Express app is composed in server/http/app.factory.ts using the factory pattern. The createHttpApp function takes configuration parameters (isDev, isProd, allowedOrigins, environment, trustProxy) and returns a configured Express app. Middleware is added in a specific order: (1) trust proxy setting, (2) Helmet for security headers, (3) HTTPS force middleware in production, (4) CORS with origin checker, (5) security telemetry middleware, (6) cookie parser, (7) JSON body parser with 1MB limit, (8) URL-encoded body parser, (9) request logging in development, (10) auth middleware, (11) route mounting for api-key, chat, and github routes, (12) error handler, (13) 404 handler. This order is critical - auth must come after cookie parsing but before routes, CORS must come early, etc.
Q: How does CORS origin checking work?
Answer: CORS origin checking in createCorsOriginChecker validates that the request origin is allowed. In development, all origins are allowed. In production, the origin is checked against the CORS_ORIGINS environment variable (comma-separated list). The checker extracts the Origin header from the request and compares it against the allowed origins list. If the origin is not in the list, the CORS middleware blocks the request. The checker also handles the case where the Origin header is missing (some browsers don't send it for same-origin requests). The allowed origins list is parsed from the environment variable in app.config.ts. This is critical for security - without proper CORS checking, a script on any website could make credentialed requests to the API from a logged-in user's browser and read the responses.
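An origin checker along these lines could be sketched as follows. The callback shape matches the origin function option of the cors package; treating a missing Origin header as allowed is an assumption, and parseAllowedOrigins and createOriginChecker are hypothetical names rather than the project's code.

```typescript
type OriginCallback = (err: Error | null, allow?: boolean) => void;

function parseAllowedOrigins(raw: string | undefined): Set<string> {
  return new Set((raw ?? "").split(",").map((o) => o.trim()).filter((o) => o.length > 0));
}

// Shaped like the `origin` function option accepted by the `cors` package.
function createOriginChecker(isDev: boolean, allowed: Set<string>) {
  return (origin: string | undefined, callback: OriginCallback): void => {
    // Assumption: requests without an Origin header (same-origin, curl,
    // server-to-server) are allowed through.
    if (isDev || origin === undefined || allowed.has(origin)) {
      callback(null, true);
      return;
    }
    callback(new Error(`Origin ${origin} is not allowed by CORS`));
  };
}
```

The comparison is an exact string match, so entries in CORS_ORIGINS need the scheme and no trailing slash, for example https://app.example.com.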
Q: How does the security telemetry middleware detect anomalies?
Answer: The security telemetry middleware detects anomalies by examining HTTP response status codes. In the finish event handler, it checks if the status is 401 (unauthorized), 403 (forbidden), 429 (rate limited), or >= 500 (server error). If any of these conditions are true, it logs a security event with context including request ID, method, path, status, duration, and IP. This detection is important for identifying potential attacks (brute force auth attempts, abuse patterns) and system issues (high error rates). The logged events can be analyzed for patterns - e.g., many 401s from a single IP might indicate a brute force attack, many 429s might indicate abuse, many 5xxs might indicate a system outage.
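The status-code classification can be sketched as two small functions. The event field names follow the context listed above; the AnomalyKind labels and function names are hypothetical.

```typescript
type AnomalyKind = "unauthorized" | "forbidden" | "rate_limited" | "server_error";

function classifyStatus(status: number): AnomalyKind | null {
  if (status === 401) return "unauthorized";
  if (status === 403) return "forbidden";
  if (status === 429) return "rate_limited";
  if (status >= 500) return "server_error";
  return null;
}

interface SecurityEvent {
  requestId: string;
  method: string;
  path: string;
  status: number;
  durationMs: number;
  ip: string;
  kind: AnomalyKind;
}

// Intended to be called from a response "finish" handler; returns the event
// to log, or null when the status is not one of the watched codes.
function buildSecurityEvent(ctx: Omit<SecurityEvent, "kind">): SecurityEvent | null {
  const kind = classifyStatus(ctx.status);
  return kind === null ? null : { ...ctx, kind };
}
```

Tagging each event with a kind makes it straightforward to aggregate later, for example counting unauthorized events per IP.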
Message Orchestration
Q: How does the unifiedSendMessage function work?
Answer: The unifiedSendMessage function in messageOrchestrator.service.ts orchestrates the entire message flow from user request to queued agent run. It performs: (1) Run admission window check to prevent overload, (2) User API key retrieval and validation, (3) Provider/model validation to ensure compatibility, (4) Chat creation or retrieval, (5) Intent detection via planning workflow, (6) Message persistence, (7) Image attachment handling, (8) Run creation with admission limits, (9) Run queuing to BullMQ, (10) Stream handoff to persistence-based streaming. The function sets up SSE headers and sends initial META events. It handles various error cases (API key missing, model mismatch, admission rejection) with appropriate error responses. The function is the main entry point for the chat API.
Q: How does run admission work?
Answer: Run admission in runAdmission.service.ts controls how many runs can execute concurrently. The getRunAdmissionWindow function queries the database for active runs (queued or running status) and returns metrics including active run depth and user-specific limits. The admission check happens before run creation. If the system is overloaded (global limit exceeded), the request is rejected with 429. If the user has too many active runs (user limit exceeded), the request is rejected. If the chat has an active run (chat limit exceeded), the request is rejected. These limits prevent resource exhaustion and ensure fair allocation of compute resources. The limits are configured via environment variables and can be adjusted based on capacity.
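The three-level check can be sketched over a list of active runs. The limit names and the use of 429 for user and chat rejections are assumptions (only the global case is stated to return 429); a real implementation would count rows in the database rather than filter an array.

```typescript
interface ActiveRun { userId: string; chatId: string }
interface AdmissionLimits { globalMax: number; perUserMax: number; perChatMax: number }
type AdmissionResult =
  | { admitted: true }
  | { admitted: false; status: 429; reason: "global" | "user" | "chat" };

// `active` holds every run currently in queued or running status.
function checkAdmission(
  active: ActiveRun[],
  userId: string,
  chatId: string,
  limits: AdmissionLimits,
): AdmissionResult {
  if (active.length >= limits.globalMax) return { admitted: false, status: 429, reason: "global" };
  const userRuns = active.filter((r) => r.userId === userId).length;
  if (userRuns >= limits.perUserMax) return { admitted: false, status: 429, reason: "user" };
  const chatRuns = active.filter((r) => r.chatId === chatId).length;
  if (chatRuns >= limits.perChatMax) return { admitted: false, status: 429, reason: "chat" };
  return { admitted: true };
}
```

Checking the global limit first means an overloaded system rejects cheaply before doing any per-user work.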
Q: How does the planning workflow integrate with message sending?
Answer: The planning workflow is integrated into the message flow in unifiedSendMessage. After the chat is created/retrieved, the system calls createWorkflow with the user request. It then calls advanceWorkflow to execute the ANALYZE phase, which determines the intent (GENERATE, BUILD, DEPLOY, etc.) and suggested framework. The workflow context includes recommended packages that are passed as preVerifiedDeps to the agent run. This integration ensures that the AI agent has context about what the user wants to accomplish before it starts generating code. The workflow state is persisted in Redis and can be resumed if needed. The workflow also handles recovery if the ANALYZE phase fails.
Rate Limiting
Q: How is the daily chat quota implemented?
Answer: The daily chat quota is implemented differently from other rate limits because it tracks successful chat completions rather than requests. The dailyChatRateLimiter middleware calls getDailyChatSuccessSnapshot which queries Redis for the user's successful chat completion count in the current daily window. The count is incremented when a chat successfully completes (not when the request is made). This ensures users are limited by actual usage, not failed requests. The quota resets daily based on a timestamp in Redis. The middleware sets rate limit headers (Limit, Remaining, Reset) so the UI can show the user's remaining quota. If the quota is exceeded, the request is rejected with 429. This implementation is more accurate than request-based limiting for chat usage.
Q: How does the Redis-backed rate limiting work?
Answer: Redis-backed rate limiting uses the rate-limit-redis store with express-rate-limit. The store implements the rate limit interface using Redis commands. For each request, it increments a counter in Redis with a key pattern like rl:scope:userId. The increment is done with an expiration (PX) equal to the window duration, so the counter automatically resets. The store checks if the counter exceeds the max - if so, the request is rate limited. Using Redis ensures rate limits are enforced across multiple API server instances. The sharedRedisRateLimitConfig function configures the Redis client with the sendCommand method that the store uses. This approach is more scalable than in-memory rate limiting for distributed systems.
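The counter-with-expiry behaviour can be illustrated without Redis. The sketch below is an in-memory stand-in for an increment with a PX expiry, using the rl:scope:userId key pattern mentioned above; the current time is passed in explicitly so the window logic is easy to follow, and createFixedWindowLimiter is a hypothetical name.

```typescript
interface HitResult { count: number; limited: boolean; resetAtMs: number }

// In-memory stand-in for a Redis counter whose key expires after windowMs.
function createFixedWindowLimiter(windowMs: number, max: number) {
  const counters = new Map<string, { count: number; expiresAtMs: number }>();
  return function hit(scope: string, userId: string, nowMs: number): HitResult {
    const key = `rl:${scope}:${userId}`;
    let entry = counters.get(key);
    if (entry === undefined || entry.expiresAtMs <= nowMs) {
      // Key absent or expired: a new window starts with this request.
      entry = { count: 0, expiresAtMs: nowMs + windowMs };
      counters.set(key, entry);
    }
    entry.count += 1;
    return { count: entry.count, limited: entry.count > max, resetAtMs: entry.expiresAtMs };
  };
}
```

With Redis holding the counters instead of a local Map, every API instance sees the same count, which is what makes the limit hold across a fleet.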
Q: What are the different rate limit scopes?
Answer: Edward has multiple rate limit scopes for different operations: API_KEY (for API key management operations), CHAT_BURST (for rapid chat requests, short window), CHAT_DAILY (for daily chat quota, 24-hour window), IMAGE_UPLOAD_BURST (for image uploads), GITHUB_BURST (for GitHub operations), GITHUB_DAILY (for daily GitHub operations), PROMPT_ENHANCE_BURST (for prompt enhancement). Each scope has its own policy with different window durations and max requests defined in @edward/shared/constants. This granular scoping allows appropriate limits for different operations - e.g., burst limits for rapid requests, daily limits for quota management. The scopes are applied to specific routes in the route definitions.
Configuration Management
Q: How is the configuration structured?
Answer: The configuration in app.config.ts is structured as a nested object with getter functions for lazy evaluation and validation. The config object has sections for redis, server, cors, encryption, aws, github, deployment, previewRouting, docker, and webSearch. Environment variables are validated on access - for example, redis.host calls validateEnvVar which throws if the variable is missing or empty. Some values like deployment type have resolver functions that determine the value based on other environment variables. The trust proxy setting has a custom parser that handles boolean, number, and list formats. This structure ensures that a missing or malformed variable fails with a clear error the first time the setting is read, rather than surfacing later as an obscure runtime failure.
Q: How are environment variables validated?
Answer: Environment variables are validated using helper functions in app.config.ts. The validateEnvVar function checks that a variable exists and is not empty after trimming - if validation fails, it throws an error with the variable name. The validatePort function parses a string as an integer and checks it's a valid port (1-65535). The parseTrustProxy function handles the various formats (boolean, number, list) that the trust proxy setting can take. The parseRedisUrl function parses a Redis connection string and extracts host and port. Validation happens when the config property is first accessed, not at module load time. This lazy validation allows the config module to be imported without requiring all environment variables to be present during testing.
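Lazy validation through getters can be sketched as below. The helper names requireEnv and parsePort, the REDIS_HOST and REDIS_PORT variable names, and passing the environment in as a parameter are all assumptions for the example; the real helpers in app.config.ts may differ in signature and messages.

```typescript
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function parsePort(raw: string, name: string): number {
  const port = /^\d+$/.test(raw) ? Number(raw) : NaN;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`${name} must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}

// Getters defer validation until a property is actually read, so building
// the config object never throws on its own.
function createConfig(env: Env) {
  return {
    redis: {
      get host(): string { return requireEnv(env, "REDIS_HOST"); },
      get port(): number { return parsePort(requireEnv(env, "REDIS_PORT"), "REDIS_PORT"); },
    },
  };
}
```

This is why tests can import the config module with an empty environment: nothing is checked until a test touches a specific setting.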
Q: How does the deployment type resolution work?
Answer: The deployment type resolution in resolveDeploymentType automatically determines whether to use path-based or subdomain-based preview routing. It first checks if EDWARD_DEPLOYMENT_TYPE is explicitly set to "path" or "subdomain". If not, it checks if all required Cloudflare credentials are present (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_KV_NAMESPACE_ID, PREVIEW_ROOT_DOMAIN) using hasCompletePreviewRoutingConfig. If all credentials are present, it defaults to subdomain mode; otherwise, it defaults to path mode. This automatic resolution makes it easy to switch between deployment modes by just adding or removing environment variables, without code changes.
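The resolution order can be sketched as follows. The environment is passed in as a plain object, and the case-insensitive match on the explicit value is an assumption; resolveDeployment and hasCompleteRoutingConfig are illustrative counterparts of the functions named above, not their actual code.

```typescript
type DeploymentType = "path" | "subdomain";
type EnvVars = Record<string, string | undefined>;

const REQUIRED_SUBDOMAIN_VARS = [
  "CLOUDFLARE_API_TOKEN",
  "CLOUDFLARE_ACCOUNT_ID",
  "CLOUDFLARE_KV_NAMESPACE_ID",
  "PREVIEW_ROOT_DOMAIN",
];

function hasCompleteRoutingConfig(env: EnvVars): boolean {
  return REQUIRED_SUBDOMAIN_VARS.every((name) => (env[name] ?? "").trim() !== "");
}

function resolveDeployment(env: EnvVars): DeploymentType {
  // 1. An explicit setting always wins.
  const explicit = (env["EDWARD_DEPLOYMENT_TYPE"] ?? "").trim().toLowerCase();
  if (explicit === "path" || explicit === "subdomain") return explicit;
  // 2. Otherwise infer the mode from whether the routing credentials are complete.
  return hasCompleteRoutingConfig(env) ? "subdomain" : "path";
}
```

One consequence of this order is that an explicit "subdomain" setting is honoured even when credentials are missing, so a deployment relying on the explicit value still needs all four variables present.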