Shared from "Interview Questions" on Inkdown

Edward Project - Comprehensive Interview Questions - Part 3

Table of Contents

  • Authentication & Security
  • Error Handling & Resilience
  • Frontend Architecture
  • Planning Workflow
  • Performance & Optimization

Authentication & Security

Q: How does the auth middleware validate sessions?

Answer: The auth middleware validates user sessions using the Better Auth library. For each request (except OPTIONS preflight), it calls auth.api.getSession({ headers: toFetchHeaders(req.headers) }). The toFetchHeaders function converts Express's header object (a plain object with lowercased names) into the standard Fetch Headers format that Better Auth expects. Better Auth reads the session cookie from the request headers and returns session data including the user and session objects. If the session data is missing or the user/session are null, the middleware logs a security event with context (IP, path, request ID) and returns a 401 Unauthorized response. If valid, it extracts the userId and sessionId and attaches them to the request object as req.userId and req.sessionId. These attached values are used by downstream handlers and services for authorization and auditing.

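The validation flow can be sketched as follows. This is a minimal illustration, not Edward's actual middleware: the request and session shapes are hypothetical stand-ins for the Express and Better Auth types, getSession is injected so the sketch runs without either library, and the returned status code stands in for calling next() or sending a response.

```typescript
// Hypothetical shapes; the real project uses Express and Better Auth types.
type NodeHeaders = Record<string, string | string[] | undefined>;
interface SessionData {
  user: { id: string } | null;
  session: { id: string } | null;
}
interface AuthedRequest {
  method: string;
  headers: NodeHeaders;
  userId?: string;
  sessionId?: string;
}

// Convert Node's plain header object into a Fetch-style Headers instance.
function toFetchHeaders(nodeHeaders: NodeHeaders): Headers {
  const headers = new Headers();
  for (const [name, value] of Object.entries(nodeHeaders)) {
    if (value === undefined) continue;
    for (const item of Array.isArray(value) ? value : [value]) {
      headers.append(name, item);
    }
  }
  return headers;
}

// Returns the status the middleware would answer with; 200 stands in for next().
async function authenticate(
  req: AuthedRequest,
  getSession: (args: { headers: Headers }) => Promise<SessionData | null>,
): Promise<number> {
  if (req.method === "OPTIONS") return 200; // preflight requests skip auth
  const data = await getSession({ headers: toFetchHeaders(req.headers) });
  const user = data?.user ?? null;
  const session = data?.session ?? null;
  if (!user || !session) return 401; // the real middleware also logs a security event here
  req.userId = user.id;
  req.sessionId = session.id;
  return 200;
}
```

In the real middleware, getSession would be auth.api.getSession and the 401 branch would write the response and the security log entry.
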
Q: What's the purpose of security telemetry middleware?

Answer: The security telemetry middleware provides comprehensive security monitoring and audit logging. It generates or propagates a unique x-request-id header for each request, included in all logs for tracing. It extracts and logs the client IP address using getClientIp, which handles various proxy scenarios (X-Forwarded-For, X-Real-IP, etc.). On response completion, it checks the status code and logs a security event if the status indicates an anomaly (401 unauthorized, 403 forbidden, 429 rate limit, or 5xx server error). These security events are logged with context including the request ID, method, path, status, duration, and IP. The middleware also sets the x-request-id in the response header so clients can include it in support requests. This telemetry is critical for detecting attacks, debugging security issues, and maintaining an audit trail.
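
Two of the decisions described above, propagating the request ID and deciding which responses count as security events, can be sketched in isolation. The function names here are hypothetical, and the header bag type is a simplified stand-in for Express's request headers.

```typescript
import { randomUUID } from "node:crypto";

type HeaderBag = Record<string, string | string[] | undefined>;

// Reuse an incoming x-request-id when present, otherwise mint a new one.
function resolveRequestId(headers: HeaderBag): string {
  const incoming = headers["x-request-id"];
  const value = Array.isArray(incoming) ? incoming[0] : incoming;
  return value !== undefined && value.trim() !== "" ? value : randomUUID();
}

// Statuses the answer lists as anomalies: 401, 403, 429, and any 5xx.
function isSecurityRelevantStatus(status: number): boolean {
  return status === 401 || status === 403 || status === 429 || (status >= 500 && status <= 599);
}
```

A middleware built on these would call resolveRequestId on entry, set the result on the response header, and check isSecurityRelevantStatus in a response "finish" listener before logging.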

Q: How are rate limits scoped and enforced?

Answer: Rate limiting is scoped at multiple levels and enforced using Redis-backed rate limiting with express-rate-limit and rate-limit-redis. Separate rate limiters exist for different scopes: API_KEY, CHAT_BURST, CHAT_DAILY, IMAGE_UPLOAD_BURST, GITHUB_BURST, and PROMPT_ENHANCE_BURST. Each scope has a policy with window duration, max requests, and Redis prefix. Rate limiters use a key generator that uses userId for authenticated requests and IP address for anonymous requests. The Redis store ensures rate limits are enforced across multiple API server instances. The daily chat quota is special - it tracks successful chat completions in Redis with a daily reset, rather than request counting. Rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, RateLimit-Scope) are set in responses to inform clients of their limits.
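
The scoping idea can be sketched without the rate-limiting libraries. The scope names come from the answer; the window sizes, maximums, and Redis prefixes below are placeholders chosen for illustration, not Edward's real limits, and rateLimitKey and redisKey are hypothetical helper names.

```typescript
type RateLimitScope =
  | "API_KEY"
  | "CHAT_BURST"
  | "CHAT_DAILY"
  | "IMAGE_UPLOAD_BURST"
  | "GITHUB_BURST"
  | "PROMPT_ENHANCE_BURST";

interface RateLimitPolicy {
  windowMs: number;
  max: number;
  prefix: string;
}

// Placeholder values for two scopes only; the real policies are not given in the source.
const EXAMPLE_POLICIES: Partial<Record<RateLimitScope, RateLimitPolicy>> = {
  CHAT_BURST: { windowMs: 60_000, max: 10, prefix: "rl:chat-burst:" },
  IMAGE_UPLOAD_BURST: { windowMs: 60_000, max: 5, prefix: "rl:image-upload:" },
};

// Authenticated requests are limited per user, anonymous ones per IP.
function rateLimitKey(req: { userId?: string; ip: string }): string {
  return req.userId ? `user:${req.userId}` : `ip:${req.ip}`;
}

// The scope prefix keeps counters for different limiters apart in Redis.
function redisKey(scope: RateLimitScope, req: { userId?: string; ip: string }): string | undefined {
  const policy = EXAMPLE_POLICIES[scope];
  return policy ? policy.prefix + rateLimitKey(req) : undefined;
}
```

Because the counter lives under a shared Redis key rather than in process memory, every API instance increments the same value for a given user or IP.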

Q: Why use Helmet with specific CSP directives?

Answer: Helmet sets HTTP security headers, and the Content Security Policy (CSP) is configured specifically for Edward's needs. The CSP allows inline styles, which the Tailwind-based frontend needs for styles injected at runtime (Tailwind's own utility classes compile to a static stylesheet and would not need this on their own). It allows script sources from the domain and CDN-hosted dependencies. It allows image sources from the S3/CloudFront domain for user-uploaded images and from data URIs for base64-encoded images. The connect-src directive allows connections to the API domain and external APIs (LLM providers, GitHub, etc.). The CSP also includes frame-ancestors to prevent clickjacking. The directives are tuned to allow Edward's functionality while preventing XSS attacks. For example, script-src does not include unsafe-inline or unsafe-eval, because either one would let injected script content execute.
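
A policy of this shape can be sketched as a plain directives object plus a serializer that produces the header value. All origins are hypothetical (example.com placeholders), and the 'none' value for frame-ancestors is an assumption; the source only says the directive is present.

```typescript
// Hypothetical origins; substitute the real API, CDN, and asset domains.
const exampleDirectives: Record<string, string[]> = {
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://cdn.example.com"], // no 'unsafe-inline' or 'unsafe-eval'
  "style-src": ["'self'", "'unsafe-inline'"],
  "img-src": ["'self'", "data:", "https://assets.example.com"],
  "connect-src": ["'self'", "https://api.example.com"],
  "frame-ancestors": ["'none'"], // assumed value; blocks framing by other sites
};

// Build the Content-Security-Policy header value from the directives.
function serializeCsp(directives: Record<string, string[]>): string {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}
```

With Helmet, an equivalent directives object would be passed through its contentSecurityPolicy option instead of being serialized by hand.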

Q: How is the API key encryption handled?

Answer: API key encryption ensures that user-provided LLM API keys are never stored in plaintext. When a user saves their API key via the settings UI, it's encrypted using AES encryption with the ENCRYPTION_KEY environment variable (a 32-character hex string; if hex-decoded this yields 16 bytes, which is an AES-128 key, whereas AES-256 requires 32 bytes). The encryption happens in the API layer before saving to the database. The encrypted key is stored in the users table's apiKey column. When the key is needed for LLM calls, it's decrypted using the same encryption key. The decrypted key exists only in memory for the duration of the operation and is never logged. The encryption/decryption uses Node.js's crypto module with a secure algorithm. This protects user credentials if the database is compromised but the encryption key is not. The system supports key rotation by having users re-save their API key, which re-encrypts it with the current encryption key.
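
One way to implement this with Node's crypto module is sketched below. The source only says "AES", so the choice of AES-256-GCM, the 64-hex-character key, the 12-byte nonce, and the dot-separated payload format are all assumptions made for this example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: AES-256-GCM with a 32-byte key supplied as 64 hex characters.
function encryptApiKey(plaintext: string, keyHex: string): string {
  const key = Buffer.from(keyHex, "hex");
  const iv = randomBytes(12); // fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Store nonce, auth tag, and ciphertext together so decryption is self-contained.
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

function decryptApiKey(payload: string, keyHex: string): string {
  const [iv, tag, ciphertext] = payload.split(".").map((part) => Buffer.from(part, "base64"));
  if (!iv || !tag || !ciphertext) throw new Error("Malformed encrypted payload");
  const decipher = createDecipheriv("aes-256-gcm", Buffer.from(keyHex, "hex"), iv);
  decipher.setAuthTag(tag);
  // final() throws if the ciphertext or tag was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

An authenticated mode such as GCM is used here so that a modified database value fails to decrypt instead of yielding garbage.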


Error Handling & Resilience

Q: How does the ensureError utility work?

Answer: The ensureError utility provides consistent error handling by normalizing various error types to standard Error objects. It takes an unknown value and returns an Error. If the input is already an Error instance, it returns it as-is. If it's a string, it creates a new Error with that string as the message. If it's an object with a string message property, it creates a new Error with that message and copies the stack property if present. If it's any other type, it creates a generic Error with a message derived from JSON.stringify or "An unknown error occurred". This utility is critical because JavaScript allows any value to be thrown, so TypeScript types the variable in a catch block as unknown under strict settings, and different libraries throw different error shapes. By normalizing everything to Error objects, the codebase can consistently access error.message and error.stack.
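
A minimal sketch of a function with the behavior described above. It follows the listed cases one by one; the exact fallback handling (when JSON.stringify returns undefined or throws) is an assumption about details the answer leaves open.

```typescript
function ensureError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === "string") return new Error(value);

  // Error-like objects: keep the message and, when present, the stack.
  if (typeof value === "object" && value !== null && "message" in value) {
    const { message, stack } = value as { message: unknown; stack?: unknown };
    if (typeof message === "string") {
      const error = new Error(message);
      if (typeof stack === "string") error.stack = stack;
      return error;
    }
  }

  try {
    // JSON.stringify yields undefined for undefined, functions, and symbols.
    const serialized: unknown = JSON.stringify(value);
    return new Error(typeof serialized === "string" ? serialized : "An unknown error occurred");
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return new Error("An unknown error occurred");
  }
}
```

Typical usage is `catch (err) { const error = ensureError(err); logger.error(error.message); }`, where logger is whatever logging facility the caller has.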

Q: What's the graceful shutdown pattern in API server?

Answer: The graceful shutdown pattern ensures the API server shuts down cleanly without dropping in-flight requests. The server registers signal handlers for SIGINT and SIGTERM. When a signal is received, the handler sets isShuttingDown = true and initiates shutdown. It first calls serverInstance.close(), which stops the server from accepting new connections and lets existing connections finish. Then it calls shutdownSandboxService() to clean up Docker containers and Redis state, and shutdownRedisPubSub() to close Redis pub/sub connections. Finally, it calls process.exit(0) to terminate. The handlers are registered with once: true to prevent double-shutdown. The whole sequence is bounded by a 15-second timeout, after which the process is forced to exit. The server also has handlers for uncaughtException and unhandledRejection to prevent the process from crashing unexpectedly.
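
The ordering and the forced-exit timeout can be sketched with injected dependencies, which keeps the example runnable without a real server. createShutdownHandler and the ShutdownDeps shape are hypothetical; in production the fields would be bound to serverInstance.close, shutdownSandboxService, shutdownRedisPubSub, and process.exit.

```typescript
interface ShutdownDeps {
  closeServer: () => Promise<void>; // e.g. a promisified serverInstance.close()
  cleanupSteps: Array<() => Promise<void>>; // run in order after the server closes
  exit: (code: number) => void; // process.exit in production
  timeoutMs: number; // 15_000 according to the answer
}

function createShutdownHandler(deps: ShutdownDeps): () => Promise<void> {
  let isShuttingDown = false;
  return async () => {
    if (isShuttingDown) return; // ignore repeated signals
    isShuttingDown = true;

    // Force a non-zero exit if cleanup hangs past the deadline.
    const forceExit = setTimeout(() => deps.exit(1), deps.timeoutMs);
    try {
      await deps.closeServer();
      for (const step of deps.cleanupSteps) await step();
      clearTimeout(forceExit);
      deps.exit(0);
    } catch {
      clearTimeout(forceExit);
      deps.exit(1);
    }
  };
}

// Wiring, shown as a comment because it would terminate the process:
// const shutdown = createShutdownHandler({ ... });
// process.once("SIGINT", shutdown);
// process.once("SIGTERM", shutdown);
```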

Q: How does the worker handle job retries?

Answer: The BullMQ worker handles job retries through built-in BullMQ retry policies and job-level configuration. Each worker (buildWorker and agentRunWorker) is created with options such as concurrency, while the retry policy (attempts and backoff) is set in the job options when a job is enqueued. BullMQ automatically retries a failed job until its attempts are used up. The worker also implements its own failure handling: if the build execution fails, it updates the build status to FAILED and publishes the status event, and BullMQ may still retry the job if the error is deemed retryable. The worker's stalled-job check (stalledInterval) detects active jobs whose lock was not renewed, typically because the worker crashed or its event loop was blocked; BullMQ moves such jobs back to the queue and fails them once they have stalled too many times. For agent runs, if a run fails mid-execution, it can be resumed from a checkpoint rather than restarted. The worker also retries Redis publish operations with exponential backoff.
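
The retry with exponential backoff mentioned for Redis publish operations can be sketched generically. retryWithBackoff and its options are hypothetical names; the doubling schedule and the injectable sleep function (useful for tests) are choices made for this example, and attempts is assumed to be at least 1.

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the second try
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait 1x, 2x, 4x, ... the base delay, but not after the final attempt.
      if (attempt < options.attempts - 1) {
        await sleep(options.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

BullMQ's own job retries follow the same idea, but are configured declaratively through the attempts and backoff job options rather than written by hand.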

Q: Why use withTimeout wrapper for operations?

Answer: The withTimeout wrapper prevents operations from hanging indefinitely by adding a timeout to any Promise. It takes a Promise, a timeout duration, and a timeout message. It creates a timeout Promise that rejects after the specified duration using setTimeout. It then races the original Promise against the timeout Promise using Promise.race. If the original completes first, its result is returned. If the timeout fires first, the wrapper rejects with an Error containing the timeout message. The timeout handle is cleared in a finally block to prevent memory leaks. This wrapper is used for operations that could potentially hang: build execution, backup operations, and Redis publish operations. By wrapping these operations, the system ensures a single slow operation doesn't block the worker indefinitely.
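
A minimal sketch of a wrapper with the signature described above (a Promise, a duration, and a message). The parameter names are assumptions.

```typescript
async function withTimeout<T>(promise: Promise<T>, timeoutMs: number, message: string): Promise<T> {
  let handle: ReturnType<typeof setTimeout> | undefined;

  // Rejects after timeoutMs unless the timer is cleared first.
  const timeout = new Promise<never>((_, reject) => {
    handle = setTimeout(() => reject(new Error(message)), timeoutMs);
  });

  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(handle); // release the timer on both success and failure
  }
}
```

Note that the race only stops the caller from waiting; the underlying operation keeps running after a timeout unless it is cancelled separately, for example with an AbortSignal.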

Q: How does the stale run reaper work?

Answer: The stale run reaper is a scheduled task that cleans up runs stuck in queued or running state for too long. It runs every 5 minutes using a setInterval that is unref'd (doesn't keep the process alive). The reaper queries the database for runs in ACTIVE_RUN_STATUSES that were created more than a threshold time ago (e.g., 1 hour). For each stale run, it updates the status to FAILED with a termination reason indicating timeout. It also cleans up any associated resources like sandbox state. The reaper is important because runs can get stuck if the worker crashes mid-execution, if there's a database transaction issue, or if the job is lost from the Redis queue. By periodically checking for stale runs and marking them as failed, the system prevents zombie runs from consuming user quota or blocking new runs.
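
The selection logic and the unref'd interval can be sketched as below. The Run shape, the status strings, the termination reason, and the loadRuns and markFailed callbacks are hypothetical; the 5-minute interval and 1-hour threshold are the figures from the answer.

```typescript
type RunStatus = "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";
interface Run {
  id: string;
  status: RunStatus;
  createdAt: Date;
}

const ACTIVE_RUN_STATUSES: readonly RunStatus[] = ["QUEUED", "RUNNING"];
const STALE_AFTER_MS = 60 * 60 * 1000; // 1 hour, the example threshold
const REAP_INTERVAL_MS = 5 * 60 * 1000;

// Pure selection step: active runs created before the cutoff.
function findStaleRuns(runs: Run[], now: Date, staleAfterMs: number = STALE_AFTER_MS): Run[] {
  const cutoff = now.getTime() - staleAfterMs;
  return runs.filter(
    (run) => ACTIVE_RUN_STATUSES.includes(run.status) && run.createdAt.getTime() < cutoff,
  );
}

function startReaper(
  loadRuns: () => Promise<Run[]>,
  markFailed: (run: Run, reason: string) => Promise<void>,
) {
  const timer = setInterval(async () => {
    try {
      for (const run of findStaleRuns(await loadRuns(), new Date())) {
        await markFailed(run, "stale_run_timeout"); // hypothetical reason code
      }
    } catch (error) {
      console.error("stale run reaper failed", error);
    }
  }, REAP_INTERVAL_MS);
  timer.unref(); // the reaper alone should not keep the process alive
  return timer;
}
```

In a real deployment the filter would normally be pushed into the database query rather than applied in memory.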


Frontend Architecture

Q: How does the chat stream state management work?

Answer: The chat stream state management uses Zustand stores with chatId-keyed state to manage streaming data for multiple concurrent chats. The main store organizes state by chatId: streams: Record<string, StreamState>. Each StreamState contains streaming text, thinking text, active files, completed files, sandbox status, installing dependencies, commands, web searches, metrics, and errors. The store provides hooks like useChatStreamState() to access the state and useChatStreamActions() for dispatching actions. Actions include APPEND_TEXT, START_FILE, COMPLETE_FILE, SET_SANDBOXING, etc. The store uses optimistic updates - when an event arrives, the state is updated immediately without waiting for server confirmation. The stream processor dispatches these actions as SSE events arrive. Separate stores exist for sandbox state to manage the file editor UI.
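
The chatId-keyed layout can be illustrated with a plain reducer, independent of Zustand so that the sketch is self-contained. The StreamState fields shown are a reduced, hypothetical subset of the ones listed above, and only three of the actions are modeled.

```typescript
interface StreamState {
  text: string;
  activeFiles: string[];
  completedFiles: string[];
}

type StreamAction =
  | { type: "APPEND_TEXT"; chatId: string; text: string }
  | { type: "START_FILE"; chatId: string; path: string }
  | { type: "COMPLETE_FILE"; chatId: string; path: string };

type Streams = Record<string, StreamState>;

const emptyStream = (): StreamState => ({ text: "", activeFiles: [], completedFiles: [] });

function applyAction(current: StreamState, action: StreamAction): StreamState {
  switch (action.type) {
    case "APPEND_TEXT":
      return { ...current, text: current.text + action.text };
    case "START_FILE":
      return { ...current, activeFiles: [...current.activeFiles, action.path] };
    case "COMPLETE_FILE":
      return {
        ...current,
        activeFiles: current.activeFiles.filter((path) => path !== action.path),
        completedFiles: [...current.completedFiles, action.path],
      };
  }
}

// Only the entry for action.chatId is replaced; other chats keep their object identity.
function reduceStreams(streams: Streams, action: StreamAction): Streams {
  const current = streams[action.chatId] ?? emptyStream();
  return { ...streams, [action.chatId]: applyAction(current, action) };
}
```

Keeping untouched entries referentially equal is what lets selectors for other chats skip re-rendering.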

Q: Why use RefCell for callback references?

Answer: The RefCell pattern (using a RefCell interface with a current property) avoids triggering React re-renders when callbacks change. In the stream processor, onMetaRef is a RefCell containing a callback function that gets called when META events arrive. If this were stored as regular React state or prop, updating it would trigger a re-render. By using a mutable ref, the callback can be updated without causing re-renders. The thinkingStartRef stores the timestamp when thinking starts, which is used to calculate thinking duration. This value is updated during streaming but doesn't need to trigger a re-render. The RefCell pattern is particularly important in high-frequency streaming scenarios where callbacks might be updated frequently.

Q: How does the aggressive run lookup window work?

Answer: The aggressive run lookup window handles cases where a user sends a message but the agent run hasn't started yet in the worker. There's a constant AGGRESSIVE_ACTIVE_RUN_LOOKUP_WINDOW_MS = 90_000 (90 seconds). When the latest message is from the user and was created within this window, the activeRunLookupMode is set to "aggressive" instead of "single". In aggressive mode, the orchestration hook polls more frequently for active runs to detect when the worker starts processing the run. This handles the delay between the HTTP request returning (after queuing the run) and the worker actually picking up the job. Without this window, the user might see a "no active run" state even though the run is queued. After the window expires, the mode switches to "single" lookup.
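
The mode decision reduces to a small pure function. The constant and the mode names come from the answer; the ChatMessage shape, the function name, and the inclusive comparison at exactly 90 seconds are assumptions.

```typescript
const AGGRESSIVE_ACTIVE_RUN_LOOKUP_WINDOW_MS = 90_000;

type ActiveRunLookupMode = "aggressive" | "single";

// Hypothetical message shape; createdAt is epoch milliseconds.
interface ChatMessage {
  role: "user" | "assistant";
  createdAt: number;
}

function getActiveRunLookupMode(latest: ChatMessage | undefined, now: number): ActiveRunLookupMode {
  // Only a recent, unanswered user message suggests a run that is queued but not yet visible.
  if (!latest || latest.role !== "user") return "single";
  const age = now - latest.createdAt;
  return age <= AGGRESSIVE_ACTIVE_RUN_LOOKUP_WINDOW_MS ? "aggressive" : "single";
}
```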

Q: What's the stop notice pattern in chat page?

Answer: The stop notice pattern provides visual feedback when a user cancels a run mid-execution. When a user clicks the stop button, the system calls the cancel API and stores a "stop notice" in localStorage via setRunStopNotice. The notice contains the chatId and the userMessageId. In chatPageClient.tsx, the component checks for a stored stop notice using getRunStopNotice. If a notice exists and the latest message is still from the user, the component injects a synthetic assistant message into the messages array with content like "Generation stopped at your request. Send another message when you want me to continue." The message has a special ID format to identify it as synthetic. When the assistant eventually responds, the notice is cleared via clearRunStopNotice. This ensures immediate feedback even if the run is cancelled in the background.
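
The injection step can be sketched as a pure function over the message list. The Message shape, the withStopNotice name, the stop-notice- ID prefix, and the check that the notice's userMessageId matches the latest message are assumptions; the source only states that a notice exists and that the latest message is from the user.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}
interface RunStopNotice {
  chatId: string;
  userMessageId: string;
}

const STOP_NOTICE_TEXT =
  "Generation stopped at your request. Send another message when you want me to continue.";

function withStopNotice(chatId: string, messages: Message[], notice: RunStopNotice | null): Message[] {
  if (!notice || notice.chatId !== chatId) return messages;
  const latest = messages[messages.length - 1];
  // Once an assistant reply exists, the notice no longer applies.
  if (!latest || latest.role !== "user" || latest.id !== notice.userMessageId) return messages;
  return [
    ...messages,
    { id: `stop-notice-${notice.userMessageId}`, role: "assistant", content: STOP_NOTICE_TEXT },
  ];
}
```

Returning the original array when nothing is injected keeps the result stable for memoization.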

Q: How does the file editor integration work?

Answer: The file editor integration allows users to view and edit files generated by the AI in a Monaco-based editor. The editor uses the Monaco Editor library (VS Code's editor). It receives the current file path and content from the sandbox store state. When a user edits a file, changes are sent to the API via the sandbox write endpoints. The editor supports syntax highlighting for various file extensions based on the file path. It handles file selection - users can click on files in the file tree to open them. The editor integrates with the stream state - when new files are generated during streaming, they appear in the file tree and can be opened. The editor also has read-only mode for files that shouldn't be modified. The Monaco editor is loaded asynchronously to avoid blocking the initial page load.


Planning Workflow

Q: How does the workflow engine handle step transitions?

Answer: The workflow engine implements a state machine with phases: ANALYZE, BUILD, DEPLOY, and RECOVER. The advanceWorkflow function orchestrates transitions. When called, it first checks if the workflow is already COMPLETED or FAILED - if so, it returns. Otherwise, it sets the status to RUNNING and saves the state. It looks up the configuration for the current step from PHASE_CONFIGS, which includes the max retry count. It calls executeStep with the current step and input, wrapped in a withRetry function. If the step succeeds, the engine determines the next step by finding the current step's index in the step order array and moving to the next. If the current step is the last (DEPLOY), it sets the status to COMPLETED. If the step fails and retries are exhausted, it sets the status to FAILED. If the step fails but retries are available, it transitions to RECOVER.
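
The transitions can be modeled as a pure function over the workflow state. This is a simplified sketch: the state shape, the single MAX_RETRIES constant (the source reads a per-step limit from PHASE_CONFIGS), and the history array used to resume after RECOVER are assumptions, and step execution itself is reduced to a boolean outcome.

```typescript
type Step = "ANALYZE" | "BUILD" | "DEPLOY" | "RECOVER";
type Status = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

interface WorkflowState {
  status: Status;
  currentStep: Step;
  retries: number;
  history: Step[]; // steps that completed successfully, in order
}

const STEP_ORDER: readonly Step[] = ["ANALYZE", "BUILD", "DEPLOY"];
const MAX_RETRIES = 2; // placeholder for the per-step value in PHASE_CONFIGS

function transition(state: WorkflowState, stepSucceeded: boolean): WorkflowState {
  if (state.status === "COMPLETED" || state.status === "FAILED") return state;

  if (!stepSucceeded) {
    return state.retries >= MAX_RETRIES
      ? { ...state, status: "FAILED" }
      : { ...state, status: "RUNNING", currentStep: "RECOVER", retries: state.retries + 1 };
  }

  // RECOVER is outside the normal order, so it is never recorded as a completed step.
  const history = state.currentStep === "RECOVER" ? state.history : [...state.history, state.currentStep];
  const last = history[history.length - 1];
  const next = STEP_ORDER[(last === undefined ? -1 : STEP_ORDER.indexOf(last)) + 1];

  return next === undefined
    ? { ...state, history, status: "COMPLETED" }
    : { ...state, history, status: "RUNNING", currentStep: next };
}
```

After a successful RECOVER, the next step is the one following the last entry in history, which matches the "continue from the last successful step" behavior.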

Q: What's the purpose of the RECOVER phase?

Answer: The RECOVER phase provides a fallback mechanism for handling step failures without immediately failing the entire workflow. When a step fails but hasn't exhausted its retry quota, the workflow transitions to RECOVER instead of FAILED. The RECOVER phase can attempt recovery actions like retrying the failed step with different parameters, rolling back partial changes, or skipping non-critical steps. The RECOVER phase is special in the step order - it's not part of the normal progression. When advancing from RECOVER, the engine looks for the last successful step in the history and continues from there. This allows the workflow to resume from a known good state rather than restarting from the beginning. The RECOVER phase is particularly useful for transient failures like network issues or temporary resource unavailability.

Q: How does workflow cancellation work?

Answer: Workflow cancellation handles cleanup when a user cancels or a workflow fails irrecoverably. The cancelWorkflow function first retrieves the workflow state from Redis. If the workflow has a sandboxId, it calls cleanupSandbox to remove the Docker container and clean up Redis state. This is wrapped in a catch block to ensure that even if sandbox cleanup fails, the workflow is still deleted. Then it calls deleteWorkflow to remove the workflow state from Redis. The function returns true if the workflow was found and deleted, false if it didn't exist. Cancellation is important for resource cleanup - abandoned sandboxes would continue consuming resources. The cleanupSandbox function stops and removes the Docker container, deletes the Redis state, and handles errors gracefully.
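
The control flow described above can be sketched with the Redis access and sandbox cleanup injected as parameters. The WorkflowStore interface and the record shape are hypothetical; the function names cancelWorkflow and cleanupSandbox are taken from the answer.

```typescript
interface WorkflowRecord {
  id: string;
  sandboxId?: string;
}

// Hypothetical storage interface standing in for the Redis-backed helpers.
interface WorkflowStore {
  get(id: string): Promise<WorkflowRecord | null>;
  delete(id: string): Promise<void>;
}

async function cancelWorkflow(
  id: string,
  store: WorkflowStore,
  cleanupSandbox: (sandboxId: string) => Promise<void>,
): Promise<boolean> {
  const workflow = await store.get(id);
  if (!workflow) return false;

  if (workflow.sandboxId) {
    try {
      await cleanupSandbox(workflow.sandboxId);
    } catch (error) {
      // A failed cleanup must not prevent the workflow state from being deleted.
      console.warn(`sandbox cleanup failed for ${workflow.sandboxId}`, error);
    }
  }

  await store.delete(id);
  return true;
}
```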

Q: Why use Redis for workflow persistence?

Answer: Workflow state is persisted in Redis rather than Postgres because workflows are transient and short-lived. Redis provides faster access for the frequent read/write operations during workflow execution. Workflow keys can be given a TTL so that they expire after a certain time, which Redis handles automatically. Individual Redis commands execute atomically, which is important for concurrent access patterns. Workflow state doesn't need complex queries or joins; it's simple key-value access by workflow ID. Using Redis separates workflow state from the primary database, keeping the Postgres schema focused on persistent business data. The workflow state is also accessed by worker processes, which may not have direct database access. Redis's pub/sub capabilities could be used for workflow status updates in the future.


Performance & Optimization

Q: How does token usage optimization work?

Answer: Token usage optimization uses multiple strategies: (1) composePrompt supports different prompt profiles (COMPACT, VERBOSE) that control verbosity, (2) For follow-up conversations, buildConversationMessages can truncate older messages to stay within context limits, (3) Skill compaction merges similar consecutive messages, (4) Pre-verified dependencies from planning workflow avoid including dependency analysis in the prompt, (5) Reserved output tokens are calculated based on model specifications to avoid over-allocation, (6) Context window validation happens before each LLM call. If the limit would be exceeded, the system can truncate history, reduce system prompt verbosity, or emit an error. For vision models, images are resized and compressed to reduce image token count.
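
Strategy (2), dropping older messages to fit the context window, can be sketched as follows. This is not buildConversationMessages itself: the function name, the ChatTurn shape, and the 4-characters-per-token estimate are assumptions, and a real implementation would use the model's tokenizer.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic of about 4 characters per token; an assumption, not a tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function truncateHistory(
  turns: ChatTurn[],
  contextWindow: number,
  systemPromptTokens: number,
  reservedOutputTokens: number,
): ChatTurn[] {
  // What is left for history after the system prompt and the reserved output.
  let budget = contextWindow - systemPromptTokens - reservedOutputTokens;
  const kept: ChatTurn[] = [];

  // Walk from newest to oldest so the most recent context survives.
  for (let i = turns.length - 1; i >= 0; i--) {
    const turn = turns[i];
    if (turn === undefined) continue;
    const cost = estimateTokens(turn.content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(turn);
  }
  return kept;
}
```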

Q: Why use connection pooling for Redis?

Answer: Despite the name, this is connection reuse rather than pooling in the usual sense: ioredis does not maintain a connection pool. Each ioredis client instance holds one persistent TCP connection and pipelines commands over it. A single ioredis client instance is created and reused throughout the application, so Redis operations avoid the cost of establishing a new connection each time. A new connection requires a TCP handshake, authentication, and (when TLS is enabled) TLS negotiation, all of which add latency. Reusing one long-lived client also keeps the number of connections to the Redis server small and predictable; extra connections are only needed for cases such as pub/sub subscribers, blocking commands, and the connections BullMQ opens for its queues and workers. ioredis handles connection failures automatically by reconnecting. This reuse is particularly important for high-frequency Redis operations during streaming and queue operations.

Q: How does the build concurrency work?

Answer: Build concurrency is controlled by separate worker processes with configurable concurrency limits. The queue.worker.ts creates two workers: buildWorker for build jobs and agentRunWorker for agent run jobs. Each worker is configured with a concurrency setting that limits how many jobs of that type can run simultaneously. This prevents the system from being overwhelmed by too many concurrent builds. The workers use BullMQ's built-in concurrency control - when a job is picked up, it's marked as "active" and won't be picked up by another worker until completed. The concurrency limit is per-worker, so if multiple worker processes are running, the total concurrency is the sum of all workers' limits. This design allows horizontal scaling. The build and agent run queues are separate, so builds don't compete with agent runs for worker slots.

Q: What's the purpose of the scheduled flush interval unref?

Answer: The scheduled flush interval uses unref() to prevent the interval from keeping the Node.js process alive. In Node.js, active timers keep the event loop alive, preventing natural process exit. By calling interval.unref(), the interval is marked as not requiring the process to stay alive. This is important for graceful shutdown - when the worker receives a shutdown signal, it can exit without waiting for the next interval tick. The flush interval runs every 250ms to process scheduled sandbox flushes. Without unref, if the worker had no active jobs and tried to shut down, the interval would prevent exit. The unref pattern is used for all background intervals to ensure clean shutdown.
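
A minimal sketch of the pattern. The startFlushInterval name and the flush callback are hypothetical; the 250 ms default is the figure from the answer.

```typescript
function startFlushInterval(flush: () => void, intervalMs: number = 250) {
  const timer = setInterval(flush, intervalMs);
  // After unref(), this timer alone no longer keeps the event loop alive.
  timer.unref();
  return timer;
}
```

The timer still fires as long as something else keeps the process running. An explicit process.exit() terminates regardless of timers, so unref() matters mainly when the process is expected to exit naturally once its remaining work is done.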

Q: How does the frontend optimize re-renders during streaming?

Answer: The frontend optimizes re-renders through the frame-batched action queue pattern and careful state management. The stream processor batches Zustand dispatches using requestAnimationFrame, ensuring multiple events arriving in quick succession trigger only one re-render per frame (about 16 ms at 60 Hz). State is structured to minimize re-render scope: chat-specific state is keyed by chatId, so updating one chat doesn't affect others. The file editor state is in a separate store from chat state, so editing files doesn't re-render the chat interface. React.memo is used on components that don't need to re-render when parent state changes. The synthetic stop notice injection uses useMemo to avoid recalculating the messages array. These optimizations are critical because LLM streams can emit hundreds of events per second.
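
The frame-batched queue can be sketched with the scheduler injected, which keeps it runnable outside a browser. createFrameBatcher and the Schedule type are hypothetical names; in the browser the scheduler would wrap requestAnimationFrame.

```typescript
// In the browser: (flush) => requestAnimationFrame(() => flush())
type Schedule = (flush: () => void) => void;

function createFrameBatcher<A>(apply: (actions: A[]) => void, schedule: Schedule) {
  let queue: A[] = [];
  let scheduled = false;

  return (action: A): void => {
    queue.push(action);
    if (scheduled) return; // a flush is already pending for this frame
    scheduled = true;
    schedule(() => {
      const batch = queue;
      queue = [];
      scheduled = false;
      apply(batch); // one store update, and so one render, per frame
    });
  };
}
```

The apply callback would run all queued actions inside a single store update, so subscribers are notified once per frame however many SSE events arrived.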