Edward Project - Comprehensive Interview Questions - Part 6

Table of Contents

Scalability & Performance Deep Dive
Edge Cases & Behavioral Questions
Business Logic Questions
Integration Questions
CI/CD & DevOps Questions
Testing Questions
Code Quality Questions
System Design Questions

Scalability & Performance Deep Dive

Q: How does horizontal scaling work for the API servers?

Answer: The API servers are stateless and can be horizontally scaled behind a load balancer. Since authentication state is stored in Better Auth sessions (cookies with database backing), any API server can handle requests for any user. Rate limiting uses Redis, which is shared across all instances, ensuring consistent enforcement. Database connection pool limits are configured per instance, so each added instance opens more connections to Postgres; request capacity grows, but the combined pools must stay within the database's connection limit. Docker socket access for sandbox operations would need to be handled via Docker-in-Docker or a shared Docker daemon service in a scaled environment. Running multiple instances this way provides both high availability and increased capacity.

Q: How does the worker scaling work?

Answer: Workers can be horizontally scaled by running multiple instances of the queue.worker.ts process. BullMQ handles job distribution across workers using Redis queues. Each worker has a concurrency limit (BUILD_WORKER_CONCURRENCY, AGENT_RUN_WORKER_CONCURRENCY) that controls how many jobs each worker processes simultaneously. Adding more workers increases total job processing capacity. Workers are independent and don't share state, so they can be scaled based on queue depth. The system could implement auto-scaling based on queue length using Kubernetes HPA or similar. The stale run reaper runs in each worker but uses distributed locking to prevent duplicate processing.
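
The queue-depth scaling idea above can be sketched as a small sizing rule. This is only an illustration: `desiredWorkerCount`, `ScalingInput`, and the min/max bounds are hypothetical names, and a real setup would feed the result into an autoscaler rather than call it directly.

```typescript
// Hypothetical sizing rule: enough workers that every waiting job gets a slot,
// clamped to configured bounds. Not taken from the Edward codebase.
interface ScalingInput {
  waitingJobs: number;          // current queue depth
  concurrencyPerWorker: number; // e.g. the value of BUILD_WORKER_CONCURRENCY
  minWorkers: number;
  maxWorkers: number;
}

function desiredWorkerCount(input: ScalingInput): number {
  const { waitingJobs, concurrencyPerWorker, minWorkers, maxWorkers } = input;
  if (concurrencyPerWorker <= 0) {
    throw new Error("concurrencyPerWorker must be positive");
  }
  // Each worker handles `concurrencyPerWorker` jobs at once.
  const needed = Math.ceil(waitingJobs / concurrencyPerWorker);
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}
```

With 25 waiting jobs and a per-worker concurrency of 4, this rule asks for 7 workers; an empty queue falls back to `minWorkers`.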

Q: How does database scaling work?

Answer: Database scaling is handled at the Postgres level. The system uses a single Postgres instance with connection pooling. For higher scalability, Postgres read replicas could be added for read-heavy queries (chat history, run lookups). Write operations would still go to the primary. Connection pooling limits the maximum concurrent connections per application instance. The system could implement database sharding by chatId or userId if a single instance becomes a bottleneck. Indexes are optimized for the most common query patterns. JSONB columns provide flexibility without schema changes, reducing migration overhead.

Q: How does cache invalidation work?

Answer: Cache invalidation happens at multiple levels: (1) CloudFront invalidation is triggered after successful builds to clear the CDN cache, (2) The container status cache in Redis has a 5-minute TTL and is explicitly cleared when containers are created or destroyed, (3) The workflow state in Redis has a TTL and is deleted when workflows complete or are cancelled, (4) Rate limit counters in Redis expire based on window duration, (5) The frontend holds data in React state, which is reset on page refresh. The system doesn't currently implement application-level caching beyond these mechanisms. For read-heavy data like user settings or chat metadata, a cache layer (Redis or in-memory) could be added.
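
The combination of a TTL and explicit clearing, as described for the container status cache, can be illustrated with an in-memory stand-in. `TtlCache` is a hypothetical class written for this sketch; the real cache lives in Redis, where key expiry would do the same job.

```typescript
// In-memory stand-in for a TTL cache with explicit invalidation (hypothetical).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries behave as a miss
      return undefined;
    }
    return entry.value;
  }

  // Explicit clear, e.g. when a container is created or destroyed.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
```

The TTL bounds how stale an entry can get if an invalidation is missed, while the explicit `invalidate` call keeps the common path fresh.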

Q: How does the system handle load balancing?

Answer: Load balancing is handled at the infrastructure level. The API servers can be placed behind a load balancer (AWS ALB, Nginx, etc.) that distributes requests based on round-robin or least-connections algorithms. Since the API is stateless, any server can handle any request. The load balancer should be configured with health checks to route traffic away from unhealthy instances. For WebSocket/SSE connections, sticky sessions may be needed to keep connections on the same server. The workers don't need load balancing - they pull jobs from Redis queues which naturally distribute work.

Q: How does the system handle database connection limits?

Answer: Database connection limits are managed through connection pooling. Each application instance (API server, worker) has its own connection pool configured via the DATABASE_URL. The pool size is set on the client side (through pool options or connection string parameters), while the Postgres max_connections setting caps the total across all clients. The system uses a moderate pool size (typically 10-20 connections per instance). With multiple instances, the sum of all pool sizes must stay below max_connections. For very high scale, PgBouncer could be added as a connection pooler to reduce connection overhead. The system monitors connection pool usage and could alert if approaching limits.
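
The connection budget is simple arithmetic, sketched below. The function names and the idea of reserving a few connections for migrations or admin sessions are assumptions made for illustration.

```typescript
// Total connections opened when every instance fills its pool.
function totalConnections(instanceCount: number, poolSizePerInstance: number): number {
  return instanceCount * poolSizePerInstance;
}

// How many instances fit under the server-side cap (hypothetical helper).
function maxInstancesWithinLimit(
  maxConnections: number,      // Postgres max_connections
  reservedConnections: number, // assumed headroom for admin tools, migrations
  poolSizePerInstance: number,
): number {
  if (poolSizePerInstance <= 0) {
    throw new Error("poolSizePerInstance must be positive");
  }
  const available = Math.max(0, maxConnections - reservedConnections);
  return Math.floor(available / poolSizePerInstance);
}
```

For example, with max_connections at 100 (the Postgres default), 10 connections reserved, and a pool of 20 per instance, at most 4 instances fit; six such instances would try to open 120 connections.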

Q: How does the system handle memory management?

Answer: Memory management is handled at multiple levels: (1) Node.js enforces a heap limit per process, (2) The parser buffer has a MAX_BUFFER_SIZE to prevent unbounded memory growth, (3) Redis has a maxmemory setting paired with an eviction policy, (4) Docker containers support memory limits (though none are currently configured), (5) Streaming responses use chunked reading to avoid loading entire responses into memory. The system could benefit from setting explicit memory limits on Docker containers and Node.js processes. The flush scheduler uses batched operations to reduce memory footprint. For large chat histories, the system implements truncation to keep memory usage bounded.

Edge Cases & Behavioral Questions

Q: What happens when the user's API key rate limit is exceeded?

Answer: When the user's LLM API key rate limit is exceeded, the LLM provider returns a 429 error. The system catches this error in the stream session or worker. The run is marked as FAILED with an appropriate error message indicating the rate limit was exceeded. The user sees an error in the UI explaining that their API key has hit the rate limit and they should either wait or upgrade their plan. The system doesn't automatically retry on rate limit errors to avoid exacerbating the issue. Users can retry the request manually after the rate limit window expires.

Q: What happens when the LLM returns malformed or invalid code?

Answer: When the LLM returns malformed code, the build process will fail. The build orchestrator captures the build error (compiler errors, runtime errors) and includes it in the error report. The user sees the build failure message with the error details. The system could implement a post-generation validation step to check for syntax errors before attempting to build, but currently relies on the build process to detect issues. The strict retry mechanism in the stream session can attempt to fix common errors automatically. If the code is fundamentally malformed, the user is informed and can request a regeneration with different instructions.

Q: What happens when the sandbox runs out of disk space?

Answer: When the sandbox runs out of disk space, file write operations will fail. The flush scheduler catches these errors and logs them. The build process will fail if it can't write build artifacts. The system could implement disk space monitoring before builds and clean up old files, but currently doesn't have this check. The user sees an error indicating disk space issues. The sandbox TTL cleanup helps prevent disk space accumulation over time, but concurrent builds on the same sandbox could still exhaust space.

Q: What happens when two users try to edit the same file simultaneously?

Answer: The current system doesn't implement file-level locking or conflict resolution. If two users edit the same file simultaneously, the last write wins. This is acceptable in Edward's use case because chats are typically single-user. For collaboration features, the system would need to implement operational transformation (OT) or CRDTs for real-time collaborative editing. Currently, the file editor is single-user per chat. If collaboration is added, the system would need WebSocket-based synchronization and conflict resolution.

Q: What happens when the user provides contradictory instructions?

Answer: When a user provides contradictory instructions in a follow-up message (e.g., "change the color to red" then "change it to blue" in quick succession), the agent processes them sequentially based on conversation history. The LLM sees the full context and should resolve the contradiction by following the latest instruction. The system doesn't have special logic to detect contradictions - it relies on the LLM's reasoning. If the user sends contradictory messages very quickly, they might be processed in the wrong order depending on timing, but this is rare in practice.

Q: What happens when the user's message is too long for the context window?

Answer: When a user's message is too long, the context window validation before the LLM call will detect this and reject the request with an error message indicating the message is too long. The system could implement message truncation or summarization, but currently rejects long messages outright. This prevents the LLM call from failing with a context limit error. The user is informed of the character limit and asked to shorten their message. This is a safety measure to ensure reliable operation.
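
A pre-flight length check of the kind described might look like the following. The function name, result shape, and the limit passed in are all hypothetical; the actual validation may count tokens rather than characters.

```typescript
// Result of the pre-flight check: either accepted, or rejected with a message.
type LengthCheck = { ok: true } | { ok: false; error: string };

// Hypothetical validator that runs before any LLM call is made.
function validateMessageLength(message: string, maxChars: number): LengthCheck {
  if (message.length <= maxChars) {
    return { ok: true };
  }
  return {
    ok: false,
    error: `Message is ${message.length} characters; the limit is ${maxChars}. Please shorten it.`,
  };
}
```

Rejecting early keeps the failure cheap and gives the user a concrete number to work with, instead of surfacing a provider-side context limit error.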

Q: What happens when the build output is larger than S3 upload limits?

Answer: A single S3 PUT request accepts at most 5 GB; larger objects must be sent with multipart upload. The system doesn't currently implement multipart uploads, so an upload that exceeds the single-request limit will fail. The build orchestrator catches upload errors and marks the build as FAILED. The user sees an error indicating the build output is too large. For large applications, the system could implement selective uploading (excluding node_modules) or compression before upload. Currently, the system assumes build outputs are within reasonable limits for typical web applications.
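
If multipart support were added, the first step would be deciding per file which upload path to take. The sketch below shows only that decision; `chooseUploadStrategy` and the 100 MiB default threshold are assumptions (AWS suggests considering multipart for objects of roughly that size and up), and the actual upload calls are left out.

```typescript
const MIB = 1024 * 1024;
// Hard ceiling for a single PUT request.
const SINGLE_PUT_LIMIT_BYTES = 5 * 1024 * MIB;
// Assumed default switch-over point; tune for the workload.
const DEFAULT_MULTIPART_THRESHOLD_BYTES = 100 * MIB;

type UploadStrategy = "single-put" | "multipart";

// Hypothetical helper: pick an upload path from the file size alone.
function chooseUploadStrategy(
  sizeBytes: number,
  thresholdBytes: number = DEFAULT_MULTIPART_THRESHOLD_BYTES,
): UploadStrategy {
  if (sizeBytes < 0) {
    throw new Error("sizeBytes must not be negative");
  }
  // The threshold can never exceed what a single PUT accepts.
  const effectiveThreshold = Math.min(thresholdBytes, SINGLE_PUT_LIMIT_BYTES);
  return sizeBytes > effectiveThreshold ? "multipart" : "single-put";
}
```

Typical web build artifacts fall well under the threshold, so most files would keep using the existing single-request path.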

Q: What happens when the user cancels a run during tool execution?

Answer: When the user cancels a run during tool execution, the abort signal is propagated to the worker. The worker checks the abort signal after each tool execution and aborts the run if cancelled. The run status is updated to CANCELLED, and a cancellation event is persisted. Partial results (files written up to cancellation) are preserved in the sandbox. The user sees a cancellation notice in the UI. The system ensures that cancellation is graceful - it doesn't kill the process abruptly but allows current operations to complete before stopping.
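
The "check after each tool execution" behaviour can be sketched as a loop that lets the current step finish and then looks at the signal. The names here are hypothetical; `AbortLike` is a minimal shape that the `signal` of a standard `AbortController` satisfies.

```typescript
// Minimal shape of an abort signal; AbortController's signal matches it.
interface AbortLike {
  readonly aborted: boolean;
}

type ToolStep = () => string;

interface RunResult {
  finalStatus: "COMPLETED" | "CANCELLED";
  completedSteps: string[]; // partial results are kept on cancellation
}

// Hypothetical loop: cancellation is cooperative, never mid-step.
function runToolSteps(steps: ToolStep[], signal: AbortLike): RunResult {
  const completedSteps: string[] = [];
  for (const step of steps) {
    completedSteps.push(step()); // the current operation runs to completion
    if (signal.aborted) {
      return { finalStatus: "CANCELLED", completedSteps };
    }
  }
  return { finalStatus: "COMPLETED", completedSteps };
}
```

Because the check sits between steps, a file write that has started always finishes, which matches the description of partial results being preserved in the sandbox.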

Q: What happens when the network disconnects during file upload?

Answer: When the network disconnects during file upload, the upload will fail. The image upload handler catches the error and returns a 500 error to the user. The user can retry the upload. The system doesn't implement resumable uploads or chunked uploads for images. For large images, this could be problematic. The system could add support for chunked uploads using the tus protocol or similar, but currently relies on the network being stable during the upload. The error handling ensures failed uploads don't leave partial state.

Q: What happens when the database connection is lost mid-transaction?

Answer: When the database connection is lost mid-transaction, Drizzle will throw an error. The transaction will be automatically rolled back by Postgres. The system catches this error and treats it as a failure. For critical operations like run admission, this could result in the run not being created. The system could implement retry logic for transient database errors, but currently treats database errors as failures. The connection pool will attempt to reconnect on subsequent operations. The system relies on database high availability (replication, failover) to minimize connection loss.

Business Logic Questions

Q: How does the AI decide what framework to use?

Answer: The AI decides on the framework through the planning workflow's ANALYZE phase. The workflow analyzes the user's request, extracts intent, and suggests a framework. For explicit requests (e.g., "build a Next.js app"), the framework is directly extracted. For implicit requests, the AI infers the framework based on the type of application (SPA, SSR, static site). The suggested framework is included in the workflow context and passed to the agent run. The system can also detect the framework from package.json if the user provides code or files. The framework selection influences the build commands, runtime configuration, and template used.

Q: How does the AI handle user feedback and corrections?

Answer: The AI handles user feedback through the conversational interface. When a user provides feedback (e.g., "that's not right, make it blue instead"), the message is added to the conversation history with the user role. The LLM sees the full context including previous turns and the feedback, and can adjust its approach accordingly. The system doesn't have explicit "undo" or "rollback" functionality - corrections are handled through natural conversation. The agent can also read existing files in the sandbox to understand the current state before making changes.

Q: How does the system determine when to build vs. just generate code?

Answer: The system determines when to build based on the intent from the planning workflow. The workflow can return intents like GENERATE (just generate code), BUILD (generate code and build), or DEPLOY (generate, build, and deploy). The user can also explicitly request a build. The system also automatically triggers builds when files are modified in certain ways or when the user clicks the build button. The build decision is made in the message orchestrator based on the workflow context and user intent. Not all messages trigger builds - some are just conversational or generate code without building.

Q: How does the system handle package dependency resolution?

Answer: Package dependency resolution happens at multiple levels. The planning workflow's ANALYZE phase can recommend packages based on the user's intent. The agent can also request packages during tool execution. The mergeAndInstallDependencies function merges AI-requested packages with existing package.json dependencies, resolving conflicts by keeping the highest version. The system uses pnpm for installation, which has efficient dependency resolution. Pre-verified packages from the workflow are passed to the agent to avoid redundant analysis. The system doesn't currently implement semantic version analysis or vulnerability scanning, but these could be added.
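
The "keep the highest version" merge rule can be illustrated with a simplified sketch. This is not the body of mergeAndInstallDependencies; `mergeDependencies`, `parseVersion`, and `compareVersions` are hypothetical helpers that only understand plain `x.y.z` versions with an optional `^` or `~` prefix.

```typescript
type DependencyMap = Record<string, string>;

// Parses "1.2.3", "^1.2.3", or "~1.2.3"; ranges and pre-release tags are out of scope.
function parseVersion(spec: string): [number, number, number] {
  const match = /^[\^~]?(\d+)\.(\d+)\.(\d+)$/.exec(spec);
  if (match === null) {
    throw new Error(`Unsupported version spec: ${spec}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, positive if a > b, zero if equal.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  return pa[0] - pb[0] || pa[1] - pb[1] || pa[2] - pb[2];
}

// Requested packages win only when they are strictly newer than what exists.
function mergeDependencies(existing: DependencyMap, requested: DependencyMap): DependencyMap {
  const merged: DependencyMap = { ...existing };
  for (const [pkg, spec] of Object.entries(requested)) {
    const current = merged[pkg];
    if (current === undefined || compareVersions(spec, current) > 0) {
      merged[pkg] = spec;
    }
  }
  return merged;
}
```

A production version would more likely lean on a dedicated semver library to handle ranges, tags, and pre-releases correctly.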

Q: How does the system handle version control integration?

Answer: Version control integration is handled through the GitHub integration in apps/api/services/githubIntegration/. Users can authorize Edward to access their GitHub repositories via OAuth. The system can push generated code to GitHub repositories. This allows users to continue development outside of Edward. The integration includes operations like creating repositories, pushing commits, and managing branches. The GitHub integration is optional - users can also download code manually. The system uses the octokit package for GitHub API interactions.

Q: How does the system determine when a task is complete?

Answer: The system determines task completion through the agent loop's termination conditions. The loop stops when: (1) No tool results are returned (the LLM indicates completion), (2) A completion signal is detected in the LLM response, (3) The user cancels the run, (4) The context window is exceeded, (5) Max turns are reached. The loop stop reason is tracked and emitted in the session complete event. The system also has a notion of "done" in the prompt - the LLM is instructed to indicate when it has completed the requested task.
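
The five termination conditions can be expressed as a single decision function. The state shape, reason names, and the precedence order (cancellation first, turn limit last) are assumptions chosen for this sketch.

```typescript
type StopReason =
  | "cancelled"
  | "context_exceeded"
  | "completion_signal"
  | "no_tool_results"
  | "max_turns";

// Hypothetical snapshot of the loop at the end of a turn.
interface LoopState {
  turn: number;
  maxTurns: number;
  toolResultCount: number;
  completionSignalSeen: boolean;
  cancelled: boolean;
  contextTokens: number;
  contextLimit: number;
}

// Returns the reason to stop, or null if the loop should run another turn.
function stopReason(state: LoopState): StopReason | null {
  if (state.cancelled) return "cancelled";
  if (state.contextTokens > state.contextLimit) return "context_exceeded";
  if (state.completionSignalSeen) return "completion_signal";
  if (state.toolResultCount === 0) return "no_tool_results";
  if (state.turn >= state.maxTurns) return "max_turns";
  return null;
}
```

Returning a named reason rather than a boolean makes it straightforward to include the value in the session complete event.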

Q: How does the system handle multi-file operations?

Answer: Multi-file operations are handled through the file tool in the agent loop. The LLM can generate multiple file operations in a single turn, and the parser extracts each file (FILE_START, FILE_CONTENT, FILE_END events). The file operations are executed sequentially or in parallel depending on the implementation. The sandbox write service handles writing multiple files to the container. The system tracks all generated files in the generatedFiles map. The build process includes all generated files. Multi-file operations are essential for generating complete applications.
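
How a stream of FILE_START, FILE_CONTENT, and FILE_END events turns into a map of generated files can be sketched as a small reducer. The event payload fields (`path`, `chunk`) and the `collectFiles` function are assumptions; only the event names and the generatedFiles map come from the description above.

```typescript
// Assumed payload shapes for the three parser events.
type ParserEvent =
  | { type: "FILE_START"; path: string }
  | { type: "FILE_CONTENT"; chunk: string }
  | { type: "FILE_END" };

// Hypothetical reducer: accumulates content chunks per file until FILE_END.
function collectFiles(events: ParserEvent[]): Map<string, string> {
  const generatedFiles = new Map<string, string>();
  let currentPath: string | null = null;
  let buffer = "";

  for (const ev of events) {
    switch (ev.type) {
      case "FILE_START":
        currentPath = ev.path;
        buffer = "";
        break;
      case "FILE_CONTENT":
        // Content outside a FILE_START/FILE_END pair is ignored.
        if (currentPath !== null) buffer += ev.chunk;
        break;
      case "FILE_END":
        if (currentPath !== null) generatedFiles.set(currentPath, buffer);
        currentPath = null;
        break;
    }
  }
  return generatedFiles;
}
```

A file whose FILE_END never arrives is not added to the map, which keeps half-streamed files out of the build.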

Q: How does the system handle code review and quality?

Answer: The system doesn't currently implement automated code review or quality checks. Code quality is enforced through the LLM's training and the system prompts. The system could add linting, type checking, or security scanning as post-generation steps. The strict retry mechanism can fix common errors automatically. Users can review the generated code in the file editor and request changes. The system relies on the LLM's capabilities to generate quality code, but could be enhanced with automated quality gates.

Integration Questions

Q: How does the GitHub integration work?

Answer: The GitHub integration uses OAuth 2.0 for authorization. Users click a button to authorize Edward to access their GitHub repositories. The authorization flow is handled by Better Auth's GitHub provider. Once authorized, the system receives an access token which is stored encrypted in the database. The system uses the octokit library to interact with GitHub's API. Operations include creating repositories, pushing commits, creating branches, and managing webhooks. The integration allows users to push their Edward-generated code to GitHub for further development or deployment.

Q: How does the S3 integration work?

Answer: The S3 integration uses the AWS SDK v3. The system is configured with AWS credentials (access key, secret key, region) and bucket names. The uploadBuildFilesToS3 function iterates through build output files and uploads them to S3 with appropriate content types. The system uses path-based keys (e.g., userId/chatId/preview/file.js) for organization. The cleanupS3FolderExcept function deletes stale files to keep the preview folder clean. S3 is used for storing build artifacts, user-uploaded images, and backup snapshots. The integration also supports CloudFront as a CDN in front of S3.
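
The path-based key layout and content type selection can be sketched as two pure helpers. Both function names and the extension table are hypothetical; the key pattern follows the `userId/chatId/preview/file.js` example above, and the SDK upload call itself is omitted.

```typescript
// Small, assumed extension table; a real implementation would cover more types.
const CONTENT_TYPES: Record<string, string> = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".css": "text/css",
  ".json": "application/json",
  ".svg": "image/svg+xml",
  ".png": "image/png",
};

// Builds a key such as "userId/chatId/preview/assets/app.js".
function buildPreviewKey(userId: string, chatId: string, relativePath: string): string {
  // Normalize Windows separators and drop leading slashes so keys stay uniform.
  const normalized = relativePath.replace(/\\/g, "/").replace(/^\/+/, "");
  return `${userId}/${chatId}/preview/${normalized}`;
}

// Falls back to a generic binary type for unknown extensions.
function contentTypeFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return CONTENT_TYPES[ext] ?? "application/octet-stream";
}
```

Setting the content type at upload time matters for previews, since browsers may refuse to execute scripts or apply stylesheets served with a generic type.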

Q: How does the CloudFront integration work?

Answer: The CloudFront integration provides a CDN for preview hosting. The system uses CloudFront Functions or Lambda@Edge for request rewriting. In subdomain mode, a CloudFront Function rewrites subdomain requests to the S3 path. The system invalidates CloudFront cache after successful builds using the AWS SDK. The CloudFront distribution URL is configured in the environment variables. The integration also supports custom domains via CloudFront alternate domain names. CloudFront provides low-latency access to previews globally and reduces load on the origin S3 buckets.

Q: How does the Cloudflare integration work?

Answer: The Cloudflare integration is used for subdomain-based preview routing. The system uses Cloudflare Workers or CloudFront Functions (depending on deployment) to route subdomain requests. The registerPreviewSubdomain function writes routing rules to Cloudflare KV, mapping subdomains to S3 paths. The integration requires Cloudflare API token, account ID, and KV namespace ID. Cloudflare provides fast edge routing and custom domain support. The system can also use Cloudflare for DDoS protection and SSL termination. This integration is optional and only used in subdomain deployment mode.

Q: How does the web search integration work?

Answer: The web search integration uses the Tavily API for web search capabilities. The system can invoke a web search tool during agent execution to gather information from the web. The search results are fed back to the LLM as tool results. The web search tool is invoked when the LLM determines it needs current information or to verify facts. The search results include snippets, titles, and URLs. The integration requires a Tavily API key configured in the environment variables. This capability allows the agent to provide up-to-date information and verify claims.

Q: How does the URL scraping integration work?

Answer: The URL scraping integration allows the agent to fetch and analyze web pages. When the LLM needs to read a specific URL, it can invoke a URL scraping tool. The system fetches the page content, extracts text, and returns it to the LLM. This is useful for reading documentation, analyzing existing websites, or gathering information from specific URLs. The scraping is done with appropriate user-agent headers and respects robots.txt. The integration handles various content types and encoding.

Q: How does the Better Auth integration work?

Answer: Better Auth is used for authentication and session management. The system uses Better Auth's GitHub OAuth provider for user authentication. Sessions are stored in the database via the Better Auth adapter. The API uses Better Auth's getSession API to validate sessions on each request. Better Auth handles session creation, refresh, and termination. The system also uses Better Auth for API key management (though this is custom). The integration is configured via the BETTER_AUTH_SECRET environment variable. Better Auth provides a clean, type-safe authentication solution.

CI/CD & DevOps Questions

Q: How does the build process work?

Answer: The build process uses Turbo for monorepo builds. The root package.json defines build scripts that Turbo orchestrates. Turbo runs builds in topological order based on package dependencies. The pnpm build command builds all packages and apps. The system uses Turbo's caching to skip unchanged packages. The build output includes TypeScript compilation, bundling, and asset optimization. The CI pipeline runs typecheck, lint, and build in parallel using Turbo. The build process is optimized for fast feedback during development.

Q: How does the deployment pipeline work?

Answer: The deployment pipeline depends on the hosting environment. For self-hosting, the system can be deployed as Docker containers. The API server and workers can be deployed as services. The frontend is deployed as a static site to S3/CloudFront. The system includes health checks for the API server. The deployment process includes: (1) Building the Docker images, (2) Pushing to a registry, (3) Rolling out new containers, (4) Running database migrations, (5) Monitoring for errors. The system could be deployed to Kubernetes, ECS, or traditional VMs depending on requirements.

Q: How does environment management work?

Answer: Environment management uses environment variables for configuration. The system has different configurations for development, production, and test environments. The app.config.ts file centralizes environment variable access with validation. Environment variables are documented in the README. The system uses .env files for local development (gitignored). For production, environment variables are set via the deployment platform (Docker env, Kubernetes secrets, etc.). The system validates required environment variables at startup and fails fast if they're missing.
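
Fail-fast validation of required variables can be sketched as follows. `loadRequiredEnv` is a hypothetical helper, not the contents of app.config.ts; in a Node.js process the first argument would be `process.env`.

```typescript
type EnvSource = Record<string, string | undefined>;

// Hypothetical startup check: collect every missing variable, then fail once.
function loadRequiredEnv(env: EnvSource, requiredKeys: readonly string[]): Record<string, string> {
  const missing: string[] = [];
  const values: Record<string, string> = {};

  for (const key of requiredKeys) {
    const value = env[key];
    if (value === undefined || value === "") {
      missing.push(key);
    } else {
      values[key] = value;
    }
  }

  if (missing.length > 0) {
    // Reporting all missing keys at once avoids a fix-one-restart-repeat cycle.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return values;
}
```

Called at startup with keys such as DATABASE_URL and BETTER_AUTH_SECRET, this stops the process before it serves any traffic with a partial configuration.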

Q: How does the monitoring setup work?

Answer: Monitoring is implemented through multiple channels: (1) Structured logging with context for operational monitoring, (2) Sentry for error tracking and alerting, (3) Security telemetry for anomaly detection, (4) Custom metrics for run performance, token usage, etc. The logs can be shipped to a log aggregation service (ELK, Datadog, etc.). Sentry provides dashboards for error rates and performance. The system could add application performance monitoring (APM) for deeper insights. Monitoring is critical for operating the system in production.

Q: How does the health check work?

Answer: The API server implements a health check endpoint that can be used by load balancers or orchestrators. The health check verifies: (1) Database connectivity, (2) Redis connectivity, (3) Docker daemon availability. The health check returns a 200 status if all checks pass, 503 otherwise. Workers also implement health checks that verify queue connectivity. These health checks allow automated deployment systems to route traffic away from unhealthy instances and trigger restarts.
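
The mapping from individual dependency checks to a 200 or 503 response can be sketched as a pure function. The type and function names are hypothetical, and the checks themselves (pinging the database, Redis, and the Docker daemon) are assumed to have run already.

```typescript
// Outcome of each dependency probe, gathered before building the response.
interface DependencyChecks {
  database: boolean;
  redis: boolean;
  docker: boolean;
}

interface HealthReport {
  httpStatus: 200 | 503;
  healthy: boolean;
  failing: string[]; // names of the dependencies that failed
}

// Hypothetical aggregator: any failing dependency makes the instance unhealthy.
function buildHealthReport(checks: DependencyChecks): HealthReport {
  const names: (keyof DependencyChecks)[] = ["database", "redis", "docker"];
  const failing = names.filter((dep) => !checks[dep]);
  const healthy = failing.length === 0;
  return { httpStatus: healthy ? 200 : 503, healthy, failing };
}
```

Listing the failing dependencies in the body helps operators see why an instance was pulled from rotation, while the load balancer only needs the status code.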

Q: How does the log aggregation work?

Answer: Log aggregation is handled through structured logging. The system uses the createLogger function which creates loggers with context (request ID, user ID, etc.). Logs are output to stdout/stderr in JSON format for easy parsing. In production, logs are collected by a log aggregation service (CloudWatch Logs, ELK stack, etc.). The logging includes different levels (info, warn, error, debug) for filtering. Structured logging allows querying logs by specific fields like userId or requestId for debugging.
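
A context-carrying JSON logger in the spirit of createLogger can be sketched as below. The real function's signature is not shown in this document, so `createContextLogger`, its parameters, and the output field names are assumptions; the sink is injected so the sketch has no dependency on a particular runtime.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogFields = Record<string, unknown>;
type LogSink = (line: string) => void;

interface ContextLogger {
  debug(message: string, fields?: LogFields): void;
  info(message: string, fields?: LogFields): void;
  warn(message: string, fields?: LogFields): void;
  error(message: string, fields?: LogFields): void;
}

// Hypothetical factory: every line is one JSON object carrying the bound context.
function createContextLogger(context: LogFields, sink: LogSink): ContextLogger {
  const write = (level: LogLevel, message: string, fields: LogFields = {}): void => {
    sink(JSON.stringify({ level, message, ...context, ...fields }));
  };
  return {
    debug: (message, fields) => write("debug", message, fields),
    info: (message, fields) => write("info", message, fields),
    warn: (message, fields) => write("warn", message, fields),
    error: (message, fields) => write("error", message, fields),
  };
}
```

In production the sink would write to stdout; because each line is a flat JSON object, an aggregation service can filter on fields like requestId or userId without custom parsing.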

Testing Questions

Q: How does the testing strategy work?

Answer: The testing strategy includes multiple levels: (1) Unit tests for pure functions and utilities, (2) Integration tests for service layers with database, (3) End-to-end tests for critical user flows, (4) Manual testing for UI/UX. The system uses Jest for unit testing and Playwright for E2E testing. Tests are organized alongside the code they test. The CI pipeline runs tests on every commit. The system could benefit from more comprehensive test coverage, particularly around the agent loop and streaming functionality.

Q: How does the E2E testing work?

Answer: E2E testing uses Playwright to simulate user interactions. Tests can cover flows like: (1) User signs up and authenticates, (2) User sends a chat message, (3) User views generated code in the editor, (4) User triggers a build, (5) User views the preview. E2E tests provide confidence that the system works end-to-end. They are slower than unit tests but catch integration issues that unit tests miss. The system currently has limited E2E test coverage but could be expanded.

Q: How does the API testing work?

Answer: API testing can be done using tools like Postman or automated test frameworks. The API has well-defined request/response schemas that can be tested. Tests can cover: (1) Authentication flows, (2) Rate limiting behavior, (3) Error handling, (4) Streaming endpoints. The system could implement API tests using Supertest or similar. API tests are important for ensuring the HTTP layer works correctly before integrating with the frontend.

Q: How does the load testing work?

Answer: Load testing can be performed using tools like k6 or Artillery. Tests can simulate concurrent users sending messages, triggering builds, and streaming responses. Load testing helps identify bottlenecks in the system (database queries, Redis operations, Docker daemon limits). The system should be load tested before production deployment to ensure it can handle expected traffic. Load testing is particularly important for the streaming endpoints and queue processing.

Q: How does the chaos testing work?

Answer: Chaos testing involves intentionally failing components to test resilience. The system could benefit from chaos testing for: (1) Redis failures, (2) Database failures, (3) Docker daemon failures, (4) LLM provider outages. Chaos testing helps identify single points of failure and ensures graceful degradation. The system has some resilience (retries, fallbacks) but could be enhanced with more comprehensive chaos testing.

Code Quality Questions

Q: How does the linting configuration work?

Answer: Linting uses ESLint with custom rules and configurations. The root ESLint config extends Next.js and TypeScript recommended configs. Custom rules enforce code style and architectural boundaries. The system uses eslint-plugin-import for import ordering. Linting is configured to fail on any warning (--max-warnings 0) to enforce code quality. The CI pipeline runs linting on every commit. The system also uses Prettier for code formatting, though this is optional.

Q: How does the type checking work?

Answer: Type checking uses TypeScript with strict mode enabled. The tsconfig.json extends Next.js base config with custom settings. Type checking is run as part of the build process. The CI pipeline runs typecheck separately for faster feedback. The system uses Drizzle's type generation for database schema types. Type safety is enforced across the codebase, providing confidence in refactoring and catching errors at compile time.

Q: How does the code review process work?

Answer: The code review process is human-driven using pull requests. The CI pipeline runs automated checks (typecheck, lint, build) before allowing merge. Code review guidelines are enforced through the architecture boundary check which prevents wrong-layer imports. The system could benefit from automated code review tools like SonarQube for deeper analysis. Code review focuses on architectural correctness, security, and maintainability.

Q: How does the documentation work?

Answer: Documentation is maintained in README files in each package and module. The root README provides an overview of the project. Service modules have README files explaining their purpose and public API. Code comments explain complex logic. The system uses JSDoc for function documentation. The documentation is kept up-to-date alongside code changes. Good documentation is critical for onboarding and maintenance.

System Design Questions

Q: How would you design Edward to support real-time collaboration?

Answer: To support real-time collaboration, I would add: (1) WebSocket or WebRTC connections for real-time file editing, (2) Operational Transformation (OT) or CRDTs for conflict resolution, (3) Presence indicators showing which users are viewing/editing files, (4) Real-time chat between collaborators, (5) Permission system for read/write access. The architecture would need a signaling server for WebRTC coordination. The file editor would need to be enhanced for collaborative editing. The database would need to track collaborators and permissions.

Q: How would you design Edward to support multiple deployment targets?

Answer: To support multiple deployment targets, I would add: (1) Adapter pattern for different deployment platforms (Vercel, Netlify, AWS Amplify, custom servers), (2) Configuration per deployment target in the workflow context, (3) Platform-specific build commands and runtime configurations, (4) Deployment status tracking per target, (5) Rollback capability per target. The system could use the planning workflow to determine the best deployment target based on user requirements.

Q: How would you design Edward to support mobile app development?

Answer: To support mobile app development, I would add: (1) React Native templates and build configurations, (2) Mobile-specific UI generation patterns, (3) Native module integration guidance, (4) App store deployment workflows, (5) Device preview capabilities. The system would need mobile simulators or device emulators for preview. The LLM prompts would need to be extended to cover mobile development patterns. The build pipeline would need to support iOS and Android builds.

Q: How would you design Edward to support database-backed applications?

Answer: To support database-backed applications, I would add: (1) Database schema generation capabilities, (2) ORM integration (Prisma, TypeORM) in generated code, (3) Migration generation and execution, (4) Seed data capabilities, (5) Database connection configuration in the sandbox. The system would need to support multiple database types (Postgres, MySQL, SQLite). The LLM would need to understand database design patterns. The preview environment would need database provisioning.

Q: How would you design Edward to support API development?

Answer: To support API development, I would add: (1) API route generation capabilities, (2) OpenAPI/Swagger specification generation, (3) API documentation auto-generation, (4) Mock server capabilities for testing, (5) API client SDK generation. The system would need to understand REST and GraphQL patterns. The LLM prompts would need to be extended to cover API design best practices. The preview environment would need to support API testing tools.

Q: How would you design Edward to support microservices architectures?

Answer: To support microservices architectures, I would add: (1) Multi-container project generation, (2) Docker Compose configuration generation, (3) Service discovery patterns, (4) Inter-service communication patterns, (5) Distributed tracing integration. The system would need to understand microservices patterns and best practices. The build pipeline would need to support multi-container builds and deployments. The preview environment would need to support multi-container orchestration.

Q: How would you design Edward to support offline development?

Answer: To support offline development, I would add: (1) Local LLM integration (Ollama, local models), (2) Offline mode detection and graceful degradation, (3) Local caching of chat history and generated code, (4) Sync capabilities when back online, (5) Conflict resolution for offline changes. The system would need to handle network detection and queue operations locally. The architecture would need to support both online and offline modes seamlessly.