Shared from "Claude-Code" on Inkdown

Query Engine & API Layer

The heart of Claude Code — how it talks to the AI model, streams responses, executes tools, and manages context.


The Query Loop (query.ts)

Location: src/query.ts (1500+ lines)

This is the most important file in the codebase. Every AI interaction goes through this loop.

The Loop, Simplified
TypeScript
async function* queryLoop(params): AsyncGenerator<..., Terminal> {
  while (true) {
    // STEP 1: Prepare messages
    messages = applyToolResultBudget(messages)     // Cap tool result sizes
    messages = snipCompactIfNeeded(messages)       // Remove old messages (snip)
    messages = microcompact(messages)              // Remove redundant tool blocks
    messages = applyCollapses(messages)            // Context collapse
    messages = autocompact(messages)               // Summarize if too large

    // STEP 2: Call the model
    for await (const message of callModel({ messages, tools, ... })) {
      yield message  // Stream to UI

      if (message has tool_use blocks) {
        streamingToolExecutor.addTool(toolBlock)
        needsFollowUp = true
      }
    }

    // STEP 3: Execute tools
    for await (const result of streamingToolExecutor.getCompletedResults()) {
      yield result.message
      toolResults.push(result)
    }

    // STEP 4: Decide what to do next
    if (aborted) return { reason: 'aborted_streaming' }

    if (!needsFollowUp) {
      // No more tools — we're done
      // But first: check for recoverable errors
      if (prompt_too_long) {
        try collapse drain → try reactive compact → surface error
      }
      if (max_output_tokens) {
        inject recovery message → retry (up to 3 times)
      }
      // Run stop hooks (e.g., "should I dream?", "PR ready?")
      if (stopHookRetried) continue
      return { reason: 'done' }
    }

    // STEP 5: Loop — send tool results back to the model
    state = { messages: [...messages, ...assistantMessages, ...toolResults], ... }
    continue
  }
}

Key Concepts

Token Budget

Each query tracks token usage. If context approaches the model's limit, the loop triggers compaction before the API call.
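The pre-call check described above can be pictured as a simple threshold test. This is a minimal sketch, not code from `src/query.ts`: the `TokenBudget` shape, the `shouldCompact` name, and all the numbers are assumptions chosen for illustration.

```typescript
// Hypothetical names and numbers throughout; none are taken from src/query.ts.
interface TokenBudget {
  contextWindow: number;  // model's maximum context size, in tokens
  reservedOutput: number; // tokens held back for the model's reply
  safetyMargin: number;   // extra headroom before compaction triggers
}

// Returns true when the estimated prompt size is close enough to the limit
// that compaction should run before the next API call.
function shouldCompact(estimatedPromptTokens: number, budget: TokenBudget): boolean {
  const usable = budget.contextWindow - budget.reservedOutput - budget.safetyMargin;
  return estimatedPromptTokens >= usable;
}

const budget: TokenBudget = { contextWindow: 200_000, reservedOutput: 16_000, safetyMargin: 10_000 };
console.log(shouldCompact(120_000, budget)); // false
console.log(shouldCompact(180_000, budget)); // true
```

Reserving output tokens up front means the check fires while there is still room for the model's reply, rather than only when the prompt alone fills the window.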

Streaming Tool Execution

Tools execute concurrently while the model is still streaming. The StreamingToolExecutor:

  1. Queues tool calls as they arrive in the stream
  2. Executes them in parallel (respecting concurrency limits)
  3. Yields results as they complete
  4. Generates synthetic tool_result blocks for aborted tools
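The first three steps above (queue, run in parallel under a limit, yield as results complete) can be sketched roughly as follows. `MiniToolExecutor`, `ToolCall`, and `ToolResult` are hypothetical names; the real `StreamingToolExecutor` API and internals may differ, and abort handling is left out here.

```typescript
// Hypothetical sketch; the real StreamingToolExecutor may look quite different.
interface ToolCall { id: string; run: () => Promise<string>; }
interface ToolResult { toolUseId: string; content: string; }

class MiniToolExecutor {
  private readonly maxConcurrent: number;
  private queue: ToolCall[] = [];
  private running = 0;
  private finished: ToolResult[] = [];
  private wake: (() => void) | null = null;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  // Called as each tool_use block arrives in the stream.
  addTool(call: ToolCall): void {
    this.queue.push(call);
    this.pump();
  }

  // Starts queued calls until the concurrency limit is reached.
  private pump(): void {
    while (this.running < this.maxConcurrent && this.queue.length > 0) {
      const call = this.queue.shift()!;
      this.running++;
      call.run()
        .catch((err: unknown) => `error: ${String(err)}`) // failures become result text
        .then((content) => {
          this.running--;
          this.finished.push({ toolUseId: call.id, content });
          this.pump();
          this.wake?.();
        });
    }
  }

  // Yields results in completion order until nothing is queued or running.
  async *results(): AsyncGenerator<ToolResult> {
    while (true) {
      while (this.finished.length > 0) yield this.finished.shift()!;
      if (this.running === 0 && this.queue.length === 0) return;
      await new Promise<void>((resolve) => { this.wake = resolve; });
    }
  }
}
```

A caller would invoke `addTool` for each tool call seen while streaming, then consume `results()` with `for await`. Results arrive in completion order, not submission order, so each one carries the id of the call it answers.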

Recovery Mechanisms

| Error | Recovery Strategy |
| --- | --- |
| Prompt too long (413) | 1. Drain staged context collapses; 2. Reactive compact (summarize); 3. Surface error if both fail |
| Max output tokens | 1. Escalate to 64k output (once); 2. Inject "resume mid-thought" message; 3. Retry up to 3 times |
| Model fallback | Switch to fallback model, strip thinking signatures, retry |
| Media size error | Reactive compact strips oversized media, retries |
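As one reading of the "max output tokens" recovery, the sketch below escalates the output limit once and then retries with a resume hint up to three times. Whether the escalation counts toward the three retries is not stated in the source; this sketch assumes it does not. All names (`runWithOutputRecovery`, `Attempt`, `Stop`) are hypothetical.

```typescript
// Hypothetical sketch of the "max output tokens" recovery; names are illustrative.
type Stop = "end_turn" | "max_output_tokens";
interface Attempt { maxOutputTokens: number; resumeHint: boolean; }

const ESCALATED_LIMIT = 64_000; // the "64k output" escalation
const MAX_RETRIES = 3;

// `call` stands in for one model request and reports why it stopped.
// Returns the sequence of attempts that were made.
function runWithOutputRecovery(initialLimit: number, call: (a: Attempt) => Stop): Attempt[] {
  const attempts: Attempt[] = [];
  let next: Attempt = { maxOutputTokens: initialLimit, resumeHint: false };
  let escalated = false;
  let retries = 0;

  while (true) {
    attempts.push(next);
    if (call(next) !== "max_output_tokens") break; // finished normally
    if (!escalated) {
      escalated = true; // step 1: raise the limit, once
      next = { maxOutputTokens: ESCALATED_LIMIT, resumeHint: false };
    } else if (retries < MAX_RETRIES) {
      retries++; // steps 2 and 3: ask the model to resume mid-thought
      next = { maxOutputTokens: ESCALATED_LIMIT, resumeHint: true };
    } else {
      break; // give up and surface the truncated output
    }
  }
  return attempts;
}
```

Under these assumptions, a model that keeps hitting the limit is called five times in total: the original request, the escalated request, and three resume attempts.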

Context Management

Compaction Strategies

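The system has five context reduction strategies. The first four are applied in order before each model call; the fifth, reactive compact, runs only after the API rejects a request: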

1. Snip (snipCompact.ts)
  • Removes the oldest messages from the conversation
  • Token-based: triggers when messages exceed a threshold
  • Fast — no API call needed
2. Micro-Compact
  • Removes redundant tool_use/tool_result pairs
  • Keeps only the essential conversation flow
  • Can use cached edits for efficiency
3. Context Collapse
  • Archives old conversation segments
  • Replaces them with summaries stored in a collapse store
  • Persists across turns (unlike compaction which is per-turn)
  • projectView() replays the commit log on every entry
4. Auto-Compact
  • The heavyweight strategy — uses an AI model to summarize old messages
  • Triggers when context approaches the model's limit
  • Sends old messages to a compact model (usually Haiku) for summarization
  • The summary replaces the original messages
  • Tracks consecutive failures (circuit breaker)
5. Reactive Compact (feature flag)
  • Triggered reactively when the API returns a 413 (prompt too long)
  • Strips oversized content and retries
  • Handles media size errors by removing large attachments

API Layer (services/api/)

Claude API Client (claude.ts)

The main API client that talks to Anthropic's API:

Plain text
callModel({
  messages,
  systemPrompt,
  tools,
  thinkingConfig,
  model,
  signal,
  options: {
    onStreamingFallback,
    querySource,
    agents,
    mcpTools,
    taskBudget,
    ...
  }
})

Streaming

The API uses server-sent events (SSE) streaming:

Plain text
data: {"type": "message_start", "message": {...}}
data: {"type": "content_block_start", "content_block": {"type": "text", "text": ""}}
data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "H"}}
data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "ello"}}
data: {"type": "content_block_stop"}
data: {"type": "message_delta", "usage": {...}}
data: {"type": "message_stop"}

Each event is parsed and yielded as a StreamEvent or Message.
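As a rough illustration of that parsing step, the sketch below reads `data:` lines and concatenates text deltas. It is deliberately minimal: it ignores `event:` lines, multi-line data fields, and every non-text block type, and the `StreamEvent` shape here is a simplified assumption, not the type used in `claude.ts`.

```typescript
// Minimal sketch: handles only `data:` lines and text deltas, as in the sample above.
interface StreamEvent {
  type: string;
  delta?: { type?: string; text?: string };
}

// Parses raw SSE text into events, skipping blank and non-data lines.
function parseSse(raw: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data:")) continue;
    events.push(JSON.parse(line.slice("data:".length).trim()) as StreamEvent);
  }
  return events;
}

// Concatenates the text carried by content_block_delta events.
function collectText(events: StreamEvent[]): string {
  return events
    .filter((e) => e.type === "content_block_delta" && e.delta?.type === "text_delta")
    .map((e) => e.delta?.text ?? "")
    .join("");
}

const sample = [
  'data: {"type": "content_block_start", "content_block": {"type": "text", "text": ""}}',
  'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "H"}}',
  'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "ello"}}',
  'data: {"type": "content_block_stop"}',
].join("\n");

console.log(collectText(parseSse(sample))); // Hello
```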

Retry & Fallback

  • Retry: Uses exponential backoff for transient errors (rate limits, 5xx)
  • Fallback: If the primary model is overloaded, switches to a fallback model
    • Strips thinking blocks (signatures are model-specific)
    • Retries the entire request with the fallback model
    • Logs tengu_model_fallback_triggered event
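The retry half of this can be sketched as a generic exponential-backoff wrapper. The delays, attempt count, and function names below are assumptions and are not taken from `withRetry.ts`; production code would typically also add jitter and honor any retry-after hint from the server.

```typescript
// Hypothetical sketch; delays, attempt counts, and names are assumptions.

// Delay doubles with each attempt and is capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries fn on transient errors, sleeping between attempts.
// `sleep` is injectable so the wrapper can be exercised without real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!isTransient(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A fallback step would sit one level above this wrapper: when the retries are exhausted on an overload error, the caller swaps the model, strips thinking blocks, and calls `withRetry` again.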

Prompt Caching

The system optimizes for Anthropic's prompt cache:

  • 1-hour cache allowlist: Certain prompts are eligible for extended caching
  • Sticky-on latches: Once a beta header (fast mode, AFK mode, cache editing, thinking) is enabled, it stays on for the session to avoid busting the cache
  • Tool ordering: Tools are sorted by name for cache stability
  • Content-hash-based temp paths: Settings files use content hashes instead of UUIDs to keep tool descriptions stable
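Two of these ideas, stable tool ordering and sticky-on latches, are small enough to sketch directly. The `ToolDef` shape, the `StickyHeaders` class, and the header name in the usage lines are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical sketch; the tool shape and latch class are illustrative only.
interface ToolDef { name: string; description: string; }

// Sorting by name makes the serialized tool list independent of registration
// order, so the cached prompt prefix stays identical between requests.
function stableToolOrder(tools: ToolDef[]): ToolDef[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// A sticky-on latch: once a header has been requested, it is sent for the
// rest of the session, even if the feature is later switched off.
class StickyHeaders {
  private readonly enabled = new Set<string>();

  request(header: string, wanted: boolean): void {
    if (wanted) this.enabled.add(header); // never removed once set
  }

  current(): string[] {
    return Array.from(this.enabled).sort();
  }
}

const latch = new StickyHeaders();
latch.request("example-beta-header", true);
latch.request("example-beta-header", false); // no effect: the latch stays on
console.log(latch.current()); // [ 'example-beta-header' ]
```

Both tricks trade a little flexibility for a request prefix that does not change from call to call, which is what a prefix-based cache needs in order to hit.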

Tool Execution (services/tools/)

StreamingToolExecutor

The orchestrator for parallel tool execution:

Plain text
addTool(toolBlock, assistantMessage)
  → Queue the tool call
  → Start execution (async)

getCompletedResults()
  → Return all finished tool results
  → Generate synthetic results for aborted tools

getRemainingResults()
  → Wait for all in-flight tools to complete
  → Called on abort to ensure every tool_use has a tool_result
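The pairing guarantee in the last step, that every `tool_use` ends up with a `tool_result`, can be sketched as a pure function. The block shapes are simplified, and `pairResults` and the wording of the synthetic message are hypothetical.

```typescript
// Hypothetical sketch; block shapes are simplified versions of tool_use / tool_result.
interface ToolUse { id: string; name: string; }
interface ToolResultBlock { tool_use_id: string; content: string; is_error: boolean; }

// Returns one result per tool_use, in tool_use order, filling in a synthetic
// error result for any call that was aborted before it produced one.
function pairResults(uses: ToolUse[], completed: ToolResultBlock[]): ToolResultBlock[] {
  const byId = new Map<string, ToolResultBlock>();
  for (const result of completed) byId.set(result.tool_use_id, result);
  return uses.map(
    (use) =>
      byId.get(use.id) ?? {
        tool_use_id: use.id,
        content: `Tool ${use.name} was aborted before it finished`,
        is_error: true,
      },
  );
}
```

Keeping the pairs complete matters because the next request replays the whole conversation, and a `tool_use` without a matching result would make that history invalid.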

Tool Permission Flow
Plain text
Model calls tool
    │
    ▼
canUseTool(toolName, input)
    │
    ├─ Check deny rules (settings-based)
    │   └─ If denied → return false (tool not available to model)
    │
    ├─ Check auto-allow rules
    │   └─ If auto-allowed → return true
    │
    ├─ Check if user previously approved this tool
    │   └─ If yes → return true
    │
    └─ Prompt user for permission
        ├─ User approves → return true (remember for session)
        ├─ User denies → return false
        └─ User approves always → return true (save to settings)
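The decision order in the diagram can be written as a short function. This is a sketch under simplifying assumptions: rules match on tool name only (the `input` argument is omitted), the user prompt is a synchronous callback, and "save to settings" is modeled by adding to the in-memory allow set. `PermissionState` and `UserChoice` are hypothetical names.

```typescript
// Hypothetical sketch of the decision order in the diagram; names are illustrative.
type UserChoice = "approve" | "deny" | "approve_always";

interface PermissionState {
  denyRules: Set<string>;       // from settings
  autoAllowRules: Set<string>;  // from settings
  sessionApproved: Set<string>; // remembered for this session only
}

function canUseTool(
  toolName: string,
  state: PermissionState,
  promptUser: (toolName: string) => UserChoice,
): boolean {
  if (state.denyRules.has(toolName)) return false; // deny rules are checked first
  if (state.autoAllowRules.has(toolName)) return true;
  if (state.sessionApproved.has(toolName)) return true;

  const choice = promptUser(toolName);
  if (choice === "deny") return false;
  if (choice === "approve_always") {
    state.autoAllowRules.add(toolName); // stand-in for saving to settings
  } else {
    state.sessionApproved.add(toolName); // remember for this session
  }
  return true;
}
```

Because deny rules are checked before anything else, an earlier approval can never override a deny rule that was added later.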

QueryEngine.ts

Location: src/QueryEngine.ts

A higher-level wrapper around the query loop that:

  1. Manages the message history
  2. Handles user input processing
  3. Coordinates with the UI layer
  4. Manages session lifecycle (clear, resume, compact)

Query Configuration (query/config.ts)

Builds an immutable configuration snapshot once per query:

  • Feature gates (evaluated once, not per-iteration)
  • Session ID
  • Model configuration
  • Permission mode
  • Client type

This avoids re-evaluating feature flags on every loop iteration.
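A minimal sketch of the snapshot idea: gates are looked up once when the config is built, and the frozen result is what the loop reads afterwards. The field names, the `buildQueryConfig` function, and the gate name in the usage lines are assumptions, not the contents of `query/config.ts`.

```typescript
// Hypothetical sketch; field names and the gate lookup are illustrative.
interface QueryConfig {
  readonly sessionId: string;
  readonly model: string;
  readonly permissionMode: string;
  readonly gates: Readonly<Record<string, boolean>>;
}

// Evaluates each gate exactly once and freezes the result for the whole query.
function buildQueryConfig(
  base: { sessionId: string; model: string; permissionMode: string },
  gateNames: string[],
  checkGate: (gateName: string) => boolean,
): QueryConfig {
  const gates: Record<string, boolean> = {};
  for (const gateName of gateNames) gates[gateName] = checkGate(gateName);
  return Object.freeze({ ...base, gates: Object.freeze(gates) });
}

let lookups = 0;
const config = buildQueryConfig(
  { sessionId: "s1", model: "example-model", permissionMode: "default" },
  ["example_gate"],
  () => { lookups++; return true; },
);
// Reading the snapshot inside the loop never triggers another gate lookup.
console.log(config.gates["example_gate"], lookups); // true 1
```

Besides saving repeated lookups, a frozen snapshot keeps behavior consistent within one query even if a flag changes remotely while the loop is running.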


Token Estimation (services/tokenEstimation.ts)

Estimates token counts for:

  • Messages (before sending to API)
  • Tool definitions (in system prompt)
  • Context window usage

Used to decide when to trigger compaction.
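The source does not say how the estimate is computed, so the sketch below uses a common rough heuristic of about four characters per token as a stand-in. The ratio, the `ContextParts` shape, and the function names are all assumptions; the real estimator may use a different ratio or a tokenizer.

```typescript
// Hypothetical heuristic: roughly 4 characters per token. Illustrative only.
const CHARS_PER_TOKEN = 4;

const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

interface ContextParts {
  systemPrompt: string;
  toolDefinitions: string[]; // serialized tool schemas
  messages: string[];        // serialized message contents
}

// Fraction of the context window the next request is expected to use.
function contextUsage(parts: ContextParts, contextWindow: number): number {
  const total =
    estimateTokens(parts.systemPrompt) +
    parts.toolDefinitions.reduce((sum, tool) => sum + estimateTokens(tool), 0) +
    parts.messages.reduce((sum, message) => sum + estimateTokens(message), 0);
  return total / contextWindow;
}
```

An estimate like this is cheap enough to run on every loop iteration, which is why it suits a trigger check; exact counts only become available once the API reports usage for the request.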


Key Files Reference

| File | Purpose |
| --- | --- |
| src/query.ts | The query loop: model calls, tool execution, recovery |
| src/QueryEngine.ts | High-level query orchestration |
| src/query/config.ts | Query configuration snapshot |
| src/query/deps.ts | Dependency injection for query (callModel, compact, etc.) |
| src/query/transitions.ts | Continue/terminal state types |
| src/query/tokenBudget.ts | Per-turn token budget tracking |
| src/query/stopHooks.ts | Post-turn hooks (dream, PR review, etc.) |
| src/services/api/claude.ts | Anthropic API client, streaming |
| src/services/api/bootstrap.ts | Bootstrap data fetching |
| src/services/api/filesApi.ts | File download/upload API |
| src/services/api/withRetry.ts | Retry logic with exponential backoff |
| src/services/tools/toolOrchestration.ts | Tool execution orchestration |
| src/services/tools/StreamingToolExecutor.ts | Parallel tool execution |
| src/services/compact/compact.ts | Auto-compaction |
| src/services/compact/autoCompact.ts | Auto-compact trigger logic |
| src/services/compact/microcompact.ts | Micro-compaction |
| src/services/compact/reactiveCompact.ts | Reactive compact (on 413) |
| src/services/compact/snipCompact.ts | Snip compact (remove old messages) |
| src/services/contextCollapse/ | Context collapse service |
| src/services/toolUseSummary/ | Tool use summary generation |