Shared from "Claude-Code" on Inkdown

Context Compaction & Memory Management

Overview

Long Claude Code sessions can outgrow the model's context window. The compaction system summarizes older messages to free tokens while keeping what is needed to continue: decisions made, file paths and approaches chosen, and the current task state.

Plain text
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTEXT WINDOW MANAGEMENT                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                    CONTEXT WINDOW                                    │  │
│  │                                                                      │  │
│  │  [System Prompt] ............... 500 tokens                        │  │
│  │                                                                      │  │
│  │  [Old Messages] ................ 10,000 tokens                       │  │
│  │    ↓                                                                 │  │
│  │    ┌─────────────────────────────────────────────────────────────┐ │  │
│  │    │ COMPACTED (Summary) ......... 500 tokens                    │ │  │
│  │    │ "Previously discussed: API design, chose REST over GraphQL"   │ │  │
│  │    └─────────────────────────────────────────────────────────────┘ │  │
│  │                                                                      │  │
│  │  [Recent Messages] ............. 2,000 tokens                      │  │
│  │    (Full fidelity, not compacted)                                    │  │
│  │                                                                      │  │
│  │  [Tool Results Pending] ........ 500 tokens                        │  │
│  │                                                                      │  │
│  │  ─────────────────────────────────────────────                       │  │
│  │  TOTAL: ~3,500 / 200,000 tokens (Claude 4.6 limit)                  │  │
│  │                                                                      │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                 COMPACTION TRIGGERS                                  │  │
│  │                                                                      │  │
│  │  1. TOKEN THRESHOLD     > 80% of context window                     │  │
│  │  2. USER COMMAND        /compact                                    │  │
│  │  3. AUTO-COMPACT        Enabled in settings                         │  │
│  │  4. ERROR RECOVERY      prompt_too_long error                       │  │
│  │  5. HISTORY SNIP        Long sessions (HISTORY_SNIP feature)        │  │
│  │                                                                      │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
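
The budget in the diagram can be checked with a few lines of arithmetic. This is a minimal sketch: `ContextSegment`, `contextUsage`, and the two constants are names assumed here for illustration, not identifiers confirmed from the codebase.

```typescript
// Hypothetical types and names, for illustration only.
type ContextSegment = { label: string; tokens: number }

const CONTEXT_LIMIT = 200_000   // window size shown in the diagram
const COMPACT_THRESHOLD = 0.8   // trigger 1: more than 80% of the window

function contextUsage(segments: ContextSegment[], limit: number) {
  const total = segments.reduce((sum, s) => sum + s.tokens, 0)
  const ratio = total / limit
  return { total, ratio, overThreshold: ratio > COMPACT_THRESHOLD }
}

// After compaction the 10,000-token history is replaced by a 500-token summary.
const afterCompaction = contextUsage(
  [
    { label: 'system prompt', tokens: 500 },
    { label: 'compacted summary', tokens: 500 },
    { label: 'recent messages', tokens: 2_000 },
    { label: 'pending tool results', tokens: 500 },
  ],
  CONTEXT_LIMIT,
)
// afterCompaction.total is 3500 and overThreshold is false
```

The four segments add up to the ~3,500 tokens shown in the diagram, far below the 160,000 tokens at which the threshold trigger would fire for a 200,000-token window.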

Core Files

| File | Purpose |
| --- | --- |
| services/compact/compact.ts | Core compaction logic and summary generation |
| services/compact/autoCompact.ts | Automatic compaction triggers |
| services/compact/reactiveCompact.ts | Reactive/real-time compaction |
| services/compact/snipCompact.ts | History snipping (HISTORY_SNIP) |
| services/contextCollapse/ | Context collapse feature |
| utils/context.ts | Context window calculations |
| utils/tokens.ts | Token counting |

Compaction Types

1. Manual Compaction (/compact)

User explicitly triggers compaction:

TypeScript
// commands/compact/index.ts
const compactCommand: ActionCommand = {
  type: 'action',
  name: 'compact',
  description: 'Summarize old conversation history',

  async action(args, context) {
    const messages = context.getAppState().messages

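    // Find compaction boundary
    const boundary = findCompactBoundary(messages)

    // Nothing to summarize when the whole history falls inside the preserved tail
    if (boundary === 0) {
      return { type: 'success' }
    }
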
    // Generate summary
    const summary = await generateSummary(
      messages.slice(0, boundary)
    )

    // Replace old messages with summary
    const compactedMessages: Message[] = [
      createSystemMessage({
        content: `## Previous Conversation Summary\n\n${summary}`,
      }),
      ...messages.slice(boundary),
    ]

    context.setAppState(prev => ({
      ...prev,
      messages: compactedMessages,
    }))

    return { type: 'success' }
  },
}

2. Auto-Compact

Automatic compaction at thresholds:

TypeScript
// services/compact/autoCompact.ts
export type AutoCompactConfig = {
  enabled: boolean
  thresholdPercent: number  // 0.8 = 80%
  preserveRecent: number    // Keep N most recent messages
}

export function shouldAutoCompact(
  messages: Message[],
  config: AutoCompactConfig,
  modelLimit: number
): boolean {
  if (!config.enabled) return false

  const tokenCount = estimateTokenCount(messages)
  const ratio = tokenCount / modelLimit

  return ratio >= config.thresholdPercent
}

// Background compaction
export async function triggerAutoCompact(
  state: AppState,
  setAppState: SetAppState
): Promise<void> {
  const boundary = findCompactBoundary(state.messages)

  // Nothing old enough to summarize
  if (boundary === 0) return

  // Show notification
  showNotification('Compacting conversation history...')

  // Generate summary
  const summary = await generateSummary(
    state.messages.slice(0, boundary)
  )

  // Update state
  setAppState(prev => ({
    ...prev,
    messages: buildPostCompactMessages(prev.messages, boundary, summary),
  }))
}
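
To make the threshold check concrete, here is a self-contained sketch with worked numbers. The `Message` type and the estimator are simplified stand-ins (assumptions for this example); only the shape of `shouldAutoCompact` follows the code above.

```typescript
// Simplified stand-ins; the real Message type and estimator are richer.
type Message = { type: 'user' | 'assistant'; content: string }
type AutoCompactConfig = { enabled: boolean; thresholdPercent: number; preserveRecent: number }

// Rough heuristic: 4 tokens of overhead per message plus ~1 token per 4 characters.
function estimateTokenCount(messages: Message[]): number {
  return messages.reduce((n, m) => n + 4 + Math.ceil(m.content.length / 4), 0)
}

function shouldAutoCompact(messages: Message[], config: AutoCompactConfig, modelLimit: number): boolean {
  if (!config.enabled) return false
  return estimateTokenCount(messages) / modelLimit >= config.thresholdPercent
}

const config: AutoCompactConfig = { enabled: true, thresholdPercent: 0.8, preserveRecent: 10 }

// 10 messages of 396 characters each: 10 * (4 + 99) = 1,030 estimated tokens.
const history: Message[] = Array.from({ length: 10 }, (_, i): Message => ({
  type: i % 2 === 0 ? 'user' : 'assistant',
  content: 'x'.repeat(396),
}))

const underLargeWindow = shouldAutoCompact(history, config, 200_000) // about 0.5% used: false
const underTinyWindow = shouldAutoCompact(history, config, 1_200)    // about 86% used: true
const whenDisabled = shouldAutoCompact(history, { ...config, enabled: false }, 1_200) // false
```

Note that `thresholdPercent` is a ratio between 0 and 1 despite its name, so 0.8 means 80% of the window.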

3. Snip (HISTORY_SNIP Feature)

Truncate history for very long sessions:

TypeScript
// services/compact/snipCompact.ts
export function shouldSnipHistory(
  messages: Message[],
  maxMessages: number = 1000
): boolean {
  return messages.length > maxMessages
}

export function snipHistory(
  messages: Message[],
  keepRecent: number = 100
): { messages: Message[]; wasSnipped: boolean } {
  if (messages.length <= keepRecent) {
    return { messages, wasSnipped: false }
  }

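  const cut = messages.length - keepRecent

  // Keep system messages from the dropped range only; system messages in the
  // recent tail are already retained and would otherwise appear twice
  const systemMessages = messages.slice(0, cut).filter(m => m.type === 'system')

  // Keep recent messages
  const recentMessages = messages.slice(cut)

  // Add snip boundary marker, counting only messages that were actually dropped
  const snipMarker: SystemMessage = createSystemMessage({
    content: `[${cut - systemMessages.length} earlier messages removed for performance]`,
  })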

  return {
    messages: [...systemMessages, snipMarker, ...recentMessages],
    wasSnipped: true,
  }
}
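
A small worked example shows what snipping leaves behind. This sketch uses a simplified `Message` type and a compact re-implementation of `snipHistory` (both assumptions for illustration); it keeps system messages only from the dropped range so that none is duplicated.

```typescript
type Message = { type: 'system' | 'user' | 'assistant'; content: string }

function snipHistory(messages: Message[], keepRecent: number): Message[] {
  if (messages.length <= keepRecent) return messages
  const cut = messages.length - keepRecent
  const dropped = messages.slice(0, cut)
  // Keep system messages from the dropped range only, so nothing in the tail is duplicated.
  const system = dropped.filter(m => m.type === 'system')
  const removed = dropped.length - system.length
  const marker: Message = {
    type: 'system',
    content: `[${removed} earlier messages removed for performance]`,
  }
  return [...system, marker, ...messages.slice(cut)]
}

const session: Message[] = [
  { type: 'system', content: 'You are a coding assistant.' },
  { type: 'user', content: 'Set up the project' },
  { type: 'assistant', content: 'Done.' },
  { type: 'user', content: 'Add tests' },
  { type: 'assistant', content: 'Added 3 tests.' },
]

const snipped = snipHistory(session, 2)
// snipped holds 4 messages: the system prompt, the marker
// "[2 earlier messages removed for performance]", and the last two messages.
```

Unlike compaction, snipping produces no summary: the dropped messages are simply gone, and the marker records how many.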

Summary Generation

The Compaction Prompt
TypeScript
// services/compact/compact.ts
export async function generateSummary(
  messages: Message[],
  options: SummaryOptions = {}
): Promise<string> {
  const prompt = buildSummaryPrompt(messages, options)

  // Use a cheaper/faster model for summarization
  const summary = await callCompactModel({
    model: 'claude-haiku-4-5',
    messages: [createUserMessage({ content: prompt })],
    max_tokens: 1000,
  })

  return summary.content
}

function buildSummaryPrompt(messages: Message[], options: SummaryOptions): string {
  return `Summarize this conversation for an AI assistant to continue from.
Include:
- Key decisions made
- Important context (file paths, APIs, approaches chosen)
- Current task state
- Open questions or blockers

Exclude:
- Specific code snippets (keep file paths)
- Full error messages (keep error types)
- Routine tool outputs

Conversation:
${formatMessagesForSummary(messages)}

Provide a concise summary the AI can use to continue the conversation without losing important context.`
}
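
`buildSummaryPrompt` relies on `formatMessagesForSummary`, which is not shown. One plausible shape, assumed here rather than taken from the codebase, renders one role-prefixed line per message and truncates long bodies such as tool output (the `maxChars` parameter is hypothetical):

```typescript
type Message = { type: 'user' | 'assistant' | 'tool_result'; content: string }

// Hypothetical helper: one "role: text" line per message, with long
// bodies (for example tool output) truncated to keep the prompt small.
function formatMessagesForSummary(messages: Message[], maxChars = 200): string {
  return messages
    .map(m => {
      const text = m.content.length > maxChars
        ? `${m.content.slice(0, maxChars)}… [truncated]`
        : m.content
      return `${m.type}: ${text}`
    })
    .join('\n')
}

const transcript = formatMessagesForSummary(
  [
    { type: 'user', content: 'Should the API be REST or GraphQL?' },
    { type: 'assistant', content: 'REST fits the existing clients better.' },
    { type: 'tool_result', content: 'x'.repeat(500) },
  ],
  50,
)
// The first two lines are kept whole; the tool result is cut to 50 characters.
```

Truncating before summarization matches the prompt's own exclusions (routine tool outputs, full error messages) and keeps the summarization request itself from overflowing the window.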

What to Preserve
TypeScript
// services/compact/preservation.ts
export type PreservationRule = {
  // Always keep these message types
  preserveTypes: ('user' | 'assistant' | 'tool_use' | 'tool_result')[]

  // Keep messages containing these patterns
  preservePatterns: RegExp[]

  // Keep messages with tool uses of these types
  preserveTools: string[]
}

export const DEFAULT_PRESERVATION_RULES: PreservationRule = {
  preserveTypes: ['user'],  // Always keep user messages

  preservePatterns: [
    /\b(TODO|FIXME|BUG)\b/i,     // Action items (word boundaries avoid matching "debug")
    /decided|agreed|chose/i,     // Decisions
    /error|failed|exception/i,   // Errors to remember
  ],

  preserveTools: [
    'FileWriteTool',   // File changes
    'FileEditTool',    // File changes
    'BashTool',        // Commands run
    'AgentTool',       // Agent spawns
  ],
}
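
The rules only declare what should survive; something still has to apply them. A minimal sketch of such a predicate follows. `shouldPreserve` and the `toolUse` field are hypothetical names, and the assumption that a message survives when any single rule matches is mine.

```typescript
type Message = {
  type: 'user' | 'assistant' | 'tool_use' | 'tool_result'
  content: string
  toolUse?: { name: string } // hypothetical field carrying the tool name
}
type PreservationRule = {
  preserveTypes: Message['type'][]
  preservePatterns: RegExp[]
  preserveTools: string[]
}

// Hypothetical predicate: a message survives if any one rule matches.
// The patterns must not use the `g` flag, or RegExp.test would become stateful.
function shouldPreserve(message: Message, rules: PreservationRule): boolean {
  if (rules.preserveTypes.includes(message.type)) return true
  if (rules.preservePatterns.some(p => p.test(message.content))) return true
  return message.toolUse !== undefined && rules.preserveTools.includes(message.toolUse.name)
}

const rules: PreservationRule = {
  preserveTypes: ['user'],
  preservePatterns: [/\b(TODO|FIXME|BUG)\b/i, /decided|agreed|chose/i],
  preserveTools: ['FileEditTool'],
}

const keepsUser = shouldPreserve({ type: 'user', content: 'hi' }, rules)                       // true
const keepsDecision = shouldPreserve({ type: 'assistant', content: 'We chose REST.' }, rules)  // true
const dropsRoutine = shouldPreserve({ type: 'assistant', content: 'Debugging output' }, rules) // false
const keepsEdit = shouldPreserve(
  { type: 'tool_use', content: '', toolUse: { name: 'FileEditTool' } },
  rules,
) // true
```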

Boundary Detection

Finding the Cut Point
TypeScript
// services/compact/boundary.ts
export function findCompactBoundary(
  messages: Message[],
  options: BoundaryOptions = {}
): number {
  // Only preserveRecent is used by this function
  const { preserveRecent = 10 } = options

  // Always keep most recent N messages
  if (messages.length <= preserveRecent) {
    return 0
  }

  // Find a good boundary (complete turn)
  let boundary = messages.length - preserveRecent

  // Adjust to complete conversation turn
  // (user message -> assistant response)
  while (boundary > 0) {
    const msg = messages[boundary]

    // Good boundary: user message starting a new turn
    if (msg.type === 'user') {
      break
    }

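    // Also good: right after a finished tool cycle. Skip this when the current
    // message is itself a tool result, so parallel results stay together
    if (messages[boundary - 1].type === 'tool_result' &&
        msg.type !== 'tool_result') {
      break
    }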

    boundary--
  }

  return boundary
}
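
Tracing the boundary search on a short history makes the two stopping conditions easier to see. The sketch below uses a simplified signature that takes `preserveRecent` directly and a flat `Message` type; both are assumptions for this example.

```typescript
type Message = { type: 'user' | 'assistant' | 'tool_use' | 'tool_result'; content: string }

function findCompactBoundary(messages: Message[], preserveRecent: number): number {
  if (messages.length <= preserveRecent) return 0
  let boundary = messages.length - preserveRecent
  while (boundary > 0) {
    const current = messages[boundary]?.type
    const previous = messages[boundary - 1]?.type
    if (current === 'user') break                                       // start of a turn
    if (previous === 'tool_result' && current !== 'tool_result') break  // tool cycle finished
    boundary--
  }
  return boundary
}

const conversation: Message[] = [
  { type: 'user', content: 'Design the API' },          // 0
  { type: 'assistant', content: 'Proposed REST' },      // 1
  { type: 'user', content: 'Read the config' },         // 2
  { type: 'tool_use', content: 'read config.json' },    // 3
  { type: 'tool_result', content: '{ "port": 8080 }' }, // 4
  { type: 'assistant', content: 'Config loaded' },      // 5
  { type: 'user', content: 'Now add auth' },            // 6
  { type: 'assistant', content: 'Added' },              // 7
]

const afterToolCycle = findCompactBoundary(conversation, 3) // 5: cut right after the tool result
const backToTurnStart = findCompactBoundary(conversation, 4) // 2: backs up to the user message
```

With `preserveRecent = 4` the initial cut would land on the tool result at index 4, separating it from its `tool_use`. The search backs up to index 2, so six messages are kept instead of four and the tool pair stays intact.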

Post-Compact Messages

Building the Compacted List
TypeScript
// services/compact/compact.ts
export function buildPostCompactMessages(
  originalMessages: Message[],
  boundary: number,
  summary: string,
  config: CompactConfig = {}
): Message[] {
  // Messages before boundary become summary
  const summaryMessage: SystemMessage = createSystemMessage({
    content: formatSummary(summary),
  })

  // Messages after boundary kept as-is
  const recentMessages = originalMessages.slice(boundary)

  // Add boundary marker
  const boundaryMessage: SystemMessage = createCompactBoundaryMessage({
    originalMessageCount: boundary,
    summaryTokenCount: estimateTokenCount([summaryMessage]),
  })

  return [
    summaryMessage,
    boundaryMessage,
    ...recentMessages,
  ]
}

function formatSummary(summary: string): string {
  return `## Conversation Summary\n\n${summary}\n\n---\n*(Previous messages compacted to save context space)*`
}

Token Calculation

Estimation
TypeScript
// utils/tokens.ts
export function estimateTokenCount(messages: Message[]): number {
  let count = 0

  for (const message of messages) {
    // Base per-message overhead
    count += 4

    // Content tokens (approximate: 4 chars ~= 1 token)
    const content = extractTextContent(message.content)
    count += Math.ceil(content.length / 4)

    // Tool calls add tokens (content may be a plain string, so check for blocks first)
    if (message.type === 'assistant' && Array.isArray(message.content)) {
      for (const block of message.content) {
        if (block.type === 'tool_use') {
          count += estimateToolUseTokens(block)
        }
      }
    }
  }

  return count
}

// Context window size per model (unknown models fall back to 200,000)
export function getContextWindowForModel(model: string): number {
  const contextWindows: Record<string, number> = {
    'claude-opus-4': 200_000,
    'claude-sonnet-4': 200_000,
    'claude-haiku-4': 200_000,
  }

  return contextWindows[model] ?? 200_000
}
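
`estimateToolUseTokens` is called above but not defined. A minimal sketch, assuming it applies the same 4-characters-per-token heuristic to the tool name plus the JSON-serialized input (the `ToolUseBlock` shape is also an assumption):

```typescript
type ToolUseBlock = { type: 'tool_use'; name: string; input: Record<string, unknown> }

// Hypothetical: same 4-characters-per-token heuristic applied to the
// tool name and the JSON-serialized input.
function estimateToolUseTokens(block: ToolUseBlock): number {
  const serialized = block.name + JSON.stringify(block.input)
  return Math.ceil(serialized.length / 4)
}

const bashTokens = estimateToolUseTokens({
  type: 'tool_use',
  name: 'BashTool',
  input: { command: 'ls -la' },
})
// 'BashTool' is 8 characters and '{"command":"ls -la"}' is 20, so 28 / 4 = 7 tokens.
```

A character-based estimate like this is cheap but approximate, which is one reason to trigger compaction at a threshold below 100% of the window.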

Reactive Compaction

Real-Time Streaming Compaction
TypeScript
// services/compact/reactiveCompact.ts
export type ReactiveCompactState = {
  // In-progress compaction
  status: 'idle' | 'compact_pending' | 'compacting'

  // Messages withheld from SDK during compaction
  withheldMessages: Message[]

  // Original message count for recovery
  originalMessageCount: number
}

// During streaming, hold back max_output_tokens errors
export function isWithheldMaxOutputTokens(
  msg: Message | StreamEvent
): msg is AssistantMessage {
  return msg?.type === 'assistant' && msg.apiError === 'max_output_tokens'
}

// After compaction succeeds, release withheld messages
export function releaseWithheldMessages(
  state: ReactiveCompactState
): Message[] {
  return state.withheldMessages
}
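
The state type above implies a small state machine. The sketch below shows one way the transitions might look; `withhold` and `release` are hypothetical helpers, and clearing the buffer on release (so messages cannot be emitted twice) is an assumption, not confirmed behavior.

```typescript
type Message = { type: 'user' | 'assistant'; content: string }
type ReactiveCompactState = {
  status: 'idle' | 'compact_pending' | 'compacting'
  withheldMessages: Message[]
  originalMessageCount: number
}

// Hypothetical transitions; each returns a new state rather than mutating.
function withhold(state: ReactiveCompactState, msg: Message): ReactiveCompactState {
  return {
    ...state,
    status: 'compact_pending',
    withheldMessages: [...state.withheldMessages, msg],
  }
}

function release(state: ReactiveCompactState): { released: Message[]; next: ReactiveCompactState } {
  return {
    released: state.withheldMessages,
    next: { ...state, status: 'idle', withheldMessages: [] },
  }
}

const idle: ReactiveCompactState = { status: 'idle', withheldMessages: [], originalMessageCount: 12 }
const pending = withhold(idle, { type: 'assistant', content: 'partial reply' })
const { released, next } = release(pending)
// released has one message; next is idle again with an empty buffer.
```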

Compaction Recovery

Error Handling
TypeScript
// query.ts - Compaction recovery
async function handleContextLengthError(
  error: APIError,
  messages: Message[],
  attempt: number
): Promise<CompactResult> {
  if (attempt > MAX_COMPACT_RETRIES) {
    throw new Error('Context compaction failed after retries')
  }

  // Try more aggressive compaction
  const aggressiveBoundary = Math.floor(messages.length * 0.5)  // Keep only 50%

  const summary = await generateSummary(
    messages.slice(0, aggressiveBoundary),
    { detail: 'low' }  // Less detailed
  )

  return {
    messages: buildPostCompactMessages(messages, aggressiveBoundary, summary),
    compacted: true,
  }
}
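
The handler above receives `attempt` but always cuts at 50%. If each retry should compact more than the last, the boundary could depend on the attempt number. The following is a hypothetical sketch of that idea, with attempts numbered from 1; `boundaryForAttempt` is not a function from the codebase.

```typescript
const MAX_COMPACT_RETRIES = 3

// Hypothetical escalation: summarize 50% of the history on the first attempt,
// 75% on the second, 87.5% on the third.
function boundaryForAttempt(messageCount: number, attempt: number): number {
  if (attempt < 1 || attempt > MAX_COMPACT_RETRIES) {
    throw new RangeError(`attempt must be between 1 and ${MAX_COMPACT_RETRIES}`)
  }
  const keepFraction = 0.5 ** attempt
  return Math.floor(messageCount * (1 - keepFraction))
}

const firstTry = boundaryForAttempt(40, 1)  // 20 messages summarized, 20 kept
const secondTry = boundaryForAttempt(40, 2) // 30 summarized, 10 kept
const thirdTry = boundaryForAttempt(40, 3)  // 35 summarized, 5 kept
```

A fixed fraction can still split a `tool_use` from its `tool_result`, so in practice the raw index would likely need the same turn-boundary adjustment described under Boundary Detection.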

Context Collapse

Emergency Compaction
TypeScript
// services/contextCollapse/index.ts
export async function performContextCollapse(
  messages: Message[]
): Promise<Message[]> {
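  // Most aggressive compaction - emergency only

  // Keep only:
  // 1. System messages (including the system prompt)
  // 2. The last user message, if there is one
  // Earlier turns, including the last assistant reply, are dropped

  const systemMessages = messages.filter(m => m.type === 'system')
  const lastUserMessage = findLast(messages, m => m.type === 'user')
  const kept = systemMessages.length + (lastUserMessage ? 1 : 0)

  const collapsed: Message[] = [
    ...systemMessages,
    createSystemMessage({
      content: `[Context collapsed due to extreme length. ${messages.length - kept} messages removed.]`,
    }),
    // Guard against histories with no user message
    ...(lastUserMessage ? [lastUserMessage] : []),
  ]

  return collapsed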
}

Best Practices

1. Preserve User Messages
TypeScript
// Never compact away the user's question
const userMessages = messages.filter(m => m.type === 'user')
if (userMessages.length <= 1) {
  // Don't compact - user needs context
  return { shouldCompact: false }
}

2. Keep Tool Context
TypeScript
// If last operation was tool execution, keep full context
const lastMessage = messages[messages.length - 1]
if (lastMessage.type === 'tool_use') {
  // Don't compact yet - need tool result
  return { shouldCompact: false }
}

3. Progressive Compaction
TypeScript
// Try gentle first, get more aggressive if needed
const strategies = [
  { preserveRecent: 20 },  // Gentle
  { preserveRecent: 10 },  // Medium
  { preserveRecent: 5 },   // Aggressive
]

for (const strategy of strategies) {
  const result = await tryCompact(messages, strategy)
  if (result.fits) return result
}
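
The loop above depends on a `tryCompact` that reports whether the result fits. Here is a self-contained, synchronous sketch of the whole pattern. The token budget parameter, the placeholder summary, and `compactProgressively` are assumptions for illustration; a real version would call the summarizer.

```typescript
type Message = { type: 'system' | 'user' | 'assistant'; content: string }
type Strategy = { preserveRecent: number }
type CompactAttempt = { fits: boolean; messages: Message[]; tokens: number }

const estimate = (messages: Message[]): number =>
  messages.reduce((n, m) => n + 4 + Math.ceil(m.content.length / 4), 0)

// Hypothetical tryCompact: keep the last N messages and replace the rest
// with a placeholder summary message.
function tryCompact(messages: Message[], strategy: Strategy, budget: number): CompactAttempt {
  const cut = Math.max(0, messages.length - strategy.preserveRecent)
  const summary: Message = { type: 'system', content: `[summary of ${cut} messages]` }
  const compacted = cut > 0 ? [summary, ...messages.slice(cut)] : messages
  const tokens = estimate(compacted)
  return { fits: tokens <= budget, messages: compacted, tokens }
}

function compactProgressively(messages: Message[], budget: number): CompactAttempt | undefined {
  const strategies: Strategy[] = [{ preserveRecent: 20 }, { preserveRecent: 10 }, { preserveRecent: 5 }]
  for (const strategy of strategies) {
    const result = tryCompact(messages, strategy, budget)
    if (result.fits) return result
  }
  return undefined // nothing fit: the caller falls back to a harsher measure
}

// 30 messages of 40 characters each: 14 estimated tokens per message, 420 in total.
const longSession: Message[] = Array.from({ length: 30 }, (_, i): Message => ({
  type: i % 2 === 0 ? 'user' : 'assistant',
  content: 'y'.repeat(40),
}))

const fitted = compactProgressively(longSession, 200) // gentle gives 290 tokens, medium gives 150
const noFit = compactProgressively(longSession, 50)   // even aggressive gives 80 tokens
```

Trying the gentlest strategy first keeps as much full-fidelity history as the budget allows; returning `undefined` leaves the choice of last resort, such as context collapse, to the caller.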

Debugging Compaction

Bash
# See token count
claude /context

# Manual compact
claude /compact

# Debug compaction
CLAUDE_CODE_DEBUG_COMPACT=1 claude

# Force aggressive compact
claude /compact --aggressive

Key Concepts

  1. Compaction: Replace old messages with summary
  2. Boundary: Where to cut (complete turns preferred)
  3. Preservation Rules: What must survive compaction
  4. Auto-Compact: Automatic when approaching limit
  5. Snip: Truncate for very long sessions
  6. Reactive: Hold errors during compaction attempt
  7. Context Collapse: Emergency last resort