InkdownInkdown
Start writing

Claude-Code

62 files·4 subfolders

Shared Workspace

Claude-Code
codex

11-testing-architecture

Shared from "Claude-Code" on Inkdown

Testing Architecture

Overview

Claude Code has a multi-layered testing strategy that balances unit tests, integration tests, and "VCR" recorded API interactions. Given the complexity of LLM interactions, traditional mocking isn't enough - we need to record and replay actual API responses.

Plain text
┌─────────────────────────────────────────────────────────────────────────────┐
│                      TESTING PYRAMID                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│                           ┌─────────┐                                      │
│                           │   E2E   │  Full CLI interactions               │
│                           │  Tests  │  (expensive, full API calls)          │
│                           └────┬────┘                                      │
│                                │                                           │
│                          ┌─────▼─────┐                                     │
│                          │   VCR     │  Recorded API interactions         │
│                          │  Tests    │  (deterministic, fast)             │
│                          └─────┬─────┘                                     │
│                                │                                           │
│                     ┌──────────▼──────────┐                               │
│                     │   Integration       │  Services, tools with mocks     │
│                     │      Tests          │                               │
│                     └──────────┬──────────┘                               │
│                                │                                           │
│              ┌─────────────────▼─────────────────┐                         │
│              │           Unit Tests            │  Pure functions, utils    │
│              │        (jest/bun test)          │                               │
│              └─────────────────────────────────┘                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
0000_start_here_index_and_recommended_reading_order.md
0100_project_overview_tech_stack_runtime_modes_and_folder_map.md
0200_startup_flow_entry_points_and_cold_start_sequence.md
0300_codebase_modules_layers_state_models_and_schemas.md
0400_system_architecture_and_design_rationale.md
0500_interactive_repl_request_flow_end_to_end.md
0600_headless_sdk_and_print_mode_request_flow_end_to_end.md
0700_mcp_integration_connection_and_tool_call_flow.md
0800_external_services_sdks_storage_and_local_dependencies.md
0900_environment_variables_settings_feature_flags_and_failure_modes.md
1000_non_obvious_patterns_gotchas_and_debugging_traps.md
1100_full_codebase_file_inventory_grouped_by_directory.md
kimi
00-overview.md
01-entrypoints.md
02-state-management.md
03-query-system.md
04-tools-system.md
05-tasks-system.md
06-ui-components.md
07-bridge-remote.md
08-services.md
09-skills-plugins.md
10-commands.md
11-testing-architecture.md
12-permission-system.md
13-build-system.md
14-ink-internals.md
15-git-internals.md
16-context-compaction.md
17-vim-mode.md
18-mailbox-notifications.md
19-session-persistence.md
20-hooks-system.md
21-error-recovery.md
README.md
qwen
00-overview.md
01-entry-points.md
02-query-engine.md
03-tools-and-tasks.md
04-commands-and-skills.md
05-state-management.md
06-ink-rendering.md
07-bridge-remote.md
08-mcp-services.md
09-services-overview.md
10-multi-agent.md
11-system-prompt-constants.md
12-tool-interface.md
13-memory-system.md
14-buddy-companion.md
15-keybindings.md
16-stop-hooks.md
17-vim-mode.md
18-upstreamproxy.md
19-cost-tracking-history.md
20-contexts-styles-onboarding.md
21-hooks.md
22-screens.md
tweets-explain
claude-code-memory-analysis.md
compact
memory-system
agentic-architecture

Core Testing Files

Directory/FilePurpose
tests/Test utilities, mocks, setup
tests/mocks/Mock implementations
tests/fixtures/VCR recordings, test data
services/vcr.tsVCR recording/replay
utils/testing/Test helpers
**/__tests__/*.test.tsUnit tests (co-located)

VCR Testing System

VCR (Video Cassette Recorder) captures real API responses and replays them in tests.

How VCR Works
Plain text
┌─────────────────────────────────────────────────────────────────┐
│                     VCR FLOW                                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  RECORD MODE                                                    │
│  ────────────                                                   │
│                                                                 │
│  Test calls API  ──►  Real API call  ──►  Save response         │
│                         │                    │                  │
│                         ▼                    ▼                  │
│                    Actual tokens       To fixtures/             │
│                    (costs $)            test-name.json          │
│                                                                 │
│  REPLAY MODE                                                    │
│  ────────────                                                   │
│                                                                 │
│  Test calls API  ──►  Match request  ──►  Return recorded       │
│                         │                   response            │
│                         ▼                                       │
│                    Hash of:            No API call              │
│                    - endpoint          Deterministic            │
│                    - body              Fast                      │
│                    - headers                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
VCR Implementation
TypeScript
// services/vcr.ts
const VCR_MODE = process.env.VCR_MODE || 'replay'  // 'record' | 'replay' | 'off'

const recordings = new Map<string, Recording>()

export async function withVCR<T>(
  cassetteName: string,
  fn: () => Promise<T>
): Promise<T> {
  const fixturePath = `tests/fixtures/${cassetteName}.json`

  // Load existing recordings
  if (VCR_MODE !== 'record' && existsSync(fixturePath)) {
    loadRecordings(fixturePath)
  }

  // Intercept fetch
  const originalFetch = global.fetch
  global.fetch = vcrFetch

  try {
    const result = await fn()

    // Save new recordings
    if (VCR_MODE === 'record') {
      saveRecordings(fixturePath)
    }

    return result
  } finally {
    global.fetch = originalFetch
  }
}

async function vcrFetch(input: RequestInfo, init?: RequestInit): Promise<Response> {
  const key = hashRequest(input, init)

  // Replay mode: return recorded
  if (VCR_MODE === 'replay') {
    const recording = recordings.get(key)
    if (!recording) {
      throw new Error(`No recording for: ${key}`)
    }
    return new Response(recording.body, {
      status: recording.status,
      headers: recording.headers,
    })
  }

  // Record mode: make real call, save response
  const response = await originalFetch(input, init)
  const body = await response.clone().text()

  recordings.set(key, {
    request: key,
    status: response.status,
    headers: Object.fromEntries(response.headers),
    body,
    timestamp: Date.now(),
  })

  return response
}
Using VCR in Tests
TypeScript
// query.test.ts
import { withVCR } from '../services/vcr.js'

describe('query', () => {
  test('processes tool use correctly', async () => {
    await withVCR('tool-use-flow', async () => {
      // This will replay recorded API responses
      const result = await query({
        messages: [createUserMessage('Run ls')],
        systemPrompt: getSystemPrompt(),
      })

      expect(result.messages).toHaveLength(3)  // user, assistant(tool), result
    })
  })

  test('handles rate limits', async () => {
    await withVCR('rate-limit-retry', async () => {
      // Recorded once with real 429, replays deterministically
      const result = await query({ messages: [testMessage] })

      expect(result.retryCount).toBeGreaterThan(0)
    })
  })
})
VCR Fixtures
JSON
// tests/fixtures/tool-use-flow.json
{
  "recordings": [
    {
      "request": "POST https://api.anthropic.com/v1/messages\n{\"model\":\"claude-opus-4\"...}",
      "response": {
        "status": 200,
        "headers": { "content-type": "application/json" },
        "body": "{\"id\":\"msg_123\",\"content\":[{\"type\":\"tool_use\",\"name\":\"BashTool\"...}]}"
      },
      "timestamp": "2024-01-15T10:30:00Z"
    },
    {
      "request": "POST https://api.anthropic.com/v1/messages\n{\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"tool_result\"...}]}",
      "response": {
        "status": 200,
        "headers": { "content-type": "application/json" },
        "body": "{\"id\":\"msg_124\",\"content\":[{\"type\":\"text\",\"text\":\"I found...\"}]}"
      }
    }
  ],
  "meta": {
    "created_at": "2024-01-15T10:30:00Z",
    "version": "1.0"
  }
}

Mock System

Service Mocks
TypeScript
// tests/mocks/services.ts
export function mockAnthropicAPI(options: MockOptions = {}): MockAPI {
  return {
    streamMessages: jest.fn(async function* () {
      if (options.error) {
        throw options.error
      }

      // Yield mock events
      yield { type: 'message_start', message: { id: 'mock-123' } }
      yield { type: 'content_block_start', content_block: { type: 'text' } }
      yield { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Mock ' } }
      yield { type: 'content_block_delta', delta: { type: 'text_delta', text: 'response' } }
      yield { type: 'content_block_stop' }
      yield { type: 'message_stop' }
    }),

    getUsage: jest.fn(() => ({
      input_tokens: 100,
      output_tokens: 50,
    })),
  }
}

export function mockGrowthBook(features: Record<string, boolean> = {}): MockGrowthBook {
  return {
    isOn: (key: string) => features[key] ?? false,
    getFeatureValue: (key: string, defaultValue: any) =>
      features[key] ?? defaultValue,
  }
}
Tool Mocks
TypeScript
// tests/mocks/tools.ts
export function mockBashTool(): MockTool {
  return {
    name: 'Bash',
    execute: jest.fn(async (input) => {
      const { command } = input as BashInput

      // Mock common commands
      if (command.includes('git status')) {
        return {
          content: 'On branch main\nnothing to commit',
          is_error: false,
        }
      }

      if (command.includes('ls')) {
        return {
          content: 'file1.ts\nfile2.ts\nnode_modules/',
          is_error: false,
        }
      }

      // Default
      return {
        content: `Mock output for: ${command}`,
        is_error: false,
      }
    }),
  }
}
Context Mocks
TypeScript
// tests/mocks/context.ts
export function createMockToolContext(
  overrides: Partial<ToolUseContext> = {}
): ToolUseContext {
  const mockSetAppState = jest.fn((updater) => {
    // Simple mock that returns updated state
    return updater(mockAppState)
  })

  return {
    getAppState: jest.fn(() => mockAppState),
    setAppState: mockSetAppState,
    abortController: new AbortController(),
    readFileState: new Map(),
    options: {
      commands: [],
      debug: false,
      mainLoopModel: 'claude-sonnet-4-6',
      tools: [],
      verbose: false,
      thinkingConfig: { type: 'disabled' },
      mcpClients: [],
      mcpResources: {},
      isNonInteractiveSession: false,
      agentDefinitions: { agents: [], errors: [] },
    },
    ...overrides,
  }
}

Test Patterns

Unit Test Pattern
TypeScript
// utils/__tests__/stringUtils.test.ts
import { describe, test, expect } from 'bun:test'
import { plural, truncate, stripAnsi } from '../stringUtils.js'

describe('stringUtils', () => {
  describe('plural', () => {
    test('returns singular for count of 1', () => {
      expect(plural(1, 'file', 'files')).toBe('file')
    })

    test('returns plural for count > 1', () => {
      expect(plural(5, 'file', 'files')).toBe('files')
    })

    test('returns plural for count of 0', () => {
      expect(plural(0, 'file', 'files')).toBe('files')
    })
  })

  describe('truncate', () => {
    test('does not truncate short strings', () => {
      expect(truncate('hello', 10)).toBe('hello')
    })

    test('truncates long strings with ellipsis', () => {
      expect(truncate('hello world', 8)).toBe('hello...')
    })
  })
})
Integration Test Pattern
TypeScript
// tools/__tests__/BashTool.test.ts
import { describe, test, expect, beforeEach } from 'bun:test'
import { BashTool } from '../BashTool/BashTool.js'
import { createMockToolContext } from '../../tests/mocks/context.js'

describe('BashTool', () => {
  let context: ToolUseContext

  beforeEach(() => {
    context = createMockToolContext()
  })

  test('executes simple command', async () => {
    const result = await BashTool.execute(
      { command: 'echo hello' },
      context
    )

    expect(result.content).toBe('hello\n')
    expect(result.is_error).toBe(false)
  })

  test('captures exit code on failure', async () => {
    const result = await BashTool.execute(
      { command: 'exit 1' },
      context
    )

    expect(result.is_error).toBe(true)
    expect(result.content).toContain('exit code 1')
  })

  test('respects timeout', async () => {
    const result = await BashTool.execute(
      { command: 'sleep 10', timeout: 100 },
      context
    )

    expect(result.is_error).toBe(true)
    expect(result.content).toContain('timeout')
  })
})
Query Engine Test with VCR
TypeScript
// QueryEngine.test.ts
import { describe, test, expect } from 'bun:test'
import { QueryEngine } from '../QueryEngine.js'
import { withVCR } from '../services/vcr.js'

describe('QueryEngine', () => {
  test('completes single-turn conversation', async () => {
    await withVCR('single-turn', async () => {
      const engine = new QueryEngine({
        cwd: '/test',
        tools: [BashTool, FileReadTool],
        commands: [],
        mcpClients: [],
        agents: [],
        canUseTool: () => ({ allowed: true }),
        getAppState: () => mockAppState,
        setAppState: mockSetAppState,
      })

      const events: QueryEvent[] = []

      for await (const event of engine.submitMessage(
        createUserMessage('Say hello')
      )) {
        events.push(event)
      }

      // Should get text response
      const textEvents = events.filter(e => e.type === 'text')
      expect(textEvents.length).toBeGreaterThan(0)
      expect(textEvents[0].text).toContain('hello')
    })
  })

  test('handles multi-turn with tool use', async () => {
    await withVCR('multi-turn-tool', async () => {
      const engine = new QueryEngine({ ... })

      const events: QueryEvent[] = []
      for await (const event of engine.submitMessage(
        createUserMessage('Run ls')
      )) {
        events.push(event)
      }

      // Should see: user -> assistant(tool_use) -> tool_result -> assistant
      const types = events.map(e => e.type)
      expect(types).toContain('tool_use')
      expect(types).toContain('tool_result')
      expect(types).toContain('assistant')
    })
  })
})

E2E Testing

CLI E2E Tests
TypeScript
// tests/e2e/cli.test.ts
import { describe, test, expect, beforeAll, afterAll } from 'bun:test'
import { spawn } from 'child_process'

describe('CLI E2E', () => {
  let claudeProcess: ChildProcess

  beforeAll(async () => {
    // Start Claude Code in test mode
    claudeProcess = spawn('bun', ['run', 'src/entrypoints/cli.tsx', '--test-mode'], {
      cwd: TEST_REPO,
      env: { ...process.env, CLAUDE_CODE_TEST: '1' },
    })

    await waitForReady(claudeProcess)
  })

  afterAll(() => {
    claudeProcess.kill()
  })

  test('initializes and shows prompt', async () => {
    const output = await sendInput(claudeProcess, '/help\n')

    expect(output).toContain('Available Commands')
    expect(output).toContain('/commit')
  })

  test('executes bash tool', async () => {
    const output = await sendInput(claudeProcess, 'Run ls\n')

    expect(output).toContain('$ ls')
    expect(output).toContain('package.json')
  })
})

Snapshot Testing

TypeScript
// Message rendering tests
import { describe, test, expect } from 'bun:test'

describe('Message rendering', () => {
  test('renders assistant message', () => {
    const message: AssistantMessage = {
      type: 'assistant',
      content: [{ type: 'text', text: 'Hello' }],
      uuid: 'test-123',
    }

    const output = renderToString(<AssistantMessageView message={message} />)

    expect(output).toMatchSnapshot()
  })

  test('renders tool use', () => {
    const toolUse: ToolUseBlock = {
      type: 'tool_use',
      name: 'Bash',
      id: 'tool-123',
      input: { command: 'ls' },
    }

    const output = renderToString(<ToolUseView toolUse={toolUse} />)

    expect(output).toMatchSnapshot()
  })
})

Test Data Builders

TypeScript
// tests/builders.ts
export function buildUserMessage(overrides: Partial<UserMessage> = {}): UserMessage {
  return {
    type: 'user',
    content: [{ type: 'text', text: 'Hello' }],
    uuid: generateUUID(),
    created_at: Date.now(),
    ...overrides,
  }
}

export function buildAssistantMessage(
  overrides: Partial<AssistantMessage> = {}
): AssistantMessage {
  return {
    type: 'assistant',
    content: [{ type: 'text', text: 'Hello back' }],
    uuid: generateUUID(),
    created_at: Date.now(),
    ...overrides,
  }
}

export function buildToolUse(
  name: string = 'Bash',
  input: Record<string, unknown> = { command: 'ls' }
): ToolUseBlock {
  return {
    type: 'tool_use',
    name,
    id: `tool-${generateUUID()}`,
    input,
  }
}

export function buildAppState(overrides: Partial<AppState> = {}): AppState {
  return {
    ...getDefaultAppState(),
    messages: [],
    tasks: {},
    ...overrides,
  }
}

Running Tests

Bash
# All tests
bun test

# Specific file
bun test tools/BashTool.test.ts

# With coverage
bun test --coverage

# Watch mode
bun test --watch

# VCR record mode (updates fixtures)
VCR_MODE=record bun test

# VCR replay mode (default)
VCR_MODE=replay bun test

Test Isolation

Per-Test State Reset
TypeScript
// tests/setup.ts
import { beforeEach } from 'bun:test'
import { resetGlobalState } from './stateManager.js'

beforeEach(() => {
  // Reset all global state between tests
  resetGlobalState()

  // Clear mocks
  jest.clearAllMocks()

  // Reset VCR recordings
  clearVCRRecordings()
})
Test Database/Files
TypeScript
// Use temporary directories
import { mkdtemp, rm } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'

let tempDir: string

beforeEach(async () => {
  tempDir = await mkdtemp(join(tmpdir(), 'claude-test-'))
})

afterEach(async () => {
  await rm(tempDir, { recursive: true, force: true })
})

Performance Testing

TypeScript
// tests/performance/startup.test.ts
import { describe, test, expect } from 'bun:test'
import { performance } from 'perf_hooks'

describe('Startup Performance', () => {
  test('initializes in under 2 seconds', async () => {
    const start = performance.now()

    await import('../../src/main.js')

    const duration = performance.now() - start
    expect(duration).toBeLessThan(2000)
  })
})

// Profile checkpoint testing
import { profileReport, profileCheckpoint } from '../../utils/startupProfiler.js'

describe('Profile Checkpoints', () => {
  test('all checkpoints complete', () => {
    profileCheckpoint('test_start')
    // ... do work ...
    profileCheckpoint('test_end')

    const report = profileReport()
    expect(report).toContain('test_start')
    expect(report).toContain('test_end')
  })
})

Key Testing Principles

  1. VCR for Determinism: API responses recorded once, replayed forever
  2. Mocks for Speed: External dependencies mocked for unit tests
  3. E2E for Confidence: Full CLI tests verify end-to-end flow
  4. Snapshots for UI: Visual regression testing with Ink output
  5. Builders for Readability: Factory functions create test data
  6. Isolation for Reliability: Each test starts fresh

Debugging Failed Tests

Bash
# Verbose output
bun test --verbose

# Single test
bun test -t "test name"

# With logging
DEBUG=1 bun test

# Inspect VCR recordings
cat tests/fixtures/failed-test.json | jq '.recordings[0]'