Memory and Context Management Architecture

Overview

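The context management system prevents token overflow while preserving critical information. It splits the conversation into three sections (HISTORY, IN_LOOP, CURRENT_MESSAGE) and trims each with one of three strategies, chosen by comparing precomputed token costs against the context limit.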


The Engine: Context Trimming

File: src/server/repositories/engine/engine.ts:201

TypeScript
export const engine = async (
	input: TEngineInput,
	toolIterationInfo?: TToolIterationInfo,
): Promise<TEngineOutput> => {
	const { messages, response, inLoop, contextLimit } = input;

	// Build current message (assistant response + tool results)
	const currentMessage = AssistantMessage({
		content: [
			{
				type: "TEXT",
				toolCalls: response.toolCalls,
				text: response.content,
				tokens: response.tokens.output + response.tokens.reasoning,
			},
			{ type: "TOOL_RESULT", toolResults: response.toolResults },
		],
		reasoning: response.reasoning,
	});

	// Calculate token counts for each section
	const chatHistorySoFarTokens = getTokens(messages);
	const inLoopTokens = calculateMessagesTokens(inLoop);
	const inLoopSummaryIfPossibleTokens = calculateMessagesTokens(inLoop, true);
	const currentMessageTokens = calculateMessagesTokens([currentMessage]);
	const currentMessageSummaryIfPossibleTokens = calculateMessagesTokens(
		[currentMessage],
		true,
	);

	// Default summary size
	const DEFAULT_SUMMARY_SIZE = 1024;

	// Build token cost table for layout selection
	const tokenTable: TTokenTable = {
		HISTORY: {
			FULL: chatHistorySoFarTokens,
			TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: Infinity, // Never use for history
			SUMMARY: DEFAULT_SUMMARY_SIZE,
		},
		IN_LOOP: {
			FULL: inLoopTokens,
			TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: inLoopSummaryIfPossibleTokens,
			SUMMARY: DEFAULT_SUMMARY_SIZE,
		},
		CURRENT_MESSAGE: {
			FULL: currentMessageTokens,
			TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: currentMessageSummaryIfPossibleTokens,
			SUMMARY: DEFAULT_SUMMARY_SIZE,
		},
	};

	// Choose the first preferred layout (least trimming) whose cost is below contextLimit
	const bestLayout = chooseOptimalLayout(contextLimit, tokenTable);
	if (!bestLayout) throw new Error("LAYOUT_NOT_FOUND");

	if (toolIterationInfo) {
		toolIterationInfo.layout = bestLayout;
	}

	// Execute handlers for each section
	const history_ = await HANDLER_MAP[bestLayout.HISTORY]({
		messages,
		config: input,
		section: "HISTORY",
	});

	const inLoop_ = await HANDLER_MAP[bestLayout.IN_LOOP]({
		messages: inLoop,
		config: input,
		section: "IN_LOOP",
	});

	const currentMessage_ = await HANDLER_MAP[bestLayout.CURRENT_MESSAGE]({
		messages: [currentMessage],
		config: input,
		section: "CURRENT_MESSAGE",
	});

	return {
		inLoop: [...inLoop_.trimmedMessages, ...currentMessage_.trimmedMessages],
		summary: history_?.summary,
		shouldDumpSummaryInDB:
			chatHistorySoFarTokens +
				inLoopSummaryIfPossibleTokens +
				currentMessageSummaryIfPossibleTokens >
			contextLimit,
	};
};
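
The `shouldDumpSummaryInDB` expression in the return value can be read in isolation. The sketch below wraps it in a hypothetical helper of the same name and feeds it made-up token counts; one plausible reading, which the source does not state, is that the summary is persisted only when the full history would not fit even after the loop and the current message are reduced to tool-provided summaries.

```typescript
// Hypothetical helper: mirrors the boolean expression returned by engine().
const shouldDumpSummaryInDB = (
	historyFullTokens: number,
	inLoopToolSummaryTokens: number,
	currentMessageToolSummaryTokens: number,
	contextLimit: number,
): boolean =>
	historyFullTokens +
		inLoopToolSummaryTokens +
		currentMessageToolSummaryTokens >
	contextLimit;

// Made-up numbers: 6000 + 2000 + 1500 = 9500 exceeds the 8000 limit.
const overLimit = shouldDumpSummaryInDB(6000, 2000, 1500, 8000);

// Made-up numbers: 3000 + 2000 + 1500 = 6500 stays under the 8000 limit.
const underLimit = shouldDumpSummaryInDB(3000, 2000, 1500, 8000);
```

Note that the comparison here is strict `>`, while layout selection uses strict `<`, so a total exactly equal to `contextLimit` neither fits a layout nor triggers persistence.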

Layout Selection Algorithm

File: src/server/repositories/engine/engine.ts:38

TypeScript
// Calculate total cost of a layout
const getCost = ({
	tokenTable,
	layout,
}: {
	tokenTable: TTokenTable;
	layout: TLayout;
}): number =>
	tokenTable.HISTORY[layout.HISTORY] +
	tokenTable.IN_LOOP[layout.IN_LOOP] +
	tokenTable.CURRENT_MESSAGE[layout.CURRENT_MESSAGE];

// Find the first layout, in priority order, whose cost is strictly below the context limit
const chooseOptimalLayout = (
	contextLimit: number,
	tokenTable: TTokenTable,
	layouts: readonly TLayout[] = PREFERRED_TRIMMING_LAYOUTS,
): TLayout | null =>
	layouts.find((layout) => getCost({ tokenTable, layout }) < contextLimit) ??
	null;

Layout Options:

TypeScript
type TLayoutStrategy = "FULL" | "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" | "SUMMARY";

type TLayout = {
	HISTORY: TLayoutStrategy;
	IN_LOOP: TLayoutStrategy;
	CURRENT_MESSAGE: TLayoutStrategy;
};

// Preferred layouts (in priority order)
const PREFERRED_TRIMMING_LAYOUTS: readonly TLayout[] = [
	{ HISTORY: "FULL", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" }, // No trimming
	{ HISTORY: "SUMMARY", IN_LOOP: "FULL", CURRENT_MESSAGE: "FULL" }, // Summarize history only
	{
		HISTORY: "SUMMARY",
		IN_LOOP: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE",
		CURRENT_MESSAGE: "FULL",
	},
	{
		HISTORY: "SUMMARY",
		IN_LOOP: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE",
		CURRENT_MESSAGE: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE",
	},
	{
		HISTORY: "SUMMARY",
		IN_LOOP: "SUMMARY",
		CURRENT_MESSAGE: "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE",
	},
	{ HISTORY: "SUMMARY", IN_LOOP: "SUMMARY", CURRENT_MESSAGE: "SUMMARY" }, // Maximum trimming
];

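How It Works:

  1. Compute the token cost of each section under each strategy (the token table)
  2. Start with the most preferred layout (FULL everywhere, i.e. no trimming), which is typically the most expensive
  3. If its total cost is not strictly below contextLimit, try the next layout in priority order
  4. Continue until a layout fits
  5. If none fits, chooseOptimalLayout returns null and the engine throws a LAYOUT_NOT_FOUND error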
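
The steps above can be traced with concrete numbers. The sketch below re-declares the types and functions from this section so that it runs on its own; `TSection`, the token counts, and the 8,000-token limit are assumptions made for illustration, not values from the codebase.

```typescript
type TLayoutStrategy = "FULL" | "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" | "SUMMARY";
type TSection = "HISTORY" | "IN_LOOP" | "CURRENT_MESSAGE"; // hypothetical alias
type TLayout = Record<TSection, TLayoutStrategy>;
type TTokenTable = Record<TSection, Record<TLayoutStrategy, number>>;

const S = "SUMMARY";
const T = "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE";
const F = "FULL";

// Same six layouts, in the same priority order, as in the section above.
const PREFERRED_TRIMMING_LAYOUTS: readonly TLayout[] = [
	{ HISTORY: F, IN_LOOP: F, CURRENT_MESSAGE: F },
	{ HISTORY: S, IN_LOOP: F, CURRENT_MESSAGE: F },
	{ HISTORY: S, IN_LOOP: T, CURRENT_MESSAGE: F },
	{ HISTORY: S, IN_LOOP: T, CURRENT_MESSAGE: T },
	{ HISTORY: S, IN_LOOP: S, CURRENT_MESSAGE: T },
	{ HISTORY: S, IN_LOOP: S, CURRENT_MESSAGE: S },
];

const getCost = ({
	tokenTable,
	layout,
}: {
	tokenTable: TTokenTable;
	layout: TLayout;
}): number =>
	tokenTable.HISTORY[layout.HISTORY] +
	tokenTable.IN_LOOP[layout.IN_LOOP] +
	tokenTable.CURRENT_MESSAGE[layout.CURRENT_MESSAGE];

const chooseOptimalLayout = (
	contextLimit: number,
	tokenTable: TTokenTable,
	layouts: readonly TLayout[] = PREFERRED_TRIMMING_LAYOUTS,
): TLayout | null =>
	layouts.find((layout) => getCost({ tokenTable, layout }) < contextLimit) ??
	null;

// Made-up token counts for one engine call.
const tokenTable: TTokenTable = {
	HISTORY: { FULL: 6000, TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: Infinity, SUMMARY: 1024 },
	IN_LOOP: { FULL: 5000, TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: 2000, SUMMARY: 1024 },
	CURRENT_MESSAGE: { FULL: 3000, TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: 1500, SUMMARY: 1024 },
};

// Layout costs in order: 14000, 9024, 6024, 4524, 3548, 3072.
// The first cost below 8000 is 6024, so the third layout is chosen.
const chosen = chooseOptimalLayout(8000, tokenTable);

// Even maximum trimming costs 3072, which is not strictly below 3072.
const noFit = chooseOptimalLayout(3072, tokenTable);
```

Because `find` stops at the first match, the result is the least-trimmed layout that fits, not the one with the lowest cost; a cheaper layout further down the list is never considered once an earlier one fits.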

Token Calculation

File: src/server/repositories/engine/engine.ts:148

TypeScript
// Get tokens from message array
const getTokens = (messages: TPromptMessage[]) =>
	messages.map((msg) => msg.tokens).reduce((sum, tokens) => tokens + sum, 0);

// Calculate tokens from V2 content (with metadata)
const getTokensFromContentV2 = (
	contentV2: TMessageContentV2WithMetadata,
	summaryIfPossible?: boolean,
): number => {
	return contentV2.reduce((sum, content) => {
		switch (content.type) {
			case "TEXT": {
				let toolCallsCount = 0;
				if (content.toolCalls) {
					for (const toolCall of content.toolCalls) {
						toolCallsCount += toolCall.tokens ?? 0;
					}
				}
				return sum + content.tokens + toolCallsCount;
			}
			case "PROGRESS":
				return sum; // No tokens (UI only)
			case "TOOL_RESULT":
				// Add to the running sum so tokens from earlier parts are not discarded
				return sum + getToolResultsToken(content.toolResults, summaryIfPossible);
		}
	}, 0);
};

// Calculate tokens from tool results
const getToolResultsToken = (
	results: TToolResultTypeWithMetadata[],
	summaryIfPossible?: boolean,
) =>
	results
		.map(
			(result) =>
				summaryIfPossible &&
				result.summaryTokens &&
				!result.shouldIncludeInHistory
					? result.summaryTokens // Use summary if available
					: result.tokens, // Use full content
		)
		.reduce((sum, tokens) => tokens + sum, 0);
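
To see what the `summaryIfPossible` flag changes, the sketch below runs the same selection rule over three made-up tool results. `TToolResultLike` is a hypothetical stand-in that keeps only the fields the calculation reads; the real `TToolResultTypeWithMetadata` presumably carries more.

```typescript
// Hypothetical stand-in for TToolResultTypeWithMetadata.
type TToolResultLike = {
	tokens: number;
	summaryTokens?: number;
	shouldIncludeInHistory?: boolean;
};

const getToolResultsToken = (
	results: TToolResultLike[],
	summaryIfPossible?: boolean,
): number =>
	results
		.map((result) =>
			summaryIfPossible &&
			result.summaryTokens &&
			!result.shouldIncludeInHistory
				? result.summaryTokens // summary exists and the result is not pinned
				: result.tokens, // otherwise count the full content
		)
		.reduce((sum, tokens) => sum + tokens, 0);

const results: TToolResultLike[] = [
	{ tokens: 4000, summaryTokens: 300 }, // can be summarized
	{ tokens: 2500, summaryTokens: 200, shouldIncludeInHistory: true }, // pinned to full content
	{ tokens: 800 }, // no summary available
];

const fullCost = getToolResultsToken(results); // 4000 + 2500 + 800 = 7300
const summaryCost = getToolResultsToken(results, true); // 300 + 2500 + 800 = 3600
```

One consequence of the truthiness check is that a result whose `summaryTokens` is `0` is counted at its full size, since `0` is falsy.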

Handler Implementations

1. Full Handler (No Trimming)

File: src/server/repositories/engine/engine.ts:133

TypeScript
const getFull: TEngineHandler = async (args) => {
	return { trimmedMessages: args.messages };
};

Returns messages unchanged.

2. Tool-Provided Summary Handler

File: src/server/repositories/engine/engine.ts:94

TypeScript
const getToolProvidedSummaryIfPossible: TEngineHandler = async (args) => {
	return {
		trimmedMessages: args.messages.map((msg) => {
			return {
				...msg,
				content: msg.content.map((part) => {
					switch (part.type) {
						case "TEXT":
							return part;
						case "PROGRESS":
							return part;
						case "TOOL_RESULT":
							return {
								...part,
								toolResults: part.toolResults.map((result) => {
									// @ts-expect-error Types need fixing
									if (result.summary) {
										return {
											...result,
											// Use summary instead of full content
											// @ts-expect-error Types need fixing
											content: result.summary,
										};
									}
									return result;
								}),
							};
					}
				}),
			};
		}),
	};
};

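Replaces each tool result's content with its tool-provided summary when one exists; results without a summary, and all TEXT and PROGRESS parts, pass through unchanged. Note that the token estimate in getToolResultsToken additionally requires shouldIncludeInHistory to be false before counting a summary, a check this handler as shown does not repeat.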
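
The two `@ts-expect-error` comments in the handler suggest that `summary` is missing from the tool result type. One possible way to remove them is a type guard, sketched below. `TToolResult`, `TToolResultWithSummary`, `hasSummary`, and `applyToolSummaries` are hypothetical names for illustration, not types or functions from the codebase.

```typescript
// Hypothetical minimal shapes; the real result types are not shown in this document.
type TToolResult = { toolName: string; content: string };
type TToolResultWithSummary = TToolResult & { summary: string };

// Type guard: true only when a non-empty string summary is present.
const hasSummary = (result: TToolResult): result is TToolResultWithSummary => {
	const summary = (result as Partial<TToolResultWithSummary>).summary;
	return typeof summary === "string" && summary.length > 0;
};

// Same substitution the handler performs, without suppressing type errors.
const applyToolSummaries = (results: TToolResult[]): TToolResult[] =>
	results.map((result) =>
		hasSummary(result) ? { ...result, content: result.summary } : result,
	);

const summarized: TToolResultWithSummary = {
	toolName: "search",
	content: "a long raw payload returned by the tool",
	summary: "3 results about X",
};
const plain: TToolResult = { toolName: "calculator", content: "42" };

const trimmed = applyToolSummaries([summarized, plain]);
```

Here `trimmed[0].content` becomes the summary text, while `plain` is returned as the same object because it has no summary.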

3. LLM Summary Handler

File: src/server/repositories/engine/engine.ts:46

TypeScript
const getSummary: TEngineHandler = async (args) => {
	const { response, agentName } = args.config;

	const currentMessage = AssistantMessage({
		content: [
			{
				type: "TEXT",
				toolCalls: response.toolCalls ? response.toolCalls : undefined,
				text: response.content,
				tokens: response.tokens.output + response.tokens.reasoning,
			},
			{ type: "TOOL_RESULT", toolResults: response.toolResults },
		],
		reasoning: response.reasoning,
	});

	// For deep research agents, exclude current message from summarization
	const shouldExcludeCurrentMessage = agentName
		? isDeepResearchSupervisor(agentName) || isResearcherAgent(agentName)
		: false;

	const messagesToSummarize = shouldExcludeCurrentMessage
		? args.messages
		: [...args.messages, currentMessage];

	// Call summarization service
	const response_ = await chatHistorySummariser({
		branch: messagesToSummarize,
		openAiTools: args.config.toolDefinitions ?? [],
		model: args.config.summarizationModel,
	});

	return {
		summary: response_.summary,
		trimmedMessages: [
			SystemMessage({
				content: prompts.getUserChatHistorySummaryWithPrompt(response_.summary),
				tokens: (await tokenizer.encode(response_.summary)).length,
			}),
		],
	};
};

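Calls an LLM through chatHistorySummariser to summarize the messages (plus the current message, except for deep research agents), then returns a single system message that carries the summary.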


Handler Map

File: src/server/repositories/engine/engine.ts:142

TypeScript
const HANDLER_MAP = {
	SUMMARY: getSummary,
	TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: getToolProvidedSummaryIfPossible,
	FULL: getFull,
} satisfies TEngineHandlerMap;
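
The map lets the engine dispatch on the strategy name chosen for each section. The sketch below shows the same pattern under simplifying assumptions: the handlers are synchronous, `TMessage` and `THandler` are hypothetical shapes, and the handler bodies are placeholders rather than the real implementations.

```typescript
type TLayoutStrategy = "FULL" | "TOOL_PROVIDED_SUMMARY_IF_POSSIBLE" | "SUMMARY";
type TMessage = { role: string; text: string }; // hypothetical message shape
type THandler = (messages: TMessage[]) => { trimmedMessages: TMessage[] };

// Typing the map as Record<TLayoutStrategy, THandler> makes a missing
// strategy a compile-time error.
const HANDLERS: Record<TLayoutStrategy, THandler> = {
	FULL: (messages) => ({ trimmedMessages: messages }),
	// Placeholder: the real handler swaps tool results for their summaries.
	TOOL_PROVIDED_SUMMARY_IF_POSSIBLE: (messages) => ({
		trimmedMessages: messages,
	}),
	// Placeholder: the real handler calls an LLM summarizer.
	SUMMARY: (messages) => ({
		trimmedMessages: [
			{ role: "system", text: `Summary of ${messages.length} messages` },
		],
	}),
};

// Dispatch by strategy name, as the engine does once per section.
const runHandler = (strategy: TLayoutStrategy, messages: TMessage[]) =>
	HANDLERS[strategy](messages);

const sampleMessages: TMessage[] = [
	{ role: "user", text: "hello" },
	{ role: "assistant", text: "hi" },
];
const summarizedOutput = runHandler("SUMMARY", sampleMessages);
```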

Research Memory System

File: src/server/endpoints/unified/features/deepResearch/memory/index.ts

For deep research, ResearchMemoryManager maintains research context across steps:

TypeScript
export const ResearchMemoryManager = {
	// Core operations
	getMemory: () => getMemory(),
	updateMemory: (memory: TResearchMemory) => updateMemory(memory),
	initializeMemory: () => initializeMemory(),
	cleanupMemory: () => cleanupMemory(),

	// Confidence tracking per step
	getStepConfidence: (stepId: string) => getStepConfidence(stepId),
	updateStepConfidence: (
		stepId: string,
		confidence: number,
		iteration: number,
	) =>
		updateStepConfidence(stepId, {
			confidence,
			timestamp: new Date().toISOString(),
			iteration,
		}),
	calculateConfidence: (learnings: TLearning[], iteration: number) =>
		calculateConfidence(learnings, iteration),

	// Context analysis
	scoreContextElementsForStep: (
		elements: TScoredContextElement[],
		step: TDeepResearchPlanStep,
	) => {
		const transformedElements = elements.map((e) => ({
			id: e.source || generateId("contextElement"),
			content: e.content,
			type: e.type,
			score: e.relevance,
			metadata: { source: e.source },
		}));
		return scoreContextElementsForStep(transformedElements, step);
	},

	extractInsightsAndGaps: (
		step: TDeepResearchPlanStep,
		learnings: TLearning[],
	) => extractInsightsAndGaps(step, learnings),

	deduplicateInsights: (memory: TResearchMemory) => deduplicateInsights(memory),
};

Memory Structure

TypeScript
type TResearchMemory = {
	insights: TInsight[]; // Key findings with confidence
	knowledgeGaps: TKnowledgeGap[]; // Unanswered questions
	contradictions: TContradiction[]; // Conflicting information
	stepConfidence: Map<string, TStepConfidence>; // Per-step confidence
	metadata: {
		createdAt: string;
		updatedAt: string;
		totalElements: number;
	};
};

type TInsight = {
	id: string;
	content: string;
	confidence: number; // 0.0 - 1.0
	source: string; // stepId or URL
	type: "insight";
	timestamp: string;
};

type TKnowledgeGap = {
	id: string;
	question: string;
	relevance: number; // 0.0 - 1.0
	relatedInsights: string[];
	timestamp: string;
};

type TContradiction = {
	id: string;
	insightA: string;
	insightB: string;
	description: string;
	severity: "low" | "medium" | "high";
};
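
One practical detail of this structure is that `stepConfidence` is a `Map`, which `JSON.stringify` serializes as an empty object. The source does not say whether the memory is ever stored or transmitted as JSON; if it is, the map needs an explicit conversion. A minimal sketch, in which the `TStepConfidence` shape is inferred from the `updateStepConfidence` call above and `serializeStepConfidence` / `deserializeStepConfidence` are hypothetical helpers:

```typescript
// Shape inferred from the updateStepConfidence call; an assumption.
type TStepConfidence = {
	confidence: number;
	timestamp: string;
	iteration: number;
};

const stepConfidence = new Map<string, TStepConfidence>([
	[
		"step-1",
		{ confidence: 0.8, timestamp: "2024-01-01T00:00:00.000Z", iteration: 2 },
	],
]);

// A Map has no enumerable own properties, so its entries are lost here.
const naive = JSON.stringify({ stepConfidence });

// Hypothetical helpers: store the map as an array of [key, value] pairs.
const serializeStepConfidence = (
	map: Map<string, TStepConfidence>,
): string => JSON.stringify(Array.from(map.entries()));

const deserializeStepConfidence = (
	json: string,
): Map<string, TStepConfidence> =>
	new Map(JSON.parse(json) as [string, TStepConfidence][]);

const restored = deserializeStepConfidence(
	serializeStepConfidence(stepConfidence),
);
```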

Context Scoring

TypeScript
// Score context elements by relevance to current step
export const scoreContextElementsForStep = async (
	elements: TContextElement[],
	step: TDeepResearchPlanStep,
): Promise<TScoredElement[]> => {
	// Use LLM to score each element
	const scoredElements = await Promise.all(
		elements.map(async (element) => {
			const relevanceScore = await calculateRelevance(
				element.content,
				step.task,
			);
			return {
				...element,
				score: relevanceScore,
			};
		}),
	);

	// Sort by score (highest first)
	return scoredElements.sort((a, b) => b.score - a.score);
};

Why Context Scoring:

  • Deep research generates many insights
  • Not all of them are relevant to the current step
  • Scoring ranks the most relevant context first
  • Prevents context overflow
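
The function above only ranks elements; how the ranked list is then cut to size is not shown in the source. One possible approach is a greedy selection under a token budget, sketched below. The `tokens` field, `selectWithinBudget`, and the budget value are all assumptions for illustration.

```typescript
// Hypothetical shape: a scored element plus a precomputed token count.
type TScoredElementWithTokens = {
	id: string;
	score: number;
	tokens: number;
};

// Greedy selection: visit elements from highest to lowest score and keep
// each one that still fits in the remaining budget.
const selectWithinBudget = (
	elements: TScoredElementWithTokens[],
	tokenBudget: number,
): TScoredElementWithTokens[] => {
	const sorted = [...elements].sort((a, b) => b.score - a.score);
	const selected: TScoredElementWithTokens[] = [];
	let used = 0;
	for (const element of sorted) {
		if (used + element.tokens > tokenBudget) continue; // would overflow, skip it
		selected.push(element);
		used += element.tokens;
	}
	return selected;
};

const candidates: TScoredElementWithTokens[] = [
	{ id: "a", score: 0.9, tokens: 500 },
	{ id: "b", score: 0.4, tokens: 300 },
	{ id: "c", score: 0.7, tokens: 400 },
	{ id: "d", score: 0.2, tokens: 100 },
];

// With a 1000-token budget: "a" (500) and "c" (400) fit, "b" would bring the
// total to 1200 and is skipped, and "d" brings the total to exactly 1000.
const selectedIds = selectWithinBudget(candidates, 1000).map((e) => e.id);
```

Skipping instead of stopping at the first element that does not fit lets smaller, lower-scored elements fill the remaining space; stopping early would be the stricter alternative if ordering matters more than utilization.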

Integration with Orchestrator

TypeScript
// In toolOrchestrator.ts
const trimmedMessages = await engine(
	{
		messages: currentMessages,
		inLoop: state.inLoopTrimmedMessages,
		contextLimit: contextLimit,
		response: {
			tokens: streamUsage.tokens,
			toolCalls: toolCalls,
			toolResults: await formatToolResultsForLLM(toolResults),
			content: llmResponseContent,
			reasoning: reasoningContent,
		},
		agentName: this.agentName,
	},
	toolIterationInfo[toolIterationInfo.length - 1],
);

state.inLoopTrimmedMessages = trimmedMessages.inLoop;

// If history was summarized
if (trimmedMessages.summary) {
	const { messages: adjustedMessages } = await buildMessages(
		this.chatCtx,
		config,
		this.registry,
		this.agentName,
		trimmedMessages.summary,
	);
	currentMessages = adjustedMessages;

	if (trimmedMessages.shouldDumpSummaryInDB) {
		assistantMessageNode.isChatSummarizedSoFar = true;
		assistantMessageNode.summary = trimmedMessages.summary;
	}
}

Context Limit by Plan

Different plans get different context limits:

TypeScript
const getLimitsBasedOnUserPlan = (
	user: TUserDoc,
	modelConfig: TModelConfig,
	agentName: TAgentName,
) => {
	// Model max tokens (e.g., GPT-4o: 128k)
	const modelMaxTokens = modelConfig.maxContextTokens;

	// Plan-based allocation
	const contextLimit = (() => {
		switch (user.userPlan) {
			case "FREE":
				return Math.min(modelMaxTokens, 16000); // 16k limit
			case "PRO":
				return Math.min(modelMaxTokens, 32000); // 32k limit
			case "ULTRA":
			default:
				return modelMaxTokens; // Full model limit
		}
	})();

	// Tool call limits
	const toolCallsLimit = (() => {
		switch (user.userPlan) {
			case "FREE":
				return 8;
			case "PRO":
				return 15;
			case "ULTRA":
			default:
				return 25;
		}
	})();

	return {
		contextLimit,
		toolCallsLimit,
		shouldDoParallelToolCalls: user.userPlan !== "FREE",
	};
};
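
The context limit is the smaller of the plan cap and the model's own maximum, so a small model can make the plan cap irrelevant. The sketch below restates the rule as a lookup table with made-up model sizes; `TUserPlan`, `PLAN_CONTEXT_CAPS`, and `getContextLimit` are hypothetical names, and unlike the `switch` above this version has no `default` branch for unknown plans.

```typescript
type TUserPlan = "FREE" | "PRO" | "ULTRA"; // hypothetical alias for user.userPlan

// Plan caps from the section above; ULTRA has no cap beyond the model's own.
const PLAN_CONTEXT_CAPS: Record<TUserPlan, number> = {
	FREE: 16000,
	PRO: 32000,
	ULTRA: Infinity,
};

const getContextLimit = (plan: TUserPlan, modelMaxTokens: number): number =>
	Math.min(modelMaxTokens, PLAN_CONTEXT_CAPS[plan]);

// A 128k model: the plan cap is the binding constraint for FREE and PRO.
const freeOnLargeModel = getContextLimit("FREE", 128000); // 16000
const proOnLargeModel = getContextLimit("PRO", 128000); // 32000
const ultraOnLargeModel = getContextLimit("ULTRA", 128000); // 128000

// An 8192-token model: the model limit wins even on PRO.
const proOnSmallModel = getContextLimit("PRO", 8192); // 8192
```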

Summary

The context management system:

  1. Three Sections: HISTORY, IN_LOOP, CURRENT_MESSAGE
  2. Three Strategies: FULL, TOOL_PROVIDED_SUMMARY_IF_POSSIBLE, SUMMARY
  3. Optimal Layout: The first layout in priority order (least trimming) that fits under contextLimit
  4. LLM Summarization: Used when tool summaries are insufficient
  5. Research Memory: Insights, gaps, contradictions with scores
  6. Context Scoring: Prioritize relevant information
  7. Plan-Based Limits: Free (16k), Pro (32k), Ultra (full)

Key Principle: Preserve maximum useful information while staying within token limits. Summarize only when necessary, and use tool-provided summaries where available before falling back to an LLM summary.