
The Three Tiers of Memory

Overview

Effective agent memory systems operate across three distinct tiers, each serving different purposes and operating at different timescales.

Tier 1: Working Memory (Context Window)

  • Timescale: Current conversation/task
  • Size: Limited by model context window (4K-128K+ tokens)
  • Speed: Immediate access
  • Cost: Token cost per API call

Characteristics

  • Direct inclusion in prompts
  • Immediate availability to the model
  • No retrieval latency
  • Constrained by token limits

Use Cases

  • Current conversation history
  • Active task context
  • Immediate user preferences
  • Session-specific data
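Because working memory is bounded by the context window, the main implementation task is keeping the most recent context within a token budget. A minimal sketch, assuming a crude characters-per-token heuristic in place of a real tokenizer (the function names here are illustrative):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. A real system
    # would use the model's actual tokenizer.
    return max(1, len(text) // 4)

def build_working_memory(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # older messages no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Dropping oldest-first is the simplest policy; summarizing evicted messages (see Eviction below) preserves more signal at the same budget.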

Tier 2: Short-term Memory (Recent History)

  • Timescale: Recent sessions (days to weeks)
  • Size: Moderate (thousands of entries)
  • Speed: Fast retrieval (<100ms)
  • Cost: Database storage + retrieval compute

Characteristics

  • Recently accessed information
  • Frequently used patterns
  • User-specific recent context
  • Cached for quick access

Use Cases

  • Recent conversation summaries
  • User interaction patterns
  • Temporary preferences
  • Session bridging context

Tier 3: Long-term Memory (Persistent Knowledge)

  • Timescale: Extended periods (months to years)
  • Size: Large (millions+ of entries)
  • Speed: Variable retrieval latency
  • Cost: Persistent storage + indexing

Characteristics

  • Comprehensive historical data
  • Learned user models
  • Domain knowledge
  • Relationship graphs

Use Cases

  • User personality models
  • Historical interaction patterns
  • Learned preferences and behaviors
  • Long-term relationship context
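At this scale, retrieval quality depends on indexing. As a sketch only, here is a long-term store with naive keyword-overlap scoring; a real system would use a database with full-text or vector indexing rather than a linear scan:

```python
class LongTermMemory:
    """Persistent-tier sketch with naive keyword scoring."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def add(self, text: str) -> None:
        self._entries.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return up to k entries ranked by query-term overlap."""
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(e.lower().split())), e) for e in self._entries
        ]
        scored = [(s, e) for s, e in scored if s > 0]  # drop non-matches
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]
```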

Memory Tier Interactions

Promotion

Information moves from working → short-term → long-term based on:

  • Frequency of access
  • Importance scores
  • User feedback signals
  • Temporal patterns
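The signals above can be combined into a single promotion score. The weights, the saturation point, and the 0-1 scaling of importance and feedback below are assumptions for illustration, not a fixed formula:

```python
def promotion_score(access_count: int, importance: float,
                    feedback: float, days_since_last_access: float) -> float:
    """Weighted blend of the four promotion signals.

    `importance` and `feedback` are assumed to be normalized to [0, 1].
    """
    frequency = min(access_count / 10.0, 1.0)       # saturate at 10 accesses
    recency = 1.0 / (1.0 + days_since_last_access)  # decays over time
    return 0.4 * frequency + 0.3 * importance + 0.2 * feedback + 0.1 * recency

def should_promote(score: float, threshold: float = 0.5) -> bool:
    return score >= threshold
```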

Retrieval

The system searches across tiers:

  1. Check working memory first (already in context, no retrieval cost)
  2. Query short-term for recent context
  3. Search long-term for deep history

Eviction

Tier capacity is managed through:

  • LRU (Least Recently Used) policies
  • Importance-based retention
  • User-controlled deletion
  • Automated summarization

Implementation Patterns

Cascade Architecture

Request → Working Memory → Short-term Cache → Long-term Store

Parallel Architecture

Request → All tiers simultaneously → Merge results
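The parallel pattern queries every tier concurrently and merges the results. A sketch using a thread pool, with dict-backed tiers as placeholders and deduplication that preserves tier order:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_retrieve(query: str,
                      tiers: list[dict[str, list[str]]]) -> list[str]:
    """Fan out to all tiers at once, then merge and deduplicate."""
    def lookup(tier: dict[str, list[str]]) -> list[str]:
        return tier.get(query, [])

    with ThreadPoolExecutor(max_workers=len(tiers)) as pool:
        results = list(pool.map(lookup, tiers))

    merged: list[str] = []
    seen: set[str] = set()
    for tier_results in results:   # earlier tiers win on duplicates
        for item in tier_results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

The trade-off versus the cascade is latency for cost: every tier pays its retrieval cost on every request, but the slowest tier no longer waits behind the others.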

Hierarchical Architecture

Request → Tier selection logic → Targeted retrieval
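The hierarchical pattern routes each request to a single tier up front. The query kinds and the mapping below are assumptions for the sketch, not a standard taxonomy:

```python
def select_tier(query_kind: str) -> str:
    """Route a request to one tier based on the kind of query."""
    routes = {
        "current_task": "working",
        "recent_session": "short_term",
        "user_history": "long_term",
    }
    # Default to the middle tier when the query kind is unrecognized.
    return routes.get(query_kind, "short_term")
```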

Next Steps