The Three Tiers of Memory
Overview
Effective agent memory systems span three distinct tiers, each serving a different purpose and operating on a different timescale.
Tier 1: Working Memory (Context Window)
- Timescale: Current conversation/task
- Size: Limited by the model's context window (4K-128K+ tokens)
- Speed: Immediate access
- Cost: Token cost per API call
Characteristics
- Direct inclusion in prompts
- Immediate availability to the model
- No retrieval latency
- Constrained by token limits
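Because working memory is bounded by the context window, a common approach is to keep the most recent turns that fit within a token budget. The following is a minimal sketch. It assumes a hypothetical `count_tokens` helper that approximates tokens as whitespace-separated words; a real system would use the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: approximates tokens as whitespace-separated words.
    return len(text.split())


def fit_to_budget(history: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages whose combined size fits within `budget`."""
    kept: list[Message] = []
    used = 0
    # Walk from newest to oldest so the most recent context survives first.
    for msg in reversed(history):
        cost = count_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order before building the prompt.
    kept.reverse()
    return kept
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a conversation with holes in it.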
Use Cases
- Current conversation history
- Active task context
- Immediate user preferences
- Session-specific data
Tier 2: Short-term Memory (Recent History)
- Timescale: Recent sessions (days to weeks)
- Size: Moderate (thousands of entries)
- Speed: Fast retrieval (<100 ms)
- Cost: Database storage + retrieval compute
Characteristics
- Recently accessed information
- Frequently used patterns
- User-specific recent context
- Cached for quick access
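A short-term tier can be approximated as a per-user store whose entries expire after a retention window. The following is a minimal sketch. It uses an in-process dictionary in place of the database or cache mentioned above and assumes a hypothetical 14-day window. Timestamps are passed in explicitly so that the behavior is deterministic.

```python
from collections import defaultdict
from dataclasses import dataclass

RETENTION_SECONDS = 14 * 24 * 3600  # hypothetical two-week window


@dataclass
class Entry:
    text: str
    created_at: float  # Unix timestamp in seconds


class ShortTermMemory:
    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self.retention = retention
        self._by_user: dict[str, list[Entry]] = defaultdict(list)

    def add(self, user_id: str, text: str, now: float) -> None:
        self._by_user[user_id].append(Entry(text, now))

    def recent(self, user_id: str, now: float, limit: int = 5) -> list[str]:
        """Return up to `limit` unexpired entries for a user, newest first."""
        cutoff = now - self.retention
        # Drop expired entries as a side effect of reading.
        live = [e for e in self._by_user[user_id] if e.created_at >= cutoff]
        self._by_user[user_id] = live
        newest_first = sorted(live, key=lambda e: e.created_at, reverse=True)
        return [e.text for e in newest_first[:limit]]
```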
Use Cases
- Recent conversation summaries
- User interaction patterns
- Temporary preferences
- Session bridging context
Tier 3: Long-term Memory (Persistent Knowledge)
- Timescale: Extended periods (months to years)
- Size: Large (millions+ of entries)
- Speed: Variable retrieval latency
- Cost: Persistent storage + indexing
Characteristics
- Comprehensive historical data
- Learned user models
- Domain knowledge
- Relationship graphs
Use Cases
- User personality models
- Historical interaction patterns
- Learned preferences and behaviors
- Long-term relationship context
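A long-term store holds far more entries than can be scanned on every request, which is why indexing appears in its cost. The following is a minimal sketch of the idea. It uses a keyword inverted index as a simple stand-in for the vector or full-text indexes that a production system would more likely use.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    # Lowercase and split on non-alphanumeric characters.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LongTermStore:
    def __init__(self) -> None:
        self._docs: dict[int, str] = {}
        self._index: dict[str, set[int]] = defaultdict(set)
        self._next_id = 0

    def add(self, text: str) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = text
        # Index every term so lookups touch only matching documents.
        for term in tokenize(text):
            self._index[term].add(doc_id)
        return doc_id

    def search(self, query: str, k: int = 3) -> list[str]:
        """Rank documents by how many distinct query terms they contain."""
        scores: dict[int, int] = defaultdict(int)
        for term in tokenize(query):
            for doc_id in self._index.get(term, ()):
                scores[doc_id] += 1
        # Highest overlap first; ties go to the older document.
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self._docs[d] for d in ranked[:k]]
```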
Memory Tier Interactions
Promotion
Information moves from working → short-term → long-term based on:
- Frequency of access
- Importance scores
- User feedback signals
- Temporal patterns
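One way to combine these signals is to compute a weighted score and compare it against per-tier thresholds. The following is a minimal sketch. The weights, normalizations, and thresholds are hypothetical, since this section does not prescribe any, and the feedback signal is assumed to lie in [-1, 1].

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    access_count: int         # frequency of access
    importance: float         # importance score in [0, 1]
    feedback: float           # user feedback signal in [-1, 1]
    days_since_access: float  # temporal signal


# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {"frequency": 0.3, "importance": 0.4, "feedback": 0.2, "recency": 0.1}
PROMOTE_TO_SHORT_TERM = 0.4
PROMOTE_TO_LONG_TERM = 0.7


def promotion_score(item: MemoryItem) -> float:
    # Squash unbounded access counts into [0, 1).
    frequency = 1 - math.exp(-item.access_count / 5)
    # Map feedback from [-1, 1] onto [0, 1].
    feedback = (item.feedback + 1) / 2
    # Exponential decay: recently touched items score higher.
    recency = math.exp(-item.days_since_access / 7)
    return (
        WEIGHTS["frequency"] * frequency
        + WEIGHTS["importance"] * item.importance
        + WEIGHTS["feedback"] * feedback
        + WEIGHTS["recency"] * recency
    )


def target_tier(item: MemoryItem) -> str:
    score = promotion_score(item)
    if score >= PROMOTE_TO_LONG_TERM:
        return "long-term"
    if score >= PROMOTE_TO_SHORT_TERM:
        return "short-term"
    return "working"
```

Because the weights sum to 1 and every normalized signal lies in [0, 1], the score also stays in [0, 1], which keeps the thresholds easy to reason about.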
Retrieval
The system searches across tiers:
- Check working memory first (already in the prompt, so there is no retrieval cost)
- Query short-term for recent context
- Search long-term for deep history
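The ordered search above behaves like a cascade that stops at the first tier able to answer. The following is a minimal sketch. It assumes that each tier is represented by a hypothetical callable returning a list of matches (empty when nothing is found). It also assumes that an answer from a cheaper tier is good enough to skip the remaining tiers.

```python
from typing import Callable, Optional

# A tier is modeled as a (name, search_fn) pair; search_fn returns matches or [].
Tier = tuple[str, Callable[[str], list[str]]]


def cascade_retrieve(query: str, tiers: list[Tier]) -> tuple[Optional[str], list[str]]:
    """Query tiers in order (cheapest first) and stop at the first non-empty result."""
    for name, search in tiers:
        results = search(query)
        if results:
            return name, results
    return None, []
```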
Eviction
Tier capacity is managed through:
- LRU (Least Recently Used) policies
- Importance-based retention
- User-controlled deletion
- Automated summarization
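These policies can be combined. The cache evicts in least-recently-used order, skips items above an importance floor, and hands evicted items to a summarizer instead of discarding them outright. The following is a minimal sketch. The `importance_floor` threshold and the `summarize` callback (which might, for example, call a model) are hypothetical and supplied by the caller. User-controlled deletion is modeled as an explicit `delete` call.

```python
from collections import OrderedDict
from typing import Callable, Optional


class TierCache:
    """Bounded tier combining LRU eviction, importance-based retention, and summarization."""

    def __init__(self, capacity: int, importance_floor: float,
                 summarize: Callable[[list[str]], None]) -> None:
        self.capacity = capacity
        self.importance_floor = importance_floor  # hypothetical retention threshold
        self.summarize = summarize  # receives evicted texts instead of dropping them
        # key -> (text, importance); iteration order runs oldest to newest use.
        self._items: OrderedDict[str, tuple[str, float]] = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key: str, text: str, importance: float) -> None:
        self._items[key] = (text, importance)
        self._items.move_to_end(key)
        self._evict()

    def delete(self, key: str) -> None:
        # User-controlled deletion bypasses all retention rules.
        self._items.pop(key, None)

    def _evict(self) -> None:
        evicted: list[str] = []
        # Scan from least to most recently used, skipping protected items.
        for key in list(self._items):
            if len(self._items) <= self.capacity:
                break
            text, importance = self._items[key]
            if importance >= self.importance_floor:
                continue
            del self._items[key]
            evicted.append(text)
        if evicted:
            self.summarize(evicted)
```

If every item is above the importance floor, this tier is allowed to exceed its capacity rather than drop protected memories. Whether that trade-off is acceptable depends on how strict the storage budget is.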
Implementation Patterns
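The three patterns that follow differ mainly in how a request is routed. Cascade mirrors the tier-by-tier order described under Retrieval. The following is a minimal sketch of the other two, parallel and hierarchical, based on several assumptions: each tier is a hypothetical async search function; parallel results are merged by removing duplicates in tier order; and tier selection uses a simple keyword rule where a real system might use a classifier or the model itself.

```python
import asyncio
from typing import Awaitable, Callable

Search = Callable[[str], Awaitable[list[str]]]


async def parallel_retrieve(query: str, tiers: dict[str, Search]) -> list[str]:
    """Query every tier concurrently, then merge results without duplicates."""
    batches = await asyncio.gather(*(search(query) for search in tiers.values()))
    merged: list[str] = []
    seen: set[str] = set()
    # gather preserves argument order, so a duplicate keeps its earliest tier's position.
    for batch in batches:
        for item in batch:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def select_tier(query: str) -> str:
    # Hypothetical routing rule: phrasing about the past goes to long-term,
    # everything else goes to short-term.
    words = set(query.lower().split())
    if words & {"last", "ago", "history", "previously"}:
        return "long-term"
    return "short-term"


async def hierarchical_retrieve(query: str, tiers: dict[str, Search]) -> list[str]:
    """Route the request to a single tier chosen up front."""
    return await tiers[select_tier(query)](query)
```

Parallel retrieval pays for every tier on every request in exchange for the lowest worst-case latency. Hierarchical retrieval queries only one tier, so its quality depends on how good the selection logic is.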
Cascade Architecture
Request → Working Memory → Short-term Cache → Long-term Store
Parallel Architecture
Request → All tiers simultaneously → Merge results
Hierarchical Architecture
Request → Tier selection logic → Targeted retrieval
Next Steps
- Understand The Memory Problem
- Learn about Mental Models & Terminology
- Explore Token Budgeting strategies