Memory system design

A real agent memory system combines short-term, long-term, episodic, and procedural memory into one architecture. Here's how to design it so the agent has the right context at the right time without drowning in data.

The layered architecture

Layer 1: Session context (short-term)

In-memory, per-session. Current conversation and recent tool results. Cleaned up at session end.

Layer 2: User profile (long-term, structured)

Key-value store. Preferences, settings, stable facts. Loaded at session start.

Layer 3: Semantic memory (long-term, unstructured)

Vector store of past conversations, notes, artifacts. Retrieved on demand by similarity.

Layer 4: Episodic log (history)

Chronological record of past sessions with summaries. Queryable by time or topic.

Layer 5: Procedure library

Named how-to recipes the agent can invoke.
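
As a rough illustration, the five layers can be modeled as fields of one container. This is a minimal sketch, not a reference design: the names AgentMemory, Episode, end_session, episodes_about, and episodes_since are hypothetical, plain Python lists and dicts stand in for a real key-value store and vector store, and archiving a summary when the session ends is an assumption about how layers 1 and 4 connect.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class Episode:
    """One past session: when it happened, its topic, and a short summary."""
    started_at: datetime
    topic: str
    summary: str


@dataclass
class AgentMemory:
    # Layer 1: current conversation and recent tool results (in memory only).
    session: list[str] = field(default_factory=list)
    # Layer 2: structured key-value facts about the user.
    profile: dict[str, str] = field(default_factory=dict)
    # Layer 3: unstructured texts; a real system would index these in a vector store.
    semantic: list[str] = field(default_factory=list)
    # Layer 4: chronological record of past sessions.
    episodes: list[Episode] = field(default_factory=list)
    # Layer 5: named recipes the agent can invoke.
    procedures: dict[str, Callable[..., str]] = field(default_factory=dict)

    def end_session(self, topic: str, summary: str, now: datetime) -> None:
        """Archive a summary in the episodic log, then clear short-term state."""
        self.episodes.append(Episode(now, topic, summary))
        self.session.clear()

    def episodes_about(self, topic: str) -> list[Episode]:
        """Query the episodic log by topic (exact match, for simplicity)."""
        return [e for e in self.episodes if e.topic == topic]

    def episodes_since(self, cutoff: datetime) -> list[Episode]:
        """Query the episodic log by time."""
        return [e for e in self.episodes if e.started_at >= cutoff]
```

Keeping the layers as separate fields makes their lifetimes explicit: only the session field is cleared at session end, while the other four persist.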

The retrieval stack

At each agent step, the orchestrator decides what memory to pull:

  1. Always: user profile (short, cheap, always relevant)
  2. On session start: relevant recent episodes
  3. On demand: agent calls recall(topic) tool to fetch specific memories
  4. Automatic: vector search for semantically related memories, included only when they closely match the current task
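
The four rules above can be sketched as one context-building function plus a recall tool. This is an illustration under stated assumptions: MemoryStores, build_context, the threshold, and top_k are hypothetical, recent_episodes is assumed to be filtered for relevance already, and word_overlap is a toy word-set similarity standing in for real embedding search.

```python
from dataclasses import dataclass, field


def word_overlap(a: str, b: str) -> float:
    """Toy Jaccard similarity over lowercase words (stand-in for vector similarity)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa and wb else 0.0


@dataclass
class MemoryStores:
    profile: dict[str, str] = field(default_factory=dict)
    recent_episodes: list[str] = field(default_factory=list)  # assumed pre-filtered
    semantic: list[str] = field(default_factory=list)


def build_context(stores: MemoryStores, task: str, session_start: bool,
                  threshold: float = 0.2, top_k: int = 2) -> list[str]:
    """Assemble the memory that is pushed into the prompt for this step."""
    context: list[str] = []
    # Rule 1, always: the user profile is short and cheap.
    context += [f"profile: {k}={v}" for k, v in sorted(stores.profile.items())]
    # Rule 2, on session start only: recent episodes.
    if session_start:
        context += [f"episode: {e}" for e in stores.recent_episodes]
    # Rule 4, automatic: include similar memories only above the threshold.
    scored = sorted(((word_overlap(task, m), m) for m in stores.semantic), reverse=True)
    context += [f"memory: {m}" for score, m in scored[:top_k] if score >= threshold]
    return context


def recall(stores: MemoryStores, topic: str, top_k: int = 3) -> list[str]:
    """Rule 3, on demand: a tool the agent calls itself to pull specific memories."""
    ranked = sorted(stores.semantic, key=lambda m: word_overlap(topic, m), reverse=True)
    return [m for m in ranked[:top_k] if word_overlap(topic, m) > 0]
```

The threshold in rule 4 is what keeps automatic retrieval from flooding the prompt: a memory that only weakly resembles the task is left for the agent to fetch explicitly through recall.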

Writes are as important as reads

Writing to long-term memory should be deliberate, not automatic.
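
One way to make writes deliberate is to route every write through a gate that can refuse. The sketch below is an assumption about what such a gate might check, not a rule set from this article: LongTermStore, the required reason, the explicit_request flag, and the importance threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LongTermStore:
    items: list[dict] = field(default_factory=list)

    def write(self, text: str, reason: str, explicit_request: bool = False,
              importance: float = 0.0, min_importance: float = 0.7) -> bool:
        """Store text only if it passes the gate; return whether it was stored."""
        normalized = " ".join(text.lower().split())
        if not normalized or not reason.strip():
            return False  # nothing to store, or no justification given
        if any(item["key"] == normalized for item in self.items):
            return False  # already stored; skip the duplicate
        if not explicit_request and importance < min_importance:
            return False  # not asked for and not important enough
        self.items.append({"key": normalized, "text": text, "reason": reason})
        return True
```

Requiring a reason on every write also leaves an audit trail, which helps when a stored memory later turns out to be wrong.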

Expiry and updates

Memory that never decays or updates becomes wrong. Build explicit expiry and update mechanisms.
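
A minimal sketch of both mechanisms, assuming a keyed fact store: updates overwrite the old value instead of accumulating next to it, and an optional time-to-live makes a fact expire. The names Fact, FactStore, upsert, get, and purge_expired are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Fact:
    value: str
    updated_at: datetime
    ttl: Optional[timedelta] = None  # None means the fact never expires

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now - self.updated_at > self.ttl


class FactStore:
    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def upsert(self, key: str, value: str, now: datetime,
               ttl: Optional[timedelta] = None) -> None:
        """Update: the new value replaces the old one and resets its timestamp."""
        self._facts[key] = Fact(value, now, ttl)

    def get(self, key: str, now: datetime) -> Optional[str]:
        """Expiry on read: an expired fact is deleted and reported as missing."""
        fact = self._facts.get(key)
        if fact is None:
            return None
        if fact.expired(now):
            del self._facts[key]
            return None
        return fact.value

    def purge_expired(self, now: datetime) -> int:
        """Background sweep: delete all expired facts and return how many."""
        stale = [k for k, f in self._facts.items() if f.expired(now)]
        for k in stale:
            del self._facts[k]
        return len(stale)
```

Stable facts such as a home city can omit the TTL, while volatile ones such as a current project can carry a short one.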

Tools for memory

Expose memory operations as tools the LLM can call.
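
As an illustration, a small tool registry might look like the sketch below. Only recall(topic) is named in this article; remember, forget, the TOOLS registry, and dispatch are hypothetical, and the parameter descriptions use a generic JSON Schema shape rather than any particular vendor's tool-calling format. A plain dict stands in for the real memory backend.

```python
import json

memory: dict[str, str] = {}  # stand-in for the real long-term store


def remember(key: str, value: str) -> str:
    memory[key] = value
    return f"stored {key}"


def recall(topic: str) -> str:
    t = topic.lower()
    hits = {k: v for k, v in memory.items() if t in k.lower() or t in v.lower()}
    return json.dumps(hits)


def forget(key: str) -> str:
    return "forgotten" if memory.pop(key, None) is not None else "not found"


def _schema(**props: str) -> dict:
    """Build a JSON Schema object where every listed string property is required."""
    return {"type": "object",
            "properties": {p: {"type": "string", "description": d} for p, d in props.items()},
            "required": list(props)}


TOOLS = {
    "remember": {"fn": remember, "description": "Save a fact to long-term memory.",
                 "parameters": _schema(key="Short identifier", value="Fact to store")},
    "recall": {"fn": recall, "description": "Fetch memories related to a topic.",
               "parameters": _schema(topic="Topic to search for")},
    "forget": {"fn": forget, "description": "Delete a stored fact.",
               "parameters": _schema(key="Identifier of the fact")},
}


def dispatch(name: str, arguments_json: str) -> str:
    """Run a tool call emitted by the model; errors are returned as strings."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name}"
    args = json.loads(arguments_json)
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        return f"error: missing arguments {missing}"
    declared = tool["parameters"]["properties"]
    return tool["fn"](**{k: v for k, v in args.items() if k in declared})
```

Returning errors as strings instead of raising lets the model see what went wrong and retry the call with corrected arguments.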

The LLM treats these as first-class tools, on equal footing with its other tools.