A real agent memory system combines short-term, long-term, episodic, and procedural memory into one architecture. Here's how to design it so the agent has the right context at the right time without drowning in data.
Short-term memory: in-memory and per-session. It holds the current conversation and recent tool results, and is cleared at session end.
Long-term memory (facts): a key-value store of preferences, settings, and stable facts, loaded at session start.
Long-term memory (semantic): a vector store of past conversations, notes, and artifacts, retrieved on demand by similarity.
Episodic memory: a chronological record of past sessions with summaries, queryable by time or topic.
Procedural memory: a library of named how-to recipes the agent can invoke.
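The five stores can be sketched as one Python container so the lifecycle of each is visible in code. This is a minimal sketch under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, a plain list replaces a real vector database, and the names `AgentMemory`, `add_note`, `search`, `episodes_about`, and `end_session` are illustrative, not from any library.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Episode:
    timestamp: float
    topic: str
    summary: str


@dataclass
class AgentMemory:
    # Short-term: conversation turns and tool results for the current session only.
    working: list[str] = field(default_factory=list)
    # Long-term facts: preferences, settings, stable facts (loaded at session start).
    profile: dict[str, str] = field(default_factory=dict)
    # Long-term semantic: (text, vector) pairs searched by similarity on demand.
    semantic: list[tuple[str, Counter]] = field(default_factory=list)
    # Episodic: session summaries in chronological order.
    episodes: list[Episode] = field(default_factory=list)
    # Procedural: named recipes the agent can invoke.
    procedures: dict[str, Callable[..., str]] = field(default_factory=dict)

    def end_session(self, topic: str, summary: str) -> None:
        # Persist a summary of the session as an episode, then drop short-term state.
        self.episodes.append(Episode(time.time(), topic, summary))
        self.working.clear()

    def add_note(self, text: str) -> None:
        self.semantic.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Rank stored notes by similarity to the query and return the top k texts.
        q = embed(query)
        ranked = sorted(self.semantic, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def episodes_about(self, topic: str, since: float = 0.0) -> list[Episode]:
        # Episodic lookup by topic and (optionally) by time.
        return [e for e in self.episodes if e.topic == topic and e.timestamp >= since]
```

Keeping short-term state in a plain list makes the session boundary explicit: `end_session` is the single place where working memory is summarized into an episode and then discarded.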
At each agent step, the orchestrator decides what memory to pull:
On demand, the LLM can call a recall(topic) tool to fetch specific memories.

Writing to long-term memory should be deliberate, not automatic.
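The read and write policies can be sketched as two small functions: one assembles context for a single step, the other gates what gets written to long-term memory. This is a hedged sketch, not a prescribed policy: the `Candidate` schema, the `"user_stated"` / `"inferred"` source labels, the `max_turns` window, and the 0.8 confidence threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A fact the agent proposes to store (hypothetical schema)."""
    text: str
    source: str         # assumed labels: "user_stated" or "inferred"
    confidence: float   # 0.0 to 1.0, from the model or a heuristic


def build_context(step_input: str, working: list[str], profile: dict[str, str],
                  recalled: list[str], max_turns: int = 6) -> str:
    """Assemble the prompt context for one agent step.

    Stable profile facts and the most recent working-memory turns are always
    included; recalled memories are added only when something was retrieved.
    """
    parts = [f"{k}: {v}" for k, v in sorted(profile.items())]
    parts += working[-max_turns:]
    if recalled:
        parts.append("Relevant memories:")
        parts += recalled
    parts.append(f"User: {step_input}")
    return "\n".join(parts)


def should_remember(c: Candidate, existing: set[str], min_confidence: float = 0.8) -> bool:
    """Gate writes to long-term memory instead of storing every turn."""
    if c.text in existing:          # skip exact duplicates
        return False
    if c.source == "user_stated":   # facts the user stated explicitly are kept
        return True
    return c.confidence >= min_confidence  # inferred facts need high confidence
```

Bounding working memory with `max_turns` and skipping the recall section when nothing was retrieved keeps the per-step context small, which is the point of pulling memory selectively.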
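Memory that never decays or updates becomes wrong. Build explicit mechanisms for both: decay, so stale facts lose weight over time, and updates, so a changed fact replaces the old one instead of contradicting it.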
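One way to make decay and updates explicit is sketched below, assuming exponential decay with a configurable half-life; the text does not prescribe a decay function, so this choice is an assumption. `FactStore`, `upsert`, `supersede`, `live`, and `half_life_days` are hypothetical names, and timestamps are plain seconds so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class Fact:
    fact_id: str
    text: str
    updated_at: float                   # seconds; resets whenever the fact is rewritten
    superseded_by: Optional[str] = None


@dataclass
class FactStore:
    half_life_days: float = 30.0        # assumed decay rate; tune per deployment
    facts: dict[str, Fact] = field(default_factory=dict)

    def upsert(self, fact_id: str, text: str, now: float) -> None:
        # Rewriting a fact replaces its text and resets its age.
        self.facts[fact_id] = Fact(fact_id, text, now)

    def supersede(self, old_id: str, new_id: str, text: str, now: float) -> None:
        # Keep the old fact for auditing, but mark it so retrieval skips it.
        self.facts[old_id].superseded_by = new_id
        self.upsert(new_id, text, now)

    def weight(self, fact: Fact, now: float) -> float:
        # Exponential decay: weight halves every half_life_days since the last update.
        age_days = (now - fact.updated_at) / SECONDS_PER_DAY
        return 0.5 ** (age_days / self.half_life_days)

    def live(self, now: float, min_weight: float = 0.1) -> list[Fact]:
        # Current facts whose weight has not decayed below the threshold.
        return [f for f in self.facts.values()
                if f.superseded_by is None and self.weight(f, now) >= min_weight]
```

Superseding instead of deleting keeps a trail of what the agent used to believe, while `live` guarantees that only the newest, not-yet-stale version reaches the prompt.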
Expose memory operations as tools the LLM can call:
remember(fact): write to long-term memory
recall(topic): semantic retrieval
forget(fact_id): delete a memory
list_procedures(): browse the procedure library

The LLM uses these as first-class citizens alongside its other tools.
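A minimal sketch of how the four tools could be wired to an in-process backend follows. It assumes tool calls arrive as JSON of the form `{"name": ..., "arguments": {...}}`, which is a common shape but not a specific vendor's API. `MemoryTools`, `dispatch`, and the `summarize_thread` procedure are hypothetical, and `recall` uses naive keyword overlap as a stand-in for real semantic retrieval.

```python
import json
import uuid
from typing import Any, Callable


class MemoryTools:
    """Hypothetical in-process backend for remember / recall / forget / list_procedures."""

    def __init__(self) -> None:
        self.facts: dict[str, str] = {}
        self.procedures: dict[str, str] = {"summarize_thread": "Collect messages, then summarize."}

    def remember(self, fact: str) -> str:
        fact_id = uuid.uuid4().hex[:8]
        self.facts[fact_id] = fact
        return fact_id

    def recall(self, topic: str) -> list[dict[str, str]]:
        # Keyword overlap as a stand-in for semantic retrieval.
        words = set(topic.lower().split())
        return [{"fact_id": fid, "fact": f} for fid, f in self.facts.items()
                if words & set(f.lower().split())]

    def forget(self, fact_id: str) -> bool:
        return self.facts.pop(fact_id, None) is not None

    def list_procedures(self) -> list[str]:
        return sorted(self.procedures)


def dispatch(tools: MemoryTools, call_json: str) -> str:
    """Route a model-emitted tool call to the matching method and return JSON."""
    call = json.loads(call_json)
    registry: dict[str, Callable[..., Any]] = {
        "remember": tools.remember,
        "recall": tools.recall,
        "forget": tools.forget,
        "list_procedures": tools.list_procedures,
    }
    if call["name"] not in registry:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = registry[call["name"]](**call.get("arguments", {}))
    return json.dumps({"result": result})
```

Returning JSON for both results and errors lets the model treat memory calls exactly like any other tool call, including recovering from an unknown tool name.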