You've got four memory types: short-term, long-term facts, episodic, procedural. A real agent uses all of them, layered. The art is picking what to pull in when, and what to write out, so the agent has the right context at the right moment without drowning in its own history. This page is the architecture.
Most production agents don't need all five layers. The minimum useful system has short-term memory plus a user profile. Add vector search and episodic memory when you want "remembers past conversations." Add procedures when you want the agent to get better at recurring tasks.
The agent calls recall(topic) when it needs older context.
Each retrieval path is a separate decision. Aggressive always-retrieve = slow, expensive, noisy. Too conservative = agent feels amnesic. Tune per retrieval path, not as one dial.
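One way to make "tune per retrieval path" concrete is to give each path its own small policy object. The sketch below is illustrative only: the path names, the `RetrievalPolicy` fields, and the thresholds are assumptions, not values from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPolicy:
    always_on: bool    # load every turn, or only when the agent asks
    top_k: int         # max items injected into context
    min_score: float   # drop hits scoring below this similarity


# Hypothetical paths and numbers; each path gets its own dial.
POLICIES = {
    "user_profile": RetrievalPolicy(always_on=True, top_k=20, min_score=0.0),
    "semantic": RetrievalPolicy(always_on=False, top_k=5, min_score=0.75),
    "episodic": RetrievalPolicy(always_on=False, top_k=3, min_score=0.6),
}


def select_hits(path, hits):
    """hits is a list of (text, score) pairs; return the texts worth injecting."""
    policy = POLICIES[path]
    kept = [(text, score) for text, score in hits if score >= policy.min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best matches first
    return [text for text, _ in kept[: policy.top_k]]
```

Loosening `min_score` on one path makes that path noisier without touching the others, which is the point of keeping the dials separate.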
Writing is where memory systems rot if you're not careful. Default rules:
The cleanest way to give the agent access to memory is to expose it as tools the model can call:
remember(fact)              # write to long-term facts
recall(topic)               # semantic search over memory
list_episodes(user, days)   # browse recent sessions
get_procedure(name)         # load a procedure
forget(fact_id)             # delete on user request
Now the model can decide when to reach for memory, same as it decides when to reach for a search tool. This keeps the loop clean and traceable.
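A minimal sketch of what sits behind those tools, assuming an in-memory store and a keyword-overlap stand-in for real semantic search. The `MemoryTools` class, its `dispatch` method, and the `trace` list are hypothetical names for illustration; `list_episodes` and `get_procedure` are omitted for brevity.

```python
import itertools


class MemoryTools:
    """Toy backing store for the memory tools; not a production design."""

    def __init__(self):
        self._facts = {}
        self._ids = itertools.count(1)
        self.trace = []  # every call is logged, which keeps the loop traceable

    def remember(self, fact):
        fact_id = next(self._ids)
        self._facts[fact_id] = fact
        self.trace.append(("remember", fact_id))
        return fact_id

    def recall(self, topic):
        # Assumption: word overlap stands in for embedding similarity.
        words = set(topic.lower().split())
        self.trace.append(("recall", topic))
        return [f for f in self._facts.values() if words & set(f.lower().split())]

    def forget(self, fact_id):
        self.trace.append(("forget", fact_id))
        return self._facts.pop(fact_id, None) is not None

    def dispatch(self, name, **kwargs):
        """Route a model-issued tool call to the matching method."""
        tools = {"remember": self.remember, "recall": self.recall, "forget": self.forget}
        return tools[name](**kwargs)
```

When the model emits a tool call such as `recall` with `topic="billing UTC bug"`, the agent loop would pass it through `dispatch` and feed the result back as a tool response, the same way it would for a search tool.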
User: "Fix the failing tests in the billing module."
fix_failing_tests procedure retrieved. Steps: run the tests, read the failure, narrow to the specific assertion, read surrounding code, propose fix, re-run.
The agent calls recall("billing timezone bug") and finds a note from the past saying the team standardized on UTC.
Five different memory layers, each contributing something the agent couldn't have figured out from scratch. The task that would've taken 20 tool calls takes 8.
Memory that never decays becomes wrong memory:
Every memory operation must be scoped to a user ID. One missing filter somewhere and user A's private facts leak into user B's session. Build a memory layer API that takes user_id as a required arg everywhere; never let a raw DB query into your retrieval path.
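A rough sketch of such a memory layer, assuming a simple in-process store. `ScopedMemory` and its method names are made up for illustration; the point is only that no method can be called without a `user_id`, and that storage is partitioned by it.

```python
from collections import defaultdict


class ScopedMemory:
    """Every read and write goes through a user_id; there is no unscoped query."""

    def __init__(self):
        self._by_user = defaultdict(list)

    @staticmethod
    def _require(user_id):
        if not isinstance(user_id, str) or not user_id:
            raise ValueError("user_id is required for every memory operation")

    def write(self, user_id, fact):
        self._require(user_id)
        self._by_user[user_id].append(fact)

    def search(self, user_id, query):
        self._require(user_id)
        needle = query.lower()
        # Only this user's partition is ever scanned.
        return [f for f in self._by_user.get(user_id, []) if needle in f.lower()]
```

With a real database the same idea applies: the filter on `user_id` lives inside the memory layer, so callers in the retrieval path cannot forget it.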
Track: what memory was loaded per session, how much context it consumed, how often each retrieval path produced something the agent actually used. Unused retrievals are cost you can cut.
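The tracking described above can start as a few counters per retrieval path. This is a sketch under assumptions: `RetrievalStats` is a hypothetical name, and how you decide that a retrieval was "used" (for example, the agent quoted or acted on it) is left to your own heuristic.

```python
from collections import Counter


class RetrievalStats:
    """Per-path counters: how often memory was loaded, its cost, and its usefulness."""

    def __init__(self):
        self.loaded = Counter()   # retrievals per path
        self.used = Counter()     # retrievals the agent actually used
        self.tokens = Counter()   # context tokens consumed per path

    def record(self, path, tokens, used):
        self.loaded[path] += 1
        self.tokens[path] += tokens
        if used:
            self.used[path] += 1

    def hit_rate(self, path):
        n = self.loaded[path]
        return self.used[path] / n if n else 0.0
```

A path with a high token count and a low `hit_rate` is the first candidate for a stricter threshold or for removal.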