
Episodic memory

📖 2 min read · Updated 2026-04-19

Episodic memory is the agent remembering specific past events, not just facts. "Last week when we debugged that deployment issue." "In our session on Tuesday, we decided to skip the retry." It's the difference between an agent that knows things about you and an agent that feels like a long-term collaborator who remembers the shared history.

Episodic vs the other memory types

Long-term facts say what is true. Episodic memory says what happened, when, with what outcome.

What an episode looks like

{
  "episode_id": "session_2026_04_19_abc",
  "timestamp": "2026-04-19T14:30:00Z",
  "user": "user_123",
  "topic_tags": ["debugging", "python", "database"],
  "summary": "User hit a database timeout. We traced it to a missing index on the orders table. Added the index. Query went from 4.2s to 180ms.",
  "artifacts": ["migrations/020_add_index.sql"],
  "outcome": "resolved"
}

Not the full transcript. A distilled record: who, when, what happened, what worked, what artifact was produced. Small enough to store millions of them.
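
A minimal sketch of that record as a typed structure, assuming Python and the field names from the JSON above. The `Episode` class and its `to_json` / `from_json` helpers are illustrative names, not part of any particular framework.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Episode:
    """One distilled session record, mirroring the JSON example."""
    episode_id: str
    timestamp: str            # ISO 8601, UTC
    user: str
    topic_tags: list[str]
    summary: str
    artifacts: list[str] = field(default_factory=list)
    outcome: str = "resolved"

    def to_json(self) -> str:
        # Serialize to the same shape as the stored record.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Episode":
        # Rebuild an Episode from a stored JSON string.
        return cls(**json.loads(raw))


episode = Episode(
    episode_id="session_2026_04_19_abc",
    timestamp="2026-04-19T14:30:00Z",
    user="user_123",
    topic_tags=["debugging", "python", "database"],
    summary="User hit a database timeout. We traced it to a missing index.",
    artifacts=["migrations/020_add_index.sql"],
    outcome="resolved",
)
restored = Episode.from_json(episode.to_json())
```

A fixed schema like this keeps every record the same shape, which makes filtering by user, tag, or outcome straightforward later.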

How the agent uses episodes

At the start of a new session, or when the agent reaches for recall mid-task:

Load by user → bring in this user's recent episodes.
Filter by topic similarity → "the current task mentions database timeouts, pull episodes tagged database."
Weight by recency → newer episodes get more attention unless an older one is strongly relevant.

A handful of relevant episodes gets injected into context. Now the agent can reference "last week when we added the index" without being told.
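
The three steps above can be sketched as one small ranking function. This is one possible scoring scheme, not the only one: tag overlap (Jaccard) stands in for topic similarity, and an exponential decay with a floor stands in for recency weighting. The names `score_episode` and `retrieve`, the 30-day half-life, and the 0.3 floor are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def score_episode(episode, query_tags, now, half_life_days=30.0, recency_floor=0.3):
    """Tag overlap scaled by recency. The floor keeps old but strongly
    relevant episodes from being decayed to nothing."""
    tags, query = set(episode["topic_tags"]), set(query_tags)
    if not tags or not query:
        return 0.0
    overlap = len(tags & query) / len(tags | query)  # Jaccard similarity
    happened = datetime.fromisoformat(episode["timestamp"].replace("Z", "+00:00"))
    age_days = max((now - happened).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)      # halves every half_life_days
    return overlap * (recency_floor + (1.0 - recency_floor) * recency)


def retrieve(episodes, user, query_tags, now, k=3):
    """Load by user, filter by topic, weight by recency, keep the top k."""
    scored = [
        (score_episode(ep, query_tags, now), ep)
        for ep in episodes
        if ep["user"] == user                         # load by user
    ]
    scored = [(s, ep) for s, ep in scored if s > 0]   # drop topic misses
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:k]]


episodes = [
    {"episode_id": "e_old", "user": "user_123", "timestamp": "2026-04-19T14:30:00Z",
     "topic_tags": ["debugging", "python", "database"]},
    {"episode_id": "e_new", "user": "user_123", "timestamp": "2026-05-18T09:00:00Z",
     "topic_tags": ["database", "performance"]},
    {"episode_id": "e_css", "user": "user_123", "timestamp": "2026-05-10T09:00:00Z",
     "topic_tags": ["frontend", "css"]},
    {"episode_id": "e_other", "user": "user_456", "timestamp": "2026-05-15T09:00:00Z",
     "topic_tags": ["database"]},
]
now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
top = retrieve(episodes, "user_123", ["database", "performance"], now)
```

Here `top` holds the two database episodes for `user_123`, newest first. A production system would more likely compare embeddings of the summaries than exact tags, but the shape of the pipeline stays the same.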

A worked example

Session 1. User and agent work through a timeout. Agent writes an episode summary at session end.

Session 2, a month later. User: "I'm seeing slow queries on the orders table again." Agent retrieves the previous episode (topic match + user match), loads it into context, opens with: "Last month we added an index on orders(customer_id, created_at) which fixed a 4.2s timeout. Is this a different slow query, or has that regressed?"

That's the payoff. The agent isn't smarter. It just has the history available so it doesn't start from zero.
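
Loading an episode into context can be as simple as rendering it to a few lines of text. A hedged sketch, assuming episodes are dicts shaped like the record above; `render_episode_context`, the header wording, and the character budget are hypothetical choices.

```python
def render_episode_context(episodes, max_chars=2000):
    """Turn retrieved episodes into a short text block for the prompt.
    Stops adding entries once the character budget would be exceeded."""
    lines = ["Relevant past sessions with this user:"]
    for ep in episodes:
        date = ep["timestamp"][:10]  # keep just YYYY-MM-DD
        entry = f"- [{date}] ({ep['outcome']}) {ep['summary']}"
        if ep.get("artifacts"):
            entry += " Artifacts: " + ", ".join(ep["artifacts"])
        used = len("\n".join(lines))
        if used + 1 + len(entry) > max_chars:
            break
        lines.append(entry)
    # Return nothing at all if no episode fit or none were supplied.
    return "\n".join(lines) if len(lines) > 1 else ""


sample = {
    "episode_id": "session_2026_04_19_abc",
    "timestamp": "2026-04-19T14:30:00Z",
    "summary": "User hit a database timeout. Added an index on the orders table.",
    "artifacts": ["migrations/020_add_index.sql"],
    "outcome": "resolved",
}
context = render_episode_context([sample])
```

Including the outcome in each line lets the model see whether the earlier approach worked before it suggests repeating it.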

The summarization step

Raw transcripts are too long to store and retrieve efficiently. At the end of every session that produced something meaningful:

Generate a short summary (3-5 sentences).
Tag with 2-5 topic keywords.
Record outcome (resolved, abandoned, escalated, etc.).
Store alongside references to artifacts (files, PRs, docs).

If the agent later needs the full transcript, it can expand from the episode record, provided the transcript is archived somewhere and the record points to it. But most of the time the summary is enough.
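
The steps above can be sketched as a single function that runs at session end. Everything here is an assumption for illustration: `build_episode` is a made-up name, `summarize` is whatever callable produces the summary (in practice probably a model call), and `transcript_ref` is a hypothetical optional pointer to an archived transcript. `placeholder_summarize` only keeps the first few lines so the example runs without a model.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def build_episode(
    session_id: str,
    user: str,
    transcript: str,
    summarize: Callable[[str], str],
    topic_tags: list[str],
    outcome: str,
    artifacts: list[str],
    transcript_ref: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    # Normalize tags: trim, lowercase, drop duplicates while keeping order.
    tags = list(dict.fromkeys(t.strip().lower() for t in topic_tags if t.strip()))
    if not 2 <= len(tags) <= 5:
        raise ValueError(f"expected 2-5 topic tags, got {len(tags)}")
    when = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    episode = {
        "episode_id": session_id,
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user": user,
        "topic_tags": tags,
        "summary": summarize(transcript).strip(),
        "artifacts": list(artifacts),
        "outcome": outcome,
    }
    if transcript_ref is not None:
        # Hypothetical field: where the full transcript can be found.
        episode["transcript_ref"] = transcript_ref
    return episode


def placeholder_summarize(transcript: str) -> str:
    # Stand-in for a model call: keep the first three non-empty lines.
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    return " ".join(lines[:3])


record = build_episode(
    "session_2026_04_19_abc",
    "user_123",
    "User hit a database timeout.\nWe added an index on the orders table.\n",
    placeholder_summarize,
    ["Database", "database ", "python"],
    "resolved",
    ["migrations/020_add_index.sql"],
    now=datetime(2026, 4, 19, 14, 30, tzinfo=timezone.utc),
)
```

Validating the tag count and normalizing tags at write time keeps retrieval simple later, since every stored record can be trusted to have the same shape.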

Privacy and retention

Even as summaries, episodes can capture sensitive details from what the user and agent discussed. Respect:

Retention limits (auto-delete after N months unless explicitly marked keep).
Access control (this user's episodes, not someone else's).
Deletion rights (user can ask you to forget a session; actually forget it).
Redaction (strip secrets before writing to the episode store).
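
A rough sketch of three of these controls, under stated assumptions. The regex patterns are examples only and will miss many kinds of secrets; the `keep` flag, the 180-day default, and the function names `redact`, `purge_expired`, and `forget` are hypothetical. Real redaction usually needs a dedicated secret scanner.

```python
import re
from datetime import datetime, timedelta, timezone

# Illustrative patterns only: key/value style secrets and the shape of
# an AWS access key ID. A real deployment needs a much broader set.
SECRET_PATTERNS = [
    (re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Strip likely secrets before the summary is written to the store."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def purge_expired(episodes, now, max_age_days=180):
    """Drop episodes older than the retention window unless marked keep."""
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for ep in episodes:
        happened = datetime.fromisoformat(ep["timestamp"].replace("Z", "+00:00"))
        if ep.get("keep", False) or happened >= cutoff:
            kept.append(ep)
    return kept


def forget(episodes, user, episode_id):
    """Honor a deletion request: remove the episode, scoped to its owner."""
    return [
        ep for ep in episodes
        if not (ep["user"] == user and ep["episode_id"] == episode_id)
    ]


store = [
    {"episode_id": "a", "user": "user_123", "timestamp": "2025-01-10T10:00:00Z"},
    {"episode_id": "b", "user": "user_123", "timestamp": "2025-01-10T10:00:00Z", "keep": True},
    {"episode_id": "c", "user": "user_123", "timestamp": "2026-04-01T10:00:00Z"},
]
now = datetime(2026, 4, 19, tzinfo=timezone.utc)
remaining = purge_expired(store, now)
after_forget = forget(remaining, "user_123", "c")
clean = redact("Login failed with password: hunter2 and api_key=abc123")
```

Note that `forget` only removes the record from this list. Actually forgetting a session also means deleting any archived transcript, search index entry, or backup copy that refers to it.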

Pitfalls

Storing raw transcripts instead of summaries. Your retrieval gets noisy; the model's context gets huge.
No topic tagging. You can only retrieve by user+time, which misses the interesting semantic matches.
Recency blindness. Agent loads a three-year-old episode about a stack that doesn't exist anymore.
No outcome field. Agent doesn't know whether last time's approach worked; may repeat a mistake.

What to do with this

Start by writing episode summaries at session end. Even without retrieval, you'll have a useful audit log.
Read procedural memory for the "how to do this" complement to "what happened last time."
Read memory system design for composing all the memory types.