Short-term memory
📖 3 min read · Updated 2026-04-19
Short-term memory is whatever's in the agent's context window right now. It's what the model sees in the current LLM call. Managing short-term memory well means keeping what's relevant, trimming what's stale, and surviving long sessions without quality collapse.
What lives in short-term
- System prompt
- User's current message
- Recent turns in conversation
- Tool call history for this session
- Any context the agent actively needs
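A minimal sketch of how the pieces above become one call's context. The `{"role": ..., "content": ...}` message shape follows common chat-API conventions; `build_context` and its arguments are illustrative names, not any specific SDK.

```python
def build_context(system_prompt, history, tool_results, user_message):
    """Assemble everything the model will see in the current call."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)                    # recent conversation turns
    for result in tool_results:                 # this session's tool outputs
        messages.append({"role": "tool", "content": result})
    messages.append({"role": "user", "content": user_message})
    return messages
```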
The long-session problem
After 20-30 tool calls, a session's context can balloon to 50K+ tokens. Problems:
- "Lost in the middle": the model under-attends to the middle sections of a long context
- Cost: every LLM call pays for the full context
- Latency: longer context = slower inference
- Quality decay: models' reasoning degrades with very long contexts
Strategies
Trimming
Drop old tool call results once they're no longer relevant. Keep a summary, drop the raw data.
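A hypothetical trimming pass, assuming the chat-message shape used throughout: older tool results are replaced with a short stub while the newest few are kept verbatim. The stub format and `keep_last` cutoff are illustrative choices.

```python
def trim_tool_results(messages, keep_last=2):
    """Stub out all but the last `keep_last` tool results in place."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = tool_idx[:-keep_last] if keep_last else tool_idx
    for i in stale:
        content = messages[i]["content"]
        messages[i] = {"role": "tool",
                       "content": f"[trimmed tool result, was {len(content)} chars]"}
    return messages
```

In a real agent the stub would usually carry a one-line summary of the result rather than just its size, so later reasoning can still reference what happened.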
Summarization
When context exceeds a threshold, compress old parts into a summary. "Earlier in this session: [key facts, decisions, results]."
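A sketch of threshold-triggered compaction. `count_tokens` and `summarize` are injected stand-ins: in practice `summarize` would be another LLM call and `count_tokens` a real tokenizer; both names are assumptions for illustration.

```python
def compact_context(messages, count_tokens, summarize, max_tokens=50_000):
    """Fold the older half of the history into one summary message."""
    if count_tokens(messages) <= max_tokens:
        return messages                          # under budget, no-op
    system, rest = messages[0], messages[1:]     # never compress the system prompt
    cut = len(rest) // 2                         # older half gets summarized
    summary = {"role": "system",
               "content": "Earlier in this session: " + summarize(rest[:cut])}
    return [system, summary] + rest[cut:]
```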
Chunking
Break very long tasks into sub-sessions. Each sub-session has bounded context; final summary bridges them.
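The sub-session loop can be sketched as follows. `run_session` and `summarize` are placeholders: each call starts with a fresh, bounded context seeded only by the bridge summary carried over from the previous chunk.

```python
def run_chunked(subtasks, run_session, summarize):
    bridge = ""                                  # summary carried between sub-sessions
    for task in subtasks:
        transcript = run_session(task, bridge)   # bounded context per chunk
        bridge = summarize(transcript)           # only this crosses the gap
    return bridge                                # final summary of the whole task
```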
Selective recall
Store everything externally (vector DB); retrieve only what's relevant for the current step. This blends short-term and long-term.
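A toy version of selective recall. A real system would embed snippets and query a vector DB; here keyword overlap stands in for similarity search, and `recall`/`store` are illustrative names.

```python
def recall(store, query, k=3):
    """Return the k stored snippets that best match the current step."""
    q = set(query.lower().split())
    def score(snippet):
        return len(q & set(snippet.lower().split()))  # crude overlap "similarity"
    return sorted(store, key=score, reverse=True)[:k]
```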
What to keep in full
- Original user request
- System prompt
- Last 2-3 turns of conversation
- Most recent tool results (if still relevant)
- Any ground-truth artifacts (the document being edited, the plan)
What to trim or summarize
- Old tool calls whose results were consumed
- Intermediate reasoning traces from earlier phases
- Search results that were skimmed, not used
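The keep/trim rules above can be combined into a single pruning pass. This assumes the chat-message shape used earlier; the stub format and `keep_last` cutoff are choices for illustration, not a prescribed API.

```python
def prune(messages, keep_last=5):
    """Keep the system prompt, the original user request, and the newest
    messages verbatim; stub everything in between."""
    first_user = next(i for i, m in enumerate(messages) if m["role"] == "user")
    keep = {0, first_user} | set(range(max(0, len(messages) - keep_last),
                                       len(messages)))
    pruned = []
    for i, m in enumerate(messages):
        if i in keep:
            pruned.append(m)
        else:
            pruned.append({"role": m["role"],
                           "content": f"[summarized {m['role']} message]"})
    return pruned
```

Ground-truth artifacts (the document being edited, the plan) would need an extra exemption list in a real implementation, since they can appear anywhere in the history.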