Short-term memory

Short-term memory is whatever's in the agent's context window right now. It's what the model sees in the current LLM call. Managing short-term memory well means keeping the relevant, trimming the stale, and surviving long sessions without quality collapse.

What lives in short-term

The long-session problem

After 20-30 tool calls, a session's context can balloon to 50K+ tokens. Problems:

Strategies

Trimming

Drop old tool call results once they're no longer relevant. Keep a summary, drop the raw data.

Summarization

When context exceeds a threshold, compress old parts into a summary. "Earlier in this session: [key facts, decisions, results]."

Chunking

Break very long tasks into sub-sessions. Each sub-session has bounded context; final summary bridges them.

Selective recall

Store everything externally (vector DB); retrieve only what's relevant for the current step. This blends short-term and long-term.

What to keep in full

What to trim or summarize