Procedural memory is the agent remembering how to do things. Not facts, not past events. Skills. "Here's the recipe that worked for this kind of task." "Here are the defaults that made it succeed." An agent with procedural memory gets better at recurring tasks over time because it doesn't reason from scratch every run. It starts from a known-good recipe and adapts.
A procedure is a named, reusable recipe for a class of task. It includes the steps, the default parameters, the constraints, and the signals that a given situation matches it.
{
"procedure_id": "deploy_staging",
"description": "Deploy a branch to staging",
"when_to_use": "User says 'deploy to staging' or 'push this to staging'",
"steps": [
"Confirm branch is up to date with main",
"Run tests locally",
"Push to remote",
"Trigger staging deploy workflow",
"Verify at staging.example.com"
],
"defaults": {"timeout_minutes": 10},
"constraints": "Don't deploy on Fridays after 3pm ET"
}
The first time you ask the agent to do a task, it explores: tries things, hits dead ends, eventually succeeds. Post-task, you (or the agent) distill that successful trajectory into a procedure. Next time, the agent retrieves the procedure and follows it (or adapts when the situation is different).
At the start of a task, the agent searches the procedure library: "Does any existing procedure match this task?" If yes, it retrieves the steps and uses them as a scaffold. If no, it plans from scratch and maybe writes a new procedure after.
The retrieval can be keyword-based (simple: match on description), vector-based (semantic: current task embedding vs procedure embeddings), or explicit (the user says "use the staging deploy procedure").
Month 1. Agent sees 50 "my invoice is wrong" tickets. It figures out the right flow: check the customer record, pull the latest invoice, compare line items to the order history, identify the discrepancy, either issue a credit or escalate if over $200.
Post-ticket distillation. Agent writes a procedure resolve_invoice_discrepancy with those five steps and a "escalate above $200" constraint.
Month 2. A new "my invoice is wrong" ticket arrives. Agent retrieves resolve_invoice_discrepancy, follows it directly. Resolves in 4 tool calls instead of 12. Success rate climbs from 70% to 95%.
Same model. Same tools. The only thing that changed is the agent started with a known recipe instead of exploring from scratch.
Procedures drift from reality. Build in review: