Fetch-relevant-context vs stuff-everything-in-context.
| | RAG | Long-context window |
|---|---|---|
| Cost per query | Low | High (full context billed) |
| Latency | Lower | Higher |
| Quality | Depends on retrieval | Can degrade mid-context |
| Freshness | Easy to refresh | Same |

Use RAG for: large corpora, recurring queries, cost sensitivity.

Use long-context for: one-off analysis of a single long document.

For production systems, default to RAG. Long-context fits specific use cases; it is not a general replacement.
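
To make the cost row concrete, here is a rough sketch that compares input tokens billed per query under each approach. All numbers (query size, chunk size, `top_k`, corpus size, price per million tokens) are hypothetical placeholders, not measurements or real pricing, and the helper names are made up for illustration. It also ignores output tokens, embedding and index costs, and any prompt caching a provider might offer.

```python
def tokens_billed_rag(query_tokens: int, chunk_tokens: int, top_k: int) -> int:
    """Input tokens for one RAG query: the question plus the top_k retrieved chunks."""
    return query_tokens + top_k * chunk_tokens


def tokens_billed_long_context(query_tokens: int, corpus_tokens: int) -> int:
    """Input tokens for one long-context query: the question plus the whole corpus."""
    return query_tokens + corpus_tokens


def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost of the input tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload and price; substitute your own figures.
QUERY_TOKENS = 50
CHUNK_TOKENS = 500
TOP_K = 5
CORPUS_TOKENS = 200_000
PRICE_PER_MILLION = 3.0  # assumed flat input price, not a real quote

rag_tokens = tokens_billed_rag(QUERY_TOKENS, CHUNK_TOKENS, TOP_K)
long_tokens = tokens_billed_long_context(QUERY_TOKENS, CORPUS_TOKENS)

print(f"RAG:          {rag_tokens} tokens, ${input_cost(rag_tokens, PRICE_PER_MILLION):.4f} per query")
print(f"Long-context: {long_tokens} tokens, ${input_cost(long_tokens, PRICE_PER_MILLION):.4f} per query")
```

Under these assumptions the gap scales with corpus size and query volume, which is why recurring queries over a large corpus favor RAG while a one-off pass over a single document may not justify building an index.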