Query rewriting

Users don't phrase queries the way documents do. They use pronouns, they're brief, they ask follow-ups that depend on context. Query rewriting uses an LLM to rewrite the user's query into a form better suited for retrieval. It's one of the lowest-effort quality wins in modern RAG.

The problem query rewriting solves

Ambiguous pronouns

User: "how does it handle rate limiting?"

The embedding of "how does it handle rate limiting" carries no information about what "it" refers to. Retrieved chunks may cover rate limiting in general and miss the specific product the user meant.

Conversational follow-ups

Turn 1: "Tell me about Pinecone's pricing."

Turn 2: "What about performance?"

Turn 2 alone embeds as a generic performance question. With rewriting: "How does Pinecone perform?"

Short queries

"refund policy" vs "what is our customer refund policy including eligibility, timeframe, and process"

The expanded version retrieves better because it provides more signal to match against.

Mismatched vocabulary

User says "login not working"; docs say "authentication failures." Rewriting to include synonyms bridges the gap.

The rewriting patterns

1. Contextualization

Rewrite the query to resolve pronouns and implicit context from conversation history.

Given the conversation history:
- User: Tell me about Pinecone pricing
- Assistant: [response about Pinecone]
- User: What about performance?

Rewrite the last user query as a standalone search query.

→ "How does Pinecone perform?"
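A minimal sketch of how the contextualization prompt above could be assembled in code. The function names and the `complete` callable (a stand-in for whatever LLM client you use) are assumptions for illustration, not part of any specific library.

```python
from typing import Callable


def build_contextualization_prompt(history: list[tuple[str, str]], query: str) -> str:
    """Format prior turns plus the latest user query into a rewrite prompt."""
    lines = ["Given the conversation history:"]
    for role, text in history:
        lines.append(f"- {role}: {text}")
    lines.append(f"- User: {query}")
    lines.append("")
    lines.append("Rewrite the last user query as a standalone search query.")
    return "\n".join(lines)


def contextualize(
    history: list[tuple[str, str]],
    query: str,
    complete: Callable[[str], str],  # hypothetical: prompt in, model text out
) -> str:
    """Ask the model for a standalone version of the latest query."""
    return complete(build_contextualization_prompt(history, query)).strip()
```

In use, `history` holds the earlier turns as (role, text) pairs, and the returned string is what gets embedded and searched instead of the raw follow-up.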

2. Query expansion

Add synonyms, related terms, or rephrasings to improve lexical matching.
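As a rough sketch, expansion can be treated as appending extra terms to the original query while dropping duplicates. Where the extra terms come from (an LLM call, a synonym table) is left open here; `expand_query` and its parameters are hypothetical names.

```python
def expand_query(query: str, extra_terms: list[str]) -> str:
    """Append expansion terms to the query, skipping case-insensitive duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    # Original terms come first so the user's own wording is always kept.
    for term in query.split() + extra_terms:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            merged.append(term)
    return " ".join(merged)


expanded = expand_query("login not working", ["authentication", "failures", "Login"])
print(expanded)  # login not working authentication failures
```

Keeping the original terms first is a design choice: the expansion adds lexical surface area without replacing what the user typed.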

"login issues" → "login authentication sign-in access issues problems failures"

3. Query decomposition

Break a compound query into sub-queries. Retrieve for each separately.

"What's our refund policy and how do I escalate a billing dispute?"
→ ["What is the refund policy?", "How to escalate a billing dispute?"]
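The sketch below illustrates why retrieving per sub-query can help. It uses a toy word-overlap retriever over a three-document corpus; the corpus, `toy_retrieve`, and `retrieve_decomposed` are all made up for illustration, and a real system would use its own retriever in their place.

```python
import re
from typing import Callable

# Hypothetical three-document corpus, for illustration only.
DOCS = {
    "refunds": "Refund policy: eligibility, timeframe, and process.",
    "billing": "Billing disputes: how to escalate a billing dispute.",
    "limits": "Rate limiting: request quotas and throttling.",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    scores = {doc_id: len(tokens(query) & tokens(text)) for doc_id, text in DOCS.items()}
    ranked = sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])
    return ranked[:k]


def retrieve_decomposed(
    sub_queries: list[str],
    retrieve: Callable[[str], list[str]] = toy_retrieve,
) -> dict[str, list[str]]:
    """Run retrieval once per sub-query and keep the results grouped."""
    return {q: retrieve(q) for q in sub_queries}
```

With this toy setup and k=1, the compound question only surfaces the billing document, while the two sub-queries surface the refund and billing documents respectively. Keeping results grouped by sub-query also lets the generation step answer each part from its own context.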

4. Step-back questions

Generate a more general version of the query to retrieve broader context.

"Why did my OAuth token expire after 2 hours?"
→ step-back: "How does OAuth token expiration work?"

5. Hypothetical Document Embeddings (HyDE)

Generate a hypothetical answer to the query, embed that answer instead of the query, and retrieve the documents closest to it. See HyDE.

Prompt patterns for rewriting

SYSTEM: You are a search query optimizer. Given a user query, rewrite
it as a clear, self-contained search query that would retrieve the
most relevant documents. Expand abbreviations, resolve pronouns, and
include relevant synonyms. Return only the rewritten query.

USER: [raw query]
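One way to wire this prompt into code, sketched with a provider-agnostic `chat` callable. The role/content message shape is the one most chat APIs accept, but the exact client call is an assumption you would adapt to your provider; `build_messages` and `rewrite_query` are hypothetical helper names.

```python
from typing import Callable

SYSTEM_PROMPT = (
    "You are a search query optimizer. Given a user query, rewrite "
    "it as a clear, self-contained search query that would retrieve the "
    "most relevant documents. Expand abbreviations, resolve pronouns, and "
    "include relevant synonyms. Return only the rewritten query."
)

Message = dict[str, str]


def build_messages(raw_query: str) -> list[Message]:
    """Pair the fixed system prompt with the raw user query."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": raw_query},
    ]


def rewrite_query(raw_query: str, chat: Callable[[list[Message]], str]) -> str:
    """Call the model and clean up its reply.

    `chat` is a stand-in for your LLM client: messages in, reply text out.
    """
    # Models sometimes wrap the answer in quotes or trailing whitespace.
    rewritten = chat(build_messages(raw_query)).strip().strip('"')
    # Fall back to the raw query if the model returns nothing usable.
    return rewritten or raw_query
```

Falling back to the raw query on an empty reply means a rewriting failure degrades to ordinary retrieval rather than an empty search.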

Using a small model

Query rewriting doesn't need your best reasoning model. A fast, cheap model (GPT-4o-mini, Claude Haiku, Gemini Flash) does this well. Expect latency of roughly 50-200 ms per rewrite, and a cost that is a fraction of the generation step.

The multi-query pattern

Generate multiple rewrites and run retrieval for each. Union and dedupe. More coverage, more cost. See multi-query + fusion.
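The union-and-dedupe step can be sketched as follows. `retrieve` is a placeholder for your own retriever and is assumed to return document IDs; this is a plain union that keeps first-seen order, not a rank-fusion method.

```python
from typing import Callable, Iterable


def multi_query_retrieve(
    queries: Iterable[str],
    retrieve: Callable[[str], list[str]],  # hypothetical: query in, doc IDs out
) -> list[str]:
    """Retrieve for every rewrite and return the deduplicated union."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for doc_id in retrieve(query):
            # Keep only the first occurrence of each document.
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

Each extra rewrite adds one retrieval call, which is where the additional cost comes from. The first-seen order here is not a relevance ranking, so a fusion or reranking step would normally follow.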

When to skip query rewriting

Diagnostic: is rewriting helping?

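Compare retrieval quality with and without rewriting on the same eval set. Rewriting should improve hit rate, especially on the query types described earlier: ambiguous pronouns, conversational follow-ups, short queries, and mismatched vocabulary.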

If rewriting doesn't help, your initial retrieval already handles query variety well (or your rewriting prompt is wrong). Test both hypotheses.
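A minimal sketch of that comparison, assuming an eval set of (query, relevant document ID) pairs and a `retrieve` function that returns ranked document IDs. All names here are hypothetical, and hit rate at k is just one possible metric.

```python
from typing import Callable, Optional

EvalSet = list[tuple[str, str]]  # (query, ID of the document that should be found)


def hit_rate(
    eval_set: EvalSet,
    retrieve: Callable[[str], list[str]],
    rewrite: Optional[Callable[[str], str]] = None,
    k: int = 5,
) -> float:
    """Fraction of queries whose relevant document appears in the top k results."""
    if not eval_set:
        return 0.0
    hits = 0
    for query, relevant_id in eval_set:
        # Apply the rewriter only when one is supplied, so the same
        # function measures both the baseline and the rewritten run.
        search_query = rewrite(query) if rewrite else query
        if relevant_id in retrieve(search_query)[:k]:
            hits += 1
    return hits / len(eval_set)
```

Running `hit_rate(eval_set, retrieve)` and `hit_rate(eval_set, retrieve, rewrite=my_rewriter)` on the same data gives the two numbers to compare. Breaking the eval set down by query type shows where rewriting helps and where it does not.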

Preserving user intent

The failure mode of query rewriting: the rewrite subtly changes what the user asked. Guard against this by:
