HyDE (Hypothetical Document Embeddings)

HyDE (Hypothetical Document Embeddings) is a retrieval technique with a counterintuitive core idea: generate a fake answer to the user's query with an LLM, embed that fake answer, and retrieve real documents similar to it. The LLM's hallucination serves as a better query representation than the original question. It works surprisingly well.

Why it works

Dense retrieval matches query embeddings to document embeddings. A short query is a weak representation: it matches other questions better than it matches answers. But your documents are answers. A hallucinated "answer-shaped" text is a stronger representation of what you're looking for, so it matches better.

The LLM's factual accuracy doesn't matter for retrieval, only its structural and vocabulary similarity to real documents.

The flow

1. User query: "How do I set up OAuth with Google?"
2. LLM prompt: "Write a short passage that would answer this question"
3. LLM output: "To set up OAuth with Google, first create a project in
   the Google Cloud Console. Then enable the Google+ API, configure
   OAuth consent, and generate client credentials. Use these credentials
   in your application's OAuth flow..."
4. Embed the LLM output (not the query)
5. Retrieve documents similar to the hypothetical passage
6. Pass real retrieved docs to final generation
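The six steps above can be sketched end to end. This is a toy, self-contained version: the LLM call is stubbed with a canned passage, and a bag-of-words vector stands in for a real dense embedding. The function names (`hyde_retrieve`, `generate_hypothetical`) are illustrative, not from any library.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical(query: str) -> str:
    # Stub standing in for the LLM call (steps 2-3); in practice, prompt a
    # small fast model to write a plausible answer passage.
    return ("To set up OAuth with Google, create a project in the Google "
            "Cloud Console, configure the OAuth consent screen, and "
            "generate client credentials for your application.")

def hyde_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    hypothetical = generate_hypothetical(query)  # steps 2-3
    q_vec = embed(hypothetical)                  # step 4: embed the fake answer
    ranked = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)),
                    reverse=True)
    return ranked[:k]                            # step 5: real docs go onward
```

Note that the original query is never embedded here; only the hypothetical passage is.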

When HyDE helps vs doesn't

HyDE helps most on short, underspecified queries, where the raw question is a poor shape-match for answer-shaped documents. The gain shrinks as queries grow longer and more detailed, since long queries already read like answers. It can hurt when the hypothetical passage drifts off-topic or uses vocabulary the corpus doesn't, which is one reason to combine it with the original query (see Variants).

Cost and latency

HyDE adds one LLM call before retrieval. With a small fast model, this is 100-500ms. For real-time RAG, weigh the latency cost against quality gain.

Mitigation: use a cheap, fast model (Haiku, GPT-4o-mini, Flash) for the hypothetical generation. Accuracy of the hallucination isn't critical; structural similarity is.

Variants

Query + HyDE combined

Retrieve using both the original query embedding and the HyDE embedding. Fuse results with RRF. Often better than either alone.
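The fusion step can be sketched as follows, assuming each retriever returns a ranked list of document IDs; the constant k=60 is the conventional RRF default.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each document scores 1 / (k + rank) per list
    # it appears in; documents ranked well by either retriever float up.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused = rrf_fuse([query_results, hyde_results])
```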

Multiple hypothetical documents

Generate 3-5 hypothetical passages, embed each, retrieve with each, union. More coverage but more cost.
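The union step might look like this; the retriever is passed in as a callable so the snippet stays self-contained, and the names are illustrative.

```python
from typing import Callable

def union_retrieve(hypotheses: list[str],
                   retrieve: Callable[[str, int], list[str]],
                   k: int = 5) -> list[str]:
    # Retrieve with each hypothetical passage, then union the result
    # lists, deduplicating while preserving first-seen order.
    seen: set[str] = set()
    merged: list[str] = []
    for h in hypotheses:
        for doc in retrieve(h, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```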

HyDE with reasoning

Prompt the LLM to first reason about what kinds of documents would answer the question, then write one. Produces higher-quality hypothetical passages for complex queries.
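One way to phrase this as a prompt; the wording below is illustrative, not a canonical template.

```python
def reasoning_hyde_messages(question: str) -> list[dict]:
    # Two-step instruction: reason about the target document first,
    # then write the passage. (Illustrative wording, not canonical.)
    system = (
        "First, think briefly about what kind of document would answer the "
        "user's question: its genre, vocabulary, and level of detail. Then "
        "write a 3-5 sentence passage that could be an excerpt from such a "
        "document. End your reply with only the passage."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```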

The hallucination risk

HyDE generates fake content. The concern: what if the fake passage anchors the user's understanding or biases downstream generation?

The safeguard: the fake passage is only used for retrieval, never shown to the user and never passed to the final generation step. Final generation uses the real retrieved documents. HyDE is invisible to everything after retrieval.

Implementation

SYSTEM: Given a user question, write a 3-5 sentence passage that would
be a plausible excerpt from a document that answers the question. Use
the vocabulary, structure, and level of detail typical of technical
documentation. Don't worry about factual accuracy; focus on writing
text that matches the style of real documentation.

USER: [user question]

ASSISTANT: [hypothetical passage]

Then: retrieval_query = embed(hypothetical_passage)
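The prompt above, wired up as a message list for a chat-completion-style API. The actual client call is omitted; `hyde_messages` and the commented-out `llm` helper are assumed names, not a specific library's API.

```python
HYDE_SYSTEM = (
    "Given a user question, write a 3-5 sentence passage that would be a "
    "plausible excerpt from a document that answers the question. Use the "
    "vocabulary, structure, and level of detail typical of technical "
    "documentation. Don't worry about factual accuracy; focus on writing "
    "text that matches the style of real documentation."
)

def hyde_messages(question: str) -> list[dict]:
    return [{"role": "system", "content": HYDE_SYSTEM},
            {"role": "user", "content": question}]

# hypothetical_passage = llm(hyde_messages(question))  # small fast model
# retrieval_query = embed(hypothetical_passage)
```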

Measured gains

On benchmarks, HyDE typically improves retrieval metrics by 5-15% over standard dense retrieval for short queries. The gain shrinks as query length grows (long queries are already answer-shaped).

The mental model

Dense retrieval matches "the shape of what you're asking for" to "the shape of documents in the corpus." A short query is a poor shape-match. An answer passage is a great shape-match. HyDE manufactures the latter from the former. That's it.

What to do with this