HyDE (Hypothetical Document Embeddings) is a retrieval technique with a counterintuitive core idea: generate a fake answer to the user's query with an LLM, embed that fake answer, and retrieve real documents similar to it. The LLM's hallucination serves as a better query representation than the original question. It works surprisingly well.
Dense retrieval matches query embeddings to document embeddings. A short query is a weak representation: it matches other questions better than it matches answers. But your documents are answers. A hallucinated, answer-shaped text is a stronger representation of what you're looking for, so it matches better.
The LLM's factual accuracy doesn't matter for retrieval, only its structural and vocabulary similarity to real documents.
1. User query: "How do I set up OAuth with Google?"
2. LLM prompt: "Write a short passage that would answer this question."
3. LLM output: "To set up OAuth with Google, first create a project in the Google Cloud Console. Then enable the Google+ API, configure OAuth consent, and generate client credentials. Use these credentials in your application's OAuth flow..."
4. Embed the LLM output (not the query).
5. Retrieve documents similar to the hypothetical passage.
6. Pass the real retrieved docs to final generation.
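The steps above can be sketched end to end. This is a toy illustration, not a production implementation: `embed` is a bag-of-words stand-in for a real dense encoder, and `generate_hypothetical` is a hardcoded stand-in for the LLM call.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical(query: str) -> str:
    # Stand-in for the LLM call ("write a short passage that would answer this").
    return ("To set up OAuth with Google, create a project in the Google Cloud "
            "Console, configure the OAuth consent screen, and generate client "
            "credentials for your application.")

docs = [
    "Create a project in the Google Cloud Console, then configure the OAuth "
    "consent screen and generate client credentials.",
    "Our quarterly revenue grew due to strong cloud adoption.",
]

query = "How do I set up OAuth with Google?"
# Embed the fake answer, not the query, and rank real docs against it.
hyde_vec = embed(generate_hypothetical(query))
ranked = sorted(docs, key=lambda d: cosine(hyde_vec, embed(d)), reverse=True)
print(ranked[0])  # the OAuth doc should rank first
```

Even with this crude embedding, the hypothetical passage shares far more vocabulary with the OAuth documentation than the short question does, which is the whole trick.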
HyDE adds one LLM call before retrieval. With a small fast model, this is 100-500ms. For real-time RAG, weigh the latency cost against quality gain.
Mitigation: use the cheapest fast model (Haiku, GPT-4o-mini, Flash) for the hypothetical generation. Accuracy of the hallucination isn't critical; structural similarity is.
Retrieve using both the original query embedding and the HyDE embedding. Fuse results with RRF. Often better than either alone.
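The fusion step is a few lines. A minimal sketch of Reciprocal Rank Fusion, assuming each retriever returns an ordered list of doc IDs (the doc IDs and rankings here are hypothetical):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    # k=60 is the conventional default; it dampens the advantage of rank 1.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# One ranking from the raw query embedding, one from the HyDE embedding.
query_ranking = ["doc_a", "doc_b", "doc_c"]
hyde_ranking = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([query_ranking, hyde_ranking])
print(fused)
```

Docs that appear high in both lists float to the top; docs found by only one retriever still survive, which is why the fusion often beats either list alone.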
Generate 3-5 hypothetical passages, embed each, retrieve with each, union. More coverage but more cost.
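A sketch of the multi-hypothetical variant, with hypothetical `generate` and `retrieve` callables standing in for the LLM and the vector store:

```python
def multi_hyde_retrieve(query, generate, retrieve, n=3, top_k=2):
    # Generate n hypothetical passages, retrieve with each, union the results,
    # deduplicating while preserving first-seen order.
    seen, results = set(), []
    for passage in (generate(query) for _ in range(n)):
        for doc in retrieve(passage, top_k):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results

# Toy stand-ins: each "generation" yields a different phrasing, and each
# phrasing retrieves different (overlapping) doc IDs.
phrasings = iter(["p1", "p2", "p3"])
fake_retrievals = {"p1": ["d1", "d2"], "p2": ["d2", "d3"], "p3": ["d1", "d4"]}

docs = multi_hyde_retrieve(
    "How do I set up OAuth?",
    generate=lambda q: next(phrasings),
    retrieve=lambda p, k: fake_retrievals[p][:k],
)
print(docs)
```

In a real system you'd sample the passages at nonzero temperature so they actually differ, and you might feed the per-passage rankings into RRF instead of a plain union.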
Prompt the LLM to first reason about what kinds of documents would answer the question, then write one. Produces higher-quality hypothetical passages for complex queries.
HyDE generates fake content. The concern: what if the fake passage anchors the user's understanding or biases downstream generation?
The safeguard: the fake passage is only used for retrieval, never shown to the user and never passed to the final generation step. Final generation uses the real retrieved documents. HyDE is invisible to everything after retrieval.
SYSTEM: Given a user question, write a 3-5 sentence passage that would be a plausible excerpt from a document that answers the question. Use the vocabulary, structure, and level of detail typical of technical documentation. Don't worry about factual accuracy; focus on writing text that matches the style of real documentation.

USER: [user question]

ASSISTANT: [hypothetical passage]
Then: retrieval_query = embed(hypothetical_passage)
On benchmarks, HyDE typically improves retrieval metrics by 5-15% over standard dense retrieval for short queries. The gain shrinks as query length grows (long queries are already answer-shaped).
Dense retrieval matches "the shape of what you're asking for" to "the shape of documents in the corpus." A short query is a poor shape-match. An answer passage is a great shape-match. HyDE manufactures the latter from the former. That's it.
Next: Multi-query + fusion.