HyDE (Hypothetical Document Embeddings)

HyDE (Hypothetical Document Embeddings) is a retrieval technique with a counterintuitive core idea: generate a fake answer to the user's query with an LLM, embed that fake answer, and retrieve real documents similar to it. The LLM's hallucination serves as a better query representation than the original question. It works surprisingly well.

Why it works

Dense retrieval matches query embeddings to document embeddings. A short query is a weak representation: it matches other questions better than it matches answers. But your documents are answers. A hallucinated "answer-shaped" text is a stronger representation of what you're looking for, so it matches better.

The LLM's factual accuracy doesn't matter for retrieval, only its structural and vocabulary similarity to real documents.

The flow

1. User query: "How do I set up OAuth with Google?"
2. LLM prompt: "Write a short passage that would answer this question"
3. LLM output: "To set up OAuth with Google, first create a project in
   the Google Cloud Console. Then enable the Google+ API, configure
   OAuth consent, and generate client credentials. Use these credentials
   in your application's OAuth flow..."
4. Embed the LLM output (not the query)
5. Retrieve documents similar to the hypothetical passage
6. Pass real retrieved docs to final generation
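The six steps above can be sketched end to end. This is a toy, self-contained version: the LLM call is stubbed with a canned passage, and a bag-of-words vector stands in for a real dense embedding. The function names (`hyde_retrieve`, `generate_hypothetical`) are illustrative, not from any library.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical(query: str) -> str:
    # Stub standing in for the LLM call (steps 2-3); in practice, prompt a
    # small fast model to write a plausible answer passage.
    return ("To set up OAuth with Google, create a project in the Google "
            "Cloud Console, configure the OAuth consent screen, and "
            "generate client credentials for your application.")

def hyde_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    hypothetical = generate_hypothetical(query)  # steps 2-3
    q_vec = embed(hypothetical)                  # step 4: embed the fake answer
    ranked = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)),
                    reverse=True)
    return ranked[:k]                            # step 5: real docs go onward
```

Note that the original query is never embedded here; only the hypothetical passage is.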

When HyDE helps vs doesn't

HyDE helps most on short, underspecified queries, where the raw question is a poor shape-match for answer-shaped documents. The gain shrinks as queries grow longer and more detailed, since long queries already read like answers. It can hurt when the hypothetical passage drifts off-topic or uses vocabulary the corpus doesn't, which is one reason to combine it with the original query (see Variants).

Cost and latency

HyDE adds one LLM call before retrieval. With a small fast model, this is 100-500ms. For real-time RAG, weigh the latency cost against quality gain.

Mitigation: use a cheap, fast model (Haiku, GPT-4o-mini, Flash) for the hypothetical generation. Accuracy of the hallucination isn't critical; structural similarity is.

Variants

Query + HyDE combined

Retrieve using both the original query embedding and the HyDE embedding. Fuse results with RRF. Often better than either alone.
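The fusion step can be sketched as follows, assuming each retriever returns a ranked list of document IDs; the constant k=60 is the conventional RRF default.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each document scores 1 / (k + rank) per list
    # it appears in; documents ranked well by either retriever float up.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused = rrf_fuse([query_results, hyde_results])
```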

Multiple hypothetical documents

Generate 3-5 hypothetical passages, embed each, retrieve with each, union. More coverage but more cost.
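The union step might look like this; the retriever is passed in as a callable so the snippet stays self-contained, and the names are illustrative.

```python
from typing import Callable

def union_retrieve(hypotheses: list[str],
                   retrieve: Callable[[str, int], list[str]],
                   k: int = 5) -> list[str]:
    # Retrieve with each hypothetical passage, then union the result
    # lists, deduplicating while preserving first-seen order.
    seen: set[str] = set()
    merged: list[str] = []
    for h in hypotheses:
        for doc in retrieve(h, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```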

HyDE with reasoning

Prompt the LLM to first reason about what kinds of documents would answer the question, then write one. Produces higher-quality hypothetical passages for complex queries.
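One way to phrase this as a prompt; the wording below is illustrative, not a canonical template.

```python
def reasoning_hyde_messages(question: str) -> list[dict]:
    # Two-step instruction: reason about the target document first,
    # then write the passage. (Illustrative wording, not canonical.)
    system = (
        "First, think briefly about what kind of document would answer the "
        "user's question: its genre, vocabulary, and level of detail. Then "
        "write a 3-5 sentence passage that could be an excerpt from such a "
        "document. End your reply with only the passage."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```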

The hallucination risk

HyDE generates fake content. The concern: what if the fake passage anchors the user's understanding or biases downstream generation?

The safeguard: the fake passage is only used for retrieval, never shown to the user and never passed to the final generation step. Final generation uses the real retrieved documents. HyDE is invisible to everything after retrieval.

Implementation

SYSTEM: Given a user question, write a 3-5 sentence passage that would
be a plausible excerpt from a document that answers the question. Use
the vocabulary, structure, and level of detail typical of technical
documentation. Don't worry about factual accuracy; focus on writing
text that matches the style of real documentation.

USER: [user question]

ASSISTANT: [hypothetical passage]

Then: retrieval_query = embed(hypothetical_passage)
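The prompt above, wired up as a message list for a chat-completion-style API. The actual client call is omitted; `hyde_messages` and the commented-out `llm` helper are assumed names, not a specific library's API.

```python
HYDE_SYSTEM = (
    "Given a user question, write a 3-5 sentence passage that would be a "
    "plausible excerpt from a document that answers the question. Use the "
    "vocabulary, structure, and level of detail typical of technical "
    "documentation. Don't worry about factual accuracy; focus on writing "
    "text that matches the style of real documentation."
)

def hyde_messages(question: str) -> list[dict]:
    return [{"role": "system", "content": HYDE_SYSTEM},
            {"role": "user", "content": question}]

# hypothetical_passage = llm(hyde_messages(question))  # small fast model
# retrieval_query = embed(hypothetical_passage)
```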

Measured gains

On benchmarks, HyDE typically improves retrieval metrics by 5-15% over standard dense retrieval for short queries. The gain shrinks as query length grows (long queries are already answer-shaped).

The mental model

Dense retrieval matches "the shape of what you're asking for" to "the shape of documents in the corpus." A short query is a poor shape-match. An answer passage is a great shape-match. HyDE manufactures the latter from the former. That's it.

What to do with this