Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

HyDE — Hypothetical Document Embeddings

Build HyDE pipelines that use LLMs to generate hypothetical answers before retrieval. Bridge the query-document semantic gap for better RAG results.

Learning Goals

  • Build HyDE pipelines for bridging semantic gaps
  • Implement two-step hypothetical generation and retrieval

Vector search works by comparing the embedding of a query with the embeddings of documents. However, queries and documents usually look different: a query is typically a short question, while a document is a longer, informative passage. This asymmetry can lead to poor matches. HyDE (Hypothetical Document Embeddings) bridges the gap by asking an LLM to generate a hypothetical ("fake") answer to the user's question, then using that answer, instead of the original question, as the query for the vector database.

Because the search text is now an answer-like passage rather than a short question, it tends to land closer to relevant documents in embedding space, which often improves semantic matching.

Learning Goals

  • Explain the semantic gap between queries and documents in RAG.
  • Define the two-step HyDE process (Generation → Retrieval).
  • Implement a HyDE pipeline using LangChain.

Core Concepts

1. The Asymmetry Problem

  • Query: "How do I fix a leaky faucet?" (Vector represents a need).
  • Document: "To repair a dripping tap, first turn off the main water valve..." (Vector represents a solution).

In high-dimensional space, the "Need" vector might be far from the "Solution" vector.
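
The gap between a question and its answer can be made concrete with a small experiment. The sketch below is illustrative only: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the hypothetical answer is hand-written rather than generated by an LLM. Real dense embeddings capture synonyms such as "faucet" and "tap", so the gap they show is usually smaller than in this toy.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "How do I fix a leaky faucet?"
document = "To repair a dripping tap, first turn off the main water valve."
# Hand-written example of the kind of answer an LLM might produce.
hypothetical = "To repair a dripping tap, turn off the water valve and replace the worn washer."

query_score = cosine(toy_embed(query), toy_embed(document))
hyde_score = cosine(toy_embed(hypothetical), toy_embed(document))

print(f"query vs document:        {query_score:.2f}")  # 0.11
print(f"hypothetical vs document: {hyde_score:.2f}")   # 0.77
```

The question shares only the word "a" with the document, while the answer-shaped text shares most of its vocabulary, which is the asymmetry HyDE tries to remove.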

2. The HyDE Solution

HyDE transforms the "Need" into a "Solution" before searching.

  1. Generate: The LLM receives the query and generates a plausible (but potentially hallucinated) answer.
  2. Embed: The hallucinated answer is embedded into a vector.
  3. Retrieve: The database returns real documents that are semantically similar to the hypothetical answer.
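
The three steps above can be sketched without any external service. This is a minimal illustration under stated assumptions, not a production implementation: `toy_embed` is a hypothetical word-count embedding, `stub_llm` is a hypothetical stand-in that returns a canned answer instead of calling a model, and the three-document corpus is contrived so that the effect is visible.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query: str, generate: Callable[[str], str],
                corpus: list[str], k: int = 1) -> list[str]:
    hypothetical = generate(query)            # 1. Generate a hypothetical answer
    search_vector = toy_embed(hypothetical)   # 2. Embed the hypothetical answer
    ranked = sorted(corpus,
                    key=lambda doc: cosine(search_vector, toy_embed(doc)),
                    reverse=True)
    return ranked[:k]                         # 3. Retrieve real documents


def stub_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call; always returns a canned answer."""
    return "To repair a dripping tap, shut the water valve and replace the washer."


corpus = [
    "To repair a dripping tap, first turn off the main water valve.",
    "How do I choose a paint colour for a small bathroom?",
    "A guide to fixing a flat bicycle tyre at home.",
]
question = "How do I fix a leaky faucet?"

# Passing the identity function as the "generator" gives plain query search.
baseline = hyde_search(question, lambda q: q, corpus)
results = hyde_search(question, stub_llm, corpus)
print("plain search:", baseline[0])
print("HyDE search: ", results[0])
```

In this toy setup the plain search picks the paint question because it is also phrased as a question, while the HyDE search returns the plumbing document. Note that the documents returned are always real entries from the corpus; the hypothetical answer is only used to build the search vector.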

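HyDE Workflow

Query → LLM generates a hypothetical answer → hypothetical answer is embedded → vector search → real documents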

Implementing HyDE with LangChain

  1. Step 1

    Set up the chat model and the embedding model:

    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    llm = ChatOpenAI(temperature=0)
    embeddings = OpenAIEmbeddings()
  2. Step 2

    Build a chain that generates the fake document:

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser

    hyde_prompt = ChatPromptTemplate.from_template(
        "Please write a detailed technical paragraph answering this question: {question}"
    )
    hyde_chain = hyde_prompt | llm | StrOutputParser()
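  3. Step 3

    Use a custom function or a chain to perform the two-step search:

    def hyde_retriever(query, k=3):
        # `vector_store` is assumed to be an existing LangChain vector store
        # (e.g. FAISS or Chroma) built with the embedding model from Step 1.
        # 1. Generate the hypothetical document
        fake_doc = hyde_chain.invoke({"question": query})
        # 2. Search using the hypothetical document as the query text;
        #    the vector store embeds it internally
        return vector_store.similarity_search(fake_doc, k=k)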
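
The function in Step 3 reads the chain and the vector store from the enclosing scope. One possible variation, sketched below, passes both in as arguments so the retrieval logic can be exercised without an API key. The names `hyde_retrieve`, `StubChain`, and `StubStore` are hypothetical; the stubs only imitate the `invoke` and `similarity_search` methods used in the steps above and are not LangChain classes.

```python
def hyde_retrieve(query, chain, store, k=3):
    """Two-step HyDE search with the chain and the store injected as arguments."""
    # 1. Generate the hypothetical document
    fake_doc = chain.invoke({"question": query})
    # 2. Search with the hypothetical document instead of the raw query
    return store.similarity_search(fake_doc, k=k)


class StubChain:
    """Hypothetical stand-in for the generation chain; returns canned text."""

    def invoke(self, inputs):
        return f"A detailed paragraph answering: {inputs['question']}"


class StubStore:
    """Hypothetical stand-in for a vector store; records the text it was searched with."""

    def __init__(self, docs):
        self.docs = docs
        self.last_query = None

    def similarity_search(self, query, k=4):
        self.last_query = query
        return self.docs[:k]


store = StubStore(["doc-1", "doc-2", "doc-3", "doc-4"])
found = hyde_retrieve("How do I fix a leaky faucet?", StubChain(), store, k=3)

print(found)             # ['doc-1', 'doc-2', 'doc-3']
print(store.last_query)  # the generated text, not the original question
```

With real components, the same function would presumably be called with the chain from Step 2 and an actual vector store in place of the stubs.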

Example: Zero-Shot Domain Adaptation

If you ask a RAG system about a niche topic whose embedding model was never tuned for that domain, a basic search might fail. HyDE can work well here because the LLM can use its general knowledge to "imagine" what a relevant document would look like, which is often enough to steer the vector search toward the correct technical manual in your database.

Common Mistakes

  • Trusting the Hypothetical Doc: Never show the hypothetical document to the user. It may contain hallucinated details and serves purely as an internal map for retrieval. The final answer must always be grounded in the real documents found.
  • Using Weak LLMs: If the LLM generating the hypothetical answer is poor (e.g., it produces off-topic or incoherent text), the resulting vector will point the search at the wrong documents. Use a model capable of writing a plausible, on-topic answer for the generation step.
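
Both mistakes can be guarded against in code. The sketch below shows one possible approach, not a standard recipe: `choose_search_text` and `build_answer_prompt` are hypothetical helper names, and the `min_words` threshold is an arbitrary assumption. The length check only catches empty or truncated generations; it cannot tell whether a long answer is off-topic.

```python
def choose_search_text(query: str, hypothetical: str, min_words: int = 8) -> str:
    """Fall back to the raw query when the generated text is empty or very short."""
    text = hypothetical.strip()
    if len(text.split()) < min_words:  # heuristic threshold (assumption)
        return query
    return text


def build_answer_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Build the final prompt from retrieved documents only.

    The hypothetical document is deliberately not a parameter, so it can
    never leak into the context shown to the answering model.
    """
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "How do I fix a leaky faucet?"
good = "To repair a dripping tap, shut the water valve and replace the worn washer."
real_docs = ["To repair a dripping tap, first turn off the main water valve."]

print(choose_search_text(question, good))  # uses the hypothetical answer
print(choose_search_text(question, "  "))  # falls back to the question
print(build_answer_prompt(question, real_docs))
```

Keeping the hypothetical text out of the function signature is a design choice: the grounding rule is enforced by structure rather than by convention.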

Recap

  • HyDE aligns queries with documents by creating a hypothetical answer that acts as a "bridge".
  • It is well suited to broad or abstract questions where the query shares little wording with the relevant documents.
  • The pattern relies on vector proximity between two informative answers: the hypothetical one and the real one.

Knowledge Check

What is the primary purpose of generating a 'fake' document in HyDE?