Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Multi-Query Retriever

Use a multi-query retriever to generate multiple query variations from a single user question. This improves recall by covering different phrasings and angles of the same question.

Learning Goals

  • Implement multi-query retriever for query expansion
  • Improve recall by generating and combining multiple query variations

Multi-Query Retrieval

In a standard RAG system, retrieval relies on a single user query. However, users often phrase questions in ways that land far from the relevant documents in embedding space, even if the meaning is the same. Multi-Query Retrieval automates "Query Expansion" by using an LLM to generate multiple variations of the user's question from different perspectives.

By retrieving documents for all these variations, we significantly increase the surface area of our search and improve Recall (finding all relevant information).

Learning Goals

  • Explain how Multi-Query Retrieval overcomes the limitations of single-vector search.
  • Implement the MultiQueryRetriever in LangChain.
  • Configure custom prompts for query variation generation.

Core Concepts

1. The Retrieval Gap

Vector search is sensitive to specific wording. A query for "How to fix a flat tire" might be embedded far from "Emergency roadside assistance for punctured rubber," even though both call for the same answer.
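To make the scoring mechanism concrete, here is a minimal sketch of how a similarity score depends on wording. It assumes a toy bag-of-words vector as a stand-in for an embedding; this is not a real embedding model, and the helper names and the rephrased sentence are illustrative only. Real embeddings capture synonyms, so the gap is usually smaller than this toy suggests.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (NOT a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "How to fix a flat tire"
document = "Emergency roadside assistance for punctured rubber"
rephrased = "Emergency help to fix a punctured tire"  # hypothetical variation

# No shared words, so the toy score for the original query is zero.
gap_score = cosine_similarity(bag_of_words(query), bag_of_words(document))
# The rephrased query shares two words with the document and scores higher.
rephrased_score = cosine_similarity(bag_of_words(rephrased), bag_of_words(document))
```

The same idea motivates generating several phrasings: at least one of them is more likely to score well against the relevant document.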

2. Query Expansion Logic

The LLM takes the original query and generates 3-5 alternative versions.

  • Original: "What is the return policy?"
  • Variation 1: "How many days do I have to return an item?"
  • Variation 2: "Can I get a refund for a damaged product?"
  • Variation 3: "Steps to send back a purchase."
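The expansion step can be sketched as a prompt plus a line parser. This is a rough illustration, not LangChain's implementation: the prompt wording, the `expand_query` helper, and the `fake_llm` stand-in are all assumptions, used here so the example runs without a model.

```python
from typing import Callable, List

# Hypothetical prompt wording; a real system would tune this.
EXPANSION_PROMPT = (
    "Generate {n} different versions of the user question below, "
    "one per line, to help retrieve relevant documents.\n"
    "Question: {question}"
)


def expand_query(question: str, generate: Callable[[str], str], n: int = 3) -> List[str]:
    """Ask the LLM for n rephrasings and return them as a clean list."""
    raw = generate(EXPANSION_PROMPT.format(n=n, question=question))
    # One variation per line; skip blank lines the model may emit.
    variations = [line.strip() for line in raw.splitlines() if line.strip()]
    return variations[:n]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    return (
        "How many days do I have to return an item?\n"
        "\n"
        "Can I get a refund for a damaged product?\n"
        "Steps to send back a purchase.\n"
    )


variations = expand_query("What is the return policy?", fake_llm)
```

Passing the model in as a plain callable keeps the parsing logic testable on its own; in practice `generate` would wrap a chat model call.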

3. Result Union

The system performs a similarity search for every generated query and takes the Union of all results (removing duplicates). This ensures that if any variation hits a relevant document, it gets included in the final context.
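The union step can be sketched in a few lines. This is a simplified illustration under stated assumptions: documents are represented as plain string IDs, and `FAKE_INDEX` and `retrieve_union` are hypothetical names rather than part of any library. With real document objects, de-duplication would need a hashable key such as the content or an ID.

```python
from typing import Callable, Iterable, List


def retrieve_union(queries: Iterable[str], search: Callable[[str], List[str]]) -> List[str]:
    """Run one search per query and merge the hits, dropping duplicates."""
    seen = set()
    merged = []
    for query in queries:
        for doc in search(query):
            if doc not in seen:  # keep only the first occurrence
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical index: maps each query to the documents a search would return.
FAKE_INDEX = {
    "What is the return policy?": ["returns-overview"],
    "How many days do I have to return an item?": ["returns-overview", "return-window"],
    "Steps to send back a purchase.": ["shipping-labels", "return-window"],
}

docs = retrieve_union(list(FAKE_INDEX), lambda q: FAKE_INDEX.get(q, []))
```

Here five hits across three queries collapse to three unique documents, and the order of first appearance is preserved.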

Figure: Multi-Query Workflow

Implementing Multi-Query Retrieval

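  1. Step 1

    Multi-Query requires an LLM to generate the variations:

    from langchain_openai import ChatOpenAI
    from langchain_chroma import Chroma

    # LLM that writes the query variations; temperature=0 keeps them focused
    llm = ChatOpenAI(temperature=0)
    # vector_store is assumed to be a Chroma vector store created earlier
    retriever = vector_store.as_retriever()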
  2. Step 2

    Import and initialize the high-level wrapper:

    from langchain.retrievers.multi_query import MultiQueryRetriever

    advanced_retriever = MultiQueryRetriever.from_llm(
        retriever=retriever,
        llm=llm,
    )
  3. Step 3

    The retriever now handles LLM calls and union logic internally:

    query = "How do I optimize my RAG pipeline?"
    docs = advanced_retriever.invoke(query)

    print(f"Retrieved {len(docs)} unique documents using query expansion.")
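The learning goals also mention custom prompts for query variation generation. A minimal sketch of one way to do this is below, assuming the same import path as Step 2 (it may differ across LangChain versions) and that `from_llm` accepts a `prompt` argument whose template uses a `question` variable. The prompt wording and the `build_custom_retriever` helper are hypothetical.

```python
import logging

# Hypothetical prompt wording; adapt it to your domain.
CUSTOM_TEMPLATE = (
    "You are helping a retrieval system. Write three different versions of the "
    "user question below so that relevant documents are easier to find. "
    "Return one version per line and nothing else.\n"
    "Original question: {question}"
)


def build_custom_retriever(retriever, llm):
    """Wrap an existing retriever with a custom query-variation prompt."""
    # Imported here so the template above can be inspected without LangChain installed.
    from langchain_core.prompts import PromptTemplate
    from langchain.retrievers.multi_query import MultiQueryRetriever

    prompt = PromptTemplate(input_variables=["question"], template=CUSTOM_TEMPLATE)
    return MultiQueryRetriever.from_llm(retriever=retriever, llm=llm, prompt=prompt)


# Log the generated variations so they can be checked for drift.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
```

With the `retriever` and `llm` objects from Step 1, `build_custom_retriever(retriever, llm).invoke(query)` would be used the same way as in Step 3. Asking for one variation per line matters because the generated text is split on newlines.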

Example: Customer Support Expansion

If a customer asks "Why is my internet slow?", a single query might only find articles about "bandwidth." Multi-Query might generate variations like "troubleshoot slow connection," "router placement tips," and "isp service outages," leading to a much more comprehensive answer.

Common Mistakes

  • Using a High Temperature: If the LLM's temperature is too high, it might generate creative but irrelevant variations, leading to "Semantic Drift" and pulling in noisy context. Always keep temperature at 0 for query expansion.
  • Ignoring Token Costs: Since Multi-Query generates N query variations with an LLM and then performs N searches, it is more expensive and slower than basic RAG. Use it for complex queries where recall is more important than cost.
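The cost point can be made concrete with a little arithmetic. This is a rough sketch, assuming a single LLM call produces all N variations and each query returns up to k documents; the `expansion_overhead` helper and its field names are hypothetical.

```python
def expansion_overhead(n_variations: int, k: int, include_original: bool = False) -> dict:
    """Rough per-question overhead of multi-query retrieval versus one search."""
    searches = n_variations + (1 if include_original else 0)
    return {
        "llm_calls": 1,                 # assumed: one call produces all variations
        "vector_searches": searches,    # one search per query
        "max_documents": searches * k,  # upper bound before de-duplication
    }


overhead = expansion_overhead(n_variations=3, k=4)
```

With three variations and k = 4, a question that basic RAG answers with one search now needs one extra LLM call and three searches, and up to twelve documents may reach the context before duplicates are removed.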

Recap

  • Multi-Query Retrieval uses an LLM to "triangulate" relevant documents from multiple angles.
  • It is an effective way to improve Recall when users phrase questions differently from the documents that answer them.
  • LangChain's MultiQueryRetriever automates the generation and union logic seamlessly.

Knowledge Check


What is the primary metric that Multi-Query Retrieval aims to improve?
