Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

RAG Fusion — Multi-Query Fusion

Implement RAG Fusion: generate multiple query variations from a single question, retrieve for each, and fuse results using Reciprocal Rank Fusion (RRF) for improved recall.

Learning Goals

Implement RAG Fusion with multi-query generation
Apply Reciprocal Rank Fusion to combine retrieval results

Multi-Query retrieval (Module 6) improves recall by generating variations of a query, but it doesn't address how to rank the combined results from all those searches. RAG Fusion goes a step further: it uses Reciprocal Rank Fusion (RRF) to merge the ranked results from every query into a single list that favors documents found by several searches.

RAG Fusion is especially useful for complex, ambiguous, or technical queries, where a single vector search is more likely to miss relevant documents.

Learning Goals

Define the RAG Fusion pattern and its advantages over simple Multi-Query retrieval.
Understand the mathematical intuition behind Reciprocal Rank Fusion (RRF).
Implement RAG Fusion using LangChain.

Core Concepts

1. The Ranking Problem

In Multi-Query retrieval, if Query A finds Doc 1 and Query B finds Doc 2, which one is better? Their similarity scores (say, 0.82 vs 0.79) aren't directly comparable, because each was computed against a different query vector.

2. Reciprocal Rank Fusion (RRF)

RRF sidesteps this by ignoring the raw scores and using only the rank (position) of each document in each list.

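Formula: $\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}$, where $R$ is the set of ranked result lists (one per query), $\mathrm{rank}_r(d)$ is the 1-based position of document $d$ in list $r$ (a list that doesn't contain $d$ contributes nothing), and $k$ is a smoothing constant, commonly set to 60.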
Intuition: A document that appears at Rank 2 in three different searches is likely more relevant than a document that appears at Rank 1 in only one search.
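A quick numeric check of this intuition, using the RRF formula with 1-based ranks and the common default k = 60. The helper name `rrf_score` is illustrative, not part of any library:

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """RRF score of one document, given its 1-based rank in each list that contains it."""
    return sum(1 / (k + r) for r in ranks)

# Document A: rank 2 in three different searches
score_a = rrf_score([2, 2, 2])
# Document B: rank 1 in a single search, absent from the others
score_b = rrf_score([1])

print(f"A = {score_a:.4f}, B = {score_b:.4f}")  # prints: A = 0.0484, B = 0.0164
```

Even though B holds the single best position, A's repeated appearances give it roughly three times B's fused score.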

3. The RAG Fusion Loop

Generate: LLM creates 4-5 query variations.
Retrieve: Perform vector search for each variation.
Fuse: Apply RRF to combine all results into one list.
Generate: Send the top fused results to the LLM for the final answer.

[Figure: RAG Fusion Architecture]

Implementing RAG Fusion

Step 1

Create a simple chain to generate 4 variations of a user question:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Generate 4 variations of the user question, one per line: {question}"
)
# ... (LLM setup): `llm` is any LangChain chat model
# Parse the raw text into a list of queries, one per non-empty line
query_gen_chain = prompt | llm | StrOutputParser() | (
    lambda x: [q.strip() for q in x.split("\n") if q.strip()]
)
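LLMs often return the variations as a numbered or bulleted list, which a plain line split leaves with "1." or "-" prefixes. The sketch below is a stand-in for the lambda at the end of the chain above that also strips list markers and drops duplicates. The name `parse_query_variations`, the marker patterns it handles, and the choice to keep the original question as one of the searches are assumptions for illustration:

```python
import re

def parse_query_variations(llm_output: str, original_question: str) -> list[str]:
    """Turn raw LLM text into a clean, de-duplicated list of queries."""
    queries = [original_question]  # assumption: also search with the original question
    for line in llm_output.splitlines():
        # Strip leading list markers such as "1.", "2)", "-", or "*"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    return queries

raw = (
    "1. What is Reciprocal Rank Fusion?\n"
    "2) How does RRF rank documents?\n"
    "\n"
    "- RRF in retrieval-augmented generation"
)
print(parse_query_variations(raw, "RRF in RAG"))
```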

Step 2
Fetch candidates for each variation. This is usually done with a standard vector store retriever.
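In LangChain, a retriever's `invoke` method returns a ranked list of Documents for one query (and `batch` runs several queries). To keep this sketch self-contained, the hypothetical `toy_retrieve` ranks a tiny in-memory corpus by word overlap instead of embedding similarity; what matters for the fusion step is only the output shape, one ranked list per query:

```python
def toy_retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Hypothetical stand-in for a vector store retriever: ranks documents by word overlap."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in corpus]
    # Keep documents sharing at least one word, best overlap first (ties keep corpus order)
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def retrieve_all(queries: list[str], corpus: list[str], top_k: int = 3) -> list[list[str]]:
    """Run one retrieval per query variation; each inner list stays in ranked order."""
    return [toy_retrieve(q, corpus, top_k) for q in queries]

corpus = [
    "reciprocal rank fusion combines ranked lists",
    "vector search uses embeddings",
    "combine multiple retriever results with rank fusion",
]
queries = ["reciprocal rank fusion", "combine retriever results"]
ranked_lists = retrieve_all(queries, corpus)  # [[corpus[0], corpus[2]], [corpus[2]]]
```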

Step 3

Combine the ranked lists with Reciprocal Rank Fusion. RRF is usually written as a small custom Python function, such as reciprocal_rank_fusion below, which can then be used as a step in a LangChain LCEL chain.

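def reciprocal_rank_fusion(results: list[list], k: int = 60):
    """Fuse several ranked lists of LangChain Documents with RRF."""
    fused_scores = {}  # page_content -> accumulated RRF score
    doc_map = {}       # page_content -> first Document seen with that content
    for docs in results:
        # Ranks are 1-based to match the formula 1 / (k + rank)
        for rank, doc in enumerate(docs, start=1):
            key = doc.page_content
            doc_map.setdefault(key, doc)
            fused_scores[key] = fused_scores.get(key, 0.0) + 1 / (k + rank)
    # Sort by fused score, highest first, and return the documents
    ranked = sorted(fused_scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_map[key] for key, _ in ranked]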

Step 4
Combine query generation, retrieval, and fusion into one execution flow using LCEL.
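Before wiring the steps into LCEL, the whole flow can be sketched in plain Python. Everything here is a stand-in: the hypothetical `fake_generate_queries` returns fixed variations where a real chain would call the LLM, `toy_retrieve` replaces the vector store, and the last step only assembles the context that would be sent to the LLM. In LCEL, plain functions like these can be piped as chain steps (LangChain wraps them as `RunnableLambda`):

```python
def fake_generate_queries(question: str) -> list[str]:
    """Hypothetical stand-in for the LLM query-generation chain (Step 1)."""
    return [question, "reciprocal rank fusion in AI search", "combine multiple retriever results"]

def toy_retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Hypothetical stand-in for a vector store retriever (Step 2): ranks by word overlap."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in corpus]
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def reciprocal_rank_fusion(results: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Step 3: fuse ranked lists with RRF, using 1-based ranks."""
    scores: dict[str, float] = {}
    for docs in results:
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def rag_fusion_context(question: str, corpus: list[str], final_k: int = 2) -> str:
    """Step 4: generate -> retrieve -> fuse -> keep the top final_k documents as LLM context."""
    queries = fake_generate_queries(question)
    ranked_lists = [toy_retrieve(q, corpus) for q in queries]
    fused = reciprocal_rank_fusion(ranked_lists)
    return "\n".join(doc for doc, _ in fused[:final_k])

corpus = [
    "Reciprocal rank fusion merges ranked lists from several searches",
    "Embeddings map text to vectors",
    "You can combine multiple retriever results by rank position",
]
print(rag_fusion_context("RRF in RAG", corpus))
```

In this toy corpus the literal query "RRF in RAG" matches nothing, but the variations do. The third document ranks first because two different variations retrieved it.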

Example: Searching for Obscure Acronyms

If a user searches for "RRF in RAG," a standard search might fail if the embedding model has no meaningful representation of the acronym "RRF." RAG Fusion would generate variations like "Reciprocal Rank Fusion in AI search" or "how to combine multiple retriever results," which are more likely to match the relevant documentation.

Common Mistakes

Poorly chosen k in RRF: In the RRF formula, $k$ is a smoothing constant (60 is the common default). Setting it too low lets the top-ranked documents in each list dominate the fused score; setting it too high flattens the differences between ranks so that position barely matters.
Redundant Context: RAG Fusion can pull in many documents. Keep only the top_k fused results to avoid overloading the LLM's context window.
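The effect of k shows up clearly when comparing how much a rank-1 hit outweighs a rank-10 hit within a single list. This sketch uses 1-based ranks; the k values are arbitrary examples, not recommendations, and `rank_weight` is an illustrative helper name:

```python
def rank_weight(rank: int, k: int) -> float:
    """Contribution of one ranked list to a document's RRF score at a 1-based rank."""
    return 1 / (k + rank)

for k in (1, 60, 1000):
    ratio = rank_weight(1, k) / rank_weight(10, k)
    print(f"k={k}: rank 1 counts {ratio:.2f}x as much as rank 10")
# k=1: rank 1 counts 5.50x as much as rank 10
# k=60: rank 1 counts 1.15x as much as rank 10
# k=1000: rank 1 counts 1.01x as much as rank 10
```

With a very small k, a single top hit can outweigh several lower-ranked appearances; with a very large k, rank position barely matters. For the context-window issue, slicing the sorted fused list (for example `fused[:top_k]`) keeps the LLM input bounded.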

Recap

RAG Fusion improves on Multi-Query by fusing the per-query rankings into one consolidated ranking with RRF.
RRF is score-agnostic: it uses only rank positions, so it can combine results from retrieval passes whose raw scores aren't comparable.
This pattern is more robust to poor user phrasing and to gaps in an embedding model's coverage of specific terms.

Knowledge Check


How does RAG Fusion differ from standard Multi-Query retrieval?

It uses fewer queries

It uses Reciprocal Rank Fusion (RRF) to combine results into a single optimized list

It doesn't use a vector store
