Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Hybrid Search with BM25 and EnsembleRetriever

Build hybrid search combining dense vector similarity with sparse BM25 keyword matching. Use LangChain's EnsembleRetriever to weight and combine both approaches.

Learning Goals

Build hybrid search using dense vectors and BM25
Implement EnsembleRetriever with configurable weights

In the previous lessons, we focused on Dense Retrieval (using vector embeddings). Dense retrieval is great at capturing semantic meaning, but it can sometimes fail on specific keyword searches (like technical IDs, product names, or rare acronyms).

Hybrid Search combines the strengths of Dense Retrieval with Sparse Retrieval (like BM25, which is based on keyword frequency). LangChain's EnsembleRetriever makes it easy to fuse these two approaches together to get the best of both worlds.

Learning Goals

Define Sparse vs. Dense retrieval and why they are complementary.
Implement a BM25Retriever for keyword-based search.
Combine multiple retrievers using LangChain's EnsembleRetriever.

Core Concepts

1. Dense vs. Sparse

Dense (Vector Search): Finds "Meaning." Query: "How to fix a vehicle" matches "Automobile repair guide."
Sparse (Keyword Search): Finds "Exact Words." Query: "XJ-9000 motherboard" matches documents containing that exact part number.
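
To make the sparse side concrete, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration only: the tokenizer, the parameter values k1=1.5 and b=0.75, and the non-negative IDF variant are assumptions chosen for readability, and libraries such as rank_bm25 may differ in the details.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace, trim punctuation.
    return [tok.strip(".,!?\"'()") for tok in text.lower().split()]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "Automobile repair guide for common engine problems",
    "Replacing the XJ-9000 motherboard step by step",
    "General tips for keeping your computer cool",
]

scores = bm25_scores("XJ-9000 motherboard", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the document containing the exact part number

# No shared words means no score, even though the first document is relevant.
print(bm25_scores("How to fix a vehicle", docs))  # [0.0, 0.0, 0.0]
```

The second query shows why the two approaches are complementary: keyword matching cannot connect "vehicle" to "Automobile", which is exactly the gap dense retrieval fills.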

2. Reciprocal Rank Fusion (RRF)

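How do you combine a vector score (e.g., 0.82) with a BM25 score (e.g., 14.5)? You can't just add them, because the two scores live on different scales. EnsembleRetriever uses weighted Reciprocal Rank Fusion (RRF), which ignores the raw scores and looks only at each document's rank in each list. A document at rank r in a retriever's list contributes weight / (r + c) to its fused score, where c is a smoothing constant (60 by default), and the contributions are summed across retrievers. Documents are then re-ordered by fused score, so a document ranked highly by both retrievers rises to the top.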
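
The fusion step can be sketched in a few lines of plain Python. This is a simplified illustration of weighted RRF, not LangChain's actual code: the function name weighted_rrf, the document IDs, and the two example rankings are made up, and the constant c=60 is assumed to match the commonly used default.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs (each ordered best-first)."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters; the retriever's raw score is never used.
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a dense and a sparse retriever.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_c", "doc_a", "doc_d"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.7, 0.3])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Note that doc_c overtakes doc_b even though the dense retriever ranked doc_b higher: appearing in both lists outweighs a slightly better position in only one.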

Visualizing Hybrid Fusion

Implementing Hybrid Search

Step 1

Install the rank_bm25 package, which BM25Retriever depends on: pip install rank_bm25. Then build a BM25Retriever from your document chunks:

from langchain_community.retrievers import BM25Retriever

# BM25 works on the raw text chunks (a list of strings), not on embeddings
sparse_retriever = BM25Retriever.from_texts(all_text_chunks)
sparse_retriever.k = 3  # return the top 3 matches per query

Step 2

Initialize your standard vector store retriever:

dense_retriever = vector_store.as_retriever(search_kwargs={"k": 3})

Step 3

Combine them and assign weights (e.g., 70% Dense, 30% Sparse):

from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, sparse_retriever],
    weights=[0.7, 0.3]
)

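Step 4

Query the ensemble retriever like any other retriever. It runs both retrievers and returns a single fused list of documents:

docs = ensemble_retriever.invoke("How to reset XJ-9000?")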

Example: Technical Support Bot

Imagine a user asks: "My server is throwing a 502 error."

Dense Search might find articles about general "connection issues" or "network timeouts."
Sparse Search (BM25) will specifically find the troubleshooting guide for "502 error."
Hybrid Search ensures that the user gets the guide for the specific error code while also considering the broader context of server connectivity.

Common Mistakes

Using the same weights for every use case: If your data is highly technical (full of part numbers and codes), increase the weight of the Sparse retriever. If it's conversational, favor the Dense retriever.
Forgetting to update BM25: Unlike vector stores, which can usually be updated incrementally, BM25Retriever.from_texts builds a static index in memory. If your knowledge base changes, you must recreate the BM25 retriever and pass the new instance to the EnsembleRetriever.
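
One way to avoid a stale BM25 index is to keep the text chunks next to the retriever and rebuild whenever they change. The sketch below is one possible pattern, not a LangChain feature: the class RebuildableSparseIndex and the stand-in fake_builder are hypothetical names, and in a real pipeline the builder would be something like lambda texts: BM25Retriever.from_texts(texts).

```python
from typing import Callable, List

class RebuildableSparseIndex:
    """Keeps the text chunks and rebuilds the sparse retriever when they change."""

    def __init__(self, build_retriever: Callable[[List[str]], object], texts: List[str]):
        self._build = build_retriever
        self._texts = list(texts)
        self.retriever = self._build(self._texts)

    def add_texts(self, new_texts: List[str]) -> None:
        # BM25 statistics (document frequencies, average length) depend on the
        # whole corpus, so the index is rebuilt from scratch rather than patched.
        self._texts.extend(new_texts)
        self.retriever = self._build(self._texts)

# Stand-in builder (an assumption) so the sketch runs without LangChain installed.
def fake_builder(texts):
    return {"indexed_chunks": len(texts)}

index = RebuildableSparseIndex(fake_builder, ["chunk one", "chunk two"])
index.add_texts(["chunk three"])
print(index.retriever)  # {'indexed_chunks': 3}
```

After a rebuild, index.retriever is a new object, so any EnsembleRetriever created earlier still points at the old one and should be recreated as well. Rebuilding on every update is simple but scales with corpus size, so batching updates is worth considering for large knowledge bases.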

Recap

Hybrid search combines semantic (Dense) and keyword (Sparse) retrieval.
BM25 is a widely used ranking function for keyword-based retrieval in RAG.
EnsembleRetriever uses weighted RRF to merge ranked results from multiple retrievers into a single list.

Knowledge Check

Which retrieval method is better for finding a specific product code like 'SKU-4021'?

Dense Retrieval (Embeddings)
Sparse Retrieval (BM25/Keywords)
MMR Retrieval
