Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Maximum Marginal Relevance (MMR)

Use MMR to balance relevance and diversity in retrieval results. Learn how MMR prevents redundant results and improves coverage of multiple topics in a single query.

Learning Goals

Implement MMR for diversity in retrieval
Balance relevance and diversity with lambda parameters

Maximum Marginal Relevance (MMR)

A common problem with standard similarity search is redundancy. If your knowledge base has five documents that say nearly the same thing, a search for that topic will likely retrieve all five. This wastes the LLM's context window and often provides a narrow, repetitive viewpoint.

Maximum Marginal Relevance (MMR) solves this by balancing relevance (how well the document matches the query) with diversity (how different the document is from what has already been selected).

Learning Goals

Define MMR and its role in reducing redundancy in RAG.
Understand the fetch_k vs. k parameters.
Implement MMR retrieval using LangChain's vector store interface.

Core Concepts

1. The MMR Logic

MMR works in two stages:

Selection: Fetch a pool of $fetch\_k$ candidate documents, ranked by similarity to the query alone.
Filtering: From that pool, iteratively pick $k$ documents for the final results, each time choosing the candidate that is similar to the query but dissimilar to the documents already picked.
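
The two stages can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: the function names `cosine` and `mmr_select`, the toy 2-D vectors, and the use of cosine similarity are all chosen here for demonstration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=3, fetch_k=20, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by greedy MMR (hypothetical helper)."""
    # Stage 1 (selection): keep the fetch_k documents most similar to the query.
    relevance = {i: cosine(query_vec, v) for i, v in enumerate(doc_vecs)}
    candidates = sorted(relevance, key=relevance.get, reverse=True)[:fetch_k]

    # Stage 2 (filtering): repeatedly pick the candidate with the best
    # relevance-minus-redundancy score.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: documents 0-2 are near-duplicates, 3 covers a different angle,
# and 4 is unrelated to the query.
query = [1.0, 0.0]
docs = [[1.0, 0.10], [1.0, 0.12], [1.0, 0.14], [1.0, -0.60], [0.0, 1.0]]

top_k = mmr_select(query, docs, k=3, fetch_k=5, lambda_mult=1.0)     # → [0, 1, 2]
balanced = mmr_select(query, docs, k=3, fetch_k=5, lambda_mult=0.5)  # → [0, 3, 1]
print(top_k, balanced)
```

With the weight set fully on relevance, the three near-duplicates win. With an even balance, the second pick jumps to document 3, because its lower relevance is outweighed by how little it overlaps with document 0.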

2. The Diversity Lambda ($\lambda$)

Most MMR implementations expose a parameter, often called $\lambda$ (lambda) and named lambda_mult in LangChain, to control the balance:

$\lambda = 1$: Pure similarity (identical to standard Top-K).
$\lambda = 0$: Pure diversity within the candidate pool (may return documents only loosely related to the query).
$\lambda = 0.5$: Equal weight on both, and a reasonable starting point for most RAG applications.
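
For reference, the trade-off is usually written as the selection rule below. The notation is introduced here for illustration: $q$ is the query, $R$ is the candidate pool of $fetch\_k$ documents, $S$ is the set of documents already selected, and $\text{sim}$ is a similarity measure such as cosine similarity.

$$
d^{*} = \arg\max_{d_i \in R \setminus S} \left[ \lambda \, \text{sim}(d_i, q) - (1 - \lambda) \max_{d_j \in S} \text{sim}(d_i, d_j) \right]
$$

Setting $\lambda = 1$ removes the second term, leaving plain similarity ranking. Setting $\lambda = 0$ removes the first term, so each pick only minimizes overlap with what has already been chosen.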

Implementing MMR with LangChain

Step 1

Use search_type="mmr" and provide the fetch_k and lambda_mult parameters:

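retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 3,              # number of documents returned
        "fetch_k": 20,       # size of the candidate pool fetched by similarity
        "lambda_mult": 0.5,  # 1.0 = pure relevance, 0.0 = maximum diversity
    },
)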

Step 2

The retriever now returns 3 documents, chosen from the 20 fetched candidates, that balance relevance to the query against overlap with one another:

query = "How does climate change affect farming?"
docs = retriever.invoke(query)

for doc in docs:
    print(f"Source: {doc.metadata.get('source')}")
    print(f"Content: {doc.page_content[:50]}")

Example: Summarizing a News Topic

If you query a database of 1,000 news articles for "Election results," a standard search might return the top 5 articles from the same news agency saying the exact same thing. MMR would fetch the top 20 candidate articles and then pick 5 that represent different outlets or different sub-topics (e.g., results by state, voter turnout, and candidate speeches).

Common Mistakes

Setting fetch_k too low: If fetch_k is close to k, the candidate pool leaves no room for diversity filtering. Set fetch_k to at least 3 to 5 times k.
Setting lambda_mult too high or too low: Start with 0.5. If your results feel "random," increase lambda_mult toward 1.0. If they feel repetitive, decrease it toward 0.3.
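
One way to make "repetitive" measurable is to compute the average pairwise similarity of the retrieved documents' embeddings. The sketch below is an assumption added for illustration, not part of LangChain: `mean_pairwise_similarity` is a hypothetical helper, and the vectors stand in for real embeddings.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs; higher means more redundant."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two vectors: nothing to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Toy embeddings: one result set of near-duplicates, one with a distinct document.
redundant_set = [[1.0, 0.10], [1.0, 0.12], [1.0, 0.14]]
diverse_set = [[1.0, 0.10], [1.0, -0.60], [1.0, 0.12]]

redundant_score = mean_pairwise_similarity(redundant_set)
diverse_score = mean_pairwise_similarity(diverse_set)
print(f"redundant: {redundant_score:.3f}, diverse: {diverse_score:.3f}")
```

Tracking a score like this while adjusting lambda_mult gives a rough signal of whether a change made the results more or less repetitive. It says nothing about relevance, so it complements rather than replaces reading the results.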

Recap

MMR reduces redundancy by trading off relevance to the query against diversity among the selected documents.
Use fetch_k to cast a wide net and k to select the final diverse subset.
Adjust lambda_mult to tune the relevance-diversity trade-off.

Knowledge Check

Question 1 (single choice)

What is the primary goal of Maximum Marginal Relevance (MMR)?

To make the search faster

To reduce redundancy and increase diversity in retrieved documents

To increase the number of documents retrieved
