Maximum Marginal Relevance (MMR)
Use MMR to balance relevance and diversity in retrieval results. Learn how MMR prevents redundant results and improves coverage of multiple topics in a single query.
Learning Goals
- Implement MMR for diversity in retrieval
- Balance relevance and diversity with lambda parameters
A common problem with standard similarity search is redundancy. If your knowledge base has five documents that say nearly the same thing, a search for that topic will likely retrieve all five. This wastes the LLM's context window and often provides a narrow, repetitive viewpoint.
Maximum Marginal Relevance (MMR) solves this by balancing relevance (how well the document matches the query) with diversity (how different the document is from what has already been selected).
Learning Goals
- Define MMR and its role in reducing redundancy in RAG.
- Understand the fetch_k vs. k parameters.
- Implement MMR retrieval using LangChain's vector store interface.
Core Concepts
1. The MMR Logic
MMR works in two stages:
- Selection: Fetch a large pool of candidate documents (fetch_k) based on pure similarity.
- Filtering: From that pool, iteratively select documents that are similar to the query but dissimilar to the documents already chosen, until the final result set (k) is full.
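The two stages above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: the function names `cosine` and `mmr_select` are hypothetical, cosine similarity is an assumed choice of similarity measure, and documents are assumed to be available as embedding vectors (lists of floats).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, fetch_k=20, lambda_mult=0.5):
    """Return indices into doc_vecs chosen by a greedy MMR procedure."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]

    # Stage 1 (selection): keep the fetch_k documents most similar to the query.
    candidates = sorted(
        range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True
    )[:fetch_k]

    # Stage 2 (filtering): repeatedly pick the candidate with the best
    # trade-off between relevance and similarity to what is already selected.
    selected = []
    while candidates and len(selected) < k:

        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: three near-duplicate vectors and one that points elsewhere.
docs = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1], [0.6, 0.8]]
query = [1.0, 0.2]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # two near-duplicates
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # one near-duplicate plus the outlier
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the result matches plain top-k; at `0.5` the second pick shifts to the document that differs most from the first.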
2. The Diversity Lambda (λ)
Most MMR implementations use a parameter, often called λ (lambda), to control the balance:
- λ = 1: Pure similarity (identical to standard Top-K).
- λ = 0: Pure diversity (may return irrelevant documents).
- λ = 0.5: The "sweet spot" for most RAG applications.
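For reference, one common way to write the MMR selection rule is shown below. The symbols are assumptions introduced here for illustration: q is the query, R is the candidate pool, S is the set of documents already selected, and sim is whatever similarity measure the vector store uses.

```latex
d^{*} = \arg\max_{d_i \in R \setminus S}
  \Big[ \lambda \, \mathrm{sim}(d_i, q)
  - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Big]
```

Setting λ to 1 removes the second term, leaving ordinary similarity ranking; setting it to 0 removes the first, so only dissimilarity to earlier picks matters.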
[Figure: Visualizing MMR]
Implementing MMR with LangChain
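- Step 1: Use search_type="mmr" and provide the fetch_k and lambda_mult parameters:

```python
# Assumes vector_store is an existing LangChain vector store that supports MMR.
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 3,              # documents returned to the caller
        "fetch_k": 20,       # candidates fetched by similarity before filtering
        "lambda_mult": 0.5,  # 1.0 = pure similarity, 0.0 = pure diversity
    },
)
```

- Step 2: The retriever will now return 3 documents, chosen from the 20 candidates to balance relevance to the query against overlap with each other:

```python
query = "How does climate change affect farming?"
docs = retriever.invoke(query)

# Print the source and a short preview of each selected document.
for doc in docs:
    print(f"Source: {doc.metadata.get('source')}")
    print(f"Content: {doc.page_content[:50]}")
```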
Example: Summarizing a News Topic
If you query a database of 1,000 news articles for "Election results," a standard search might return the top 5 articles from the same news agency saying the exact same thing. MMR would fetch the top 20 candidate articles and then pick 5 that represent different outlets or different sub-topics (e.g., results by state, voter turnout, and candidate speeches).
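The news scenario could be expressed as a direct call on the vector store rather than through a retriever. This is a sketch under assumptions: `diverse_articles` is a hypothetical helper, and `vector_store` is assumed to be a LangChain vector store whose backend implements `max_marginal_relevance_search` (not every backend does).

```python
def diverse_articles(vector_store, query="Election results", k=5, fetch_k=20,
                     lambda_mult=0.5):
    """Fetch fetch_k candidates by similarity, then return k chosen by MMR."""
    # Hypothetical wrapper; the actual work is done by the vector store.
    return vector_store.max_marginal_relevance_search(
        query,
        k=k,                      # final number of articles
        fetch_k=fetch_k,          # size of the candidate pool
        lambda_mult=lambda_mult,  # relevance-diversity balance
    )
```

Whether the five results actually span different outlets or sub-topics depends on how well the embeddings separate them; MMR only penalizes similarity between the vectors.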
Common Mistakes
- Setting fetch_k too low: If fetch_k is close to k, there is no room for diversity filtering. Ensure fetch_k is at least 3-5x larger than k.
- Lambda is too high or low: Always start with 0.5. If your results feel "random," increase lambda toward 1.0. If they feel repetitive, decrease it toward 0.3.
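One way to guard against both mistakes is to derive fetch_k from k instead of setting it by hand. The helper below is a hypothetical sketch, not part of LangChain; the name `build_mmr_search_kwargs` and the `fetch_multiplier` parameter are assumptions made for illustration.

```python
def build_mmr_search_kwargs(k, lambda_mult=0.5, fetch_multiplier=4):
    """Build a search_kwargs dict for an MMR retriever (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= lambda_mult <= 1.0:
        raise ValueError("lambda_mult must be between 0 and 1")
    if fetch_multiplier < 3:
        # Mirrors the 3-5x guideline: a smaller pool leaves little to filter.
        raise ValueError("fetch_multiplier should be at least 3")
    return {
        "k": k,
        "fetch_k": k * fetch_multiplier,
        "lambda_mult": lambda_mult,
    }


print(build_mmr_search_kwargs(k=3))
```

The resulting dict can be passed as `search_kwargs` when creating a retriever with `search_type="mmr"`.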
Recap
- MMR reduces redundancy by trading off relevance to the query against diversity among the selected documents.
- Use fetch_k to cast a wide net and k to select the final diverse subset.
- Adjust lambda_mult to tune the relevance-diversity trade-off.