Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Similarity Score Thresholds

Apply similarity score thresholds to filter low-quality retrievals. Learn to set minimum confidence scores and understand the precision-recall trade-off.

In many production RAG applications, you don't just want the "closest" documents; you want only the documents that are actually relevant. If a user asks a question that is completely unrelated to your knowledge base, a standard top-k search will still return the $k$ best matches, even if those matches are poor.

Similarity Score Thresholds allow you to set a minimum "confidence" bar. If a document's similarity score is below the threshold, it is discarded.
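As a minimal sketch, assume each candidate already carries a relevance score on a 0-to-1 scale (the document IDs and scores below are made up). Plain top-k always returns k items, while the thresholded variant can return fewer, including none. The helper names are hypothetical; the sketch takes the top k first and then filters, so at most k documents survive.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]  # (document id, relevance score), higher is better


def top_k(hits: Scored, k: int) -> Scored:
    """Plain top-k: always returns min(k, len(hits)) results, however weak."""
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


def top_k_with_threshold(hits: Scored, k: int, threshold: float) -> Scored:
    """Take the top k, then discard anything scoring below the threshold."""
    return [h for h in top_k(hits, k) if h[1] >= threshold]


# Hypothetical scores for an off-topic query against a medical knowledge base.
off_topic_hits = [("dosage_guide", 0.41), ("side_effects", 0.38), ("triage_faq", 0.35)]

print(top_k(off_topic_hits, k=2))                                # two weak matches
print(top_k_with_threshold(off_topic_hits, k=2, threshold=0.8))  # []
```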

Learning Goals

Configure retrievers with similarity score thresholds in LangChain.
Understand the precision-recall trade-off when setting thresholds.
Implement quality control for RAG pipelines using score filtering.

Core Concepts

1. The Confidence Bar

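By setting a threshold (e.g., 0.8), you are telling the system: "Only return documents whose relevance score for this query is at least 0.8." The score is a normalized similarity, not a literal percentage of semantic overlap, and what counts as a good score depends on the embedding model and the distance metric.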

2. Precision vs. Recall

High threshold (e.g., 0.9): higher precision. What comes back is more likely to be relevant, but you may miss useful information (lower recall).
Low threshold (e.g., 0.5): higher recall. You get most of the related material, but also more irrelevant "noise" (lower precision).
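To make the trade-off concrete, the sketch below computes precision and recall at several thresholds for a single query. The scores and relevance labels are invented for illustration; in practice they would come from a labeled evaluation set. Treating precision as 1.0 when nothing is returned is a convention chosen here, not a standard.

```python
from typing import List, Tuple

# (relevance score, is the chunk actually relevant?) for one query; made-up data
results: List[Tuple[float, bool]] = [
    (0.93, True), (0.88, True), (0.81, False), (0.74, True),
    (0.66, False), (0.58, True), (0.52, False), (0.47, False),
]


def precision_recall(results: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    kept = [rel for score, rel in results if score >= threshold]
    total_relevant = sum(rel for _, rel in results)
    true_positives = sum(kept)
    # Convention: an empty result has no false positives, so precision is 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


for t in (0.9, 0.7, 0.5):
    p, r = precision_recall(results, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.9 precision=1.00 recall=0.25
# threshold=0.7 precision=0.75 recall=0.75
# threshold=0.5 precision=0.57 recall=1.00
```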

3. Normalizing Scores

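Vector stores report scores in different ranges and directions. Cosine similarity ranges from -1 to 1 (higher is more similar), cosine distance from 0 to 2, and L2 (Euclidean) distance from 0 to infinity (lower is more similar). LangChain's similarity_score_threshold search type converts each store's raw scores into relevance scores on a 0-to-1 scale, where higher means more relevant, so that score_threshold means roughly the same thing across stores. The conversion depends on the store's distance metric and, for L2, assumes normalized embeddings, so scores can still fall outside 0 to 1 when those assumptions do not hold.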
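The sketch below shows the kind of distance-to-relevance conversion involved. The helper names are hypothetical; the formulas are modeled on the default conversions in LangChain's base `VectorStore` class (`1 - distance / sqrt(2)` for Euclidean distance, `1 - distance` for cosine distance), but individual vector store integrations may override them, so check the one you use.

```python
import math


def l2_to_relevance(distance: float) -> float:
    # Assumes unit-length embeddings: distance 0 -> 1.0, distance sqrt(2)
    # (orthogonal vectors) -> 0.0, larger distances -> negative scores.
    return 1.0 - distance / math.sqrt(2)


def cosine_distance_to_relevance(distance: float) -> float:
    # Cosine distance is 1 - cosine similarity, so this recovers the cosine
    # similarity itself, which is negative for vectors pointing apart.
    return 1.0 - distance


# Made-up raw L2 distances returned by a store (lower = more similar).
raw_l2 = {"doc_a": 0.3, "doc_b": 1.1, "doc_c": 1.6}
relevance = {doc: l2_to_relevance(d) for doc, d in raw_l2.items()}
kept = [doc for doc, score in relevance.items() if score >= 0.7]

print({doc: round(score, 3) for doc, score in relevance.items()})
print(kept)  # ['doc_a']
```

Note that doc_c ends up with a negative relevance score even though its distance is valid for unit-length vectors: the conversion only maps distances up to sqrt(2) into the 0-to-1 range.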

Implementing Score Thresholds

Step 1

Convert your vector store into a retriever with the similarity_score_threshold search type:

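```python
# Assumes `vector_store` is an existing LangChain vector store (e.g. Chroma or FAISS).
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    # Return at most k=5 documents, keeping only those with relevance >= 0.8.
    search_kwargs={"score_threshold": 0.8, "k": 5},
)
```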

Step 2

Query the retriever as usual. It returns between 0 and $k$ documents, depending on how many candidates clear the threshold:

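```python
# `retriever` is the thresholded retriever from Step 1.
query = "How do I reset my API key?"
docs = retriever.invoke(query)  # list of Document objects; may be empty

if not docs:
    print("No relevant context found above the 0.8 threshold.")
else:
    print(f"Found {len(docs)} relevant chunks.")
```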

Example: Handling "Out of Bounds" Queries

Imagine a medical RAG bot. If a user asks "What is the best pizza topping?", the bot shouldn't try to answer using medical documents. With a suitably high similarity threshold, the retriever should return an empty list for the pizza query, allowing your application to say: "I'm sorry, I only have information about medical topics."
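One way to wire this up is to branch on the empty result before calling the LLM. This is a sketch only: `answer_question`, `stub_retrieve`, and `stub_generate` are hypothetical stand-ins for your thresholded retriever's `invoke` call and your LLM call.

```python
from typing import Callable, List

FALLBACK = "I'm sorry, I only have information about medical topics."


def answer_question(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    # Retrieval with a score threshold may legitimately return nothing.
    docs = retrieve(query)
    if not docs:
        # Out-of-bounds query: refuse instead of answering from unrelated context.
        return FALLBACK
    return generate(query, docs)


# Hypothetical stubs standing in for a thresholded retriever and an LLM call.
def stub_retrieve(query: str) -> List[str]:
    return ["Chunk about ibuprofen dosage"] if "ibuprofen" in query.lower() else []


def stub_generate(query: str, docs: List[str]) -> str:
    return f"Answer based on {len(docs)} document(s)."


print(answer_question("What is the best pizza topping?", stub_retrieve, stub_generate))
print(answer_question("What is the max ibuprofen dose?", stub_retrieve, stub_generate))
```

Checking for an empty list before generation keeps the refusal deterministic, rather than relying on the LLM to notice that the supplied context is irrelevant.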

Common Mistakes

Threshold is too aggressive: Setting a threshold of 0.95 might result in zero documents being returned for most queries, even when the knowledge base contains the answer. Start around 0.7 and tune based on evaluation data and user feedback; the right value depends on your embedding model and distance metric.
Ignoring metric normalization: If your scores look wrong (e.g., negative numbers or values above 1), check which distance metric your vector store is configured with (e.g., cosine vs. inner product) and whether your embeddings are normalized, since LangChain's conversion to 0-to-1 relevance scores assumes specific metrics.
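Before raising the threshold, it helps to measure how often it leaves queries with no context at all. The sketch below assumes you have logged the top relevance score for each query in an evaluation set; the scores and the `empty_result_rate` helper are hypothetical.

```python
from typing import List

# Highest relevance score seen for each of ten (hypothetical) evaluation queries.
top_scores: List[float] = [0.91, 0.84, 0.79, 0.77, 0.74, 0.72, 0.69, 0.63, 0.58, 0.31]


def empty_result_rate(top_scores: List[float], threshold: float) -> float:
    """Fraction of queries whose best hit is below the threshold, i.e. zero docs returned."""
    empty = sum(1 for s in top_scores if s < threshold)
    return empty / len(top_scores)


for threshold in (0.95, 0.8, 0.7, 0.6):
    rate = empty_result_rate(top_scores, threshold)
    print(f"threshold={threshold:.2f} empty-result rate={rate:.0%}")
# threshold=0.95 empty-result rate=100%
# threshold=0.80 empty-result rate=80%
# threshold=0.70 empty-result rate=40%
# threshold=0.60 empty-result rate=20%
```

A high empty-result rate on queries you know the knowledge base can answer suggests the threshold is too aggressive; a near-zero rate on deliberately off-topic queries suggests it is too lax.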

Recap

Score thresholds improve precision by filtering out weak matches.
Use search_type="similarity_score_threshold" in LangChain to enable this.
Threshold tuning is an iterative process: balance "knowing when you don't know" with providing enough context.

Knowledge Check

What is the primary benefit of using a similarity score threshold?

It makes the search faster

It prevents the LLM from receiving irrelevant or low-quality context

It allows the LLM to process more tokens