Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Similarity Score Thresholds

Apply similarity score thresholds to filter low-quality retrievals. Learn to set minimum confidence scores and understand the precision-recall trade-off.

Learning Goals

  • Apply similarity score thresholds for quality control
  • Analyze precision-recall trade-offs with score thresholds


In many production RAG applications, you don't just want the "closest" documents; you want only documents that are actually relevant. If a user asks a question that is completely unrelated to your knowledge base, a standard top-k search will still return the k best matches, even if those matches are poor.

Similarity Score Thresholds allow you to set a minimum "confidence" bar. If a document's similarity score is below the threshold, it is discarded.
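As a rough illustration, the following pure-Python sketch contrasts plain top-k retrieval with threshold filtering. It does not use LangChain; the document IDs, the scores, and the helper names top_k and top_k_with_threshold are all hypothetical, and scores are assumed to already be on a 0-to-1 scale where higher means more relevant. The threshold version first takes the k best candidates and then discards any below the threshold, so it returns between 0 and k results.

```python
from typing import List, Tuple

# Each candidate is (document_id, relevance_score); scores assumed in [0, 1], higher = better.
Scored = List[Tuple[str, float]]


def top_k(candidates: Scored, k: int) -> Scored:
    """Plain top-k: always returns up to k results, however weak they are."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]


def top_k_with_threshold(candidates: Scored, k: int, threshold: float) -> Scored:
    """Take the k best candidates, then discard any scoring below the threshold."""
    return [c for c in top_k(candidates, k) if c[1] >= threshold]


# Hypothetical scores for an off-topic query: every match is weak.
off_topic = [("doc_a", 0.41), ("doc_b", 0.37), ("doc_c", 0.35)]
# Hypothetical scores for an on-topic query: two strong matches, one weak.
on_topic = [("doc_a", 0.91), ("doc_b", 0.84), ("doc_c", 0.52)]

print(top_k(off_topic, k=2))                                # [('doc_a', 0.41), ('doc_b', 0.37)]
print(top_k_with_threshold(off_topic, k=2, threshold=0.8))  # []
print(top_k_with_threshold(on_topic, k=3, threshold=0.8))   # [('doc_a', 0.91), ('doc_b', 0.84)]
```

Plain top-k hands the weak off-topic matches to the LLM anyway, while the thresholded version returns nothing and lets the application decide what to do.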

Learning Goals

  • Configure retrievers with similarity score thresholds in LangChain.
  • Understand the precision-recall trade-off when setting thresholds.
  • Implement quality control for RAG pipelines using score filtering.

Core Concepts

1. The Confidence Bar

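By setting a threshold (e.g., 0.8), you are telling the system: "Only show me documents whose relevance score is at least 0.8." Keep in mind that this score comes from the embedding model and distance metric; it is not a literal "80% similar" measurement, so the same threshold can behave differently across models.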

2. Precision vs. Recall

  • High Threshold (e.g., 0.9): High Precision. You get very relevant results, but you might miss some useful info (Low Recall).
  • Low Threshold (e.g., 0.5): High Recall. You get almost everything related, but you'll also get a lot of irrelevant "noise" (Low Precision).
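The trade-off can be measured on a small labeled set. This sketch assumes hypothetical relevance scores for a single query, each paired with a hand-assigned label saying whether that document really answers it, and computes the precision and recall of the documents kept at several thresholds. Defining precision as 1.0 when nothing is kept is only a convention chosen for this sketch.

```python
from typing import List, Tuple

# Hypothetical scores for one query, each paired with a hand label:
# (relevance_score, document_actually_answers_the_query)
results: List[Tuple[float, bool]] = [
    (0.93, True), (0.88, True), (0.81, False), (0.74, True),
    (0.66, False), (0.58, True), (0.52, False), (0.47, False),
]


def precision_recall(results: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Precision and recall of the documents whose score is >= threshold."""
    kept = [relevant for score, relevant in results if score >= threshold]
    true_positives = sum(kept)
    total_relevant = sum(relevant for _, relevant in results)
    # Convention for this sketch: nothing kept -> precision 1.0; nothing relevant -> recall 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


for t in (0.9, 0.7, 0.5):
    p, r = precision_recall(results, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.9  precision=1.00  recall=0.25
# threshold=0.7  precision=0.75  recall=0.75
# threshold=0.5  precision=0.57  recall=1.00
```

Raising the threshold removes the irrelevant documents first but eventually starts dropping relevant ones too, which is exactly the precision-for-recall exchange described above.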

3. Normalizing Scores

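Different vector stores and distance metrics return scores on different scales: cosine similarity ranges from -1 to 1 (cosine distance from 0 to 2), inner product is unbounded for unnormalized vectors, and L2 (Euclidean) distance ranges from 0 to infinity, where lower means more similar. LangChain's similarity_score_threshold search type attempts to convert these raw values into relevance scores on a consistent 0-to-1 scale, where higher means more relevant, so you can express the threshold on a single scale.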
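To see why normalization matters, the sketch below converts cosine distance and L2 distance into relevance scores for the same pairs of vectors. It assumes unit-length embeddings, and the conversion functions are illustrative with hypothetical names: they are written in the spirit of the per-metric relevance functions LangChain vector stores rely on, not copied from LangChain.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; 0 for identical vectors, at most 2 for unit vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevance_from_cosine_distance(distance: float) -> float:
    """Map cosine distance (1 - cosine similarity) to relevance: 1.0 = same direction."""
    return 1.0 - distance


def relevance_from_l2_distance(distance: float) -> float:
    """Map L2 distance between unit vectors to relevance: orthogonal (sqrt(2)) -> 0."""
    return 1.0 - distance / math.sqrt(2)


query = [1.0, 0.0]
docs = {"close": [0.8, 0.6], "orthogonal": [0.0, 1.0], "opposite": [-1.0, 0.0]}

for name, vec in docs.items():
    cos_rel = relevance_from_cosine_distance(1.0 - cosine_similarity(query, vec))
    l2_rel = relevance_from_l2_distance(l2_distance(query, vec))
    print(f"{name}: cosine-based={cos_rel:.2f}  l2-based={l2_rel:.2f}")
# close: cosine-based=0.80  l2-based=0.55
# orthogonal: cosine-based=0.00  l2-based=0.00
# opposite: cosine-based=-1.00  l2-based=-0.41
```

Both conversions share the same "higher is better" direction, but they do not produce identical numbers for the same pair (0.80 vs. 0.55 for the close document), so a threshold tuned for one metric may not transfer to another. Vectors pointing in opposite directions also fall below 0, which is one source of the negative scores mentioned under Common Mistakes.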

Implementing Score Thresholds

  Step 1

    Convert your vector store into a retriever with the similarity_score_threshold search type:

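    ```python
    # vector_store is assumed to be an existing LangChain vector store (e.g., Chroma, FAISS)
    retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        # Return at most k=5 documents, and only those with relevance score >= 0.8
        search_kwargs={"score_threshold": 0.8, "k": 5},
    )
    ```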
  Step 2

    Query the retriever as usual. It will return between 0 and k documents, depending on how many of the top k candidates clear the threshold:

    ```python
    query = "How do I reset my API key?"
    docs = retriever.invoke(query)  # list of Documents; empty if none clear the threshold

    if not docs:
        print("No relevant context found above the 0.8 threshold.")
    else:
        print(f"Found {len(docs)} relevant chunks.")
    ```

Example: Handling "Out of Bounds" Queries

Imagine a medical RAG bot. If a user asks "What is the best pizza topping?", the bot shouldn't try to answer using medical documents. With a sufficiently high similarity threshold, the retriever should return an empty list for the pizza query, allowing your application to say: "I'm sorry, I only have information about medical topics."
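In application code, that behavior comes down to a guard before generation. Below is a minimal sketch in which answer_or_refuse, fake_retrieve, and fake_generate are hypothetical names; in a real pipeline, retrieve could be retriever.invoke from Step 2 (which returns Document objects rather than strings) and generate would call your LLM.

```python
from typing import Callable, List

REFUSAL = "I'm sorry, I only have information about medical topics."


def answer_or_refuse(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Refuse instead of generating when retrieval returns no context above the threshold."""
    docs = retrieve(query)
    if not docs:
        return REFUSAL
    return generate(query, docs)


# Hypothetical stand-in for a thresholded retriever over medical documents.
def fake_retrieve(query: str) -> List[str]:
    if "ibuprofen" in query.lower():
        return ["Ibuprofen is an NSAID used to reduce fever and pain."]
    return []  # nothing cleared the threshold


# Hypothetical stand-in for an LLM call that answers from the retrieved context.
def fake_generate(query: str, docs: List[str]) -> str:
    return f"Based on {len(docs)} document(s): {docs[0]}"


print(answer_or_refuse("What is the best pizza topping?", fake_retrieve, fake_generate))
print(answer_or_refuse("What is ibuprofen used for?", fake_retrieve, fake_generate))
```

Keeping the refusal outside the LLM call means an off-topic query never reaches the model with irrelevant context, so the model cannot improvise an answer from it.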

Common Mistakes

  • Overly Aggressive Threshold: Setting a threshold of 0.95 might result in zero documents being returned for most queries, even if the knowledge base contains the answer. Start around 0.7 and tune based on user feedback.
  • Ignoring Metric Normalization: If your scores look wrong (e.g., negative numbers or values above 1), check whether your vector store is configured with a distance metric (such as cosine vs. inner product) that LangChain's normalization logic supports.
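A minimal sketch of that tuning process, assuming you have collected a small hypothetical evaluation set: for each in-scope query, the best relevance score of a chunk that actually answers it, and for each out-of-scope query, the best score of any chunk. The helper count_outcomes and all the numbers are hypothetical. Sweeping candidate thresholds shows how many answerable queries keep their context and how many off-topic queries are correctly left empty.

```python
from typing import List, Sequence, Tuple

# Hypothetical evaluation data (replace with your own logs or labeled queries):
# best relevance score of the chunk that actually answers each in-scope query...
answerable: List[float] = [0.91, 0.84, 0.78, 0.76, 0.73, 0.69, 0.62, 0.58]
# ...and the best relevance score of any chunk for each out-of-scope query.
off_topic: List[float] = [0.55, 0.61, 0.48, 0.72]


def count_outcomes(
    threshold: float, answerable: Sequence[float], off_topic: Sequence[float]
) -> Tuple[int, int]:
    """Count in-scope queries that keep their answer and off-topic queries that get no context."""
    answered = sum(score >= threshold for score in answerable)
    refused = sum(score < threshold for score in off_topic)
    return answered, refused


for t in (0.6, 0.7, 0.8, 0.95):
    answered, refused = count_outcomes(t, answerable, off_topic)
    print(f"threshold={t:.2f}  answered={answered}/{len(answerable)}  refused={refused}/{len(off_topic)}")
# threshold=0.60  answered=7/8  refused=2/4
# threshold=0.70  answered=5/8  refused=3/4
# threshold=0.80  answered=2/8  refused=4/4
# threshold=0.95  answered=0/8  refused=4/4
```

With these illustrative numbers, 0.95 answers nothing at all, while lower thresholds trade correctly refused off-topic queries for answered in-scope ones; where to land depends on which error is costlier for your application.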

Recap

  • Score thresholds improve precision by filtering out weak matches.
  • Use search_type="similarity_score_threshold" in LangChain to enable this.
  • Threshold tuning is an iterative process: balance "knowing when you don't know" with providing enough context.

Knowledge Check


What is the primary benefit of using a similarity score threshold?