Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Similarity Search Fundamentals

Master the fundamentals of similarity search with a configurable k (the number of results returned). Learn the L2, cosine, and inner product distance metrics and when to use each.

The heart of every RAG system is similarity search. Unlike a traditional database, which looks for exact keyword matches, a vector database looks for nearest neighbors: the stored vectors that lie closest to the query vector. To do this effectively, we must understand how "closeness" between vectors is calculated.

In this lesson, we will explore the three primary distance metrics used in retrieval and how to implement a basic search using LangChain.

Learning Goals

Compare L2 (Euclidean), Cosine Similarity, and Inner Product metrics.
Understand how k (number of neighbors) impacts retrieval precision.
Implement a basic similarity search using LangChain's vector store interface.

Core Concepts

1. The Geometry of Meaning

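When we convert text into a vector (an embedding), we place it at a specific coordinate in a high-dimensional space. Similarity is then a measure of how close two coordinates are: a smaller distance, or a smaller angle between the vectors, means the two texts are more closely related in meaning.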

2. Primary Distance Metrics

Metric	Description	Best Use Case
L2 (Euclidean)	Straight-line distance between two points. Lower is more similar (0 means identical).	General purpose; sensitive to vector magnitude.
Cosine Similarity	Cosine of the angle between two vectors, from -1 to 1. Higher is more similar.	Common default for text RAG; ignores vector magnitude and compares only direction (meaning).
Inner Product	Sum of element-wise products, equal to |a|·|b|·cos θ. Higher is more similar.	Cheapest to compute; usually used with normalized vectors, where it equals cosine similarity.
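To make the table concrete, here is a minimal pure-Python sketch of the three metrics. The toy 2-D vectors (query, scaled_doc, other_doc) are hypothetical and chosen only for illustration; real embeddings have hundreds or thousands of dimensions. The example shows L2 and cosine disagreeing when magnitudes differ, and inner product matching cosine once the vectors are normalized.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Straight-line (Euclidean) distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: higher means more similar."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


def normalize(v):
    """Scale v to unit length so that only its direction remains."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


# Hypothetical toy vectors
query = [1.0, 1.0]
scaled_doc = [2.0, 2.0]  # same direction as the query, twice the magnitude
other_doc = [1.0, 0.0]   # different direction, similar magnitude

for name, doc in [("scaled_doc", scaled_doc), ("other_doc", other_doc)]:
    print(
        f"{name}: L2={l2_distance(query, doc):.4f} "
        f"cosine={cosine_similarity(query, doc):.4f} "
        f"IP={inner_product(query, doc):.4f}"
    )

# After normalization, inner product and cosine similarity agree
q, d = normalize(query), normalize(other_doc)
print(f"normalized IP={inner_product(q, d):.4f} cosine={cosine_similarity(q, d):.4f}")
```

L2 ranks other_doc as closer (distance 1.0 versus about 1.41), while cosine ranks scaled_doc first (1.0 versus about 0.71), because L2 penalizes the difference in magnitude and cosine ignores it.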

3. Top-K Retrieval

In RAG, we don't just want the single closest document; we usually want the top 3, 5, or 10. This is known as Top-K Retrieval.

Higher K: More context for the LLM, but higher risk of noise and higher token costs.
Lower K: More precise, but might miss critical information if it was split across multiple chunks.
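The trade-off is easy to see with a brute-force top-k search. This is a minimal sketch that uses cosine similarity over a hypothetical four-chunk corpus with made-up 3-D vectors; production vector stores typically use approximate nearest-neighbor indexes rather than scoring every stored vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, corpus, k):
    """Score every chunk and return the k best (chunk_id, score) pairs, best first."""
    scored = ((chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical chunk embeddings
corpus = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.7, 0.3, 0.0],
    "chunk_c": [0.0, 0.2, 0.9],
    "chunk_d": [0.1, 0.9, 0.1],
}
query = [1.0, 0.2, 0.0]

for k in (1, 3):
    results = top_k(query, corpus, k)
    print(k, [(chunk_id, round(score, 2)) for chunk_id, score in results])
```

With k=1 only chunk_a comes back. With k=3 the result also includes chunk_d, whose similarity (about 0.30) is far below that of the other two: the kind of low-relevance noise a larger k lets into the LLM's context.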

[Figure: Visualizing Vector Distance]

Basic Search with LangChain

Step 1

Assuming you have an existing Chroma or Pinecone store from Module 4, first connect to it. The example below reopens a Chroma collection:

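from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Reopen the collection built in Module 4. The embedding model must match the
# one used at indexing time. persist_directory is an assumed path; adjust it.
vector_store = Chroma(
    collection_name="my_docs",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)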

Step 2

The simplest way to retrieve data is the similarity_search method, which embeds the query and returns the k most similar documents:

query = "What are the benefits of RAG?"
docs = vector_store.similarity_search(query, k=3)

for doc in docs:
    print(f"Content: {doc.page_content[:100]}...")

Step 3

To judge retrieval quality, retrieve each document together with its raw score:

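# Returns List[Tuple[Document, float]].
# For Chroma the float is a distance (lower = more similar); other stores may
# return a similarity (higher = more similar), so check your store's docs.
results = vector_store.similarity_search_with_score(query, k=3)

for doc, score in results:
    print(f"Score: {score:.4f} | Content: {doc.page_content[:50]}")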

Common Mistakes

Mismatching Metrics: L2 returns a distance (lower is more similar), while cosine similarity and inner product return similarities (higher is more similar). If your vector store was created with L2 distance but your code assumes cosine similarity, your sort order and any score thresholds will point the wrong way. Always check your vector store's configuration.
Setting K Too High: Passing 20 chunks to an LLM might exceed its context window or cause it to lose track of the most relevant information (the "Lost in the Middle" phenomenon).
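One way to avoid confusing scores is to convert them to a single "higher is better" convention before sorting or thresholding. The sketch below assumes unit-normalized embeddings and a store that reports squared L2 distance (some stores do; check yours). Under those assumptions, cosine similarity can be recovered from the identity ||a - b||² = 2 - 2·cos θ. The helper cosine_from_squared_l2 is a hypothetical name used only for illustration.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance: lower means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(d2):
    """For unit vectors, ||a - b||^2 = 2 - 2*cos(theta), so cos(theta) = 1 - d2 / 2."""
    return 1 - d2 / 2


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

d2 = squared_l2(a, b)
direct_cosine = sum(x * y for x, y in zip(a, b))  # dot product of unit vectors
print(
    f"squared L2 = {d2:.4f}, "
    f"converted cosine = {cosine_from_squared_l2(d2):.4f}, "
    f"direct cosine = {direct_cosine:.4f}"
)
```

A squared-L2 score of 0.08 looks "low" and a cosine of 0.96 looks "high", yet both describe the same pair of vectors. Treating one as if it were the other flips the ranking or the threshold.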

Recap

Similarity search is based on mathematical distance in vector space.
Cosine similarity is the most common choice for text RAG; with normalized vectors, inner product gives the same ranking.
LangChain's similarity_search provides a unified API for querying any supported vector store.

Knowledge Check

Which distance metric is generally preferred for text embeddings because it ignores the magnitude (length) of the vectors?

L2 (Euclidean) Distance

Cosine Similarity

Manhattan Distance
