Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Hands-On: Building a Basic RAG Pipeline

Build your first complete RAG pipeline: ingest documents, create a vector store, implement similarity search with score thresholds, apply MMR for diversity, and add hybrid search with BM25.

In this final section of Module 5, we build a functional RAG retrieval pipeline from scratch. It combines everything covered so far: document loading, chunking, embedding, vector storage, and advanced retrieval with LangChain's integration packages.

Learning Goals

Build a complete RAG retrieval pipeline from scratch using LangChain.
Combine similarity search, MMR, and hybrid (BM25) search in one retriever.
Configure score thresholds so that only high-quality context reaches the LLM.

Core Concepts

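The "Full Retrieval" Architecture

To build a production-grade retriever, we will use an ensemble of two retrieval branches plus a fusion step:

Dense branch: Chroma + OpenAI embeddings, searched with MMR for diverse results.
Sparse branch: BM25 for exact keyword matches.
Fusion: EnsembleRetriever, which merges the two ranked lists with weighted Reciprocal Rank Fusion (RRF).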
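The sparse branch ranks chunks by term overlap rather than by meaning. As a rough illustration, here is a minimal Okapi BM25 scorer in plain Python. It assumes a lowercasing word tokenizer, the common defaults k1 = 1.5 and b = 0.75, and a Lucene-style smoothed IDF. BM25Retriever itself delegates to the rank_bm25 package (BM25Okapi), whose tokenization and IDF formula differ in the details, so treat this as a sketch of the idea rather than its exact implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # exact-match model: absent terms contribute nothing
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 saturates repeated terms; b normalizes for document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


corpus = [
    "RAG is a technique to ground LLMs in external knowledge.",
    "Embeddings represent semantic meaning as vectors.",
    "Vector stores like Chroma handle similarity search.",
]
scores = bm25_scores("Chroma similarity", corpus)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(scores)
print("best match:", corpus[best])
```

Only the third chunk contains the query terms, so it is the only one with a non-zero score. A dense retriever, by contrast, can also score chunks that share meaning but no words with the query, which is why the two branches complement each other.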


Building the Basic RAG Pipeline

Step 1

Install the necessary packages. OpenAIEmbeddings also expects your API key in the OPENAI_API_KEY environment variable.

pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters rank_bm25

Step 2

Wrap the example texts in Document objects and split them into chunks:

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

texts = [
    "RAG is a technique to ground LLMs in external knowledge.",
    "Embeddings represent semantic meaning as vectors.",
    "Vector stores like Chroma handle similarity search."
]

# Wrap in Document objects
docs = [Document(page_content=t) for t in texts]

# Chunking: every example sentence is shorter than chunk_size,
# so each one ends up as a single chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
chunks = splitter.split_documents(docs)
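RecursiveCharacterTextSplitter tries a list of separators in order (by default paragraph breaks, newlines, spaces, then single characters) and only falls back to a finer separator when a piece is still longer than chunk_size. The sketch below is a simplified plain-Python version of that idea. It assumes no overlap and drops separators at chunk boundaries, so it is not LangChain's exact algorithm, but it shows why the three short sentences above each stay a single chunk.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting (no overlap; assumes chunk_size >= 1)."""
    if len(text) <= chunk_size:
        return [text]
    sep, *finer = separators
    # An empty separator means "split into single characters" (the last resort).
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Still too long: flush what we have, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, tuple(finer)))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate      # keep packing pieces into this chunk
        else:
            chunks.append(current)   # chunk is full; start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sentence = "RAG is a technique to ground LLMs in external knowledge."
print(recursive_split(sentence, chunk_size=100))  # shorter than 100 chars: one chunk

long_text = " ".join(["word"] * 50)  # 249 characters, no newlines
chunks = recursive_split(long_text, chunk_size=40)
print([len(c) for c in chunks])
```

The long text has no paragraph breaks or newlines, so the sketch falls through to splitting on spaces and then greedily packs words until the next one would exceed chunk_size.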

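Step 3

Build the two retrieval branches: a dense retriever over Chroma that uses MMR, and a sparse BM25 retriever over the same chunks: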

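from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.retrievers import BM25Retriever

# 1. Dense Retriever (MMR)
# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable.
vector_store = Chroma.from_documents(
    chunks,
    OpenAIEmbeddings(),
    persist_directory="./rag_db"
)
dense_retriever = vector_store.as_retriever(
    search_type="mmr",
    # k: documents returned. MMR first fetches fetch_k candidates (default 20),
    # then re-ranks them; lambda_mult (default 0.5) trades relevance (1) for diversity (0).
    search_kwargs={"k": 2}
)

# 2. Sparse Retriever (BM25): keyword scoring via the rank_bm25 package
sparse_retriever = BM25Retriever.from_documents(chunks)
sparse_retriever.k = 2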
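To make search_type="mmr" less of a black box, here is a plain-Python sketch of the greedy Maximal Marginal Relevance loop. At each step it picks the candidate maximizing lambda_mult × sim(query, doc) − (1 − lambda_mult) × (highest similarity to any already-selected doc), using cosine similarity. The 2-D vectors are made up for illustration; real OpenAI embeddings have far more dimensions, and LangChain's own implementation works on NumPy arrays.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int,
        lambda_mult: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance: return indices of k selected candidates."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-D "embeddings", made up for illustration.
query = [1.0, 1.0]
candidates = [
    [1.0, 0.21],  # 0: most relevant
    [1.0, 0.20],  # 1: near-duplicate of 0
    [0.1, 1.0],   # 2: a bit less relevant, but points in a different direction
]
by_similarity = sorted(range(len(candidates)),
                       key=lambda i: cosine(query, candidates[i]), reverse=True)[:2]
print("plain similarity:", by_similarity)
print("MMR:", mmr(query, candidates, k=2))
```

Plain similarity returns the two near-duplicates, while MMR passes over index 1 in favor of index 2. With lambda_mult=1.0 the redundancy term vanishes and MMR reduces to plain similarity ranking.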

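Step 4

Fuse the two branches with EnsembleRetriever. The weights control how much each retriever's ranking contributes to the final order: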

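# Import path for LangChain 0.x. In LangChain 1.x this class moved to the
# langchain-classic package: from langchain_classic.retrievers import EnsembleRetriever
from langchain.retrievers import EnsembleRetriever

retriever = EnsembleRetriever(
    retrievers=[dense_retriever, sparse_retriever],
    weights=[0.6, 0.4]  # dense (semantic) ranking weighted 0.6, BM25 (keyword) ranking 0.4
)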
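EnsembleRetriever does not average raw scores (BM25 scores and embedding similarities are not on comparable scales); it fuses ranks. Below is a minimal plain-Python sketch of weighted Reciprocal Rank Fusion: each document earns weight / (rank + c) from every list it appears in, with c = 60 as in EnsembleRetriever's default. The string labels are hypothetical stand-ins for documents; EnsembleRetriever de-duplicates real documents by page_content.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            # Top-ranked documents contribute most; c damps the gap between ranks.
            scores[doc] += weight / (rank + c)
    # Best fused score first; Python's sort is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical rankings from the two branches.
dense_ranking = ["vector_stores", "embeddings"]
sparse_ranking = ["rag", "vector_stores"]

print(weighted_rrf([dense_ranking, sparse_ranking], weights=[0.6, 0.4]))
print(weighted_rrf([dense_ranking, sparse_ranking], weights=[0.2, 0.8]))
```

Because vector_stores appears in both lists, it collects two contributions and ranks first under either weighting. Shifting weight toward BM25 moves rag ahead of embeddings, which is the lever described under Common Mistakes.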

Step 5

Query the ensemble retriever:

query = "How do vector stores work?"
# invoke() runs both retrievers and returns their fused, de-duplicated results
retrieved_docs = retriever.invoke(query)

print(f"Retrieved {len(retrieved_docs)} documents.")
for i, doc in enumerate(retrieved_docs, start=1):
    print(f"{i}. {doc.page_content}")
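The learning goals also call for a score threshold, which the ensemble above does not apply. In LangChain, a vector store retriever can do this directly with as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}); the right cut-off depends on your embedding model and distance metric, so 0.5 here is only a placeholder. The underlying idea is simple: drop any result whose relevance score falls below the threshold, then keep at most k. The plain-Python sketch below uses hypothetical (text, score) pairs standing in for the output of similarity_search_with_relevance_scores.

```python
def filter_by_threshold(results: list[tuple[str, float]], threshold: float,
                        k: int) -> list[tuple[str, float]]:
    """Keep results scoring at or above `threshold`, best first, at most k of them."""
    kept = [(text, score) for text, score in results if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical relevance scores in [0, 1]; real values depend on the embedding model.
results = [
    ("Vector stores like Chroma handle similarity search.", 0.82),
    ("Embeddings represent semantic meaning as vectors.", 0.61),
    ("RAG is a technique to ground LLMs in external knowledge.", 0.34),
]
context = filter_by_threshold(results, threshold=0.5, k=4)
for text, score in context:
    print(f"{score:.2f}  {text}")
```

If nothing clears the threshold, the filter returns an empty list; it is usually better to tell the LLM that no relevant context was found than to pass it weak matches.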

Common Mistakes

Incorrect Weights: If the system misses obvious keyword matches, increase the BM25 retriever's weight. If it misses semantic connections (for example, paraphrases that share no keywords with the query), increase the dense (Chroma/MMR) retriever's weight.
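Persistent Storage Issues: Chroma.from_documents embeds and writes the documents every time it is called. Calling it repeatedly (in a loop or on every application start) with the same persist_directory adds duplicate copies of your chunks to the collection, and pointing it at a new folder each time leaves you with many stale databases. Ingest once, then reconnect to the existing store in production, e.g. Chroma(persist_directory="./rag_db", embedding_function=OpenAIEmbeddings()).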
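One way to keep re-ingestion from piling up duplicates is to give each chunk a deterministic ID derived from its content and pass the list as ids= to Chroma.from_documents. Assuming your langchain_chroma version writes with upsert (as the versions we are aware of do; check yours), re-running ingestion then overwrites the same records instead of appending copies. The helper below is a hypothetical plain-Python sketch of such an ID scheme; make_chunk_id is not a LangChain function, and "intro.txt" is a made-up source name.

```python
import hashlib


def make_chunk_id(source: str, chunk_text: str) -> str:
    """Hypothetical helper: derive a stable ID from a chunk's source and text."""
    payload = f"{source}\x00{chunk_text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:32]


texts = [
    "RAG is a technique to ground LLMs in external knowledge.",
    "Embeddings represent semantic meaning as vectors.",
]
first_run = [make_chunk_id("intro.txt", t) for t in texts]   # "intro.txt" is hypothetical
second_run = [make_chunk_id("intro.txt", t) for t in texts]
print(first_run == second_run)  # same content -> same IDs on every run

# With LangChain you would pass the IDs alongside the matching Document objects, e.g.
# Chroma.from_documents(chunks, OpenAIEmbeddings(), ids=first_run, persist_directory="./rag_db")
```

Including the source in the hash keeps identical sentences from different files distinct, while repeated ingestion of the same file maps to the same IDs.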

Recap

We combined semantic and keyword search into a single pipeline.
We used MMR to reduce redundancy, so the LLM receives more diverse information.
We leveraged LangChain's EnsembleRetriever for robust result fusion.

Knowledge Check

Question 1 of 3 (single choice)

Which component is responsible for combining the results of the dense and sparse retrievers?

RecursiveCharacterTextSplitter

EnsembleRetriever

OpenAIEmbeddings
