Hands-On: Building a Basic RAG Pipeline
Build your first complete RAG pipeline: ingest documents, create a vector store, implement similarity search with score thresholds, apply MMR for diversity, and add hybrid search with BM25.
In this final section of Module 5, we will build a functional RAG retrieval pipeline from scratch. We will combine everything we've learned: document loading, chunking, embedding, vector storage, and advanced retrieval using LangChain's partner packages.
Learning Goals
- Build a complete RAG retrieval pipeline using LangChain.
- Implement a retrieval strategy that combines MMR and Hybrid Search.
- Configure score thresholds to ensure only high-quality context reaches the LLM.
Core Concepts
The "Full Retrieval" Architecture
To build a production-grade retriever, we will use an Ensemble approach:
- Dense Branch: Chroma + OpenAI Embeddings with MMR (Diversity).
- Sparse Branch: BM25 (Exact Keywords).
- Fusion: EnsembleRetriever (Weighted RRF).
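To make the fusion step concrete, here is a minimal pure-Python sketch of weighted Reciprocal Rank Fusion (RRF). It is an illustration under stated assumptions, not LangChain's actual source: the function name weighted_rrf, the toy document IDs, and the smoothing constant c=60 (a commonly used RRF default) are all chosen for this example.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns weight / (rank + c) from every list it appears in,
    so documents found by more than one retriever accumulate a higher score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from the dense and sparse branches
dense_ranking = ["chroma", "embeddings", "rag"]
sparse_ranking = ["chroma", "rag"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.6, 0.4])
print(fused)
```

In this toy run, "rag" ends up ahead of "embeddings" even though the dense branch ranked it last, because both branches returned it. That is the behavior the weights in the ensemble are meant to tune.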
Building the Basic RAG Pipeline
- Step 1
Install all necessary packages:

    pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters rank_bm25

- Step 2
Create a few sample documents and split them into chunks:

    from langchain_core.documents import Document
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    texts = [
        "RAG is a technique to ground LLMs in external knowledge.",
        "Embeddings represent semantic meaning as vectors.",
        "Vector stores like Chroma handle similarity search."
    ]

    # Wrap in Document objects
    docs = [Document(page_content=t) for t in texts]

    # Chunking
    splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
    chunks = splitter.split_documents(docs)

- Step 3
Build the dense (MMR) and sparse (BM25) retrievers:

    from langchain_openai import OpenAIEmbeddings
    from langchain_chroma import Chroma
    from langchain_community.retrievers import BM25Retriever

    # 1. Dense Retriever (MMR)
    # OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable
    vector_store = Chroma.from_documents(
        chunks,
        OpenAIEmbeddings(),
        persist_directory="./rag_db"
    )
    dense_retriever = vector_store.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 2}
    )

    # 2. Sparse Retriever (BM25)
    sparse_retriever = BM25Retriever.from_documents(chunks)
    sparse_retriever.k = 2

- Step 4
Fuse the two retrievers with EnsembleRetriever:

    # If this import fails on a newer LangChain release, check whether the
    # class has moved to the langchain-classic package.
    from langchain.retrievers import EnsembleRetriever

    retriever = EnsembleRetriever(
        retrievers=[dense_retriever, sparse_retriever],
        weights=[0.6, 0.4]
    )

- Step 5
Run a query through the ensemble and print the results:

    query = "How do vector stores work?"
    retrieved_docs = retriever.invoke(query)

    print(f"Retrieved {len(retrieved_docs)} documents.")
    for i, doc in enumerate(retrieved_docs):
        print(f"{i+1}. {doc.page_content}")
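The learning goals mention score thresholds, which the steps above do not configure. As a rough illustration of the idea, the pure-Python sketch below drops any candidate whose cosine similarity to the query falls below a cutoff. The helper names, the 2-D toy vectors, and the 0.5 threshold are assumptions made up for this example. LangChain vector store retrievers offer a comparable option through search_type="similarity_score_threshold" with a "score_threshold" entry in search_kwargs; check the current documentation for how your store normalizes scores before picking a cutoff.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_score(query_vec, doc_vecs, threshold, k):
    """Return up to k (doc_id, score) pairs whose score is at least `threshold`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy 2-D "embeddings", invented purely for illustration
toy_vectors = {
    "vector stores": [1.0, 0.0],
    "embeddings": [0.8, 0.6],
    "cooking tips": [0.0, 1.0],
}

hits = filter_by_score([1.0, 0.0], toy_vectors, threshold=0.5, k=2)
for doc_id, score in hits:
    print(f"{doc_id}: {score:.2f}")
```

Because the ensemble fuses results by rank rather than by raw score, a threshold like this would sit on the dense branch, before fusion.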
Common Mistakes
- Incorrect Weights: If you find that the system is missing obvious keyword matches, increase the weight of the BM25 retriever. If it's missing semantic connections, increase the OpenAI/MMR weight.
- Persistent Storage Issues: When using Chroma.from_documents in a loop, ensure you aren't creating a new database folder every time. Connect to the existing one for production use.
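One way to avoid rebuilding the store on every run is to check whether the persist directory already holds data and only build when it does not. The sketch below shows that decision in plain Python. The helper open_or_build and its two callables are hypothetical names for this example. In this tutorial's setup, load_existing would typically wrap Chroma(persist_directory="./rag_db", embedding_function=OpenAIEmbeddings()) and build_new would wrap the Chroma.from_documents call from Step 3.

```python
import tempfile
from pathlib import Path


def open_or_build(persist_dir, load_existing, build_new):
    """Call load_existing() if persist_dir already has files, else build_new().

    Both arguments are zero-argument callables supplied by the caller
    (hypothetical hooks for this sketch, not part of any library).
    """
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load_existing()
    return build_new()


# Demo with stand-in callables instead of a real vector store
with tempfile.TemporaryDirectory() as tmp:
    first = open_or_build(tmp, lambda: "loaded", lambda: "built")
    # Simulate a previous run having written data (stand-in file name)
    (Path(tmp) / "placeholder.bin").touch()
    second = open_or_build(tmp, lambda: "loaded", lambda: "built")

print(first, second)
```

The directory check is a simple heuristic. It does not verify that the folder actually contains a valid database.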
Recap
- We combined semantic and keyword search into a single pipeline.
- We used MMR to ensure the LLM receives diverse information.
- We leveraged LangChain's EnsembleRetriever for robust result fusion.
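To see why MMR yields more diverse context than plain similarity ranking, the following pure-Python sketch implements the standard greedy MMR selection: each pick maximizes relevance to the query minus similarity to what has already been chosen. This is a simplified illustration, not the code LangChain runs. The function names, the 2-D toy vectors, and lambda_mult=0.5 are assumptions for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k doc IDs, trading relevance against redundancy."""
    selected = []
    candidates = list(doc_vecs)
    while candidates and len(selected) < k:
        def mmr_score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            # Similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors, invented for illustration: two near-duplicates and one distinct doc
query = [1.0, 1.0]
vectors = {
    "chroma_intro": [1.0, 0.8],
    "chroma_intro_copy": [1.0, 0.7],
    "embeddings_intro": [0.6, 1.0],
}

top_by_relevance = sorted(vectors, key=lambda d: cosine(query, vectors[d]), reverse=True)[:2]
top_by_mmr = mmr(query, vectors, k=2)
print("similarity only:", top_by_relevance)
print("with MMR:", top_by_mmr)
```

With similarity alone, both slots go to the near-duplicate pair. With MMR, the second slot goes to the distinct document, which is the diversity effect the recap refers to.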
Knowledge Check
Which component is responsible for combining the results of the dense and sparse retrievers?