Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Hands-On: Building a Basic RAG Pipeline

Build your first complete RAG pipeline: ingest documents, create a vector store, implement similarity search with score thresholds, apply MMR for diversity, and add hybrid search with BM25.

In this final section of Module 5, we will build a functional RAG retrieval pipeline from scratch. We will combine everything we've learned: document loading, chunking, embedding, vector storage, and advanced retrieval using LangChain's partner packages.

Learning Goals

  • Build a complete RAG retrieval pipeline using LangChain.
  • Implement a retrieval strategy that combines MMR and Hybrid Search.
  • Configure score thresholds to ensure only high-quality context reaches the LLM.

Core Concepts

The "Full Retrieval" Architecture

To build a production-grade retriever, we will use an Ensemble approach:

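  1. Dense Branch: Chroma + OpenAI embeddings, searched with MMR (Maximal Marginal Relevance) for diversity.
  2. Sparse Branch: BM25 for exact keyword matches.
  3. Fusion: EnsembleRetriever, which merges both result lists using weighted Reciprocal Rank Fusion (RRF).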
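
To make the fusion step less of a black box, here is a minimal pure-Python sketch of weighted Reciprocal Rank Fusion. It is an illustration under stated assumptions, not LangChain's actual code: the function name `weighted_rrf`, the document IDs, and the smoothing constant `c=60` (a commonly used default) are all chosen for this example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document earns weight / (c + rank) from every list it appears in,
    where rank starts at 1. `c` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from the dense and sparse branches
dense_ranking = ["doc_vectors", "doc_embeddings"]
sparse_ranking = ["doc_vectors", "doc_rag"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.6, 0.4])
print(fused)
```

A document that both branches rank highly accumulates score from each list, so it rises to the top; documents found by only one branch are ordered by that branch's weight and rank.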

Integration Map

Building the Basic RAG Pipeline

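  1. Step 1

    Install the packages used in this walkthrough. langchain and langchain-community are included because later steps import EnsembleRetriever and BM25Retriever from them:

        pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters rank_bm25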
  2. Step 2

    Wrap the sample texts in Document objects and split them into chunks:

        from langchain_core.documents import Document
        from langchain_text_splitters import RecursiveCharacterTextSplitter

        texts = [
            "RAG is a technique to ground LLMs in external knowledge.",
            "Embeddings represent semantic meaning as vectors.",
            "Vector stores like Chroma handle similarity search.",
        ]

        # Wrap in Document objects
        docs = [Document(page_content=t) for t in texts]

        # Chunking
        splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
        chunks = splitter.split_documents(docs)
  3. Step 3

    Create the dense (MMR) and sparse (BM25) retrievers over the same chunks:

        from langchain_openai import OpenAIEmbeddings
        from langchain_chroma import Chroma
        from langchain_community.retrievers import BM25Retriever

        # 1. Dense Retriever (MMR)
        # OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable.
        vector_store = Chroma.from_documents(
            chunks,
            OpenAIEmbeddings(),
            persist_directory="./rag_db",
        )
        dense_retriever = vector_store.as_retriever(
            search_type="mmr",
            search_kwargs={"k": 2},
        )

        # 2. Sparse Retriever (BM25)
        sparse_retriever = BM25Retriever.from_documents(chunks)
        sparse_retriever.k = 2
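  4. Step 4

    Fuse the two retrievers with EnsembleRetriever, weighting the dense branch slightly higher:

        # Import path for LangChain 0.x. Newer releases may expose this class
        # from the langchain-classic package (langchain_classic.retrievers) instead.
        from langchain.retrievers import EnsembleRetriever

        retriever = EnsembleRetriever(
            retrievers=[dense_retriever, sparse_retriever],
            weights=[0.6, 0.4],
        )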
  5. Step 5

    Run a query through the combined retriever and print the results:

        query = "How do vector stores work?"
        retrieved_docs = retriever.invoke(query)

        print(f"Retrieved {len(retrieved_docs)} documents.")
        for i, doc in enumerate(retrieved_docs):
            print(f"{i+1}. {doc.page_content}")
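
One of the learning goals, score thresholds, is not exercised by the steps above. The sketch below shows the idea in plain Python: score every candidate against the query and drop anything below a cutoff before it can reach the LLM. The toy 3-dimensional vectors, the helper names, and the threshold values are assumptions for illustration; real embeddings have many more dimensions, and a sensible cutoff depends on the embedding model and on how your vector store scales its scores.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_by_score(query_vec, doc_vecs, threshold=0.5, k=2):
    """Return up to k (doc_id, score) pairs whose score is >= threshold."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical toy embeddings, not real model output
toy_vectors = {
    "vector_stores": [0.9, 0.1, 0.0],
    "embeddings": [0.6, 0.8, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.0, 0.0]

lenient = filter_by_score(query_vec, toy_vectors, threshold=0.5)
strict = filter_by_score(query_vec, toy_vectors, threshold=0.7)
print([doc_id for doc_id, _ in lenient])
print([doc_id for doc_id, _ in strict])
```

In LangChain, vector-store retrievers offer a comparable option through as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": ...}). Note that this is a separate search type from "mmr", so check the documentation for how to combine it with the retriever built in Step 3.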

Common Mistakes

  • Incorrect Weights: If the system misses obvious keyword matches, increase the weight of the BM25 retriever. If it misses semantic connections, increase the weight of the dense (Chroma + MMR) retriever.
  • Persistent Storage Issues: Calling Chroma.from_documents on every run (or inside a loop) re-embeds the documents and can add duplicate entries or create a new database folder each time. For production use, build the store once and then connect to the existing persist_directory.
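
The persistence advice can be sketched as a small "load or build" guard. The helper name `load_or_build` and the stand-in `fake_build` / `fake_load` callables are hypothetical and exist only so the example runs without an API key; the commented lines show how the same pattern might be wired to Chroma.

```python
import tempfile
from pathlib import Path


def load_or_build(persist_dir, build, load):
    """Reuse a non-empty store directory if present; otherwise build it once."""
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load(persist_dir)
    return build(persist_dir)


# With Chroma, the callables might look like this (assumes `chunks` from Step 2):
#   build = lambda d: Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory=d)
#   load  = lambda d: Chroma(persist_directory=d, embedding_function=OpenAIEmbeddings())


def fake_build(persist_dir):
    # Stand-in for the expensive embed-and-index step
    Path(persist_dir).mkdir(parents=True)
    (Path(persist_dir) / "index.bin").write_text("vectors")
    return "built"


def fake_load(persist_dir):
    # Stand-in for connecting to an existing store
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "rag_db"
    first = load_or_build(db_dir, fake_build, fake_load)   # directory missing
    second = load_or_build(db_dir, fake_build, fake_load)  # directory now populated

print(first, second)
```

The first call builds the store because the directory does not exist yet; the second call finds it populated and only reconnects, which avoids paying for embeddings twice.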

Recap

  • We combined semantic and keyword search into a single pipeline.
  • We used MMR to ensure the LLM receives diverse information.
  • We leveraged LangChain's EnsembleRetriever for robust result fusion.
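
To make the MMR point in the recap concrete, the following sketch implements a simple greedy version of Maximal Marginal Relevance over toy 2-dimensional vectors. The vectors, document names, and the `mmr` helper are assumptions for illustration, not LangChain's implementation. Each round picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: balance relevance to the query against similarity to picks so far."""
    selected = []
    candidates = list(doc_vecs)
    while candidates and len(selected) < k:

        def mmr_score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy embeddings: doc_a and doc_a_near_dup say almost the same thing
toy_vectors = {
    "doc_a": [1.0, 0.1],
    "doc_a_near_dup": [1.0, 0.12],
    "doc_b": [0.8, -0.6],
}
query_vec = [1.0, 0.0]

diverse = mmr(query_vec, toy_vectors, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, toy_vectors, k=2, lambda_mult=1.0)
print(diverse)
print(relevance_only)
```

With `lambda_mult=1.0` the redundancy penalty disappears and the near-duplicate is returned alongside the original; lowering it to 0.5 swaps the duplicate for the less similar document. LangChain's MMR retrievers accept a similar `lambda_mult` value in `search_kwargs`.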

Knowledge Check

Question 1 (single choice)

Which component is responsible for combining the results of the dense and sparse retrievers?
