Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Hands-On: Building a Vector Store Application

Apply everything from this module to build a full vector store application: ingest documents into Chroma, perform similarity search with FAISS, deploy with Qdrant cloud, and implement metadata filtering.

Learning Goals

  • Build a complete multi-vector-store application
  • Implement similarity search, metadata filtering, and batch operations

In this final section of Module 4, we put everything we've learned into practice by building a small, functional "Knowledge Base" application with Chroma and OpenAI Embeddings via LangChain. The application lets us index a set of texts and run semantic searches against them using the LangChain partner packages langchain-openai and langchain-chroma.

Learning Goals

  • Integrate an embedding model with a vector store using the langchain-chroma partner package.
  • Index a small dataset of text documents with metadata.
  • Build a search interface to retrieve relevant chunks based on a natural language query.

Core Concepts

The "Full-Loop" Architecture

To build this app, we need three things:

  1. Data: A set of strings or text files wrapped in LangChain Document objects.
  2. Embedder: A model to turn those strings into vectors (using langchain-openai).
  3. Store: A database to hold the vectors and metadata (using langchain-chroma).
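To make the three roles concrete before bringing in real services, here is a toy sketch of the same loop in plain Python. It is only an illustration: the Doc class, the bag-of-words embed function, and ToyStore are hypothetical stand-ins invented for this example, not the OpenAI embedding model or Chroma, and word-count vectors capture far less meaning than real embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy embedder: bag-of-words counts, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyStore:
    """Toy in-memory store: keeps (vector, document) pairs in a list."""

    def __init__(self):
        self._rows = []

    def add_documents(self, docs):
        for doc in docs:
            self._rows.append((embed(doc.page_content), doc))

    def similarity_search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[0]),
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


# 1. Data -> 2. Embedder -> 3. Store, then search
store = ToyStore()
store.add_documents([
    Doc("RAG stands for Retrieval-Augmented Generation.", {"source": "manual"}),
    Doc("Chroma is an open-source AI-native vector database.", {"source": "manual"}),
])
top = store.similarity_search("What does RAG stand for?", k=1)[0]
print(top.page_content)
```

The method names add_documents and similarity_search were chosen to mirror the LangChain calls used later in this section, so the real store can be read as a drop-in replacement for the toy one.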

Integration Map

Building the Knowledge Base

  1. Step 1

    Ensure you have the latest partner packages installed:

        pip install langchain-openai langchain-chroma chromadb
  2. Step 2

    Set your API key and define your knowledge base strings:

        import os
        from langchain_openai import OpenAIEmbeddings
        from langchain_chroma import Chroma
        from langchain_core.documents import Document

        # Placeholder value: outside of a tutorial, set OPENAI_API_KEY in your
        # shell environment instead of writing the key into source code.
        os.environ["OPENAI_API_KEY"] = "your-api-key-here"

        knowledge_base = [
            "Gemini CLI is an interactive agent for software engineering.",
            "RAG stands for Retrieval-Augmented Generation.",
            "Chroma is an open-source AI-native vector database.",
            "Embeddings are numerical representations of meaning.",
        ]
  3. Step 3

    Convert strings to Document objects with metadata:

        documents = [
            Document(page_content=text, metadata={"source": "manual", "id": i})
            for i, text in enumerate(knowledge_base)
        ]
  4. Step 4

    Initialize the embedding model and create a persistent local vector store:

        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

        # persist_directory tells Chroma to keep the collection on disk
        vector_store = Chroma(
            collection_name="rag_basics",
            embedding_function=embeddings,
            persist_directory="./rag_storage",
        )
  5. Step 5

    Add your documents to the store:

        vector_store.add_documents(documents)
  6. Step 6

    Perform a similarity search and print the top result:

        query = "Explain what RAG is."
        # k=1 asks for only the single closest document
        results = vector_store.similarity_search(query, k=1)

        print(f"Query: {query}")
        print(f"Top Result: {results[0].page_content}")
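One thing the steps above leave open: because the store is persisted in ./rag_storage, running the script a second time calls add_documents again on the same texts. A possible guard, sketched here, is to derive a deterministic ID from each document's text and pass the IDs through the ids keyword argument of add_documents. The helper names stable_id and add_once are hypothetical, and the sketch assumes that the Chroma integration overwrites entries whose IDs already exist instead of duplicating them, which is worth verifying against your installed version.

```python
import hashlib


def stable_id(text: str) -> str:
    """Return a short deterministic ID derived from the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def add_once(vector_store, documents):
    """Add documents under content-derived IDs (hypothetical helper).

    Assumption: the store treats a repeated ID as an update, so calling
    this twice with the same documents does not create duplicates.
    """
    ids = [stable_id(doc.page_content) for doc in documents]
    vector_store.add_documents(documents, ids=ids)
    return ids
```

With this helper, Step 5 would become add_once(vector_store, documents). Hashing the text means an edited sentence gets a new ID, so stale entries would still need to be deleted separately.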

Example: Building an "Ask My Docs" CLI

You can extend this script into a CLI tool. Using argparse in Python, you could support a command like python search.py "tell me about embeddings" that retrieves the most relevant sentence from your database. Because the collection is persisted in ./rag_storage, the CLI can reopen the existing index on each run instead of re-embedding the documents; only the query itself needs to be embedded.
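A minimal sketch of what such a search.py could look like follows. The function names build_parser, chroma_search, and main, as well as the -k option, are hypothetical choices for this example; the collection name, embedding model, and persist directory are reused from Step 4. The search backend is passed in as a parameter so the argument handling can be tried without an API key.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface: one query plus an optional -k."""
    parser = argparse.ArgumentParser(
        prog="search.py", description="Search the knowledge base."
    )
    parser.add_argument("query", help="natural-language question")
    parser.add_argument("-k", type=int, default=1, help="number of results")
    return parser


def chroma_search(query: str, k: int) -> list:
    """Query the persisted Chroma collection created in the steps above."""
    # Imported lazily so that `search.py --help` works even before the
    # partner packages are installed.
    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    store = Chroma(
        collection_name="rag_basics",
        embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
        persist_directory="./rag_storage",
    )
    return [doc.page_content for doc in store.similarity_search(query, k=k)]


def main(argv=None, search=chroma_search) -> list:
    """Parse arguments, run the search, and print numbered results."""
    args = build_parser().parse_args(argv)
    results = search(args.query, args.k)
    for rank, text in enumerate(results, start=1):
        print(f"{rank}. {text}")
    return results
```

To use it as a script, call main() at the bottom of the file; with argv left as None, argparse reads the arguments from the command line. The script expects OPENAI_API_KEY to be set in the environment, since the query still has to be embedded.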

Common Mistakes

  • Incorrect Persistence Path: If you provide a path that your application doesn't have write access to, Chroma will fail silently or crash.
  • Missing Partner Packages: Importing Chroma or OpenAIEmbeddings from langchain_community relies on deprecated classes. Prefer the langchain-chroma and langchain-openai partner packages.
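For the persistence-path mistake, one option is to check the directory yourself before handing it to Chroma, so that a permissions problem surfaces as an explicit error at startup. The sketch below uses only the Python standard library; the helper name ensure_writable_dir is hypothetical.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> str:
    """Create `path` if needed and confirm a file can be written inside it.

    Raises OSError (for example PermissionError) if the directory cannot
    be created or written to. Returns the absolute path on success.
    """
    os.makedirs(path, exist_ok=True)
    # Creating a throwaway file tests real write access; the file is
    # removed automatically when the context manager exits.
    with tempfile.TemporaryFile(dir=path):
        pass
    return os.path.abspath(path)
```

The return value can be passed straight to the store, for example persist_directory=ensure_writable_dir("./rag_storage").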

Recap

  • We combined Embeddings and Vector Stores into a single functional unit using Partner Packages.
  • We used LangChain to abstract away the complexity of the database API.
  • We demonstrated that a functional, persistent vector search system can be built in about 30 lines of Python.

Knowledge Check

Which class is the modern way to integrate Chroma into LangChain?