Coursify

Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

RAG Architecture Deep Dive

30 mins

Explore the three-stage RAG architecture: Indexing, Retrieval, and Generation. Learn how raw files are transformed into a searchable knowledge base that an LLM can draw on at query time.

Learning Goals

  • Identify the three primary stages of the RAG architecture.
  • Trace the data flow through Indexing (Ingest, Chunk, Embed, Store).
  • Understand the role of the Vector Database in the Retrieval stage.

The 3 Pillars of RAG

A professional RAG system is not just one script; it is a multi-stage data pipeline. To build one, you must master three distinct architectural stages:

  1. Indexing (Offline): Preparing your data. This happens before the user asks a question.
  2. Retrieval (Online): Finding the relevant data in milliseconds.
  3. Generation (Online): Using the data to create a high-quality answer.

Building a RAG Pipeline From Scratch

The Indexing (ETL) Pipeline

  1. Ingest: Load raw files (PDFs, Markdown, web pages) into the system using specialized loaders.
  2. Chunk: Break large documents into smaller pieces (e.g., 500-token snippets) so that each piece fits within the model's input limits and retrieval can return focused passages.
  3. Embed: Convert each chunk into a high-dimensional array of numbers (a vector) that represents its meaning.
  4. Store: Save the vectors and their original text in a specialized Vector Database (such as Pinecone, Chroma, or Weaviate).
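
The four indexing steps above (ingest, chunk, embed, store) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production design: the documents are assumed to be already loaded as strings, "tokens" are approximated by whitespace-separated words, `embed` is a toy hashed bag-of-words function standing in for a real embedding model, and `InMemoryVectorStore` is a hypothetical stand-in for a vector database such as those named above.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 1024  # toy embedding size (an assumption for this sketch)


def chunk(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into pieces of at most max_tokens words (words approximate tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical vector store: keeps (vector, original text) pairs in a list."""
    records: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self.records.append((vector, text))


def index_documents(documents: dict[str, str], store: InMemoryVectorStore,
                    max_tokens: int = 500) -> int:
    """Run chunk -> embed -> store for every document; return the number of chunks stored."""
    count = 0
    for _name, raw_text in documents.items():      # ingest: text is already loaded
        for piece in chunk(raw_text, max_tokens):  # chunk
            store.add(embed(piece), piece)         # embed + store
            count += 1
    return count


# Example run with a tiny chunk size so the split is visible.
docs = {"handbook.md": "Refunds are issued within 14 days. Shipping takes 3 to 5 business days."}
store = InMemoryVectorStore()
n = index_documents(docs, store, max_tokens=6)
```

Because this stage runs offline, it can be rerun whenever the source files change without touching the query path. Storing the original text next to each vector matters: the vector is only used for search, while the text is what eventually reaches the LLM.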

The Retrieval & Generation Loop

  1. Convert the user's natural-language question into a vector using the same embedding model that was used during indexing.
  2. Find the "Top K" (e.g., Top 5) chunks in the database whose vectors are closest to the query vector under a similarity measure such as cosine similarity.
  3. Format the retrieved text snippets into a prompt template alongside the user's query.
  4. Send the prompt to the LLM and receive an answer backed by the retrieved evidence.
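
The loop above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: `embed` is the same kind of toy hashed bag-of-words function standing in for a real embedding model, the "database" is a plain list of (vector, text) pairs, closeness is measured with cosine similarity (one common choice), the prompt wording is an assumption, and `llm` is a hypothetical callable supplied by the caller in place of a real model client.

```python
import hashlib
import math
import re
from typing import Callable

DIM = 1024  # toy embedding size (an assumption for this sketch)

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, store: list[tuple[list[float], str]], k: int = 5):
    """Return the k (score, text) pairs most similar to the query, best first."""
    q = embed(query)  # step 1: same embedding function as at indexing time
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]  # step 2: Top K


def build_prompt(question: str, chunks: list[str]) -> str:
    """Step 3: place the retrieved snippets and the question into the template."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(question: str, store, llm: Callable[[str], str], k: int = 5) -> str:
    """Step 4: send the assembled prompt to the caller-supplied LLM function."""
    hits = retrieve(question, store, k)
    return llm(build_prompt(question, [text for _, text in hits]))


# Example run against three pre-indexed chunks.
texts = [
    "Shipping takes 3 to 5 business days.",
    "Refunds are issued within 14 days.",
    "Support is available by email.",
]
store = [(embed(t), t) for t in texts]
question = "When are refunds issued?"
hits = retrieve(question, store, k=2)
prompt = build_prompt(question, [text for _, text in hits])
reply = answer(question, store, llm=lambda p: "(model output would appear here)", k=2)
```

Embedding the query with a different model than the one used for indexing would place it in a different vector space, and the similarity scores would no longer be meaningful. Numbering the snippets in the prompt is a small design choice that makes it easier to ask the model to cite which snippet supports its answer.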

Knowledge Check

Question 1 of 2 (Single choice)

Why is 'Chunking' necessary during the Indexing stage?