Retrieval-Augmented Generation (RAG) — From Fundamentals to Production-Ready Agentic RAG Systems

Implementing Embeddings with LangChain

Learn to use LangChain's embedding primitives: OpenAIEmbeddings, HuggingFaceEmbeddings, and OllamaEmbeddings. Implement embed_documents and embed_query with proper configuration.

Theory is important, but as an AI engineer, you need to know how to implement these models in your code. LangChain provides a standardized Embeddings interface that makes it easy to switch between different model providers with minimal code changes.

To get the most up-to-date features and stability, you should always use the official LangChain Partner Packages for providers like OpenAI and Hugging Face.

Learning Goals

Implement OpenAIEmbeddings and HuggingFaceEmbeddings using partner packages.
Understand the difference between embed_documents and embed_query.
Configure embedding models with API keys, dimensions, and batch sizes.

Core Concepts

1. The Standard Interface

LangChain's Embeddings class is an abstract base class that each provider integration implements. Whether you use OpenAI, Hugging Face, or a local model, the method names and signatures remain the same.
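
To make the shared contract concrete, here is a minimal sketch of the shape that an integration exposes. The EmbeddingsLike protocol and the ToyEmbeddings class are hypothetical stand-ins written for this illustration and are not part of LangChain; the toy model only counts characters into buckets so that the example runs without an API key or a model download.

```python
from typing import Protocol


class EmbeddingsLike(Protocol):
    """Hypothetical protocol mirroring the two methods named in this lesson."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    def embed_query(self, text: str) -> list[float]: ...


class ToyEmbeddings:
    """Illustrative stand-in for a real provider; NOT a real embedding model."""

    def __init__(self, dimensions: int = 8) -> None:
        self.dimensions = dimensions

    def _embed(self, text: str) -> list[float]:
        # Count characters into fixed-size buckets (deterministic, no network).
        vector = [0.0] * self.dimensions
        for char in text.lower():
            vector[ord(char) % self.dimensions] += 1.0
        return vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


def describe(model: EmbeddingsLike, chunks: list[str], question: str) -> tuple[int, int]:
    """Works with any object exposing the two methods, regardless of provider."""
    doc_vectors = model.embed_documents(chunks)
    query_vector = model.embed_query(question)
    return len(doc_vectors), len(query_vector)


counts = describe(
    ToyEmbeddings(),
    ["Embeddings are vectors.", "RAG uses retrieval."],
    "What is RAG?",
)
print(counts)  # (2, 8): two document vectors, one 8-dimensional query vector
```

Because describe depends only on the two method names, swapping ToyEmbeddings for a real integration should not require changes to the calling code.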

2. Core Methods

embed_documents(texts): Takes a list of strings (your chunks) and returns a list of vectors. This is used during the indexing phase.
embed_query(text): Takes a single string (the user's question) and returns a single vector. This is used during the retrieval phase.
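
The two phases can be sketched end to end as follows. This is an illustration under stated assumptions: toy_embed is a hypothetical word-bucketing function standing in for a real model (it captures word overlap, not meaning), and the module-level embed_documents and embed_query functions simply mimic the method names above.

```python
import math


def toy_embed(text: str, dimensions: int = 16) -> list[float]:
    """Hypothetical stand-in for a real model: one bucket per word."""
    vector = [0.0] * dimensions
    for word in text.lower().split():
        word = word.strip(".,?!")
        bucket = sum(ord(char) for char in word) % dimensions
        vector[bucket] += 1.0
    return vector


def embed_documents(texts: list[str]) -> list[list[float]]:
    return [toy_embed(text) for text in texts]


def embed_query(text: str) -> list[float]:
    return toy_embed(text)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Indexing phase: embed every chunk once and store the vectors.
chunks = ["Embeddings are vectors.", "RAG uses retrieval."]
index = list(zip(chunks, embed_documents(chunks)))

# Retrieval phase: embed the question and rank stored chunks by similarity.
query_vector = embed_query("What are embeddings?")
best_chunk, _ = max(index, key=lambda item: cosine_similarity(query_vector, item[1]))
print(best_chunk)
```

In a real pipeline the index would live in a vector store rather than a Python list, but the split between the two calls stays the same.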

Why two methods?

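Some models encode documents and queries differently to improve retrieval accuracy (asymmetric embedding), for example by prepending a task instruction to the query. Keeping the two methods separate lets each integration apply the right treatment, so your calling code does not have to change.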
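
One way such a difference can arise is sketched below. ToyAsymmetricEmbeddings is a hypothetical class, its encoder is a placeholder rather than a real model, and the instruction string is only an example of the kind of prefix some retrieval models document; check the model card of the model you actually use.

```python
class ToyAsymmetricEmbeddings:
    """Hypothetical wrapper that treats queries and documents differently."""

    def __init__(self, query_instruction: str) -> None:
        self.query_instruction = query_instruction

    def _encode(self, text: str) -> list[float]:
        # Placeholder encoder: character counts in 8 buckets (not a real model).
        vector = [0.0] * 8
        for char in text:
            vector[ord(char) % 8] += 1.0
        return vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Documents are encoded as-is.
        return [self._encode(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        # Queries get the instruction prepended before encoding.
        return self._encode(self.query_instruction + text)


model = ToyAsymmetricEmbeddings(
    # Example instruction only; treat the exact wording as an assumption.
    query_instruction="Represent this sentence for searching relevant passages: "
)
text = "How do embeddings work?"
same = model.embed_query(text) == model.embed_documents([text])[0]
print(same)  # False: the same text yields different vectors in the two roles
```

This is why calling embed_documents on a user question, or embed_query on a chunk, can quietly degrade retrieval quality with asymmetric models.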

Implementation Workflow

Step-by-Step Walkthrough: Code Implementation

1. Using OpenAI Embeddings

Install the partner package: pip install -U langchain-openai.

from langchain_openai import OpenAIEmbeddings

# Initialize the model (the API key is read from the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=768,  # Optional: text-embedding-3 models can return shortened vectors
)

# Embed a single query
query_vector = embeddings.embed_query("How do embeddings work?")
print(f"Vector length: {len(query_vector)}")

2. Using Open-Source (Hugging Face)

Install the partner package: pip install -U langchain-huggingface sentence-transformers.

from langchain_huggingface import HuggingFaceEmbeddings

# Initialize with a model from Hugging Face
# This downloads and runs the model on your local CPU/GPU
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "cpu"},  # Use "cuda" for GPU
)

# Embed multiple documents
docs = ["Embeddings are vectors.", "RAG uses retrieval."]
vectors = embeddings.embed_documents(docs)
print(f"Embedded {len(vectors)} documents")

Example: Batch Processing

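When indexing thousands of documents, embed them in batches to avoid API timeouts and make better use of network throughput. The parameter name depends on the integration: OpenAIEmbeddings exposes chunk_size (the maximum number of texts sent per request), while HuggingFaceEmbeddings accepts a batch_size inside encode_kwargs.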
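
If you want explicit control over batching (for example to add retries or progress logging between requests), a manual loop is one option. The sketch below is an illustration, not LangChain's internal batching: batched and embed_in_batches are hypothetical helpers, and fake_embed_documents stands in for a real model's embed_documents method so that the example runs offline.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: list[str],
    embed_documents: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Call embed_documents once per batch and collect vectors in order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_documents(batch))
    return vectors


calls: list[int] = []


def fake_embed_documents(batch: list[str]) -> list[list[float]]:
    # Stand-in for a real model: records the batch size, returns 1-d vectors.
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


chunks = [f"chunk {i}" for i in range(250)]
vectors = embed_in_batches(chunks, fake_embed_documents, batch_size=100)
print(len(vectors), calls)  # 250 [100, 100, 50]
```

With a real model you would pass its bound method instead, for example embed_in_batches(chunks, embeddings.embed_documents, batch_size=100).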

Common Mistakes

Hardcoding API Keys: Always use environment variables (os.environ["OPENAI_API_KEY"]).
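Mismatching Models: If you index your documents with Model A, you must use Model A to embed your queries. Vectors from different models live in different vector spaces (and often have different dimensions), so switching models breaks retrieval until you re-index.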
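
Both mistakes can be caught early with small guards. This is a minimal sketch under stated assumptions: require_api_key and check_same_model are hypothetical helpers, and the index_metadata dictionary is an illustrative convention for recording which model built the index, not a LangChain feature.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before creating the embedding model."
        )
    return key


def check_same_model(index_metadata: dict, query_model: str, query_dimensions: int) -> None:
    """Fail loudly if the query model differs from the one used for indexing."""
    if (
        index_metadata["model"] != query_model
        or index_metadata["dimensions"] != query_dimensions
    ):
        raise ValueError(
            f"Index was built with {index_metadata['model']} "
            f"({index_metadata['dimensions']} dims), "
            f"but queries use {query_model} ({query_dimensions} dims)."
        )


# Hypothetical metadata saved next to the index at indexing time.
index_metadata = {"model": "text-embedding-3-small", "dimensions": 768}

check_same_model(index_metadata, "text-embedding-3-small", 768)  # passes silently

try:
    check_same_model(index_metadata, "BAAI/bge-small-en-v1.5", 384)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)  # True
```

Running the model check once at application startup is cheaper than debugging silently irrelevant search results later.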

Recap

LangChain provides a unified interface for all embedding providers.
Always prefer Partner Packages like langchain-openai and langchain-huggingface.
Use embed_documents for your knowledge base and embed_query for the user's question.

Knowledge Check

Q1 (single choice)

Which method should you use to convert a list of document chunks into vectors for storage?

embed_query
embed_documents
calculate_vectors
