Graph RAG with Neo4j
Build Graph RAG pipelines with Neo4j for multi-hop queries: extract entities and relationships from documents, store them in a knowledge graph, and traverse those relationships with Cypher queries.
Learning Goals
- Build Graph RAG pipelines with Neo4j
- Implement natural language to Cypher query translation
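Traditional RAG relies on vector similarity: it retrieves text chunks whose embeddings are close to the query's embedding, but it has no explicit representation of how the entities in those chunks relate to each other. For a multi-hop question such as "How does the CEO of Company A's interest in Bitcoin affect the supply chain of Company B?", the facts needed for the answer are usually spread across several chunks, and a single vector search is likely to miss at least one link in the chain. Graph RAG addresses this by extracting entities and their relationships into a Knowledge Graph (here, stored in Neo4j) that can be traversed explicitly.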
By combining vector search with graph traversal, we can answer deep, relational questions that standard RAG handles poorly.
Learning Goals
- Contrast Vector RAG with Graph RAG.
- Define a Knowledge Graph schema (Nodes and Relationships).
- Implement a basic Graph RAG retrieval loop using LangChain and Neo4j.
Core Concepts
1. The Multi-Hop Problem
Standard RAG is "Single-Hop": Query → Retrieve → Answer. Graph RAG is "Multi-Hop": Query → Find Entity A → Follow Link to Entity B → Follow Link to Entity C → Answer.
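To make the multi-hop idea concrete, here is a minimal in-memory sketch in plain Python (no Neo4j involved): each hop follows one relationship type from the current set of nodes. The triples, node names, and the helper functions `build_index` and `multi_hop` are hypothetical toy examples, not part of any library.

```python
# Hypothetical toy knowledge graph as (head, relationship, tail) triples
TRIPLES = [
    ("Elon Musk", "CEO_OF", "Tesla"),
    ("Tesla", "INVESTS_IN", "Bitcoin"),
    ("Tesla", "PARTNER_WITH", "Panasonic"),
    ("Panasonic", "SUPPLIES", "Battery Cells"),
]


def build_index(triples):
    """Index outgoing edges as node -> relationship type -> set of target nodes."""
    index = {}
    for head, rel, tail in triples:
        index.setdefault(head, {}).setdefault(rel, set()).add(tail)
    return index


def multi_hop(index, start, relations):
    """Follow one relationship type per hop; return the nodes reached after the last hop."""
    frontier = {start}
    for rel in relations:
        frontier = {tail for node in frontier for tail in index.get(node, {}).get(rel, set())}
    return frontier


index = build_index(TRIPLES)
# Three hops: Elon Musk -CEO_OF-> Tesla -PARTNER_WITH-> Panasonic -SUPPLIES-> Battery Cells
print(multi_hop(index, "Elon Musk", ["CEO_OF", "PARTNER_WITH", "SUPPLIES"]))  # {'Battery Cells'}
```

In Neo4j the same three hops collapse into one Cypher pattern, for example `MATCH (:Person {name: 'Elon Musk'})-[:CEO_OF]->()-[:PARTNER_WITH]->()-[:SUPPLIES]->(x) RETURN x.name` (assuming a `Person` label and a `name` property, both illustrative).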
2. Entities and Relationships
- Nodes: The "Nouns" (e.g., Elon Musk, Tesla, Bitcoin).
- Relationships: The "Verbs" (e.g., CEO_OF, INVESTS_IN, PARTNER_WITH).
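Once entities and relationships are extracted as (head, relationship, tail) triples, writing them to Neo4j comes down to `MERGE` statements. The sketch below uses a hypothetical helper, `triple_to_cypher`, that only builds the query string and parameters; it does not connect to a database. Cypher cannot take labels or relationship types as parameters, so those are checked against a simple identifier pattern before being interpolated, while entity names are passed as parameters.

```python
import re

# Labels and relationship types cannot be Cypher parameters, so only allow
# plain identifiers before interpolating them into the query text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triple_to_cypher(head, head_label, rel, tail, tail_label):
    """Return (query, params) that MERGE two nodes and a relationship between them."""
    for ident in (head_label, rel, tail_label):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = triple_to_cypher("Elon Musk", "Person", "CEO_OF", "Tesla", "Company")
print(query)
# MERGE (h:Person {name: $head}) MERGE (t:Company {name: $tail}) MERGE (h)-[:CEO_OF]->(t)
print(params)  # {'head': 'Elon Musk', 'tail': 'Tesla'}
```

`MERGE` (rather than `CREATE`) keeps repeated mentions of the same entity from producing duplicate nodes. With the official `neo4j` Python driver, the pair could be executed as `session.run(query, params)`.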
3. Vector + Graph (The Hybrid Approach)
- Vector Search: Find the starting node in the graph whose embedding best matches the user's query.
- Cypher Query: Use the graph database's query language (Cypher) to traverse relationships from that starting node to find the actual answer.
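The hybrid loop can be sketched in plain Python: pick the entry node whose embedding has the highest cosine similarity to the query embedding, then collect the triples within a fixed number of hops as context for the LLM. The 3-dimensional embeddings and the helpers `entry_node` and `neighborhood` are hypothetical placeholders; a real pipeline would use an embedding model, a vector index (Neo4j supports vector indexes on node properties), and a Cypher traversal.

```python
import math

# Hypothetical toy data: hand-made 3-d "embeddings", not output of a real model
NODE_EMBEDDINGS = {
    "Tesla": [0.9, 0.1, 0.0],
    "Bitcoin": [0.1, 0.9, 0.1],
    "Panasonic": [0.2, 0.1, 0.9],
}
TRIPLES = [
    ("Elon Musk", "CEO_OF", "Tesla"),
    ("Tesla", "INVESTS_IN", "Bitcoin"),
    ("Tesla", "PARTNER_WITH", "Panasonic"),
    ("Panasonic", "SUPPLIES", "Battery Cells"),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def entry_node(query_vec):
    """Vector step: return the node whose embedding is most similar to the query."""
    return max(NODE_EMBEDDINGS, key=lambda name: cosine(query_vec, NODE_EMBEDDINGS[name]))


def neighborhood(start, hops):
    """Graph step: collect triples within `hops` hops of `start`, ignoring edge direction."""
    seen, frontier, context = {start}, {start}, []
    for _ in range(hops):
        next_frontier = set()
        for triple in TRIPLES:
            head, _, tail = triple
            if (head in frontier or tail in frontier) and triple not in context:
                context.append(triple)
                next_frontier |= {head, tail} - seen
        seen |= next_frontier
        frontier = next_frontier
    return context


# Pretend this vector is the embedding of "Who runs the company that holds Bitcoin?"
start = entry_node([0.2, 0.8, 0.0])
for head, rel, tail in neighborhood(start, hops=2):
    print(f"{head} -{rel}-> {tail}")
# Tesla -INVESTS_IN-> Bitcoin
# Elon Musk -CEO_OF-> Tesla
# Tesla -PARTNER_WITH-> Panasonic
```

The vector step only has to land on a good entry point; the relationships that actually answer the question come from the traversal. In Neo4j the graph step would typically be a variable-length pattern such as `MATCH p = (s {name: $name})-[*1..2]-() RETURN p`.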
Building a Graph RAG Pipeline
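Step 1: Connect to Neo4j
Initialize the graph store using LangChain's Neo4j wrapper:

```python
import os

from langchain_community.graphs import Neo4jGraph  # newer releases: from langchain_neo4j import Neo4jGraph

# Connect to a local Neo4j instance; read the password from the environment instead of hard-coding it
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password=os.environ["NEO4J_PASSWORD"],
)
```

Step 2: Extract entities and relationships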
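Use an LLM to extract entities and relationship triplets (head, relationship, tail) from your documents and write them to the graph:

```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # example model; any tool-calling chat model should work
docs = [Document(page_content="Elon Musk is the CEO of Tesla.")]  # replace with your source documents

llm_transformer = LLMGraphTransformer(llm=llm)  # the LLM extracts nodes and relationships
graph_documents = llm_transformer.convert_to_graph_documents(docs)
graph.add_graph_documents(graph_documents)  # writes them to Neo4j using `graph` from Step 1
```

Step 3: Query the graph in natural language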
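LangChain's GraphCypherQAChain uses an LLM to translate a natural-language question into a Cypher query, runs that query against the graph, and turns the returned records into an answer:

```python
from langchain.chains import GraphCypherQAChain  # newer releases: from langchain_neo4j import GraphCypherQAChain

chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,  # log the generated Cypher and the retrieved context
    allow_dangerous_requests=True,  # required in recent versions: LLM-generated queries run against your database
)

result = chain.invoke("Who is the CEO of the company that owns the most Bitcoin?")
print(result["result"])  # the chain returns a dict with "query" and "result" keys
```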
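Under the hood, the chain gives the LLM a text description of the graph schema together with the question and asks it to write Cypher. The sketch below is a simplified, hypothetical imitation of that idea (not LangChain's actual prompt template or schema format): `schema_patterns` summarizes typed triples as `(:Label)-[:REL]->(:Label)` patterns and `cypher_prompt` builds a prompt from them. The triples and labels are illustrative.

```python
# Hypothetical typed triples: (head, head_label, relationship, tail, tail_label)
TYPED_TRIPLES = [
    ("Elon Musk", "Person", "CEO_OF", "Tesla", "Company"),
    ("Tesla", "Company", "INVESTS_IN", "Bitcoin", "Asset"),
    ("MicroStrategy", "Company", "INVESTS_IN", "Bitcoin", "Asset"),
]


def schema_patterns(typed_triples):
    """Summarize the graph as distinct (:Label)-[:REL]->(:Label) patterns, sorted for stable output."""
    return sorted({f"(:{hl})-[:{rel}]->(:{tl})" for _, hl, rel, _, tl in typed_triples})


def cypher_prompt(typed_triples, question):
    """Simplified text-to-Cypher prompt: the LLM sees the schema, not the data itself."""
    schema = "\n".join(schema_patterns(typed_triples))
    return (
        "Write a single read-only Cypher query that answers the question.\n"
        "Use only the node labels and relationship types in this schema:\n"
        f"{schema}\n\n"
        f"Question: {question}\nCypher:"
    )


print(cypher_prompt(TYPED_TRIPLES, "Who runs a company that invests in Bitcoin?"))
# An LLM might reply with something like (assuming a `name` property on nodes):
# MATCH (p:Person)-[:CEO_OF]->(c:Company)-[:INVESTS_IN]->(:Asset {name: "Bitcoin"}) RETURN p.name, c.name
```

Because the schema text is the LLM's only map of the graph, every extra label or relationship type makes this prompt longer and the generated Cypher less reliable.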
Example: Scientific Discovery
In drug discovery, the evidence about a compound is scattered across papers that each report individual chemical interactions. A vector search for "Side effects of Drug X" might miss that "Drug X inhibits Protein Y, which causes Side Effect Z" when those two facts appear in different papers. Graph RAG maps these interactions into connected nodes, allowing the AI to trace the chain of causality across hundreds of papers.
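Tracing that chain is a path search: start at the drug and enumerate relationship paths that end at a known side effect. The depth-limited DFS below runs over hypothetical toy edges (placeholder names, not real pharmacology), and `causal_paths` is an illustrative helper. Its output is the path itself, which the LLM can cite as the reasoning behind its answer.

```python
# Hypothetical toy edges: (source, relationship, target); names are placeholders
EDGES = [
    ("Drug X", "INHIBITS", "Protein Y"),
    ("Protein Y", "CAUSES", "Side Effect Z"),
    ("Drug X", "TREATS", "Condition W"),
]
SIDE_EFFECTS = {"Side Effect Z"}


def causal_paths(start, targets, max_hops=3):
    """Depth-limited DFS returning every directed path from `start` to a node in `targets`."""
    paths = []

    def dfs(node, path, visited):
        if node in targets and path:
            paths.append(path)
            return
        if len(path) == max_hops:
            return
        for src, rel, dst in EDGES:
            if src == node and dst not in visited:  # skip cycles
                dfs(dst, path + [(src, rel, dst)], visited | {dst})

    dfs(start, [], {start})
    return paths


for path in causal_paths("Drug X", SIDE_EFFECTS):
    print(" -> ".join([path[0][0]] + [f"[{rel}] {dst}" for _, rel, dst in path]))
# Drug X -> [INHIBITS] Protein Y -> [CAUSES] Side Effect Z
```

In Cypher, a comparable query would be a bounded variable-length match such as `MATCH p = (:Drug {name: $drug})-[*1..3]->(:SideEffect) RETURN p`, where the `Drug` and `SideEffect` labels are illustrative.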
Common Mistakes
- Schema Explosion: If your Knowledge Graph has too many types of relationships, the LLM will struggle to write accurate Cypher queries. Keep your schema (Ontology) simple and well-defined.
- Data Inconsistency: Knowledge Graphs are hard to maintain. If a document changes, you must update the graph nodes and relationships, not just re-embed a chunk.
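One way to keep the schema small is to whitelist node labels and relationship types and discard anything else the extractor produces. LangChain's LLMGraphTransformer exposes `allowed_nodes` and `allowed_relationships` parameters for this purpose; the plain-Python filter below is a hypothetical post-processing step (the ontology and `filter_to_schema` are illustrative) that shows the same idea on typed triples.

```python
# Hypothetical ontology: the only labels and relationship types the graph may contain
ALLOWED_LABELS = {"Person", "Company", "Asset"}
ALLOWED_RELS = {"CEO_OF", "INVESTS_IN"}


def filter_to_schema(typed_triples):
    """Split (head, head_label, rel, tail, tail_label) triples into kept and rejected lists."""
    kept, rejected = [], []
    for triple in typed_triples:
        _, head_label, rel, _, tail_label = triple
        ok = head_label in ALLOWED_LABELS and tail_label in ALLOWED_LABELS and rel in ALLOWED_RELS
        (kept if ok else rejected).append(triple)
    return kept, rejected


extracted = [
    ("Elon Musk", "Person", "CEO_OF", "Tesla", "Company"),
    ("Tesla", "Company", "IS_EXCITED_ABOUT", "Bitcoin", "Asset"),  # ad-hoc relationship type
    ("Tesla", "Company", "INVESTS_IN", "Bitcoin", "Asset"),
]
kept, rejected = filter_to_schema(extracted)
print(len(kept), len(rejected))  # 2 1
```

Logging the rejected triples is a cheap way to spot relationship types that deserve a place in the ontology, or extraction prompts that need tightening.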
Recap
- Graph RAG captures relational and structural knowledge.
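- Neo4j has dedicated LangChain integrations such as Neo4jGraph and GraphCypherQAChain, which makes it a common backend for Graph RAG.
- Combining vector search (to find entry nodes) with graph traversal (to follow relationships) is well suited to multi-hop questions in complex, relationship-heavy domains.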
Knowledge Check
What is a 'Multi-Hop' query?