
GraphRAG: The Definitive Guide
Patterns and Techniques for GenAI Knowledge Graph Retrieval

With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.

Stephen Chin, Michael Hunger, and Jesús Barrasa


GraphRAG: The Definitive Guide
by Stephen Chin, Michael Hunger, and Jesús Barrasa
Copyright © 2026 Stephen Chin, Michael Hunger, and Jesús Barrasa. All
rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.

Editors: Jeff Bleiel and Nicole Butterfield

Production Editor: Katherine Tozer

Interior Designer: David Futato

Cover Designer: Karen Montgomery

Illustrator: Kate Dullea

August 2026: First Edition

Revision History for the Early Release


2025-10-01: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9798341630154 for release
details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
GraphRAG, the cover image, and related trade dress are trademarks of
O’Reilly Media, Inc.
The views expressed in this work are those of the authors and do not
represent the publisher’s views. While the publisher and the authors have
used good faith efforts to ensure that the information and instructions
contained in this work are accurate, the publisher and the authors disclaim
all responsibility for errors or omissions, including without limitation
responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at
your own risk. If any code samples or other technology this work contains
or describes is subject to open source licenses or the intellectual property
rights of others, it is your responsibility to ensure that your use thereof
complies with such licenses and/or rights.
979-8-341-63010-9
[FILL IN]
Brief Table of Contents (Not Yet Final)

Chapter 1: Introduction to GraphRAG (available)


Chapter 2: A Basic RAG Assistant (available)
Chapter 3: Better Answers with GraphRAG (unavailable)
Chapter 4: Knowledge Graph Concepts (unavailable)
Chapter 5: Basic GraphRAG Patterns (unavailable)
Chapter 6: Putting It In Practice: Use Case (unavailable)
Chapter 7: Knowledge Graph Construction (unavailable)
Chapter 8: Agentic GraphRAG (unavailable)
Chapter 9: Integrating Advanced Graph Analytics (unavailable)
Chapter 10: Advanced GraphRAG Patterns (unavailable)
Chapter 11: Explainability and Governance (unavailable)
Chapter 12: Conclusion (unavailable)
Chapter 1. Introduction to GraphRAG

A NOTE FOR EARLY RELEASE READERS


With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
This will be the 1st chapter of the final book. If you’d like to be actively
involved in reviewing and commenting on this draft, please reach out to
the editor at jbleiel@oreilly.com.

Graph databases have become an essential component of the generative AI
(GenAI) ecosystem because they complement the strengths of large
language models (LLMs). LLMs excel at processing and generating natural
language, while knowledge graphs are optimized for representing structured
information—capturing entities, relationships, and domain-specific context.
When used together, these technologies are capable of solving problems
that neither could address effectively on their own.
Early GenAI systems focused heavily on prompt engineering and vector
search to improve model performance. While effective for general-purpose
queries, these approaches struggle in enterprise settings where questions
require domain understanding, relationship reasoning, or adherence to
business rules. Without structured context, LLMs often produce answers
that are vague, inconsistent, or incorrect. Knowledge graphs address this
gap by encoding domain knowledge in a structured, queryable form that can
be used to guide retrieval and enrich model responses.
This convergence has led to a new architectural approach known as
GraphRAG. The term GraphRAG is a combination of “Graph” and “RAG”
(Retrieval-Augmented Generation). GraphRAG systems use the semantic
structure of a knowledge graph to retrieve context that is relevant, complete,
and aligned with domain logic. This results in answers that are not only
more accurate, but also easier to explain and verify. Beyond basic retrieval,
LLMs can also use graphs as tools—issuing Cypher queries, traversing
subgraphs, or interacting with memory-like graph structures through
protocols such as MCP. As GenAI capabilities expand, the synergy between
LLMs and knowledge graphs is becoming a foundational element of
modern AI architectures.
This chapter will begin with an introduction to knowledge graphs. We’ll
look at their basic components and explain why they have become a key
tool for organizing enterprise knowledge and reasoning over structured and
unstructured data.
With this foundation in place, we will introduce the term GraphRAG and
define its scope within the context of generative AI applications. While the
term originated as a shorthand for combining knowledge graphs with
retrieval-augmented generation, it has grown to encompass a range of
system architectures and interaction patterns. This section will clarify what
is and is not considered a GraphRAG system, providing a shared
vocabulary that will be useful in the chapters that follow. Establishing this
definition will help frame the discussion and ensure that readers understand
how these systems differ from standard RAG implementations.
Finally, the chapter will present several core architectural patterns used in
GraphRAG systems, each designed to solve different types of retrieval,
ranking, and reasoning challenges. These patterns range from query
generation approaches like Text2Cypher to hybrid retrieval models that
combine vector similarity with graph-based ranking or filtering. This will
help you gain a deeper understanding of the technical decisions involved in
building GraphRAG systems and how to apply the appropriate design
depending on the use case and data structure.
Knowledge Graph Primer
To understand GraphRAG, it is important to begin with a foundational
understanding of what a knowledge graph is and why it is particularly well-
suited for representing complex, connected data. Knowledge graphs are not
just another data format; they offer a flexible, intuitive way to model
relationships and contextual meaning between entities. This structural
richness provides a natural complement to generative AI systems, enabling
more accurate, explainable, and relevant responses.
Knowledge graphs are a natural way of expressing information and have
existed since humans have been able to turn ideas into pictures. A
knowledge graph can be as simple as two words (nodes) with a line
(relationship) between them. As you add more nodes and relationships the
natural shape of a graph comes out. Properties allow you to store
information within each node, which lets you explain even more complex
business scenarios.
An example of a very simple knowledge graph is shown in Figure 1-1. The
nodes are two people and a car. To describe the nodes, properties are
attached such as the names of the people, Dan and Ann, and the brand and
model of the car. Then relationships are created between the nodes to
describe how they interact, such as “DRIVES” to describe the fact that Dan is
the car driver or “OWNS” to describe that Ann is the registered owner of
the car. And just like nodes, relationships can also have properties, for
example, here we see the date on which Dan started driving this car.
Figure 1-1. A simple knowledge graph showing how nodes, relationships, and properties are used
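
To make Figure 1-1 concrete, here is a minimal sketch that uses the official Neo4j Python driver to create the two people, the car, and the relationships described above. The connection details and the property values (brand, model, start date) are illustrative assumptions rather than details taken from the figure.

from neo4j import GraphDatabase

# Connection details are placeholders for a local Neo4j instance
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Two person nodes and one car node, connected by DRIVES and OWNS
# relationships; the DRIVES relationship carries a property of its own
create_graph = """
CREATE (dan:Person {name: 'Dan'})
CREATE (ann:Person {name: 'Ann'})
CREATE (car:Car {brand: 'Volvo', model: 'V70'})
CREATE (dan)-[:DRIVES {since: date('2021-01-15')}]->(car)
CREATE (ann)-[:OWNS]->(car)
"""

with driver.session() as session:
    session.run(create_graph)
driver.close()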

The term knowledge graph was coined by Austrian linguist Edgar W.
Schneider in 1972 and has been the subject of innumerable academic
discussions and research papers since. It came into general popularity in
2012 when Google created a searchable Knowledge Graph representation of
the web that provided advantages over basic text search. Google’s
PageRank algorithm also drew inspiration from graph-based proximity
metrics, reinforcing the value of connected data.
The rise of knowledge graphs led to the development of general-purpose
graph databases, which are optimized for storing and querying highly
connected data. Unlike relational databases, which require expensive joins
to navigate relationships, graph databases traverse connections directly.
This allows for faster, more efficient querying of deep, complex data. Most
graph databases support Cypher, a declarative graph query language, and
many now adopt the ratified GQL standard, an ISO-certified language
designed specifically for graph querying.
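
To illustrate what direct traversal looks like in practice, here is a hedged sketch of a two-hop Cypher query run through the Neo4j Python driver, in the spirit of the fraud detection use case mentioned next. The labels, relationship types, and connection details are hypothetical and not taken from any schema in this book.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# A two-hop traversal: flagged account -> shared device -> other accounts.
# In a relational database this would typically require multiple joins.
query = """
MATCH (a:Account {flagged: true})-[:USED_DEVICE]->(d:Device)<-[:USED_DEVICE]-(other:Account)
WHERE other <> a
RETURN other.id AS suspicious_account, d.id AS shared_device
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["suspicious_account"], record["shared_device"])
driver.close()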
Graph databases are widely used in domains where connections are central
to understanding. Fraud detection is a prominent example, where
identifying patterns across accounts, devices, and transactions requires
rapid traversal of multi-hop relationships. In life sciences, graphs are used
to model the complex interactions between drugs, proteins, clinical trials,
and regulations. Other common use cases include supply chain tracking,
recommendation systems, financial transaction monitoring, and social
network analysis. The ability to evolve graph schemas flexibly also makes
this technology especially useful for data science and exploratory analytics.
A visual example of a supply chain graph is shown in Figure 1-2. While the
graph is simple in design with only 12 nodes, it carries a lot of logistical
information about the movement of shipments, delivery modes, material
composition, and transport mode. This type of contextual modeling is
difficult to achieve with traditional data tables but comes naturally with a
graph.
Figure 1-2. Supply chain graph showing the logistics movement of shipments
Now that we have covered the basics of knowledge graphs, we can move on
to how they are used in generative AI systems.

GraphRAG Primer
The term GraphRAG originated as a portmanteau of “graph” and “retrieval-
augmented generation (RAG).” RAG is a popular architecture in which a
large language model (LLM) is enhanced with external knowledge through
a retrieval component. In a typical RAG system, the user’s query is used to
retrieve relevant information, often from a vector database, and that
information is passed as context to the LLM to generate a response. While
powerful, traditional RAG pipelines suffer from a number of limitations.
They often rely solely on vector similarity, struggle with noisy or
ambiguous matches, and provide little transparency into how responses are
constructed.

Beyond RAG
GraphRAG extends the standard RAG pattern by integrating a knowledge
graph into the retrieval and reasoning pipeline. Graphs offer rich, structured
representations of nodes and their relationships, allowing the system to
surface more accurate, contextual, and explainable results. Instead of
treating retrieved documents as isolated chunks of text, GraphRAG can
trace how nodes are connected, apply semantic constraints, follow
dependencies, and even reason across multiple hops. This enables more
precise and trustworthy responses, especially in domains where
relationships matter, such as enterprise knowledge management, life
sciences, and supply chains.
Over the past few years, GraphRAG has evolved far beyond its original
conception as “RAG with a graph.” It now encompasses a wide range of
architectures, tools, and design patterns, each tailored to different
information access and reasoning needs. This includes approaches that use
graphs to refine vector search results, generate structured queries from
natural language, or even perform dynamic subgraph retrieval based on
query semantics. The field has gained significant traction in both industry
and academia, with hundreds of research papers exploring various retrieval
and integration strategies.
For the purposes of this book, we adopt a broad but practical definition:

GraphRAG is defined as any generative AI system that uses a knowledge
graph to aid in information storage and retrieval.
This definition intentionally captures a spectrum of GraphRAG designs
from basic augmentation of retrieval with graph lookups to more advanced
multistep reasoning over dynamic subgraphs. The following subsection
introduces the most common GraphRAG patterns in use today, highlighting
how they differ in approach, capability, and application fit.

Core GraphRAG Patterns


Now that we have a working definition of GraphRAG, the next step is to
explore how these systems are actually implemented in practice. There are
multiple architectural patterns, each with different tradeoffs depending on
the type of data, use case, and retrieval goals. Here are a few of the core
GraphRAG patterns that have emerged in both industry and research to
improve the precision, context, and explainability of generative AI outputs.
Text2Cypher - In this pattern, the user prompt and a schema or
ontology describing the graph are passed to a language model,
which generates a Cypher query to retrieve relevant data. The
results of the query are then converted into a natural language
response. This approach is especially useful when the graph is the
authoritative source of truth and contains structured answers to
well-defined queries. (See Chapter 4 for a deeper look at
Text2Cypher, and the sketch following this list for a minimal
illustration of the flow.)
Graph Enhanced Retrieval - This pattern begins with a vector
similarity search to identify semantically relevant nodes or
documents. Once candidates are selected, the graph is used to
expand the context by retrieving related nodes, relationships, or
subgraphs. The enriched context is then passed to the LLM for
response generation. This approach adds semantic grounding and
improves relevance. (See Chapter 4 for details.)
Ranking of Vector Search - In this approach, an initial set of results
is retrieved via vector search, and then reranked using additional
signals from the knowledge graph—such as node centrality,
relationship depth, or metadata constraints. Reranking can be
applied before or after retrieval, and helps improve precision by
filtering or reordering results based on domain-specific logic. (See
Chapter 8 for more information.)
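
As promised above, here is a minimal sketch of the Text2Cypher flow. It assumes an OpenAI chat model and the Neo4j Python driver; the schema description, prompt wording, and example question are illustrative assumptions, not a reference implementation from this book.

from openai import OpenAI
from neo4j import GraphDatabase

client = OpenAI()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# A short, hypothetical schema description the model can translate against
schema = ("Nodes: (:Consultant {name}), (:Project {name, industry}). "
          "Relationships: (:Consultant)-[:WORKED_ON]->(:Project)")
question = "Which consultants worked on insurance projects?"

# Step 1: ask the LLM to translate the question into a Cypher query
cypher = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": f"Translate the user question into a single Cypher query. Schema: {schema}. Return only the query."},
        {"role": "user", "content": question},
    ],
).choices[0].message.content

# Step 2: run the generated query against the graph
with driver.session() as session:
    rows = [record.data() for record in session.run(cypher)]

# Step 3: ask the LLM to phrase the query results as a natural language answer
answer = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": f"Question: {question}\nQuery results: {rows}\nAnswer based only on these results."}],
).choices[0].message.content
print(answer)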

These are just a few of the many patterns where knowledge graphs and
LLMs can be used together to aid in retrieval. In the next section we will
delve into some of the limitations of LLMs and RAG, and how GraphRAG
can improve on the accuracy, explainability, and relevancy of results.

Limitations of LLMs
While LLMs have unlocked powerful new capabilities in natural language
understanding and generation, they also introduce a set of well-known
limitations that impact the accuracy, reliability, and trustworthiness of their
outputs. These limitations are especially important to understand when
building production-grade AI systems that need to deliver factual, precise,
and context-aware responses. Without the right safeguards and supporting
data infrastructure, LLMs often generate responses that appear confident
but are incorrect or misleading.
In this section, we look at how LLMs are trained, how they represent
knowledge using word embeddings, and why this leads to hallucinations
when precise or domain-specific information is required. This foundation
will help explain the shortcomings of standard RAG systems and set the
stage for how GraphRAG addresses these issues with structured, contextual
data.
LLMs are very good at responding to general questions, but struggle to
answer correctly when a specialized knowledge set is required or the
answer needs to be highly precise and factual. However, LLMs are eager to
provide a response even without the factual knowledge base to answer
accurately, and this produces hallucinations in their responses.
An LLM hallucination occurs any time the model answers a question with a
response that is not grounded in information, facts, and relationships.
is a common occurrence when building systems with LLMs, so why does
this phenomenon occur, and how can we prevent it? To better understand
why LLMs hallucinate we first need to understand how they work.
LLMs are based on a very simple algorithm that helps them find
associations in their training set and answer questions. The first thing to
understand is that LLMs do not think in words, but rather in word vectors.
They decompose text into semantic chunks called word vectors and then
map a statistical model of the relationship between different word vectors in
multidimensional space. A good way to understand how word vectors work
is to use a tool like the nlpl.eu website to see how word vectors relate to
each other.
For example, if we take a cat as an example input, we can get a graph back
of the statistical similarity between a cat and other word vectors as shown in
Figure 1-3. In vector space a cat is similar to domestic animals like a dog or
puppy, but far away from wild animals like pigs or ferrets. However, this is
just one of many relationships that LLMs encode into word vectors. They
may also look at the size of the animals, the species of the animal, and even
how commonly they are referred to in literature. And this process is
repeated between each of the other word vectors creating a huge set of
statistical relationships that the LLM can use to evaluate text. This
multidimensional set of vector embeddings creates the LLMs’ corpus of
knowledge.
Figure 1-3. Word vectors similar to the word cat from the NLPL word embeddings online service
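
Underneath, this kind of similarity is just a distance or angle computation between vectors. The following sketch shows cosine similarity with made-up four-dimensional vectors; real embeddings come from a trained model and have hundreds or thousands of dimensions.

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: closer to 1.0 means more similar
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny, hand-made "embeddings" purely for illustration
cat = np.array([0.9, 0.1, 0.3, 0.7])
dog = np.array([0.8, 0.2, 0.4, 0.6])
pig = np.array([0.1, 0.9, 0.2, 0.1])

print(cosine_similarity(cat, dog))  # relatively high: close in this toy space
print(cosine_similarity(cat, pig))  # lower: farther apart in this toy space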

To further explain this, it is helpful to understand the algorithm by which
these embeddings are created. The training text is fed into a transformer that
finds the relationship between different word vectors in the input and
encodes that in vector space. Each transformer stage is specialized and
looks at a different aspect of the relationship between the input word
vectors. By passing through multiple transformer stages, insights about the
text are encoded to build semantic meaning.
To illustrate this concept we can take a simple sentence and explain how
each transformer stage hypothetically could encode additional information.
For example, take the sentence “A developer’s best friend to code is AI.”
The first transformer stage may identify parts of speech like subjects and
verbs. The second transformer stage could build on this by relating the
subjects to what actions they are taking. In this example the friend refers to
“AI” and the coding is being done by both the developer and the AI. This is
shown visually in Figure 1-4.
Figure 1-4. Example of the different transformer stages that are used to build embeddings

And finally we need to explain how the embeddings are used to generate a
response to an input question. When you ask the LLM a question, it is
converted to a set of corresponding word vectors. The word vectors are fed
into a neural network that uses the embeddings to guess the most likely next
word vector in the series. As illustrated in Figure 1-5 this process is
repeated to continually guess the next word vector until a full answer is
created.
Figure 1-5. Simplified diagram of the feed forward layer of an LLM that calculates the next output
word vector
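
That loop can be sketched in a few lines of Python. The model and tokenizer objects here are hypothetical stand-ins for a real LLM stack; the point is only the repeated next-word-vector prediction described above.

def generate_answer(model, tokenizer, question, max_tokens=100):
    # model and tokenizer are hypothetical placeholders for a real LLM stack
    tokens = tokenizer.encode(question)  # convert the question into word vectors
    for _ in range(max_tokens):
        # Predict the most likely next token given everything generated so far
        next_token = model.predict_next_token(tokens)
        if next_token == tokenizer.end_of_text:
            break
        tokens.append(next_token)
    return tokenizer.decode(tokens)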

With such a simple algorithm backing LLMs, how do they seem to have
human-like thinking and reasoning capabilities? There is clearly no
reasoning or intelligence in this algorithm that compares to how humans
think. The surprisingly good results that come out of LLMs are possible
because they have a huge training set. The GPT-3 model was trained on a
corpus of roughly 500 billion tokens, while the average human is only
exposed to 100,000 words by the age of 10. This is a huge advantage in
trained knowledge and the
reason why LLMs can respond to almost any topic with a seemingly
intelligent answer.
As a result of this discrepancy in training data and mastery of natural
language, LLMs appear much more capable than they actually are in
domain specific tasks and reasoning. However, they are very useful as a
communication agent with the end user since they can pull statistically
related information to respond to almost any question. So what if we could
combine the natural language and conversational fluency of LLMs with a
curated database of unstructured documents that fills the knowledge gap?
This is where retrieval augmented generation (RAG) comes into the picture.

Limitations of RAG
RAG stands for retrieval augmented generation. It is a technique that allows
developers to build applications using general purpose LLMs that are able
to answer questions outside the data set they are trained on. This makes it
possible to take advantage of the language capabilities and large general
knowledge pool of LLMs to solve domain specific business problems.
The basic flow of a RAG application is that the user prompt is intercepted
and used to query an external data source. The external data source
responds with information related to the user’s question. This additional
context plus the user’s original query is passed to the LLM and used to
compose the response to the question, as shown in Figure 1-6. Because the
LLM has the additional information from the external data source it is now
able to answer questions and topics that it was never trained on.
Figure 1-6. Diagram of a RAG system showing flow from user prompt to a complete response

The most common implementation of RAG is to use vector similarity
search to look up relevant results. Vector databases are a popular choice for
this workflow, but vector search has become a standard feature of most
databases including graph databases, so a dedicated vector database is no
longer required.
The way vector similarity search works is that the data you want to search
on is run through an embedding model to create embeddings for vector
similarity. This becomes the vector dataset that can be searched for
additional context. When a question comes in from the user, it is also
run through the embedding model and used to search for related data via
vector similarity algorithms. The most relevant embeddings are returned
and combined with the prompt to pass into the LLM to answer the question
as shown in Figure 1-7.
Figure 1-7. Embedding and vector search to answer a user question with relevant data

Just like the capabilities of LLMs are easily overestimated, RAG systems
appear promising at first, but often fail to deliver on business value. There
are several reasons for this, highlighted in the following sections.

Limited Data Context


RAG systems search and return only a fraction of the data that is related to
the topic as context to the LLM. This is both a technical limitation of the
vector search process and a limitation of LLMs, which have finite context
windows and tend to anchor on the first results presented.
As a result of the limited relevant information, the LLM often struggles to
put together a coherent response and will fall back on its large and general
purpose training set in order to fill in the answer. This results in
hallucinations where the answer is not accurate since it lacks grounding in
domain specific knowledge. At best the answer appears useful and is
harmless, but often the answer can be inaccurate and mislead the user.
Some of the GraphRAG patterns we will discuss later in this title address
the context window issues by using graph topology algorithms to produce
relevant content and/or to rerank the order in which the results are presented to the
LLM.

Lack of Maturity
Vector databases are not as mature as traditional databases, so features
required to run a production instance such as security, compliance,
robustness, and high availability are often missing. Fortunately, vector
search has become a standard feature of most databases including graph
databases. This means you can use a reliable and high performance database
and still get the benefits of vector similarity search to give the LLM
contextual information.
Vector Similarity ≠ Relevance
The core assumption of RAG is that comparing vector similarity will result
in relevant results being returned. This is not true in general, because there
are many terms in vector space that are related, but given context are not
relevant. For example, when discussing company performance it is obvious
that references to Apple are for the company and not the fruit. Vectors are
simply encoded statistical models of closeness, so without the context of the
question and conversation it is impossible to reliably provide relevant
embeddings.
Graphs solve this problem by capturing relationships that are relevant to the
domain. This means graph similarity and graph data science
algorithms can be applied to extract conceptually similar nodes and data,
greatly improving the accuracy and quality of the results.

Lack of Explainability
Vector search operates on statistical mappings that the model uses to encode
relationships and information, but is lacking in transparency and
explainability. This means that the output of the LLM becomes a black box
that is impossible to understand and explain.
In contrast, knowledge graphs are a representation that works equally well
for humans to understand as well as for machines to process. For the
example in Figure 1-8 an apple is represented in both a knowledge graph
and a vector representation. The vector embedding is simply a collection of
indecipherable probabilities in multidimensional space. However, the
knowledge graph has node names and relationship labels that make it
obvious which parts of the graph refer to the structure of the apple versus
the origin or type of fruit. Combining these two representations together
with links between the graph and embeddings gives us GraphRAG, which
is a very powerful representation that can be navigated based on both vector
similarity and also graph closeness.
Figure 1-8. Comparison of the knowledge graph and vector index view encoding of data

GraphRAG to the Rescue


GraphRAG improves on the RAG model by using the context and data
structure of a knowledge graph to retrieve more complete and relevant data
than vector similarity search can return. This helps to address the main
challenges with RAG systems, which struggle with getting highly accurate
and relevant answers.

GraphRAG Architecture
The core architecture of a GraphRAG system is very similar to the RAG
architecture we discussed earlier. The user question goes through a GenAI
application that performs data retrieval and returns additional context to
provide to the LLM. However, in addition to being able to look up context
by vector similarity, graph databases can also retrieve information by graph
closeness and traversal algorithms based on the additional context and
structure contained in the knowledge graph as shown in Figure 1-9. This
provides more relevant and accurate context to the LLM, which allows for a
higher quality answer to the end user.
Figure 1-9. Diagram showing the general architecture of a GraphRAG application

There are several different approaches and patterns for the graph database
retrieval step, but a good starting place is the graph enhanced retrieval
pattern discussed in Chapter 4, which can be summarized with the
following high-level steps:
1. Do a vector search on the input question to retrieve embeddings
and nodes
2. Retrieve the nodes in the graph that are close and related to the
embeddings
3. Pass the related nodes and embeddings to the LLM in the context
By using this approach you start with a very high response rate by the LLM
since the vector search has a wider definition of what is similar. But by
looking up related nodes from the graph you are improving the contextual
accuracy and relevancy of the results. And it is easy to layer community
groupings and more advanced graph algorithms on top of this to further
improve accuracy.
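
A hedged sketch of these three steps against a Neo4j database with a vector index is shown below. The index name, node labels, relationship type, and prompt are illustrative assumptions; Chapter 4 covers the pattern in detail.

from neo4j import GraphDatabase
from openai import OpenAI

client = OpenAI()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

question = "Who has worked on insurance automation projects?"
question_embedding = client.embeddings.create(
    input=question, model="text-embedding-3-small"
).data[0].embedding

# Steps 1 and 2: vector search for similar chunks, then expand to related
# graph entities. Index name, labels, and relationship are assumptions.
retrieval_query = """
CALL db.index.vector.queryNodes('chunk_embeddings', 5, $embedding)
YIELD node AS chunk, score
MATCH (chunk)-[:MENTIONS]->(entity)
RETURN chunk.text AS text, collect(entity.name) AS related_entities, score
"""
with driver.session() as session:
    results = [r.data() for r in session.run(retrieval_query, embedding=question_embedding)]

# Step 3: pass the enriched context plus the question to the LLM
context = "\n".join(f"{r['text']} (related: {r['related_entities']})" for r in results)
answer = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
).choices[0].message.content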

GraphRAG Performance Studies


To evaluate the effectiveness of GraphRAG in real-world settings, several
recent studies have benchmarked its performance against traditional RAG
systems across a range of domains. These include enterprise data analysis,
large-scale summarization, and customer service automation. Each study
examines how GraphRAG, by integrating a knowledge graph into the
retrieval process, can improve the accuracy, efficiency, and relevance of
responses generated by LLMs.
The following sections summarize three such studies from data.world,
Microsoft Research, and LinkedIn. Together, they demonstrate that
GraphRAG is not just a conceptual improvement over standard retrieval
methods, but a practical solution to well-known challenges in applying
generative AI to structured and semi-structured data. These examples offer
concrete evidence of the advantages GraphRAG brings in terms of
accuracy, token efficiency, and performance across complex query
scenarios.
Data.world Knowledge Graph Performance
A benchmark study by data.world1 compared the effectiveness of LLMs in
answering natural language queries about a dataset in the insurance
industry, with and without the support of a knowledge graph. The study
focused on evaluating how well user questions were translated into accurate
responses when working with different types of underlying data structures.
When the LLM was used without any knowledge graph support and relied
solely on SQL-based retrieval, it achieved an overall accuracy of 16.7%.
With the addition of a knowledge graph that modeled the structure and
relationships within the data, accuracy increased to 54.2%. For queries that
involved high schema complexity, accuracy without a knowledge graph was
0%, while the graph-assisted approach reached accuracy levels between
36% and 39%.
These results demonstrate that standard RAG systems are prone to fall short
when the underlying data is highly structured or requires understanding of
complex relationships. Knowledge graphs provide an additional layer of
semantic organization that helps LLMs retrieve and interpret relevant
information more effectively. In this benchmark, knowledge graph-based
systems proved to be approximately three times more accurate than SQL-
only RAG, reinforcing their value in enterprise settings where precision and
reliability are critical.

Microsoft Research GraphRAG


Microsoft Research conducted a study on graph-based RAG2 that showed
strong evidence for the advantages of GraphRAG over traditional RAG
architectures. The research addresses a core limitation in RAG systems:
their reliance on retrieving semantically similar but often redundant or
narrowly scoped text chunks. In contrast, GraphRAG organizes retrieved
information into a structured knowledge graph that captures both the
entities and the relationships within the source corpus. These graphs are
clustered into communities using algorithms, allowing for the generation of
high-level community summaries. When a user submits a query,
GraphRAG generates partial responses from the most relevant communities
and synthesizes them into a comprehensive answer.
By using the LLM-generated knowledge graph, GraphRAG vastly
improves the “retrieval” portion of RAG, populating the context window
with higher relevance content, resulting in better answers and capturing
evidence provenance.3
—Jonathan Larson, Partner Data Architect
This structure offers two primary advantages: improved accuracy and
reduced cost. First, the quality of responses is significantly higher for
complex or broad-scope queries that require synthesis across multiple
information clusters. By reasoning over communities rather than isolated
text snippets, GraphRAG delivers answers that are more complete and
contextually relevant. Second, GraphRAG is more efficient in terms of
token usage. Rather than passing multiple large documents into the
language model, GraphRAG feeds compact, targeted summaries derived
from the graph structure. This reduces the token load at inference time,
lowering both latency and cost—particularly important when working with
large-scale datasets that exceed typical LLM context windows.
Overall, the Microsoft study demonstrates that GraphRAG is not simply an
enhancement to traditional retrieval workflows, but a foundational
improvement. By structuring data before retrieval and using the graph
topology to guide reasoning and summarization, GraphRAG systems can
deliver more accurate results with less computational overhead, making
them well-suited for enterprise-scale applications.

LinkedIn Customer Service


Traditional RAG systems built on historical support data struggle in
customer service applications. These systems ingest past tickets,
resolutions, and inquiries as flat, unstructured text, missing the deeper
structure and interconnections between different types of issues. As a result,
retrieval quality is limited, and generated responses often lack precision,
contextual understanding, or explainability—especially when similar issues
are expressed in different ways or span multiple problem categories.
A recent study by LinkedIn researchers (Retrieval-Augmented Generation
with Knowledge Graphs for Customer Service Question Answering,
https://arxiv.org/pdf/2404.17723) addresses these shortcomings
by applying GraphRAG to customer service question answering. Their
approach involves constructing a knowledge graph from historical support
interactions, encoding both the structure within each support case and the
relationships across similar or related cases. When a new customer query is
received, the system retrieves a subgraph relevant to the issue and uses this
structured context to inform the LLM’s response generation. The result is a
system that not only understands isolated facts but also reasons over
connections between them.
When deployed in a live setting at LinkedIn, the system reduced median
issue resolution time by 28.6% over six months. These results demonstrate
that GraphRAG is not simply an enhancement to RAG, but a necessary
evolution for domains like customer service, where efficiency, accuracy,
and relevance depend on understanding structured relationships. By
leveraging knowledge graphs, organizations can build support systems that
scale more intelligently, provide faster issue resolution, and deliver
responses grounded in institutional knowledge rather than surface-level
vector similarity.

GraphRAG in Practice: Case Studies


As organizations adopt generative AI technologies, it has become clear that
model performance alone is not enough to deliver meaningful business
outcomes. LLMs can generate fluent responses, but without access to well-
organized, context-rich data, they often produce results that lack accuracy
or relevance. To address this limitation, companies are beginning to focus
on how data is structured, connected, and made accessible to AI systems.
In this section, we will examine two case studies that illustrate the practical
application of GraphRAG. These examples from Data2 and Klarna
demonstrate how GraphRAG can be used to unify structured and
unstructured data, surface hidden relationships, and deliver context-aware
answers to complex questions. This section will help you to gain a clearer
understanding of the architectural patterns, technical components, and
organizational benefits of using knowledge graphs to improve the reasoning
and reliability of generative AI systems.

Deeper Insights at Data2


Jeff and the leadership team at Data² recognized early on that traditional
data architectures were falling short in industries like energy and defense,
domains where the most critical information is buried deep within technical
documents, operational reports, and siloed systems. These industries are
inherently complex, with dense networks of people, processes, and
infrastructure that don’t lend themselves well to relational models.
To solve this, they turned to Knowledge Graphs and Generative AI, creating
one of the first real-world implementations of GraphRAG. They constructed
a domain-specific knowledge graph in a Neo4j Graph Database that could
unify structured and unstructured data—linking entities like facilities,
suppliers, geopolitical risks, technical specifications, and regulatory
frameworks into a semantically rich graph.
“We think of the knowledge graph as a dynamic, evolving brain that
captures the full scope of an organization’s operating knowledge. Our
generative AI agent learns and reasons over top of this brain to deliver
context-relevant insights and recommendations grounded in data.”
—Jeff Dalgliesh, Chief Technology Officer of Data²
Their approach combined deep domain expertise in energy and defense with
cutting-edge AI techniques. They enriched their graph with insights
extracted from unstructured sources like PDFs, maintenance logs, and
operational plans using NLP and entity extraction pipelines. This allowed
them to capture relationships and dependencies that were previously
invisible.
On top of this graph, they layered a generative AI agent capable of
interacting with users through natural language. Using an LLM-to-Cypher
translation pipeline, the system enabled non-technical analysts to ask
complex questions, such as “Which energy facilities are vulnerable due to
supplier delays in conflict zones?” and receive structured, grounded, and
explainable answers.
Their GraphRAG powered cloud application, shown in Figure 1-10,
empowered analysts to explore complex “what-if” scenarios and uncover
hidden patterns across multi-hop relationships, while ensuring that every
insight remained traceable back to authoritative data.
Figure 1-10. Data2’s GraphRAG powered knowledge graph and GenAI system

Some of the ways that combining Knowledge Graphs and GenAI provided
value to Data2 include:
1. Graph as the Data Backbone: By using Neo4j as their graph
database, the team was able to organize both structured and
unstructured data into a single, semantically rich knowledge graph.
This graph provided a more natural way to represent real-world
relationships compared to traditional relational databases, making
it easier to model complex business scenarios.
2. GenAI as the Natural Language Interface: The system used large
language models to translate user questions into Cypher queries,
enabling natural language access to the knowledge graph. This
made the platform usable by non-technical users, who could ask
questions in plain English and receive accurate, structured answers
from the underlying data.
3. Contextual Reasoning Through the Graph: The knowledge graph
provided grounding and context that improved the quality of LLM
responses. By including relevant nodes and relationships in the
model’s context window, the system increased accuracy and
ensured that generated answers were based on reliable and
interpretable information.
Data²’s early investment in GraphRAG positioned them as a pioneer in AI-
assisted intelligence systems. By building a connected graph of knowledge
they moved from simply automating answers to augmenting human
decision-making with a graph-native foundation.

Connecting Data Silos at Klarna


As Klarna scaled beyond 5,000 employees globally, it encountered a
growing challenge familiar to most modern tech companies: institutional
knowledge fragmentation. Due to a proliferation of SaaS tools for sales,
customer support, HR, finance, and engineering, all of their organizational
knowledge was locked in data silos. This made it difficult, if not impossible, to
make data-driven decisions at scale and led to wasted effort and time
navigating a maze of disconnected sources.
Seeing the strategic cost of this fragmentation, Klarna’s Co-Founder and
CEO, Sebastian Siemiatkowski, championed a bold solution: combine the
contextual reasoning power of LLMs with the semantic structure of
knowledge graphs. This vision led to the creation of Kiki, Klarna’s internal
AI assistant.
Built on a GraphRAG architecture, Kiki ingests structured metadata and
relationships across systems, teams, and projects into a domain-specific
knowledge graph. On top of this graph, Kiki leverages the natural language
understanding of large language models to deliver answers that are
grounded, contextual, and explainable.

Kiki brings together information across multiple disparate and siloed
systems, improves the quality of that information, and explores it,
enabling our teams to ask Kiki anything from resource needs to internal
processes to how teams should work… It’s having a huge impact on
productivity in ways that were not possible to imagine before without
graph and Neo4j.
—Sebastian Siemiatkowski, Co-Founder and CEO of
Klarna
Rather than replacing people, Kiki augments them, enabling any employee
to access institutional knowledge previously hidden behind departmental
boundaries and SaaS data silos.
In the first year of operation, Kiki has shown impressive results with:
1200 SaaS applications eliminated: By integrating structured and
unstructured data into a unified knowledge graph, Kiki has reduced
Klarna’s dependency on thousands of siloed tools
250K employee questions answered: Since launching in June 2023,
Kiki has become a go-to resource across the organization,
streamlining everything from onboarding to internal support
2000 daily queries processed: Kiki continues to scale with growing
demand, supported by the performance and flexibility of graph-
native architecture
87% employee adoption: 9 out of 10 employees use Kiki in
order to gain insights into the business and augment their daily
work

Klarna’s investment in Knowledge Graphs and GenAI wasn’t a proof of
concept; it was a foundational shift in how knowledge moves inside the
company. By replacing scattered data silos with a semantic interface to
organizational intelligence, Klarna has improved productivity and redefined
how modern enterprises can work at scale.

Summary
In this chapter, you have learned how knowledge graphs provide structure,
semantics, and relationships that can enhance the capabilities of LLMs. We
explored how knowledge graphs differ from traditional data representations
and why they are uniquely suited to address the limitations of standard
retrieval-augmented generation. With this understanding, you now have the
tools to recognize where structured knowledge and graph-based modeling
can add value to GenAI systems.
You have also seen how GraphRAG works in practice through a set of core
design patterns, including Text2Cypher, graph-enhanced retrieval, and
graph-based reranking. Each of these patterns provides a different path for
integrating graph databases into your GenAI architecture, depending on
your domain, data structure, and retrieval goals. Understanding these
options gives you the flexibility to tailor a solution that fits your
organization’s needs, whether that means improving the accuracy of
customer service answers, enhancing internal search, or enabling AI agents
to interact with business data more intelligently.
Finally, the case studies and benchmark results provided a view into how
organizations are already applying GraphRAG to real-world problems.
Whether you are working with structured data, a large corpus of
unstructured documents, or fragmented enterprise knowledge, GraphRAG
delivers higher accuracy, lower operational cost, and greater explainability.

1 A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases, https://data.world/mstatic/assets/pdf/kg_llm_accuracy_benchmark_11132023_public.pdf
2 From Local to Global: A GraphRAG Approach to Query-Focused Summarization, https://arxiv.org/pdf/2404.16130
3 https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Chapter 2. A Basic RAG Assistant

A NOTE FOR EARLY RELEASE READERS


With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
This will be the 2nd chapter of the final book. If you’d like to be
actively involved in reviewing and commenting on this draft, please
reach out to the editor at jbleiel@oreilly.com.

In today’s enterprise landscape, the ability to navigate and extract
meaningful answers from organizational knowledge is becoming a core
differentiator. With GenAI systems increasingly used for decision support,
employee enablement, and intelligent search, companies are quickly
realizing that quality information retrieval is even more important than
model performance.
The next two chapters follow OMG Consulting, a fictional yet relatable
firm, as it journeys from a basic vector-search based assistant that helps
them to manage their employee skills and assignments to a knowledge-
graph-augmented retrieval system using GraphRAG. Through their story,
we’ll uncover the practical steps, technical architecture, and mindset shift
needed to move from text-chunks to connections, and from isolated facts to
contextual insights.
NOTE
Note that we’re not using any framework or library for the RAG and GraphRAG
implementation in these chapters, as those evolve quickly and we want to focus on the
concepts and techniques and show what’s happening under the hood.

Introducing OMG and the initial simple RAG system
To improve staffing efficiency, OMG Consulting, a small consulting firm,
wanted to better match projects to consultants based on their skills, past
experiences, and industry knowledge. To support this, the team initially
implemented a basic Retrieval-Augmented Generation (RAG) chatbot that
searched across CVs, project documents, and internal write-ups to answer
staffing queries.
The team identified key use cases and questions that they wanted to cover:

Staffing queries: “Who has worked on JavaScript-heavy healthcare projects?”
Capability mapping: “Which consultants know AWS and have
experience with insurance clients?”
Internal search: “Who can skill up on LangChain as a GenAI
framework, and has worked on data projects?”

While promising at first, their basic RAG system quickly showed
limitations. Those issues can be addressed with advanced RAG techniques.
Throughout the rest of this chapter, we’ll explore:
How OMG thought about creating an internal assistant to help
consultants find relevant projects
The data sources they ingested, including CVs, project documents,
and internal documentation
The initial RAG setup using OpenAI models and ChromaDB for
vector storage
The Streamlit app they built to allow users to ask questions and
retrieve information
What questions the initial RAG system could answer and where it
struggled
How OMG iterated on their RAG system to improve retrieval and
answer quality

General RAG architecture
As discussed in Chapter 1, Retrieval Augmented Generation (RAG)
grounds the answers of a Large Language Model (LLM) in facts retrieved
from a trustworthy data source. The most common approach for a RAG
setup relies on the assumption that relevant data has similar meaning to the
user’s question.
That’s why most basic RAG approaches use text embedding vector search
for that kind of data retrieval.
Vector embeddings use a machine learning model - an embedding model -
to convert text (or other modalities like images, audio) into high-
dimensional (e.g. 1536-dimensional) vectors (arrays of floating point
numbers) that represent the meaning (or essence) of the content. Similar
phrases or concepts are close by distance (Euclidean) or angle (cosine) in
that vector space.
The RAG setup first splits up and encodes the source document fragments
with vector embeddings (and often stores them in a vector index). At
runtime it encodes each user question into the same embedding space as the
document parts, and then uses vector distance to:
1. Retrieve the most similar text fragments, which
2. Augment the main instruction prompt and the user question for the LLM to
3. Generate the final answer.

Data Sources
OMG had two primary types of data sources:
Unstructured documents
Resumes (PDFs, DOCX, text files)
Project tenders (PDFs)
Project reports (PDFs, DOCX, text files)
Internal documentation (Wiki pages, Confluence)

Structured sources
Employee database, with skills
Project database
Client database
To move quickly and because most GenAI projects center on this type of
data, the team focused first on the unstructured content, ingesting resumes
and project documents to provide the data sources for their RAG assistant.
Here are two example documents from OMG’s consulting practice that will
be ingested, one CV from “Emily Johnson” and one project report from
“NationWide Insurance”.
Both are in Markdown format; you can find them, along with all the other
CVs and project reports, in the data/ch02/cv and data/ch02/projects
folders.
The CV contains relevant information about the consultants’ skills,
experiences, and project assignments.

# Emily Johnson

## Professional Summary
Conversational AI and Intelligent Automation Leader with 9+ years
of experience designing and implementing AI-powered conversation
systems, virtual assistants, and process automation solutions.
Expertise in natural language processing, dialogue management,
and AI governance. Proven track record of delivering innovative
conversational experiences that enhance customer engagement and
operational efficiency.

## Contact Information
- **Email:** emily.johnson@omgconsulting.com
- **Phone:** (617) 555-3976
- **Location:** Boston, MA
- **LinkedIn:** linkedin.com/in/emilyjohnson
- **GitHub:** github.com/emily-johnson-ai

## Education
- **Master of Science in Computer Science**, Carnegie Mellon
University (2014)
- Specialization in Artificial Intelligence and Natural
Language Processing
- Thesis: "Context-Aware Dialogue Systems for Customer Service
Applications"
- **Bachelor of Science in Computer Science and Linguistics**,
University of Massachusetts Amherst (2011)
- Double Major
- Graduated Summa Cum Laude

...
The project report provides details about the client, project scope, team
members, and technologies used.

# Insurance Claims Automation

## Project Overview

Intelligent automation of claims processing using AI, machine
learning, and robotic process automation. This strategic
initiative for NationWide Insurance aimed to transform their
operations in the Insurance sector, leveraging cutting-edge
technologies and methodologies to achieve significant business
impact.

## Client Profile

NationWide Insurance is a leading organization in the Insurance
industry that partnered with OMG Consulting to intelligent
automation. The project was conducted between 2020-10-01 and
2021-08-31 with a budget of 2300000-3500000.

## Team Composition

OMG Consulting deployed a cross-functional team of 4 consultants
to execute this project.

### Key Team Members and Roles

- **Project Lead**: Olivia Martinez (Supply Chain & Operations) -
Process optimization, automation strategy, operational efficiency
- **AI Implementation**: Emily Johnson (Conversational AI) - NLP
development, document processing, conversational interfaces
- **Data Science**: Sarah Johnson (Data Platforms & Analytics) -
Predictive modeling, anomaly detection, algorithm development
- **Development Lead**: Aisha Patel (Enterprise Applications) -
Automation development, system integration, workflow
implementation
...

RAG Ingestion Process


OMG implemented a prototype GenAI assistant using OpenAI models for
text embeddings (text-embedding-3-small) and answer generation
(gpt-4.1), and ChromaDB as a vector store.

NOTE
The models used in this chapter are OpenAI’s text-embedding-3-small and
gpt-4.1, which were available at the time of writing and might have been updated or
replaced by the time you read this.

The ingestion process followed these key steps:


1. Load the text files from various document sources
2. Split their content into manageable chunks with consistent
semantic meaning
3. Create text vector embeddings for each of the chunks using an
embedding model
4. Store both embeddings and chunks in a vector database

Let’s look at the code they used for this initial implementation of the
ingestion process.
First we load the different document sources into lists.

import glob
import os

def load_documents(base_path):
    documents = []

    # Load resumes from cv/*.md
    resume_path = os.path.join(base_path, "cv", "*.md")
    for file in glob.glob(resume_path):
        try:
            with open(file, "r", encoding="utf-8") as f:
                documents.append({"content": f.read(), "source": file})
            print(f"Loaded {file}")
        except Exception as e:
            print(f"Error loading {file}: {e}")

    # Load project descriptions from projects/prj_*.md
    projects_path = os.path.join(base_path, "projects", "prj_*.md")
    for file in glob.glob(projects_path):
        try:
            with open(file, "r", encoding="utf-8") as f:
                documents.append({"content": f.read(), "source": file})
            print(f"Loaded {file}")
        except Exception as e:
            print(f"Error loading {file}: {e}")

    return documents

Then these document lists are processed and split into text chunks. For
each text chunk a text embedding is created using OpenAI’s embedding
model, and then the text, filename, chunk position, and embedding are
stored in a ChromaDB vector database.

import chromadb
import openai

# Text splitting - creating semantically meaningful chunks
# In the real world, you would use a more sophisticated text splitter,
# like splitting on headers or paragraphs
def split_documents(documents, chunk_size=500, chunk_overlap=50):
    chunks = []
    for doc in documents:
        text = doc["content"]
        source = doc["source"]
        for i in range(0, len(text), chunk_size - chunk_overlap):
            chunk = text[i:i + chunk_size]
            chunks.append({"content": chunk, "source": source})
    return chunks

# Create embeddings and store in Chroma
def create_vector_store(chunks, persist_directory="./chroma_db"):
    client = chromadb.PersistentClient(path=persist_directory)
    collection = client.get_or_create_collection("documents")

    for i, chunk in enumerate(chunks):
        response = openai.embeddings.create(
            input=chunk["content"],
            model="text-embedding-3-small"
        )
        embedding = response.data[0].embedding
        collection.add(
            documents=[chunk["content"]],
            metadatas=[{"source": chunk["source"]}],
            ids=[f"chunk_{i}"],
            embeddings=[embedding]
        )
    return collection

You can run the ingestion script to process the documents and create the
vector store.

uv run python code/ch02/rag_ingest.py data/ch02/
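
The fixed-size splitter above is intentionally naive, as the comment in the listing notes. A slightly better approach for these Markdown sources is to split along document structure. Here is a minimal sketch (not part of OMG’s implementation) that breaks each document into one chunk per second-level header:

def split_markdown_by_headers(documents):
    # Split each Markdown document into one chunk per '## ' section
    chunks = []
    for doc in documents:
        current = []
        for line in doc["content"].splitlines():
            # Start a new chunk whenever a second-level header begins
            if line.startswith("## ") and current:
                chunks.append({"content": "\n".join(current), "source": doc["source"]})
                current = []
            current.append(line)
        if current:
            chunks.append({"content": "\n".join(current), "source": doc["source"]})
    return chunks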

Next, OMG built a simple Streamlit application to serve as the user
interface for their RAG system.

Streamlit App for the OMG RAG Assistant


They created a Streamlit app to allow users to ask questions, retrieve
relevant information from the vector store and have the LLM generate the
answer.

NOTE
Streamlit is a popular framework for quickly building interactive web applications in
Python, especially for data science and machine learning projects. It works in a
straightforward way: you write a Python script that defines the UI and logic together,
and Streamlit automatically generates a web app from it. The script is re-executed on
every user interaction, and Streamlit tracks changes to variables and UI elements,
updating the web app accordingly.
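
To illustrate that execution model, here is a minimal, self-contained Streamlit
script (a toy example of ours, unrelated to the OMG data): the whole file re-runs
on every interaction, while the cached resource is created only once.

import streamlit as st

@st.cache_resource
def load_resource():
    # Created once and reused across reruns (e.g., a database client or a model)
    return {"ready": True}

resource = load_resource()
st.title("Hello Streamlit")
name = st.text_input("Your name:")
if name:
    st.write(f"Hello, {name}!")  # the script re-runs whenever the input changes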

OMG’s assistant followed the same pattern as most RAG-based chatbots:
1. User inputs a question.
2. The app generates a text embedding for the question using the
same embedding model.
3. The most relevant text chunks are retrieved from the vector store
by embedding similarity search.
4. The context is assembled from system prompt, user question and
the retrieved chunks.
5. The OpenAI LLM generates a final response based on that context.
6. The answer is displayed to the user.
7. The user can ask follow-up questions, and the process repeats.

import os
import chromadb
import streamlit as st
from openai import OpenAI

openai = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "developer", "content": """
You are a helpful assistant to help OMG consultants
to determine consultant placement,
you have information about the consultants, their
skills, and project experience as well as
past project details, customers and their
descriptions.
"""}]

@st.cache_resource
def load_vector_store(persist_directory="./chroma_db"):
    client = chromadb.PersistentClient(path=persist_directory)
    return client.get_or_create_collection("documents")

def similarity_search(vectorstore, query_embedding, top_k=5):
    results = vectorstore.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )
    return results["documents"][0]

def generate_embedding(query):
    response = openai.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Generate a response using OpenAI LLM
def generate_response(context, query):
    prompt = f"""
You are an assistant for OMG Consulting.
Answer the question based on the following context.
If the answer cannot be determined from the context, say so.

Context:
{context}

Question: {query}

Answer:
"""
    messages.append({"role": "user", "content": prompt})
    response = openai.chat.completions.create(
        model="gpt-4.1",
        messages=messages,
        max_tokens=200
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

# UI setup
st.set_page_config(layout="wide")
st.title("OMG Consulting Assistant")
st.write("Ask questions about our consultants, their skills, and project experience")

# Load vector store
with st.spinner("Loading knowledge base..."):
    vectorstore = load_vector_store()

# User query section
query = st.text_input("Ask a question:")
if query:
    with st.spinner("Searching for information..."):
        # Generate query embedding
        query_embedding = generate_embedding(query)

        # Perform similarity search
        search_results = similarity_search(vectorstore, query_embedding, top_k=3)

        # Assemble context from search results
        context = "\n\n".join(search_results)

        # Generate response
        response = generate_response(context, query)

    # Display response
    st.write("### Answer")
    st.write(response)

    # Display sources (for transparency)
    st.write("### Source Documents")
    for i, doc in enumerate(search_results):
        with st.expander(f"Source {i+1}"):
            st.write(doc)
            # st.write(f"Source: {doc['metadata'].get('source', 'Unknown')}")

If you run the Streamlit app in the same folder where the vector database is
stored, it opens the web application and you can start asking it simple
questions.

uv run streamlit run code/ch02/rag_app.py


If you ask for information like "Who is Emily Johnson?" or "What
is the Cybersecurity Transformation project about?", you will get
answers based on the ingested document fragments, as enough of them
can be retrieved to stitch together a coherent answer.

Initial RAG Implementation - Uses and
Shortcomings
But if you ask slightly more complex questions like “Who has worked on
the Cybersecurity Transformation project?”, the LLM cannot provide an
answer, as the retrieved chunks don’t contain the relevant information. You
might even end up with the model hallucinating an answer: it cannot find
the relevant information in the retrieved chunks but still tries to be helpful.
In these situations it is better to check whether chunks were retrieved that
actually contain the information you are looking for, and if not, to output a
message like “I cannot answer that question” instead of passing the question
to the LLM, which might make up an answer.
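
One simple guard (a sketch of ours, not part of the chapter’s code) is to ask
ChromaDB for the distances alongside the documents and refuse to answer when
even the best match is too far from the question; the threshold value is
illustrative and depends on the embedding model and distance metric:

def retrieve_with_guard(vectorstore, query_embedding, top_k=3, max_distance=0.8):
    results = vectorstore.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "distances"],
    )
    docs = results["documents"][0]
    distances = results["distances"][0]
    # If nothing was found or even the closest chunk is far away, signal "no answer"
    if not docs or min(distances) > max_distance:
        return None  # the app can then reply "I cannot answer that question"
    return docs
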
The retrieved chunks often contain the information that the person or project
exists (their name is also the lookup phrase from the user’s question that finds
them in the vector store in the first place), but often not the associated details
that are needed to answer the question.
Because those pieces of text are no longer connected with each other after
chunking and storage, their original correlation cannot be reconstructed by
the RAG system.
You can argue that in this case you could just store the whole (smallish)
document with each chunk, but that would not scale well with larger
documents and larger corpora.
While this initial RAG implementation worked for basic queries, OMG
quickly discovered its limitations for more complex information needs. The
vector-based approach struggled with questions whose structure and intent
were not exactly represented in the source data or that required connecting
information across documents (or even chunks) or understanding
relationships between entities.
These were questions like:

“Who has worked across multiple healthcare projects?”


“What technologies has Emily Johnson not used in her past
projects?”

“What skills is Alexander Schmidt missing to staff this Global
Investment Partners migration project?”

Advanced RAG Patterns


This realization would lead the team to explore advanced RAG patterns
like:
question rewriting
Rewrite user questions in the space and domain of the available
documents to improve vector retrieval.

relating chunks
Relate chunks to each other based on their position (siblings) and to
their parent documents.

hybrid retrieval
Combine vector search with keyword search to improve recall and
apply re-ranking to the results.

contextual embedding
Generate embeddings not just for the document chunks but for a
combination of a summary of the document and the chunk rewritten in the
context of the domain.

multi-question retrieval
Run the retrieval multiple times with slightly rephrased variants of the same
question to improve recall and coverage of the vector search (see the sketch
after this list).
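
As an example of the last pattern, here is a short sketch of ours that reuses the
openai client, generate_embedding, and similarity_search from the app above; the
rephrasing prompt and the number of variants are illustrative choices:

def multi_question_retrieval(vectorstore, query, n_variants=3, top_k=3):
    # Ask the LLM for rephrased variants of the user question (one per line)
    rewrite = openai.chat.completions.create(
        model="gpt-4.1",
        messages=[{
            "role": "user",
            "content": f"Rephrase the following question {n_variants} times, "
                       f"one rephrasing per line, without numbering:\n{query}"
        }]
    )
    variants = [query] + rewrite.choices[0].message.content.splitlines()

    # Retrieve for each variant and merge the results, deduplicating chunks
    seen, merged = set(), []
    for q in variants:
        if not q.strip():
            continue
        for doc in similarity_search(vectorstore, generate_embedding(q), top_k=top_k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged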

However, these techniques still fell short when it came to understanding the
deeper relationships between entities like people, projects, clients, and
skills. The latter techniques in particular increased the complexity of the
RAG pipeline and the amount of data that needed to be processed by the
LLMs and embedding models, without significant improvements for the
more complex questions.

The GraphRAG Approach


Many of these advanced RAG techniques are really workarounds for restoring
the lost connections between pieces of contextual information. They looked
suspiciously similar to what a graph database can do natively: represent not
only chunks and documents but also the entities like people, projects, clients,
and skills as nodes and relationships in a graph.
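
To make this concrete, here is a minimal sketch (our illustration, not the model
the team ends up building) of how such entities and their connections could be
stored in a graph database, using the Neo4j Python driver; the connection details,
node labels, and relationship type are assumptions:

from neo4j import GraphDatabase

# Connection details are placeholders for a locally running Neo4j instance
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Illustrative labels and relationship type: a consultant, a project,
# and the fact that the consultant worked on that project
driver.execute_query(
    """
    MERGE (p:Person {name: $person})
    MERGE (prj:Project {name: $project})
    MERGE (p)-[:WORKED_ON]->(prj)
    """,
    person="Emily Johnson",
    project="Cybersecurity Transformation",
)

driver.close()

A question such as “who worked on which project” then becomes a traversal over
explicit relationships rather than a hope that the right chunks were retrieved.
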
Some consultants with past experience in graph databases searched the
literature for advanced RAG patterns that actually use graph databases for
retrieval and, unsurprisingly to us, discovered GraphRAG as a suitable
technique.
So they undertook a second attempt at structuring their information into a
connected network and using retrievers that take much more context into
account when answering questions. They also wanted to see whether it is
possible to answer not only content questions but also questions like “How
many people in our company have both GenAI and Healthcare experience?”
Summary
In this chapter, we followed OMG Consulting’s journey as they built their
first RAG-based AI assistant to help match consultants to projects.
We explored their initial setup, the challenges they faced with complex
queries, and the limitations of a basic vector search approach.
We also mentioned advanced RAG techniques they could try to improve
retrieval and answer quality.
Finally, we introduced the idea of using a graph database to represent the
relationships between entities and documents, which would lead to the next
chapter’s focus on GraphRAG.
About the Authors
Stephen Chin is VP of Developer Relations at Neo4j, conference chair of
the LF AI & Data Foundation, and author of numerous titles with O’Reilly,
Apress, and McGraw Hill. He has given keynotes and main stage talks at
numerous conferences around the world including AI Engineer Summit, AI
DevSummit, Devoxx, DevNexus, JNation, JavaOne, Shift, Joker,
swampUP, and GIDS. Stephen is an avid motorcyclist who has done
evangelism tours in Europe, Japan, and Brazil, interviewing developers in
their natural habitat. When he is not traveling, he enjoys teaching kids how
to do AI, embedded, and robot programming together with his daughters.
Michael Hunger has been passionate about software development for more
than 35 years.
For the last 15 years, he has been working on the open source Neo4j graph
database, filling many roles, most recently leading Product Innovation and
Developer Product Strategy. He especially loves to work with graph-related
projects, users, and contributors; his current focus is generative AI,
GraphRAG, cloud integrations, and developer experience.
As a developer Michael enjoys many aspects of programming languages,
learning new things every day, participating in exciting and ambitious open
source projects and contributing and writing software and data related
books and articles. Michael spoke at numerous conferences and helped
organize several. His efforts in the Java Community got him accepted to the
JavaChampions program.
Michael helps kids to learn to program by running weekly girls-only coding
classes at local schools.
Dr. Jesús Barrasa is the Field CTO for AI at Neo4j, where he works with
organisations combining the power of LLMs with Knowledge Graphs. He
co-authored “Building Knowledge Graphs” (O’Reilly 2023) and is cohost
of the Going Meta live webcast (https://goingmeta.live/). Jesús holds a
Ph.D. in Artificial Intelligence/Knowledge Representation and is an active
thought leader in the KG and AI space.
