RAG Cheat Sheet
A comprehensive visual guide to Retrieval-Augmented Generation architectures, implementation,
and best practices
What is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances Large Language Models (LLMs) by combining them with external knowledge retrieval systems. Rather than relying solely on the model's internal parameters, RAG allows LLMs to access, retrieve, and use up-to-date information from external databases before generating responses.

When to Use RAG
✓ When you need factual accuracy beyond the LLM's training data
✓ When working with domain-specific or proprietary information
✓ When information needs to be up-to-date and verifiable
✓ When transparency and citation of sources matter
✓ When you need to reduce hallucinations in LLM outputs
Core Components:
📄 Document Processing: Converting documents into embeddings
🗄️ Vector Database: Storing embedded documents
🔍 Retriever: Finding relevant documents
🤖 Generator: Creating accurate responses

Basic RAG Flow:
1. Document Collection → Document Chunking → Embedding Generation → Vector Database Storage
2. User Query → Query Embedding → Similarity Search → Relevant Document Retrieval
3. Retrieved Documents + Original Query → LLM Generation → Response
10 RAG Architectures Compared
1. Standard RAG (Beginner)
Flow: User Query → Query Processing → Retrieval → Document Selection → Context Integration → LLM Response
When to Use: For basic question-answering systems needing external knowledge
Real-World Example: Customer support chatbots that access product documentation
Implementation Tip: Start with smaller chunk sizes (512-1024 tokens) and adjust based on performance

2. Corrective RAG (Intermediate)
Flow: User Query → Initial Response → Error Detection → Retrieval → Response Correction → Final Response
When to Use: High-precision use cases where accuracy is critical
Real-World Example: Medical information systems, legal documentation assistance
Implementation Tip: Implement a feedback loop with multiple verification passes, as in the sketch below
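The detect-and-correct loop can be written in a few lines. A minimal sketch using the same LangChain-era API as the implementation guide further down; `retriever` is assumed to be built as in that guide, and both prompts are illustrative, not a prescribed recipe:

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

def corrective_answer(query, retriever, max_passes=2):
    # Retrieve supporting context once up front
    docs = retriever.get_relevant_documents(query)
    context = "\n".join(d.page_content for d in docs)
    answer = llm.predict(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    # Verification passes: detect unsupported claims, then correct them
    for _ in range(max_passes):
        verdict = llm.predict(
            "Check whether the answer is fully supported by the context. "
            "Reply exactly OK if it is; otherwise list the errors.\n"
            f"Context:\n{context}\n\nAnswer:\n{answer}"
        )
        if verdict.strip().startswith("OK"):
            break
        answer = llm.predict(
            f"Rewrite the answer to fix these errors:\n{verdict}\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        )
    return answer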
3. Speculative RAG (Intermediate)
Flow: User Query → Small Model Draft → Parallel Retrievals → Large Model Verification → Response
When to Use: When balancing speed and accuracy is important
Real-World Example: Real-time customer service where response time impacts satisfaction
Implementation Tip: Use a specialized domain-specific small model for draft generation

4. Fusion RAG (Intermediate)
Flow: User Query → Multiple Retrieval Methods → Results Fusion → Aggregated Context → LLM Response
When to Use: When dealing with multiple data sources of varying formats
Real-World Example: Research assistants accessing articles, patents, and databases
Implementation Tip: Weight different sources based on their reliability and relevance
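One concrete way to implement the fusion step is reciprocal rank fusion (RRF). A minimal sketch, assuming each retrieval method returns a ranked list of document IDs; the source-weighting tip above can be added by scaling each list's contribution:

def reciprocal_rank_fusion(result_lists, k=60):
    # Each list is a ranking of document IDs from one retrieval method
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Standard RRF score: 1 / (k + rank), with 1-based ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-search ranking with a keyword (BM25) ranking
fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],
    ["doc1", "doc9", "doc3"],
])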
5. Agentic RAG (Advanced)
Flow: User Query → Intent Analysis → Agent Selection → Parallel Retrievals → Strategy Coordination → Response
When to Use: Complex queries requiring multiple types of information
Real-World Example: Financial analysis tools accessing market data, company reports, and news
Implementation Tip: Design specialized agents for different query types and data sources

6. Self RAG (Intermediate)
Flow: User Query → Initial Generation → Self-Critique → Additional Retrieval → Refined Response
When to Use: For conversational systems requiring consistency
Real-World Example: Educational tutoring systems that build on previous explanations
Implementation Tip: Store conversation history as retrievable context, as sketched below
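A minimal sketch of that tip: write each finished turn back into the vector store so later queries retrieve it like any other document. `vectorstore` is assumed to be built as in the implementation guide below; the metadata tag is an illustrative assumption:

def remember_turn(vectorstore, query, answer):
    # Store the completed turn as a retrievable document
    vectorstore.add_texts(
        [f"Q: {query}\nA: {answer}"],
        metadatas=[{"type": "conversation"}],
    )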
7. Hierarchical RAG (Advanced)
Flow: User Query → Top-Level Retrieval → Sub-document Identification → Focused Retrieval → Response
When to Use: With large, structured documents or knowledge bases
Real-World Example: Enterprise search across documentation hierarchies
Implementation Tip: Create multi-level embedding indexes for efficient navigation (see the two-stage sketch after this pair)

8. Multi-modal RAG (Advanced)
Flow: User Query → Cross-modal Understanding → Multi-format Retrieval → Format Integration → Response
When to Use: When information spans text, images, audio, or video
Real-World Example: E-commerce search using both product descriptions and images
Implementation Tip: Use specialized embeddings for each modality and create bridging mechanisms
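The multi-level index tip for Hierarchical RAG can be sketched as a two-stage search. This assumes two Chroma collections built elsewhere: a `summary_store` holding one summary per document and a `chunk_store` holding chunks tagged with a `doc_id` metadata field (both names are illustrative):

def hierarchical_retrieve(query, summary_store, chunk_store):
    # Stage 1: top-level retrieval over document summaries
    top_docs = summary_store.similarity_search(query, k=2)
    doc_ids = [d.metadata["doc_id"] for d in top_docs]
    # Stage 2: focused retrieval restricted to the matched documents
    results = []
    for doc_id in doc_ids:
        results += chunk_store.similarity_search(
            query, k=2, filter={"doc_id": doc_id}
        )
    return results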
9. Adaptive RAG (Advanced)
Flow: User Query → Query Analysis → Retrieval Strategy Selection → Dynamic Parameter Adjustment → Response
When to Use: For systems facing diverse query types and user needs
Real-World Example: Academic research assistants handling various disciplines
Implementation Tip: Implement real-time feedback to tune retrieval parameters per query

10. Fine-tuned RAG (Intermediate)
Flow: User Query → Domain-Specific Processing → Specialized Retrieval → Context-Aware LLM → Response
When to Use: For specialized domains requiring expert-level responses
Real-World Example: Technical support systems for complex products
Implementation Tip: Fine-tune both embeddings and LLM on domain-specific data
Best Practices for RAG Implementation
Document Processing
1. Chunking Strategy: Balance between semantic coherence and retrieval granularity
2. Chunk Size: 256-1024 tokens depending on content complexity
3. Overlap: 10-20% chunk overlap to maintain context across chunks

Embedding Selection
1. General Purpose: OpenAI ada-002, BERT-based models
2. Specialized Domains: Consider domain-specific embedding models
3. Dimensions: Higher dimensions (768+) for complex information

Retrieval
1. Top-k Selection: Usually 3-5 chunks for typical queries
2. Re-ranking: Consider adding a re-ranking step after initial retrieval
3. Hybrid Search: Combine semantic and keyword search for better results

Prompt Engineering
1. Template: "Based on the following information: {context}, please answer: {query}"
2. Instruction: "Use only the provided information. If you don't know, say so."
3. Source Attribution: "For each point in your answer, indicate which source it came from."
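These three prompt patterns combine naturally into one reusable template. A sketch using LangChain's PromptTemplate; the "stuff" chain in the implementation guide below expects the variables {context} and {question}, so {question} stands in for {query} here, and this object doubles as the CUSTOM_PROMPT_TEMPLATE referenced in that guide:

from langchain.prompts import PromptTemplate

CUSTOM_PROMPT_TEMPLATE = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Based on the following information:\n{context}\n\n"
        "Please answer: {question}\n"
        "Use only the provided information. If you don't know, say so.\n"
        "For each point in your answer, indicate which source it came from."
    ),
)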
Common RAG Challenges & Solutions
Challenge | Solution
Hallucination | Implement fact-checking and source validation
Retrieval Latency | Use approximate nearest neighbor algorithms
Context Length Limits | Implement recursive summarization of retrieved chunks
Irrelevant Retrieval | Add filtering and pre-processing of documents
Response Consistency | Include conversation history as part of the context
Evaluating RAG Systems
Metric | Description | Target
Answer Relevance | How well the response answers the query | >85%
Factual Accuracy | Correctness of facts in the response | >95%
Retrieval Precision | Relevance of retrieved documents | >80%
Response Time | Time from query to response | <2s
Source Coverage | Using multiple relevant sources | ≥2 sources
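Retrieval precision is the most mechanical of these to spot-check. A minimal sketch, assuming a small hand-labelled set: `labelled` maps each query to the chunk IDs a human judged relevant, and each chunk carries an `id` metadata field (both are illustrative assumptions, not part of any library):

def retrieval_precision(retriever, labelled, k=4):
    # labelled: {query: set of relevant chunk IDs}
    hits = total = 0
    for query, relevant_ids in labelled.items():
        docs = retriever.get_relevant_documents(query)[:k]
        hits += sum(d.metadata.get("id") in relevant_ids for d in docs)
        total += len(docs)
    # Fraction of retrieved chunks that were judged relevant
    return hits / total if total else 0.0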
Real-World Use Cases
🏢 Enterprise Knowledge Base: Connect employees to internal documentation
🛎️ Customer Support: Provide accurate product information and troubleshooting
🔬 Research Assistant: Aid in literature review and information synthesis
📋 Compliance Monitoring: Ensure responses adhere to regulatory guidelines
🎓 Educational Tutoring: Provide accurate explanations with cited sources
Advanced RAG Optimizations
Query Reformulation
Rewrite user queries for better retrieval performance.
// Example Query Reformulation
User: "Tell me about rockets"
Reformulated: "What are rockets, their history, types, and applications in space exploration?"

Retrieval Augmentation
Enhance retrieved context with related information.
// Retrieved Context Augmentation
1. Original: "SpaceX Falcon 9 specifications"
2. Augmented: + "Rocket propulsion systems"
3. Augmented: + "Comparison with other launch vehicles"
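The reformulation step above can be delegated to the LLM itself. A short sketch reusing the LangChain-era ChatOpenAI model from the implementation guide; the rewrite prompt is an illustrative assumption:

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

def reformulate(query: str) -> str:
    # Ask the model to expand vague queries before retrieval
    return llm.predict(
        "Rewrite this search query to be specific and self-contained, "
        f"preserving the user's intent: {query}"
    )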
Contextual Compression
Condense retrieved information to focus on relevance.
// Context Compression Pipeline
1. Retrieve k=10 documents
2. Generate summary for each document
3. Rank summaries by relevance
4. Select top n=3 summaries

Adaptive RAG
Dynamically adjust retrieval parameters based on query type.
// Adaptive Parameter Selection
if query_is_factual():
    k = 3                        # fewer, precise documents
    similarity_threshold = 0.8
elif query_is_exploratory():
    k = 7                        # more, diverse documents
    similarity_threshold = 0.6
Continuous Learning
Update vector stores and embeddings as new information arrives.
// Incremental Updating
1. Monitor for new documents
2. Process and embed new content
3. Merge into existing vector store
4. Periodically re-index for optimization
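A hedged sketch of this loop, reusing the `vectorstore` and `text_splitter` built in the implementation guide below; the ./inbox/ location is an illustrative assumption, and step 4 (re-indexing) is left to the vector store's own maintenance:

from langchain.document_loaders import DirectoryLoader

def ingest_new_documents(vectorstore, text_splitter, path="./inbox/"):
    # Steps 1-2: load and chunk newly arrived documents
    new_chunks = text_splitter.split_documents(
        DirectoryLoader(path, glob="**/*.pdf").load()
    )
    # Step 3: merge the new embeddings into the existing store
    vectorstore.add_documents(new_chunks)
    vectorstore.persist()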
Practical RAG Implementation Guide
1. Setting Up Your Document Processing Pipeline

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents
loader = DirectoryLoader('./documents/', glob="**/*.pdf")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
chunks = text_splitter.split_documents(documents)
2. Creating and Storing Embeddings

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Persist to disk
vectorstore.persist()
3. Building the Retrieval System

from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create base retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# Add compression for better context
llm = ChatOpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)
4. Implementing the RAG Chain

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Initialize LLM
llm = ChatOpenAI(temperature=0, model="gpt-4")

# Create RAG chain (CUSTOM_PROMPT_TEMPLATE is the PromptTemplate
# defined in the Best Practices section above)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": CUSTOM_PROMPT_TEMPLATE
    }
)

# Query the system
result = qa_chain({"query": "How do rockets work?"})
print(result["result"])
Popular Vector Database Comparison
🔍 Pinecone: Fully managed service · Low latency queries · Scales to billions of vectors · High availability
🗄️ Weaviate: Open-source · GraphQL API · Classification support · Multi-tenancy support
🔮 Chroma: Open-source · Easy Python integration · Simple deployment · Good for prototyping
🪢 Milvus: Open-source · Distributed architecture · Hybrid search · High scalability
🌐 Qdrant: Open-source · Filtering capabilities · On-prem deployment · REST and gRPC APIs
Comparing RAG Performance Metrics
Architecture | Response Time | Accuracy | Memory Usage | Implementation Complexity
Standard RAG | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐
Corrective RAG | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐
Speculative RAG | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐
Fusion RAG | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐
Agentic RAG | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐
Standard RAG Implementation Flowchart
Document Collection → Document Chunking → Embedding Generation → Vector Database
User Query → Query Embedding → Vector Similarity Search → Retrieve Documents
Format Context → Generate Prompt → LLM Processing → Final Response
Popular RAG Tools and Frameworks
LangChain (⭐⭐⭐⭐⭐): Framework for connecting LLMs with external data sources
LlamaIndex (⭐⭐⭐⭐): Data framework for augmenting LLMs with private data
Haystack (⭐⭐⭐⭐): End-to-end framework for building NLP pipelines
Semantic Kernel (⭐⭐⭐): Microsoft's SDK for integrating LLMs with code
txtai (⭐⭐⭐): All-in-one embeddings database with search capabilities
The Ultimate RAG Visual Cheat Sheet | Updated April 2025