RAG Understanding PDF

The document is a comprehensive guide to Retrieval-Augmented Generation (RAG) architectures, detailing core components, use cases, and best practices for implementation. It outlines RAG architectures suited to different applications, along with practical tips for document processing, embedding selection, and retrieval strategies. It also addresses common challenges and evaluation metrics, and provides a practical implementation guide for building RAG systems.


RAG Cheat Sheet

A comprehensive visual guide to Retrieval-Augmented Generation architectures, implementation, and best practices

What is RAG?

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances Large Language Models (LLMs) by combining them with external knowledge retrieval systems. Rather than relying solely on the model's internal parameters, RAG allows LLMs to access, retrieve, and use up-to-date information from external databases before generating responses.

When to Use RAG

✓ When you need factual accuracy beyond the LLM's training data
✓ When working with domain-specific or proprietary information
✓ When information needs to be up-to-date and verifiable
✓ When transparency and citation of sources matter
✓ When you need to reduce hallucinations in LLM outputs

Core Components:

📄 Document Processing: Converting documents into embeddings
🗄️ Vector Database: Storing embedded documents
🔍 Retriever: Finding relevant documents
🤖 Generator: Creating accurate responses

Basic RAG Flow:

1. Document Collection → Document Chunking → Embedding Generation → Vector Database Storage
2. User Query → Query Embedding → Similarity Search → Relevant Document Retrieval
3. Retrieved Documents + Original Query → LLM Generation → Response
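The four components and the basic flow above can be sketched end to end in a few lines. This is a toy illustration: bag-of-words vectors and cosine similarity stand in for a real embedding model and vector database, and all names here are illustrative, not from any particular library.

```python
# Toy sketch of the basic RAG flow: embed -> store -> retrieve -> prompt.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Document Processing: turn text into a (toy) bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Vector Database: a list of (vector, text) pairs
docs = ["rockets use propellant for thrust", "pandas live in bamboo forests"]
store = [(embed(d), d) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Retriever: rank stored documents by similarity to the query."""
    qv = embed(query)
    ranked = sorted(store, key=lambda p: cosine(qv, p[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(query: str) -> str:
    """Generator input: retrieved context plus the original query."""
    context = "\n".join(retrieve(query))
    return f"Based on the following information:\n{context}\n\nAnswer: {query}"

print(retrieve("how do rockets work"))
```

In a real system, `embed` would call an embedding model and `store` would be a vector database; the shape of the flow is the same.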

10 RAG Architectures Compared

1. Standard RAG (Beginner)

Flow: User Query → Query Processing → Retrieval → Document Selection → Context Integration → LLM Response

When to Use: For basic question-answering systems needing external knowledge
Real-World Example: Customer support chatbots that access product documentation
Implementation Tip: Start with smaller chunk sizes (512-1024 tokens) and adjust based on performance

2. Corrective RAG (Intermediate)

Flow: User Query → Initial Response → Error Detection → Retrieval → Response Correction → Final Response

When to Use: High-precision use cases where accuracy is critical
Real-World Example: Medical information systems, legal documentation assistance
Implementation Tip: Implement a feedback loop with multiple verification passes
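The Corrective RAG loop can be sketched as follows. Everything here is a stand-in: `generate` mocks an LLM call, `SOURCES` mocks a retriever, and the error check is a trivial comparison; a real system would use a judge model or fact-checker for the Error Detection step.

```python
# Hedged sketch of the Corrective RAG loop (all names illustrative).
SOURCES = {"boiling point of water": "Water boils at 100 C at sea level."}

def generate(query: str, context: str = "") -> str:
    # Stand-in for an LLM call; echoes retrieved context when given.
    return context if context else "Water boils at 90 C."  # draft with an error

def detect_error(response: str, query: str) -> bool:
    # Error Detection: flag any draft that disagrees with a known source.
    evidence = SOURCES.get(query, "")
    return bool(evidence) and response != evidence

def corrective_rag(query: str) -> str:
    draft = generate(query)                 # Initial Response
    if detect_error(draft, query):          # Error Detection
        evidence = SOURCES[query]           # Retrieval
        draft = generate(query, evidence)   # Response Correction
    return draft                            # Final Response

print(corrective_rag("boiling point of water"))
```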

3. Speculative RAG (Intermediate)

Flow: User Query → Small Model Draft → Parallel Retrievals → Large Model Verification → Response

When to Use: When balancing speed and accuracy is important
Real-World Example: Real-time customer service where response time impacts satisfaction
Implementation Tip: Use a specialized domain-specific small model for draft generation

4. Fusion RAG (Intermediate)

Flow: User Query → Multiple Retrieval Methods → Results Fusion → Aggregated Context → LLM Response

When to Use: When dealing with multiple data sources of varying formats
Real-World Example: Research assistants accessing articles, patents, and databases
Implementation Tip: Weight different sources based on their reliability and relevance
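One common way to implement the Results Fusion step is reciprocal rank fusion (RRF), which merges ranked lists from several retrievers without needing comparable scores. A minimal sketch (the retriever outputs here are made-up document IDs):

```python
# Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).
from collections import defaultdict

def rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]   # e.g. from vector search
keyword  = ["doc_b", "doc_d", "doc_a"]   # e.g. from BM25 / keyword search
print(rrf([semantic, keyword]))
```

`doc_b` wins because it ranks highly in both lists; per the tip above, per-source weights can be added by multiplying each list's contribution by a reliability factor.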

5. Agentic RAG (Advanced)

Flow: User Query → Intent Analysis → Agent Selection → Parallel Retrievals → Strategy Coordination → Response

When to Use: Complex queries requiring multiple types of information
Real-World Example: Financial analysis tools accessing market data, company reports, and news
Implementation Tip: Design specialized agents for different query types and data sources

6. Self RAG (Intermediate)

Flow: User Query → Initial Generation → Self-Critique → Additional Retrieval → Refined Response

When to Use: For conversational systems requiring consistency
Real-World Example: Educational tutoring systems that build on previous explanations
Implementation Tip: Store conversation history as retrievable context
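The tip above, storing conversation history as retrievable context, can be sketched like this. Word overlap stands in for embedding similarity, and the turn texts are invented for illustration:

```python
# Store each conversation turn, then retrieve the past turns most
# relevant to a new query (word overlap mocks embedding similarity).
history: list[str] = []

def add_turn(text: str) -> None:
    history.append(text)

def relevant_turns(query: str, k: int = 2) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(history,
                    key=lambda t: len(q & set(t.lower().split())),
                    reverse=True)
    return scored[:k]

add_turn("User asked about photosynthesis basics")
add_turn("We covered chlorophyll and light absorption")
add_turn("User asked about rocket engines")
print(relevant_turns("more detail on photosynthesis and chlorophyll", k=1))
```

Retrieved turns are then prepended to the context so the next explanation can build on what was already said.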

7. Hierarchical RAG (Advanced)

Flow: User Query → Top-Level Retrieval → Sub-document Identification → Focused Retrieval → Response

When to Use: With large, structured documents or knowledge bases
Real-World Example: Enterprise search across documentation hierarchies
Implementation Tip: Create multi-level embedding indexes for efficient navigation

8. Multi-modal RAG (Advanced)

Flow: User Query → Cross-modal Understanding → Multi-format Retrieval → Format Integration → Response

When to Use: When information spans text, images, audio, or video
Real-World Example: E-commerce search using both product descriptions and images
Implementation Tip: Use specialized embeddings for each modality and create bridging mechanisms
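Hierarchical RAG's two retrieval stages can be sketched with a toy two-level index: first match the query against section summaries, then search only within the winning section. The data and the overlap scorer are illustrative stand-ins for real summaries and embeddings:

```python
# Two-stage hierarchical retrieval over a toy section index.
sections = {
    "engines": {
        "summary": "rocket engines thrust propellant combustion",
        "chunks": ["combustion chambers mix fuel and oxidizer",
                   "nozzles accelerate exhaust to produce thrust"],
    },
    "botany": {
        "summary": "plants photosynthesis leaves chlorophyll",
        "chunks": ["chlorophyll absorbs light in leaves"],
    },
}

def overlap(a: str, b: str) -> int:
    """Toy relevance score: shared words between two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def hierarchical_retrieve(query: str) -> str:
    # Top-Level Retrieval: choose the best-matching section summary.
    best = max(sections.values(), key=lambda s: overlap(query, s["summary"]))
    # Focused Retrieval: search only within that section's chunks.
    return max(best["chunks"], key=lambda c: overlap(query, c))

print(hierarchical_retrieve("how do nozzles produce thrust"))
```

Because the second stage searches only one section, the approach scales to large document trees where a flat index would be slow or noisy.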

9. Adaptive RAG (Advanced)

Flow: User Query → Query Analysis → Retrieval Strategy Selection → Dynamic Parameter Adjustment → Response

When to Use: For systems facing diverse query types and user needs
Real-World Example: Academic research assistants handling various disciplines
Implementation Tip: Implement real-time feedback to tune retrieval parameters per query

10. Fine-tuned RAG (Intermediate)

Flow: User Query → Domain-Specific Processing → Specialized Retrieval → Context-Aware LLM → Response

When to Use: For specialized domains requiring expert-level responses
Real-World Example: Technical support systems for complex products
Implementation Tip: Fine-tune both embeddings and LLM on domain-specific data

Best Practices for RAG Implementation

Document Processing

1. Chunking Strategy: Balance between semantic coherence and retrieval granularity
2. Chunk Size: 256-1024 tokens depending on content complexity
3. Overlap: 10-20% chunk overlap to maintain context across chunks

Embedding Selection

1. General Purpose: OpenAI ada-002, BERT-based models
2. Specialized Domains: Consider domain-specific embedding models
3. Dimensions: Higher dimensions (768+) for complex information

Retrieval

1. Top-k Selection: Usually 3-5 chunks for typical queries
2. Re-ranking: Consider adding a re-ranking step after initial retrieval
3. Hybrid Search: Combine semantic and keyword search for better results

Prompt Engineering

1. Template: "Based on the following information: {context}, please answer: {query}"
2. Instruction: "Use only the provided information. If you don't know, say so."
3. Source Attribution: "For each point in your answer, indicate which source it came from."
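The chunking guidance above (fixed-size chunks with 10-20% overlap) can be sketched as a small function. It counts whitespace-separated tokens for simplicity; a real pipeline would count model tokens with a tokenizer:

```python
# Fixed-size chunking with ~15% overlap between consecutive chunks.
def chunk_text(text: str, chunk_size: int = 512,
               overlap_ratio: float = 0.15) -> list[str]:
    tokens = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap          # each chunk starts `step` tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break                        # last chunk reached the end
    return chunks

doc = " ".join(f"tok{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=512)
print(len(chunks), len(chunks[1].split()))
```

The overlapping tail of each chunk reappears at the head of the next, so sentences near a boundary remain retrievable from both sides.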

Common RAG Challenges & Solutions

Challenge | Solution
Hallucination | Implement fact-checking and source validation
Retrieval Latency | Use approximate nearest neighbor algorithms
Context Length Limits | Implement recursive summarization of retrieved chunks
Irrelevant Retrieval | Add filtering and pre-processing of documents
Response Consistency | Implement conversation history as part of the context

Evaluating RAG Systems

Metric | Description | Target
Answer Relevance | How well the response answers the query | >85%
Factual Accuracy | Correctness of facts in the response | >95%
Retrieval Precision | Relevance of retrieved documents | >80%
Response Time | Time from query to response | <2s
Source Coverage | Using multiple relevant sources | ≥2 sources
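As a concrete example of one metric above, Retrieval Precision is simply the fraction of retrieved chunks judged relevant (by a human annotator or a judge model). The document IDs below are made up:

```python
# Precision of a single retrieval: relevant hits / total retrieved.
def retrieval_precision(retrieved: list[str], relevant: set[str]) -> float:
    if not retrieved:
        return 0.0
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    return hits / len(retrieved)

retrieved = ["c1", "c2", "c3", "c4"]   # top-k chunks the retriever returned
relevant = {"c1", "c3", "c9"}          # gold labels for this query
print(retrieval_precision(retrieved, relevant))  # 2 hits out of 4 -> 0.5
```

Averaging this over a labeled query set gives the system-level figure compared against the >80% target in the table.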

Real-World Use Cases

🏢 Enterprise Knowledge Base: Connect employees to internal documentation
🛎️ Customer Support: Provide accurate product information and troubleshooting
🔬 Research Assistant: Aid in literature review and information synthesis
📋 Compliance Monitoring: Ensure responses adhere to regulatory guidelines
🎓 Educational Tutoring: Provide accurate explanations with cited sources

Advanced RAG Optimizations

Query Reformulation
Rewrite user queries for better retrieval performance.

// Example Query Reformulation
User: "Tell me about rockets"
Reformulated: "What are rockets, their history, types, and applications in space exploration?"

Retrieval Augmentation
Enhance retrieved context with related information.

// Retrieved Context Augmentation
1. Original: "SpaceX Falcon 9 specifications"
2. Augmented: + "Rocket propulsion systems"
3. Augmented: + "Comparison with other launch vehicles"

Contextual Compression
Condense retrieved information to focus on relevance.

// Context Compression Pipeline
1. Retrieve k=10 documents
2. Generate summary for each document
3. Rank summaries by relevance
4. Select top n=3 summaries

Adaptive RAG
Dynamically adjust retrieval parameters based on query type.

// Adaptive Parameter Selection
if query_is_factual():
    k = 3  # fewer, precise documents
    similarity_threshold = 0.8
elif query_is_exploratory():
    k = 7  # more, diverse documents
    similarity_threshold = 0.6

Continuous Learning
Update vector stores and embeddings as new information arrives.

// Incremental Updating
1. Monitor for new documents
2. Process and embed new content
3. Merge into existing vector store
4. Periodically re-index for optimization
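The incremental-updating steps above can be sketched with a toy in-memory store. `ToyVectorStore` and its methods are invented for illustration; a production system would call its vector database's add/upsert and re-index APIs instead:

```python
# Toy incremental update loop: add new embeddings, then re-index.
class ToyVectorStore:
    def __init__(self):
        self.entries: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.entries[doc_id] = vector      # merge new content into the store

    def reindex(self) -> int:
        # Stand-in for periodic re-indexing (e.g. rebuilding an ANN index);
        # returns the number of indexed entries.
        return len(self.entries)

store = ToyVectorStore()
for doc_id, vec in [("doc1", [0.1, 0.2]), ("doc2", [0.3, 0.4])]:
    store.add(doc_id, vec)                 # process and embed new content
print(store.reindex())
```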
Practical RAG Implementation Guide

1 Setting Up Your Document Processing Pipeline

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents
loader = DirectoryLoader('./documents/', glob="**/*.pdf")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
chunks = text_splitter.split_documents(documents)

2 Creating and Storing Embeddings

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Persist to disk
vectorstore.persist()
3 Building the Retrieval System

from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create base retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# Add compression for better context
llm = ChatOpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

4 Implementing the RAG Chain

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Initialize LLM
llm = ChatOpenAI(temperature=0, model="gpt-4")

# Example prompt definition (assumed here; the "stuff" chain expects
# {context} and {question} variables)
CUSTOM_PROMPT_TEMPLATE = PromptTemplate(
    template=(
        "Use only the provided information. If you don't know, say so.\n\n"
        "{context}\n\nQuestion: {question}\nAnswer:"
    ),
    input_variables=["context", "question"],
)

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": CUSTOM_PROMPT_TEMPLATE
    }
)

# Query the system
result = qa_chain({"query": "How do rockets work?"})
print(result["result"])

Popular Vector Database Comparison

🔍 Pinecone: Fully managed service, low-latency queries, scales to billions of vectors, high availability
🗄️ Weaviate: Open-source, GraphQL API, classification support, multi-tenancy support
🔮 Chroma: Open-source, easy Python integration, simple deployment, good for prototyping
🪢 Milvus: Open-source, distributed architecture, hybrid search, high scalability
🌐 Qdrant: Open-source, filtering capabilities, on-prem deployment, REST and gRPC APIs

Comparing RAG Performance Metrics

Architecture | Response Time | Accuracy | Memory Usage | Implementation Complexity
Standard RAG | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐
Corrective RAG | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐
Speculative RAG | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐
Fusion RAG | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐
Agentic RAG | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐

Standard RAG Implementation Flowchart

Document Collection → Document Chunking → Embedding Generation → Vector Database

User Query → Query Embedding → Similarity Search → Retrieve Documents

Format Context → Generate Prompt → LLM Processing → Final Response

Popular RAG Tools and Frameworks

LangChain: Framework for connecting LLMs with external data sources ⭐⭐⭐⭐⭐
LlamaIndex: Data framework for augmenting LLMs with private data ⭐⭐⭐⭐
Haystack: End-to-end framework for building NLP pipelines ⭐⭐⭐⭐
Semantic Kernel: Microsoft's SDK for integrating LLMs with code ⭐⭐⭐
txtai: All-in-one embeddings database with search capabilities ⭐⭐⭐

The Ultimate RAG Visual Cheat Sheet | Updated April 2025
