INDUSTRIAL TRAINING
REPORT
Submitted in partial fulfilment of the
Requirements for the award of the degree
of
Bachelor of Technology
in
Computer Science and Engineering
By:
Jesmeek Singh Bhugra (055/CSE1/2021)
Department of Computer Science and Engineering
Guru Tegh Bahadur Institute of Technology
Guru Gobind Singh Indraprastha University
Dwarka, New Delhi
Year 2021-2025
RAG MODEL USING LLAMA3, LANGCHAIN &
FAISS VECTORSTORE
Duration
10th July 2024 – 11th August 2024
By:
Jesmeek Singh Bhugra (055/CSE1/2021)
At
LLM Mastery: ChatGPT, Gemini, Claude, Llama3, OpenAI & APIs
UDEMY
DECLARATION
I hereby declare that all the work presented in this Industrial Training Report, submitted in partial
fulfilment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering, Guru Tegh Bahadur Institute of Technology,
affiliated to Guru Gobind Singh Indraprastha University, Delhi, is an authentic record of my
own work carried out in the course LLM Mastery: ChatGPT, Gemini, Claude, Llama3, OpenAI & APIs
at Udemy from 11th July 2024 to 11th August 2024.
Date:
Jesmeek Singh Bhugra (055/CSE1/2021)
CERTIFICATE
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to Ms. Divya, who has given me support
and suggestions. Without her help I could not have presented this work up to the
present standard. I also take this opportunity to thank all others who gave me
support for the project or in other aspects of my study at Guru Tegh Bahadur Institute
of Technology.
Date:
Jesmeek Singh Bhugra (055/CSE1/2021)
jesmeeksingh@gmail.com
ABSTRACT
This report presents an in-depth analysis of the Retrieval Augmented Generation
(RAG) model, a state-of-the-art approach that enhances the performance of
Large Language Models (LLMs) by integrating information retrieval with
generative capabilities. Traditional LLMs, while powerful, often lack access to
real-time and domain-specific knowledge, leading to limitations in the accuracy
of their responses. RAG addresses these issues by combining a vector-based
retrieval mechanism with text generation, enabling models to produce
contextually relevant and factually grounded outputs.
RAG leverages vector databases, which store information in high-dimensional
vector representations, enabling efficient similarity searches for relevant data.
When a user query is processed, RAG retrieves relevant documents or
information from these vector databases, embedding them into the model's input
context. This retrieval step significantly improves the relevance and factual
correctness of the model's output by providing up-to-date, domain-specific
information that complements the LLM's pre-trained knowledge.
The RAG process is structured through a RAG chain, which includes a retriever
and a prompt mechanism. The retriever searches for documents based on the
query, while the prompt integrates this information into the model's input,
allowing the LLM to generate an informed and context-aware response. This
structured process enhances the robustness of the model in various applications,
particularly those requiring accurate and detailed information retrieval, such as
customer support, content generation, and research.
The findings of this report underscore the significance of RAG in improving the
overall performance of LLMs. By seamlessly combining retrieval with
generation, RAG offers a more dynamic and adaptable model capable of
addressing real-world challenges.
LIST OF FIGURES AND TABLES
Fig No  Figure Name
1.  UDEMY
CONTENTS
Chapter
Title Page
Declaration
Certificate
Acknowledgement
Abstract
Tables and figures
1. Introduction
1.1 About UDEMY
1.2 Services
2. Content
3. Results
4. Summary & Conclusions
5. References
6. Appendix
INTRODUCTION
1.1 About UDEMY
Udemy is a global online learning platform aimed at providing accessible and
flexible education for individuals, professionals, and organizations. Established in
2010, Udemy offers a wide range of courses across numerous categories including
technology, business, personal development, and more. The platform enables
industry professionals, educators, and experts to create and deliver courses,
catering to the ever-evolving needs of modern learners.
With over 200,000 courses available in multiple languages, Udemy has emerged as
a leader in online education. It facilitates learning for millions of students
worldwide, providing an affordable and flexible way to acquire new skills and
knowledge. Udemy's approach to democratizing education has made it an attractive
choice for both learners and organizations looking for comprehensive upskilling
solutions.
In addition to individual learners, Udemy for Business offers corporate training
solutions, allowing companies to upskill their workforce by leveraging a vast array
of learning resources. The platform's ease of access, continuous updates, and
relevant industry-focused content make it a valuable tool for lifelong learning.
This report delves into Udemy’s role in shaping the future of learning and how its
courses, particularly in fields like technology and artificial intelligence, are
contributing to the development of essential skills among students and
professionals.
1.2 Services
Individual Learning: Udemy offers a vast library of over 200,000 courses across a
wide range of categories, including technology, business, personal development, and
more. These courses are designed to be accessible and affordable, allowing learners to
acquire new skills at their own pace. Courses typically include video lectures, quizzes,
and assignments to facilitate learning. The platform supports various learning styles
and provides lifetime access to purchased courses, enabling learners to revisit the
material whenever needed.
Udemy for Business: Udemy for Business is tailored for corporate training and
development. It offers a curated selection of courses relevant to business skills,
including leadership, project management, and technical skills. Organizations can
create customized learning paths for their employees, track progress with detailed
analytics, and integrate the platform with existing learning management systems
(LMS).
Instructor Platform: Udemy provides a platform for educators, industry experts, and
professionals to create and sell their own courses. Instructors can use Udemy’s tools
to design course content, create engaging videos, and set up quizzes and assignments.
They also have access to marketing tools to promote their courses and track
performance through detailed analytics.
Mobile App: The Udemy mobile app extends the learning experience beyond
desktops and laptops. It allows users to access courses on their smartphones and
tablets, providing flexibility for learning on-the-go. The app includes features such as
offline access, so learners can download course materials and watch them without an
internet connection.
Certificates of Completion: Upon finishing a course, learners receive a Certificate of
Completion from Udemy. This certificate can be added to resumes or LinkedIn
profiles as proof of acquired skills and knowledge. While not accredited by
educational institutions, these certificates can demonstrate commitment to
professional development and expertise in specific areas.
Udemy for Government: Udemy for Government offers tailored training solutions
for public sector organizations. This service provides access to a broad selection of
courses aimed at improving skills within government agencies. It includes features
such as custom course recommendations and performance tracking to meet the
specific needs of government employees and agencies.
Course Customization: Udemy allows users and organizations to customize
learning experiences by creating personalized learning paths. This includes selecting
specific courses or modules relevant to their needs and integrating them into a
structured learning plan. This service is particularly useful for organizations looking
to address specific skills gaps or for learners with particular learning goals.
Q&A and Peer Learning: Courses on Udemy often include Q&A sections where
learners can ask questions and interact with instructors or other students. This feature
fosters a collaborative learning environment, allowing learners to seek clarification,
share insights, and engage with the course content more deeply. Peer learning
enhances the educational experience by providing additional perspectives and
support.
Lifetime Access: One of Udemy’s key features is lifetime access to purchased
courses. Once a course is bought, learners can return to it anytime to review
materials, watch updates, or refresh their knowledge. This ensures that learners can
continually benefit from the course content even after completing their initial studies.
Interactive Learning: Udemy courses often incorporate interactive elements such as
quizzes, assignments, and practice tests. These tools are designed to reinforce
learning, assess understanding, and provide practical application of knowledge.
Interactive learning helps engage students more actively and improves retention of
the material.
CONTENT
Retrieval-Augmented Generation (RAG) Model Based on LangChain & Llama3
This implementation of the RAG (Retrieval-Augmented Generation) model integrates
document retrieval with a large language model to enable accurate and context-driven
responses. By using LangChain, the system combines document processing, retrieval, and
generation so that users can interact with files such as PDFs, CSVs, and text files in a
conversational manner. Below is a breakdown of how RAG is applied in the system.
Overview of the RAG Model
The RAG model operates in two stages:
Retrieving Relevant Documents: Using a document retrieval mechanism based on vector
embeddings (FAISS) to find the most relevant chunks of data.
Generating Contextual Responses: Utilizing a large language model (Llama3) to generate
text that is informed by the retrieved documents.
Key Components of the RAG Model
Document Processing & Embedding
Document Loading: The system reads documents (PDF, Text, CSV) using tools like
PyPDF2 and LangChain loaders. These documents are then split into text chunks
using RecursiveCharacterTextSplitter.
Embedding: Text chunks are embedded using OllamaEmbeddings with the Llama3 model, which
encodes the text into vectors. These embeddings represent the semantic meaning of the
text and enable efficient document search.
FAISS Vector Store: The FAISS index stores these embeddings, allowing efficient
retrieval of relevant chunks based on user queries.
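The idea of embedding chunks and retrieving the closest ones can be illustrated without any external library. The sketch below is a simplified stand-in, not the project's code: it assumes a toy word-count "embedding" in place of OllamaEmbeddings and a brute-force cosine-similarity scan in place of FAISS. The names embed, cosine, and top_k are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a neural model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "the penalty clause applies to late delivery",
    "payment is due within thirty days",
    "either party may terminate the agreement",
]
results = top_k("what is the penalty clause", chunks, k=1)
```

A vector index such as FAISS serves the same purpose as the scan in top_k, but is built to stay fast when the number of stored vectors is large.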
Generation with Llama3
Generative Model: The retrieved document chunks serve as input to the Llama3 model.
This model augments the information in these chunks with its generative capabilities to
produce responses.
The response generation is not limited to what the Llama3 model has seen during
training. It uses document data retrieved by FAISS at query time to ensure the responses
are grounded in the provided information.
Workflow of the RAG Implementation
1. User Query: A user asks a question or inputs a task related to the documents.
2. Document Retrieval: The system retrieves relevant document chunks based on the query,
using similarity search through FAISS. The retrieved chunks are ranked and selected based
on relevance to the query.
3. Response Generation: The Llama3 model processes the query and retrieved chunks,
generating a contextual and accurate response.
4. Conversation History: The system saves each conversation (question, answer, timestamp)
along with the source documents, ensuring persistent data that can be reviewed later.
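The workflow above can be sketched as a single function that retrieves, generates, and then records the exchange. This is a minimal illustration under stated assumptions, not the project's source code: answer_query, fake_retrieve, and fake_generate are hypothetical names, and the retriever and the LLM are replaced by stand-in functions so that the example runs on its own.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def answer_query(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    history_path: Path,
) -> str:
    """Run one retrieve-then-generate turn and append it to a JSON history file."""
    sources = retrieve(question)           # document retrieval
    answer = generate(question, sources)   # response generation
    history = (
        json.loads(history_path.read_text(encoding="utf-8"))
        if history_path.exists()
        else []
    )
    history.append({                       # conversation history
        "question": question,
        "answer": answer,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sources": sources,
    })
    history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return answer


# Stand-ins so the sketch runs without a vector store or an LLM.
def fake_retrieve(question: str) -> list[str]:
    return ["Clause 7: a penalty applies to late delivery."]


def fake_generate(question: str, sources: list[str]) -> str:
    return f"Based on {len(sources)} retrieved chunk(s): {sources[0]}"


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "history.json"
    answer_query("What is the penalty clause?", fake_retrieve, fake_generate, log)
    reply = answer_query("Who pays it?", fake_retrieve, fake_generate, log)
    saved = json.loads(log.read_text(encoding="utf-8"))
```

Passing the retriever and generator in as functions keeps the flow independent of any particular vector store or model, which is one way to obtain the modularity the report describes.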
Advantages of the RAG Model Implementation
a. Enhanced Document Querying
Multi-Format Support: By supporting PDF, CSV, and Text file formats, the system
allows users to query multiple types of data sources seamlessly.
Efficient Retrieval: FAISS ensures that relevant chunks are retrieved efficiently, even
when dealing with large volumes of data.
b. Improved Answer Accuracy
Grounded Responses: Unlike purely generative models, the RAG system bases its
answers on retrieved documents, reducing hallucination and keeping
responses tied to real data.
c. User-Friendly Interaction
Interactive Interface: The Streamlit-based frontend allows users to upload documents, ask
questions, and view responses in an intuitive format.
Conversation Tracking: The system logs every interaction, enabling users to track
previous queries and answers along with their source documents.
Industrial Applications of the RAG Model
This RAG model, as implemented, is highly suitable for industries that rely on large
document repositories for decision-making:
Legal Research: It can help lawyers and professionals search through contracts, case files,
and legal documents efficiently.
Healthcare: Clinicians can use the system to retrieve patient records, medical literature, or
clinical guidelines to make informed decisions.
Corporate & Finance: The model can assist in generating financial reports, auditing
documents, or analyzing large datasets for insights.
Challenges and Considerations
Document Quality: The accuracy of the generated responses heavily depends on the
quality and structure of the retrieved documents.
Response Time: As retrieval and generation involve multiple steps, there can be slight
latency, especially with large document sets.
Scaling: As the knowledge base grows, ensuring retrieval efficiency while maintaining
response quality becomes crucial.
Example Use Case
Suppose a user uploads a large PDF containing legal contracts and asks, "What is the penalty
clause in this contract?" The system will:
Retrieve relevant text chunks that mention penalty clauses using FAISS.
Generate a coherent response by processing the retrieved clauses with Llama3, providing
the user with a concise answer based on the exact document content.
Display the sources (e.g., contract pages) from which the information was retrieved.
Conclusion
This implementation of the RAG model using LangChain, FAISS, and Llama3 is a practical
tool for document-based question answering. It combines the strengths of retrieval-based
systems with generative AI, making it adaptable to industrial applications such as
legal research, healthcare, and finance.
RESULTS
The results of a Retrieval-Augmented Generation (RAG) model can be assessed from two
angles: the performance metrics used to evaluate it and its effectiveness in specific
applications. This chapter gives an overview of both.
1. Performance Metrics
The efficacy of the RAG model is evaluated through several performance metrics, each
providing insights into different aspects of model performance:
Accuracy: This metric assesses the correctness of the model’s responses or outputs
compared to a predefined ground truth. In the context of question-answering tasks,
accuracy is often expressed as the percentage of correct answers provided by the
model.
F1 Score: Commonly used for question answering, the F1 score is the harmonic mean of
precision and recall. Precision denotes the proportion of relevant instances
among the retrieved results, while recall indicates the proportion of relevant instances
retrieved out of all relevant instances.
ROUGE Score: In summarization tasks, ROUGE (Recall-Oriented Understudy for
Gisting Evaluation) scores evaluate the quality of the generated summaries by
measuring the overlap with reference summaries. ROUGE-N (for n-grams) and
ROUGE-L (for longest common subsequences) are commonly employed variants.
BLEU Score: The BLEU (Bilingual Evaluation Understudy) score is utilized to
evaluate the quality of generated text, particularly in machine translation and content
generation. It measures the overlap between generated text and reference text,
focusing on n-gram precision.
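To make the overlap-based metrics above concrete, the following sketch computes simplified versions of three of them on whitespace-separated tokens. It is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, tokenization is a plain split, and the last function covers only the clipped n-gram precision at the core of BLEU. Full BLEU also combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of token precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


def clipped_ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams found in the reference (the core of BLEU)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    return sum((cand & ref).values()) / total if total else 0.0


pred = "the penalty is two percent"
gold = "the penalty is two percent per week"
f1 = token_f1(pred, gold)
recall = rouge_n_recall(pred, gold)
precision = clipped_ngram_precision(pred, gold)
```

In this example every predicted token appears in the reference, so precision is 1.0, while recall is 5/7 because two reference tokens are missing from the prediction.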
2. Effectiveness in Specific Applications
The RAG model has demonstrated notable improvements in several key NLP applications:
Question Answering: RAG models excel in question-answering tasks by integrating
relevant information retrieved from large corpora. Empirical results indicate that RAG
models surpass traditional generative models and retrieval-only models in providing
accurate and contextually relevant answers.
Dialogue Systems: In conversational AI, RAG models enhance the performance of
dialogue systems by generating responses that are informed by retrieved information.
This results in more contextually appropriate and engaging interactions compared to
models relying solely on pre-trained knowledge.
Content Generation: For content generation tasks, such as writing articles or creating
detailed summaries, RAG models have shown the ability to produce coherent and
informative content. The retrieval mechanism facilitates the inclusion of relevant
information, leading to more accurate and comprehensive generated texts.
Summarization: RAG models have proven effective in summarizing lengthy
documents by retrieving key information and generating concise summaries. This
approach improves the informativeness and alignment of the summaries with the
original content.
SUMMARY & CONCLUSION:
This project is centered on building a Retrieval-Augmented Generation (RAG) model
using LangChain and Llama3 to facilitate intelligent document querying and
conversation. It allows users to upload various types of files and ask questions about them,
with answers generated by combining the retrieval of relevant document segments with the
power of large language models (LLMs).
1. Document Upload and Preprocessing.
The system enables the upload of files in PDF, Text, or CSV formats. It then
processes these files using custom functions:
File Loading: The read_data function handles loading the file contents, differentiating
between PDF, text, and CSV formats.
PDFs are parsed page by page using PyPDF2, extracting text from each page.
Text and CSV files are loaded using TextLoader and CSVLoader from the LangChain
community module.
Text Splitting: Documents are split into manageable chunks using the
RecursiveCharacterTextSplitter method. This step is essential for making document
search efficient and precise. The text is split based on character length (configurable
chunk size) and includes overlap between chunks to ensure that no important
information is lost at chunk boundaries.
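The loading and splitting steps described above can be sketched with the standard library alone. This is a simplified illustration, not the project's read_data function: load_text and split_with_overlap are hypothetical names, the CSV rendering is an assumption, and the splitter uses fixed-width character windows. RecursiveCharacterTextSplitter instead prefers to break on separators such as paragraph and line breaks before falling back to smaller units.

```python
import csv
import tempfile
from pathlib import Path


def load_text(path: Path) -> str:
    """Return the text content of a .txt, .csv, or .pdf file."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            # Render each row as "column: value" pairs, one row per line.
            return "\n".join(
                ", ".join(f"{k}: {v}" for k, v in row.items())
                for row in csv.DictReader(f)
            )
    if suffix == ".pdf":
        # Imported lazily so the other branches run without PyPDF2 installed.
        from PyPDF2 import PdfReader
        pages = PdfReader(str(path)).pages
        return "\n".join(page.extract_text() or "" for page in pages)
    raise ValueError(f"Unsupported file type: {suffix}")


def split_with_overlap(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list[str]:
    """Cut text into chunk_size-character windows sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "contract.txt"
    sample.write_text("The supplier shall pay a penalty for late delivery.", encoding="utf-8")
    text = load_text(sample)

chunks = split_with_overlap(text, chunk_size=20, chunk_overlap=5)
```

Each chunk begins with the last five characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap plus one character.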
2. Embedding Creation and Vector Store Setup.
To enable efficient and relevant retrieval of document chunks, the system uses
embeddings:
Embedding Model: The system uses OllamaEmbeddings, which generates
embeddings (vector representations) for each text chunk. Embeddings convert the text
into dense vectors that capture the semantic meaning of the content, enabling
similarity-based retrieval.
Vector Store: The embeddings are stored in a FAISS (Facebook AI Similarity Search)
vector store. FAISS allows for fast nearest-neighbor search, enabling the system to
quickly retrieve the document chunks most relevant to user queries.
The vector_store function is responsible for creating the vector store and storing the
embeddings locally, so they can be efficiently accessed later.
Users can query the system by searching the vector store, which retrieves the most
relevant chunks of information based on the similarity of the user query and the
embedded text.
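The role of the vector store (keep vectors next to their text, save them locally, and answer nearest-neighbour queries) can be illustrated with a small class. This is a hedged sketch and not the project's vector_store function: TinyVectorStore is a hypothetical name, the two-dimensional vectors are made up for the example, persistence uses a JSON file, and the search is an exact scan by Euclidean distance rather than a FAISS index.

```python
import json
import math
import tempfile
from pathlib import Path


class TinyVectorStore:
    """In-memory (vector, text) pairs with exact nearest-neighbour search."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the k texts whose vectors are closest to the query."""
        order = sorted(
            range(len(self.vectors)),
            key=lambda i: math.dist(query, self.vectors[i]),
        )
        return [self.texts[i] for i in order[:k]]

    def save(self, path: Path) -> None:
        payload = {"vectors": self.vectors, "texts": self.texts}
        path.write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "TinyVectorStore":
        data = json.loads(path.read_text(encoding="utf-8"))
        store = cls()
        store.vectors, store.texts = data["vectors"], data["texts"]
        return store


store = TinyVectorStore()
store.add([1.0, 0.0], "chunk about penalties")
store.add([0.0, 1.0], "chunk about payment terms")

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    store.save(index_path)
    restored = TinyVectorStore.load(index_path)

nearest = restored.search([0.9, 0.1], k=1)
```

Saving the index once and reloading it later avoids re-embedding every document each time the application starts.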
3. Question Answering and Conversational AI.
The primary function of the model is to allow users to ask questions about the
uploaded documents. The process works as follows:
Query Handling: The user's input is processed through the user_input function, which
uses a retriever to search the vector store and find the most relevant document chunks
based on the user's query.
LLM Integration: Once relevant chunks are retrieved, the system uses the Llama3
model to generate responses. The integration is done using LangChain's
RetrievalQA chain, in which the model produces detailed natural-language answers
based on the retrieved document chunks.
A system prompt is also provided, ensuring that the Llama3 model generates polite,
accurate, and user-friendly responses.
This architecture ensures that the answers are grounded in the actual document
content, reducing hallucinations and improving the factual accuracy of responses.
Conversation History: All user interactions (queries and answers) are saved in a JSON
file to provide a persistent record of the conversation. This history can be reviewed
later, allowing users to revisit previous responses and documents.
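The way a system prompt and the retrieved chunks are combined into the model's input, as described above, can be sketched as plain string assembly. The wording of SYSTEM_PROMPT and the layout produced by build_prompt are assumptions made for illustration; the report does not reproduce the actual prompt, and a chain such as RetrievalQA performs this step internally.

```python
# Hypothetical system prompt; the project's actual wording is not shown in the report.
SYSTEM_PROMPT = (
    "You are a polite assistant. Answer using only the context below. "
    "If the context does not contain the answer, say that you do not know."
)


def build_prompt(question: str, chunks: list[str], system_prompt: str = SYSTEM_PROMPT) -> str:
    """Combine the system prompt, numbered context chunks, and the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


prompt = build_prompt(
    "What is the penalty clause?",
    ["Clause 7 sets a penalty for late delivery.", "Clause 9 covers termination."],
)
```

Numbering the chunks makes it possible to ask the model to cite the chunk it relied on, which fits the report's goal of displaying sources next to each answer.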
4. User Interface
The entire system is built with Streamlit, offering a simple and interactive web
interface:
File Upload Interface: Users can upload multiple documents via a drag-and-drop
interface, selecting between PDF, Text, and CSV formats.
Question Input: A text box is provided for the user to ask questions related to the
uploaded documents.
Response Display: The retrieved answers are displayed along with the relevant
document source, including metadata like the file name and page number.
Sidebar Options: Users can configure several settings from the sidebar, such as chunk
size, chunk overlap, and the embedding model used. These controls allow for fine-
tuning the document retrieval and response generation process.
Conclusion:
The RAG model developed in this project integrates document retrieval with large
language models (LLMs) to answer user queries based on uploaded documents. The
key success of this system lies in its ability to combine two powerful techniques:
Information Retrieval: The use of embeddings and vector search ensures that the
system retrieves the most relevant information from the document, allowing for
precise answers.
Generative Language Models: By using Llama3, the system can take the retrieved
information and generate coherent, contextually relevant answers in a conversational
style.
Advantages of the System:
Accurate and Contextual Responses: The model provides grounded answers, as the
LLM generates responses based on real document content, avoiding the issue of
generating misleading or factually incorrect information.
Multi-File and Multi-Format Support: The system supports a variety of document
formats (PDF, Text, CSV), making it versatile and applicable in different domains.
Modularity: The system is highly modular, allowing for future integration with other
LLMs, different vector stores, or enhanced embedding techniques.
User-Friendly Interface: The Streamlit interface provides a seamless experience for
uploading files, asking questions, and viewing answers, making the system accessible
to non-technical users.
Future Improvements:
Support for Additional File Formats: While the system handles PDF, Text, and CSV
files, adding support for formats like DOCX or web scraping could further extend its
applicability.
Advanced LLMs: Although Llama3 performs well, exploring other
models such as GPT-4 or Claude could improve the quality and depth of
responses.
Scaling for Large Datasets: As the number of documents increases, performance
optimizations in vector storage and retrieval may be necessary to handle larger
datasets efficiently.
REFERENCES
1. Wen, G. (2023, July 18). Experimenting with a Locally Deployed Llama 3 Model for
RAG Using Ollama and Streamlit. Medium. Retrieved from
https://medium.com/@georgewen7/experimenting-with-a-locally-deployed-llama-3-model-for-rag-using-ollama-and-streamlit-183fd937f047
2. Abhyuday, T. (2023, June 25). Retrieval-Augmented Generation (RAG): From Basics
to Advanced. Medium. Retrieved from
https://medium.com/@tejpal.abhyuday/retrieval-augmented-generation-rag-from-basics-to-advanced-a2b068fd576c
3. LangChain. (2023). Local Retrieval-Augmented Generation (RAG). LangChain
Documentation. Retrieved from
https://python.langchain.com/v0.2/docs/tutorials/local_rag/
4. Rahasak. (2023, May 30). Build RAG Application Using a LLM Running on Local
Computer with GPT4All and LangChain. Medium. Retrieved from
https://medium.com/rahasak/build-rag-application-using-a-llm-running-on-local-computer-with-gpt4all-and-langchain-13b4b8851db8
5. Streamlit Documentation. (2023). Streamlit: Documentation. Streamlit. Retrieved
from https://docs.streamlit.io/
6. YPredOfficial. (2023, April 10). FAISS Vector Database. Medium. Retrieved from
https://medium.com/@ypredofficial/faiss-vector-database-be3a9725172f
APPENDIX A
(Screenshots Results)
RAG Model :-
APPENDIX B
(Source Code)
RAG Model:-