
Model Training and Fine Tuning

The document outlines the agenda for a Natural Language Processing course, covering model training, fine-tuning, data cleaning, vector databases, and the Retrieval-Augmented Generation (RAG) method. It emphasizes the importance of customizing AI models for specific domains to improve accuracy and relevance, and details the steps involved in fine-tuning and data cleaning processes. Additionally, it discusses the benefits of using vector databases and RAG in enhancing AI responses and applications across various industries.

Uploaded by

fayazullah775

19/06/2025

Natural Language Processing


Spring 2025
Prof. Dr. M. Fasih Uddin Butt

Agenda
Overview of topics to be covered:
1. The Purpose of Model Training and Fine-Tuning
2. Cleaning Input Data
3. Using Vector Databases
4. Implementing RAG


The main difference between model training and fine-tuning is that training builds
a model from scratch, while fine-tuning adjusts an existing (pre-trained) model for
specific needs.

CNN
CNN stands for Convolutional Neural Network, a class of deep learning
models used primarily for processing data that has a grid pattern, such as
images. CNNs are particularly effective for tasks like image classification,
object detection, and segmentation.


2. Introduction to Model Training and Fine-Tuning


(i) Why Train and Fine-Tune Generative AI Models?
● Explain the need for customization to suit domain-specific
needs.
● Benefits: Accuracy, relevance, and improved performance.
● Real-world examples (e.g., chatbots, content generation,
personalized recommendations).

(ii) The Need for Customization


Pre-trained models are trained on diverse, generic datasets. While this makes
them versatile, they are often less accurate when applied to specific domains or
unique tasks. Fine-tuning adapts the model to a particular dataset, language style,
or business requirement so that it generates more relevant, accurate, and targeted
outputs.
For example:
● A general chatbot trained on public conversations may struggle to respond
accurately to medical inquiries.
● A text-to-image model like Stable Diffusion may not create realistic industrial
equipment images without domain-specific fine-tuning.


Real-World Examples
a. Chatbots
● General-purpose chatbots (e.g., ChatGPT) can be fine-tuned for:
○ Healthcare support: Responding to patient queries with precise medical answers.
○ Banking support: Providing information on account balances, fraud detection, etc.
● Example: A chatbot for a hospital fine-tuned to handle medical appointments and FAQs.
b. Content Generation
● Generative AI models like GPT or DALL·E can be customized to produce:
○ Marketing content tailored to a brand's tone and audience.
○ E-learning materials in a specific teaching style or language (e.g., video generation with Sora).

(iii) Overview of the Fine-Tuning Process


● Steps:
1. Selecting the model.
2. Preparing the dataset.
3. Training and validation.
4. Deployment.
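
The four steps above can be sketched with a toy model. This is only an illustration under stated assumptions, not how a large generative model is actually fine-tuned: the "pre-trained model" is a tiny logistic regression, and its starting weights and the small domain dataset are made up for the example. The point is the workflow: start from existing weights, continue training on domain data, check a held-out validation set, then export the result.

```python
import json
import math


def predict(weights, bias, features):
    """Logistic-regression prediction: probability that the label is 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def average_log_loss(weights, bias, dataset):
    """Mean cross-entropy loss over (features, label) pairs."""
    eps = 1e-12  # guards against log(0)
    losses = []
    for features, label in dataset:
        p = predict(weights, bias, features)
        losses.append(-(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


def fine_tune(weights, bias, dataset, learning_rate=0.1, epochs=200):
    """Continue training from existing weights with stochastic gradient descent."""
    weights = list(weights)  # copy so the pretrained weights stay untouched
    for _ in range(epochs):
        for features, label in dataset:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# 1. Selecting the model: made-up "pretrained" weights (an assumption for this sketch).
pretrained_weights, pretrained_bias = [0.5, -0.5], 0.0

# 2. Preparing the dataset: a made-up domain dataset, split into train and validation.
train_data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([0.9, 0.2], 0), ([0.1, 0.8], 1)]
validation_data = [([0.8, 0.1], 0), ([0.2, 0.9], 1)]

# 3. Training and validation: compare validation loss before and after fine-tuning.
loss_before = average_log_loss(pretrained_weights, pretrained_bias, validation_data)
tuned_weights, tuned_bias = fine_tune(pretrained_weights, pretrained_bias, train_data)
loss_after = average_log_loss(tuned_weights, tuned_bias, validation_data)

# 4. Deployment: here, just serialize the tuned parameters for a serving system to load.
deployed_model = json.dumps({"weights": tuned_weights, "bias": tuned_bias})

print(f"validation loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The validation set is kept separate from the training data so that the improvement reflects generalization to the domain rather than memorization of the training examples.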


3. Cleaning Input Data


(i) Why is Data Cleaning Important?
● The impact of clean data on model performance.
● Risks of poor-quality data (bias, errors).
(ii) Steps to Clean Input Data
● Removing duplicates.
● Handling missing values.
● Normalizing data (text/token standardization).
(iii) Tools for Data Cleaning
● Examples: Python libraries (pandas, NLTK, spaCy).
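
The three cleaning steps listed above can be sketched in plain Python. This is a minimal sketch, not a full pipeline: the record layout (`text` and `label` keys), the `unknown` placeholder, and the sample rows are assumptions made for illustration. In practice, libraries such as pandas, NLTK, or spaCy would usually handle these steps.

```python
import re
import unicodedata


def normalize_text(text):
    """Normalize Unicode forms, lowercase, and collapse repeated whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower().strip()
    return re.sub(r"\s+", " ", text)


def clean_records(records, placeholder="unknown"):
    """Clean a list of {'text': ..., 'label': ...} dicts (hypothetical schema)."""
    cleaned = []
    seen_texts = set()
    for record in records:
        text = record.get("text")
        # Handling missing values: rows without text are dropped,
        # while a missing label is filled with a placeholder.
        if text is None or not text.strip():
            continue
        text = normalize_text(text)
        label = record.get("label") or placeholder
        # Removing duplicates: compare on the normalized text.
        if text in seen_texts:
            continue
        seen_texts.add(text)
        cleaned.append({"text": text, "label": label})
    return cleaned


raw_records = [
    {"text": "  Hello   World ", "label": "greeting"},
    {"text": "hello world", "label": "greeting"},      # duplicate after normalization
    {"text": None, "label": "greeting"},               # missing text
    {"text": "Reset my password", "label": None},      # missing label
]
cleaned_records = clean_records(raw_records)
print(cleaned_records)
```

Normalizing before deduplicating matters here: the first two rows only become identical once case and whitespace are standardized.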

Domains Where Data Cleaning Is Required


1. Data Science
2. Machine Learning (ML)
3. Big Data
4. Database Management Systems (DBMS)
5. Data Engineering
6. Information Retrieval (IR)
7. Natural Language Processing (NLP)
8. Health Informatics
9. Business Intelligence (BI)


4. Using Vector Databases for Efficient Training

(i) What Are Vector Databases?


A vector database is a database that stores, indexes, and manages high-dimensional
vector data.
(ii) How it works
Vector databases store data as numerical representations called "vectors" (embeddings).
These vectors are indexed by similarity, so the nearest neighbors of a query vector can
be retrieved with low latency.
Examples:
Chroma, Pinecone, Weaviate, Qdrant, and Milvus, along with Faiss (a similarity-search
library) and pgvector (a PostgreSQL extension).
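
To make the store-then-query idea concrete, here is a minimal in-memory sketch. It is not the API of any of the products listed above: the class name `TinyVectorStore`, its methods, and the three-dimensional sample vectors are hypothetical. It also scans every stored vector (brute force), whereas real vector databases typically rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import math


class TinyVectorStore:
    """Hypothetical brute-force vector store using Euclidean distance."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        """Store a vector under an identifier."""
        self._items.append((item_id, list(vector)))

    def query(self, vector, k=2):
        """Return the ids of the k stored vectors closest to the query vector."""
        scored = []
        for item_id, stored in self._items:
            if len(stored) != len(vector):
                raise ValueError("query and stored vectors must have the same dimension")
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, stored)))
            scored.append((distance, item_id))
        scored.sort()  # smallest distance first
        return [item_id for _, item_id in scored[:k]]


store = TinyVectorStore()
store.add("doc-cats", [1.0, 0.1, 0.0])      # made-up embeddings for illustration
store.add("doc-dogs", [0.9, 0.3, 0.0])
store.add("doc-finance", [0.0, 0.1, 1.0])

nearest = store.query([1.0, 0.2, 0.0], k=2)
print(nearest)
```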

Benefits
Vector databases enable machine learning models to identify similar
objects, which can be used for:
● Search
● Recommendations
● Text generation
● Creating advanced AI programs like LLMs


Comparison to other databases


Vector databases are optimized for storing vectors and retrieving them by similarity,
while SQL databases are optimized for structured (tabular) data and NoSQL databases
for semi-structured or unstructured data.

Applications
Vectors Represent Semantic Information:
Generative AI transforms inputs (text, images, etc.) into vector embeddings using
models trained on large datasets.
● Example: The words "dog" and "wolf" are close in meaning, so their embeddings
(vectors) will be close together in vector space.
Similarity Search Enables RAG:
When you query with a vector, a vector database quickly retrieves similar vectors.
This enables AI to enhance generation with relevant, existing content.
● Example: Generative AI can generate answers or images based on similar past
knowledge retrieved from a vector database.
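
The "close in vector space" idea can be checked numerically with cosine similarity, a common way to compare embeddings. The sketch below uses hand-made three-dimensional vectors as stand-ins; real embeddings come from a trained model and have hundreds or thousands of dimensions, so the specific numbers here are assumptions chosen only to illustrate the comparison.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-made toy embeddings (not produced by any real model).
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "wolf": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

dog_wolf = cosine_similarity(embeddings["dog"], embeddings["wolf"])
dog_car = cosine_similarity(embeddings["dog"], embeddings["car"])
print(f"dog vs wolf: {dog_wolf:.2f}, dog vs car: {dog_car:.2f}")
```

With these toy vectors, "dog" scores much higher against "wolf" than against "car", which is the property a similarity search exploits.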


Definition
Retrieval-Augmented Generation (RAG) works by retrieving relevant documents or data
from a knowledge base or external source and then using that information to generate
more accurate, contextually aware responses.
Here’s a breakdown of how RAG works:
1. Retrieval: A query or input is processed, and the model retrieves relevant
documents from a large database or knowledge base.
2. Augmentation: The retrieved documents are used to augment the model's input,
providing more context or information.
3. Generation: The augmented input is passed through a generative model (like GPT) to
produce a final, contextually enriched output.
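
The three stages above can be sketched end to end in a few lines. This is a simplified illustration under several assumptions: the `embed` function is a bag-of-words stand-in for a real embedding model, the knowledge base is a small in-memory list, and `generate` is a hypothetical placeholder where a call to a generative model would go. All function names are chosen for this example only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """1. Retrieval: rank documents by similarity to the query, keep the top k."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]


def augment(query, retrieved_documents):
    """2. Augmentation: place the retrieved text in the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in retrieved_documents)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """3. Generation: hypothetical placeholder for a generative model call."""
    return f"[model output for a prompt of {len(prompt)} characters]"


knowledge_base = [
    "To reset your password, go to Settings and click Password Reset.",
    "The sky is blue because of how sunlight interacts with the air.",
    "Section 123 outlines the required clauses for valid employment contracts.",
]

query = "How can I reset my account password?"
retrieved = retrieve(query, knowledge_base, k=1)
prompt = augment(query, retrieved)
answer = generate(prompt)
print(prompt)
```

Keeping retrieval, augmentation, and generation as separate functions mirrors the breakdown above and makes it easy to swap the toy retriever for a vector database query.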

Simple Definition of RAG

RAG is a method in artificial intelligence that helps computers give better
answers by combining two steps:
1. Retrieval: First, it searches for useful information from a database or
knowledge base (like looking up facts).
2. Generation: Then, it uses this information to create a clear and
accurate response using a language model (like ChatGPT).


Simple Example

Imagine you’re asking a computer, "Why is the sky blue?"


● The retrieval step is like the computer finding a book about the sky
in a library.
● The generation step is like the computer reading that book and
writing a simple answer for you:
"The sky is blue because of how sunlight interacts with the air."
By combining these steps, the computer gives a smarter answer than
guessing on its own!

Mature Examples of RAG Applications


Customer Support:
● A chatbot retrieves FAQs, policy documents, or troubleshooting guides to
answer customer questions accurately.
○ Query: "How can I reset my account password?"
○ Response: "To reset your password, go to [Settings], click on
[Password Reset], and follow the emailed instructions."
Legal Assistance:
● Retrieving legal statutes and case law to generate summaries or draft
documents.
○ Query: "What does Section 123 of the XYZ Act say about employment
contracts?"
○ Response: "Section 123 outlines the required clauses for valid
employment contracts, including terms for termination and dispute
resolution."


Benefits of RAG:

1. Up-to-Date Information
2. Reduced Hallucinations
3. Improved Accuracy for Domain-Specific Tasks
4. Scalability and Efficiency
5. Contextual and Custom Responses
6. Enhanced Transparency and Interpretability
7. Cost-Effective Solution
8. Adaptability Across Industries
9. Combines Generative and Search Power
10. Personalization and Context Management
