Unit-I - Machine Learning Concepts


DEPARTMENT OF STATISTICS & OPERATIONS RESEARCH

AMU, ALIGARH - 202002, U.P. (INDIA)

MACHINE LEARNING (DSM 2002)


M.SC. II SEMESTER (DATA SCIENCE)
2023-24

DR ZAHID AHMED ANSARI



COURSE OBJECTIVE

• To introduce the basic concepts of machine learning.

Dr. Zahid Ahmed Ansari 1/30/2024



COURSE OUTCOME

• On successful completion of this course, students will be able to:


1. Describe the concepts of machine learning
2. Apply the machine learning tools in data science


SYLLABUS
Contents

Unit-I: Machine Learning: Concept and issues, supervised versus unsupervised learning, regression versus classification problem, algorithms versus models, model training: regression and classification models, model assessment, bias-variance trade-off, hyperparameter tuning, cross validation, ROC curves.

Unit-II: Tree based methods: Basics of decision trees, a simple tree, tree entropy and information gain, trees versus linear models, pros and cons of trees, overfitting, pruning a tree, bagging, random forests, boosting, fitting of classification and regression trees.

Unit-III: Support vector machines (SVMs): Overview, separating hyperplane, maximal margin classifier, support vector classifier (SVC): linear classification and classification with non-linear decision boundaries, SVM versus SVC, SVM with more than 2 classes: One-versus-One and One-versus-All case, kernel functions.

Unit-IV: Neural Networks: Overview, single and multilayer neural networks, neural networks for regression and classification. kNN classifier and k-means clustering as machine learning tools.

RECOMMENDED BOOKS

No. Title. Author
1. Machine Learning Made Easy with R: An Intuitive Step by Step Blueprint for Beginners. Lewis, N.D. (2017). CreateSpace Independent Publishing Platform.
2. Introduction to Machine Learning with R: Rigorous Mathematical Modeling. Burger, S.V. (2018). O'Reilly.
3. Machine Learning with R: Expert Techniques for Predictive Modeling, 3rd edition. Lantz, B. (2019). Packt Publications.
4. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Aurélien Géron.
5. Machine Learning For Dummies. John Paul Mueller, Luca Massaron.


MACHINE LEARNING INTRODUCTION


• Concept and issues
• Supervised versus unsupervised learning
• Regression versus Classification problem
• Algorithms versus Models
• Model training: regression and classification models
• Model assessment
• Bias-variance trade-off
• Hyperparameter tuning
• Cross validation
• ROC curves

Artificial Intelligence and Machine Learning

ARTIFICIAL INTELLIGENCE (AI)


• Artificial intelligence (AI) is a field of computer science concerned with building computer systems that can mimic human intelligence. AI means "a human-made thinking power." We can define it as:
  • Artificial intelligence is a technology with which we can create intelligent systems that simulate human intelligence.
• An AI system does not need to be pre-programmed for every situation; instead, it uses algorithms that can work with their own intelligence.
• It involves machine learning approaches such as reinforcement learning and deep learning neural networks.
• AI is used in many places, such as Siri, Google’s AlphaGo, and chess-playing programs.
• Based on capabilities, AI can be classified into three types: Weak AI, General AI, and Strong AI.


AI AND ML

• AI is the broader concept of creating intelligent machines that can simulate human thinking capability and behavior.
• Machine learning (ML) is a subset of AI that allows machines to learn from data without being explicitly programmed.


AI VS ML
Artificial Intelligence (AI) versus Machine Learning (ML):

• AI: Artificial intelligence is a technology that enables a machine to simulate human behavior.
  ML: Machine learning is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed.
• AI: The goal of AI is to make a smart computer system that solves complex problems as humans do.
  ML: The goal of ML is to allow machines to learn from data so that they can give accurate output.
• AI: In AI, we make intelligent systems to perform any task like a human.
  ML: In ML, we teach machines with data to perform a particular task and give an accurate result.
• AI: Machine learning and deep learning are the two main subsets of AI.
  ML: Deep learning is a main subset of machine learning.
• AI: AI has a very wide scope.
  ML: Machine learning has a limited scope.
• AI: AI works toward creating an intelligent system that can perform various complex tasks.
  ML: Machine learning works toward creating machines that can perform only the specific tasks for which they are trained.

• AI: An AI system is concerned with maximizing the chances of success.
  ML: Machine learning is mainly concerned with accuracy and patterns.
• AI: The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
  ML: The main applications of ML are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
• AI: On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI.
  ML: ML can also be divided into three main types: supervised learning, unsupervised learning, and reinforcement learning.
• AI: It includes learning, reasoning, and self-correction.
  ML: It includes learning and self-correction when introduced to new data.
• AI: AI deals with structured, semi-structured, and unstructured data.
  ML: Machine learning deals with structured and semi-structured data.

WHAT IS MACHINE LEARNING?

• Machine Learning (ML) is the field of computer science that enables computer systems to make sense of data in much the same way as human beings do.
• In simple words, ML is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method.
• The key focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.


DEFINITION OF MACHINE LEARNING

• Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined machine learning as:
  • “The field of study that gives computers the ability to learn without being explicitly programmed.”
• However, there is no universally accepted definition of machine learning. Different authors define the term differently. Another definition is:
  • “The field of study known as machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.”


FORMAL DEFINITION OF ML
BY PROFESSOR MITCHELL

• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
• This definition focuses on three components: task (T), performance (P) and experience (E).
• We can simplify this definition as:
• ML is a field of AI consisting of learning algorithms that
  • Improve their performance (P)
  • At executing some task (T)
  • Over time with experience (E)
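
The three components can be made concrete with a small sketch. The toy task, the 1-nearest-neighbour learner and every name below are illustrative assumptions rather than part of the course material: T is deciding whether an integer from 0 to 99 is at least 50, E is a set of labelled examples, and P is accuracy on the task.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of x by copying the label of the closest training point."""
    closest_x, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label


def accuracy(train, test):
    """Performance P: fraction of test points labelled correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


# Task T: decide whether an integer is at least 50 (label 1) or not (label 0).
test_set = [(x, int(x >= 50)) for x in range(100)]

# Experience E: a small and a larger set of labelled examples (hypothetical data).
small_experience = [(10, 0), (101, 1)]
large_experience = [(10, 0), (45, 0), (52, 1), (101, 1)]

acc_small = accuracy(small_experience, test_set)
acc_large = accuracy(large_experience, test_set)
print(f"P with 2 examples: {acc_small:.2f}")
print(f"P with 4 examples: {acc_large:.2f}")
```

On this toy data, the extra examples near the true boundary move the learned boundary closer to 50, so P improves with E. More experience does not guarantee improvement in general; that depends on the data and the learner.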

TASK, EXPERIENCE AND PERFORMANCE


• Task (T): An ML task is defined by the process that operates on data points. Examples of ML tasks are classification, regression, structured annotation, clustering, transcription, etc.
• Experience (E): The knowledge gained from the data points provided to the algorithm or model.
  • Once provided with the dataset, the model runs iteratively and learns some inherent pattern. The learning thus acquired is called experience (E).
  • Supervised, unsupervised and reinforcement learning are some ways to learn or gain experience. The experience gained by our ML model or algorithm is used to solve the task T.
• Performance (P): An ML algorithm is supposed to perform a task and gain experience over time. The measure that tells whether the ML algorithm is performing as expected is its performance (P).
  • P is a quantitative metric that tells how well a model performs the task T using its experience E. Many metrics help to assess ML performance, such as accuracy, F1 score, the confusion matrix, precision, recall, sensitivity, etc.
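
As an illustration of how several of these metrics are computed, here is a minimal sketch for a binary task. The label lists are made-up example data and the function name `confusion_counts` is a hypothetical helper, not a library call.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical true labels and model predictions for ten data points.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives found (sensitivity)
f1 = 2 * precision * recall / (precision + recall)

print("confusion matrix [[TP, FN], [FP, TN]]:", [[tp, fn], [fp, tn]])
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The sketch assumes at least one predicted positive and one actual positive, so no division by zero occurs; real code would need to guard those cases.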

TERMINOLOGIES OF MACHINE LEARNING


• Model: A model is a specific representation learned from data by applying a machine learning algorithm. A model is also called a hypothesis.
• Feature: A feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model.
  • For example, in order to predict the type of a fruit, there may be features like color, smell, taste, etc.
  • Choosing informative, discriminating and independent features is a crucial step for effective algorithms.
  • Generally, a feature extractor is employed to extract the relevant features from the raw data.
• Target (Label): A target variable or label is the value to be predicted by our model.
  • For the fruit example, the label for each set of inputs would be the name of the fruit, such as apple, orange, banana, etc.
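
A feature extractor for the fruit example might look like the following sketch. The field names (`color`, `weight_g`, `sweetness`), the colour vocabulary and the data values are all hypothetical; they only show how raw records become numeric feature vectors and labels.

```python
COLORS = ["red", "orange", "yellow"]  # hypothetical colour vocabulary


def extract_features(fruit):
    """Turn a raw fruit record into a numeric feature vector.

    The colour is one-hot encoded; weight and sweetness are used as-is.
    """
    one_hot_color = [1.0 if fruit["color"] == c else 0.0 for c in COLORS]
    return one_hot_color + [float(fruit["weight_g"]), float(fruit["sweetness"])]


# Raw data: each record holds measurable properties plus the target (label).
raw_data = [
    {"color": "red", "weight_g": 150, "sweetness": 6, "name": "apple"},
    {"color": "orange", "weight_g": 130, "sweetness": 7, "name": "orange"},
    {"color": "yellow", "weight_g": 120, "sweetness": 8, "name": "banana"},
]

X = [extract_features(fruit) for fruit in raw_data]  # feature vectors
y = [fruit["name"] for fruit in raw_data]            # targets (labels)

for features, label in zip(X, y):
    print(features, "->", label)
```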


• Training: We give the algorithm a set of inputs (features) and their expected outputs (labels). After training, we have a model (hypothesis) that maps new data to one of the categories it was trained on.
• Prediction: Once the model is ready, it can be fed a set of inputs, for which it provides a predicted output (label). Only if the model performs well on unseen data can we say that it performs well.
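
Training and prediction can be sketched with a 1-nearest-neighbour model, chosen here only because it is short; it is not the method the course prescribes. The two features (weight in grams, length in cm) and all data values are hypothetical.

```python
import math


def train(features, labels):
    """'Training' for 1-nearest-neighbour simply stores the labelled examples."""
    return list(zip(features, labels))


def predict(model, x):
    """Return the label of the stored example closest to x (Euclidean distance)."""
    nearest_features, nearest_label = min(
        model, key=lambda example: math.dist(example[0], x)
    )
    return nearest_label


# Training data: inputs (features) and expected outputs (labels).
train_X = [(150, 7), (170, 8), (120, 18), (130, 20)]
train_y = ["apple", "apple", "banana", "banana"]
model = train(train_X, train_y)

# Unseen data, never shown to the model during training.
unseen_X = [(160, 7.5), (125, 19)]
unseen_y = ["apple", "banana"]

predictions = [predict(model, x) for x in unseen_X]
unseen_accuracy = sum(p == t for p, t in zip(predictions, unseen_y)) / len(unseen_y)
print(predictions, "accuracy on unseen data:", unseen_accuracy)
```

Evaluating on `unseen_X` rather than `train_X` is the point of the example: accuracy on the training examples alone would say nothing about how the model handles new inputs.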


NEED FOR MACHINE LEARNING

• Human beings are, at this moment, the most intelligent and advanced species on earth because they can think, evaluate and solve complex problems.
• AI, on the other hand, is still in its initial stage and has not surpassed human intelligence in many aspects.
• The question then is: why do we need to make machines learn? The most suitable reason is:
  • To make decisions, based on data, with efficiency and at scale.
• Organizations are investing heavily in technologies like AI, ML and deep learning to extract key information from data, perform real-world tasks and solve problems.
• We all need to solve real-world problems efficiently and at a huge scale. That is why the need for machine learning arises.

• Tagging: A tag is a special kind of link. When you tag someone, you create a link to their timeline. The post you tag the person in may also be added to that person’s timeline. For example, you can tag a photo to show who’s in the photo or post a status update and say who you’re with.
• Alexa: Alexa is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news. Alexa can also control several smart devices using itself as a home automation system.
• Siri: Siri is an intelligent assistant that offers a faster, easier way to get things done on your Apple devices. Even before you ask. Siri can make calls or send texts for you whether you are driving, have your hands full or are simply on the go. It can even announce your messages on your AirPods. Siri also offers proactive suggestions so you can stay in touch effortlessly.

MACHINE LEARNING – APPLICATIONS



1. Image Recognition:
• Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, etc. in digital images. A popular use case of image recognition and face detection is automatic friend tagging suggestion:
• Facebook provides an automatic friend tagging suggestion feature. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names; the technology behind this is a machine learning face detection and recognition algorithm.
• It is based on the Facebook project named "DeepFace", which is responsible for face recognition and person identification in the picture.
2. Speech Recognition:
• While using Google, we get the option "Search by voice". It comes under speech recognition, a popular application of machine learning.
• Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition". At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.


3. Traffic prediction:
• If we want to visit a new place, we take the help of Google Maps, which shows us the shortest route and predicts the traffic conditions.
• It predicts whether traffic is clear, slow-moving, or heavily congested using two sources:
  • The real-time location of vehicles from the Google Maps app and sensors
  • The average time taken on past days at the same time of day
• Everyone who uses Google Maps helps to make the app better: it takes information from the user and sends it back to its database to improve performance.
4. Product recommendations:
• Machine learning is widely used by e-commerce and entertainment companies such as Amazon, Netflix, etc. to recommend products to users. Whenever we search for a product on Amazon, we start getting advertisements for the same product while browsing the internet in the same browser; this is because of machine learning.
• Google infers the user's interests using various machine learning algorithms and suggests products accordingly.
• Similarly, when we use Netflix, we find recommendations for series, movies, etc., which are also produced with the help of machine learning.

5. Self-driving cars:
• One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, a well-known car manufacturer, is working on self-driving cars and uses machine learning to train models that detect people and objects while driving.
6. Email Spam and Malware Filtering:
• Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox marked with the important symbol and spam goes to our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
  • Content filter, header filter, general blacklists filter, rules-based filters, permission filters
• Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier are used for email spam filtering and malware detection.
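
To give a feel for how a Naïve Bayes spam filter works, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The four training messages are invented, and the code illustrates the technique only; it makes no claim about how Gmail or any other product implements filtering.

```python
import math
from collections import Counter


def train_naive_bayes(messages):
    """messages: list of (text, label). Returns per-class word counts,
    per-class message counts, and the vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in messages:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for the message."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_messages)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled training messages.
training_messages = [
    ("win money now", "spam"),
    ("win a free prize now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch money for the meeting", "ham"),
]
model = train_naive_bayes(training_messages)

print(classify(model, "win free money"))   # spam
print(classify(model, "meeting at noon"))  # ham
```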



7. Virtual Personal Assistant:
• We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
• Machine learning algorithms are an important part of these virtual assistants.
• These assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
• Machine learning makes online transactions safer by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and theft of money in the middle of a transaction. To detect this, a feed-forward neural network checks whether a transaction is genuine or fraudulent.
• For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for a fraudulent transaction; the network detects this change, making our online transactions more secure.


9. Stock Market trading:
• Machine learning is widely used in stock market trading. In the stock market there is always a risk of ups and downs in share prices, so long short-term memory (LSTM) neural networks are used to predict stock market trends.
10. Medical Diagnosis:
• In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing fast and can build 3D models that predict the position of lesions in the brain.
• It helps in finding brain tumors and other brain-related diseases more easily.
11. Automatic Language Translation:
• Nowadays, if we visit a new place and do not know the language, machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, which is called automatic translation.
• The technology behind automatic translation is a sequence-to-sequence learning algorithm, which translates text from one language to another.

12. Web Search Engine: One of the reasons why search engines like Google, Bing, etc. work so well is that the system has learnt how to rank pages through a complex learning algorithm.
13. Automation: Machine learning can enable systems to work autonomously in a field without the need for human intervention. For example, robots perform the essential process steps in manufacturing plants.
14. Computer vision: Machine learning algorithms can be used to recognize objects, people, and other elements in images and videos.
15. Natural language processing: Machine learning algorithms can be used to understand and generate human language, including tasks such as translation and text classification.


16. Finance Industry: Machine learning is growing in popularity in the finance industry. Banks mainly use ML to find patterns in their data, and also to prevent fraud.
17. Government organization: Governments make use of ML to manage public safety and utilities. Take the example of China, with its large-scale face recognition: the government uses artificial intelligence to deter jaywalking.
18. Healthcare industry: Healthcare was one of the first industries to use machine learning, with image detection.
19. Marketing: AI is used broadly in marketing thanks to abundant access to data. Before the age of mass data, researchers developed advanced mathematical tools like Bayesian analysis to estimate the value of a customer. With the boom in data, marketing departments rely on AI to optimize customer relationships and marketing campaigns.


TYPES OF MACHINE LEARNING PROBLEMS

• Machine learning problems can be grouped by the nature of the learning “signal” or “feedback” available to the learning system:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning


SUPERVISED MACHINE LEARNING

• Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data and, on the basis of that data, predict the output. Labelled data means that the input data is already tagged with the correct output.
• In supervised learning, the training data provided to the machine works as the supervisor that teaches it to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
• Supervised learning is the process of providing input data as well as the correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
• In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.


SUPERVISED LEARNING
• The main objective of supervised learning algorithms is to learn an association between input data samples and their corresponding outputs by processing many training instances.
• For example, suppose we have
  x: input variables and
  Y: output variable
• We then apply an algorithm to learn the mapping function from the input to the output:
  Y = f(x)
• The objective is to approximate the mapping function so well that, when we have new input data (x), we can predict the output variable (Y) for that new input.
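
As one concrete instance of approximating Y = f(x), the sketch below fits a straight line by ordinary least squares and uses it on a new input. The assumption that f is linear, the function name `fit_line` and the data values are all illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labelled training data (hypothetical): inputs x with known outputs Y.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned approximation of the mapping function."""
    return slope * x + intercept


print(f"learned mapping: Y = {slope:.1f} * x + {intercept:.1f}")
print("prediction for new input x = 6:", f(6))
```

Here the training data happen to lie exactly on a line, so the learned function reproduces it; with noisy data the fit would only approximate the underlying mapping.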

HOW SUPERVISED LEARNING WORKS?


• Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and polygons. The first step is to train the model on labelled examples of each shape:
  • If the given shape has four sides, and all the sides are equal, it is labelled as a square.
  • If the given shape has three sides, it is labelled as a triangle.
  • If the given shape has six equal sides, it is labelled as a hexagon.
• After training, we test the model using the test set, and the task of the model is to identify each shape.
• The machine has been trained on all these types of shapes, so when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
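
The shape example can be sketched as a very simple learned classifier: a lookup table that stores, for each feature combination seen in training, the label that occurred most often. The feature encoding `(number_of_sides, all_sides_equal)` and the training examples are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(examples):
    """Learn the majority label for every feature combination seen in training."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}


def predict(model, features):
    """Look up the learned label; shapes unlike anything seen in training are 'unknown'."""
    return model.get(features, "unknown")


# Labelled training set: (number_of_sides, all_sides_equal) -> shape name.
training_set = [
    ((4, True), "square"),
    ((4, True), "square"),
    ((4, False), "rectangle"),
    ((3, True), "triangle"),
    ((3, False), "triangle"),
    ((6, True), "hexagon"),
]
model = train(training_set)

# Test set: shapes the model must identify.
print(predict(model, (4, True)))   # square
print(predict(model, (3, False)))  # triangle
print(predict(model, (5, True)))   # unknown
```

The last call shows a limitation the lookup table shares with any supervised model: a five-sided shape never appeared in training, so the model has no basis for a label.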

STEPS INVOLVED IN SUPERVISED LEARNING

• First, determine the type of training dataset.
• Collect the labelled training data.
• Split the data into a training dataset, a validation dataset, and a test dataset.
• Determine the input features of the training dataset, which should carry enough information for the model to predict the output accurately.
• Determine a suitable algorithm for the model, such as support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset. Sometimes a validation set, held out from the training data, is needed to tune the control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
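
The splitting step can be sketched as follows. The 60/20/20 proportions, the fixed seed and the function name `train_validation_test_split` are assumptions chosen for the example; the slides do not prescribe particular fractions.

```python
import random


def train_validation_test_split(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the data and cut it into training, validation and test parts."""
    shuffled = list(data)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducible splits
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


# Stand-in for 100 labelled examples.
data = list(range(100))
train, validation, test = train_validation_test_split(data)
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters when the data are stored in some order, for example sorted by class; otherwise one part could end up containing only one kind of example.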


ADVANTAGES AND DISADVANTAGES OF SUPERVISED LEARNING

• Advantages of supervised learning:
  • With the help of supervised learning, the model can predict the output on the basis of prior experience.
  • In supervised learning, we can have an exact idea about the classes of objects.
  • Supervised learning models help us solve various real-world problems such as fraud detection, spam filtering, etc.
• Disadvantages of supervised learning:
  • A supervised learning model may not predict the correct output if the test data differs substantially from the training data.
  • Training requires a lot of computation time.
  • In supervised learning, we need enough knowledge about the classes of objects.


TYPES OF SUPERVISED LEARNING

• Supervised learning can be further divided into two types of problems: regression and classification.


REGRESSION
• Regression is a supervised learning problem in which the model predicts a numeric value; the outputs are continuous rather than discrete. For example, predicting stock prices using historical data.
• It is used for the prediction of continuous variables, as in weather forecasting, market trends, etc.
• Below are some popular regression algorithms that come under supervised learning:
  • Linear Regression
  • Regression Trees
  • Non-Linear Regression
  • Bayesian Linear Regression
  • Polynomial Regression


CLASSIFICATION
• Classification: Inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one of these classes (or to several of them, in multi-label classification), that is, a model that predicts whether or not something belongs to a particular class. This is typically tackled in a supervised way.
• Classification models can be categorized into two groups: binary classification and multiclass classification. Spam filtering is an example of binary classification, where the inputs are email (or other) messages, and the classes are “spam” and “not spam”.
• Classification algorithms are used when the output variable is categorical, for example when there are two classes such as Yes-No, Male-Female, True-False, etc. Popular classification algorithms include:
  • Random Forest
  • Decision Trees
  • Logistic Regression
  • Support Vector Machines
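
As a small illustration of binary classification, the sketch below trains a logistic regression model with plain gradient descent on one feature. The data (hours studied against pass/fail), the learning rate and the number of epochs are assumptions for the example; the feature is centred on its mean before training, which is a common preprocessing choice rather than a requirement.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [0, 1, 2, 3, 4, 5, 6, 7]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]   # centre the feature around zero
w, b = train_logistic(centered, passed)


def predict(h):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return int(sigmoid(w * (h - mean_hours) + b) >= 0.5)


print([predict(h) for h in hours])
```

The output variable here is categorical (pass or fail), which is what makes this a classification problem even though the model internally computes a continuous probability.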

EXAMPLE OF CLASSIFICATION AND REGRESSION

[Figure: an example of classification and regression on two different datasets]


UNSUPERVISED MACHINE LEARNING

• In the previous topic, we covered supervised machine learning, in which models are trained using labeled data. But in many cases we do not have labeled data and need to find hidden patterns in the given dataset. To solve such cases, we need unsupervised learning techniques.
• Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:
  • Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.



• Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
• Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never given labels for this dataset, so it has no prior idea about its features. Its task is to identify the image features on its own, which it does by clustering the image dataset into groups according to the similarities between images.

WHY USE UNSUPERVISED LEARNING?

• Below are some of the main reasons why unsupervised learning is important:
  • Unsupervised learning is helpful for finding useful insights in the data.
  • Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it closer to real AI.
  • Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
  • In the real world, we do not always have input data with corresponding outputs; to solve such cases, we need unsupervised learning.


WORKING OF UNSUPERVISED LEARNING

• Here, we take unlabeled input data, which means it is not categorized and the corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm, such as k-means clustering or hierarchical clustering, is applied.
• Once applied, the algorithm divides the data objects into groups according to the similarities and differences between the objects.
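
The grouping step can be sketched with a bare-bones k-means implementation. The 2-D points are invented, and starting from the first k points is a simplification chosen to keep the example deterministic; practical implementations usually use randomised or smarter initialisation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns a cluster index per point and the centroids."""
    centroids = [points[i] for i in range(k)]  # simple deterministic start
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:   # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


# Unlabeled input data (hypothetical): no outputs are given, only the points.
points = [(1, 1), (8, 8), (1.5, 2), (9, 9), (1, 0.5), (8, 9.5)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

No labels were supplied at any point: the two groups emerge purely from the distances between the points.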

COMMON UNSUPERVISED LEARNING TECHNIQUES


• Clustering: Clustering methods are among the most useful unsupervised ML methods. These algorithms find similarity and relationship patterns among data samples and then cluster the samples into groups that are similar in their features. A real-world example of clustering is grouping customers by their purchasing behavior.
• Association: Another useful unsupervised ML method is association, which is used to analyze a large dataset to find patterns that represent interesting relationships between items. It is also termed association rule mining or market basket analysis, and is mainly used to analyze customer shopping patterns.
• Dimensionality Reduction: This unsupervised ML method is used to reduce the number of feature variables for each data sample by selecting a set of principal or representative features. Why do we need to reduce the dimensionality? The reason is the problem of feature space complexity, which arises when we start analyzing and extracting millions of features from data samples. This problem is generally referred to as the “curse of dimensionality”. PCA (Principal Component Analysis) and discriminant analysis are popular algorithms for this purpose.
• Anomaly Detection: This unsupervised ML method is used to find occurrences of rare events or observations that generally do not occur. Using the learned knowledge, anomaly detection methods can differentiate between an anomalous and a normal data point. Unsupervised approaches such as clustering and nearest-neighbor (KNN) distances can detect anomalies based on the data and its features.
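
The association technique can be illustrated by computing the two basic quantities of market basket analysis, support and confidence, for one candidate rule. The transactions are invented, and the helper names `support` and `confidence` are hypothetical; a full association rule miner such as Apriori would search over many candidate item sets rather than check a single rule.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also
    contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [basket for basket in transactions if antecedent <= basket]
    with_both = [basket for basket in with_antecedent if consequent <= basket]
    return len(with_both) / len(with_antecedent)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

On this data the rule "bread -> milk" holds in three of the four baskets that contain bread, which is the kind of relationship market basket analysis looks for.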

UNSUPERVISED LEARNING ALGORITHMS

• Below is a list of some popular unsupervised learning algorithms:
  • K-means clustering
  • KNN (k-nearest neighbors), as used for neighbor-based anomaly detection; the KNN classifier itself is a supervised method
  • Hierarchical clustering
  • Anomaly detection
  • Neural networks
  • Principal Component Analysis
  • Independent Component Analysis
  • Apriori algorithm
  • Singular value decomposition


ADVANTAGES AND DISADVANTAGES OF UNSUPERVISED LEARNING

• Advantages of unsupervised learning:
  • Unsupervised learning is used for more complex tasks than supervised learning because, in unsupervised learning, we do not have labeled input data.
  • Unsupervised learning is often preferable because unlabeled data is easier to obtain than labeled data.
• Disadvantages of unsupervised learning:
  • Unsupervised learning is intrinsically more difficult than supervised learning, as it has no corresponding output to learn from.
  • The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.


SUPERVISED VS UNSUPERVISED LEARNING

| Supervised Learning | Unsupervised Learning |
|---|---|
| Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data. |
| Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback. |
| Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data. |
| In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model. |
| The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset. |
| Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model. |

SUPERVISED VS UNSUPERVISED LEARNING

| Supervised Learning | Unsupervised Learning |
|---|---|
| Supervised learning problems can be categorized into classification and regression problems. | Unsupervised learning problems can be categorized into clustering and association problems. |
| Supervised learning can be used where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used where we have only input data and no corresponding output data. |
| Supervised learning generally produces more accurate results, because predictions can be checked against known outputs. | Unsupervised learning may give less accurate results than supervised learning. |
| Supervised learning is not considered close to true artificial intelligence, since the model must first be trained on labeled examples before it can predict the correct output. | Unsupervised learning is sometimes described as closer to true artificial intelligence, since it learns much as a child learns everyday things from experience. |
| It includes algorithms such as linear regression, logistic regression, support vector machines, decision trees, and Bayesian methods. | It includes algorithms such as K-means clustering, hierarchical clustering, and the Apriori algorithm. |

SUPERVISED VS UNSUPERVISED LEARNING


• The data in supervised learning is labelled; the data in unsupervised learning is unlabelled.

• The most basic disadvantage of any supervised learning algorithm is that the dataset has to be labeled
by hand, for example by a machine learning engineer or a data scientist. This is a very costly process,
especially when dealing with large volumes of data. The most basic disadvantage of unsupervised
learning is that its range of applications is limited.

SEMI-SUPERVISED LEARNING

• Such algorithms or methods are neither fully supervised nor fully unsupervised; they fall between
the two.
• These algorithms generally use a small supervised component (a small amount of pre-labeled,
annotated data) and a large unsupervised component (lots of unlabeled data) for training. We can
follow either of the following approaches to implement semi-supervised learning:
• The first, simple approach is to build a supervised model from the small amount of labeled data
and then apply it to the large amount of unlabeled data to obtain more labeled samples. Then
train the model on the enlarged labeled set and repeat the process.
• The second approach needs some extra effort. We first use unsupervised methods to cluster
similar data samples, annotate these groups, and then use a combination of this information
to train the model.
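The first approach is often called self-training or pseudo-labeling. The sketch below illustrates it under simplifying assumptions that are not from the slides: one-dimensional data, a nearest-centroid model as the "supervised model", and a hypothetical distance threshold `max_dist` that decides which predictions are confident enough to keep.

```python
def fit_centroids(xs, ys):
    """Tiny supervised model: the mean of the points in each class."""
    totals, counts = {}, {}
    for x, y in zip(xs, ys):
        totals[y] = totals.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}

def nearest(model, x):
    """Return (label, distance) of the closest class centroid."""
    label = min(model, key=lambda lab: abs(x - model[lab]))
    return label, abs(x - model[label])

def self_train(labeled_x, labeled_y, unlabeled_x, max_dist=1.5, rounds=5):
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(rounds):
        model = fit_centroids(xs, ys)          # train on current labels
        accepted = []
        for x in pool:                         # pseudo-label confident points
            label, dist = nearest(model, x)
            if dist <= max_dist:
                accepted.append((x, label))
        if not accepted:
            break                              # nothing new to learn from
        for x, label in accepted:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return fit_centroids(xs, ys), pool

# Two hand-labeled points and five unlabeled ones (hypothetical data).
model, leftover = self_train([1.0, 9.0], ["low", "high"],
                             [2.0, 3.0, 8.0, 7.0, 5.0])
print(model, leftover)
```

Points that never become confident (here the value halfway between the classes) stay unlabeled, which is one reason the threshold matters in practice.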


SEMI-SUPERVISED LEARNING

• Semi-supervised learning: Problems where you have a large amount of input data and only some of
the data is labeled, are called semi-supervised learning problems.
• These problems sit in between both supervised and unsupervised learning.
• For example, a photo archive where only some of the images are labeled, (e.g. dog, cat, person)
and the majority are unlabeled.
• Semi-supervised learning is particularly useful when there is a large amount of unlabeled data
available, but it’s too expensive or difficult to label all of it. Some examples of semi-supervised learning
applications include:
• Text classification: In text classification, the goal is to classify a given text into one or more
predefined categories. Semi-supervised learning can be used to train a text classification model
using a small amount of labeled data and a large amount of unlabeled text data.
• Image classification: In image classification, the goal is to classify a given image into one or more
predefined categories. Semi-supervised learning can be used to train an image classification
model using a small amount of labeled data and a large amount of unlabeled image data.
• Anomaly detection: In anomaly detection, the goal is to detect patterns or observations that are
unusual or different from the norm.

REINFORCEMENT LEARNING

• Reinforcement learning: A computer program interacts with a dynamic environment in which it must
achieve a certain goal (such as driving a vehicle or playing a game against an opponent). The program
is given feedback in the form of rewards and punishments as it navigates its problem space.
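As an illustration of learning from rewards and punishments, here is a minimal sketch of tabular Q-learning, one common reinforcement learning method. The slides do not name an algorithm, so this choice is an assumption, and so are the toy corridor environment, the reward values, and the hyperparameters.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4, goal at the right end
ACTIONS = (-1, +1)             # move left or move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else -0.1   # reward at goal, small penalty per move
    return nxt, reward, nxt == GOAL

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:                       # explore
                action = rng.choice(ACTIONS)
            else:                                            # exploit
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the greedy policy moves right in every non-goal cell, because that path collects the reward with the fewest penalties.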


REGRESSION ANALYSIS IN ML
• Regression analysis is a statistical method for modeling the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
• Regression analysis helps us understand how the value of the
dependent variable changes with one independent variable
when the other independent variables are held fixed.
• It predicts continuous/real values such as temperature, age, salary,
price, etc.
• Example: Suppose a marketing company A runs various advertisements
every year and earns sales from them. The company has records of its
advertising spend over the last 5 years and the corresponding sales.
• Now the company wants to spend $200 on advertising in the year
2019 and wants a prediction of the sales for that year.
• To solve this type of prediction problem in machine learning, we
need regression analysis.


REGRESSION ANALYSIS IN ML
• Regression is a supervised learning technique that models the relationship between variables and
enables us to predict a continuous output variable from one or more predictor variables.
• It is mainly used for prediction, forecasting, time series modeling, and exploring cause-and-effect
relationships between variables.
• In regression, we fit a line or curve that best fits the given datapoints; using this fit, the machine
learning model can make predictions about the data.
• In simple words, "Regression finds a line or curve on the target-predictor graph that passes as close
as possible to the datapoints, so that the vertical distance between the datapoints and the regression
line is minimum."
• The distance between the datapoints and the line indicates whether the model has captured a strong
relationship or not.
• Some examples of regression can be as:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.


TERMINOLOGIES RELATED TO THE REGRESSION

• Dependent Variable: The main factor in regression analysis that we want to predict or
understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors that affect the dependent variable, or that are used
to predict its values, are called independent variables, also called predictors.
• Outliers: An outlier is an observation with a very low or very high value compared with
the other observed values. Outliers can distort the result, so they should be detected and
handled carefully.
• Multicollinearity: If the independent variables are highly correlated with each other,
the condition is called multicollinearity. It is undesirable in the dataset because it makes
it difficult to rank the variables by how strongly they affect the target.
• Underfitting and Overfitting: If our algorithm works well on the training dataset but not
on the test dataset, the problem is called overfitting. If our algorithm does not perform
well even on the training dataset, the problem is called underfitting.

WHY DO WE USE REGRESSION ANALYSIS?

• As mentioned above, Regression analysis helps in the prediction of a continuous variable.


• There are various real-world scenarios where we need future predictions, such as weather
conditions, sales, and marketing trends. In such cases we need a technique that can make
accurate predictions.
• Regression analysis, a statistical method used in machine learning and data science, serves
this purpose.
• Below are some other reasons for using Regression analysis:
• Regression estimates the relationship between the target and the independent variable.
• It is used to find the trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can estimate which factors are the most and least important
and how each factor affects the target.

TYPES OF REGRESSION
• There are various types of regression used in data science and machine learning. Each
type has its own importance in different scenarios, but at the core, all regression methods
analyze the effect of the independent variables on the dependent variable. Some important
types of regression are:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression


LINEAR REGRESSION

• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the simplest regression algorithms and shows the relationship between
continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression models a linear relationship between the independent variable (X-axis)
and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.


LINEAR REGRESSION
• In the linear regression model, the dependent variable changes linearly with the
independent variable. For example, we can predict the salary of an employee based on
years of experience.
• The mathematical equation for linear regression is:
Y = aX + b
• Y = dependent variable (target variable),
• X = independent variable (predictor variable),
• a and b are the linear coefficients (a is the slope and b is the intercept).
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
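To show how the coefficients a and b in Y = aX + b can be obtained, here is a minimal sketch of ordinary least squares for a single predictor. The experience and salary figures are hypothetical, and the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope a and intercept b for Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                # slope: covariance over variance of x
    b = mean_y - a * mean_x      # intercept: line passes through the means
    return a, b

# Hypothetical data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

a, b = fit_line(years, salary)
predicted = a * 6 + b            # predicted salary at 6 years of experience
print(a, b, predicted)
```

Because the hypothetical points lie exactly on a line, the fit recovers it exactly; with noisy data the same formulas give the line with the smallest total squared vertical distance.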


LOGISTIC REGRESSION
• Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a binary or
discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True
or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression algorithm
in how it is used.
• There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)


LOGISTIC REGRESSION

• Logistic regression uses the sigmoid function, also called the logistic function, to map
any real-valued input to a probability between 0 and 1. The function can be represented as:
$$f(x) = \frac{1}{1 + e^{-x}}$$
• f(x) = output, a value between 0 and 1.
• x = input to the function.
• e = base of the natural logarithm.
• When plotted against the input values, the function produces an S-shaped curve.
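A short sketch of the sigmoid function and how its output can be turned into a class label. The two-branch form is a common way to avoid overflow for large negative inputs, and the 0.5 threshold is a conventional default, not something the slides specify.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)              # safe: x is negative, so z is in (0, 1)
    return z / (1.0 + z)

def predict_label(x, threshold=0.5):
    """Convert the probability into a binary class label."""
    return 1 if sigmoid(x) >= threshold else 0

for value in (-4, -1, 0, 1, 4):
    print(value, round(sigmoid(value), 3), predict_label(value))
```

The printed values rise smoothly from near 0 to near 1 and cross 0.5 at x = 0, which is the S-shape described above.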


POLYNOMIAL REGRESSION

• Polynomial regression is a type of regression that models a non-linear dataset using a
linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve between the
value of x and the corresponding conditional values of y.
• Suppose a dataset consists of datapoints arranged in a non-linear fashion. In such a case,
linear regression will not fit those datapoints well, and we need polynomial regression.
• In polynomial regression, the original features are transformed into polynomial
features of a given degree and then modeled using a linear model. This means
the datapoints are fitted using a polynomial curve.


POLYNOMIAL REGRESSION
• The equation for polynomial regression is derived from the linear regression equation:
the linear regression equation $Y = b_0 + b_1 x$ is extended to the polynomial regression
equation
$$Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$$
• Here Y is the predicted/target output, $b_0, b_1, \dots, b_n$ are the regression
coefficients, and x is our independent/input variable.
• The model is still linear because it is linear in the coefficients, even though it
contains quadratic and higher-order terms in x.
• Note: This differs from multiple linear regression in that polynomial regression uses
a single variable raised to different degrees, instead of multiple variables each of
degree one.
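The point that the model stays linear in its coefficients can be seen in code: the input is first expanded into polynomial features, and the prediction is then an ordinary weighted sum. This is only a sketch of the prediction step; the coefficient values are hypothetical and no fitting is performed.

```python
def polynomial_features(x, degree):
    """Map a single input x to the feature list [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def predict(coefficients, x):
    """Linear model on the expanded features: b0*1 + b1*x + b2*x^2 + ..."""
    features = polynomial_features(x, len(coefficients) - 1)
    return sum(b * f for b, f in zip(coefficients, features))

coefficients = [1.0, 2.0, 3.0]        # hypothetical b0, b1, b2
y_hat = predict(coefficients, 2.0)    # 1 + 2*2 + 3*4
print(polynomial_features(2.0, 3), y_hat)
```

Since the prediction is a weighted sum of the expanded features, the same least-squares machinery used for linear regression can estimate the coefficients.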


SUPPORT VECTOR REGRESSION

• Support Vector Machine (SVM) is a supervised learning algorithm which can be used for
regression as well as classification problems. So if we use it for regression problems, then it
is termed as Support Vector Regression (SVR).
• Support Vector Regression is a regression algorithm which works for continuous variables.
Below are some keywords which are used in Support Vector Regression:
• Kernel: A function used to map lower-dimensional data into a higher-dimensional space.
• Hyperplane: In a general SVM, it is the separating line between two classes, but in SVR, it is the
line that helps predict the continuous variable and covers most of the datapoints.
• Boundary lines: The two lines on either side of the hyperplane, which create a margin for
the datapoints.
• Support vectors: In a general SVM, the datapoints of each class that are nearest to the hyperplane;
in SVR, the datapoints lying on or outside the boundary lines.

SUPPORT VECTOR REGRESSION


• In SVR, we try to determine a hyperplane and margin such that the
maximum number of datapoints are covered within that margin.
• The main goal of SVR is to have as many datapoints as possible within
the boundary lines, with the hyperplane (best-fit line) passing as close
as possible to them.
• In the accompanying figure, the blue line is the hyperplane and the
other two lines are the boundary lines.
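The role of the boundary lines can be expressed with the epsilon-insensitive loss commonly used in SVR: points inside the margin cost nothing, and points outside are penalized by how far they fall beyond it. The slides do not name this loss, so treating the margin half-width as a parameter `epsilon` is an assumption of this sketch.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the margin, linear in the excess distance outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def inside_margin(y_true, y_pred, epsilon):
    """True when the point lies between the two boundary lines."""
    return abs(y_true - y_pred) <= epsilon

# Hypothetical targets and predictions with a margin half-width of 1.0.
print(epsilon_insensitive_loss(3.0, 2.5, 1.0), inside_margin(3.0, 2.5, 1.0))
print(epsilon_insensitive_loss(5.0, 2.0, 1.0), inside_margin(5.0, 2.0, 1.0))
```

Only the points with non-zero loss, or lying exactly on a boundary line, influence the fitted line, which is why they are the support vectors.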


DECISION TREE REGRESSION

• Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
• It can solve problems for both categorical and numerical data.
• Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test, and
each leaf node represents the final decision or result.
• A decision tree is constructed starting from the root node/parent node (the dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes
are further divided into their own child nodes and themselves become the parent
nodes of those nodes. Consider the image below:


DECISION TREE REGRESSION EXAMPLE


• This image gives an example of decision tree regression: the model is trying to predict the
number of hours a game is played based on various weather conditions.
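How a regression tree chooses a single split can be sketched as follows: try each threshold between consecutive feature values and keep the one that minimizes the total squared error when each side predicts its own mean. Squared error is a common splitting criterion and is an assumption here; the data is hypothetical, and a full tree would apply the same step recursively to each child node.

```python
def sse(values):
    """Sum of squared errors around the mean of values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                               # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("all feature values are identical; no split possible")
    return best[1:]

# Hypothetical data: one numeric feature and a numeric target.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
split = best_split(xs, ys)
print(split)
```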


RANDOM FOREST REGRESSION

• Random forest is one of the most powerful supervised learning algorithms which is
capable of performing regression as well as classification tasks.
• Random forest regression is an ensemble learning method that combines
multiple decision trees and predicts the final output as the average of the
individual tree outputs. The combined decision trees are called base models, and
with n trees the ensemble can be represented more formally as:
$$g(x) = \frac{1}{n}\bigl(f_0(x) + f_1(x) + f_2(x) + \dots + f_{n-1}(x)\bigr)$$
• Random forest uses the bagging (bootstrap aggregation) technique of ensemble
learning, in which the decision trees are built in parallel and do not interact with
each other.
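The two ingredients of bagging described above, bootstrap sampling and averaging, can be sketched in a few lines. The "trees" here are stand-in functions with fixed outputs, used only to show the aggregation step; they are not trained decision trees, and all names are assumptions.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree would train on one such sample."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(trees, x):
    """Average the outputs of the base models for input x."""
    outputs = [tree(x) for tree in trees]
    return sum(outputs) / len(outputs)

# Stand-in base models (hypothetical): each returns a fixed prediction.
trees = [lambda x: 10.0, lambda x: 12.0, lambda x: 14.0]

rng = random.Random(0)
sample = bootstrap_sample([1, 2, 3, 4, 5], rng)
print(sample, forest_predict(trees, x=0))
```

Because each tree sees a different bootstrap sample, their individual errors differ, and averaging tends to cancel part of that variation.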


RANDOM FOREST REGRESSION

• With the help of random forest regression, we can reduce overfitting in the model
by training each tree on a random subset of the dataset.

CLASSIFICATION ALGORITHM IN MACHINE LEARNING
• The Classification algorithm is a Supervised Learning technique that is used to
identify the category of new observations on the basis of training data.
• In classification, a program learns from the given dataset or observations and then
classifies new observations into a number of classes or groups, such as Yes or No, 0
or 1, Spam or Not Spam, cat or dog, etc. Classes are also called targets, labels, or
categories.
• Unlike regression, the output variable of classification is a category, not a value,
such as "Green or Blue", "fruit or animal", etc.
• Since classification is a supervised learning technique, it takes labeled input data,
which means each input comes with the corresponding output.

CLASSIFICATION ALGORITHM

• In a classification algorithm, the input variable (x) is mapped to a discrete output (y):
y = f(x), where y = categorical output
• A well-known example of an ML classification algorithm is an email spam detector.
• The main goal of a classification algorithm is to identify the category of a given
observation; these algorithms are mainly used to predict the output for categorical data.
• In the accompanying diagram there are two classes, class A and class B. Members of each
class have features that are similar to each other and dissimilar to those of the other class.

CLASSIFICATION ALGORITHM

• The algorithm that implements classification on a dataset is known as a classifier.
There are two types of classifiers:
• Binary Classifier: If the classification problem has only two possible outcomes, it is
called a binary classifier.
• Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If a classification problem has more than two outcomes, it is
called a multi-class classifier.
• Example: Classification of types of crops, classification of types of music.

LEARNERS IN CLASSIFICATION PROBLEMS

• In classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the
test dataset. Classification is then done on the basis of the most closely related data stored
in the training dataset. It takes less time in training but more time for predictions.
• Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model from the training dataset
before receiving a test dataset. In contrast to lazy learners, an eager learner takes more time
in learning and less time in prediction.
• Example: Decision Trees, Naïve Bayes, ANN.
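The lazy-learner behaviour is easy to see in a k-nearest-neighbors sketch: there is no training step at all, and every prediction scans the stored training data. The points and labels below are hypothetical, and squared Euclidean distance with majority voting is one common choice among several.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# "Training" a lazy learner is just storing the data (hypothetical points).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 8)))
```

An eager learner such as a decision tree would instead spend its effort up front building a model and then answer each query quickly without revisiting the training data.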


TYPES OF ML CLASSIFICATION ALGORITHMS

• Classification algorithms can be divided into two main categories:
• Linear Models
  • Logistic Regression
  • Support Vector Machines
• Non-linear Models
  • Decision Tree Classification
  • Random Forest Classification
  • Naïve Bayes
  • K-Nearest Neighbours
  • Kernel SVM

USE CASES OF CLASSIFICATION ALGORITHMS

• Classification algorithms can be used in different places. Below are some popular
use cases of Classification Algorithms:
• Email Spam Detection
• Speech Recognition
• Identification of cancer tumor cells
• Drugs classification
• Biometric identification, etc.

REGRESSION VS. CLASSIFICATION


• Regression and classification algorithms are both supervised learning
algorithms. Both are used for prediction in machine learning and work
with labeled datasets. The difference between them lies in the kind of
machine learning problem each is used for.
• The main difference is that regression algorithms are used to predict
continuous values such as price, salary, age, etc., whereas classification
algorithms are used to predict/classify discrete values such as
Male or Female, True or False, Spam or Not Spam, etc.

REGRESSION VS. CLASSIFICATION


| Regression Algorithm | Classification Algorithm |
|---|---|
| In regression, the output variable must be continuous (a real value). | In classification, the output variable must be a discrete value. |
| The task of the regression algorithm is to map the input value (x) to the continuous output variable (y). | The task of the classification algorithm is to map the input value (x) to the discrete output variable (y). |
| Regression algorithms are used with continuous data. | Classification algorithms are used with discrete data. |
| In regression, we try to find the best-fit line, which can predict the output more accurately. | In classification, we try to find the decision boundary, which can divide the dataset into different classes. |
| Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. | Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. |
| Regression algorithms can be further divided into linear and non-linear regression. | Classification algorithms can be divided into binary classifiers and multi-class classifiers. |

CLUSTERING IN MACHINE LEARNING


• Clustering or cluster analysis is a machine learning technique, which groups the unlabelled
dataset. It can be defined as "A way of grouping the data points into different clusters,
consisting of similar data points. The objects with the possible similarities remain in a
group that has less or no similarities with another group."
• It does it by finding some similar patterns in the unlabelled dataset such as shape, size,
color, behavior, etc., and divides them as per the presence and absence of those similar
patterns.
• It is an unsupervised learning method, hence no supervision is provided to the algorithm,
and it deals with the unlabeled dataset.
• After applying this clustering technique, each cluster or group is provided with a cluster-ID.
ML system can use this id to simplify the processing of large and complex datasets.
• Note: Clustering is somewhat similar to classification, but the difference is
the type of dataset used. In classification, we work with a labeled dataset,
whereas in clustering, we work with an unlabelled dataset.

CLUSTERING EXAMPLE
• Example: Let's understand clustering with the real-world example of a shopping mall.
When we visit a mall, we can observe that things with similar usage are grouped together:
t-shirts are in one section and trousers in another; similarly, in the fruit section, apples,
bananas, mangoes, etc. are grouped separately so that we can easily find things. The
clustering technique works in the same way. Another example of clustering is grouping
documents according to topic.
• The clustering technique can be widely used in various tasks. Some most common uses of
this technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.

CLUSTERING EXAMPLE
• Apart from these general uses, clustering is used by Amazon in its recommendation system to
recommend products based on past searches. Netflix also uses this technique to recommend
movies and web series to its users based on their watch history.
• The accompanying diagram illustrates how a clustering algorithm works: different fruits are
divided into several groups with similar properties.

TYPES OF CLUSTERING METHODS

• Clustering methods are broadly divided into hard clustering (each datapoint
belongs to only one group) and soft clustering (a datapoint can belong to more than
one group). Various other approaches to clustering also exist.
• Below are the main clustering methods used in Machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering


PARTITIONING CLUSTERING

• It is a type of clustering that divides the data into non-hierarchical groups. It is
also known as the centroid-based method. The most common example of partitioning
clustering is the K-Means clustering algorithm.
• In this type, the dataset is divided into a set of K groups, where K is the
pre-defined number of groups.
• The cluster centers are chosen so that each data point is closer to the centroid of
its own cluster than to any other cluster centroid.
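The centroid-based idea can be sketched as the usual two-step K-means loop: assign each point to its nearest center, then move each center to the mean of its assigned points. This sketch assumes one-dimensional data and caller-supplied initial centers to keep it short and deterministic; the data values are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate assignment and update steps; return (centers, clusters)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break                        # converged: assignments are stable
        centers = new_centers
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centers, final_clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(final_centers, final_clusters)
```

Different initial centers can lead to different final clusters, which is why practical implementations usually try several initializations.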


DENSITY-BASED CLUSTERING

• The density-based clustering method connects highly dense areas into clusters;
clusters of arbitrary shape are formed as long as the dense regions can be connected.
• The algorithm identifies the different clusters in the dataset by connecting
areas of high density.
• The dense areas in data space are separated from each other by sparser areas.
• These algorithms can have difficulty clustering the data points if the dataset
has varying densities and high dimensions.

DISTRIBUTION MODEL-BASED CLUSTERING

• In the distribution model-based clustering method, the data is divided based on
the probability that a data point belongs to a particular distribution. The grouping
is done by assuming some distribution, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization clustering algorithm,
which uses Gaussian Mixture Models (GMM).
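The "probability that a data point belongs to a distribution" is what the expectation step of EM computes. Below is a minimal sketch of that step for one point and a one-dimensional mixture; the mixture weights, means and standard deviations are hypothetical, and the maximization step that re-estimates them is omitted.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, weights, means, stds):
    """E-step for one point: posterior probability of each mixture component."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component mixture centered at -1 and +1.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
resp = responsibilities(-1.0, weights, means, stds)
print(resp)
```

Unlike K-means, which assigns each point to exactly one cluster, these probabilities give a soft assignment that sums to 1 across the components.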


HIERARCHICAL CLUSTERING

• Hierarchical clustering can be used as an alternative to partitioning clustering,
as there is no requirement to pre-specify the number of clusters to be created.
• In this technique, the dataset is divided into clusters that form a tree-like
structure, which is also called a dendrogram.
• Any desired number of clusters can be obtained by cutting the tree at the
appropriate level.
• The most common example of this method is
the Agglomerative Hierarchical algorithm.
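A bottom-up (agglomerative) merge can be sketched as follows: start with every point as its own cluster and repeatedly merge the two closest clusters. Stopping at a requested number of clusters corresponds to cutting the dendrogram at that level. The sketch assumes one-dimensional data and single linkage (distance between the closest members), which are simplifications chosen for brevity.

```python
def agglomerative_1d(points, n_clusters):
    """Single-linkage agglomerative clustering of 1-D points."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > n_clusters:
        # With sorted 1-D data, the closest pair of clusters is always adjacent,
        # so it is enough to compare the gaps between neighbouring clusters.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]   # merge the pair
    return clusters

points = [1, 2, 9, 10, 25]          # hypothetical observations
print(agglomerative_1d(points, 3))
print(agglomerative_1d(points, 2))
```

Recording the order and distance of each merge is what produces the dendrogram.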


FUZZY CLUSTERING

• Fuzzy clustering is a soft clustering method in which a data object may belong to more
than one group or cluster.
• Each data point has a set of membership coefficients, which give its degree of
membership in each cluster.
• The fuzzy C-means algorithm is an example of this type of clustering; it is sometimes
also known as the fuzzy k-means algorithm.
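The membership coefficients can be illustrated with the standard fuzzy C-means membership formula, in which a point's membership in a cluster depends on its distance to that center relative to its distances to all centers. The sketch assumes one-dimensional data, fixed hypothetical centers, and the commonly used fuzzifier m = 2; the center update step is omitted.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of point in each cluster; the values sum to 1."""
    dists = [abs(point - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

centers = [0.0, 4.0]                       # hypothetical cluster centers
print(fuzzy_memberships(2.0, centers))     # equally far from both centers
print(fuzzy_memberships(1.0, centers))     # closer to the first center
```

A point halfway between two centers belongs equally to both, while a point near one center gets most, but not all, of its membership there.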


CLUSTERING ALGORITHMS
• Clustering algorithms can be grouped according to the models explained above.
Many clustering algorithms have been published, but only a few are commonly
used. The choice of algorithm depends on the kind of data we are using: for example, some
algorithms require the number of clusters to be specified in advance, whereas others
work from the minimum distance between observations in the dataset.
• Here we discuss the popular clustering algorithms that are widely used in
machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It partitions the dataset by dividing the samples into clusters of equal
variance. The number of clusters must be specified in advance. It is fast, with few
computations required; each iteration has linear complexity, O(n), in the number of samples.
2. Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in a smooth
density of data points. It is an example of a centroid-based model that works by updating
candidate centroids to be the center of the points within a given region.


CLUSTERING ALGORITHMS
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to the mean-shift, but with
some remarkable advantages. In this algorithm, the areas of high density are separated by
the areas of low density. Because of this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or for cases where K-means fails. In
GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm
performs the bottom-up hierarchical clustering. In this, each data point is treated as a
single cluster at the outset and then successively merged. The cluster hierarchy can be
represented as a tree-structure.
6. Affinity Propagation: It differs from other clustering algorithms in that it does not require
the number of clusters to be specified. In this algorithm, messages are exchanged between
pairs of data points until convergence. It has O(N²T) time complexity, where N is the number
of samples and T is the number of iterations, which is the main drawback of this algorithm.

APPLICATIONS OF CLUSTERING
• Identification of Cancer Cells: Clustering algorithms are widely used for the
identification of cancerous cells. They divide cancerous and non-cancerous data into
different groups.
• Search Engines: Search engines also use clustering. Search results
are returned based on the objects closest to the search query. This is done by grouping similar data
objects into one group that is far from other, dissimilar objects. The accuracy of the results for a
query depends on the quality of the clustering algorithm used.
• Customer Segmentation: Clustering is used in market research to segment customers based on
their choices and preferences.
• Biology: Clustering is used in biology to classify different species of plants and animals
using image recognition techniques.
• Land Use: Clustering is used to identify areas of similar land use in a
GIS database. This can be very useful for finding the purpose for which a particular piece of land
is most suitable.

SIMPLE MACHINE LEARNING EXAMPLE IN PYTHON


# Load the necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the iris dataset (assumes iris.csv is in the working directory)
df = pd.read_csv('iris.csv')

# Split the data into features and labels
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Create an SVM model and train it
model = SVC()
model.fit(X_train, y_train)

# Evaluate the model on the test data (score returns the mean accuracy)
accuracy = model.score(X_test, y_test)
print('Test accuracy:', accuracy)


CHALLENGES IN MACHINE LEARNING

• While Machine Learning is rapidly evolving, this segment of AI as a whole still has a
long way to go, because ML has not yet overcome a number
of challenges. The challenges that ML currently faces are:
• Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Use of low-quality data leads to problems in data
preprocessing and feature extraction.
• Time-consuming tasks − Another challenge faced by ML models is the time consumed,
especially by data acquisition, feature extraction and retrieval.
• Lack of specialists − As ML technology is still in its infancy,
experts are hard to find.
• No clear objective for formulating business problems − The lack of a clear objective
and well-defined goal for business problems is another key challenge for ML,
because the technology is not yet mature.
• Issue of overfitting & underfitting − If the model is overfitting or underfitting, it
cannot represent the problem well.
• Curse of dimensionality − Another challenge ML models face is data points with too many
features. This can be a real hindrance.
• Difficulty in deployment − The complexity of an ML model makes it quite difficult to
deploy in real life.


WHAT IS AN “ALGORITHM” IN ML

• An “algorithm” in machine learning is a procedure that is run on data to create a
machine learning “model.”
• Machine learning algorithms perform “pattern recognition.” Algorithms “learn”
from data or are “fit” on a dataset.
• There are many machine learning algorithms. For example, we have algorithms for
classification, such as k-nearest neighbors. We have algorithms for regression, such
as linear regression, and we have algorithms for clustering, such as k-means.
• Examples of machine learning algorithms:
• Linear Regression, Logistic Regression, Decision Tree, Artificial Neural Network,
k-Nearest Neighbors, k-Means, etc.
• As such, machine learning algorithms have a number of properties:
• Machine learning algorithms can be described using math and pseudocode.
• The efficiency of machine learning algorithms can be analyzed and described.
• Machine learning algorithms can be implemented with any one of a range of
modern programming languages.

WHAT IS A “MODEL” IN ML

• A “model” in machine learning is the output of a machine learning algorithm run
on data.
• A model represents what was learned by a machine learning algorithm.
• The model is the “thing” that is saved after running a machine learning algorithm
on training data and represents the rules, numbers, and any other algorithm-
specific data structures required to make predictions.
• Some examples might make this clearer:
• The linear regression algorithm results in a model comprised of a vector of coefficients
with specific values.
• The decision tree algorithm results in a model comprised of a tree of if-then statements
with specific values.
• The neural network / backpropagation / gradient descent algorithms together result in a
model comprised of a graph structure with vectors or matrices of weights with specific
values.


ALGORITHM VS. MODEL


• So now we are familiar with a machine learning “algorithm” vs. a machine learning “model.”
• Specifically, an algorithm is run on data to create a model.
• Machine Learning Algorithm => Machine Learning Model
• We also understand that a model comprises both data and a procedure for how to use the
data to make a prediction on new data. You can think of the procedure as a prediction algorithm
if you like.
• Machine Learning Model == Model Data + Prediction Algorithm
• For most algorithms, nearly all of the work happens in the learning “algorithm,” and the “prediction
algorithm” does very little.
• For example, the linear regression algorithm performs an optimization process to find a set of weights that
minimize the sum of squared errors on the training dataset.
• Linear Regression:
• Algorithm: Find the set of coefficients that minimizes error on the training dataset
• Model:
• Model Data: Vector of coefficients
• Prediction Algorithm: Multiply and sum coefficients with input row
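
The split between "algorithm" and "model" can be illustrated with a minimal sketch. It assumes simple linear regression with one input feature and the closed-form least-squares solution; the function names and the toy data are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Learning algorithm: find the coefficients that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Model data: a vector of coefficients [intercept, slope]
    return [intercept, slope]


def predict(coefficients, x):
    """Prediction algorithm: multiply and sum coefficients with the input row."""
    row = [1.0, x]  # leading 1.0 pairs with the intercept
    return sum(c * v for c, v in zip(coefficients, row))


# Toy data generated from y = 2x + 1 (assumed, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

coefficients = fit_simple_linear_regression(xs, ys)
print(coefficients)                # [1.0, 2.0]
print(predict(coefficients, 5.0))  # 11.0
```

Here all of the work is in the fitting function; the saved model is just two numbers, and prediction is a single multiply-and-sum.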


MACHINE LEARNING MODEL ASSESSMENT/EVALUATION

• Model evaluation is the process of using metrics to analyze the performance
of a machine learning model.
• Model development is a multi-step process, and a check should be kept on
how well the model generalizes to future predictions.
• Evaluating a model therefore plays a vital role in judging its performance.
• Evaluation also helps to identify a model’s key weaknesses.
• There are many metrics, such as Accuracy, Precision, Recall, F1 score, Area Under the Curve, Confusion
Matrix, and Mean Squared Error.
• Cross Validation is a technique that is used during the training phase, and it is a model
evaluation technique as well.


BIAS-VARIANCE TRADE OFF

• It is important to understand prediction errors (bias and variance) when it comes to
accuracy in any machine learning algorithm.
• There is a tradeoff between a model’s ability to minimize bias and its ability to minimize variance.
Understanding this tradeoff helps in selecting a good value of the regularization constant.
• Proper understanding of these errors helps to avoid overfitting and
underfitting a data set while training the algorithm.


CROSS VALIDATION AND HOLDOUT


• Cross Validation is a method in which we do not use the whole dataset for training. In this
technique, some part of the dataset is reserved for testing the model.
• There are many types of Cross Validation, of which K-Fold Cross Validation is the most widely
used.
• In K-Fold Cross Validation the original dataset is divided into k subsets, known as folds.
Training and testing are repeated k times; each time, one fold is used for testing and the remaining k-1
folds are used for training the model. So each data point is used for testing exactly once and for
training k-1 times. Averaging the k test scores gives a more reliable estimate of how well the model
generalizes than a single split.
• Holdout is the simplest approach. It is used in neural networks as well as in many
classifiers. In this technique, the dataset is divided into train and test datasets. The dataset is
usually divided in ratios like 70:30 or 80:20. Normally a large percentage of the data is used
for training the model and a small portion of the dataset is used for testing the model.
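
The fold bookkeeping in K-Fold Cross Validation can be sketched as follows. This is a minimal illustration that assumes the data are already shuffled and that folds are taken as contiguous blocks of indices; the helper name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each index appears in exactly one test fold, so every data point is tested once. In practice a model would be trained and scored inside the loop and the k scores averaged.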


PERFORMANCE METRICS FOR CLASSIFICATION

• In a classification problem, the category or class of the data is identified based on training
data. The model learns from the given dataset and then classifies new data into classes
or groups based on the training. It predicts class labels as the output, such as Yes or No, 0 or
1, Spam or Not Spam, etc.
• To evaluate the performance of a classification model, different metrics are used, and some
of them are as follows:
• Accuracy
• Confusion Matrix
• Precision
• Recall
• F-Score
• AUC (Area Under the Curve)-ROC


ACCURACY
• Accuracy: The accuracy metric is one of the simplest classification metrics to implement. It
is defined as the ratio of the number of correct predictions to the total number of predictions.
$\text{Accuracy} = \frac{\text{No. of Correct Predictions}}{\text{Total No. of Predictions}}$

• It is good to use the Accuracy metric when the target variable classes in the data are approximately
balanced. For example, suppose 60% of the images in a fruit image dataset are of Apple and 40% are of Mango. In this
case, if the model predicts whether an image is of Apple or Mango with 97% accuracy, that figure is a
meaningful indicator of good performance.
• It is recommended not to use the Accuracy measure when the target variable mostly belongs to one
class. For example, suppose there is a model for disease prediction in which, out of 100 people, only
five people have the disease and 95 people don't. In this case, if our model predicts that every
person has no disease (which is a bad prediction), the Accuracy measure will still be 95%, which is
misleading.
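
The disease example above can be checked with a short sketch. The label encoding (1 = has the disease, 0 = no disease) and the function name are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 100 people: 5 have the disease (1), 95 do not (0)
y_true = [1] * 5 + [0] * 95
# A useless model that predicts "no disease" for everyone
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
```

The model scores 95% accuracy while missing every actual case, which is why accuracy alone is not recommended for imbalanced classes.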

CONFUSION MATRIX
• A confusion matrix is a tabular representation of the
prediction outcomes of a binary classifier. It is
used to describe the performance of the classification
model on a set of test data for which the true values are known.
• In the matrix, columns hold the predicted values, and
rows hold the actual values. Here both actual and
predicted values take two possible classes, Yes or No. So, if we
are predicting the presence of a disease in a patient, a
prediction of Yes means the patient has the
disease, and No means the patient doesn't have the
disease.
• In this example, the total number of predictions is 165,
out of which 110 were predicted as Yes and 55 were
predicted as No.
• However, in reality, in 60 cases the patients don't
have the disease, whereas in 105 cases the patients
have the disease.
• In general, the cells of the table fall into four categories:
• True Positive (TP): The model predicts the positive class, and the actual class is positive.
• True Negative (TN): The model predicts the negative class, and the actual class is negative.
• False Positive (FP): The model predicts the positive class, but the actual class is negative.
• False Negative (FN): The model predicts the negative class, but the actual class is positive.
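
The four counts can be tallied directly from paired actual and predicted labels, as in the sketch below. The individual cell values used here (TP = 100, FN = 5, FP = 10, TN = 50) are an assumption: they are one split that is consistent with the totals quoted in the example (165 predictions, 110 predicted Yes, 55 predicted No, 105 actual Yes, 60 actual No), not values stated in the slides.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted != positive and actual != positive:
            tn += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


# Assumed cell values, chosen to match the totals in the example
y_true = ["Yes"] * 100 + ["Yes"] * 5 + ["No"] * 10 + ["No"] * 50
y_pred = ["Yes"] * 100 + ["No"] * 5 + ["Yes"] * 10 + ["No"] * 50

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 100 50 10 5
```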


PRECISION

• Precision: The precision metric is used to overcome a limitation of Accuracy. Precision
is the proportion of positive predictions that were actually correct. It is
calculated as the ratio of True Positives to the total number of positive
predictions (True Positives plus False Positives):
$\text{Precision} = \frac{TP}{TP + FP}$


RECALL OR SENSITIVITY

• Recall: It is similar to the Precision metric; however, it calculates the proportion
of actual positives that were identified correctly.
• It is calculated as the ratio of True Positives to the total number
of actual positives, whether correctly predicted as positive or incorrectly predicted as negative (True
Positives plus False Negatives).
• The formula for calculating Recall is given below:
$\text{Recall} = \frac{TP}{TP + FN}$


WHEN TO USE PRECISION AND RECALL?

• From the above definitions of Precision and Recall, we can say that recall measures the
performance of a classifier with respect to false negatives, whereas precision gives
information about the performance of a classifier with respect to false positives.
• So, if we want to minimize false negatives, recall should be as close to 100% as possible, and if
we want to minimize false positives, precision should be as close to 100% as possible.
• In simple words, maximizing precision minimizes the FP errors, and
maximizing recall minimizes the FN errors.
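
A short sketch shows how the two metrics respond to different error types. The counts below are hypothetical and chosen only for illustration.

```python
def precision(tp, fp):
    """Share of positive predictions that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


# Hypothetical counts: 100 true positives, 10 false positives, 5 false negatives
tp, fp, fn = 100, 10, 5

print(round(precision(tp, fp), 3))  # 0.909
print(round(recall(tp, fn), 3))     # 0.952

# Fewer false positives raise precision; fewer false negatives raise recall
print(precision(tp, 0), recall(tp, 0))  # 1.0 1.0
```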


F-SCORES

• F-score or F1 Score is a metric for evaluating a binary classification model on the basis of
the predictions that are made for the positive class.
• It is calculated with the help of Precision and Recall. It is a single score that
represents both Precision and Recall. The F1 Score is calculated as the harmonic
mean of Precision and Recall, assigning equal weight to each of them.
• The formula for calculating the F1 score is given below:
$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
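
The effect of using the harmonic mean can be seen in a small sketch. The input values are hypothetical, and returning 0.0 when both inputs are zero is a convention assumed here to avoid division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


p, r = 0.5, 1.0
f1 = f1_score(p, r)
arithmetic_mean = (p + r) / 2

print(round(f1, 3))      # 0.667
print(arithmetic_mean)   # 0.75
```

The harmonic mean is pulled toward the smaller of the two values, so a model cannot earn a high F1 score by excelling at only one of Precision or Recall.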


AUC-ROC
• Sometimes we need to visualize the performance of a classification model on a chart; for this, we
can use the AUC-ROC curve. It is one of the most popular and important metrics for evaluating the
performance of a classification model.
• The ROC (Receiver Operating Characteristic) curve is a graph that shows the
performance of a classification model at different threshold levels. The curve plots
two parameters against each other:
• True Positive Rate
• False Positive Rate
• TPR, or True Positive Rate, is a synonym for Recall, and hence can be calculated as: $TPR = \frac{TP}{TP + FN}$
• FPR, or False Positive Rate, can be calculated as: $FPR = \frac{FP}{FP + TN}$
• To calculate the value at any point on a ROC curve, we could evaluate a logistic regression model
many times with different classification thresholds, but this would not be very efficient. Instead,
an efficient method is used, which is known as AUC.


AUC: AREA UNDER THE ROC CURVE

• AUC stands for Area Under the ROC Curve. As its
name suggests, AUC measures the two-dimensional
area under the entire ROC curve.
• AUC measures performance across all
thresholds and provides an aggregate measure.
• The value of AUC ranges from 0 to 1. A
model whose predictions are 100% wrong has an
AUC of 0.0, a model whose predictions are 100% correct
has an AUC of 1.0, and random guessing gives an AUC of about 0.5.
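
The ROC curve and its area can be sketched by sweeping the threshold over the predicted scores and integrating with the trapezoidal rule. This is a minimal illustration, not an efficient implementation; it assumes labels encoded as 1 (positive) and 0 (negative), at least one example of each class, and hypothetical scores.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```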


PERFORMANCE METRICS FOR REGRESSION

• Regression is a supervised learning technique that aims to find the relationships between the
dependent and independent variables. A predictive regression model predicts a continuous numeric
value.
• The metrics used for regression are different from the classification metrics. It means we cannot
use the Accuracy metric (explained above) to evaluate a regression model; instead, the
performance of a regression model is reported as errors in the prediction.
• The following popular metrics are used to evaluate the performance of regression
models:
• Mean Absolute Error
• Mean Squared Error
• R² Score
• Adjusted R²


MEAN ABSOLUTE ERROR (MAE)

• Mean Absolute Error or MAE measures the absolute difference between actual and predicted values,
where absolute means taking the magnitude of a number, ignoring its sign.
• To understand MAE, let's take an example of Linear Regression, where the model draws a best-fit line
between the dependent and independent variables. To measure the error in a single prediction, we
calculate the absolute difference between the actual value and the predicted value. To find the
error for the complete dataset, we take the mean of these absolute differences.
• The formula to calculate MAE: $MAE = \frac{1}{N} \sum |Y - Y'|$

• Here, Y is the actual outcome, Y' is the predicted outcome, and N is the total number of data points.
• MAE is comparatively robust to outliers. One limitation of MAE is that it is not differentiable at zero,
which complicates the use of gradient-based optimizers such as Gradient Descent. To overcome this
limitation, another metric can be used: Mean Squared Error or MSE.

MEAN SQUARED ERROR (MSE)

• Mean Squared Error or MSE is one of the most suitable metrics for regression evaluation. It
measures the average of the squared differences between the actual values and the values predicted
by the model.
• Since errors are squared in MSE, it only takes non-negative values, and it is usually
positive and non-zero.
• Moreover, because the differences are squared, large errors are penalized much more heavily than
small ones, so a few outliers can lead to an over-estimation of how bad the model is.
• MSE is often preferred over other regression metrics because it is differentiable and
hence easier to optimize.
• The formula to calculate MSE: $MSE = \frac{1}{N} \sum (Y - Y')^2$

• Here, Y is the actual outcome, Y' is the predicted outcome, and N is the total number of data
points.
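
Both error metrics follow directly from their formulas, as in this sketch. The actual and predicted values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|Y - Y'|)"""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE = (1/N) * sum((Y - Y')^2)"""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values; errors are 1, 0, -2, 0
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print(mean_absolute_error(y_true, y_pred))  # 0.75
print(mean_squared_error(y_true, y_pred))   # 1.25
```

The single error of size 2 contributes 2 to the MAE sum but 4 to the MSE sum, which shows how squaring gives more weight to larger errors.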

R SQUARED SCORE

• The R squared score is also known as the Coefficient of Determination, and is another popular metric
used for regression model evaluation. The R-squared metric enables us to compare our model
with a constant baseline to determine the performance of the model. To select the constant
baseline, we take the mean of the data and draw a line at the mean.
• The R squared score will always be less than or equal to 1, regardless of whether the values are
large or small.
$R^2 = 1 - \frac{MSE(\text{Model})}{MSE(\text{Baseline})}$


ADJUSTED R SQUARED

• Adjusted R squared, as the name suggests, is an improved version of R squared. R squared
has the limitation that the score improves as more terms are added, even though the model is not
improving, which may mislead data scientists.
• To overcome this issue, adjusted R squared is used, which is always less than or equal to
R². It penalizes additional predictors and only shows
improvement if there is a real improvement.
• We can calculate the adjusted R squared as follows:
$R_a^2 = 1 - \left[\frac{n - 1}{n - k - 1} \times (1 - R^2)\right]$
• Here, $n$ is the number of observations
• $k$ denotes the number of independent variables
• $R_a^2$ denotes the adjusted $R^2$
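
Both scores can be computed in a few lines, as in the sketch below. It assumes the baseline is the mean of the actual values, as described for the R squared score, and the data and number of predictors are hypothetical.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - MSE(model) / MSE(baseline), with the mean as the baseline."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    mse_baseline = sum((y - mean_y) ** 2 for y in y_true) / n
    return 1 - mse_model / mse_baseline


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical actual and predicted values from a model with k = 1 predictor
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 4.9]

r2 = r2_score(y_true, y_pred)
r2_adj = adjusted_r2(r2, n=len(y_true), k=1)
print(round(r2, 4))      # 0.992
print(round(r2_adj, 4))  # 0.9893
```

With the same R², increasing k lowers the adjusted score, which is how the metric discourages adding predictors that do not help.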


BIAS
• Bias is the difference between the values predicted by the ML model and the
correct values.
• High bias gives a large error on both
training and testing data.
• It is recommended that an algorithm should be
low-biased to avoid the problem of
underfitting. With high bias, the predictions follow an overly
simple form, such as a straight line, and thus do not fit
the data in the data set accurately. Such fitting is known
as underfitting.


VARIANCE
• The variability of the model's prediction for a given data
point, which tells us the spread of the predictions, is called the
variance of the model.
• A model with high variance fits the training data in a very complex
way and thus is not able to predict
accurately on data it hasn’t seen before.
• As a result, such models perform very well on
training data but have high error rates on test data.
• A model with high variance is said to be
overfitting the data. Overfitting means fitting the
training set accurately via a complex curve and a high-order
hypothesis; it is not a solution, as the error
on unseen data is high.
• While training a model, variance should be kept
low.

BIAS VARIANCE TRADEOFF


• If the algorithm is too simple (a hypothesis with a linear
equation), it may have high bias and low variance
and thus be error-prone.
• If the algorithm is too complex (a hypothesis with a high-degree
equation), it may have high variance and low
bias.
• In the latter case, the model will not perform well on new entries.
• There is a middle ground between these two conditions,
known as the Trade-off or Bias-Variance Trade-off.
• This tradeoff in complexity is why there is a tradeoff
between bias and variance: an algorithm cannot be more
complex and less complex at the same time.
• The best tradeoff is the point chosen for
training the algorithm that gives low error on
training as well as testing data.

HYPERPARAMETER TUNING
• A Machine Learning model is defined as a mathematical model with a number of parameters that need
to be learned from the data.
• By training a model with existing data, we are able to fit the model parameters.
• However, there is another kind of parameter, known as Hyperparameters, that cannot be directly
learned from the regular training process. They are usually fixed before the actual training process
begins. These parameters express important properties of the model such as its complexity or how fast
it should learn.
• Some examples of model hyperparameters include:
1. The learning rate for training a neural network.
2. The k in k-nearest neighbors.
3. The penalty in Logistic Regression Classifier i.e. L1 or L2 regularization
4. The C and sigma hyperparameters for support vector machines.
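
Because hyperparameters are fixed before training, a common way to tune them is to try several candidate values and keep the one that scores best on held-out validation data. The sketch below does this for the k in k-nearest neighbors. It is a minimal illustration under stated assumptions: one-dimensional inputs, a hand-written k-NN classifier, a single train/validation split, and hypothetical toy data with one deliberately noisy label.

```python
def knn_predict(train_x, train_y, x, k):
    """Classify x by majority vote among its k nearest training points (1-D)."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)


def validation_accuracy(train_x, train_y, val_x, val_y, k):
    """Accuracy of k-NN with the given k on the validation set."""
    correct = sum(1 for x, y in zip(val_x, val_y)
                  if knn_predict(train_x, train_y, x, k) == y)
    return correct / len(val_x)


# Hypothetical toy data; the point at 2.5 carries a noisy label
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0, 2.5]
train_y = [0, 0, 0, 1, 1, 1, 1]
val_x = [1.5, 2.4, 6.5, 7.5]
val_y = [0, 0, 1, 1]

# Grid of candidate values for the hyperparameter k (odd values avoid ties)
candidates = [1, 3, 5]
scores = {k: validation_accuracy(train_x, train_y, val_x, val_y, k)
          for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])

print(scores)  # {1: 0.75, 3: 1.0, 5: 1.0}
print(best_k)  # 3
```

With k = 1 the noisy training point misleads the prediction at 2.4, while larger k averages it out. In practice the candidates would usually be scored with cross validation rather than a single split.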

