Classification
• As the name suggests, classification is the task of “classifying things” into categories. Classification is part of supervised machine learning, in which the model is trained on labeled data.
• Classification is a process of categorizing data or objects into
predefined classes or categories based on their features or attributes.
• The main objective of classification machine learning is to build a
model that can accurately assign a label or category to a new
observation based on its features.
Types of classification
1. Binary Classification:
• Definition: Involves two classes or categories. The model predicts one of two
possible outcomes.
• Examples:
• Email Classification: Spam or Not Spam
2. Multiclass Classification:
• Definition: Involves more than two classes. Each instance is classified into one of three or more categories.
• Examples:
• Handwritten Digit Recognition: Classifying digits from 0 to 9
3. Multilabel Classification:
• Definition: Each instance can belong to multiple classes simultaneously.
• Examples:
• Text Categorization: A news article can belong to multiple categories such as "Sports" and
"Politics."
MNIST
• The MNIST dataset is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents.
The sklearn.datasets package contains mostly three types of
functions:
• fetch_* functions such as fetch_openml() to download real-life datasets,
• load_* functions to load small toy datasets bundled with Scikit-Learn, and
• make_* functions to generate fake datasets, useful for tests.
Generated datasets are usually returned as an (X, y) tuple containing the input data and the targets, both as NumPy arrays.
There are 70,000 images, and each image has 784 features. This is because
each image is 28×28 pixels, and each feature simply represents one pixel’s
intensity, from 0 (white) to 255 (black)
• We should always create a test set and set it aside before inspecting
the data closely. The MNIST dataset is actually already split into a
training set (the first 60,000 images) and a test set (the last 10,000
images):
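A minimal sketch of loading MNIST with fetch_openml() and applying this split (the variable names here are our own):

from sklearn.datasets import fetch_openml

# Download MNIST; as_frame=False returns NumPy arrays rather than a DataFrame
mnist = fetch_openml('mnist_784', as_frame=False)
X, y = mnist.data, mnist.target

# The dataset is already shuffled and split for us
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]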
Training a Binary Classifier
• Let’s simplify the problem for now and only try to identify one digit—
for example, the number 5.
• This “5-detector” will be an example of a binary classifier, capable of
distinguishing between just two classes, 5 and not-5.
• Let’s create the target vectors for this classification task:
• now let’s pick a classifier and train it
Performance Measures
Measuring Accuracy Using Cross-Validation
• Let’s use the cross_val_score() function to evaluate your
SGDClassifier model using K-fold cross-validation, with three
folds
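For example (continuing the sketch above):

from sklearn.model_selection import cross_val_score

# 3-fold cross-validation, scored by accuracy; all folds come out
# impressively high (roughly 95%+), but read on before celebrating
cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")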
let’s look at a very dumb classifier that just classifies every single image in the “not-5” class:
This demonstrates why accuracy is generally not the preferred performance measure for
classifiers, especially when you are dealing with skewed datasets (i.e., when some classes are
much more frequent than others)
Confusion Matrices
• A much better way to evaluate the performance of a classifier is to
look at the confusion matrix.
• The general idea is to count the number of times instances of class A
are classified as class B.
• For example, to know the number of times the classifier confused
images of 5s with 3s, you would look in the 5th row and 3rd column
of the confusion matrix.
• To compute the confusion matrix, you first need to have a set of
predictions, so they can be compared to the actual targets.
• Just like the cross_val_score() function, cross_val_predict() performs
K-fold cross-validation, but instead of returning the evaluation scores,
it returns the predictions made on each test fold
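A sketch, continuing with the SGDClassifier from above:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Out-of-fold ("clean") predictions for every training instance
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
cm = confusion_matrix(y_train_5, y_train_pred)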
• Each row in a confusion matrix represents an actual class, while each
column represents a predicted class.
• The first row of this matrix considers non-5 images (the negative class):
53,892 of them were correctly classified as non-5s (they are called true
negatives),
• while the remaining 687 were wrongly classified as 5s (false positives, also
called type I errors).
• The second row considers the images of 5s (the positive class): 1,891 were
wrongly classified as non-5s (false negatives, also called type II errors),
• while the remaining 3,530 were correctly classified as 5s (true positives).
• The confusion matrix gives you a lot of information,
but sometimes you may prefer a more concise metric.
• An interesting one to look at is the accuracy of the positive predictions; this is called the precision of the classifier:
precision = TP / (TP + FP)
where TP is the number of true positives, and FP is the number of false positives.
• Precision is typically used along with another metric named recall, also called sensitivity or the true positive rate (TPR).
• Recall is the ratio of positive instances that are correctly detected by the classifier:
recall = TP / (TP + FN)
where FN is, of course, the number of false negatives.
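Scikit-Learn provides both metrics directly; a sketch using the cross-validated predictions from above:

from sklearn.metrics import precision_score, recall_score

precision_score(y_train_5, y_train_pred)  # TP / (TP + FP)
recall_score(y_train_5, y_train_pred)     # TP / (TP + FN)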
F1 Score
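The F1 score combines precision and recall into a single metric: it is their harmonic mean, F1 = 2 × precision × recall / (precision + recall), so it is high only when both precision and recall are high. A sketch:

from sklearn.metrics import f1_score

f1_score(y_train_5, y_train_pred)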
The Precision/Recall Trade-off
Instead of calling the classifier’s predict() method, you can
call its decision_function() method, which returns a score
for each instance, and then use any threshold you want to
make predictions based on those scores
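A sketch on a single instance (some_digit is our own example variable):

some_digit = X_train[0]
y_scores = sgd_clf.decision_function([some_digit])

threshold = 0  # the default threshold; this reproduces predict()
y_some_digit_pred = (y_scores > threshold)

threshold = 3000  # an arbitrary higher threshold: precision up, recall down
y_some_digit_pred = (y_scores > threshold)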
How do you decide which threshold to use?
• Suppose you decide to aim for 90% precision. You could use the first
plot to find the threshold you need to use, but that’s not very precise
• Alternatively, you can search for the lowest threshold that gives you at
least 90% precision. For this, you can use the NumPy array’s argmax()
method
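A sketch using precision_recall_curve(); argmax() returns the first index of the maximum value, which here is the first True:

from sklearn.metrics import precision_recall_curve

# Decision scores for every training instance, via cross-validation
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# Lowest threshold that reaches at least 90% precision
idx_90 = (precisions >= 0.90).argmax()
threshold_90 = thresholds[idx_90]
y_train_pred_90 = (y_scores >= threshold_90)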
The ROC Curve
• The receiver operating characteristic (ROC) curve is another common tool
used with binary classifiers.
• It is very similar to the precision/recall curve, but instead of plotting
precision versus recall, the ROC curve plots the true positive rate (another
name for recall) against the false positive rate (FPR).
• The FPR (also called the fall-out) is the ratio of negative instances that are
incorrectly classified as positive.
• It is equal to 1 – the true negative rate (TNR), which is the ratio of negative
instances that are correctly classified as negative.
• The TNR is also called specificity.
• Hence, the ROC curve plots sensitivity (recall) versus 1 – specificity
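A sketch with roc_curve(), reusing the cross-validated scores from above; roc_auc_score() summarizes the curve as the area under it (1.0 is a perfect classifier, 0.5 is random):

from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
roc_auc_score(y_train_5, y_scores)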
The RandomForestClassifier class does not have a decision_function() method, due to the way
it works. We can call the cross_val_predict() function to train the RandomForestClassifier using
cross-validation and make it predict class probabilities for every image as follows:
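A sketch; predict_proba() returns one probability per class, and the positive-class column serves as the score:

from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(random_state=42)
y_probas_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3,
                                    method="predict_proba")
y_scores_forest = y_probas_forest[:, 1]  # probability of the positive class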
Multiclass Classification
• Multiclass classifiers (also called multinomial classifiers) can distinguish between more than two classes.
• There are various strategies that you can use to perform multiclass classification with multiple binary classifiers.
• One way to create a system that can classify the digit images into 10
classes (from 0 to 9) is to train 10 binary classifiers, one for each digit (a
0-detector, a 1-detector, a 2-detector, and so on).
• Then when you want to classify an image, you get the decision score from each classifier for that image and you select the class whose classifier outputs the highest score. This is called the one-versus-the-rest (OvR) strategy, or sometimes one-versus-all (OvA).
• Another strategy is to train a binary classifier for every pair of digits:
one to distinguish 0s and 1s, another to distinguish 0s and 2s, another
for 1s and 2s, and so on. This is called the one-versus-one (OvO)
strategy.
• If there are N classes, you need to train N × (N – 1) / 2 classifiers. For
the MNIST problem, this means training 45 binary classifiers!
• When you want to classify an image, you have to run the image
through all 45 classifiers and see which class wins the most duels.
• The main advantage of OvO is that each classifier only needs to be
trained on the part of the training set containing the two classes that
it must distinguish
• Scikit-Learn detects when you try to use a binary classification
algorithm for a multiclass classification task, and it automatically runs
OvR or OvO, depending on the algorithm
• Let’s try this with a support vector machine classifier using the
sklearn.svm.SVC class
• Let’s only train on the first 2,000 images, or else it will take a very long
time:
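A sketch (note that we train on the original multiclass targets y_train, not the binary y_train_5):

from sklearn.svm import SVC

svm_clf = SVC(random_state=42)
svm_clf.fit(X_train[:2000], y_train[:2000])  # Scikit-Learn runs OvO under the hood
svm_clf.predict([some_digit])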
• This code actually made 45 predictions—one per pair of classes—and
it selected the class that won the most duels.
• If you call the decision_function() method, you will see that it returns
10 scores per instance: one per class.
• If you want to force Scikit-Learn to use one-versus-one or one-versus-the-rest, you can use the OneVsOneClassifier or OneVsRestClassifier classes.
• Simply create an instance and pass a classifier to its constructor.
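For example, to force OvR with the SVC above (a sketch):

from sklearn.multiclass import OneVsRestClassifier

ovr_clf = OneVsRestClassifier(SVC(random_state=42))
ovr_clf.fit(X_train[:2000], y_train[:2000])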
• Training an SGDClassifier on a multiclass dataset and using it to make predictions is
just as easy:
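A sketch; for SGDClassifier, Scikit-Learn automatically uses the OvR strategy:

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)  # multiclass targets this time
sgd_clf.predict([some_digit])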
Error Analysis
• Let’s look at the confusion matrix. For this, you first need to make
predictions using the cross_val_predict() function; then you can pass
the labels and predictions to the confusion_matrix() function
• Since there are now 10 classes instead of 2, the confusion matrix will contain quite a lot of numbers, and it may be hard to read. A colored diagram of the confusion matrix is much easier to analyze. To plot such a diagram, use the ConfusionMatrixDisplay.from_predictions() function:
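A sketch of both steps:

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Multiclass out-of-fold predictions this time
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train, cv=3)
ConfusionMatrixDisplay.from_predictions(y_train, y_train_pred)
plt.show()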
• This confusion matrix looks pretty good: most images are on the main diagonal, which means that they were classified correctly.
• Notice that the cell on the diagonal in row #5 and column #5 looks slightly darker than the other digits.
• This could be because the model made more errors on 5s, or because there are fewer 5s in the dataset than the other digits.
• It’s important to normalize the confusion matrix by dividing each
value by the total number of images in the corresponding (true)
class (i.e., divide by the row’s sum).
• This can be done simply by setting normalize="true". We can also
specify the values_format=".0%" argument to show percentages
with no decimals.
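For example:

ConfusionMatrixDisplay.from_predictions(y_train, y_train_pred,
                                        normalize="true", values_format=".0%")
plt.show()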
• Now we can easily see that only 82% of the images of 5s were classified correctly.
• The most common error the model made with images of 5s was to misclassify them as 8s: this happened for 10% of all 5s.
• But only 2% of 8s got misclassified as 5s; confusion matrices are generally not symmetrical.
• If you want to make the errors stand out more, you can try putting zero
weight on the correct predictions.
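A sketch using the sample_weight argument:

# Zero weight on correct predictions, so only the errors remain visible
sample_weight = (y_train_pred != y_train)
ConfusionMatrixDisplay.from_predictions(y_train, y_train_pred,
                                        sample_weight=sample_weight,
                                        normalize="true", values_format=".0%")
plt.show()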
Multilabel Classification
• Until now, each instance has always been assigned to just one class. But
in some cases you may want your classifier to output multiple classes for
each instance.
• Consider a face-recognition classifier: what should it do if it recognizes
several people in the same picture?
• It should attach one tag per person it recognizes. Say the classifier has
been trained to recognize three faces: Alice, Bob, and Charlie.
• Then when the classifier is shown a picture of Alice and Charlie, it should
output [True, False, True] (meaning “Alice yes, Bob no, Charlie yes”).
• Such a classification system that outputs multiple binary tags is called a
multilabel classification system
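A sketch of such a system on MNIST, building two binary labels per digit (this is the code that the next paragraph walks through):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= '7')                 # is the digit 7, 8, or 9?
y_train_odd = (y_train.astype('int8') % 2 == 1)  # is the digit odd?
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)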
This code creates a y_multilabel array containing two target labels for each digit
image: the first indicates whether or not the digit is large (7, 8, or 9), and the second
indicates whether or not it is odd. Then the code creates a KNeighborsClassifier
instance, which supports multilabel classification (not all classifiers do), and trains this
model using the multiple targets array.
• There are many ways to evaluate a multilabel classifier, and selecting the right metric really
depends on your project.
• One approach is to measure the F1 score for each individual label (or any other binary classifier metric discussed earlier), then simply compute the average score.
• The following code computes the average F1 score across all labels:
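A sketch (this assumes all labels are equally important, hence average="macro"):

y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
f1_score(y_multilabel, y_train_knn_pred, average="macro")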
Multioutput Classification
• The last type of classification task we’ll discuss here is called multioutput–
multiclass classification (or just multioutput classification).
• It is a generalization of multilabel classification where each label can be
multiclass
• Unlike multiclass classification (where each instance is assigned to one class
from a set of classes), multioutput classification involves predicting multiple
labels for each instance
• Example Scenario:
Imagine a movie recommendation system that predicts:
• The genre(s) of the movie (Action, Comedy, Drama, etc.)
• The target audience (Kids, Teens, Adults)
• The expected rating (Low, Medium, High)
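A purely hypothetical sketch of this scenario; the feature matrix and target columns below are random placeholders, and we use KNeighborsClassifier only because it supports multioutput targets natively:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X_movies = rng.random((100, 5))        # made-up movie features
# Two multiclass targets per movie, e.g. audience (0-2) and rating (0-2)
y_movies = rng.integers(0, 3, size=(100, 2))

multi_clf = KNeighborsClassifier()
multi_clf.fit(X_movies, y_movies)
multi_clf.predict(X_movies[:1])        # one prediction per output, e.g. [[2, 1]]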
A simple linear model
life_satisfaction = θ0 + θ1 × GDP_per_capita
This model is just a linear function of the input feature GDP_per_capita.
θ0 and θ1 are the model’s parameters
• More generally, a linear model makes a prediction by simply
computing a weighted sum of the input features, plus a constant
called the bias term (also called the intercept term)
Linear regression model prediction (vectorized form)
ŷ = θᵀ · x
where θᵀ is the transpose of θ (a row vector instead of a column vector) and θᵀ · x is the matrix multiplication of θᵀ and x.
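A tiny NumPy sketch of this prediction (the θ and feature values are arbitrary, with x0 = 1 so the bias term is included in the dot product):

import numpy as np

theta = np.array([4.85, 4.91e-5])  # [θ0, θ1], arbitrary example values
x = np.array([1.0, 22587.0])       # [x0 = 1, GDP_per_capita]
y_hat = theta @ x                  # θᵀ · x: weighted sum plus bias term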
Logistic Regression
• Logistic regression (also called logit regression) is commonly used to
estimate the probability that an instance belongs to a particular class
• If the estimated probability is greater than a given threshold, then the model predicts that the instance belongs to that class (called the positive class, labeled “1”); otherwise it predicts that it does not (i.e., it belongs to the negative class, labeled “0”).
• This makes it a binary classifier.
• Like a linear regression model, a logistic regression model computes a
weighted sum of the input features plus a bias term, but instead of
outputting the result directly like the linear regression model does, it
outputs the logistic of this result
The logistic, noted σ(·), is a sigmoid function (i.e., S-shaped) that outputs a number between 0 and 1:
σ(t) = 1 / (1 + exp(–t))
Once the logistic regression model has estimated the probability p̂ = hθ(x) that an instance x belongs to the positive class, it can make its prediction ŷ easily: it predicts 1 if p̂ ≥ 0.5, and 0 otherwise.
Notice that σ(t) < 0.5 when t < 0, and σ(t) ≥ 0.5 when t ≥ 0, so a logistic regression model using the default threshold of 50% probability predicts 1 if θᵀx is positive and 0 if it is negative.
Training and Cost Function
• The objective of training is to set the parameter vector θ so that the
model estimates high probabilities for positive instances (y = 1) and
low probabilities for negative instances (y = 0).
• This idea is captured by the cost function shown below for a single training instance x:
c(θ) = –log(p̂) if y = 1, and c(θ) = –log(1 – p̂) if y = 0
• The cost function over the whole training set is the average cost over all training instances. It can be written in a single expression called the log loss:
J(θ) = –(1/m) Σᵢ [ y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 – y⁽ⁱ⁾) log(1 – p̂⁽ⁱ⁾) ]
Iris Dataset
• This is a famous dataset that contains the sepal and petal length and
width of 150 iris flowers of three different species:
Iris setosa, Iris versicolor, and Iris virginica
Let’s try to build a classifier to detect the Iris virginica type based only on
the petal width feature. The first step is to load the data
Next we’ll split the data and train a logistic regression model on the training
set:
Let’s look at the model’s estimated probabilities for flowers with petal
widths varying from 0 cm to 3 cm
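A sketch covering all three steps (loading, splitting and training, then estimating probabilities):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X = iris.data[["petal width (cm)"]].values
y = iris.target_names[iris.target] == 'virginica'
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train, y_train)

# Estimated probabilities for petal widths from 0 cm to 3 cm
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)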
Softmax Regression
• The logistic regression model can be generalized to support multiple
classes directly, without having to train and combine multiple binary
classifiers. This is called softmax regression, or multinomial logistic
regression
• When given an instance x, the softmax regression model first computes a score s_k(x) for each class k, then estimates the probability of each class by applying the softmax function:
p̂_k = σ(s(x))_k = exp(s_k(x)) / Σⱼ exp(s_j(x))
where:
• K is the number of classes,
• s(x) is a vector containing the scores of each class for the instance x, and
• σ(s(x))_k is the estimated probability that the instance x belongs to class k, given the scores of each class for that instance.
• Just like the logistic regression classifier, by default the softmax regression classifier predicts the class with the highest estimated probability (which is simply the class with the highest score):
ŷ = argmax_k σ(s(x))_k = argmax_k s_k(x)
Minimizing the cost function shown below, called the cross entropy, should lead to this objective, because it penalizes the model when it estimates a low probability for a target class. Cross entropy is frequently used to measure how well a set of estimated class probabilities matches the target classes:
J(Θ) = –(1/m) Σᵢ Σₖ y_k⁽ⁱ⁾ log(p̂_k⁽ⁱ⁾)
where y_k⁽ⁱ⁾ is the target probability that the iᵗʰ instance belongs to class k (in general, either 1 or 0).
• Let’s use softmax regression to classify the iris plants into all three classes.
ScikitLearn’s LogisticRegression classifier uses softmax regression
automatically when you train it on more than two classes It also applies ℓ
regularization by default, which you can control using the hyperparameter
C, as mentioned earlier:
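A sketch, reusing the iris data loaded above and two features this time:

X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = iris["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

softmax_reg = LogisticRegression(C=30, random_state=42)
softmax_reg.fit(X_train, y_train)

# Predict the class and the per-class probabilities for one flower
softmax_reg.predict([[5, 2]])
softmax_reg.predict_proba([[5, 2]])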