Introduction to Classification, Logistic Regression &
K-Nearest Neighbor
Instructor: Dr. Priyanka D Pantula
Assistant Professor, Department of Chemical Engg.
Indian Institute of Technology (ISM) Dhanbad
(Email: pantula@iitism.ac.in)
Types of Machine Learning
Approach in Supervised & Unsupervised Machine Learning
Supervised Learning
Supervised Learning: Binary Classification
Example: identifying whether an e-mail is spam or genuine.

E-Mail   Suspicious Words   Unknown Sender   Spam
1        Y                  Y                Y
2        N                  Y                N
3        Y                  N                Y
4        Y                  N                N
5        N                  Y                Y
6        Y                  N                Y
7        Y                  N                Y
8        Y                  N                N
9        N                  Y                N
10       Y                  N                Y

Output Variable – Categorical, can take only two values.
Input Variables – Can be any in number, can be continuous or categorical.

Other examples: fraud detection in credit card transactions; detecting defective and non-defective items.
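As a small illustration (not on the slides), the Y/N table above can be encoded as 0/1 and fed to a classifier. A minimal sketch, assuming scikit-learn is available and using its LogisticRegression (a method covered later in this lecture):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Encode the slide's table: Y -> 1, N -> 0
# Columns: [suspicious_words, unknown_sender], target: spam
X = np.array([[1, 1], [0, 1], [1, 0], [1, 0], [0, 1],
              [1, 0], [1, 0], [1, 0], [0, 1], [1, 0]])
y = np.array([1, 0, 1, 0, 1, 1, 1, 0, 0, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 1]]))  # predicted class (0 or 1) for a new e-mail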
Supervised Learning: Multi-class Classification
Feature ranges (in cm) for the three Iris flower species, where S.L/S.W are sepal length/width and P.L/P.W are petal length/width:

         setosa        versicolor    virginica
S.L      [4.3, 5.8]    [4.9, 7.0]    [4.9, 7.9]
S.W      [2.3, 4.4]    [2.0, 3.4]    [2.2, 3.8]
P.L      [1.0, 1.9]    [3.0, 5.1]    [4.5, 6.9]
P.W      [0.1, 0.6]    [1.0, 1.8]    [1.4, 2.5]

Output Variable – Categorical, can take any number of values.
Input Variables – Can be any in number, can be continuous or categorical.

Other examples: distinguishing between different products on a retail store shelf; classifying different music pieces by genre.
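The ranges above correspond to the classic Iris data set, which ships with scikit-learn. A minimal multi-class sketch, assuming scikit-learn is available (the choice of classifier and split are illustrative only):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # 4 inputs: S.L, S.W, P.L, P.W
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # multi-class by default
print("test accuracy:", clf.score(X_te, y_te))            # 3 classes: setosa, versicolor, virginica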
Supervised Learning: Regression
Example data: height and weight of 20 people.

Height of Person   Weight of Person
57                 93
58                 110
59                 111
60                 99
60                 122
61                 115
61                 116
62                 110
62                 122
62                 134
63                 128
63                 123
64                 117
64                 135
65                 129
66                 128
66                 148
67                 135
68                 142
69                 155

[Scatter plot: Weight (y) as a function of Height (x), with the fitted line]
Weight (y) = -133.764 + 4.095 * Height (x)

Output Variable – Continuous.
Input Variables – Can be any in number, continuous or categorical.

Other examples: predicting scores in a cricket match; predicting reaction kinetic parameters in biochemical reactions; predicting rainfall quantities in a monsoon season; predicting a target price for a share in the stock market.
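As a rough check of the fitted line above, here is a minimal least-squares sketch using the height-weight pairs as transcribed from the table (treat the exact pairing of the last few rows as approximate); it should recover coefficients close to -133.764 and 4.095:

import numpy as np

height = np.array([57, 58, 59, 60, 60, 61, 61, 62, 62, 62,
                   63, 63, 64, 64, 65, 66, 66, 67, 68, 69])
weight = np.array([93, 110, 111, 99, 122, 115, 116, 110, 122, 134,
                   128, 123, 117, 135, 129, 128, 148, 135, 142, 155])

b1, b0 = np.polyfit(height, weight, deg=1)   # slope, intercept
print(f"Weight = {b0:.3f} + {b1:.3f} * Height")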
Supervised Learning
Few other applications of SL
Face Detection
Signature recognition
Customer discovery
Spam detection
Weather forecasting
Predicting housing prices based on the prevailing market price
Stock price predictions
Logistic Regression Analysis
Logistic Regression
Supervised Learning algorithm
Output variable/ Dependent variable: Categorical
Classification algorithm
Linear Regression vs Logistic Regression
• Let us see whether we can use linear regression to solve a binary classification problem.
• Assume we have a dataset that is linearly separable and whose output is categorical, with two classes (0, 1).
• We define a threshold T = 0.5: if the predicted value is above the threshold, the output belongs to class 1, and to class 0 otherwise.
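A minimal sketch of this idea, assuming a small made-up one-dimensional data set (not from the slides): fit ordinary least squares and threshold the prediction at T = 0.5.

import numpy as np

# Hypothetical 1-D data: small x -> class 0, large x -> class 1
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

b1, b0 = np.polyfit(x, y, deg=1)       # ordinary least squares
y_hat = b0 + b1 * x                    # predictions are unbounded real numbers
labels = (y_hat >= 0.5).astype(int)    # threshold T = 0.5
print(y_hat)                           # note: values can fall outside (0, 1)
print(labels)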
Problems with Linear Regression for Classification
• Case 1: the predicted value for x1 is ≈ 0.2 which
is less than the threshold, so x1 belongs to class 0.
• Case 2: the predicted value for the point x2 is ≈
0.6 which is greater than the threshold, so x2
belongs to class 1.
• Case 3: the predicted value for the point x3 is
beyond 1.
• Case 4: the predicted value for the point x4 is below 0.
The predicted values for the points x3 and x4 fall outside the range (0, 1), which doesn't make sense, because probability values always lie between 0 and 1, and our output can take only two values, either 0 or 1.
Problems with Linear Regression for Classification
• Now, introduce an outlier and see what happens.
• The regression line gets deviated (to L2) in order to keep the distance of all the data points to the line minimal; as a result, points are wrongly classified and the error term increases.
The two limitations of using a linear regression model for classification problems are:
• the predicted value may fall outside the range (0, 1),
• the error rate increases if the data has outliers.
Hence the need for logistic regression.
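A minimal sketch of the outlier effect, reusing the same made-up 1-D data as before with one extreme (hypothetical) outlier appended:

import numpy as np

# Same hypothetical 1-D data as before, plus one extreme outlier from class 1
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x_out, y_out = np.append(x, 30.0), np.append(y, 1)

b1, b0 = np.polyfit(x, y, deg=1)              # fit without the outlier
b1_o, b0_o = np.polyfit(x_out, y_out, deg=1)  # fit with the outlier

print(((b0 + b1 * x) >= 0.5).astype(int))       # original thresholded labels: all correct
print(((b0_o + b1_o * x) >= 0.5).astype(int))   # the shifted line misclassifies x = 6 (true class 1)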
Logistic Regression
• The logistic regression equation is quite similar to the linear regression model.
• Consider a model with one predictor (or input) "x" and one Bernoulli response variable (or output) "ŷ", and let p be the probability that ŷ = 1. The linear equation can be written as:
p = b0 + b1x --------> eq 1
• The right-hand side (b0 + b1x) is a linear expression and can take values outside the range (0, 1), but we know a probability must always lie in the range (0, 1).
To overcome that, we predict odds instead of probability.
• Odds: The ratio of the probability of an event occurring to the probability of the event not occurring.
• Odds = p/(1-p). For example, p = 0.8 gives odds of 0.8/0.2 = 4.
Logistic Regression
Equation 1 can be re-written as:
p/(1-p) = b0 + b1x --------> eq 2
• Odds can only take positive values, whereas the right-hand side can be negative; to handle this, we predict the logarithm of the odds instead.
ln(p/(1-p)) = b0 + b1x --------> eq 3
• To recover p from equation 3, we apply the exponential on both sides.
e^(ln(p/(1-p))) = e^(b0 + b1x)
Logistic Regression
From the inverse rule of logarithms,
p/(1-p) = e^(b0 + b1x)
Simple algebraic manipulation gives:
p = (1-p) * e^(b0 + b1x)
p = e^(b0 + b1x) - p * e^(b0 + b1x)
p * (1 + e^(b0 + b1x)) = e^(b0 + b1x)
p = e^(b0 + b1x) / (1 + e^(b0 + b1x))
Dividing the numerator and denominator by e^(b0 + b1x) on the right-hand side:
p = 1 / (1 + e^(-(b0 + b1x)))
The right-hand side looks familiar, doesn't it? Yes, it is the sigmoid function. It squeezes the output into the range between 0 and 1.
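A minimal numerical check of this derivation (the example probabilities are arbitrary): take a few values of p, compute the log-odds from eq 3, and apply the sigmoid to recover p.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = np.array([0.1, 0.5, 0.8, 0.99])
log_odds = np.log(p / (1 - p))        # eq 3: ln(p / (1 - p))
print(sigmoid(log_odds))              # recovers the original p values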
Logistic Regression
• We started with a linear equation and ended up with a logistic regression model with the help of a
sigmoid function.
• Linear model: ŷ = b0+b1x
• Sigmoid function: σ(z) = 1/(1 + e^(-z))
• Logistic regression model: ŷ = σ(b0 + b1x) = 1/(1 + e^(-(b0 + b1x)))
Cost function in Logistic Regression
Similarly, the equation for a logistic model with ‘n’ predictors is as below:
p = 1 / (1 + e^(-(b0 + b1x1 + b2x2 + b3x3 + ... + bnxn)))
• In linear regression, we use the mean squared error (MSE) as the cost function, which is a function of the difference between y_predicted and y_actual.
• The graph of the MSE cost function in linear regression is convex, with a single global minimum.
Cost function in Logistic Regression
In logistic regression, the predicted output is a non-linear function (ŷ = 1/(1 + e^(-z))).
If we use this inside the MSE equation, the resulting cost function is non-convex, with many local minima.
• The problem here is that the optimization may get stuck in a local minimum; we may then miss the global minimum and the error will remain high.
• To overcome this problem, a different cost function is used, namely the cross-entropy loss function.
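For intuition, the two candidate losses written out as code on sigmoid outputs (a sketch; the example values of y and ŷ are made up): MSE treats ŷ like a regression output, while cross-entropy penalizes a confident wrong prediction much more heavily.

import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)            # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.01])                  # last prediction is confidently wrong
print(mse(y, y_hat), cross_entropy(y, y_hat))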
Cost function in Logistic Regression
The cross-entropy loss function is used to measure the performance of a classification model
whose output is a probability value.
Thus, in logistic regression, the following loss function is used (m = number of training samples):
J(b) = -(1/m) * Σ [ y_i * ln(ŷ_i) + (1 - y_i) * ln(1 - ŷ_i) ]
which can be minimized using an optimization algorithm such as steepest descent to obtain the coefficients and hence the predicted values.
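A minimal from-scratch sketch of logistic regression trained by steepest (gradient) descent on the cross-entropy loss; the 1-D data set, learning rate, and iteration count below are made-up choices, not part of the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data: small x -> class 0, large x -> class 1
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    y_hat = sigmoid(b0 + b1 * x)
    # Gradients of the average cross-entropy loss w.r.t. b0 and b1
    b0 -= lr * np.mean(y_hat - y)
    b1 -= lr * np.mean((y_hat - y) * x)

print(b0, b1)                                       # fitted coefficients
print((sigmoid(b0 + b1 * x) >= 0.5).astype(int))    # predicted classes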
K-Nearest Neighbors
K-Nearest Neighbor (K-NN)
K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category it is most similar to among the available categories.
It classifies a new data point based on a similarity (distance) metric.
It is used for both regression and classification.
K-Nearest Neighbor (K-NN)
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1: to which of these categories does it belong? To solve this type of problem, we need the K-NN algorithm.
With the help of K-NN, we can easily identify the category or class of a particular data point.
Consider the below diagram:
K-Nearest Neighbor (K-NN) for Classification
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of the neighbors.
Step-2: Calculate the Euclidean distance from the new data point to all the training points.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of the data points in each
category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
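A minimal sketch of these steps, assuming numeric features and a small made-up two-category data set (this is not the data from the slide's figure):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Steps 2-3: Euclidean distances to all training points, take the k nearest
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = y_train[np.argsort(dists)[:k]]
    # Steps 4-5: majority vote among the k neighbors
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical 2-D data: category 0 near the origin, category 1 further out
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))   # -> 0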
K-Nearest Neighbor (K-NN)
Let us look at an example for a better understanding (k = 4, not 3).
K-Nearest Neighbor (K-NN)
Suppose we have a new data point and we need to put it in the required category. Consider
the below image:
First, we choose the number of neighbors; here we choose k = 5.
Next, we calculate the Euclidean distance between the new point and the existing data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as:
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
K-Nearest Neighbor (K-NN)
By calculating the Euclidean distances we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
As we can see, the majority of the five nearest neighbors (three of them) are from category A; hence this new data point is assigned to category A.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined, which may be complex.
The computation cost is high, because the distance to every training sample must be calculated.
Suggestions on Books & Blogs
Book - Pattern Recognition and Machine Learning – Christopher Bishop.
Blogs – Analytics Vidhya, Towards Data Science, Medium.
The END