Introduction to Data Analytics
ITE 5201
Lecture 11: K-Nearest Neighbor Classification
Instructor: Parisa Pouladzadeh
Email: parisa.pouladzadeh@humber.ca
www.udemy.com/course/python-for-data-science-and-machine-learning
Nearest Neighbor Classifiers
➢ The KNN algorithm is one of the simplest classification algorithms.
➢ It is non-parametric: it makes no assumptions about the underlying data distribution.
➢ It is a lazy learning algorithm: there is no explicit training phase, or it is very minimal, which also means the training phase is very fast.
➢ Lack of generalization means that KNN keeps all the training data.
➢ Its purpose is to use a database in which the data points are separated into several classes to predict the classification of a new sample point.
Nearest Neighbor Classifiers
Basic idea:
◦ If it walks like a duck, quacks like a duck, then it’s probably a duck
[Figure: computing the distance between a test record and the training records]
Nearest Neighbor Classifiers
The KNN algorithm is based on feature similarity: how closely an out-of-sample point's features resemble our training set determines how we classify it.
Basic Idea
➢ The k-NN classification rule assigns to a test sample the majority category label of its k nearest training samples.
➢ In practice, k is usually chosen to be odd, so as to avoid ties.
➢ The k = 1 rule is generally called the nearest-neighbor classification rule.
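A minimal sketch of this majority-vote rule using scikit-learn's KNeighborsClassifier (scikit-learn is assumed to be installed; the toy points and labels below are invented for illustration):

```python
# Minimal sketch of the k-NN majority-vote rule (toy data invented).
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters of 2-D training points
X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = ["duck", "duck", "duck", "goose", "goose", "goose"]

# k = 3 is odd, so a two-class vote can never tie
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[2, 2], [6, 5]]))  # -> ['duck' 'goose']
```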
Definition of Nearest Neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Classification steps
➢ Training phase: a model is constructed from the training instances.
➢ The classification algorithm finds relationships between predictors and targets.
➢ These relationships are summarised in a model.
➢ Testing phase: test the model on a test sample whose class labels are known but were not used for training the model.
➢ Usage phase: use the model for classification on new data whose class labels are unknown.
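As a hedged sketch, the three phases map onto a typical scikit-learn workflow (scikit-learn and its bundled Iris dataset are assumed here as a stand-in):

```python
# Sketch of the three phases with scikit-learn (Iris data as a stand-in).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Training phase: construct the model from the training instances
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Testing phase: labels of X_test are known but were not used for training
print("test accuracy:", model.score(X_test, y_test))

# Usage phase: classify new data whose class label is unknown
new_sample = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical measurements
print("predicted class:", model.predict(new_sample))
```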
K-Nearest Neighbor
Features
◦ All instances correspond to points in an n-dimensional Euclidean space.
◦ Classification is delayed until a new instance arrives.
◦ Classification is done by comparing the feature vectors of the different points.
◦ The target function may be discrete or real-valued.
K-Nearest Neighbor
➢ An arbitrary instance is represented by its feature vector $(a_1(x), a_2(x), a_3(x), \ldots, a_n(x))$, where $a_r(x)$ denotes the r-th feature of x.
➢ Euclidean distance between two instances:
$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(a_r(x_i) - a_r(x_j)\bigr)^2}$$
➢ For a continuous-valued target function, the prediction is the mean value of the k nearest training examples.
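A small sketch of this distance in plain Python (only the standard math module; the sample vectors are made up):

```python
# Euclidean distance between two feature vectors, as defined above.
from math import sqrt

def euclidean_distance(xi, xj):
    # sum the squared per-feature differences, then take the square root
    return sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # -> 5.0
```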
Euclidean Distance
• K-nearest neighbors uses the local neighborhood to obtain a prediction.
• The k memorized examples most similar to the one being classified are retrieved.
• A distance function is needed to compare the examples' similarity.
• This means that if we change the distance function, we change how examples are classified (see the sketch below).
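To make the last point concrete, here is a sketch in which swapping Euclidean for Manhattan distance flips a 1-NN prediction (scikit-learn assumed; the two training points are deliberately chosen to show the effect):

```python
# Changing the distance function changes the classification.
from sklearn.neighbors import KNeighborsClassifier

X = [[2.0, 2.0], [3.5, 0.0]]   # one diagonal point, one axis-aligned point
y = ["diagonal", "axis"]
query = [[0.0, 0.0]]

for metric in ("euclidean", "manhattan"):
    knn = KNeighborsClassifier(n_neighbors=1, metric=metric)
    knn.fit(X, y)
    print(metric, "->", knn.predict(query))
# euclidean -> 'diagonal' (2.83 < 3.50); manhattan -> 'axis' (3.50 < 4.00)
```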
Normalization
If the ranges of the features differ, features with larger values will dominate the distance, and hence the decision.
In general, feature values are normalized prior to the distance calculation.
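A brief sketch of min-max normalization, one common choice, with scikit-learn (the income and age values below are invented):

```python
# Min-max scaling so that both features lie in [0, 1] before distances
# are computed (otherwise income would swamp age).
from sklearn.preprocessing import MinMaxScaler

X = [[30000, 25], [48000, 40], [90000, 33]]  # income (dollars), age (years)

X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled)  # each column now ranges from 0 to 1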
Voronoi diagram
• We frequently need to find the nearest hospital, surgery, or supermarket.
• A map divided into cells, each cell covering the region closest to a particular centre, can assist us in our quest.
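As an illustrative sketch, a 1-nearest-neighbour lookup of this kind can be done with a k-d tree (SciPy assumed; the coordinates are invented):

```python
# Find the nearest hospital to a home address with a k-d tree.
from scipy.spatial import KDTree

hospitals = [(2.0, 3.0), (5.0, 1.0), (9.0, 6.0)]  # hypothetical map positions
tree = KDTree(hospitals)

dist, idx = tree.query((4.0, 2.0))  # 1-nearest-neighbour query
print(f"nearest hospital: {hospitals[idx]}, distance {dist:.2f}")
```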
Voronoi diagram
Another practical problem is to choose a location for a new service, such as a school, which is as far as possible from existing schools while still serving the maximum number of families.
A Voronoi diagram can be used to find the largest empty circle amid a collection of points, giving the ideal location for the new school. Of course, numerous parameters other than distance must be considered, but access time is often the critical factor.
Numerical Example
Steps:
1. Determine the parameter K, the number of nearest neighbors.
2. Calculate the distance between the query instance and all the training samples.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Gather the categories of those nearest neighbors.
5. Use the simple majority of the categories of the nearest neighbors as the prediction for the query instance (a sketch follows this list).
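A from-scratch sketch that follows these five steps (pure Python; the training vectors and Good/Bad labels are hypothetical):

```python
# k-NN classification following the five steps above.
from collections import Counter
from math import sqrt

def knn_predict(query, training, k=3):          # Step 1: choose K
    # Step 2: distance from the query instance to every training sample
    distances = [(sqrt(sum((q - a) ** 2 for q, a in zip(query, x))), label)
                 for x, label in training]
    # Step 3: sort and keep the K nearest neighbors
    nearest = sorted(distances)[:k]
    # Step 4: gather the neighbors' categories
    categories = [label for _, label in nearest]
    # Step 5: simple majority vote is the prediction
    return Counter(categories).most_common(1)[0][0]

# Hypothetical training set: (feature vector, class label)
training = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
print(knn_predict((3, 7), training, k=3))  # -> 'Good'
```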
Example
[Figure: worked numerical example]
Voronoi diagram
[Figure: Voronoi diagram]