KEMBAR78
Lecture-11-KNearest Clustering-Part-1 | PDF | Statistical Classification | Applied Mathematics
0% found this document useful (0 votes)
27 views18 pages

Lecture-11-KNearest Clustering-Part-1

The document provides an overview of the K Nearest Neighbor (KNN) classification algorithm, highlighting its simplicity, non-parametric nature, and lazy learning approach. It explains the classification process, including training, testing, and usage phases, as well as the importance of distance calculations and normalization of features. Additionally, it discusses practical applications of KNN, such as using Voronoi diagrams for optimal location selection.

Uploaded by

pateljil0247
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views18 pages

Lecture-11-KNearest Clustering-Part-1

The document provides an overview of the K Nearest Neighbor (KNN) classification algorithm, highlighting its simplicity, non-parametric nature, and lazy learning approach. It explains the classification process, including training, testing, and usage phases, as well as the importance of distance calculations and normalization of features. Additionally, it discusses practical applications of KNN, such as using Voronoi diagrams for optimal location selection.

Uploaded by

pateljil0247
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Introduction to Data

Analytics
ITE 5201
Lecture11-K Nearest Neighbor
Classification
Instructor: Parisa Pouladzadeh
Email: parisa.pouladzadeh@humber.ca
www.udemy. /course/python-for-data-science-and-machine-learning.com
Nearest Neighbor Classifiers
➢KNN algorithm is one of the simplest classification
algorithm
➢non-parametric
➢it does not make any assumptions on the underlying data
distribution

➢lazy learning algorithm.


➢there is no explicit training phase or it is very minimal.
➢also means that the training phase is pretty fast .
➢Lack of generalization means that KNN keeps all the training data.

➢Its purpose is to use a database in which the data points


are separated into several classes to predict the
classification of a new sample point.
Copyright © 2018 Pearson Education, Inc. All Rights Reserved.
Nearest Neighbor Classifiers

Basic idea:
◦ If it walks like a duck, quacks like a duck, then it’s probably a duck

Compute
Distance Test Record

Training
Records 3

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Nearest Neighbor Classifiers

KNN Algorithm is based on feature similarity


How closely out-of-sample features resemble our training set
determines how we classify a given data point

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Basic Idea

➢k-NN classification rule is to assign to a test sample the


majority category label of its k nearest training samples
➢In practice, k is usually chosen to be odd, so as to avoid ties
➢The k = 1 rule is generally called the nearest-neighbor
classification rule

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Definition of Nearest Neighbor

X X X

(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor

K-nearest neighbors of a record x are data points that


have the k smallest distance to x 6

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Classification steps

➢Training phase: a model is constructed from the training


instances.
➢ classification algorithm finds relationships between predictors
and targets
➢ relationships are summarised in a model
➢Testing phase: test the model on a test sample whose class
labels are known but not used for training the model
➢Usage phase: use the model for classification on new data
whose class labels are unknown

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


K-Nearest Neighbor
Features
◦ All instances correspond to points in an n-dimensional Euclidean
space
◦ Classification is delayed till a new instance arrives
◦ Classification done by comparing feature vectors of the different
points
◦ Target function may be discrete or real-valued

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


K-Nearest Neighbor

➢An arbitrary instance is represented by


(a1(x), a2(x), a3(x),.., an(x))
➢ai(x) denotes features
➢Euclidean distance between two instances
d(xi, xj)=sqrt (sum for r=1 to n (ar(xi) - ar(xj))2)
➢Continuous valued target function
➢ mean value of the k nearest training examples

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Euclidean Distance
•K-nearest neighbours uses the local neighborhood to obtain a
prediction
•The K memorized examples more similar to the one that is being
classified are retrieved
•A distance function is needed to compare the examples
similarity
•This means that if we change the distance function, we change
how examples are classified

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Normalization
If the ranges of the features differ, feaures with bigger values
will dominate decision
In general feature values are normalized prior to distance
calculation

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Voronoi diagram
•We frequently need to find the nearest hospital, surgery or
supermarket.
•A map divided into cells, each cell covering the region closest to a
particular centre, can assist us in our quest.

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Voronoi diagram
Another practical problem is to choose a location for a new service,
such as a school, which is as far as possible from existing schools while
still serving the maximum number of families.
A Voronoi diagram can be used to find the largest empty circle amid a
collection of points, giving the ideal location for the new school. Of
course, numerous parameters other than distance must be considered,
but access time is often the critical factor.

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Numerical Example
Steps:
1. Determine parameter K = number of nearest neighbors
2. Calculate the distance between the query-instance and all the training
samples
3. Sort the distance and determine nearest neighbors based on the K-th
minimum distance
4. Gather the category of the nearest neighbors
5. Use simple majority of the category of nearest neighbors as the
prediction value of the query instance

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Example

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Voronoi diagram

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Copyright © 2018 Pearson Education, Inc. All Rights Reserved.
Copyright © 2018 Pearson Education, Inc. All Rights Reserved.

You might also like