KNN Algorithms - 5082025
The document outlines various classification algorithms, focusing on the K-Nearest Neighbors (KNN) algorithm, which is a nonparametric supervised learning model used for both classification and regression. It details the implementation steps, advantages, disadvantages, applications, and common distance metrics used in KNN. Additionally, it provides examples to illustrate how KNN can be applied to datasets for classification tasks.

Uploaded by priya0212shukla

Types of Classification Algorithms

• Logistic regression
• K-Nearest Neighbours
• Support Vector Machine
• Naive Bayes
• Decision Tree Classification
• Random Forest
K-Nearest Neighbors (KNN) Algorithm for Classification

➢ KNN is a nonparametric supervised machine learning model.
➢ It is a non-linear model, so it can capture complex relationships
between the input features and the target variable.
➢ It is used for classification as well as regression.
➢ It is called a lazy learner algorithm because it defers all
computation until prediction time.
➢ It is also called an instance-based learning algorithm.
➢ K indicates the number of neighbours to consider for a prediction.
Cont----

➢ For classification, it finds the "k" data points closest to a given
input, then makes a prediction based on the majority class among them.
➢ The selection of k is very important:
▪ A low value of k can lead to overfitting.
▪ A high value of k can lead to underfitting.
▪ Generally an odd number is chosen for k to avoid tied votes.
➢ Calculate the distance between the new data point to be predicted and
all the samples of the training data set. The distance can be
calculated using the Euclidean distance formula.
Steps to implement KNN algorithm

1. Choose a value for 'k', the number of nearest neighbors.


2. For a new data point, calculate the distance to all other data points
in the training set.
3. Sort the distances in ascending order and select the top 'k' data
points.
4. For classification, the new data point is assigned to the class that
appears most frequently among the 'k' neighbors. For regression, the
value is the average of the 'k' nearest neighbors' values.
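The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the function name `knn_classify` and the toy dataset are our own, and `math.dist` computes the Euclidean distance.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    # Step 2: Euclidean distance from the new point to every training sample.
    distances = [
        (math.dist(features, new_point), label) for features, label in train
    ]
    # Step 3: sort the distances in ascending order and keep the top k.
    nearest = sorted(distances)[:k]
    # Step 4: the new point is assigned to the most frequent class among them.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [((2, 4), "A"), ((4, 6), "A"), ((6, 2), "B"),
        ((8, 7), "B"), ((5, 9), "A"), ((9, 3), "B")]
print(knn_classify(data, (6, 6), k=3))  # → A
```

For regression, the final step would instead average the neighbours' target values.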
Advantages:
• It requires no training time; the model simply stores the training data.
• Easy to understand and implement.
• Can handle multi-class classification.
• Useful when data has no clear distribution.
• It is a simple yet effective algorithm that can be used for both classification and
regression tasks.
Disadvantages:
• Computationally expensive, especially with large datasets.
• Sensitive to noisy features and Outliers.
• Performance can be biased with imbalanced datasets.
• Requires careful selection of 'k’.
• Performance degrades as the number of features increases (the curse of dimensionality).
Applications

• KNN is used across industries such as healthcare, finance,
and marketing.
• In healthcare, it can be used to predict the risk of disease based on
patient data.
• In finance, it can be used to predict stock prices based on historical
data.
• In marketing, it can be used to predict customer behavior based on
their past interactions with a company.
Choice of distance metric in KNN algorithm.

Here are the most common ones:


• Euclidean Distance: This is the most widely used metric and represents the
straight-line distance between two points in a multidimensional space. It is the
default in many libraries like scikit-learn.
• Manhattan Distance: Also known as "taxicab" or "city block" distance, it
measures the distance by summing the absolute differences of the coordinates. It
is useful when the data represents movement on a grid and is less sensitive to
outliers.
• Minkowski Distance: This is a generalized form of both Euclidean and Manhattan
distances. By changing a parameter "p", you can get different distance metrics.
When p=1, it is Manhattan distance, and when p=2, it is Euclidean distance.
• Hamming Distance: The number of positions at which two binary vectors differ.
Formulas

• The Euclidean distance between two points (x1, y1) and (x2, y2) in a
two-dimensional space is given by the formula

𝑑(𝑝, 𝑞) = √((𝑥2 − 𝑥1)² + (𝑦2 − 𝑦1)²)

• Manhattan distance

𝑑(𝑝, 𝑞) = |𝑥2 − 𝑥1| + |𝑦2 − 𝑦1|
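These metrics can be sketched in a few lines of plain Python. The function names here are our own; note that Minkowski distance with p = 1 reproduces Manhattan distance and p = 2 reproduces Euclidean distance, as described above.

```python
def minkowski(u, v, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

print(minkowski((1, 1), (4, 5), p=2))  # Euclidean: 5.0
print(minkowski((1, 1), (4, 5), p=1))  # Manhattan: 7.0
print(hamming((1, 0, 1, 1), (1, 1, 0, 1)))  # 2
```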
Example 1: A small dataset with two features, Feature 1 and Feature 2, and a class
label, which can be either Class A or Class B. Consider a new data point P_new with
coordinates (6, 6). Determine whether it belongs to Class A or Class B.

Data point   Feature 1   Feature 2   Class
d1           2           4           A
d2           4           6           A
d3           6           2           B
d4           8           7           B
d5           5           9           A
d6           9           3           B
A small dataset with two features, Feature 1 and Feature 2, and a class label, which
can be either Class A or Class B. Consider a new data point P_new with coordinates
(6, 6). Determine whether it belongs to Class A or Class B. k = 3

Answer:

Data point   Feature 1   Feature 2   Distance   Class
d1           2           4           4.47       A
d2           4           6           2.00       A
d3           6           2           4.00       B
d4           8           7           2.24       B
d5           5           9           3.16       A
d6           9           3           4.24       B

With k = 3, the three nearest neighbours are d2 (A), d4 (B), and d5 (A), so by
majority vote P_new is assigned to Class A.
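The Example 1 distances can be reproduced with a short standard-library script. This is a verification sketch; the variable names are our own.

```python
import math

P_new = (6, 6)
data = {"d1": ((2, 4), "A"), "d2": ((4, 6), "A"), "d3": ((6, 2), "B"),
        "d4": ((8, 7), "B"), "d5": ((5, 9), "A"), "d6": ((9, 3), "B")}

# Euclidean distance from P_new to every sample, sorted ascending.
ranked = sorted(
    (math.dist(point, P_new), name, label)
    for name, (point, label) in data.items()
)
for dist, name, label in ranked:
    print(f"{name}: {dist:.2f} ({label})")

# Majority class among the k = 3 nearest neighbours.
k = 3
labels = [label for _, _, label in ranked[:k]]
print(max(set(labels), key=labels.count))  # → A
```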
Example 2: The table below represents a data set with two columns,
Brightness and Saturation. Calculate the class if Brightness is 20 and
Saturation is 35. Consider k = 5 and k = 3.
Solution

Answer: Red for k = 3
Example 3

• Given the training data below, use a KNN regression model to predict
the target value for a new data sample with x = 7. Consider k = 2.
• X = [2, 5, 8, 11, 14]
• y = target values = [5, 8, 12, 16, 22]
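This regression example can be worked through in a few lines of Python: find the k training points with the smallest |x − x_new| and average their target values. The function name is our own.

```python
X = [2, 5, 8, 11, 14]
y = [5, 8, 12, 16, 22]

def knn_regress(x_new, k=2):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(zip(X, y), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(target for _, target in nearest) / k

# The two nearest neighbours of x = 7 are x = 8 (y = 12) and x = 5 (y = 8).
print(knn_regress(7, k=2))  # → 10.0
```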
