BU MET CS-677: Data Science With Python, v.2.0
kNN - Nearest Neighbors Classification
General Idea
• points in the same class are usually "neighbors"
• assign a class based on the majority vote of the k nearest neighbors
• need a distance measure (e.g. Euclidean)
• need to choose k - the number of neighbors
• note: with two classes, k must be odd to guarantee a simple majority
• a minimal from-scratch sketch is shown below
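A minimal from-scratch sketch of these bullets (illustrative only; the function and toy data below are made up for this note, and the course examples use sklearn's KNeighborsClassifier instead):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy usage: two "green" points near the origin, two "red" points far away
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array(["green", "green", "red", "red"])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # green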
Example of kNN
[Figure: scatter plot of green and red labeled points in the X-Y plane, together with two unlabeled query points A and B]
• what labels for A and B?
Assigning a Label for A
[Figure: point A and its nearest labeled neighbors in the X-Y plane]

point   k   neighbors              majority
        1   x1                     green
A       3   x1, x2, x3             red
        5   x1, x2, x3, x4, x5     green
Assigning a Label for B
[Figure: point B and its nearest labeled neighbors in the X-Y plane]

point   k   neighbors              majority
        1   x2                     red
B       3   x2, x3, x5             red
        5   x1, x2, x3, x4, x5     green
How to Choose k
[Figure: scatter plot of green and red labeled points in the X-Y plane with query points A and B]

point   k   neighbors              majority
        1   x1                     green
A       3   x1, x2, x3             red
        5   x1, x2, x3, x4, x5     green
        1   x2                     red
B       3   x2, x3, x5             red
        5   x1, x2, x3, x4, x5     green
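The label clearly depends on k. A short sketch that reproduces the rows for A with sklearn, assuming A is the query point (3, 2) and using the six labeled points from the Python illustration on the next slide:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# the six labeled points used in the Python illustration that follows
X = np.array([[1, 2], [6, 4], [7, 5], [10, -1], [10, 2], [15, 2]])
y = np.array(["green", "red", "red", "green", "green", "red"])
A = np.array([[3, 2]])  # assumed coordinates of query point A

for k in [1, 3, 5]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict(A)[0])
# prints: 1 green / 3 red / 5 green, matching the table above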
Illustration in Python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6],
     "Label": ["green", "red", "red",
               "green", "green", "red"],
     "X": [1, 6, 7, 10, 10, 15],
     "Y": [2, 4, 5, -1, 2, 2]},
    columns=["id", "Label", "X", "Y"])

X = data[["X", "Y"]].values
Y = data[["Label"]].values

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

new_instance = np.array([[3, 2]])
prediction = knn_classifier.predict(new_instance)

ipdb> prediction[0]
'red'
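To see which training rows produced that vote, the fitted classifier's kneighbors method returns the distances and row indices of the k nearest points (a small add-on to the block above, not part of the original slide):

distances, indices = knn_classifier.kneighbors(np.array([[3, 2]]))
print(indices[0])                          # row indices of the 3 nearest points
print(data["Label"].values[indices[0]])    # the labels that were voted on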
A Numerical Example
object Height Weight Foot Label
xi (H) (W) (F) (L)
x1 5.00 100 6 green
x2 5.50 150 8 green
x3 5.33 130 7 green
x4 5.75 150 9 green
x5 6.00 180 13 red
x6 5.92 190 11 red
x7 5.58 170 12 red
x8 5.92 165 10 red
• note the very different scales of Height, Weight and Foot
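Because Weight varies over tens of units while Height varies over fractions of a unit, raw Euclidean distances would be driven almost entirely by Weight. Standardizing each feature to zero mean and unit variance (what StandardScaler does in the code that follows) puts the columns on a comparable footing; a quick sketch of the z-score computation:

import numpy as np

heights = np.array([5.00, 5.50, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92])
weights = np.array([100, 150, 130, 150, 180, 190, 170, 165], dtype=float)

# z-score: subtract the column mean, divide by the column standard deviation
z_heights = (heights - heights.mean()) / heights.std()
z_weights = (weights - weights.mean()) / weights.std()
print(z_heights.round(2), z_weights.round(2))  # both now have mean 0 and std 1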
What is the Label?
[Figure: 3D scatter plot of the eight labeled points over Height, Weight, and Foot]

(H=6, W=160, F=10) ↦ ?
kNN in Python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values
Y = data[["Label"]].values

# standardize each feature to zero mean and unit variance
scaler = StandardScaler().fit(X)
X = scaler.transform(X)

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

# the new instance must be scaled with the same fitted scaler
new_instance = np.array([[6, 160, 10]])
new_instance_scaled = scaler.transform(new_instance)
prediction = knn_classifier.predict(new_instance_scaled)

ipdb> prediction[0]
'red'
Result Without Scaling
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values
Y = data[["Label"]].values

# no scaling this time: raw feature values are used directly
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

new_instance = np.array([[6, 160, 10]])
prediction = knn_classifier.predict(new_instance)

ipdb> prediction[0]
'red'
Why Scaling?
[Figure: 3D scatter plot of the eight points over the raw (unscaled) Height, Weight, and Foot axes]
• (Euclidean) distances d(·) are dominated by the dimension with the largest scale (Weight)
Effect of Scaling
[Figure: the same eight points after standardization; Height, Weight, and Foot all span roughly -2 to 2]
• without scaling: d(x7, x8) < d(x4, x8)
• with scaling: d(x7, x8) > d(x4, x8)
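A quick numerical check of these two inequalities, re-computing both distances before and after StandardScaler for x4, x7, and x8 from the table above:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[5.00, 100, 6], [5.50, 150, 8], [5.33, 130, 7], [5.75, 150, 9],
              [6.00, 180, 13], [5.92, 190, 11], [5.58, 170, 12], [5.92, 165, 10]])
Xs = StandardScaler().fit_transform(X)

d = lambda a, b: np.linalg.norm(a - b)
# rows 3, 6, 7 are x4, x7, x8 (0-based indexing)
print(d(X[6], X[7]), '<', d(X[3], X[7]))      # unscaled: d(x7,x8) < d(x4,x8)
print(d(Xs[6], Xs[7]), '>', d(Xs[3], Xs[7]))  # scaled:   d(x7,x8) > d(x4,x8)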
Calculating k
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

data = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6, 7, 8],
     "Label": ["green", "green", "green", "green",
               "red", "red", "red", "red"],
     "Height": [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
     "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
     "Foot": [6, 8, 7, 9, 13, 11, 12, 10]},
    columns=["id", "Height", "Weight", "Foot", "Label"])

X = data[["Height", "Weight", "Foot"]].values
Y = data[["Label"]].values

scaler = StandardScaler().fit(X)
X = scaler.transform(X)

# hold out half of the (tiny) dataset to estimate the error rate for each k
X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
    test_size=0.5, random_state=0)

error_rate = []
for k in [1, 3]:
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    knn_classifier.fit(X_train, Y_train)
    pred_k = knn_classifier.predict(X_test)
    error_rate.append(np.mean(pred_k != Y_test))
ipdb> error_rate
[0.5, 0.5]
Calculating k for IRIS
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

url = r'https://archive.ics.uci.edu/ml/' + \
      r'machine-learning-databases/iris/iris.data'
iris_feature_names = ['sepal-length', 'sepal-width',
                      'petal-length', 'petal-width']
data = pd.read_csv(url, names=['sepal-length', 'sepal-width',
                               'petal-length', 'petal-width', 'Class'])

# keep only two of the three Iris classes
class_labels = ['Iris-versicolor', 'Iris-virginica']
data = data[data['Class'].isin(class_labels)]

X = data[iris_feature_names].values
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

le = LabelEncoder()
Y = le.fit_transform(data['Class'].values)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5,
                                                    random_state=3)
Calculating k for IRIS (cont'd)
error_rate = []
for k in range(1, 21, 2):
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    knn_classifier.fit(X_train, Y_train)
    pred_k = knn_classifier.predict(X_test)
    error_rate.append(np.mean(pred_k != Y_test))

# plot the test error rate as a function of k
from matplotlib.ticker import MaxNLocator
plt.figure(figsize=(10, 4))
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
plt.plot(range(1, 21, 2), error_rate, color='red', linestyle='dashed',
         marker='o', markerfacecolor='black', markersize=10)
plt.title('Error Rate vs. k for Iris Subset')
plt.xlabel('number of neighbors: k')
plt.ylabel('Error Rate')
plt.show()
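To read a good k off this curve programmatically, one minimal option (using error_rate and the same k grid as above) is:

k_values = list(range(1, 21, 2))
best_k = k_values[int(np.argmin(error_rate))]
print('k with the lowest test error rate:', best_k)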
Calculating k for IRIS
[Figure: "Error Rate vs. k: Iris-versicolor and Iris-virginica" - the test error rate (roughly 0.02 to 0.06) plotted against the number of neighbors k, for k = 1 to 19]
k for IRIS
[Figure: 3D scatter plot of Iris-setosa, Iris-versicolor, and Iris-virginica over sepal-length, sepal-width, and petal-length; setosa is clearly separated from the other two classes]
[Figure: "Error Rate vs. k: Iris-setosa and Iris-virginica" - the error rate stays at 0 for all k from 1 to 19]
A Categorical Dataset
Day Weather Temperature Wind Play
1 sunny hot low no
2 rainy mild high yes
3 sunny cold low yes
4 rainy cold high no
5 sunny cold high yes
6 overcast mild low yes
7 sunny hot low yes
8 overcast hot high yes
9 rainy hot high no
10 rainy mild low yes
• what label for x* = (sunny, cold, low)?
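Categorical attributes have no natural numeric distance, so the code on the next slide one-hot encodes each column; under that encoding, the squared Euclidean distance between two days equals twice the number of attributes on which they disagree, so kNN effectively counts mismatches. A small sketch of that idea (the mini-frame below uses only days 1-3 for brevity):

import pandas as pd

days = pd.DataFrame({
    'Weather':     ['sunny', 'rainy', 'sunny'],
    'Temperature': ['hot', 'mild', 'cold'],
    'Wind':        ['low', 'high', 'low']})   # days 1-3 from the table above
x_star = pd.Series({'Weather': 'sunny', 'Temperature': 'cold', 'Wind': 'low'})

# number of attributes on which each day differs from x*
mismatches = (days != x_star).sum(axis=1)
print(mismatches.tolist())   # [1, 3, 0] -> day 3 matches x* exactly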
Python Code
import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame(
    {'Day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'Weather': ['sunny', 'rainy', 'sunny', 'rainy', 'sunny', 'overcast',
                 'sunny', 'overcast', 'rainy', 'rainy'],
     'Temperature': ['hot', 'mild', 'cold', 'cold', 'cold', 'mild',
                     'hot', 'hot', 'hot', 'mild'],
     'Wind': ['low', 'high', 'low', 'high', 'high', 'low', 'low',
              'high', 'high', 'low'],
     'Play': ['no', 'yes', 'yes', 'no', 'yes', 'yes', 'yes',
              'yes', 'no', 'yes']},
    columns=['Day', 'Weather', 'Temperature', 'Wind', 'Play'])

# one-hot encode the three categorical features
input_data = data[['Weather', 'Temperature', 'Wind']]
dummies = [pd.get_dummies(data[c]) for c in input_data.columns]
binary_data = pd.concat(dummies, axis=1)
X = binary_data[0:10].values

le = LabelEncoder()
Y = le.fit_transform(data['Play'].values)

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X, Y)

# x* = (sunny, cold, low) in the one-hot column order:
# (overcast, rainy, sunny, cold, hot, mild, high, low)
new_instance = np.array([[0, 0, 1, 1, 0, 0, 0, 1]])
prediction = knn_classifier.predict(new_instance)
ipdb> prediction[0]
1
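The value 1 is the encoded class; LabelEncoder's inverse_transform maps it back to the original label (a small add-on, not on the original slide):

print(le.inverse_transform(prediction)[0])   # 'yes' -> play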
kNN: IRIS
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

url = r'https://archive.ics.uci.edu/ml/' + \
      r'machine-learning-databases/iris/iris.data'
iris_feature_names = ['sepal-length', 'sepal-width',
                      'petal-length', 'petal-width']
data = pd.read_csv(url, names=['sepal-length', 'sepal-width',
                               'petal-length', 'petal-width', 'Class'])

# keep only two of the three Iris classes
class_labels = ['Iris-versicolor', 'Iris-virginica']
data = data[data['Class'].isin(class_labels)]

X = data[iris_feature_names].values
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

le = LabelEncoder()
Y = le.fit_transform(data['Class'].values)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
    test_size=0.5, random_state=3)

knn_classifier = KNeighborsClassifier(n_neighbors=15)
knn_classifier.fit(X_train, Y_train)
prediction = knn_classifier.predict(X_test)
error_rate = np.mean(prediction != Y_test)
ipdb> error_rate
0.06
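The same number can be obtained from sklearn.metrics, since the error rate is one minus the accuracy (optional add-on, not on the original slide):

from sklearn.metrics import accuracy_score
print(1 - accuracy_score(Y_test, prediction))   # same value as error_rate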
Concepts Check:
(a) distances and neighbors
(b) nearest neighbor intuition
(c) need for scaling
(d) how to choose k
(e) analyzing categorical data