K-Nearest Neighbor On Python Ken Ocuma

The document discusses the k-nearest neighbors (k-NN) algorithm, a non-parametric classification and regression method. It explains that k-NN involves finding the k closest training examples in feature space to a new data point. For classification, a majority vote of the neighbors' classes determines the new point's class, while for regression, the new point takes the average value of its neighbors. The document also provides an example Python code to implement k-NN classification on a sample dataset.

Uploaded by

Aliyha Dionio

Reported by: Kenn Rolph Ocuma

BSCS 3-A
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a
non-parametric method used for classification and regression. In both
cases, the input consists of the k closest training examples in the
feature space. The output depends on whether k-NN is used for
classification or regression:
- In k-NN classification, the output is a class membership. An object is classified
by a majority vote of its neighbors, with the object being assigned to the class
most common among its k nearest neighbors (k is a positive integer, typically
small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.

- In k-NN regression, the output is the property value for the object. This value
is the average of the values of its k nearest neighbors.
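The two output rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original slides: the function names `knn_classify` and `knn_regress` and the sample values are made up, and the neighbors are assumed to have been found already.

```python
from collections import Counter


def knn_classify(neighbor_labels):
    """Return the class that is most common among the k nearest neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]


def knn_regress(neighbor_values):
    """Return the plain average of the k nearest neighbors' values."""
    return sum(neighbor_values) / len(neighbor_values)


# Hypothetical neighbors of a new point, with k = 3
print(knn_classify(["red", "green", "red"]))  # red
print(knn_regress([2.0, 4.0, 6.0]))           # 4.0
```

With k = 1 both functions simply return the label or value of the single nearest neighbor.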
k-NN is a type of instance-based learning, or lazy learning, where the
function is only approximated locally and all computation is deferred
until classification. The k-NN algorithm is among the simplest of all
machine learning algorithms.

For both classification and regression, it can be useful to weight the
contributions of the neighbors, so that the nearer neighbors contribute
more to the result than the more distant ones. For example, a common
weighting scheme consists in giving each neighbor a weight of 1/d, where
d is the distance to the neighbor.
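A rough sketch of the 1/d weighting scheme is shown here. The helper names `weighted_vote` and `weighted_average` and the sample numbers are hypothetical, and the sketch assumes every distance is strictly positive (a neighbor at distance 0 would need special handling).

```python
from collections import defaultdict


def weighted_vote(labels, distances):
    """Classification: each neighbor votes with weight 1/d (assumes d > 0)."""
    scores = defaultdict(float)
    for label, d in zip(labels, distances):
        scores[label] += 1.0 / d
    return max(scores, key=scores.get)


def weighted_average(values, distances):
    """Regression: average of the neighbor values with weights 1/d (assumes d > 0)."""
    weights = [1.0 / d for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# One very close "red" neighbor outweighs two distant "green" ones:
# red scores 1/0.5 = 2.0, green scores 1/2 + 1/4 = 0.75
print(weighted_vote(["red", "green", "green"], [0.5, 2.0, 4.0]))  # red

# Weights are 1.0 and 0.25, so the result is (10 + 5) / 1.25
print(weighted_average([10.0, 20.0], [1.0, 4.0]))  # 12.0
```

In scikit-learn, the same idea is available through the `weights='distance'` option of `KNeighborsClassifier` and `KNeighborsRegressor`.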
The neighbors are taken from a set of objects for which the class (for
k-NN classification) or the object property value (for k-NN regression)
is known. This can be thought of as the training set for the algorithm,
though no explicit training step is required.

A peculiarity of the k-NN algorithm is that it is sensitive to the local
structure of the data. The algorithm is not to be confused with k-means,
another popular machine learning technique.

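Let's imagine we have a scenario with 2 categories and 2 independent
variables, and we add a new data point. Where should it fall, in the
green or the red data point area?
To solve this problem we first need to choose the number of neighbors K
(5 is a common choice), then take the K nearest neighbors of the new
point according to the Euclidean distance and assign the point to the
category most common among them. We can recall from high school the
Euclidean distance formula between two points (x1, y1) and (x2, y2):
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)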
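As a small illustration of these steps, the sketch below computes the Euclidean distance (the square root of the sum of squared coordinate differences) from a new point to every training point, keeps the K closest, and takes a majority vote. The six training points, their labels, and the function names are invented for the example.

```python
import numpy as np


def k_nearest_labels(X_train, y_train, new_point, k=5):
    """Return the labels of the k training points closest to new_point."""
    # Euclidean distance from new_point to every row of X_train
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return y_train[nearest]


def predict(X_train, y_train, new_point, k=5):
    """Assign new_point to the most common class among its k nearest neighbors."""
    labels, counts = np.unique(k_nearest_labels(X_train, y_train, new_point, k),
                               return_counts=True)
    return labels[np.argmax(counts)]


# Hypothetical data: a "red" cluster near (1, 1) and a "green" cluster near (6, 6)
X_train = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [7.0, 6.0], [6.0, 7.0]])
y_train = np.array(["red", "red", "red", "green", "green", "green"])

print(predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # red
```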
To implement K-NN in Python we first create our classifier object from
the KNeighborsClassifier class of the sklearn.neighbors module,
specifying the number of neighbors and the metric we want to use: in
this case 'minkowski' with p = 2, which is the Euclidean distance.
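To see why the Minkowski metric with p = 2 gives the Euclidean distance, here is a minimal sketch of the Minkowski formula in plain Python. The `minkowski` function and the two sample points are made up for illustration and are not part of scikit-learn.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance: (sum of |a_i - b_i|^p) raised to the power 1/p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)

print(minkowski(a, b, p=2))  # 5.0, the Euclidean distance
print(math.dist(a, b))       # 5.0, Python's built-in Euclidean distance
print(minkowski(a, b, p=1))  # 7.0, the Manhattan distance
```

Changing p therefore changes how distance is measured while the rest of the classifier stays the same.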
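# Data Preprocessing

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (columns 2 and 3 are the features, column 4 is the class)

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
# (test_size and random_state are assumed values)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling (fit the scaler on the training set only, then reuse it on the test set)

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fitting Classifier to the Training set

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results

y_pred = classifier.predict(X_test)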

Next we fit our classifier to our training set and create our
confusion matrix. Finally we visualise our results.
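# Data Preprocessing

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (columns 2 and 3 are the features, column 4 is the class)

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
# (test_size and random_state are assumed values)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling (fit the scaler on the training set only, then reuse it on the test set)

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fitting Classifier to the Training set

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results

y_pred = classifier.predict(X_test)

# Making the Confusion Matrix (rows are actual classes, columns are predicted classes)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)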

# Visualising the Training set results

from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# Grid covering the scaled feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour every grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results

from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Grid covering the scaled feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour every grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual test points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
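The confusion matrix created above is only stored in cm and never inspected. As a rough guide to reading one, the sketch below builds a confusion matrix from a handful of invented labels (not from Data.csv) and derives the accuracy from it. The names `y_true_demo` and `y_pred_demo` are made up for this example.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted classes for 8 test points
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# Rows are actual classes, columns are predicted classes
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)
# [[3 1]
#  [1 3]]

# Correct predictions sit on the diagonal
accuracy = np.trace(cm_demo) / cm_demo.sum()
print(accuracy)  # 0.75
```

Here 3 points of class 0 and 3 points of class 1 are classified correctly, and one point of each class is misclassified.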
