Reported by: Kenn Rolph Ocuma
BSCS 3-A
In pattern recognition, the k-nearest neighbors algorithm (k-NN)
is a non-parametric method used for classification and
regression. In both cases, the input consists of the k closest
training examples in the feature space. The output depends on
whether k-NN is used for classification or regression:
- In k-NN classification, the output is a class membership. An object is classified
by a majority vote of its neighbors, with the object being assigned to the class
most common among its k nearest neighbors (k is a positive integer, typically
small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
- In k-NN regression, the output is the property value for the object. This value
is the average of the values of its k nearest neighbors.
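The two cases above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (k_nearest, knn_classify, knn_regress) and the toy data are made up for the example, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (point, target) pairs whose points are closest to query."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Regression: plain average of the k nearest neighbors' values."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Hypothetical toy data: (point, class label)
labeled = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
           ((6.0, 6.0), "green"), ((7.0, 5.5), "green")]
# Hypothetical toy data: (point, numeric value)
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labeled, (2.0, 2.0), k=3))  # vote among 3 neighbors
print(knn_classify(labeled, (6.5, 6.0), k=1))  # k = 1: class of the single nearest neighbor
print(knn_regress(valued, (2.0,), k=3))        # average of the 3 nearest values
```

With k = 1 the vote reduces to copying the class of the single nearest neighbor, as described in the list above.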
k-NN is a type of instance-based learning, or lazy learning, where the
function is only approximated locally and all computation is deferred
until classification. The k-NN algorithm is among the simplest of all
machine learning algorithms.
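To illustrate what "lazy" means here, the sketch below uses a hypothetical LazyKNN class (not a library API): fitting only stores the examples, and all distance computation and voting happens when a prediction is requested. Euclidean distance is assumed.

```python
import math
from collections import Counter

class LazyKNN:
    """Hypothetical minimal k-NN classifier illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" only memorises the examples; no model is built here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the real work (distances, sorting, voting) happens at query time.
        order = sorted(range(len(self.points)),
                       key=lambda i: math.dist(self.points[i], query))
        votes = Counter(self.labels[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]

# Hypothetical toy data
model = LazyKNN(k=3).fit([(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)],
                         ["a", "a", "a", "b", "b"])
print(model.predict((0.5, 0.5)))
print(model.predict((5, 5.5)))
```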
For both classification and regression, it can be useful to weight the
contributions of the neighbors, so that nearer neighbors contribute
more to the vote or average than more distant ones.
For example, a common weighting scheme gives each
neighbor a weight of 1/d, where d is the distance to the neighbor.
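A minimal sketch of the 1/d weighting scheme, assuming Euclidean distance. The function names and toy data are hypothetical, and the small eps term is an added assumption to avoid dividing by zero when a neighbor coincides with the query point.

```python
import math
from collections import defaultdict

def _nearest_with_distance(train, query, k):
    """Return the k nearest (distance, target) pairs, closest first."""
    scored = [(math.dist(point, query), target) for point, target in train]
    return sorted(scored, key=lambda pair: pair[0])[:k]

def weighted_knn_classify(train, query, k, eps=1e-12):
    """Each neighbor votes with weight 1/d instead of a weight of 1."""
    scores = defaultdict(float)
    for d, label in _nearest_with_distance(train, query, k):
        scores[label] += 1.0 / (d + eps)  # eps is an assumption, not part of the 1/d scheme
    return max(scores, key=scores.get)

def weighted_knn_regress(train, query, k, eps=1e-12):
    """Weighted average of the neighbors' values with weights 1/d."""
    neighbors = _nearest_with_distance(train, query, k)
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    values = [value for _, value in neighbors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical data: one close "green" point and two distant "red" points.
labeled = [((1.0, 0.0), "green"), ((4.0, 0.0), "red"), ((0.0, 5.0), "red")]
print(weighted_knn_classify(labeled, (0.0, 0.0), k=3))

# Hypothetical data: a close value of 10 and a distant value of 40.
valued = [((1.0,), 10.0), ((4.0,), 40.0)]
print(weighted_knn_regress(valued, (0.0,), k=2))
```

In the classification example an unweighted vote would favor "red" (two votes to one), while the 1/d weights favor the single, much closer "green" neighbor. In the regression example the result is pulled toward the nearer value rather than sitting at the plain average of 25.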
The neighbors are taken from a set of objects for which the class (for
k-NN classification) or the object property value (for k-NN regression) is
known. This can be thought of as the training set for the algorithm,
though no explicit training step is required.
A peculiarity of the k-NN algorithm is that it is sensitive to the local
structure of the data. The algorithm is not to be confused with k-means,
another popular machine learning technique.
Let's imagine a scenario with 2 categories and 2 independent
variables, and add a new point. Where should it fall: in the green
or the red data point area?
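To solve this problem, we first choose the number k of neighbors
(commonly 5). We then take the k training points nearest to the new point
according to the Euclidean distance and assign the new point to the
category most common among them. Recall from high school the Euclidean
distance formula between two points P1 = (x1, y1) and P2 = (x2, y2):
d(P1, P2) = sqrt((x2 - x1)^2 + (y2 - y1)^2)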
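As a small worked illustration, the sketch below computes Euclidean distances from a new point to a handful of red and green points and lists its 5 nearest neighbors. The coordinates are invented for the example and do not come from any dataset in this report.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical points in two categories
points = [((1.0, 2.0), "red"), ((2.0, 3.0), "red"), ((3.0, 3.0), "red"),
          ((6.0, 5.0), "green"), ((7.0, 7.0), "green"), ((8.0, 6.0), "green"),
          ((4.0, 4.0), "green")]
new_point = (3.0, 4.0)

# Rank all points by distance to the new point and keep the 5 closest
ranked = sorted(points, key=lambda pair: euclidean(pair[0], new_point))
nearest5 = ranked[:5]

for point, category in nearest5:
    print(point, category, round(euclidean(point, new_point), 3))

red_votes = sum(1 for _, category in nearest5 if category == "red")
green_votes = len(nearest5) - red_votes
print("red:", red_votes, "green:", green_votes)
```

Here 3 of the 5 nearest neighbors are red and 2 are green, so the new point would fall in the red category.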
To implement k-NN in Python, we create a classifier object from the
KNeighborsClassifier class in the sklearn.neighbors module, specifying
the number of neighbors and the distance metric. Here the metric is
'minkowski' with p = 2, which is the Euclidean distance.
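# Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset (columns 2 and 3 are the features, column 4 is the class)
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
# (this step is missing in the original; test_size and random_state are assumed values)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling (fit on the training set only, then apply to the test set)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
# Fitting Classifier to the Training set
# (the Minkowski metric with p = 2 is the Euclidean distance)
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)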
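The classifier above is configured with metric = 'minkowski' and p = 2. The sketch below shows, with a hand-written helper (hypothetical, not the scikit-learn implementation), why that choice amounts to the Euclidean distance: the Minkowski distance of order p is the p-th root of the sum of absolute coordinate differences raised to the power p.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)

print(minkowski(a, b, p=2))  # same value as the Euclidean distance
print(math.dist(a, b))       # Euclidean distance from the standard library
print(minkowski(a, b, p=1))  # p = 1 gives the Manhattan (city-block) distance
```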
With the classifier fitted to our training set, we next create our
confusion matrix. Finally we visualise our results.
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
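To show what the confusion matrix contains, here is a hand-rolled sketch that counts, for each true class (row), how often each class was predicted (column). The helper name and the label lists are hypothetical; the row/column layout is chosen to match the convention used by sklearn.metrics.confusion_matrix.

```python
def confusion_counts(y_true, y_pred, labels=(0, 1)):
    """Return a matrix where entry [i][j] counts true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical true and predicted labels
y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0]

cm_demo = confusion_counts(y_true_demo, y_pred_demo)
print(cm_demo)

# Correct predictions sit on the diagonal, so accuracy is diagonal / total
correct = sum(cm_demo[i][i] for i in range(len(cm_demo)))
accuracy = correct / len(y_true_demo)
print(accuracy)
```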
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# Grid covering the (scaled) feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to draw the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual observations, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Grid covering the (scaled) feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to draw the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual test observations, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()