

NAME - AJINKYA KSHIRSAGAR

PRN - 19030141005

COURSE - Applied Data Analytics with Python

SEM - 3

Assignment No: 3

Title of the Assignment:-

Case Study : Machine Learning : Classifiers


Instructions

1. Select any dataset of your own choice.

2. Use the following classifiers for data prediction: a. SVM b. K-NN c. K-means clustering d. Decision Tree classifier
3. Do a comparative study and discuss the predictions from these classifiers.

CASE STUDY

Choosing the right estimator for Machine Learning

INTRODUCTION

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial
intelligence.

Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or
decisions without being explicitly programmed to do so.

Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or
infeasible to develop conventional algorithms to perform the needed tasks.

Example
A learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a
single number, for instance a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.

Learning problems fall into a few categories:

1. Supervised learning
Learning in which the data comes with additional attributes that we want to predict.

This problem can be either:

Classification:

If the samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An
example of a classification problem would be handwritten digit recognition, in which the aim is to assign each input vector to one of a finite
number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning
where one has a limited number of categories and, for each of the n samples provided, one tries to label them with the correct category or
class.
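To make the digit-recognition example concrete, here is a minimal illustrative sketch using scikit-learn's bundled digits dataset (an addition for illustration, not part of the assignment notebook):

# Illustrative sketch: classify 8x8 digit images into 10 discrete classes
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(Xd_train, yd_train)
print(clf.score(Xd_test, yd_test))  # fraction of test digits labelled correctly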

Regression:

If the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem
would be the prediction of the length of a salmon as a function of its age and weight.
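As an illustration, a minimal regression sketch with made-up salmon measurements (the numbers below are hypothetical, chosen only to show the continuous output):

# Illustrative sketch: predict a continuous value (length) from age and weight
import numpy as np
from sklearn.linear_model import LinearRegression

age_weight = np.array([[1, 1.0], [2, 2.1], [3, 3.4], [4, 4.2]])  # hypothetical data
length_cm = np.array([30.0, 45.0, 58.0, 66.0])                   # hypothetical data

reg = LinearRegression().fit(age_weight, length_cm)
print(reg.predict([[2.5, 2.8]]))  # a continuous estimate, not a class label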

2. Unsupervised learning
Learning in which the training data consists of a set of input vectors x without any corresponding target values.

The goal in such problems may be one of the following:

Clustering:

To discover groups of similar examples within the data.

Density estimation:

To determine the distribution of data within the input space.

Visualization of high-dimensional data:

To project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization (a sketch follows below).
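A minimal sketch of the visualization point, assuming scikit-learn's bundled iris dataset purely for illustration:

# Illustrative sketch: project 4-dimensional iris data down to 2 dimensions
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = load_iris()
X2 = PCA(n_components=2).fit_transform(iris.data)  # 4-D -> 2-D
plt.scatter(X2[:, 0], X2[:, 1], c=iris.target, s=10)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.show()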

CHOOSING THE RIGHT ESTIMATOR


IMPLEMENTATION PART -- DATASET USED: Bill Authentication

PREPROCESSING PART

1. Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

2. Importing Dataset

DF= pd.read_csv("/content/bill_authentication.csv")

DF.head()

   Variance  Skewness  Curtosis  Entropy         V1         V2  Class
0   3.62160    8.6661   -2.8073 -0.44699   2.072345  -3.241693      0
1   4.54590    8.1674   -2.4586 -1.46210  17.936710  15.784810      0
2   3.86600   -2.6383    1.9242  0.10645   1.083576   7.319176      0
3   3.45660    9.5228   -4.0112 -3.59440  11.120670  14.406780      0
4   0.32924   -4.4552    4.5718 -0.98880  23.711550   2.557729      0

3. Target & Predictor Variables

# Drop the target and the two extra columns V1 and V2 to form the feature matrix
A = DF.drop('Class', axis=1)
B = A.drop('V1', axis=1)

X = B.drop('V2', axis=1)   # features: Variance, Skewness, Curtosis, Entropy
Y = DF['Class']            # target: banknote class label (0/1)

4. Splitting in Train & Test

from sklearn.model_selection import train_test_split

# 80/20 split; no random_state is set, so the split differs between runs
X_train, x_test, Y_train, y_test = train_test_split(X, Y, test_size=0.20)

CLASSIFIER PART

1. Support Vector Machine (SVM) Classifier

from sklearn.svm import SVC

SVM_Classifier = SVC(kernel='linear')   # linear-kernel support vector classifier
SVM_Classifier.fit(X_train, Y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,


decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)

y_pred = SVM_Classifier.predict(x_test)

from sklearn.metrics import classification_report, confusion_matrix


print('Support Vector Machine : \n',classification_report(y_test,y_pred))

Support Vector Machine :
               precision    recall  f1-score   support

           0       0.97      1.00      0.99       149
           1       1.00      0.97      0.98       126

    accuracy                           0.99       275
   macro avg       0.99      0.98      0.99       275
weighted avg       0.99      0.99      0.99       275
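confusion_matrix is imported above but never printed; a one-line sketch would show the raw counts behind these precision/recall figures:

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes; the off-diagonal
# entries are the few misclassified notes behind the 0.99 accuracy above.
print(confusion_matrix(y_test, y_pred))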

2. K-Nearest Neighbour (KNN) Classifier

from sklearn.neighbors import KNeighborsClassifier

KNN_Classifier = KNeighborsClassifier(n_neighbors=1)   # 1-NN: label of the single closest training point
KNN_Classifier.fit(X_train, Y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',


metric_params=None, n_jobs=None, n_neighbors=1, p=2,
weights='uniform')

y_pred = KNN_Classifier.predict(x_test)

from sklearn.metrics import classification_report, confusion_matrix


print('K-Nearest Neighbour : \n',classification_report(y_test,y_pred))

K-Nearest Neighbour :
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       149
           1       1.00      1.00      1.00       126

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275
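n_neighbors=1 memorizes the training set and can be sensitive to noise; a quick sketch (an addition, not in the original notebook) of checking a few k values with 5-fold cross-validation on the training split:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Compare mean cross-validated accuracy for a few neighbourhood sizes
for k in (1, 3, 5, 7):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, Y_train, cv=5)
    print(k, scores.mean())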

3. K-Means Clustering

# Scatter the two extra columns V1 and V2 (used here only for illustration)
f1 = DF['V1'].values
f2 = DF['V2'].values
z = np.array(list(zip(f1, f2)))
plt.scatter(f1, f2, c='black', s=7)

[Output: scatter plot of V1 vs V2]

def dist(a, b, ax=1):
    # Euclidean distance along the given axis
    return np.linalg.norm(a - b, axis=ax)

k = 2  # two clusters, matching the two banknote classes

# Random initial centroids inside the data range
C_x = np.random.randint(0, np.max(z)-20, size=k)
C_y = np.random.randint(0, np.max(z)-20, size=k)
C = np.array(list(zip(C_x, C_y)), dtype=np.float32)

print("Initial Centroids")
print(C)

Initial Centroids
[[11. 34.]
 [36. 47.]]

plt.scatter(f1, f2, c='#050505', s=7)


plt.scatter(C_x, C_y, marker='*', s=600, c='g')

[Output: scatter plot with the two initial centroids marked as green stars]
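The dist() helper and the initial centroids above are plotted but never iterated. A minimal sketch of the missing Lloyd's-algorithm loop, assuming this is what the manual K-Means was building towards:

# Assumption: complete the manual K-Means using dist() and the centroids C
C_old = np.zeros(C.shape)
clusters = np.zeros(len(z))
while dist(C, C_old, None) != 0:           # stop when centroids stop moving
    for i in range(len(z)):                # assignment step: nearest centroid
        clusters[i] = np.argmin(dist(z[i], C))
    C_old = C.copy()
    for j in range(k):                     # update step: mean of each cluster
        points = z[clusters == j]
        if len(points):
            C[j] = points.mean(axis=0)
print("Final Centroids")
print(C)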

from sklearn.cluster import KMeans   # this import was missing in the original cell

kmeans = KMeans(n_clusters=2)
kmeans.fit(X)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,


n_clusters=2, n_init=10, n_jobs=None, precompute_distances='auto',
random_state=None, tol=0.0001, verbose=0)

from sklearn.metrics import classification_report

# Note: y_pred here still holds the KNN predictions from the cell above, which
# is why the report below matches the KNN report exactly; a label-aligned
# K-Means evaluation is sketched after the output.
print('K-Means : \n',classification_report(y_test,y_pred))


K-Means :
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       149
           1       1.00      1.00      1.00       126

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275
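KMeans assigns arbitrary cluster ids (cluster 0 need not correspond to class 0), so a fair evaluation first maps each cluster to the majority true class among its members. A hedged sketch, assuming the kmeans object fitted above:

import numpy as np
from sklearn.metrics import classification_report

cluster_ids = kmeans.predict(x_test)
# Map each cluster id to the majority class of the test points it contains
mapping = {c: np.bincount(y_test[cluster_ids == c]).argmax()
           for c in np.unique(cluster_ids)}
km_pred = np.array([mapping[c] for c in cluster_ids])
print('K-Means (label-aligned) : \n', classification_report(y_test, km_pred))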

4. Decision Tree Classifier

from sklearn import tree

# Note: the tree is fitted on the full dataset (X, Y), which includes the test
# rows; see the note after the classification report below.
Decision_Tree_Classifier = tree.DecisionTreeClassifier()
Decision_Tree_Classifier = Decision_Tree_Classifier.fit(X, Y)

tree.plot_tree(Decision_Tree_Classifier)

[Output: decision tree plot as a list of matplotlib Text nodes; the root splits on X[0] <= 0.32 (gini = 0.494, samples = 1372, value = [762, 610]) and the tree grows until every leaf is pure (gini = 0.0).]
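A more readable alternative to the node dump above is sklearn's plain-text rule export:

from sklearn.tree import export_text

# Print the fitted tree as indented if/else rules with the real feature names
print(export_text(Decision_Tree_Classifier, feature_names=list(X.columns)))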

tree.plot_tree(Decision_Tree_Classifier)
plt.savefig('DTImage')

y_pred = Decision_Tree_Classifier.predict(x_test)

from sklearn.metrics import classification_report, confusion_matrix


print('Decision Tree : \n',classification_report(y_test,y_pred))

Decision Tree :
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       149
           1       1.00      1.00      1.00       126

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275
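Because the tree was fitted on the full dataset, the test rows were seen during training and this perfect report overstates generalization. A sketch of the leakage-free evaluation, fitting on the training split only:

from sklearn import tree
from sklearn.metrics import classification_report

# Fit on X_train/Y_train only so that x_test is genuinely unseen data
dt = tree.DecisionTreeClassifier()
dt.fit(X_train, Y_train)
print('Decision Tree (train-only fit) : \n',
      classification_report(y_test, dt.predict(x_test)))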


COMPARISON ANALYSIS

On the held-out test split of 275 notes, the SVM reached 0.99 accuracy, while KNN, the K-Means report, and the Decision Tree all showed 1.00. Two of the perfect scores need qualification: the K-Means report was printed from a stale y_pred (the KNN predictions), and the Decision Tree was fitted on the full dataset, including the test rows. What the results do support is that the four banknote features separate the two classes almost perfectly, so even a simple 1-NN classifier performs extremely well, and the linear SVM misclassifies only a handful of notes.

