KNN-SVM Assignment

The assignment involves implementing KNN and SVM classifiers using two datasets: car-evaluation and Breast Cancer Wisconsin. It includes tasks such as data preparation, model training, feature selection, and performance analysis, focusing on accuracy, training time, and the impact of different parameters. The final conclusions highlight SVM's strengths in high-dimensional spaces and its weaknesses with noisy or overlapping data.

Assignment: Classification (KNN - SVM)

Dataset:
During this assignment you will use:
- The Car Evaluation dataset for Task 1. Training and test splits are provided in CSV file format.
- The Breast Cancer Wisconsin (Diagnostic) dataset for Task 2. Training and test splits are provided in CSV file format.
You will find them on the drive.

Task 1:
Use scikit-learn or other Python packages to implement a KNN classifier (KNeighborsClassifier). In this question, we use the car-evaluation dataset.
(a) This dataset contains 1728 samples in total. First, shuffle the dataset and split it into a training set of 1000 samples, a validation set of 300 samples, and a testing set of 428 samples. Use Python to implement this data preparation step.
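
A minimal sketch of this step with pandas is shown below; the file name car_evaluation.csv and the fixed random seed are assumptions, not part of the assignment.

import pandas as pd
from sklearn.utils import shuffle

df = pd.read_csv("car_evaluation.csv")      # assumed file name
df = shuffle(df, random_state=42).reset_index(drop=True)

train_df = df.iloc[:1000]    # 1000 training samples
val_df = df.iloc[1000:1300]  # 300 validation samples
test_df = df.iloc[1300:]     # remaining 428 testing samples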
(b) Some attributes are represented by string values, so if we choose a distance metric like Euclidean distance, we need to transform the string values into numbers. Use Python to implement this preprocessing step.
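
One way to do this is with scikit-learn's OrdinalEncoder, sketched below; the column names follow the usual car-evaluation schema and are assumptions about the provided CSV headers.

from sklearn.preprocessing import OrdinalEncoder

# Assumed column names for the car-evaluation data.
feature_cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]

encoder = OrdinalEncoder()
encoder.fit(train_df[feature_cols])  # fit on the training split only

X_train = encoder.transform(train_df[feature_cols])
X_val = encoder.transform(val_df[feature_cols])
X_test = encoder.transform(test_df[feature_cols])
y_train, y_val, y_test = train_df["class"], val_df["class"], test_df["class"]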
(c) Use different numbers of training samples to show the impact of training set size. Use 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% of the training set for 10 separate KNN classifiers and show their performance (accuracy score) on the validation set and testing set. You can use a fixed value of K=2 (number of nearest neighbors) in this question. Note that the X axis is the portion of the training set and the Y axis should be the accuracy score. There should be two lines in total: one for the validation set and another for the testing set.
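
A minimal sketch of this experiment, reusing X_train, y_train, and the other splits from the sketches above:

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

fractions = [0.1 * i for i in range(1, 11)]
val_acc, test_acc = [], []
for frac in fractions:
    n = int(frac * len(X_train))
    knn = KNeighborsClassifier(n_neighbors=2)
    knn.fit(X_train[:n], y_train.iloc[:n])
    val_acc.append(knn.score(X_val, y_val))
    test_acc.append(knn.score(X_test, y_test))

plt.plot(fractions, val_acc, marker="o", label="validation set")
plt.plot(fractions, test_acc, marker="o", label="testing set")
plt.xlabel("Portion of training set")
plt.ylabel("Accuracy score")
plt.legend()
plt.show()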
(d) Using 100% of the training samples, try to find the best K value, and show the accuracy curve on the validation set as K varies from 1 to 10.
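
A sketch of the K sweep on the full training set, continuing from the sketches above:

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

k_values = list(range(1, 11))
k_acc = [KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
         .score(X_val, y_val) for k in k_values]

plt.plot(k_values, k_acc, marker="o")
plt.xlabel("K")
plt.ylabel("Validation accuracy")
plt.show()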
(e) Analyze the training time when using different numbers of training samples. Consider the following 4 cases:
• 10% of the whole training set and K = 2
• 100% of the whole training set and K = 2
• 10% of the whole training set and K = 10
• 100% of the whole training set and K = 10
Plot a bar chart to show the prediction time on the testing set.
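
A sketch of the timing experiment using time.perf_counter(); note that for KNN most of the cost appears at prediction time, which is what the bar chart measures here.

import time
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

cases = [(0.1, 2), (1.0, 2), (0.1, 10), (1.0, 10)]
labels, pred_times = [], []
for frac, k in cases:
    n = int(frac * len(X_train))
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train[:n], y_train.iloc[:n])
    start = time.perf_counter()
    knn.predict(X_test)
    pred_times.append(time.perf_counter() - start)
    labels.append(f"{int(frac * 100)}%, K={k}")

plt.bar(labels, pred_times)
plt.ylabel("Prediction time on the testing set (s)")
plt.show()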
(f) Provide your conclusions from the experiments in questions (c), (d) and (e).

Task 2:
1.
• Load the dataset and convert categorical class labels under the target column to
numerical values by using the LabelEncoder.
• Choose two features from the dataset to apply the SVM and Logistic Regression
algorithms for classification. Plot the data, showing the classes separately. Explain
how and why you chose these two features.
• Classify testing data by using SVM and Logistic Regression classifiers. Provide
accuracies.
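
A minimal sketch of this part is below; the file name breast_cancer.csv and the column names (diagnosis, radius_mean, concave points_mean) are assumptions about the provided CSV, and the train/test split shown is illustrative since the assignment provides its own splits.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("breast_cancer.csv")              # assumed file name
y = LabelEncoder().fit_transform(df["diagnosis"])  # B -> 0, M -> 1

# Two illustrative features; justify your own choice (e.g., clear class
# separation in the scatter plot).
X = df[["radius_mean", "concave points_mean"]].to_numpy()

for label, name in [(0, "benign"), (1, "malignant")]:
    mask = y == label
    plt.scatter(X[mask, 0], X[mask, 1], label=name, alpha=0.6)
plt.xlabel("radius_mean")
plt.ylabel("concave points_mean")
plt.legend()
plt.show()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
print("SVM accuracy:", SVC().fit(X_tr, y_tr).score(X_te, y_te))
print("LogReg accuracy:",
      LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te))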
2.
➢ Shuffling and Splitting:
o The dataset contains 569 samples.
o Shuffle the entire dataset and split it into:
▪ Training set: 400 samples
▪ Validation set: 100 samples
▪ Testing set: 69 samples
➢ Preprocessing:
o Standardize the numerical features (use StandardScaler from Scikit-learn).
➢ Feature Selection:
o Perform feature selection using correlation or other methods to identify the most important features.
o Visualize the dataset in a 2D plot with the chosen features, showing classes separately.
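
A sketch of the shuffle/split, standardization, and correlation-based feature selection, continuing from the part-1 sketch (reusing df and y); it assumes the CSV holds the diagnosis column plus numeric feature columns only.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
idx = rng.permutation(len(df))                       # 569 samples
train_idx, val_idx, test_idx = idx[:400], idx[400:500], idx[500:]

feature_cols = df.columns.drop("diagnosis")          # assumed layout
scaler = StandardScaler().fit(df.iloc[train_idx][feature_cols])
X_train = scaler.transform(df.iloc[train_idx][feature_cols])
X_val = scaler.transform(df.iloc[val_idx][feature_cols])
X_test = scaler.transform(df.iloc[test_idx][feature_cols])
y_train, y_val, y_test = y[train_idx], y[val_idx], y[test_idx]

# Rank features by absolute correlation with the encoded target.
corr = df[feature_cols].corrwith(pd.Series(y, index=df.index)).abs()
top2 = corr.sort_values(ascending=False).index[:2]
print("Two most correlated features:", list(top2))

for label, name in [(0, "benign"), (1, "malignant")]:
    mask = y == label
    plt.scatter(df.loc[mask, top2[0]], df.loc[mask, top2[1]],
                label=name, alpha=0.6)
plt.xlabel(top2[0])
plt.ylabel(top2[1])
plt.legend()
plt.show()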

3.

➢ Linear Kernel and Decision Boundary:

• Train an SVM classifier with a linear kernel and visualize the decision boundaries.
• Explain the results: Is the dataset linearly separable? How does SVM handle this?

➢ RBF Kernel and Decision Boundary:

• Train an SVM classifier with an RBF kernel and visualize the decision boundaries.
• Compare the results with the linear kernel. Discuss how the RBF kernel uses the kernel trick
to map data into a higher-dimensional space.

➢ Polynomial Kernel and Decision Boundary:

• Train an SVM classifier with a polynomial (poly) kernel and visualize the decision boundaries.
• Compare the results with the linear kernel. Discuss how the polynomial kernel handles this.
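
For the three kernel questions above, a minimal sketch of training and visualizing the decision boundaries on the two selected, standardized features; it assumes X_train/y_train from the earlier sketch (restricted here to two columns) and scikit-learn >= 1.1 for DecisionBoundaryDisplay.

import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay

X2 = X_train[:, :2]  # assumes the two chosen feature columns come first

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, kernel in zip(axes, ["linear", "rbf", "poly"]):
    clf = SVC(kernel=kernel).fit(X2, y_train)
    DecisionBoundaryDisplay.from_estimator(clf, X2, ax=ax, alpha=0.4)
    ax.scatter(X2[:, 0], X2[:, 1], c=y_train, edgecolor="k", s=15)
    ax.set_title(f"kernel={kernel}")
plt.show()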

➢ Compare the kernels' accuracy scores on the validation and testing sets using default
hyperparameters.
➢ Use grid search to tune the following hyperparameters:

• C: Test values [0.01, 0.1, 1, 10, 100]. Plot the accuracy score on the validation set as C varies.
• gamma: Test values [0.001, 0.01, 0.1, 1]. Plot the accuracy score on the validation set as gamma varies.
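
Because the training/validation split is fixed here, a plain loop over the grid (rather than GridSearchCV with internal cross-validation) is one reasonable sketch; it reuses X_train, y_train, X_val, y_val from above.

import matplotlib.pyplot as plt
from sklearn.svm import SVC

C_values = [0.01, 0.1, 1, 10, 100]
acc_C = [SVC(kernel="rbf", C=C).fit(X_train, y_train).score(X_val, y_val)
         for C in C_values]

gamma_values = [0.001, 0.01, 0.1, 1]
acc_gamma = [SVC(kernel="rbf", gamma=g).fit(X_train, y_train).score(X_val, y_val)
             for g in gamma_values]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.semilogx(C_values, acc_C, marker="o")
ax1.set_xlabel("C")
ax2.semilogx(gamma_values, acc_gamma, marker="o")
ax2.set_xlabel("gamma")
for ax in (ax1, ax2):
    ax.set_ylabel("Validation accuracy")
plt.show()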

4. Performance Analysis:

➢ Training Time and Prediction Time:

• Measure and analyze the training and prediction times for SVM under the
following scenarios:
o Case 1: 10% of the training set and kernel=linear, C=1.
o Case 2: 100% of the training set and kernel=linear, C=1.
o Case 3: 10% of the training set and kernel=rbf, C=1, gamma=0.01.
o Case 4: 100% of the training set and kernel=rbf, C=1, gamma=0.01.
• Visualization: Plot a bar chart to show the training time and prediction time
for each of these scenarios.
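
A sketch of the four scenarios using time.perf_counter(), reusing the Task 2 splits from above:

import time
import matplotlib.pyplot as plt
from sklearn.svm import SVC

scenarios = [
    ("10% linear", 0.1, dict(kernel="linear", C=1)),
    ("100% linear", 1.0, dict(kernel="linear", C=1)),
    ("10% rbf", 0.1, dict(kernel="rbf", C=1, gamma=0.01)),
    ("100% rbf", 1.0, dict(kernel="rbf", C=1, gamma=0.01)),
]
labels, fit_times, pred_times = [], [], []
for label, frac, params in scenarios:
    n = int(frac * len(X_train))
    clf = SVC(**params)
    t0 = time.perf_counter()
    clf.fit(X_train[:n], y_train[:n])
    t1 = time.perf_counter()
    clf.predict(X_test)
    t2 = time.perf_counter()
    labels.append(label)
    fit_times.append(t1 - t0)
    pred_times.append(t2 - t1)

x = range(len(labels))
plt.bar([i - 0.2 for i in x], fit_times, width=0.4, label="training time")
plt.bar([i + 0.2 for i in x], pred_times, width=0.4, label="prediction time")
plt.xticks(list(x), labels)
plt.ylabel("Time (s)")
plt.legend()
plt.show()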

5. Exploring the Weaknesses of SVM

➢ Overlapping Classes:

• Manipulate the feature values (e.g., reduce the separability between the two
diagnosis classes) to create overlapping classes.
• Train the classifier again and discuss the impact on accuracy. How does the
performance drop when classes overlap?
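
One simple way to reduce separability is to shrink every sample toward the overall training mean, as sketched below; the shrink factor 0.3 is an arbitrary illustrative choice, not something the assignment prescribes.

from sklearn.svm import SVC

mu = X_train.mean(axis=0)

def shrink(X, factor=0.3):
    # Pull samples toward the training mean to compress class separation.
    return mu + factor * (X - mu)

clf = SVC(kernel="rbf").fit(shrink(X_train), y_train)
print("Validation accuracy with overlap:", clf.score(shrink(X_val), y_val))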

➢ Noisy Data:
• Add Gaussian noise to 10% of the samples in the training set.
• Retrain the SVM classifier with the RBF kernel and tuned parameters.
• Report accuracy on the validation and testing sets, and compare with the
results from the clean dataset.
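
A sketch of the noise experiment; best_C and best_gamma are placeholders standing in for the values found in the grid search above.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_noisy = X_train.copy()
# Perturb a random 10% of the training samples with Gaussian noise.
noisy_idx = rng.choice(len(X_noisy), size=len(X_noisy) // 10, replace=False)
X_noisy[noisy_idx] += rng.normal(0.0, 1.0, size=X_noisy[noisy_idx].shape)

clf = SVC(kernel="rbf", C=best_C, gamma=best_gamma)  # tuned values (placeholders)
clf.fit(X_noisy, y_train)
print("Validation accuracy:", clf.score(X_val, y_val))
print("Testing accuracy:", clf.score(X_test, y_test))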

6. Conclusions:
After running your code, you can summarize:

• SVM's strengths: Good for high-dimensional spaces, especially when classes are
separable.
• SVM's weaknesses: Struggles with noisy or overlapping data, leading to
decreased accuracy.
