ITDSIU21030 Nguyễn Duy Phúc
Introduction to Data Mining
Lab 5: More Classifiers
5.1. Classification boundaries
In the fifth class, we look at some machine learning methods used to classify datasets in Weka (see the lecture of class 4 by Ian H. Witten [1]). We are going to learn about linear regression, classification by regression, and support vector machines.
In this section, we start by looking at the classification boundaries produced by different machine learning methods, using Weka's Boundary Visualizer and 2-dimensional datasets. Follow the instructions in [1] to do some experiments, and then fill in the following table with the classifier models.
Dataset: iris.2D.arff

Rules → OneR

=== Classifier model (full training set) ===

petalwidth:
    < 0.8   -> Iris-setosa
    < 1.75  -> Iris-versicolor
    >= 1.75 -> Iris-virginica
(144/150 instances correct)

Lazy → IBk (K = 5)

=== Classifier model (full training set) ===

IB1 instance-based classifier using 5 nearest neighbor(s) for classification
Time taken to build model: 0 seconds

Lazy → IBk (K = 20)

=== Classifier model (full training set) ===

IB1 instance-based classifier using 20 nearest neighbor(s) for classification
Time taken to build model: 0 seconds
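The IBk boundaries above come from k-nearest-neighbour voting. As a rough illustration of the idea (a minimal sketch, not Weka's implementation; the toy points stand in for petallength/petalwidth pairs and are invented for the example):

```python
from collections import Counter
import math

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest
    neighbours (Euclidean distance), the core idea behind IBk."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data standing in for (petallength, petalwidth) pairs
train = [((1.4, 0.2), "Iris-setosa"), ((1.5, 0.3), "Iris-setosa"),
         ((4.5, 1.5), "Iris-versicolor"), ((4.2, 1.3), "Iris-versicolor"),
         ((5.8, 2.2), "Iris-virginica"), ((6.1, 2.4), "Iris-virginica")]
print(knn_predict(train, (1.6, 0.25), k=3))  # -> Iris-setosa
```

Larger K (as in the K = 20 column) averages over more neighbours, which smooths the decision boundary.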
[1] http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/
Try other learning methods, e.g. NaiveBayes with supervised discretization, i.e., taking the classes into account when discretizing numeric attributes into ranges. [Refer to Text [2], Chapter 7 for the discretization part.]
Dataset: iris.2D.arff

Bayes → NaiveBayes

=== Classifier model (full training set) ===

Naive Bayes Classifier

                          Class
Attribute     Iris-setosa  Iris-versicolor  Iris-virginica
                   (0.33)           (0.33)          (0.33)
==========================================================
petallength
  mean             1.4694           4.2452          5.5516
  std. dev.        0.1782           0.4712          0.5529
  weight sum           50               50              50
  precision        0.1405           0.1405          0.1405

petalwidth
  mean             0.2743           1.3097          2.0343
  std. dev.        0.1096           0.1915          0.2646
  weight sum           50               50              50
  precision        0.1143           0.1143          0.1143

Trees → J48 (minNumObj = 5)

=== Classifier model (full training set) ===

J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9: Iris-virginica (6.0/2.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  : 4
Size of the tree  : 7

Trees → J48 (minNumObj = 10)

=== Classifier model (full training set) ===

J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7: Iris-versicolor (54.0/5.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  : 3
Size of the tree  : 5
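Once a numeric attribute has been discretized into ranges, Naive Bayes reduces to counting. A hand-rolled sketch of that counting step for a single attribute (Laplace-smoothed; the two ranges and the class balance below are invented for illustration, not taken from Weka's output):

```python
from collections import defaultdict

def train_nb(rows):
    """rows: (discretized_value, class_label) pairs. Returns a
    predictor that maximizes P(class) * P(value | class), with
    Laplace smoothing on the conditional counts."""
    class_counts = defaultdict(int)
    joint_counts = defaultdict(int)
    values = set()
    for value, cls in rows:
        class_counts[cls] += 1
        joint_counts[(value, cls)] += 1
        values.add(value)
    total = len(rows)

    def predict(value):
        def score(cls):
            prior = class_counts[cls] / total
            likelihood = (joint_counts[(value, cls)] + 1) / \
                         (class_counts[cls] + len(values))
            return prior * likelihood
        return max(class_counts, key=score)

    return predict

# petalwidth cut into two ranges, as supervised discretization might do
rows = [("<=0.6", "Iris-setosa")] * 50 + [(">0.6", "Iris-virginica")] * 50
predict = train_nb(rows)
print(predict("<=0.6"))  # -> Iris-setosa
```

With supervised discretization the cut points are chosen using the class labels, which is why the resulting ranges line up so closely with the J48 split points above.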
5.2. Linear regression
In this section, we are going to deal with numeric classes using a classical statistical method.
Follow the linear regression lecture in [1] to learn how to calculate attribute weights from training data and make predictions. [Refer to Text [2], Chapter 4.6 for the linear regression part.]
Follow the instructions in [1] to examine the model of linear regression on the cpu dataset.
Write down the results in the following table:
Dataset: cpu
Correlation coefficient        0.9012
Mean absolute error           41.0886
Root mean squared error       69.556
Relative absolute error       42.6943 %
Root relative squared error   43.2421 %

Linear Regression Model

class =
      0.0491 * MYCT +
      0.0152 * MMIN +
      0.0056 * MMAX +
      0.6298 * CACH +
      1.4599 * CHMAX +
    -56.075

Time taken to build model: 0.08 seconds
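The weights above are found by least squares. For a single attribute the closed form is simple; a stdlib-only sketch of that one-attribute case (the data points are made up, not from the cpu dataset):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b: the one-attribute
    case of the multi-attribute fit Weka's LinearRegression performs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return w, mean_y - w * mean_x

w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)  # -> 2.0 1.0
```

With several attributes the same idea generalizes to solving the normal equations for the full weight vector, which produces the MYCT...CHMAX coefficients shown above.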
Repeat the experiment with M5P on the cpu dataset, and then write down the results in the following table:

Dataset: cpu
Correlation coefficient        0.9274
Mean absolute error           29.8309
Root mean squared error       60.7112
Relative absolute error       30.9967 %
Root relative squared error   37.7434 %
Classifier model:

M5 pruned model tree
(using smoothed linear models)
CHMIN <= 7.5 : LM1 (165/12.903%)
CHMIN > 7.5 :
| MMAX <= 28000 :
| | MMAX <= 13240 :
| | | CACH <= 81.5 : LM2 (6/18.551%)
| | | CACH > 81.5 : LM3 (4/30.824%)
| | MMAX > 13240 : LM4 (11/24.185%)
| MMAX > 28000 : LM5 (23/48.302%)
LM num: 1
class = -0.0055 * MYCT
+ 0.0013 * MMIN
+ 0.0029 * MMAX
+ 0.8007 * CACH
+ 0.4015 * CHMAX
+ 11.0971
LM num: 2
class = -1.0307 * MYCT
+ 0.0086 * MMIN
+ 0.0031 * MMAX
+ 0.7866 * CACH
- 2.4503 * CHMIN
+ 1.1597 * CHMAX
+ 70.8672
LM num: 3
class = -1.1057 * MYCT
+ 0.0086 * MMIN
+ 0.0031 * MMAX
+ 0.7995 * CACH
- 2.4503 * CHMIN
+ 1.1597 * CHMAX
+ 83.0016
LM num: 4
class = -0.8813 * MYCT
+ 0.0086 * MMIN
+ 0.0031 * MMAX
+ 0.6547 * CACH
- 2.3561 * CHMIN
+ 1.1597 * CHMAX
+ 82.5725
LM num: 5
class = -0.4882 * MYCT
+ 0.0218 * MMIN
+ 0.003 * MMAX
+ 0.3865 * CACH
- 1.3252 * CHMIN
+ 3.3671 * CHMAX
- 51.8474
Number of Rules : 5
Time taken to build model: 0.04 seconds
For comparison, the linear regression model:
=== Classifier model (full training set) ===
Linear Regression Model
class =
0.0491 * MYCT +
0.0152 * MMIN +
0.0056 * MMAX +
0.6298 * CACH +
1.4599 * CHMAX +
-56.075
Is M5P non-linear regression?
- Yes, in effect. M5P ("M5 prime") is a reimplementation and extension of Quinlan's M5 model tree algorithm, designed to handle both linear and non-linear relationships in the data.
M5P combines decision trees with linear regression models. The algorithm first constructs a decision tree in which each internal node tests an input attribute, and each leaf contains a linear regression model. By partitioning the data space and fitting linear models within each partition, M5P can capture complex, non-linear relationships in the data.
In summary, while the models at the leaves are linear, the overall tree structure and its ability to split the data on different conditions make M5P a non-linear regression method.
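The two-stage structure just described, splits routing each instance to a per-leaf linear model, can be sketched directly. The tree below is a hypothetical two-leaf cut-down of the CHMIN split in the output above; the leaf weights are invented, not Weka's LM coefficients:

```python
def m5_predict(instance, node):
    """Walk an M5-style model tree: internal nodes split on an
    attribute; leaves apply their own linear model to the instance."""
    if node[0] == "leaf":
        _, weights, bias = node
        return bias + sum(w * instance[a] for a, w in weights.items())
    _, attr, threshold, left, right = node
    branch = left if instance[attr] <= threshold else right
    return m5_predict(instance, branch)

# Hypothetical tree: one split on CHMIN, a linear model per leaf
tree = ("split", "CHMIN", 7.5,
        ("leaf", {"CACH": 0.8}, 11.0),                  # low-CHMIN machines
        ("leaf", {"CACH": 0.4, "MMAX": 0.02}, -50.0))   # high-CHMIN machines

print(m5_predict({"CHMIN": 2, "CACH": 30}, tree))  # -> 35.0
```

The overall function is piecewise linear, which is why a tree of linear leaves can fit curves a single linear model cannot.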
5.3. Classification by regression
Follow the instructions in [1] to investigate two‐class classification by regression, using the
diabetes dataset.
We are going to convert the nominal class to the numeric class so that the linear regression
model is applicable.
Write down the results in the following table:
Classifier model:

Linear Regression Model

class=tested_positive =
      0.0209 * preg +
      0.0057 * plas +
     -0.0024 * pres +
      0.0131 * mass +
      0.1403 * pedi +
      0.0028 * age +
     -0.8363

Evaluation:

Correlation coefficient        0.5322
Mean absolute error            0.3366
Root mean squared error        0.4036
Relative absolute error       74.0119 %
Root relative squared error   84.6013 %
Total Number of Instances    768
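The conversion step can be sketched end to end: map the two class labels to 0/1, fit a regression, and threshold the numeric prediction at 0.5. A one-attribute stdlib sketch (the glucose-like values below are invented, not the diabetes data):

```python
def classify_by_regression(xs, labels, x_new, positive="tested_positive"):
    """Two-class classification by regression: encode the class as
    0/1, fit y = w*x + b by least squares, threshold the prediction."""
    ys = [1.0 if lab == positive else 0.0 for lab in labels]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return positive if w * x_new + b >= 0.5 else "tested_negative"

# Invented attribute values: low ones negative, high ones positive
xs = [90, 95, 100, 160, 165, 170]
labels = ["tested_negative"] * 3 + ["tested_positive"] * 3
print(classify_by_regression(xs, labels, 98))   # -> tested_negative
print(classify_by_regression(xs, labels, 162))  # -> tested_positive
```

Weka fits the multi-attribute analogue, which is exactly the preg...age model shown above; the evaluation metrics stay regression-style because the internal target is numeric.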
5.4. Support vector machines
Learn about logistic regression in [2], Chapter 4.6.
Follow the lecture of support vector machines (SVMs) in [1], …
Support vector machines (SVMs, also called support vector networks) are supervised learning models with
associated learning algorithms that analyze data and recognize patterns, used for classification and
regression analysis. Given a set of training examples, each marked as belonging to one of two
categories, an SVM training algorithm builds a model that assigns new examples into one category or
the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the
examples as points in space, mapped so that the examples of the separate categories are divided by a
clear gap that is as wide as possible. New examples are then mapped into that same space and
predicted to belong to a category based on which side of the gap they fall on.
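For a linear kernel, the learned model reduces to a vector of attribute weights, and prediction is a signed dot product: which side of the gap an example falls on is the sign of f(x) = <w, x> + b. A sketch with made-up weights (the sign convention, positive score meaning the second class, is an assumption for this example):

```python
def svm_decision(instance, weights, bias):
    """Linear-SVM decision value f(x) = <w, x> + b. The sign of
    f(x) picks the class; its magnitude grows with distance from
    the separating hyperplane."""
    return sum(w * instance.get(attr, 0.0)
               for attr, w in weights.items()) + bias

# Hypothetical weights over already-normalized attributes
weights = {"plas": 4.9, "mass": 3.1, "pres": -0.8}
bias = -5.2
score = svm_decision({"plas": 0.9, "mass": 0.8, "pres": 0.5}, weights, bias)
print("tested_positive" if score > 0 else "tested_negative")
```

This is the form SMO prints in the table below when the kernel is linear: one weight per (normalized) attribute plus a bias, rather than a list of support vectors.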
Follow the instructions in [1] to examine SMO and LibSVM, and fill in the following table:
Dataset SMO’s classifier model and performance LibSVM’s classifier model and
performance
diabetes Kernel used: LibSVM wrapper, original code by
Linear Kernel: K(x,y) = <x,y> Yasser EL-Manzalawy (= WLSVM)
Classifier for classes: tested_negative, Time taken to build model: 0.07
tested_positive seconds
6
ITDSIU21030 Nguyễn Duy Phúc
BinarySMO ==============================
Correctly Classified Instances: 500
Machine linear: showing attribute weights, not 65.1042 %
support vectors. Incorrectly Classified Instances: 268
34.8958 %
1.3614 * (normalized) preg Kappa statistic: 0
+ 4.8764 * (normalized) plas Mean absolute error: 0.349
+ -0.8118 * (normalized) pres Root mean squared error: 0.5907
+ -0.1158 * (normalized) skin Relative absolute error: 76.7774 %
+ -0.1776 * (normalized) insu Root relative squared error: 23.9347 %
+ 3.0745 * (normalized) mass Total Number of Instances: 768
+ 1.4242 * (normalized) pedi
+ 0.2601 * (normalized) age
- 5.1761
Number of kernel evaluations: 19131 (69.279%
cached)
Note: LibSVM is a wrapper class for the libsvm tools (the libsvm classes, typically the jar file, need to be on the classpath to use this classifier); see http://weka.wikispaces.com/LibSVM