Data Mining Term Project

Machine Learning with WEKA

Weka Explorer Tutorial

for Version 3.4.3
Svetlana S. Aksenova
Department of Computer Science
California State University, Sacramento
Fall 2004

Machine Learning Methods for Data Mining
Use techniques from computer science, statistics and probability, and data visualization to search for patterns and relationships in large data sets
Analyze large amounts of data automatically
Produce results that support faster and more accurate predictions and decisions

About WEKA
Developed by the University of Waikato in New Zealand
Open-source software released under the GNU General Public License
A data mining system written in Java
Implements a collection of data mining algorithms
Runs on most computer platforms
Algorithms can be applied to a data set through either the command line or the graphical user interface

Introduction to the Tutorial

Created to help with the learning process
Consists of 8 parts:
Introduction
Launching WEKA
Preprocessing Data
Building Classifiers
Clustering Data
Finding Associations
Attribute Selection
Data Visualization

Launching WEKA
GUI Chooser: the main menu

Preprocessing
Data can be read from:
a local filesystem (in ARFF, CSV, C4.5, or binary formats)
a URL
an SQL database (using JDBC)

File conversion
Preprocessing window
Preprocessing tools - filters

File Conversion

Excel → CSV → ARFF
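The CSV-to-ARFF step can be illustrated with a short Java sketch. It is not WEKA's own converter (in WEKA the usual route is to open the CSV file in the Explorer and save it as ARFF). It assumes a simple file with a header row, no quoted fields, no missing values, and the same number of fields on every row. Columns whose values all parse as numbers are declared `numeric`; every other column is declared nominal, listing the distinct values it contains. The class name `CsvToArff` and the three sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: converts simple CSV text (header row, no quoted fields,
// no missing values) into ARFF text. Not WEKA's converter; an illustration only.
class CsvToArff {

    static String convert(String relation, String csv) {
        String[] lines = csv.strip().split("\\R");
        String[] header = lines[0].split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (!lines[i].isBlank()) {
                rows.add(lines[i].split(","));
            }
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < header.length; col++) {
            // A column is declared numeric only if every value parses as a number.
            boolean numeric = true;
            Set<String> values = new LinkedHashSet<>(); // keeps first-seen order
            for (String[] row : rows) {
                String v = row[col].trim();
                values.add(v);
                try {
                    Double.parseDouble(v);
                } catch (NumberFormatException e) {
                    numeric = false;
                }
            }
            out.append("@attribute ").append(header[col].trim()).append(' ');
            out.append(numeric ? "numeric" : "{" + String.join(",", values) + "}");
            out.append('\n');
        }
        out.append("\n@data\n");
        for (String[] row : rows) {
            out.append(String.join(",", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "outlook,temperature,play\n"
                   + "sunny,85,no\n"
                   + "overcast,83,yes\n"
                   + "rainy,70,yes\n";
        System.out.print(convert("weather", csv));
    }
}
```

Running `main` prints an ARFF header in which `outlook` and `play` are nominal and `temperature` is numeric, followed by the `@data` section.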

Open File (from the local filesystem)

Open File (from a website)

http://gaia.ecs.csus.edu/~aksenovs/weather.arff

Preprocessing Window

Setting Filters
WEKA contains filters for discretization,
normalization, resampling, attribute selection,
transformation and combination of attributes.
Some techniques, such as association rule mining,
can only be performed on categorical data.
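Discretization is what turns numeric attributes into the categorical data such techniques require. The sketch below shows two simple filters in plain Java: equal-width binning, which splits the range of a numeric attribute into k intervals of the same width, and min-max normalization, which rescales values into [0, 1]. It illustrates the ideas only; WEKA's own filters offer more options, and the class name, method names, and sample values here are hypothetical.

```java
import java.util.Arrays;

// Illustrative versions of two preprocessing filters:
// equal-width binning (numeric -> categorical) and min-max normalization.
class SimpleFilters {

    // Maps each value to a label "bin0".."bin{k-1}" using k equal-width intervals
    // spanning [min, max]. The maximum value is placed in the last bin.
    static String[] equalWidthBins(double[] values, int k) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / k;
        String[] labels = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            labels[i] = "bin" + Math.min(bin, k - 1);
        }
        return labels;
    }

    // Rescales values linearly into [0, 1]; a constant column maps to all zeros.
    static double[] minMaxNormalize(double[] values) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] temperature = {85, 80, 83, 70, 68, 65, 64}; // sample values
        System.out.println(Arrays.toString(equalWidthBins(temperature, 3)));
        System.out.println(Arrays.toString(minMaxNormalize(temperature)));
    }
}
```

With three bins over the range 64 to 85 (width 7), the three highest readings fall into `bin2` and the other four into `bin0`.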

Filter Configuration Options

Right-click on the filter

Building Classifiers
Choosing a classifier: J48, WEKA's implementation of C4.5

Setting Test Options

Output the Result

The weather data in weather.arff is used for classification
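Decision tree learners choose the attribute to split on by measuring how much a split reduces class entropy. ID3 uses information gain; C4.5, which J48 implements, divides the gain by the split information to get the gain ratio. The sketch below computes both from class counts. The counts are for the outlook attribute against the play class in the standard 14-instance weather data (sunny: 2 yes, 3 no; overcast: 4 yes, 0 no; rainy: 3 yes, 2 no). This is a plain-Java illustration, not J48's code, and the class and method names are hypothetical.

```java
// Entropy, information gain, and gain ratio computed from class counts.
// ID3 splits on information gain; C4.5 (J48 in WEKA) uses gain ratio.
class SplitCriteria {

    // Entropy in bits of a class distribution given as counts.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // branches[i][j] = number of instances with the i-th attribute value and j-th class.
    static double infoGain(int[][] branches) {
        int numClasses = branches[0].length;
        int[] classTotals = new int[numClasses];
        int total = 0;
        for (int[] branch : branches) {
            for (int j = 0; j < numClasses; j++) {
                classTotals[j] += branch[j];
                total += branch[j];
            }
        }
        // Weighted average entropy of the branches after the split.
        double remainder = 0.0;
        for (int[] branch : branches) {
            int size = 0;
            for (int c : branch) size += c;
            remainder += (double) size / total * entropy(branch);
        }
        return entropy(classTotals) - remainder;
    }

    // Gain ratio = information gain / split information (entropy of branch sizes).
    static double gainRatio(int[][] branches) {
        int[] sizes = new int[branches.length];
        for (int i = 0; i < branches.length; i++) {
            for (int c : branches[i]) sizes[i] += c;
        }
        return infoGain(branches) / entropy(sizes);
    }

    public static void main(String[] args) {
        // outlook vs. play in the weather data, as {yes, no} counts per value
        int[][] outlook = {{2, 3}, {4, 0}, {3, 2}}; // sunny, overcast, rainy
        System.out.printf("gain = %.3f, gain ratio = %.3f%n",
                infoGain(outlook), gainRatio(outlook));
    }
}
```

For these counts the information gain is about 0.247 bits and the gain ratio about 0.156.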

Analyzing Results

Visualizing Results

Tree Visualizer

Error Visualizer

Error Visualizer (contd)

Exercise
Given at the end of the section
Classification Exercise
Use the ID3 algorithm to classify the weather data from the weather.arff file. Perform initial
preprocessing and create a version of the initial data set in which all numeric attributes
are converted to categorical data (ID3 handles only nominal attributes).

Clustering Data
Clustering schemes available in WEKA include k-means, EM, Cobweb, X-means, and FarthestFirst.
The customer data in customers.arff is used for clustering.

Clustering Data (contd)

Choosing a clustering scheme: k-means with 5 clusters
Setting test options
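k-means alternates two steps until no instance changes cluster: assign every instance to its nearest centroid, then move each centroid to the mean of its instances. The sketch below implements that loop for numeric vectors with Euclidean distance. For reproducibility it seeds the centroids with the first k points, which is a simplifying assumption; WEKA's implementation may choose its starting centroids differently. The class name and sample points are hypothetical.

```java
import java.util.Arrays;

// Minimal k-means (Lloyd's algorithm) on numeric vectors with Euclidean distance.
// Centroids are seeded with the first k points (assumes points.length >= k).
class KMeansSketch {

    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dims; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dims];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) {
                        count++;
                        for (int j = 0; j < dims; j++) sum[j] += points[i][j];
                    }
                }
                if (count > 0) { // an empty cluster keeps its old centroid
                    for (int j = 0; j < dims; j++) centroids[c][j] = sum[j] / count;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {9, 9}, {1.5, 2}, {8, 9.5}, {1, 0.5}, {9.5, 8}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
    }
}
```

With k = 2, the three sample points near (1, 1) end up in one cluster and the three near (9, 9) in the other.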

Analyzing results

Visualizing Results

Results of Clustering in ARFF File

Exercise
Given at the end of the section
Clustering Exercise
Apply the k-means algorithm to the bank data from
the bank.arff file. Perform initial
preprocessing and create a version of the
initial data set in which the ID field is
removed and the "children" attribute
is converted to categorical data.

Finding Associations
Apriori
works only with discrete data
identifies statistical dependencies between
groups of attributes
applied to grocery store data from the grocery.arff file,
with minimum confidence 40% and
minimum support 30%
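Apriori first finds every itemset whose support (the fraction of transactions containing it) reaches the minimum, growing itemsets one item at a time and discarding candidates that have an infrequent subset. It then keeps rules A => B whose confidence, support(A and B) / support(A), reaches the minimum. The compact Java sketch below follows that outline with thresholds of 30% support and 40% confidence, and only generates rules with a single item on the right-hand side. The five grocery transactions are made up for illustration and are not the contents of grocery.arff; WEKA's Apriori also differs in details, such as lowering the support threshold step by step until enough rules are found.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Compact Apriori sketch: level-wise search for frequent itemsets, then rules
// with a single-item consequent. Transactions in main are hypothetical.
class AprioriSketch {

    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // Returns every itemset whose support is at least minSupport.
    static List<Set<String>> frequentItemsets(List<Set<String>> transactions, double minSupport) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> items = new TreeSet<>();
        transactions.forEach(items::addAll);
        List<Set<String>> level = new ArrayList<>();
        for (String item : items) { // level 1: frequent single items
            Set<String> s = new TreeSet<>(List.of(item));
            if (support(transactions, s) >= minSupport) level.add(s);
        }
        while (!level.isEmpty()) {
            result.addAll(level);
            // Join step: union pairs of frequent k-itemsets into (k+1)-itemsets.
            Set<Set<String>> candidates = new LinkedHashSet<>();
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    Set<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1) candidates.add(union);
                }
            }
            // Prune step: every k-subset must be frequent, and so must the candidate.
            List<Set<String>> next = new ArrayList<>();
            for (Set<String> cand : candidates) {
                boolean allSubsetsFrequent = true;
                for (String item : cand) {
                    Set<String> subset = new TreeSet<>(cand);
                    subset.remove(item);
                    if (!level.contains(subset)) { allSubsetsFrequent = false; break; }
                }
                if (allSubsetsFrequent && support(transactions, cand) >= minSupport) next.add(cand);
            }
            level = next;
        }
        return result;
    }

    // Rules "antecedent => consequent" whose confidence is at least minConfidence.
    static List<String> rules(List<Set<String>> transactions, double minSupport, double minConfidence) {
        List<String> out = new ArrayList<>();
        for (Set<String> itemset : frequentItemsets(transactions, minSupport)) {
            if (itemset.size() < 2) continue;
            for (String consequent : itemset) {
                Set<String> antecedent = new TreeSet<>(itemset);
                antecedent.remove(consequent);
                double confidence = support(transactions, itemset) / support(transactions, antecedent);
                if (confidence >= minConfidence) out.add(antecedent + " => " + consequent);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"),
                Set.of("beer"));
        System.out.println(rules(transactions, 0.30, 0.40));
    }
}
```

For the sample data, beer falls below 30% support, {butter, milk} is infrequent, and the four rules built from {bread, butter} and {bread, milk} all meet the 40% confidence threshold.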

Setting test options

Analyzing Results

Exercise
Given at the end of the section
Association Rules Exercise
Use the Apriori algorithm to generate association
rules for the Iris data from the iris.arff file.
Perform initial preprocessing and create a
version of the initial data set in which the
numeric attributes are converted to
categorical data.

Attribute Selection
searches through possible combinations (subsets) of attributes
finds which subset of attributes works best for prediction
consists of two parts:
a search method: best-first, forward selection, random, exhaustive, genetic algorithm, or ranking
an evaluation method: correlation-based, wrapper, information gain, or chi-squared
applied to the weather data from the weather.arff file
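Pairing an information-gain evaluator with a ranking search amounts to scoring each attribute on its own and sorting. The sketch below does this for the all-nominal version of the weather data (temperature as hot/mild/cool, humidity as high/normal), with the class in the last column. It is a plain-Java illustration, not WEKA's attribute evaluator; the class and method names are hypothetical.

```java
import java.util.*;

// Ranks nominal attributes by information gain with respect to the class
// (last column), highest first.
class InfoGainRanking {

    static final String[] NAMES = {"outlook", "temperature", "humidity", "windy", "play"};

    // All-nominal weather data: outlook, temperature, humidity, windy, play.
    static final String[] WEATHER = {
        "sunny,hot,high,FALSE,no",       "sunny,hot,high,TRUE,no",
        "overcast,hot,high,FALSE,yes",   "rainy,mild,high,FALSE,yes",
        "rainy,cool,normal,FALSE,yes",   "rainy,cool,normal,TRUE,no",
        "overcast,cool,normal,TRUE,yes", "sunny,mild,high,FALSE,no",
        "sunny,cool,normal,FALSE,yes",   "rainy,mild,normal,FALSE,yes",
        "sunny,mild,normal,TRUE,yes",    "overcast,mild,high,TRUE,yes",
        "overcast,hot,normal,FALSE,yes", "rainy,mild,high,TRUE,no"
    };

    static List<String[]> parse(String[] lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) rows.add(line.split(","));
        return rows;
    }

    // Entropy in bits of a class distribution given as counts.
    static double entropy(Collection<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Class-label counts (last column) among the given rows.
    static Map<String, Integer> classCounts(List<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) counts.merge(r[r.length - 1], 1, Integer::sum);
        return counts;
    }

    // Class entropy minus the weighted entropy after splitting on attribute `attr`.
    static double infoGain(List<String[]> rows, int attr) {
        Map<String, List<String[]>> partitions = new HashMap<>();
        for (String[] r : rows) {
            partitions.computeIfAbsent(r[attr], v -> new ArrayList<>()).add(r);
        }
        double remainder = 0.0;
        for (List<String[]> part : partitions.values()) {
            remainder += (double) part.size() / rows.size() * entropy(classCounts(part).values());
        }
        return entropy(classCounts(rows).values()) - remainder;
    }

    // Attribute names (excluding the class) sorted by decreasing information gain.
    static List<String> rank(String[] names, List<String[]> rows) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < names.length - 1; i++) indices.add(i);
        indices.sort(Comparator.comparingDouble((Integer i) -> infoGain(rows, i)).reversed());
        List<String> ranked = new ArrayList<>();
        for (int i : indices) ranked.add(names[i]);
        return ranked;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(WEATHER);
        for (String name : rank(NAMES, rows)) {
            int attr = Arrays.asList(NAMES).indexOf(name);
            System.out.printf("%-12s %.3f%n", name, infoGain(rows, attr));
        }
    }
}
```

On this data the ranking is outlook, humidity, windy, temperature, with information gains of roughly 0.247, 0.152, 0.048, and 0.029 bits.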

Attribute Selection (contd)

Data Visualization
visualizes a 2-D plot of the current working relation
helps determine how difficult the learning problem is

Data Visualization (contd)

Selecting Instances
A group of points on the graph can be selected in
four ways:
1. Select Instance
2. Rectangle
3. Polygon
4. Polyline
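Rectangle and polygon selection both come down to a containment test for each plotted point. A common way to test a polygon is the even-odd (ray casting) rule: count how many polygon edges a horizontal ray from the point crosses; the point is inside if the count is odd. The Java sketch below shows that rule and the simpler rectangle case. Whether WEKA's visualizer uses exactly this test is an assumption, and the names are hypothetical.

```java
// Even-odd (ray casting) test: is the point (x, y) inside the polygon whose
// vertices are (xs[i], ys[i]) in order? Points exactly on an edge may go either way.
class PolygonSelection {

    static boolean contains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Does edge (j -> i) straddle the horizontal line through y, crossing
            // it to the right of x? Straddling guarantees ys[j] != ys[i].
            boolean straddles = (ys[i] > y) != (ys[j] > y);
            if (straddles && x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    // Rectangle selection is the axis-aligned special case.
    static boolean inRectangle(double x1, double y1, double x2, double y2, double x, double y) {
        return x >= Math.min(x1, x2) && x <= Math.max(x1, x2)
            && y >= Math.min(y1, y2) && y <= Math.max(y1, y2);
    }

    public static void main(String[] args) {
        double[] xs = {0, 4, 0}; // a right triangle with vertices (0,0), (4,0), (0,4)
        double[] ys = {0, 0, 4};
        System.out.println(contains(xs, ys, 1, 1));
        System.out.println(contains(xs, ys, 3, 3));
        System.out.println(inRectangle(0, 0, 4, 4, 3, 3));
    }
}
```

For the sample triangle, (1, 1) is inside and (3, 3) is outside, while (3, 3) does fall inside the 4 x 4 rectangle.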

Select Instance

Rectangle

Polygon

Polyline

Why Should We Use WEKA?

A machine learning problem can be solved with minimal programming
WEKA includes
reading of data,
filtering,
and result evaluation

Performance
Has not been evaluated in this project
Can WEKA process large (gigabyte-sized) ARFF files?

According to an answer found on the wekalist mailing list, it can
with schemes that are either incrementally trainable or can be
made to be.
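An incrementally trainable scheme updates its model one instance at a time, so a large ARFF file can be streamed instead of being loaded into memory at once. Naive Bayes over nominal attributes is a standard example, because its model is just a table of counts. The sketch below is a hypothetical illustration of that idea, not WEKA's implementation. It uses add-one (Laplace) smoothing and assumes attribute values never contain the '|' character used to build count keys; the training instances in `main` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Naive Bayes for nominal attributes, trained one instance at a time.
// Memory grows with the number of distinct values, not with the number of instances.
class IncrementalNaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // key: label + "|" + attribute index + "|" + value (assumes no '|' in values)
    private final Map<String, Integer> valueCounts = new HashMap<>();
    private final List<Set<String>> seenValues = new ArrayList<>();
    private int total = 0;

    // Updates the model with one instance: its attribute values and class label.
    void update(String[] values, String label) {
        total++;
        classCounts.merge(label, 1, Integer::sum);
        for (int a = 0; a < values.length; a++) {
            if (seenValues.size() <= a) seenValues.add(new HashSet<>());
            seenValues.get(a).add(values[a]);
            valueCounts.merge(label + "|" + a + "|" + values[a], 1, Integer::sum);
        }
    }

    // Most probable class with add-one smoothing; sums logs to avoid underflow.
    String predict(String[] values) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String label = e.getKey();
            int n = e.getValue();
            double score = Math.log((double) n / total);
            for (int a = 0; a < values.length; a++) {
                int count = valueCounts.getOrDefault(label + "|" + a + "|" + values[a], 0);
                int distinct = seenValues.get(a).size();
                score += Math.log((count + 1.0) / (n + distinct));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        IncrementalNaiveBayes nb = new IncrementalNaiveBayes();
        // Each call stands in for reading one instance from a large file.
        nb.update(new String[] {"sunny", "high"}, "no");
        nb.update(new String[] {"sunny", "high"}, "no");
        nb.update(new String[] {"overcast", "high"}, "yes");
        nb.update(new String[] {"rainy", "normal"}, "yes");
        nb.update(new String[] {"overcast", "normal"}, "yes");
        System.out.println(nb.predict(new String[] {"sunny", "high"}));
        System.out.println(nb.predict(new String[] {"overcast", "normal"}));
    }
}
```

With these five training instances, a sunny, high-humidity day is classified as no and an overcast, normal-humidity day as yes.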

Future Work
Not done due to time constraints:
Simple CLI provides a simple command-line interface that allows direct execution of
WEKA commands.
KnowledgeFlow is a JavaBeans-based
interface for setting up and running machine
learning experiments.
