Introduction to Weka
Overview
What is Weka?
Where to find Weka?
Command Line Vs GUI
Datasets in Weka
ARFF Files
Classifiers in Weka
Filters
What is Weka?
Weka is a collection of machine learning
algorithms for data mining tasks. The
algorithms can either be applied directly to a
dataset or called from your own Java code.
Weka contains tools for data pre-processing,
classification, regression, clustering,
association rules, and visualization. It is also
well-suited for developing new machine
learning schemes.
Where to find Weka
Weka website (Latest version 3.6):
– http://www.cs.waikato.ac.nz/ml/weka/
Weka Manual:
− http://transact.dl.sourceforge.net/sourceforge/weka/WekaManual-3.6.0.pdf
CLI Vs GUI
CLI
− Recommended for in-depth usage
− Offers some functionality not available via the GUI
GUI
− Explorer
− Experimenter
− Knowledge Flow
Datasets in Weka
Each entry in a dataset is an instance of the
Java class:
− weka.core.Instance
Each instance consists of a number of
attributes
Attributes
Nominal: one of a predefined list of values
− e.g. red, green, blue
Numeric: A real or integer number
String: Enclosed in “double quotes”
Date
Relational
ARFF Files
The external representation of an Instances
class
Consists of:
− A header: Describes the attribute types
− Data section: Comma separated list of data
ARFF File Example
Parts labelled in the example file:
− Dataset name
− Comment
− Attributes
− Target / Class variable
− Data Values
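Since the example file itself is not reproduced in the text, the following is a minimal sketch of what such a file can look like, built as a plain string so it runs without Weka. The relation name, attributes and data rows are made up for illustration. The sketch assumes the class variable is the last attribute declared, which is the usual convention, and the small header reader is a toy, not Weka's ARFF parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds and inspects a tiny ARFF document as plain text (no Weka dependency). */
final class ArffSketch {

    /** Returns a small, hypothetical ARFF file as a single string. */
    static String sampleArff() {
        return String.join("\n",
            "% Comment: a made-up toy dataset for illustration",
            "@relation toy-weather",                        // dataset name
            "",
            "@attribute outlook {sunny, overcast, rainy}",  // nominal
            "@attribute temperature numeric",               // numeric
            "@attribute windy {TRUE, FALSE}",               // nominal
            "@attribute play {yes, no}",                    // assumed target / class variable
            "",
            "@data",                                        // data values follow
            "sunny,85,FALSE,no",
            "overcast,83,FALSE,yes",
            "rainy,70,TRUE,no");
    }

    /** Collects attribute names from the header, in declaration order. */
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase().startsWith("@attribute")) {
                // The second whitespace-separated token is the attribute name.
                names.add(trimmed.split("\\s+")[1]);
            } else if (trimmed.equalsIgnoreCase("@data")) {
                break; // the header ends where the data section begins
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String arff = sampleArff();
        System.out.println(arff);
        System.out.println("Attributes: " + attributeNames(arff));
    }
}
```

Saving the printed text to a file with an .arff extension should give something the tools in the following slides can load, though that has not been checked against every Weka version.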
Assignment ARFF Files
Credit-g
Heart-c
Hepatitis
Vowel
Zoo
http://www.cs.auckland.ac.nz/~pat/weka/
ARFF Files
Basic statistics and validation by running:
− java weka.core.Instances data/soybean.arff
Classifiers in Weka
Learning algorithms in Weka are derived from
the abstract class:
− weka.classifiers.Classifier
Simple classifier: ZeroR
− Just determines the most common class
− Or the mean (in the case of a numeric
class)
− Tests how well the class can be predicted
without considering other attributes
− Can be used as a Lower Bound on
Performance.
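The idea behind ZeroR can be sketched in a few lines of plain Java. This is not Weka's implementation, only an illustration under simple assumptions: class labels are given as strings, ties go to the label seen first, and for a numeric class the prediction is the mean (as Weka's ZeroR documentation describes). All names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** A toy ZeroR-style baseline: ignores all attributes except the class. */
final class ZeroRSketch {

    /** Most frequent label; ties go to the label that appears first. */
    static String majorityClass(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            int count = counts.get(label);
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    /** Prediction for a numeric class: the mean of the observed values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : sum / values.length;
    }

    /** Fraction of the given labels that the majority-class guess gets right. */
    static double baselineAccuracy(String[] labels) {
        String majority = majorityClass(labels);
        int hits = 0;
        for (String label : labels) {
            if (label.equals(majority)) {
                hits++;
            }
        }
        return labels.length == 0 ? 0.0 : (double) hits / labels.length;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "yes"}; // made-up class column
        System.out.println("Predict always: " + majorityClass(labels));
        System.out.println("Baseline accuracy: " + baselineAccuracy(labels));
    }
}
```

Any classifier that uses the other attributes should beat this baseline accuracy; if it does not, the attributes carry little usable information about the class.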
Classifiers in Weka
Simple Classifier Example
− java weka.classifiers.rules.ZeroR -t data/weather.arff
− java weka.classifiers.trees.J48 -t data/weather.arff
Help Command
− java weka.classifiers.trees.J48 -h
Classifiers in Weka
Soybean.arff split into train and test set
– soybean-train.arff
– soybean-test.arff
Input command:
– java weka.classifiers.trees.J48 -t soybean-train.arff -T soybean-test.arff -i
Options:
– -t: Training data
– -T: Test data
– -i: Provides more detailed output
Soybean Results
Soybean Results (cont...)
• True Positive (TP) rate
– Number correctly classified as class x / Actual total in
class x
– Equivalent to Recall
• False Positive (FP) rate
– Number incorrectly classified as class x /
Actual total of all classes, except x
Soybean Results (cont...)
• Precision:
– Number of examples classified as class x which truly have
class x / Total classified as class x
• F-measure:
– 2*Precision*Recall / (Precision + Recall)
– i.e. A combined measure for precision and
recall
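The per-class measures above can be computed directly from a confusion matrix. The sketch below is plain Java, not Weka code. It assumes the matrix is indexed as matrix[actual][classifiedAs], and it returns 0 when a denominator is zero, which is a choice made for this example. The counts in main are made up.

```java
/** Per-class TP rate, FP rate, precision and F-measure from a confusion matrix. */
final class ClassMetricsSketch {

    /** Total number of instances whose actual class is x (row sum). */
    static int rowSum(int[][] m, int x) {
        int sum = 0;
        for (int v : m[x]) {
            sum += v;
        }
        return sum;
    }

    /** Total number of instances classified as x (column sum). */
    static int colSum(int[][] m, int x) {
        int sum = 0;
        for (int[] row : m) {
            sum += row[x];
        }
        return sum;
    }

    static int total(int[][] m) {
        int sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += rowSum(m, i);
        }
        return sum;
    }

    /** Safe division: returns 0 when the denominator is 0 (example convention). */
    static double ratio(int num, int den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    /** Correctly classified as x / actual total in x (equals recall). */
    static double tpRate(int[][] m, int x) {
        return ratio(m[x][x], rowSum(m, x));
    }

    /** Incorrectly classified as x / actual total of all classes except x. */
    static double fpRate(int[][] m, int x) {
        return ratio(colSum(m, x) - m[x][x], total(m) - rowSum(m, x));
    }

    /** Correctly classified as x / total classified as x. */
    static double precision(int[][] m, int x) {
        return ratio(m[x][x], colSum(m, x));
    }

    /** 2 * Precision * Recall / (Precision + Recall). */
    static double fMeasure(int[][] m, int x) {
        double p = precision(m, x);
        double r = tpRate(m, x);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] matrix = {{8, 2}, {1, 9}}; // made-up counts, rows = actual class
        System.out.println("TP rate:   " + tpRate(matrix, 0));
        System.out.println("FP rate:   " + fpRate(matrix, 0));
        System.out.println("Precision: " + precision(matrix, 0));
        System.out.println("F-measure: " + fMeasure(matrix, 0));
    }
}
```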
Soybean Results (cont...)
Confusion matrix (figure), with callouts:
• Total Actual h
• Total Classified as h
• Total Correct
Filters
weka.filters package
Transform datasets
Support for data preprocessing
− e.g. Removing/Adding Attributes
− e.g. Discretize numeric attributes into
nominal ones
More info in Weka Manual p. 15 & 16.
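To make "discretize numeric attributes into nominal ones" concrete, here is a minimal sketch of equal-width binning in plain Java. It is one simple way to discretize and is not a reproduction of the weka.filters code. The class name, method name and bin count are assumptions made for the example.

```java
/** Toy equal-width discretization: maps each numeric value to a bin index. */
final class DiscretizeSketch {

    /**
     * Splits the range [min, max] of the values into `bins` intervals of equal
     * width and returns, for each value, the index of the interval it falls in.
     */
    static int[] equalWidthBins(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be at least 1");
        }
        int[] result = new int[values.length];
        if (values.length == 0) {
            return result;
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        for (int i = 0; i < values.length; i++) {
            if (width == 0.0) {
                result[i] = 0; // all values identical: a single bin
            } else {
                // The maximum value would land in bin `bins`; clamp it into the last bin.
                result[i] = Math.min((int) ((values[i] - min) / width), bins - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] temperature = {64, 70, 75, 85}; // made-up numeric attribute
        int[] binned = equalWidthBins(temperature, 3);
        for (int i = 0; i < binned.length; i++) {
            System.out.println(temperature[i] + " -> bin" + binned[i]);
        }
    }
}
```

Each bin index can then be treated as one value of a nominal attribute, which is what makes the result usable by schemes that only accept nominal inputs.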
More Classifiers
Explorer
• Preprocess
• Classify
• Cluster
• Associate
• Select attributes
• Visualize
Preprocess
• Load Data
• Preprocess Data
• Analyse Attributes
Classify
• Select Test Options, e.g.:
– Use Training Set
– Percentage Split
– Cross Validation
• Run classifiers
• View results
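As a rough picture of what the cross-validation test option does, the sketch below splits instance indices into k folds in plain Java; each fold serves once as the test set while the remaining folds form the training set. It is a simplified illustration under stated assumptions (random shuffling with a fixed seed, no stratification by class) and is not how the Explorer is implemented. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy k-fold split over instance indices 0..n-1 (no stratification). */
final class CrossValidationSketch {

    /** Returns k folds that together contain every index exactly once. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin so fold sizes differ by at most one.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        int n = 14; // made-up dataset size
        List<List<Integer>> folds = folds(n, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            int testSize = folds.get(f).size();
            System.out.println("Fold " + f + ": test=" + testSize
                + ", train=" + (n - testSize));
        }
    }
}
```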
Classify
Results
Experimenter
• Allows users to create, run, modify and
analyse experiments in a more convenient
manner than processing each scheme individually.
– Setup
– Run
– Analyse
Experimenter: Setup
• Simple/Advanced
• Results Destinations
– ARFF
– CSV
– JDBC Database
Setup screen (figure), with callouts:
• 10-fold Cross Validation
• Datasets
• Num of runs
• Classifiers
Run Simple Experiment
Results
Advanced Example
Multiple Classifiers
Advanced Example