IFN645 Lecture 4 – Feature Selection – 2021

The document discusses feature selection techniques for large scale data mining. It begins by introducing the topic of feature selection and the problems associated with initial feature sets, such as irrelevant, redundant, and interacting features. It then describes three common approaches to feature selection: manual selection, wrapper methods, and filter methods. For manual selection, domain experts are consulted to identify important features. Wrapper methods treat feature selection as a search problem to optimize a learning algorithm. Filter methods assign scores to features based on intrinsic properties. The document provides examples of these approaches and discusses their advantages and limitations.


IFN645 Large Scale Data Mining

Lecture 4 – Feature Selection

Faculty of Science and Engineering


Semester 2, 2021

CRICOS No. 00213J


Queensland University of Technology
Last Week

• Major types of clustering algorithms
  – Hierarchical
  – EM
  – DBSCAN
  – Canopy

Practical

• Use Weka
– Running KMeans
– Running Hierarchical Clustering
– Running EM
– Running Canopy
• DBSCAN
– Install it
– Activate it
• Accessing Weka in Java

This Week Lecture

• Feature Selection
– Manual Selection
– Wrappers
– Filters

FEATURE SELECTION

Problems in Initial Features

• Irrelevant features – actively add noise to the system
• Redundant features – do not add anything new to the system
• Interacting features – appear to be irrelevant on their own, but can be relevant if
  combined with other features

Feature Extraction vs Feature Selection

• Both deal with the curse-of-dimensionality issue
• Both improve data mining quality
• Feature extraction transforms the original features into a new, smaller set of derived
  features (e.g., by PCA), whereas feature selection keeps a subset of the original features

Feature Selection
Use only a subset of the original features/dimensions

Feature Selection

• Usually, the initial feature set is large and contains irrelevant or redundant features.
• The goal of feature selection is to manually or automatically select a subset of
  features that retains most of the relevant information:
  – To improve performance by removing irrelevant or redundant features (noise)
  – To increase execution speed
  – To decrease memory requirements
  – To increase generalization by reducing the risk of overfitting

Feature Selection Approaches

1. Manual feature selection – consult domain experts about which features to keep
2. Wrapper methods – treat feature selection as a search problem guided by a learning algorithm
3. Filter methods – assign a score to features based on intrinsic properties of the data
4. Embedded methods – combine wrapper and filter techniques to learn features during model training

MANUAL SELECTION

Manual Selection

• Usually we need to get domain experts' advice (experts on the application domain)
  about which features to keep.
• E.g., the Glass dataset in Weka with the J48 algorithm (10-fold cross-validation):
  – Using all features, accuracy is 66.8%
  – Remove feature 'Fe': 67.3%
  – Remove feature 'Al': 70.6%
  – Remove 'RI' and 'Mg': 65.4%
  – Remove everything except features 'RI' and 'Mg': 68.7%
  – Remove everything except 'RI', 'Na', 'Mg', 'Ca', 'Ba': 73.8%

• For the Glass dataset, Mg and RI look important (see the Java sketch below for how to
  run one of these experiments programmatically).

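The same manual experiment can be scripted through Weka's Java API. This is a minimal sketch, assuming Weka 3.8, a local glass.arff with the class attribute last, and the standard UCI attribute order (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, Type); the class name ManualSelectionDemo and the kept subset {RI, Na, Mg, Ca, Ba} are illustrative only.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ManualSelectionDemo {
    public static void main(String[] args) throws Exception {
        // Load the Glass dataset; the class attribute (Type) is assumed to be last.
        Instances data = DataSource.read("glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Keep only RI, Na, Mg, Ca, Ba and the class attribute.
        // 1-based indices assume the standard attribute order given above.
        Remove keep = new Remove();
        keep.setAttributeIndices("1,2,3,7,8,last");
        keep.setInvertSelection(true);          // invert: the listed attributes are kept
        keep.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, keep);
        reduced.setClassIndex(reduced.numAttributes() - 1);

        // 10-fold cross-validation with J48 on the reduced feature set.
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(new J48(), reduced, 10, new Random(1));
        System.out.printf("Accuracy with manual subset: %.1f%%%n", eval.pctCorrect());
    }
}
```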
Manual Selection

• Problems with manual selection


1. It is an inefficient process, which does not scale.
2. It can only be applied to objects whose features can easily be manually
identified.

WRAPPER

Wrapper

• Wrapper-based feature selection takes a mining model into consideration: it determines
  a set of features based on the performance of that mining model.
• The feature selection process is a search process, e.g., using a greedy search.
• Basic strategy:
  – Create a subset of features
  – Use a mining model to evaluate the accuracy obtained with that subset of features
  – Accept or reject the subset based on that accuracy

Wrapper

• Basic approach of wrapper-based feature selection using exhaustive search:
  – Pick a feature subset and pass it to a learning algorithm
  – Train the learning model on the training set
  – Calculate the accuracy of the learning model on a test set
  – Repeat for all feature subsets and pick the subset with the highest evaluation result
• The basic approach is simple
• It can be inefficient, since there is an exponential number of subsets (in practice a
  heuristic search is used instead – see the Weka sketch below)

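Below is a hedged sketch of wrapper selection via the Weka Java API (assuming Weka 3.8). Because exhaustive enumeration is exponential, it substitutes a greedy forward search (GreedyStepwise) for the exhaustive search described above; the demo class name and the parameter choices are illustrative.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.GreedyStepwise;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WrapperSelectionDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Evaluator: score each candidate subset by cross-validating J48 on it.
        WrapperSubsetEval evaluator = new WrapperSubsetEval();
        evaluator.setClassifier(new J48());
        evaluator.setFolds(5);                 // internal CV folds used to score a subset

        // Search: greedy forward selection instead of exhaustive enumeration.
        GreedyStepwise search = new GreedyStepwise();
        search.setSearchBackwards(false);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(evaluator);
        selector.setSearch(search);
        selector.SelectAttributes(data);       // note the capital S in Weka's API

        System.out.println(selector.toResultsString());
    }
}
```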
Search Strategies

Weka Wrapper: WrapperSubsetEval

Weka Wrapper: Cross-validation

Weka Wrapper: learning algorithm

Weka Wrapper
• Select attributes and then apply classifier J48 or IBk
  – Dataset: glass.arff
  – No attribute selection, default parameters everywhere
  – Wrapper selection with J48 chooses {RI, Mg, Al, K, Ba}
  – Wrapper selection with IBk chooses {RI, Mg, Al, K, Ca, Ba}

  Attributes                  J48    IBk
  All attributes              67%    71%
  {RI, Mg, Al, K, Ba}         71%    –
  {RI, Mg, Al, K, Ca, Ba}     –      78%

Weka Wrapper
• Attribute selection and classification together (in meta)
– Select and configure “AttributeSelectedClassifier”

Weka Wrapper
• Attribute selection and classification together (in meta)
  – Use AttributeSelectedClassifier to wrap J48
  – Use AttributeSelectedClassifier to wrap IBk

                         Classifier
  Wrapper                J48    IBk
  J48                    72%    74%
  IBk                    70%    72%

• Without using attribute selection:

  Classifier             J48    IBk
                         67%    71%

• Classification performance increases after attribute selection (a Java sketch of this
  setup follows below).

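A sketch of the "attribute selection and classification together" setup, again assuming the Weka 3.8 Java API. AttributeSelectedClassifier redoes the wrapper search inside each training fold, so the cross-validated accuracy is not biased by selecting attributes on the full dataset; the BestFirst search and the demo class name are illustrative choices.

```java
import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectedClassifierDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluator driven by J48, searched with BestFirst.
        WrapperSubsetEval evaluator = new WrapperSubsetEval();
        evaluator.setClassifier(new J48());

        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(evaluator);
        asc.setSearch(new BestFirst());
        asc.setClassifier(new J48());          // classifier trained on the selected attributes

        // Attribute selection is redone inside every training fold of the CV.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        System.out.printf("Wrapped J48 accuracy: %.1f%%%n", eval.pctCorrect());
    }
}
```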
Wrapper-based feature selection

• Wrapper-based feature selection is simple and direct, but slow
• Either
  – use a single-attribute evaluator to eliminate irrelevant attributes, or
  – use a subset evaluator with a search method
• Wrapper methods are scheme-dependent
• Filters are scheme-independent

FILTERS

Feature Selection - Filters

• Filters work independently of any particular data mining model, i.e., they are
  scheme-independent.
• Filters seek a subset of features which maximizes or minimizes some merit score,
  e.g., feature similarity/distance, feature correlation, or information gain:
  – Can score each feature individually, then choose the top features based on the score
  – Can score subsets of features together, then choose the best subset based on the score

Subset Filter: CfsSubsetEval

• CfsSubsetEval: a scheme-independent attribute subset evaluator.
  – An attribute subset is good if the attributes in the subset are
    • highly correlated with the class attribute, and
    • not strongly correlated with one another.

  $Goodness(A) = \frac{\sum_{x \in A} C(x, class\_attribute)}{\sum_{x \in A} \sum_{y \in A} C(x, y)}$

  A is a subset of attributes and C(x, y) measures the correlation between x and y;
  Goodness(A) measures how good the subset is.

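A minimal sketch of running the CFS subset filter from the Java API (assuming Weka 3.8). No classifier is involved in scoring a subset; CfsSubsetEval uses only the correlation-style goodness measure above, and BestFirst is a common search to pair with it. The demo class name is invented for illustration.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CfsFilterDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());   // correlation-based subset merit
        selector.setSearch(new BestFirst());          // heuristic search over subsets
        selector.SelectAttributes(data);

        // 0-based indices of the selected attributes (the class index is appended at the end).
        for (int index : selector.selectedAttributes()) {
            System.out.println(data.attribute(index).name());
        }
    }
}
```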
Single-attribute Filters

• Two single-attribute filters: InfoGainAttributeEval and GainRatioAttributeEval
  – Measure an attribute's information gain or gain ratio with respect to the class
    attribute.
  – The higher the measurement score, the higher the correlation between the attribute
    and the class attribute.

Ranking

• Attribute subset selection involves


– Attribute subset evaluation measure
– Search method
• Searching is slow because it finds a good subset by traversing a large space
of attribute subsets.
• Alternative: use a single-attribute evaluator with a ranking scheme for
individual attributes.
– Rank individual attributes by their individual evaluation score
– Eliminate irrelevant attributes with lower evaluation scores

Ranking Metric in Weka

• Metrics for evaluating individual attributes; we have seen some of them before:
  – InfoGainAttributeEval
    information gain, as used by the decision tree algorithm C4.5 (i.e., J48)
  – GainRatioAttributeEval
    gain ratio, also used by the decision tree algorithm (i.e., J48)
  – OneRAttributeEval
    the evaluation value used in the OneR algorithm
  – SymmetricalUncertAttributeEval
    symmetric uncertainty
• Ranker in Weka sorts attributes according to their evaluation scores and returns the
  top-k attributes (see the sketch below).

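A sketch of single-attribute ranking through the Java API, assuming Weka 3.8: InfoGainAttributeEval scores each attribute individually and Ranker sorts the scores. The top-5 cut-off and the demo class name are arbitrary illustrations.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InfoGainRankingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Ranker ranker = new Ranker();
        ranker.setNumToSelect(5);              // request only the top-5 attributes

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        // Ranked attributes with their information-gain scores, best first.
        for (double[] ranked : selector.rankedAttributes()) {
            System.out.printf("%-12s %.4f%n",
                    data.attribute((int) ranked[0]).name(), ranked[1]);
        }
    }
}
```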
InfoGainAttributeEval
For a given dataset D and an attribute A, the information gain of A is defined by

  $Gain(A) = H(D) - H_A(D)$, where $H(D)$ is the entropy of D.
  $H(D) = -\sum_{i=1}^{k} P_i \log(P_i)$, where $P_i$ is the probability of the i-th class.
  $H_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} H(D_j)$, where $D_j$ is the subset of D consisting of the records for which A takes its j-th value.
  $P_i = \frac{|C_{i,D}|}{|D|}$, where $C_{i,D}$ is the subset of D containing the records in the i-th class, $|C_{i,D}|$ is the number of records in the i-th class, and $|D|$ is the number of records in D.

Example dataset (class 1: Play; class 2: Cancelled):

  Temperature   Wind   Class
  High          Low    Play
  Low           Low    Play
  High          Low    Play
  Low           High   Cancelled
  Low           Low    Play
  High          High   Cancelled
  High          Low    Play

  $P_{play} = 5/7 = 0.71$
  $P_{cancelled} = 2/7 = 0.29$

InfoGainAttributeEval
Worked example for the Temperature attribute (same seven-record table as above), with
$P_{play} = 0.71$ and $P_{cancelled} = 0.29$:

  $Gain(Temperature) = H(D) - H_{Temperature}(D)$

  $H(D) = -\sum_{i=1}^{k} P_i \log_2(P_i) = -(0.71 \log_2 0.71 + 0.29 \log_2 0.29) = 0.863$

  $H_{Temperature}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} H(D_j) = \frac{3}{7} H(D_{low}) + \frac{4}{7} H(D_{high})$

  $H(D_{low}) = -(\frac{1}{3} \log_2 \frac{1}{3} + \frac{2}{3} \log_2 \frac{2}{3}) = 0.918$

  $H(D_{high}) = -(\frac{3}{4} \log_2 \frac{3}{4} + \frac{1}{4} \log_2 \frac{1}{4}) = 0.811$

  $H_{Temperature}(D) = \frac{3}{7} \times 0.918 + \frac{4}{7} \times 0.811 = 0.857$

  $Gain(Temperature) = H(D) - H_{Temperature}(D) = 0.863 - 0.857 = 0.006$

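To double-check the arithmetic above, here is a small self-contained Java method (plain Java, not Weka) that recomputes the entropies and the information gain for the seven-record example; it should print approximately H(D) = 0.863, H_Temperature(D) = 0.857 and Gain(Temperature) = 0.006.

```java
public class InfoGainByHand {
    // Entropy of a class distribution given as raw class counts.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));   // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Whole dataset: 5 Play, 2 Cancelled.
        double hD = entropy(5, 2);

        // Split by Temperature: Low = {2 Play, 1 Cancelled}, High = {3 Play, 1 Cancelled}.
        double hLow = entropy(2, 1);
        double hHigh = entropy(3, 1);
        double hTemp = (3.0 / 7) * hLow + (4.0 / 7) * hHigh;

        System.out.printf("H(D) = %.3f%n", hD);                        // ~0.863
        System.out.printf("H_Temperature(D) = %.3f%n", hTemp);         // ~0.857
        System.out.printf("Gain(Temperature) = %.3f%n", hD - hTemp);   // ~0.006
    }
}
```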
CONCLUSION

This Week

• Feature Selection
– Manual Selection
– Wrappers
– Scheme-independent filters
– Attribute ranking

Practical

• In Weka, select attributes using


– Manual Selection
– Wrappers
– Filters
– Ranking
• Accessing Weka in Java

Thank You
