IFN645 Large Scale Data Mining
Lecture 4 – Feature Selection
Faculty of Science and Engineering
Semester 2, 2021
CRICOS No. 00213J
Queensland University of Technology
Last Week
• Major types of Clustering algorithms
• Hierarchical
• EM
• DBSCAN
• Canopy
Practical
• Use Weka
– Running KMeans
– Running Hierarchical Clustering
– Running EM
– Running Canopy
• DBScan
– Install it
– Activate it
• Accessing Weka in Java
This Week's Lecture
• Feature Selection
– Manual Selection
– Wrappers
– Filters
FEATURE SELECTION
Feature Selection
Problems in Initial Features
• Irrelevant – Features that actively add noise to the system
• Redundant – Features that do not add anything new to the system
• Interacting features – Appear irrelevant individually, but can be relevant when combined with others
Feature Extraction vs Feature Selection
• Both deal with the curse-of-dimensionality issue
• Both improve data mining quality
Feature Selection
Use only a subset of the features/dimensions
Feature Selection
• Usually, the initial feature set is large and contains irrelevant or redundant
features.
• The goal of feature selection is to manually or automatically select a subset
of features which retain most of the relevant information.
– To improve performance by removing irrelevant or redundant features (noise)
– To increase execution speed
– To decrease memory requirements
– To increase generalization by reducing the risk of overfitting
Feature Selection
Feature Selection Approaches
1. Manual feature selection
2. Wrapper methods – consider feature selection as a search problem
3. Filter methods – assign a score to features
4. Embedded methods – combine wrapper and filter techniques, performing selection as part of model learning
MANUAL SELECTION
Manual Selection
• Usually we need to get domain experts’ advice about which features to keep.
• Experts on the application domain.
• E.g. the Glass dataset in Weka with the J48 algorithm (10-fold cross-validation); a Java sketch of this workflow appears below
– Using all features, accuracy is 66.8%
– Remove feature 'Fe': 67.3%
– Remove feature 'Al': 70.6%
– Remove 'RI' and 'Mg': 65.4%
– Remove everything except features 'RI' and 'Mg': 68.7%
– Remove everything except 'RI', 'Na', 'Mg', 'Ca', 'Ba': 73.8%
• For the Glass dataset
– Mg and RI look important
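The experiment above can be reproduced programmatically. Below is a minimal sketch using the Weka Java API (assuming Weka 3.8); the class name ManualSelectionDemo is illustrative, and the attribute indices assume the standard glass.arff ordering (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, Type).

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ManualSelectionDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Manually keep only RI, Na, Mg, Ca, Ba plus the class attribute.
        // Indices are 1-based and assume the standard glass.arff attribute order.
        Remove remove = new Remove();
        remove.setAttributeIndices("1,2,3,7,8,10");
        remove.setInvertSelection(true);   // keep the listed attributes, drop the rest
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);

        // Evaluate J48 with 10-fold cross-validation on the reduced feature set
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(new J48(), reduced, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + "%");
    }
}

Changing the index string and re-running is the programmatic equivalent of the manual trial-and-error shown in the bullet list.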
Manual Selection
Manual Selection
• Problems with manual selection
1. It is an inefficient process, which does not scale.
2. It can only be applied to objects whose features can easily be manually
identified.
WRAPPER
Wrapper
• Wrapper-based feature selection uses a mining model to determine a set of features, based on the performance of the model built with those features.
• The feature selection process is a search process, e.g., using a greedy search.
• Basic strategy
– Create a subset of features
– Use a mining model to evaluate the accuracy obtained with that subset of features
– Based on the accuracy, either accept or reject the subset of features
Wrapper
• Basic approach of wrapper-based feature selection using exhaustive search
– Pick a feature subset and pass it into a learning algorithm for feature selection.
– Train the learning model with the training set
– Calculate the accuracy of the learning model with a test set
– Repeat for all feature subsets and pick the feature subset which has the highest
evaluation result.
• The basic approach is simple
• But it can be inefficient, since there is an exponential number of possible subsets
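Because exhaustive search is usually infeasible, a heuristic search is used in practice. Below is a minimal Weka/Java sketch of wrapper-based selection with a greedy forward search instead of the exhaustive search described above (assuming the Weka 3.8 API; the class name WrapperDemo is illustrative).

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.GreedyStepwise;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WrapperDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluator: scores each candidate subset by
        // cross-validating J48 trained on that subset
        WrapperSubsetEval eval = new WrapperSubsetEval();
        eval.setClassifier(new J48());
        eval.setFolds(5);

        // Greedy forward search through the space of attribute subsets
        GreedyStepwise search = new GreedyStepwise();
        search.setSearchBackwards(false);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(eval);
        selector.setSearch(search);
        selector.SelectAttributes(data);

        System.out.println(selector.toResultsString());
    }
}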
Search Strategies
Weka Wrapper: WrapperSubsetEval
Weka Wrapper: Cross-validation
Weka Wrapper: learning algorithm
Weka Wrapper
• Select attributes, then apply classifier J48 or IBk
– Dataset: glass.arff
– No attribute selection, default parameters everywhere
– Wrapper selection with J48 selects {RI, Mg, Al, K, Ba}
– Wrapper selection with IBk selects {RI, Mg, Al, K, Ca, Ba}

  Attributes                  J48    IBk
  All attributes              67%    71%
  {RI, Mg, Al, K, Ba}         71%     –
  {RI, Mg, Al, K, Ca, Ba}      –     78%
Weka Wrapper
• Attribute selection and classification together (in meta)
– Select and configure “AttributeSelectedClassifier”
Weka Wrapper
• Attribute selection and classification together (in meta)
– Use AttributeSelectedClassifier to wrap J48
– Use AttributeSelectedClassifier to wrap IBk

  Wrapper    J48    IBk
  J48        72%    74%
  IBk        70%    72%

• Without attribute selection

  Classifier    J48    IBk
                67%    71%

• Classification performance increases after attribute selection
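The meta setup above can also be driven from Java. Here is a hedged sketch using the standard AttributeSelectedClassifier with a wrapper evaluator and BestFirst search (assuming the Weka 3.8 API; the class name AttributeSelectedClassifierDemo and the choice of search are illustrative).

import java.util.Random;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectedClassifierDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluator using J48 to judge candidate attribute subsets
        WrapperSubsetEval wrapper = new WrapperSubsetEval();
        wrapper.setClassifier(new J48());

        // Meta-classifier: selection is performed inside each training fold,
        // so the cross-validation estimate is not biased by the selection step
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(wrapper);
        asc.setSearch(new BestFirst());
        asc.setClassifier(new J48());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + "%");
    }
}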
Wrapper-based feature selection
• Wrapper-based feature selection is simple and direct, but slow
• Either
– Use a single-attribute evaluator and eliminate irrelevant attributes, or
– Use a subset evaluator with a search method
• Wrapper methods are scheme-dependent
• Filters are scheme-independent
FILTERS
Feature Selection - Filters
• Filters work independently of any particular data mining model, i.e., they are scheme-independent.
• Filters seek a subset of features that maximizes (or minimizes) some merit score, e.g., feature similarity/distance, feature correlation, or information gain.
– Can score each feature individually, then choose the top features based on the score.
– Can score subsets of features together, then choose the best subset of features based
on the score.
Subset Filter: CfsSubsetEval
• CfsSubsetEval: a scheme-independent attribute subset evaluator.
– An attribute subset is good if the attributes in the subset are
• Highly correlated with the class attribute
• Not strongly correlated with one another
\[
\mathrm{Goodness}(A) = \frac{\sum_{x \in A} C(x, \mathrm{class\_attribute})}{\sum_{x \in A} \sum_{y \in A} C(x, y)}
\]
where A is a subset of attributes and C(x, y) measures the correlation between x and y.
Goodness(A) measures how good the subset is.
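A minimal Java sketch of running CfsSubsetEval with a BestFirst search through the Weka API (assuming Weka 3.8; the class name CfsDemo is illustrative):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CfsDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());  // correlation-based subset merit
        selector.setSearch(new BestFirst());         // search the space of subsets
        selector.SelectAttributes(data);

        // Print the names of the selected attributes
        // (the class attribute index is included at the end of the array)
        for (int i : selector.selectedAttributes()) {
            System.out.println(data.attribute(i).name());
        }
    }
}

Note that no classifier appears anywhere in this sketch, which is exactly what makes the filter scheme-independent.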
Single-attribute Filters
• Two Single-attribute filters:
InfoGainAttributeEval , GainRatioAttributeEval
– Measure attributes’ information gain or gain ratio with respect to the class
attribute.
– The higher the measurement score, the higher the correlation between
the attribute and the class attribute.
Ranking
• Attribute subset selection involves
– Attribute subset evaluation measure
– Search method
• Searching is slow because it finds a good subset by traversing a large space
of attribute subsets.
• Alternative: use a single-attribute evaluator with a ranking scheme for
individual attributes.
– Rank individual attributes by their individual evaluation score
– Eliminate irrelevant attributes with lower evaluation scores
Ranking Metric in Weka
• Metrics for evaluating individual attributes; we have seen some of them before
– InfoGainAttributeEval
information gain is used by decision tree algorithm C4.5 (i.e., J48)
– GainRatioAttributeEval
gain ratio is used by decision tree algorithm (i.e., J48)
– OneRAttributeEval
This is the evaluation value used in OneR algorithm
– SymmetricalUncertAttributeEval
symmetrical uncertainty
• The Ranker in Weka sorts attributes according to their evaluation scores and returns the top-k attributes.
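A minimal Java sketch of single-attribute ranking with InfoGainAttributeEval and the Ranker search (assuming the Weka 3.8 API; the class name RankingDemo and the choice of the top 5 attributes are illustrative):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Score each attribute individually by its information gain
        InfoGainAttributeEval eval = new InfoGainAttributeEval();

        // Ranker sorts attributes by score; keep only the top-5
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(5);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(eval);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        System.out.println(selector.toResultsString());
    }
}

Swapping InfoGainAttributeEval for GainRatioAttributeEval, OneRAttributeEval, or SymmetricalUncertAttributeEval changes only the evaluator line.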
InfoGainAttributeEval
For a given dataset D and an attribute A, the information gain of A is defined as
\[
\mathrm{Gain}(A) = H(D) - H_A(D)
\]
where \(H(D)\) is the entropy of D:
\[
H(D) = -\sum_{i=1}^{k} P_i \log_2(P_i), \qquad P_i = \frac{|C_{i,D}|}{|D|}
\]
\(C_{i,D}\) is the subset of D containing the records of the ith class, \(|C_{i,D}|\) is the number of records in the ith class, and \(|D|\) is the number of records in D.
\[
H_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, H(D_j)
\]
where \(D_j\) is the subset of D consisting of the records for which A takes its jth value.

Example dataset:

  Temperature   Wind   Class
  High          Low    Play
  Low           Low    Play
  High          Low    Play
  Low           High   Cancelled
  Low           Low    Play
  High          High   Cancelled
  High          Low    Play

class 1: Play; class 2: Cancelled
\(P_{\mathrm{play}} = 5/7 = 0.71\)
\(P_{\mathrm{cancelled}} = 2/7 = 0.29\)
InfoGainAttributeEval
\[
\mathrm{Gain}(\mathrm{Temperature}) = H(D) - H_{\mathrm{Temperature}}(D), \qquad P_{\mathrm{play}} = 0.71, \; P_{\mathrm{cancelled}} = 0.29
\]
\[
H(D) = -\sum_{i=1}^{k} P_i \log_2(P_i) = -(0.71 \log_2 0.71 + 0.29 \log_2 0.29) = 0.863
\]
Splitting D by Temperature (using the same example dataset as on the previous slide), \(D_{\mathrm{low}}\) has 3 records (2 Play, 1 Cancelled) and \(D_{\mathrm{high}}\) has 4 records (3 Play, 1 Cancelled):
\[
H(D_{\mathrm{low}}) = -\left(\tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{2}{3}\log_2\tfrac{2}{3}\right) = 0.918
\]
\[
H(D_{\mathrm{high}}) = -\left(\tfrac{3}{4}\log_2\tfrac{3}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) = 0.811
\]
\[
H_{\mathrm{Temperature}}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, H(D_j) = \frac{3}{7} \times 0.918 + \frac{4}{7} \times 0.811 = 0.857
\]
\[
\mathrm{Gain}(\mathrm{Temperature}) = H(D) - H_{\mathrm{Temperature}}(D) = 0.863 - 0.857 = 0.006
\]
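To make the arithmetic above easy to check, here is a small self-contained Java sketch (plain Java, not part of Weka; the class and method names are illustrative) that recomputes these entropies and the information gain of Temperature:

public class InfoGainByHand {
    // Entropy of a class distribution given as counts, e.g. (5, 2)
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));   // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        double hD    = entropy(5, 2);                      // H(D)           ≈ 0.863
        double hLow  = entropy(2, 1);                      // H(D_low)       ≈ 0.918
        double hHigh = entropy(3, 1);                      // H(D_high)      ≈ 0.811
        double hTemp = 3.0 / 7 * hLow + 4.0 / 7 * hHigh;   // H_Temperature  ≈ 0.857
        System.out.println("Gain(Temperature) = " + (hD - hTemp));  // ≈ 0.006
    }
}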
CONCLUSION
This Week
• Feature Selection
– Manual Selection
– Wrappers
– Scheme-independent filters
– Attribute ranking
Practical
• In Weka, select attributes using
– Manual Selection
– Wrappers
– Filters
– Ranking
• Accessing Weka in Java
Thank You