Decision Tree
Supervised vs. Unsupervised Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by labels indicating
the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc. with the
aim of establishing the existence of classes or clusters in
the data
Prediction Problems: Classification vs.
Numeric Prediction
Classification
predicts categorical class labels (discrete or nominal)
classifies data (constructs a model) based on the training
set and the values (class labels) in a classifying attribute
and uses it in classifying new data
Numeric Prediction
models continuous-valued functions, i.e., predicts
unknown or missing values
Typical applications
Credit/loan approval: whether a loan application is safe or risky
Medical diagnosis: if a tumor is cancerous or benign
Fraud detection: if a transaction is fraudulent
Web page categorization: which category it is
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
The set of tuples used for model construction is the training set
The model is represented as classification rules, decision trees, or
mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of test sample is compared with the classified
result from the model
Accuracy rate is the percentage of test set samples that are
correctly classified by the model
Test set is independent of training set (otherwise overfitting)
If the accuracy is acceptable, use the model to classify new data
Note: If the test set is used to select models, it is called a validation (test) set
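A minimal sketch of the two steps, assuming scikit-learn and a synthetic dataset standing in for the labeled tuples (all names below are illustrative, not taken from these notes):

```python
# Sketch: model construction on a training set, then accuracy estimation on an
# independent test set before using the model on new data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=8, random_state=0)  # stand-in data

# Step 1: model construction on the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: model usage -- estimate accuracy on the independent test set; if it is
# acceptable, apply the model to new, unlabeled tuples.
print("Test-set accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```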
[Figure: possible splits on an attribute A - (a) A is discrete-valued, (b) A is continuous-valued,
(c) A is discrete-valued with two branches (binary split)]
Decision Tree
1. Learning
Training data are analyzed by a classification algorithm. Here, the class label attribute is loan
decision; the learned model, or classifier, is represented in the form of classification rules.
Decision Tree
2. Classification
Test data are used to estimate the accuracy of classification rules. If accuracy is
acceptable, the rules can be applied to the classification of new data tuples.
Decision Tree Induction
It is the learning of a decision tree from class-labeled
training tuples.
Internal node - a test on an attribute
Branch - an outcome of the test
Leaf node - holds a class label
A path is traced from the root to a leaf
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-
conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are
discretized in advance)
Examples are partitioned recursively based on selected
attributes
Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
There are no samples left
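A minimal sketch of this greedy, top-down induction (an ID3-style illustration, assuming categorical attributes and tuples given as dicts; the function names here are made up for the example):

```python
# Greedy top-down, divide-and-conquer decision tree induction (sketch).
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Entropy of the node minus the weighted entropy of its partitions.
    total = entropy(labels)
    n = len(labels)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

def build_tree(rows, labels, attributes):
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> majority voting.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best])
    return node
```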
Brief Review of Entropy
Entropy for m classes: H = − Σ_{i=1}^{m} p_i log2(p_i)
[Figure: entropy curve for the two-class case, m = 2]
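A quick numeric check of the two-class case (a small sketch; the helper name is ours):

```python
# Binary (m = 2) entropy: H(p) = -p*log2(p) - (1-p)*log2(1-p).
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0  # a pure node carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f} -> H = {binary_entropy(p):.3f}")
# H peaks at 1 bit when p = 0.5 and drops to 0 when the node is pure.
```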
Attribute Selection Measure:
Information Gain (ID3/C4.5)
Select the attribute with the highest information gain
Let pi be the probability that an arbitrary tuple in D belongs to
class Ci, estimated by |Ci, D|/|D|
Expected information (entropy) needed to classify a tuple in D:
Info(D) = − Σ_{i=1}^{m} p_i log2(p_i)
Information needed (after using A to split D into v partitions) to classify D:
Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)
Information gained by branching on attribute A
Gain(A) = Info(D) − Info_A(D)
Attribute Selection: Information Gain
Gain(age) = Info(D) − Info_age(D) = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
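A small check of these numbers for the age attribute, assuming the usual class counts of this example (age splits the 14 tuples into partitions with 2/3, 4/0, and 3/2 yes/no counts; the counts are an assumption, since the data table is not reproduced in these notes):

```python
import math

def info(counts):
    # Expected information (entropy) from a list of per-class counts.
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def info_after_split(partitions):
    # Weighted information after splitting D into the given partitions.
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * info(p) for p in partitions)

D = [9, 5]                                 # buys_computer: 9 "yes", 5 "no"
age_partitions = [[2, 3], [4, 0], [3, 2]]  # assumed yes/no counts per age group

print(round(info(D), 3))                                      # Info(D)     ~ 0.940
print(round(info_after_split(age_partitions), 3))             # Info_age(D) ~ 0.694
print(round(info(D) - info_after_split(age_partitions), 3))   # Gain(age)   ~ 0.246
```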
Computing Information-Gain for
Continuous-Valued Attributes
Let attribute A be a continuous-valued attribute
Must determine the best split point for A
Sort the values of A in increasing order
Typically, the midpoint between each pair of adjacent values
is considered as a possible split point
(a_i + a_{i+1}) / 2 is the midpoint between the adjacent values a_i and a_{i+1}
The point with the minimum expected information
requirement for A is selected as the split-point for A
Split:
D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is
the set of tuples in D satisfying A > split-point
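A rough sketch of this search for the best split point (the values and labels in the example call are made up, and the helper names are ours):

```python
# Pick the midpoint with the minimum expected information requirement.
import math

def info(labels):
    n = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split_point(values, labels):
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_point, best_info = None, float("inf")
    for i in range(n - 1):
        a_i, a_next = pairs[i][0], pairs[i + 1][0]
        if a_i == a_next:
            continue
        midpoint = (a_i + a_next) / 2
        left = [l for v, l in pairs if v <= midpoint]   # D1: A <= split-point
        right = [l for v, l in pairs if v > midpoint]   # D2: A > split-point
        expected = len(left) / n * info(left) + len(right) / n * info(right)
        if expected < best_info:
            best_point, best_info = midpoint, expected
    return best_point, best_info

point, exp_info = best_split_point([65, 70, 75, 80, 85], ["no", "no", "yes", "yes", "no"])
print(point, round(exp_info, 3))
```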
Gain Ratio for Attribute Selection (C4.5)
Information gain measure is biased towards attributes with a
large number of values
Example: product_id, an attribute that acts as a unique identifier
A split on product_id would result in a large number of partitions,
each containing just one tuple
Gain(product_id) is maximal, yet such a partitioning is useless for
classification
C4.5 (a successor of ID3) uses gain ratio to overcome the
problem (normalization to information gain)
SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)
GainRatio(A) = Gain(A) / SplitInfo_A(D)
Ex. Consider income as the splitting attribute: SplitInfo_income(D) = 1.557, so
gain_ratio(income) = 0.029 / 1.557 = 0.019
The attribute with the maximum gain ratio is selected as the
splitting attribute
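A quick check of the income example, assuming income splits the 14 tuples into partitions of sizes 4, 6, and 4 (an assumption, since the table is not shown here):

```python
# SplitInfo from partition sizes, then the gain ratio.
import math

def split_info(partition_sizes):
    n = sum(partition_sizes)
    return -sum((s / n) * math.log2(s / n) for s in partition_sizes)

gain_income = 0.029           # from the earlier information-gain computation
si = split_info([4, 6, 4])    # assumed partition sizes for income
print(f"SplitInfo_income(D) = {si:.3f}")            # ~ 1.557
print(f"GainRatio(income)   = {gain_income / si:.3f}")  # ~ 0.019
```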
Gini Index (CART, IBM IntelligentMiner)
Gini Index considers a binary split on each attribute
If a data set D contains examples from m classes, the gini index gini(D) is
defined as
gini(D) = 1 − Σ_{i=1}^{m} p_i^2
where p_i is the probability that a tuple in D belongs to class C_i, estimated by p_i = |C_i,D| / |D|
If a data set D is split (binary split) on A into two subsets D1 and D2, the gini
index gini_A(D) is defined as
gini_A(D) = (|D1| / |D|) × gini(D1) + (|D2| / |D|) × gini(D2)
For each attribute, each possible binary split is considered
Reduction in impurity:
Δgini(A) = gini(D) − gini_A(D)
The attribute that provides the smallest gini_A(D) (or, equivalently, the largest reduction in
impurity) is chosen to split the node (need to enumerate all possible
splitting points for each attribute)
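A minimal sketch of these two quantities computed from per-class counts (the counts in the example call are illustrative):

```python
def gini(counts):
    # Gini impurity from a list of per-class counts: 1 - sum(p_i^2).
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(counts_d1, counts_d2):
    # Weighted Gini index of a binary split of D into D1 and D2.
    n1, n2 = sum(counts_d1), sum(counts_d2)
    n = n1 + n2
    return n1 / n * gini(counts_d1) + n2 / n * gini(counts_d2)

D = [9, 5]                                        # e.g., 9 "yes" and 5 "no" tuples
reduction = gini(D) - gini_split([7, 3], [2, 2])  # reduction for one candidate split
print(round(gini(D), 3), round(reduction, 3))
```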
Computation of Gini Index
Ex. D has 9 tuples in buys_computer = “yes” and 5 in “no”
gini(D) = 1 − (9/14)^2 − (5/14)^2 = 0.459
Suppose the attribute income partitions D into 10 tuples in D1: {low,
medium} and 4 tuples in D2: {high}
gini_{income ∈ {low, medium}}(D) = (10/14) × Gini(D1) + (4/14) × Gini(D2) = 0.443
Gini for {low, high} is 0.458 and for {medium, high} is 0.450. Thus, split on
{low, medium} (and {high}) since it has the lowest Gini index
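An arithmetic check of these values, assuming D1 = {low, medium} holds 7 "yes" / 3 "no" tuples and D2 = {high} holds 2 "yes" / 2 "no" (assumed counts, since the table is not reproduced here):

```python
gini_D  = 1 - (9/14)**2 - (5/14)**2   # ~ 0.459
gini_D1 = 1 - (7/10)**2 - (3/10)**2   # ~ 0.420  (D1 = {low, medium}, assumed counts)
gini_D2 = 1 - (2/4)**2 - (2/4)**2     # = 0.500  (D2 = {high}, assumed counts)
gini_income_low_medium = 10/14 * gini_D1 + 4/14 * gini_D2   # ~ 0.443
print(round(gini_D, 3), round(gini_income_low_medium, 3))
```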
Comparing Attribute Selection Measures
The three measures, in general, return good results but
Information gain:
biased towards multivalued attributes
Gain ratio:
tends to prefer unbalanced splits in which one partition is
much smaller than the others
Gini index:
biased towards multivalued attributes
has difficulty when # of classes is large
tends to favor tests that result in equal-sized partitions
and purity in both partitions
PRUNING
If the information gain or gini index falls below a prespecified threshold, then further
partitioning of the given subset is halted.
Pre-pruning - halting tree construction early, i.e., deciding not to further split or partition the
subset of training tuples at a given node. The leaf holds the most frequent class among
the subset tuples.
Post-pruning - removes subtrees from a fully grown tree.
A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
The leaf is labelled with the most frequent class among the tuples of the subtree being replaced.
Cost complexity is a function of the number of leaves in the tree and the error rate (the
percentage of tuples misclassified by the tree).
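As a library-level illustration (a sketch with scikit-learn, not the exact procedure described above): min_impurity_decrease acts like a pre-pruning threshold, while ccp_alpha drives cost-complexity post-pruning.

```python
# Pre-pruning vs. post-pruning on a built-in dataset (purely illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop splitting when the impurity reduction falls below a threshold.
pre_pruned = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the tree fully, then prune subtrees via cost complexity.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one alpha along the pruning path
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

for name, tree in [("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(name, "leaves:", tree.get_n_leaves(), "test acc:", round(tree.score(X_test, y_test), 3))
```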
Drawbacks of Decision Trees
Repetition - an attribute is repeatedly tested along a given branch of the tree
Replication - duplicate subtrees exist within the tree
Scalability Framework for RainForest
Separates the scalability aspects from the criteria that
determine the quality of the tree
Capable of handling training sets too large to fit in memory (only the much
smaller AVC-sets need to fit in memory)
Builds an AVC-list: AVC (Attribute, Value, Class_label)
AVC-set (of an attribute X )
Projection of training dataset onto the attribute X and
class label where counts of individual class label are
aggregated
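A minimal sketch of an AVC-set for one attribute (the tiny in-memory rows below are only for illustration; in RainForest the counts would be aggregated while scanning a training set that need not fit in memory):

```python
# AVC-set: projection of the data onto (attribute value, class label) with counts.
from collections import defaultdict

def avc_set(rows, attribute, class_attr):
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[attribute]][row[class_attr]] += 1
    return {value: dict(by_class) for value, by_class in counts.items()}

rows = [
    {"age": "youth", "buys_computer": "no"},
    {"age": "youth", "buys_computer": "yes"},
    {"age": "middle_aged", "buys_computer": "yes"},
    {"age": "senior", "buys_computer": "no"},
]
print(avc_set(rows, "age", "buys_computer"))
# e.g. {'youth': {'no': 1, 'yes': 1}, 'middle_aged': {'yes': 1}, 'senior': {'no': 1}}
```

Note that the size of an AVC-set depends on the number of distinct attribute values and class labels, not on the number of training tuples, which is what makes the approach scalable.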
Homework problem:
Build a decision tree for the table given, based on information gain.
Entropy and Information Gain
Let’s use the IG-based criterion to construct a DT for the Tennis example
At root node, let’s compute IG of each of the 4 features
Consider feature “wind”. Root contains all examples S = [9+,5-]
H(S ) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
Sweak = [6+, 2−] ⇒ H(Sweak ) = 0.811
Sstrong = [3+, 3−] ⇒ H(Sstrong) = 1
IG(S, wind) = H(S) − (|S_weak| / |S|) × H(S_weak) − (|S_strong| / |S|) × H(S_strong) = 0.94 − (8/14)(0.811) − (6/14)(1) = 0.048
Likewise, at root: IG(S, outlook) = 0.246, IG(S, humidity) = 0.151, IG(S,temp) = 0.029
Thus we choose “outlook” feature to be tested at the root node
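A numeric check of the root-node computation for "wind" (a small sketch; the helper H is ours):

```python
# Entropy from positive/negative counts, then IG(S, wind) at the root.
import math

def H(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

H_S      = H(9, 5)        # ~ 0.94
H_weak   = H(6, 2)        # ~ 0.811
H_strong = H(3, 3)        # = 1.0
IG_wind  = H_S - 8/14 * H_weak - 6/14 * H_strong
print(round(IG_wind, 3))  # ~ 0.048
```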
Now how to grow the DT, i.e., what to do at the next level? Which feature to test next?
Rule: Iterate - for each child node, select the feature with the highest IG
Growing the tree
Proceeding as before, for level 2, left node, we can verify that
IG(S,temp) = 0.570, IG(S, humidity) = 0.970, IG(S, wind) = 0.019
Thus humidity chosen as the feature to be tested at level 2, left node
No need to expand the middle node (already “pure” - all “yes” training examples )
Can also verify that wind has the largest IG for the right node
Note: If a feature has already been tested along a path earlier, we don’t consider it again
When to stop growing the tree?
Stop expanding a node further (i.e., make it a leaf node) when
It consists of all training examples having the same label (the node becomes “pure”)
We run out of features to test along the path to that node
The DT starts to overfit (can be checked by monitoring
the validation set accuracy)
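One rough way to run that check (a sketch with scikit-learn and synthetic data, purely illustrative): grow trees of increasing depth and watch the validation accuracy.

```python
# Monitor validation accuracy as the tree is allowed to grow deeper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_depth, best_acc = None, 0.0
for depth in range(1, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc
print(f"Best depth by validation accuracy: {best_depth} (acc = {best_acc:.3f})")
```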