DECISION TREE INDUCTION
Presented by:-
JUHITA KUMARI
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
DECISION TREE INDUCTION
A decision tree is a flowchart like a tree structure, where each internal node (non-leaf node) denotes a test on an
attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in
the tree is the root node.
Decision trees are powerful and popular tools for classification and prediction. Decision trees represent
rules, which can be understood by humans and used in knowledge system such as database. Decision tree learning is a
method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based
on several input variables.
Root Node
Internal Node
Leaf Node
A decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer.
DECISION TREE INDUCTION
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
DECISION TREE ALGORITHM – ID3
Algorithm : Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.
Input :
Data partition, D, which is a set of training tuples and their associated class labels;
Attribute_list, the set of candidate attributes;
Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples
into individual classes. This criterion consists of a splitting_attribute and , possibly, either a split-point or splitting
subset.
Output: A decision tree
Method:
1. Create a node N;
2. if tuples in D are all of the same class, C, then
3. return N as a leaf node labeled with the class C;
4. if attribute_list is empty then
5. return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method (D,attribute_list) to find the “best” splitting_criterion;
7. label node N with splitting_criterion;
8. if splitting_attribute is dicrete_valued and multiway splits allowed then
9. attribute_list attribute_list – splitting_attribute; //remove splitting_attribute
10. for each outcome j of splitting_criterion //partition the tuples and grow subtree for each partition
11. let Dj be the set of data tuples in D satisfying outcome j; //a partition
12. if Dj is empty then
13. attach a leaf labeled with the majority class in D to node N;
14. else attach the node returned by Generate_decision_tree (Dj, attribute_list) to node N; end for
15. return N
Basic algorithm for inducing a decision tree from training tuples
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
ATTRIBUTE SELECTION MEASURES
An attribute selection measure is a heuristic (to find) for selecting the splitting criterion that “best” separates a
given data partition, D, of class-labeled training tuples into individual classes. Attribute selection measures are
also known also known as splitting rules because determine how the tuples at a given node are to be split.
There are three popular attribute selection measures
Information gain
Gain ratio
Gini index
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
INFORMATION GAIN
ID3 uses information gain as its attribute selection measure. Information gain is the amount of information that's gained
by knowing the value of the attribute. Information gain is defined as the difference between the original information
requirement (i.e. based on the classes) and the new requirement (i.e. obtained after partitioning on A)
• Select the attribute with the heights information gain.
Expected Information (Entropy) needed to classify a tuple in D;
Information needed (after using A to split D into v partitions) to classify D;
Information gained by branching on attribute A
INFORMATION GAIN
14 Records
• Class ‘Yes’= 9 records
• Class ‘No’=5 records
Similarly,
Gain (income)=0.029
Gain (student)=0.151
Gain (Credit_rating)=0.048
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
GAIN RATIO
Information gain ratio biases the decision tree against considering attributes with a large number of distinct values. So it
solves the drawback of information gain—namely, information gain applied to attributes that can take on a large number of
distinct values might learn the training set too well. For example, suppose that we are building a decision tree for some data
describing a business's customers.
Information gain is often used to decide which of the attributes are the most relevant, so they can be tested near
the root of the tree. One of the input attributes might be the customer's credit card number. This attribute has a high
information gain, because it uniquely identifies each customer, but we do not want to include it in the decision tree:
deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we haven't seen
before.
Example:
For attribute income:
Gain(Income)=0.029
Therefore, GainRatio(Income)=0.029/0.926=0.031
OVERVIEW
INTRODUCTION OF DECISION TREE INDUCTION
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
ATTRIBUTE SELECTION MEASURES
INFORMATION GAIN
GAIN RATIO
GINI INDEX
GINI INDEX
The Gini Index is used in CART. It is calculated by subtracting the sum of the squared probabilities of each class
from one. It favors larger partitions. Using the notation previously described, the Gini index measures the
impurity of D, a data partition or set of training tuples, as
𝑘
𝐺 𝐷 = 1 − 𝑝𝑖2
𝑖=1
Suppose, a binary partition on A splits D into 𝐷1 and 𝐷2 , then the weighted average Gini Index of splitting denoted
by 𝐺𝐴 (𝐷) is given by
𝐷1 𝐷2
𝐺𝐴 𝐷 = . 𝐺 𝐷1 + . 𝐺(𝐷2 )
𝐷 𝐷
This binary partition of D reduces the impurity and the reduction in impurity is measured by
𝛾 𝐴, 𝐷 = 𝐺 𝐷 − 𝐺𝐴 𝐷
For the EMP data set,
𝐺 𝐸𝑀𝑃 = 1 − σ2𝑖=1 𝑝𝑖2
9 2 5 2
= 1− +
14 14
= 𝟎. 𝟒𝟓𝟗
Now let us consider the calculation of 𝐺𝐴 𝐸𝑀𝑃 for Age, Salary, Job and Performance.
7 7
𝐺𝑗𝑜𝑏 𝐷 = 𝐺 𝐷1 + 𝐺(𝐷2 )
14 14
7 3 2 4 2 7 6 2 1 2
= 1− − + 1− − = 0.443
14 7 7 14 7 7
𝛾 𝑗𝑜𝑏, 𝐷 = ?
VISUAL MINING FOR DECISION TREE INDUCTION
“Are there any interactive approaches to decision tree induction that allow us to visualize the data and the tree as it
is being constructed? Can we use any knowledge of our data to help in building the tree?”
VISUAL MINING FOR DECISION TREE INDUCTION
Perception-based classification (PCB) is an
interactive approach based on multidimensional
visualization techniques and allows the user to
incorporate background knowledge about the
data when building a decision tree. By visually
interacting with the data, the user is also likely to
develop a deeper understanding of the data. The
resulting trees tend to be smaller than those
built using traditional decision tree induction
methods and so are easier to interpret, while
achieving about the same accuracy.
A screenshot of PCB, a system for interactive decision tree construction
“How can the data be visualized to support interactive decision tree construction?”
PBC uses a pixel-oriented approach to view multidimensional data with its class label information. The circle
segments approach is adapted, which maps d-dimensional data objects to a circle that is partitioned into d
segments, each representing one attribute. Here, an attribute value of a data object is mapped to one
colored pixel, reflecting the object’s class label. This mapping is done for each attribute to determine the
arrangement order within a segment.
For example, attribute values within a given segment may be organized so as to display
homogeneous regions within the same attribute value. The amount of training data that can be visualized at
one point is approximately determined by the product of the number of attributes and the number of data
objects.
The PCB system displays a split screen, consisting of Data Interaction Window and a Knowledge
Interaction Window. The Data Interaction Window displays the circle segments of the data under
examination, while the Knowledge Interaction Window displays an empty decision tree.
THANK YOU!