
DECISION TREE INDUCTION

Presented by:
JUHITA KUMARI
OVERVIEW

• INTRODUCTION TO DECISION TREE INDUCTION

• ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM

• ATTRIBUTE SELECTION MEASURES

• INFORMATION GAIN

• GAIN RATIO

• GINI INDEX
DECISION TREE INDUCTION
A decision tree is a flowchart-like tree structure in which each internal (non-leaf) node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The topmost node in the tree is the root node.
Decision trees are powerful and popular tools for classification and prediction. They represent rules that can be understood by humans and used directly in knowledge systems such as databases. Decision tree learning is a method commonly used in data mining; the goal is to create a model that predicts the value of a target variable based on several input variables.

Figure: A decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer (the root node and internal nodes test attributes; the leaf nodes hold class labels).
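To make the goal concrete, here is a minimal sketch (not part of the original slides) that fits a small tree on an invented, buys_computer-style table with scikit-learn and prints the learned rules; the column names and rows are made up purely for illustration.

```python
# Illustrative only: a tiny decision tree that predicts a target variable
# ("buys_computer") from a few categorical input variables.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data in the spirit of the AllElectronics example (values invented here).
data = pd.DataFrame({
    "age":           ["youth", "youth", "middle_aged", "senior", "senior"],
    "student":       ["no",    "yes",   "no",          "yes",    "no"],
    "buys_computer": ["no",    "yes",   "yes",         "yes",    "no"],
})

# One-hot encode the categorical inputs so the tree can split on them.
X = pd.get_dummies(data[["age", "student"]])
y = data["buys_computer"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)  # entropy ~ information gain
print(export_text(tree, feature_names=list(X.columns)))       # human-readable rules
```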
DECISION TREE ALGORITHM – ID3

Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.

Input:
• Data partition, D, which is a set of training tuples and their associated class labels;
• attribute_list, the set of candidate attributes;
• Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split_point or a splitting_subset.

Output: A decision tree

Method:
1. create a node N;
2. if tuples in D are all of the same class, C, then
3.   return N as a leaf node labeled with the class C;
4. if attribute_list is empty then
5.   return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method(D, attribute_list) to find the “best” splitting_criterion;
7. label node N with splitting_criterion;
8. if splitting_attribute is discrete-valued and multiway splits allowed then
9.   attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
10. for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
11.   let Dj be the set of data tuples in D satisfying outcome j; // a partition
12.   if Dj is empty then
13.     attach a leaf labeled with the majority class in D to node N;
14.   else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
    end for
15. return N;
Basic algorithm for inducing a decision tree from training tuples
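The pseudocode above maps naturally onto a short recursive function. The following Python sketch is illustrative only (the names id3, majority_class, and the select_attribute callback are assumptions, not from the slides); it handles discrete-valued attributes with multiway splits.

```python
from collections import Counter

def majority_class(rows, target):
    """Return the most common class label among the given tuples."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]

def id3(rows, attributes, target, select_attribute):
    """Build a decision tree (nested dicts) from dict-shaped training tuples.

    rows             -- e.g. [{"age": "youth", ..., "class": "no"}, ...]
    attributes       -- candidate attribute names (attribute_list)
    target           -- name of the class-label field
    select_attribute -- Attribute_selection_method, e.g. based on information gain
    """
    classes = {row[target] for row in rows}
    if len(classes) == 1:                      # steps 2-3: all tuples in one class
        return classes.pop()
    if not attributes:                         # steps 4-5: attribute_list is empty
        return majority_class(rows, target)

    best = select_attribute(rows, attributes, target)  # step 6: "best" splitting attribute
    remaining = [a for a in attributes if a != best]   # step 9: remove splitting attribute
    subtree = {}
    for value in {row[best] for row in rows}:          # step 10: each outcome of the split
        partition = [row for row in rows if row[best] == value]  # step 11: D_j
        if not partition:                              # steps 12-13 (kept to mirror the pseudocode)
            subtree[value] = majority_class(rows, target)
        else:                                          # step 14: grow the subtree recursively
            subtree[value] = id3(partition, remaining, target, select_attribute)
    return {best: subtree}                             # step 15: return node N
```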
ATTRIBUTE SELECTION MEASURES

An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes. Attribute selection measures are also known as splitting rules, because they determine how the tuples at a given node are to be split.
There are three popular attribute selection measures:
• Information gain
• Gain ratio
• Gini index
INFORMATION GAIN
ID3 uses information gain as its attribute selection measure. Information gain is the amount of information gained by knowing the value of an attribute: it is defined as the difference between the original information requirement (i.e., based only on the proportion of classes in D) and the new requirement (i.e., obtained after partitioning on A).

• Select the attribute with the highest information gain.


• Expected information (entropy) needed to classify a tuple in D:
  $\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$

• Information still needed (after using A to split D into v partitions) to classify D:
  $\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$

• Information gained by branching on attribute A:
  $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$
INFORMATION GAIN
Example (the AllElectronics training set): 14 records, of which 9 belong to class “yes” (buys_computer = yes) and 5 to class “no”.

$\mathrm{Info}(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$ bits

Partitioning on age gives $\mathrm{Info}_{age}(D) = 0.694$ bits, so $\mathrm{Gain}(age) = 0.940 - 0.694 = 0.246$.

Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Because age has the highest information gain, it is selected as the splitting attribute.
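The arithmetic above is easy to reproduce with a few lines of Python (an illustrative helper, not part of the slides; the per-age class counts assumed below are the standard AllElectronics ones):

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected information, in bits, needed to classify a tuple."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D) for one discrete attribute.

    values -- attribute value of each tuple; labels -- class label of each tuple.
    """
    total = len(labels)
    info_a = sum(
        sum(1 for x in values if x == v) / total
        * entropy([c for x, c in zip(values, labels) if x == v])
        for v in set(values)
    )
    return entropy(labels) - info_a

# 14 records: 9 "yes" and 5 "no" class labels overall.
labels = ["yes"] * 9 + ["no"] * 5
print(f"Info(D)   = {entropy(labels):.3f} bits")   # 0.940

# age partitions assumed: youth (2 yes, 3 no), middle_aged (4 yes), senior (3 yes, 2 no).
age        = ["youth"] * 5 + ["middle_aged"] * 4 + ["senior"] * 5
age_labels = ["yes"] * 2 + ["no"] * 3 + ["yes"] * 4 + ["yes"] * 3 + ["no"] * 2
print(f"Gain(age) = {information_gain(age, age_labels):.3f}")
# prints 0.247 (the slide's 0.246 rounds Info_age(D) to 0.694 before subtracting)
```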
GAIN RATIO
The information gain ratio biases the decision tree against attributes with a large number of distinct values, and so addresses a drawback of information gain: when applied to attributes that can take on many distinct values, information gain tends to overfit the training set. For example, suppose that we are building a decision tree from data describing a business's customers.
Information gain is often used to decide which attributes are the most relevant, so that they can be tested near the root of the tree. One of the input attributes might be the customer's credit card number. This attribute has a very high information gain, because it uniquely identifies each customer, but we do not want to include it in the decision tree: deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we have not seen before.
Example: for the attribute income of the AllElectronics data, the gain ratio normalizes the information gain by the split information,

$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}, \qquad \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}$

• A split on income produces three partitions (low, medium, high) containing 4, 6, and 4 tuples, so
  $\mathrm{SplitInfo}_{income}(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557$

• Gain(income) = 0.029

• Therefore, GainRatio(income) = 0.029 / 1.557 ≈ 0.019
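Continuing with the same illustrative helpers, the split information and gain ratio for income can be checked as follows (assuming the 4/6/4 partition sizes above):

```python
import math
from collections import Counter

def split_info(values):
    """SplitInfo_A(D): the potential information generated by splitting on A."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

# income splits the 14 tuples into 4 low, 6 medium and 4 high:
income = ["low"] * 4 + ["medium"] * 6 + ["high"] * 4
s = split_info(income)
print(f"SplitInfo_income(D) = {s:.3f}")          # 1.557
print(f"GainRatio(income)   = {0.029 / s:.3f}")  # Gain(income) / SplitInfo ≈ 0.019
```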
GINI INDEX
The Gini index is used in CART. It is calculated by subtracting the sum of the squared probabilities of each class from one, and it favors larger partitions. Using the notation previously described, the Gini index measures the impurity of D, a data partition or set of training tuples, as

$G(D) = 1 - \sum_{i=1}^{k} p_i^2$

Suppose a binary partition on A splits D into $D_1$ and $D_2$. The weighted-average Gini index of the split, denoted $G_A(D)$, is given by

$G_A(D) = \frac{|D_1|}{|D|}\,G(D_1) + \frac{|D_2|}{|D|}\,G(D_2)$

This binary partition of D reduces the impurity, and the reduction in impurity is measured by

$\gamma(A, D) = G(D) - G_A(D)$
For the EMP data set,

$G(\mathrm{EMP}) = 1 - \sum_{i=1}^{2} p_i^2 = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459$

Now consider the calculation of $G_A(\mathrm{EMP})$ for Age, Salary, Job, and Performance. For Job,

$G_{job}(D) = \frac{7}{14}\,G(D_1) + \frac{7}{14}\,G(D_2) = \frac{7}{14}\left[1 - \left(\frac{3}{7}\right)^2 - \left(\frac{4}{7}\right)^2\right] + \frac{7}{14}\left[1 - \left(\frac{6}{7}\right)^2 - \left(\frac{1}{7}\right)^2\right] = 0.367$

$\gamma(job, D) = ?$
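The Gini figures can be checked with a short helper (illustrative only; the class counts are those shown above):

```python
from collections import Counter

def gini(labels):
    """G(D) = 1 minus the sum of the squared class probabilities."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_split(d1, d2):
    """Weighted-average Gini index G_A(D) of a binary split D -> (D1, D2)."""
    total = len(d1) + len(d2)
    return len(d1) / total * gini(d1) + len(d2) / total * gini(d2)

# EMP data set: 9 tuples of one class and 5 of the other overall;
# Job splits them into two partitions with class counts (3, 4) and (6, 1).
emp = ["yes"] * 9 + ["no"] * 5
d1  = ["yes"] * 3 + ["no"] * 4
d2  = ["yes"] * 6 + ["no"] * 1
print(f"G(EMP)        = {gini(emp):.3f}")                       # 0.459
print(f"G_job(D)      = {gini_split(d1, d2):.3f}")              # 0.367
print(f"gamma(job, D) = {gini(emp) - gini_split(d1, d2):.3f}")  # the reduction in impurity
```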
VISUAL MINING FOR DECISION TREE INDUCTION

“Are there any interactive approaches to decision tree induction that allow us to visualize the data and the tree as it is being constructed? Can we use any knowledge of our data to help in building the tree?”
VISUAL MINING FOR DECISION TREE INDUCTION

Perception-based classification (PBC) is an interactive approach based on multidimensional visualization techniques that allows the user to incorporate background knowledge about the data when building a decision tree. By visually interacting with the data, the user is also likely to develop a deeper understanding of the data. The resulting trees tend to be smaller than those built using traditional decision tree induction methods, and so are easier to interpret, while achieving about the same accuracy.

Figure: A screenshot of PBC, a system for interactive decision tree construction.
“How can the data be visualized to support interactive decision tree construction?”

PBC uses a pixel-oriented approach to view multidimensional data together with its class label information. A circle-segments approach is adapted, which maps d-dimensional data objects onto a circle that is partitioned into d segments, each representing one attribute. An attribute value of a data object is mapped to one colored pixel, reflecting the object's class label; this mapping is done for each attribute of each data object. Within a segment, the values are sorted to determine their arrangement order: for example, attribute values within a given segment may be organized so as to display homogeneous (with respect to class label) regions within the same attribute value. The amount of training data that can be visualized at one time is approximately determined by the product of the number of attributes and the number of data objects.
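As a rough sketch of the circle-segments idea (this is not the PBC implementation; the data set and layout choices are invented for illustration), something similar can be drawn with matplotlib:

```python
# Illustrative only: map d-dimensional objects onto a circle partitioned into
# d segments (one per attribute), coloring each plotted value by its class label.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, d = 300, 4                                   # 300 data objects, 4 attributes
X = rng.random((n, d))                          # attribute values in [0, 1]
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)       # toy binary class label

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
seg = 2 * np.pi / d                             # angular width of one segment
for j in range(d):                              # one circle segment per attribute
    order = np.argsort(X[:, j])                 # sort objects by their value of attribute j
    theta = j * seg + seg * (np.arange(n) + 0.5) / n   # angle: rank within segment j
    r = 0.2 + 0.8 * np.arange(n) / n                   # radius: also follows the rank
    colors = np.where(y[order] == 1, "tab:blue", "tab:orange")  # pixel color = class label
    ax.scatter(theta, r, c=colors, s=4)
ax.set_xticks([j * seg for j in range(d)])
ax.set_xticklabels([f"attr {j}" for j in range(d)])
ax.set_yticks([])
plt.show()
```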
The PBC system displays a split screen consisting of a Data Interaction window and a Knowledge Interaction window. The Data Interaction window displays the circle segments of the data under examination, while the Knowledge Interaction window displays the decision tree constructed so far (initially empty).
THANK YOU!
