DATA MINING.
COURSE INSTRUCTOR : Sheza Naeem
Lecture# 19
Decision trees and decision rules are data-mining methodologies applied in many real-world applications
as a powerful solution to classification problems. Therefore, at the beginning, let us briefly summarize
the basic principles of classification. In general,
classification is a process of learning a function that maps a data item into one of several predefined
classes. Every classification based on inductive-learning algorithms is given as input a set of samples
that consist of vectors of attribute values (also called
feature vectors) and a corresponding class. The goal of learning is to create a classification model,
known as a classifier, which will predict, based on the values of its available input attributes, the class for
some entity (a given sample). In other words,
classification is the process of assigning a discrete label value (class) to an unlabeled record, and a
classifier is a model (a result of classification) that predicts one attribute (the class of a sample) when the
other attributes are given. In doing so, samples
are divided into predefined groups. For example, a simple classification might group customer billing
records into two specific classes: those who pay their bills within 30 days and those who take longer
than 30 days to pay. Different classification methodologies are applied today in almost every discipline
where the task of classification, because of the large amount of data, requires automation of the
process. Examples of classification methods used as a part of data-mining applications include
classifying trends in financial markets and identifying objects in large image databases. A more
formalized approach to classification problems is given through its graphical interpretation. A data set
with n features may be thought of as a collection of discrete points (one per example) in an n-
dimensional space. A classification rule is a hypercube that contains one or more of these points.
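To make this setting concrete, the following minimal sketch in Python assumes the scikit-learn library is available; the billing attributes, values, and class labels are invented for illustration and are not part of the lecture material.

from sklearn.tree import DecisionTreeClassifier

# Each row is a feature vector describing one customer billing record
# (hypothetical attributes: amount owed, number of previous late payments).
X = [[120.0, 0],
     [340.0, 3],
     [ 80.0, 1],
     [500.0, 4]]
# Predefined class labels for the same records.
y = ["within_30_days", "longer", "within_30_days", "longer"]

# Learning creates a classification model (a classifier) from the labeled samples.
clf = DecisionTreeClassifier().fit(X, y)

# The classifier assigns a class to a new, unlabeled record.
print(clf.predict([[200.0, 2]]))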
DECISION TREES
A particularly efficient method for producing classifiers from data is to generate a
decision tree. The decision-tree representation is the most widely used logic method. There is a large
number of decision-tree induction algorithms described primarily in the machine-learning and applied-
statistics literature. They are supervised learning methods that construct decision trees from a set of
input–output samples. Decision-tree induction is an efficient nonparametric method for classification and regression. A
decision tree is a hierarchical model for supervised learning where the local region is identified in a
sequence of recursive splits through decision nodes with test functions. A decision tree is also a
nonparametric model in the sense that we do not assume any parametric form for the class density.
A typical decision-tree learning system adopts a top-down strategy that searches for a
solution in a part of the search space. It guarantees that a simple, but not necessarily the simplest,
tree will be found. A decision tree consists of nodes where attributes are tested. In a univariate tree,
for each internal node, the test uses only one of the attributes for testing. The outgoing branches of a
node correspond to all the possible outcomes of the test at the node. A simple decision tree for
classification of samples with two input attributes X and Y is given in Figure 6.2.
All samples with feature values X > 1 and Y = B belong to Class2, while the samples with values X < 1
belong to Class1, whatever the value for feature Y. The samples, at a nonleaf node in the tree
structure, are thus partitioned along the branches, and each child node gets its corresponding subset of
samples. Decision trees that use univariate splits have a simple representational form, making it
relatively easy for the user to understand the inferred model; at the same time, they represent a
restriction on the expressiveness of the model. In general, any restriction on a particular tree
representation can significantly restrict the functional form and thus the approximation power of the
model. A well-known tree-growing algorithm for generating decision trees based on univariate splits is
Quinlan’s ID3, with an extended version called C4.5. Greedy search methods, which involve growing and
pruning decision-tree structures, are typically employed in these algorithms to explore the exponential
space of possible models.
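The univariate tree of Figure 6.2 can be written out directly as nested tests. The sketch below follows only the splits stated above (samples with X > 1 and Y = B go to Class2, samples failing the X > 1 test go to Class1); the leaf returned for the remaining values of Y is a placeholder, since those branches are not spelled out here.

def classify(x, y):
    # Root node: univariate test on attribute X only.
    if x > 1:
        # Internal node: test on attribute Y; one branch per possible outcome.
        if y == "B":
            return "Class2"
        # Leaves for the other values of Y are not specified in the text above.
        return "Class1"
    # Samples that fail the X > 1 test belong to Class1, whatever the value of Y.
    return "Class1"

print(classify(2.5, "B"))  # Class2
print(classify(0.3, "C"))  # Class1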
The ID3 algorithm starts with all the training samples at the root node of the tree. An attribute is
selected to partition these samples. For each value of the attribute, a branch is created, and the
corresponding subset of samples that have the attribute
value specified by the branch is moved to the newly created child node. The algorithm is applied
recursively to each child node until all samples at a node are of one class.
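The recursive partitioning just described can be sketched compactly as follows. The data layout (a list of dictionaries with a "class" key) and the names build_tree and select_attribute are assumptions made for illustration; attribute selection is passed in as a function so that the entropy-based choice discussed next can be plugged in.

def build_tree(samples, attributes, select_attribute):
    classes = {s["class"] for s in samples}
    # Stop once all samples at the node are of one class: the node becomes a leaf.
    if len(classes) == 1:
        return classes.pop()
    if not attributes:
        # No attributes left to test; return the majority class at this node.
        return max(classes, key=lambda c: sum(s["class"] == c for s in samples))
    # Select an attribute to partition the samples at this node.
    attr = select_attribute(samples, attributes)
    tree = {"attribute": attr, "branches": {}}
    # Create a branch per attribute value and move the matching subset
    # of samples to the newly created child node, then recurse.
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        remaining = [a for a in attributes if a != attr]
        tree["branches"][value] = build_tree(subset, remaining, select_attribute)
    return tree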
Every path from the root to a leaf in the decision tree represents a classification rule. Note that the critical
decision in such a top-down decision-tree-generation algorithm is the choice of attribute at a node.
Attribute selection in ID3 and C4.5 algorithms is based on minimizing an information entropy measure
applied to the examples at a node. The approach based on information entropy aims at minimizing
the number of tests needed to classify a sample in the database. The attribute selection part of ID3
is based on the assumption that the complexity of the decision tree is strongly related to the amount of
information conveyed by the value of the given attribute. An information-based heuristic selects the
attribute providing the highest information gain,
i.e., the attribute that minimizes the information needed in the resulting subtree to classify the sample.
An extension of ID3 is the C4.5 algorithm, which extends the domain of classification from categorical
attributes to numeric ones. The measure favors attributes that result in partitioning the data into
subsets that have low class entropy, i.e., when the majority of examples in each subset belong to a single class.
The algorithm basically chooses the attribute that provides the maximum degree of discrimination
between classes locally.
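A sketch of this entropy-based heuristic, compatible with the build_tree sketch above, might look as follows; it handles categorical attributes only, and the function names are invented for illustration.

from math import log2
from collections import Counter

def entropy(samples):
    # Class entropy: -sum(p * log2(p)) over the class proportions at a node.
    counts = Counter(s["class"] for s in samples)
    total = len(samples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(samples, attr):
    # Gain = entropy before the split minus the weighted entropy of the subsets.
    total = len(samples)
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(samples) - remainder

def select_attribute(samples, attributes):
    # ID3 chooses the attribute with the highest information gain.
    return max(attributes, key=lambda a: information_gain(samples, a))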
To apply these methods, which are based on the inductive-learning approach, several key
requirements have to be satisfied:
1. Attribute-value description—The data to be analyzed must be in a flat-file form—all information
about one object or example must be expressible in terms of a fixed collection of properties or
attributes. Each attribute may have either discrete or numeric values, but the attributes used to
describe samples must not vary from one case to another. This restriction rules out domains in
which samples have an inherently variable structure (a small flat-file example is sketched after this list).
2. Predefined classes—The categories to which samples are to be assigned must have been established
beforehand. In the terminology of machine learning, this is supervised learning.
3. Discrete classes—The classes must be sharply delineated: a case either does or does not belong to a
particular class. It is expected that there will be far more samples than classes.
4. Sufficient data—Inductive generalization given in the form of a decision tree proceeds by identifying
patterns in data. The approach is valid only if a sufficient number of robust patterns can be distinguished from
chance coincidences. As this differentiation usually depends on statistical tests, there must be a sufficient
number of samples to allow these tests to be effective. The amount of data required is affected by
factors such as the number of properties and classes
and the complexity of the classification model. As these factors increase, more data will be needed to
construct a reliable model.
5. “Logical” classification models—These methods construct only classifiers that can be expressed
as decision trees or decision rules. These forms essentially restrict the description of a class to a logical
expression whose primitives are statements about the values of particular attributes. Some
applications require weighted attributes or their arithmetic combinations for a reliable description of
classes. In these situations logical models become very complex, and, in general, they are not
effective.
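Requirements 1, 2, and 3 can be illustrated by tying them to the earlier sketches: the samples below are in flat-file form, every record is described by the same fixed collection of attributes, and the classes are predefined and discrete. The billing attributes and values are invented, and the final call reuses the build_tree and select_attribute sketches given earlier.

samples = [
    {"region": "north", "late_payments": "few",  "class": "within_30_days"},
    {"region": "north", "late_payments": "many", "class": "longer"},
    {"region": "south", "late_payments": "few",  "class": "within_30_days"},
    {"region": "south", "late_payments": "many", "class": "longer"},
]

tree = build_tree(samples, ["region", "late_payments"], select_attribute)
print(tree)  # e.g. {'attribute': 'late_payments', 'branches': {'few': 'within_30_days', 'many': 'longer'}}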