Decision Tree
• A decision tree is a supervised machine learning technique based on the divide-and-conquer paradigm.
• The basic idea behind decision trees is to partition the instance space into patches and to fit a model to each patch.
• A decision tree is a tree structure where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
• A decision tree is a classifier expressed as a recursive partition of the instance space.
• Decision trees are used for classification tasks.
Classification Problem in 2D view
Building a Decision Tree
The core algorithm for building decision trees, called ID3, employs a top-down, greedy search through the space of possible branches, with no backtracking. ID3 uses entropy and information gain to construct a decision tree.
• The tree starts as a single node, N, representing the training records in D.
• If the records in D are all of the same class, then node N becomes a leaf and is labeled with that class.
• Otherwise, the algorithm calls an attribute selection method to determine the splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the “best” way to separate or partition the tuples in D into individual classes. (A sketch of this recursive procedure appears under “Algorithm” below.)
Entropy
• Entropy measures the impurity of a set of records D: if pᵢ is the proportion of records in D belonging to class i, then 𝑬(𝑫) = −Σᵢ pᵢ log₂ pᵢ.
• Entropy is 0 when all records in D belong to a single class, and largest when the classes are evenly mixed.
Information Gain
• Information gain, Gain(D, A), for a set D is the effective change in entropy after deciding on a particular attribute A:
𝑮𝒂𝒊𝒏(𝑫, 𝑨) = 𝑬(𝑫) − 𝑬(𝑫, 𝑨)
where E(D, A) is the entropy of the partitions of D induced by the values of A, weighted by partition size.
• The information gain is the decrease in entropy after a dataset is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
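A minimal sketch of these computations in plain Python (the function names and the dict-per-record data format are illustrative assumptions, not from the slides):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """E(D): entropy of a collection of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attribute):
        """Gain(D, A) = E(D) - E(D, A): the drop in entropy after splitting on A."""
        n = len(labels)
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attribute], []).append(label)
        # E(D, A): entropy of each partition, weighted by partition size.
        weighted = sum(len(p) / n * entropy(p) for p in partitions.values())
        return entropy(labels) - weighted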
Decision Tree - Example
Training set

Day Outlook Humidity Wind Play
1 Sunny High Weak No
2 Sunny High Strong No
3 Cloudy High Weak Yes
4 Rain High Weak Yes
5 Rain Normal Weak Yes
6 Rain Normal Strong No
7 Cloudy Normal Strong Yes
8 Sunny High Weak No
9 Sunny Normal Weak Yes
10 Rain Normal Weak Yes
11 Sunny Normal Strong Yes
12 Cloudy High Strong Yes
13 Cloudy Normal Weak Yes
14 Rain High Strong No

Predict: will John play tennis on day 15?
15 Rain High Weak ?
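As a worked check (counts taken from the training set above): D contains 9 Yes and 5 No records, so
𝑬(𝑫) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.940.
Splitting on Outlook yields Sunny (2 Yes, 3 No), Cloudy (4 Yes, 0 No), and Rain (3 Yes, 2 No), so
𝑬(𝑫, Outlook) = (5/14)·0.971 + (4/14)·0 + (5/14)·0.971 ≈ 0.694 and 𝑮𝒂𝒊𝒏(𝑫, Outlook) ≈ 0.940 − 0.694 = 0.246,
the largest gain of any attribute here, which is why Outlook sits at the root of the tree below.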
Algorithm
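A minimal sketch of the ID3 procedure summarized under “Building a Decision Tree” (the nested-dict tree representation is an illustrative assumption; entropy() and information_gain() are the helpers sketched earlier):

    def id3(rows, labels, attributes):
        # All records share one class: node becomes a leaf with that class label.
        if len(set(labels)) == 1:
            return labels[0]
        # No attributes left to test: fall back to the majority class.
        if not attributes:
            return Counter(labels).most_common(1)[0][0]
        # Greedy step: split on the attribute with the highest information gain.
        best = max(attributes, key=lambda a: information_gain(rows, labels, a))
        tree = {best: {}}
        for value in {row[best] for row in rows}:
            branch = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
            tree[best][value] = id3([r for r, _ in branch],
                                    [l for _, l in branch],
                                    [a for a in attributes if a != best])
        return tree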
Decision tree

Outlook?
├─ Sunny → Humidity?
│    ├─ High → No
│    └─ Normal → Yes
├─ Cloudy → Yes
└─ Rain → Wind?
     ├─ Strong → No
     └─ Weak → Yes

Prediction

Day Outlook Humidity Wind Play
15 Rain High Weak ?

Will John play tennis on day 15? Outlook = Rain and Wind = Weak, so the tree predicts Yes.
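A small traversal helper matching the nested-dict tree from the ID3 sketch (an illustrative assumption, not from the slides):

    def classify(tree, record):
        # Walk from the root, following the branch whose value matches the
        # record, until a leaf (a plain class label) is reached.
        while isinstance(tree, dict):
            attribute = next(iter(tree))
            tree = tree[attribute][record[attribute]]
        return tree

    # classify(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"})
    # follows Rain -> Wind -> Weak and returns "Yes".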
RID Age Salary Employee Feedback Purchase
1 <=30 High No Fair No
2 <=30 High No Excellent No
3 31..40 High No Fair Yes
4 >40 Medium No Fair Yes
5 >40 Low Yes Fair Yes
6 >40 Low Yes Excellent No
7 31..40 Low Yes Excellent Yes
8 <=30 Medium No Fair No
9 <=30 Low Yes Fair Yes
10 >40 Medium Yes Fair Yes
11 <=30 Medium Yes Excellent Yes
12 31..40 Medium No Excellent Yes
13 31..40 High Yes Fair Yes
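A table like this can also be fed to a library implementation. The sketch below assumes pandas and scikit-learn are available; criterion="entropy" mirrors the entropy-based splitting above, though scikit-learn grows CART-style binary trees rather than ID3's multiway splits.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # The 13 records transcribed from the table above.
    data = pd.DataFrame({
        "Age": ["<=30", "<=30", "31..40", ">40", ">40", ">40", "31..40",
                "<=30", "<=30", ">40", "<=30", "31..40", "31..40"],
        "Salary": ["High", "High", "High", "Medium", "Low", "Low", "Low",
                   "Medium", "Low", "Medium", "Medium", "Medium", "High"],
        "Employee": ["No", "No", "No", "No", "Yes", "Yes", "Yes",
                     "No", "Yes", "Yes", "Yes", "No", "Yes"],
        "Feedback": ["Fair", "Excellent", "Fair", "Fair", "Fair", "Excellent",
                     "Excellent", "Fair", "Fair", "Fair", "Excellent",
                     "Excellent", "Fair"],
        "Purchase": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                     "No", "Yes", "Yes", "Yes", "Yes", "Yes"],
    })
    # Trees in scikit-learn need numeric inputs, so one-hot encode the attributes.
    X = pd.get_dummies(data.drop(columns="Purchase"))
    clf = DecisionTreeClassifier(criterion="entropy").fit(X, data["Purchase"])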
Sample MCQ
Internal nodes of a decision tree correspond to:
• A. Decision
• B. Classes
• C. Data instances
• D. None of the above
Correct Answer: Decision
Leaf nodes of a decision tree correspond to:
• A. Decision
• B. Classes
• C. Data instances
• D. None of the above
Correct Answer: Classes
Consider the following small data table for two classes of wood. Using information gain, construct a decision tree to classify the data set. Answer the following question for the resulting tree.
Which attribute would information gain choose as the root of the tree?
A. Density B. Grain C. Hardness D. None of the above
Correct Answer: Hardness
________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
(a) Decision tree
(b) Graphs
(c) Trees
(d) Networks
Correct Answer: Decision tree
Exercise

Age Competition Type Profit
Old Yes Software Down
Old No Software Down
Old No Hardware Down
Mid Yes Software Down
Mid Yes Hardware Down
Mid No Hardware Up
Mid No Software Up
New Yes Software Up
New No Hardware Up
New No Software Up