High Impact Skills Development Program
in Artificial Intelligence, Data Science, and Blockchain
Module 1: AI Fundamentals
Lecture 6: Decision Trees
Instructor: Dr. Syed Imran Ali
Assistant Professor, SEECS, NUST
Courtesy: Dr. Ahsan Sadaat, Dr. Faisal Shafait and Dr. Adnan ul Hasan
Supervised Learning
- Regression
- Classification
Classification
• Human learning from past experiences
• A computer does not have “experiences”
• A computer system learns from data, which represent some “past
experiences” of an application domain
• Our focus: learn a target function that can be used to predict the
values of a discrete class attribute, e.g., approved or not-approved,
and high-risk or low-risk.
• The task is commonly called: Supervised learning, classification, or
inductive learning.
Classification
• Supervised learning: classification is seen as supervised learning
from examples.
• Supervision: The data (observations, measurements, etc.) are
labeled with pre-defined classes.
• Test data are classified into these classes too.
• Goal: To learn a classification model from the data that can be used
to predict the classes of new (future, or test) cases/instances.
Supervised learning process: two steps
1. Learning (training): learn a model using the training data.
2. Testing: test the model using unseen test data to assess the
model accuracy:
Accuracy = Number of correct classifications / Total number of test cases
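As a minimal sketch of these two steps (illustrative, not from the slides; it assumes scikit-learn and uses placeholder data):

```python
# Minimal sketch (not from the slides): learn a model on training data,
# then test it on unseen data and report accuracy.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# toy labelled data (placeholder values)
X = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# step 1 - learning (training): fit a model on the training split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# step 2 - testing: accuracy = correct classifications / total test cases
print(accuracy_score(y_test, model.predict(X_test)))
```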
An example: data (loan application)
[Table: 15 past loan application records; the class attribute indicates whether each application was approved or not.]
An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.
• No learning: classify all future applications (test data) to the
majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
Classification: Definition
• Given a collection of records (training set):
• Each record contains a set of attributes; one of the attributes is the
class.
• Find a model for the class attribute as a function of the values of the other
attributes.
• Goal: previously unseen records should be assigned a class as accurately
as possible.
• A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set
used to build the model and test set used to validate it.
Illustrating Classification Task
Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

A learning algorithm performs induction on the training set to learn a model.

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learned model is then applied to the test set (deduction) to predict the class of each record.
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Tree-based Methods
• Naïve Bayes and Bayesian Belief Networks
• Memory-based Reasoning
• Neural Networks
• Support Vector Machines
• Rule-based Methods
Decision Tree Learning
• Decision tree learning is one of the most widely used
techniques for classification.
• Its classification accuracy is competitive with other methods, and it is very efficient.
• The classification model is a tree, called a decision tree.
Example of a Decision Tree
Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes Refund, MarSt, TaxInc)
- Refund = Yes -> NO
- Refund = No, MarSt = Married -> NO
- Refund = No, MarSt = Single or Divorced, TaxInc < 80K -> NO
- Refund = No, MarSt = Single or Divorced, TaxInc > 80K -> YES
Another Example of Decision Tree
Model: Decision Tree built from the same training data (Tid 1-10)
- MarSt = Married -> NO
- MarSt = Single or Divorced, Refund = Yes -> NO
- MarSt = Single or Divorced, Refund = No, TaxInc < 80K -> NO
- MarSt = Single or Divorced, Refund = No, TaxInc > 80K -> YES

There could be more than one tree that fits the same data!
Decision Tree Classification Task
Apply Model to Test Data

Test Data:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches the test record at each node:
- Refund = No, so take the No branch to the MarSt node.
- MarSt = Married, so take the Married branch, which is the leaf NO.
- Assign Cheat to “No”.
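The same traversal can be written as a tiny sketch (illustrative, not from the slides); the rules simply mirror the branches of the example tree:

```python
# Illustrative sketch (not from the slides): classify one record by following
# the splits of the example tree (Refund -> Marital Status -> Taxable Income).
def classify_cheat(record):
    if record["Refund"] == "Yes":
        return "No"                      # left branch of the root
    if record["Marital Status"] == "Married":
        return "No"                      # Married leaf
    # Single or Divorced: split on Taxable Income (80K threshold from the tree)
    return "No" if record["Taxable Income"] < 80_000 else "Yes"

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80_000}
print(classify_cheat(test_record))       # -> "No": assign Cheat to "No"
```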
Decision Tree Classification Task
The same workflow, now with a decision tree as the model: a tree induction algorithm learns a decision tree from the training set (Tid 1-10 above), and the tree is then applied to the test set (Tid 11-15) by deduction to predict the unknown class labels.
Example: Weather Dataset
[Table: the weather dataset used in the following example]
Which attribute to select?
Criterion for attribute/feature selection
• Which is the best attribute?
• Want to get the smallest tree
• Heuristic: choose the attribute that produces the
“purest” nodes
Final decision tree
[Figure: the final decision tree for the weather data]
• Splitting stops when data can’t be split any further
Entropy
• A decision tree is built top-down from a root node and
involves partitioning the data into subsets that contain
instances with similar values (homogeneous)
• The ID3 algorithm uses entropy to calculate the
homogeneity of a sample
• If the sample is completely homogeneous, the entropy
is zero; if the sample is equally divided, it has an
entropy of one
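A minimal sketch of the entropy computation (illustrative, not from the slides):

```python
# Minimal sketch (not from the slides): entropy from class frequency counts.
from math import log2

def entropy(counts):
    """Entropy of a sample, given the frequency count of each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([7, 0]))   # completely homogeneous sample -> 0.0
print(entropy([7, 7]))   # equally divided sample        -> 1.0
```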
Computing Entropy
• To build a decision tree, we need to calculate two types of entropy
values using frequency tables as follows:
• Entropy using the frequency table of one attribute (the entropy of the target/class):
E(S) = - sum_i p_i * log2(p_i), where p_i is the proportion of instances of class i in the sample S
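For example, assuming the classic 14-instance weather data with 9 "Yes" and 5 "No" values of the class attribute Play:

E(Play) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940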
Computing Entropy
• Entropy using the frequency table of two attributes (the target and one candidate splitting attribute):
E(T, X) = sum over the values c of X of P(c) * E(c), i.e., the entropy of each branch weighted by the fraction of instances falling into that branch
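Continuing the same (assumed) weather data, splitting the 14 instances on Outlook gives branches Sunny (2 Yes / 3 No), Overcast (4 Yes / 0 No), and Rainy (3 Yes / 2 No), so:

E(Play, Outlook) = (5/14)(0.971) + (4/14)(0.0) + (5/14)(0.971) ≈ 0.693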
Information Gain
• The information gain is based on the decrease in
entropy after a dataset is split on an attribute:
Gain(T, X) = E(T) - E(T, X)
• Constructing a decision tree is all about finding the
attribute that returns the highest information gain (i.e.,
the most homogeneous branches)
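As a small sketch (not from the slides), the gain can be computed directly from frequency counts; the branch counts below assume the classic weather data split on Outlook:

```python
# Small sketch (not from the slides): gain = entropy before the split minus
# the weighted entropy after the split. Counts assume the classic weather data.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

before = entropy([9, 5])                 # class entropy before splitting
branches = [[2, 3], [4, 0], [3, 2]]      # Sunny, Overcast, Rainy: (Yes, No)
n = sum(sum(b) for b in branches)
after = sum(sum(b) / n * entropy(b) for b in branches)
print(before - after)                    # information gain for Outlook, ~0.247
```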
Example: Weather dataset
• Step 1: Calculate the entropy of the target (class) attribute.
Example: Weather dataset
• Step 2: The dataset is then split on each of the other
attributes, and the entropy of each resulting branch is calculated.
• The branch entropies are added proportionally to give the total entropy of
the split.
• The resulting entropy is subtracted from the entropy
before the split.
• The result is the information gain, or decrease in entropy, for that
attribute (worked values for the weather data follow below).
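Carrying this out for every attribute of the (assumed) classic weather data gives approximately Gain(Outlook) ≈ 0.247, Gain(Humidity) ≈ 0.152, Gain(Windy) ≈ 0.048, and Gain(Temperature) ≈ 0.029, so Outlook yields the largest gain.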
Example: Weather dataset
Example: Weather dataset
• Step 3: Choose the attribute with the largest information
gain as the decision node.
Example: Weather dataset
• Step 4a: A branch with an entropy of 0 is a leaf node.
Example: Weather Dataset
Example: Weather dataset
• Step 4b: A branch with an entropy of more than 0 needs
further splitting.
Example: Weather dataset
• Step 5: The ID3 algorithm is run recursively on the
non-leaf branches until all data is classified.
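A compact, illustrative ID3 sketch (an assumed implementation of the recursion, not the slides' own code):

```python
# Compact, illustrative ID3 sketch (not from the slides): recursively split on
# the attribute with the highest information gain until every branch is pure.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    after = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - after

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:            # entropy 0: pure branch -> leaf node
        return labels[0]
    if not attributes:                   # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = (list(t) for t in zip(*subset))
        tree[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return tree

# tiny usage example with hypothetical records
rows = [{"Outlook": "Sunny", "Windy": "No"},
        {"Outlook": "Overcast", "Windy": "Yes"},
        {"Outlook": "Sunny", "Windy": "Yes"}]
labels = ["No", "Yes", "No"]
print(id3(rows, labels, ["Outlook", "Windy"]))
```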
Note: the quantity being maximized is the difference between two entropies: the entropy before the split minus the weighted entropy after the split.

Happy Learning!