
CSC-325

Artificial Intelligence

Lecture - 7:
Supervised Learning

Dr. Muhammad Tariq Siddique


Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block
Topics to be covered

1. Learning
2. Supervised Learning
3. Classification
4. Decision Tree
5. Regression
6. Linear Regression
7. Tutorial
Overview
Section - 1
Machine learning approaches:

▪ Supervised learning (task driven): develop a predictive model based on both input and output data.
  • Classification: Decision Tree, Neural Networks, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest
  • Regression: Ordinary Least Squares Regression, Linear Regression, Logistic Regression, MARS, LOESS
▪ Unsupervised learning (data driven): discover an internal representation from input data only.
  • Clustering: k-Means, k-Medoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture
  • Dimension Reduction: PCA, LDA
▪ Reinforcement learning (algorithms learn to react to an environment): built around a decision process and a reward system, e.g. recommendation systems.
Traditional Programming vs. Machine Learning

▪ Traditional programming: write the rule explicitly, e.g. to multiply a number by itself:

    int square(int x)
    {
        return x * x;
    }

  Not an efficient approach when the rules (mathematical description) become complex!

▪ Machine learning: learn the rule from a training set of input/output examples:

    Column_1    Column_2
       2            4
      11          121
      25          625

  From these examples the model learns the rule "multiply a number by itself".



What is Learning
▪ Webster's definition of "to learn":
  "To gain knowledge or understanding of, or skill in, by study, instruction, or experience."
❑ Learning a set of new facts
❑ Learning HOW to do something
❑ Improving ability of something already learned



Why “Learn” ?
▪ There is no need to “learn” to calculate payroll
▪ Learning is used when:
• Human expertise does not exist
• Humans are unable to explain their expertise (speech recognition)
• Solution changes in time (routing on a computer network)
• Solution needs to be adapted to particular cases (user biometrics)



Learning
▪ Examples
• Walking (motor skills)
• Riding a bike (motor skills)
• Telephone number (memorizing)
• Playing backgammon (strategy)
• Develop scientific theory (abstraction)
• Language
• Recognize fraudulent credit card transactions
• Etc.



Supervised Learning
Section - 2
How Does Supervised Machine Learning Work?



Supervised Learning
▪ Supervised learning is the machine learning task of inferring a function
from supervised training data.
▪ The training data consist of a set of training examples.
▪ In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value (also called the
supervisory signal).



Supervised Learning Algorithms
A supervised learning algorithm analyzes the training data and produces an inferred function, called:

▪ A classifier, if the output is discrete,
  OR
▪ A regression function, if the output is continuous.



Working of supervised learning algorithms

Given a set of training examples of the form:


    {(x1, y1), ..., (xN, yN)},

a learning algorithm seeks a function

    g : X → Y,

where X is the input space, Y is the output space, and the function g is an element of some space of possible functions G, usually called the hypothesis space.



Classification
Section - 3
Introduction to classification
❖ Classification is a supervised learning method.
❖ It assigns items in a collection to target categories or classes.
❖ The goal of classification is to accurately predict the target class for each case in the
data.
❖ For example, a classification model could be used to identify loan applicants as low,
medium, or high credit risks.
❖ Suppose a database D is given as D = {t1, t2, ..., tn} and a set of desired classes C = {C1, ..., Cm}.
❖ The classification problem is to define a mapping m that assigns each tuple of database D to a class in C.
❖ In effect, classification divides D into equivalence classes.



Classification Example
▪ Identify individuals with credit risks (high, low, medium or unknown).
▪ In cricket (batsman, bowler, all-rounder)
▪ Websites (educational, sports, music)
▪ Teachers classify students' grades as A, B, C, D, or F.
  • How does a teacher assign grades based on the marks x a student obtained?
    ❑ If x >= 90 then A grade.
    ❑ If 80 <= x < 90 then B grade.
    ❑ If 70 <= x < 80 then C grade.
    ❑ If 60 <= x < 70 then D grade.
    ❑ If x < 60 then F grade.
  • These thresholds form a simple decision tree that repeatedly tests x against 90, 80, 70, and 60, with leaves A, B, C, D, and F (a small code sketch of this test follows below).
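A minimal Python sketch of the same threshold test (the function name and the sample mark are my own, not from the slides):

def grade(x):
    # Assign a letter grade from obtained marks, following the thresholds above.
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

print(grade(85))  # B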



Classification: a two-step process

1) Model Construction

A classification algorithm is applied to the training data to build a classifier (model).

Training data:

  Name   Rank         Years  Tenured
  Mike   Asst. Prof.    3    No
  Mary   Asst. Prof.    7    Yes
  Bill   Prof.          2    Yes
  Jim    Asso. Prof.    7    Yes
  Dave   Asst. Prof.    6    No
  Anne   Asso. Prof.    3    No

Resulting classifier (model), e.g.:

  IF Rank = 'Professor' OR Years > 6
  THEN Tenured = 'Yes'



Classification: a two-step process (cont.)

2) Model Usage

The classifier is applied first to testing data (to estimate its accuracy) and then to unseen data.

Testing data:

  Name     Rank         Years  Tenured
  Tom      Asst. Prof.    2    No
  Merlisa  Asso. Prof.    7    No
  George   Prof.          5    Yes
  Joseph   Asst. Prof.    7    Yes

Unseen data:  (Jeff, Professor, 4)  →  Tenured?  Yes



Classification: a two-step process (cont.)

1) Model Construction
   • Describing a set of predetermined classes:
     o Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
     o The set of tuples used for model construction is called the training set.
     o The model is represented as classification rules, decision trees, or mathematical formulae.
2) Model Usage
   • For classifying future or unknown objects:
     o Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model.
     o The accuracy rate is the percentage of test set samples that are correctly classified by the model.
   (A code sketch of this two-step workflow follows below.)
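As an illustration only, a minimal sketch of the two steps; the use of scikit-learn and the numeric encoding of the tenure data are my own assumptions, not part of the slides:

# Assumed illustration: scikit-learn is not mentioned in the slides.
from sklearn.tree import DecisionTreeClassifier

# Toy tenure data: rank encoded as 0 = Asst. Prof., 1 = Asso. Prof., 2 = Prof.; features = [rank, years]
X_train = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]
y_train = ["No", "Yes", "Yes", "Yes", "No", "No"]
X_test  = [[0, 2], [1, 7], [2, 5], [0, 7]]
y_test  = ["No", "No", "Yes", "Yes"]

# Step 1: model construction (build the classifier from the training set)
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)

# Step 2: model usage (estimate accuracy on the test set, then classify unseen data)
print("test accuracy:", model.score(X_test, y_test))
print("(Jeff, Professor, 4) ->", model.predict([[2, 4]])[0])  # expected "Yes", as in the slide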



Different Types of Classifiers
▪ Back propagation
▪ Bayesian Classifiers
▪ Decision Trees
▪ Density estimation methods
▪ Fuzzy set theory
▪ Linear discriminant analysis (LDA)
▪ Logistic regression
▪ Naïve Bayes classifier
▪ Nearest neighbor classification
▪ Neural networks
▪ Quadratic discriminant analysis (QDA)
▪ Support Vector Machine
▪ many more…
Decision Tree
Section - 4
Definition
• Decision tree induction is the learning of
decision trees from class-labeled training
tuples.
• A decision tree is a flowchart-like tree
  structure, where each internal node (non-leaf
  node) denotes a test on an attribute.
• Each branch represents an outcome of
the test.
• Each leaf node (or terminal node) holds a
class label.
• The topmost node in a tree is the root
node.



ID3 (Iterative Dichotomizer) Tree Algorithm
Section - 5
ID3 Tree Algorithm
• A decision tree algorithm developed by J. Ross Quinlan in the early 1980s.
• A greedy approach in which decision trees are constructed in a top-down,
  recursive, divide-and-conquer manner (a code skeleton of this procedure is sketched below).
• Top-down approach starts with a training set of tuples and their associated
class labels.
▪ The training set is recursively partitioned into smaller subsets as the tree is
being built.
▪ The algorithm is called with 3 parameters:
❑ D (the data partition),
❑ Attribute_list, and
❑ Attribute_selection_method.
• Initially, D is the complete set of training tuples and their associated class labels.
• Attribute_list is a list of attributes describing the tuples.
• Attribute_selection_method is the procedure for selecting the attribute that best discriminates
the given tuples according to class.
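As a rough illustration of this top-down, divide-and-conquer procedure, a minimal Python sketch of the ID3 recursion (function and parameter names are my own; refinements such as handling empty partitions are omitted):

from collections import Counter

def id3(D, attribute_list, select_attribute):
    # D: list of (attribute_dict, class_label) pairs.
    # attribute_list: attributes still available for splitting.
    # select_attribute: the attribute-selection method, e.g. highest information gain.
    labels = [label for _, label in D]
    if len(set(labels)) == 1:              # all tuples are of the same class -> leaf
        return labels[0]
    if not attribute_list:                 # no attributes left -> leaf with the majority class
        return Counter(labels).most_common(1)[0][0]
    best = select_attribute(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    tree = {best: {}}
    for value in set(row[best] for row, _ in D):   # one branch per outcome of the test
        Dj = [(row, label) for row, label in D if row[best] == value]
        tree[best][value] = id3(Dj, remaining, select_attribute)
    return tree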



Decision Tree Induction - Attribute Selection Measures
▪ An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class-labeled
training tuples into individual classes.
▪ Also known as splitting rules as they determine how the tuples at a given
node are to be split.
▪ The tree node created for partition D is labeled with the splitting criterion,
branches are grown for each outcome of the criterion, and the tuples are
partitioned accordingly.
▪ Three popular attribute selection measures
1. Information gain
2. Gain ratio
3. Gini index



Information Gain
▪ ID3 uses information gain as its attribute selection measure.
▪ Entropy characterizes the (im)purity of an arbitrary collection of examples.
▪ Let pi be the probability that a tuple belongs to class Ci, estimated by |Ci,D|/|D|
▪ Expected information (entropy) needed to classify a tuple in D:
    Info(D) = Entropy(D) = − Σ_{i=1..m} p_i · log2(p_i)

▪ Information needed (after using attribute A to split D into v partitions) to classify D:

    Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

▪ Information gained by branching on attribute A:

    Gain(A) = Info(D) − Info_A(D)
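To make these formulas concrete, a small Python sketch (the helper names are my own) that evaluates them on lists of class counts; the printed values approximately reproduce the worked AllElectronics example that follows:

import math

def entropy(counts):
    # Info(D): entropy of a class distribution given as a list of class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    # Info_A(D): weighted entropy of the partitions produced by splitting on A,
    # each partition given as a list of class counts.
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

def gain(parent_counts, partitions):
    # Gain(A) = Info(D) - Info_A(D)
    return entropy(parent_counts) - info_after_split(partitions)

print(round(entropy([9, 5]), 3))                         # 0.94
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # about 0.25 (the age split)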
Decision Tree Induction – Training Data and Output

Class-labelled training tuples from the AllElectronics customer database:

  RID  age          income  student  credit_rating  Class: buys_computer
  1    youth        high    no       fair           no
  2    youth        high    no       excellent      no
  3    middle aged  high    no       fair           yes
  4    senior       medium  no       fair           yes
  5    senior       low     yes      fair           yes
  6    senior       low     yes      excellent      no
  7    middle aged  low     yes      excellent      yes
  8    youth        medium  no       fair           no
  9    youth        low     yes      fair           yes
  10   senior       medium  yes      fair           yes
  11   youth        medium  yes      excellent      yes
  12   middle aged  medium  no       excellent      yes
  13   middle aged  high    yes      fair           yes
  14   senior       medium  no       excellent      no

[Figure: a decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer. Each internal (non-leaf) node represents a test on an attribute. Each leaf node represents a class (either buys_computer = yes or buys_computer = no).]
Decision Tree Induction - Attribute Selection Measures

Class P: buys_computer = "yes"  →  P = 9
Class N: buys_computer = "no"   →  N = 5

    Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940



Decision Tree Induction - Attribute Selection Measures

Split on age:  S[9+, 5−]  →  youth [2+, 3−],  middle aged [4+, 0−],  senior [3+, 2−]







Decision Tree Induction - Attribute Selection Measures

Split on age:  S[9+, 5−]  →  youth [2+, 3−],  middle aged [4+, 0−],  senior [3+, 2−]

    Gain(Age) = 0.94 − 5/14 [−2/5 log2(2/5) − 3/5 log2(3/5)]
                     − 4/14 [−4/4 log2(4/4) − 0/4 log2(0/4)]
                     − 5/14 [−3/5 log2(3/5) − 2/5 log2(2/5)]
              = 0.94 − 0.69 = 0.25



Decision Tree Induction - Attribute Selection Measures

Split on income:   S[9+, 5−]  →  high [2+, 2−],  medium [4+, 2−],  low [3+, 1−]

    Gain(income) = 0.94 − 4/14 [−2/4 log2(2/4) − 2/4 log2(2/4)]
                        − 6/14 [−4/6 log2(4/6) − 2/6 log2(2/6)]
                        − 4/14 [−3/4 log2(3/4) − 1/4 log2(1/4)]
                 = 0.94 − 0.91 = 0.03

Split on student:  S[9+, 5−]  →  yes [6+, 1−],  no [3+, 4−]

    Gain(student) = 0.94 − 7/14 [−6/7 log2(6/7) − 1/7 log2(1/7)]
                         − 7/14 [−3/7 log2(3/7) − 4/7 log2(4/7)]
                  = 0.94 − 0.79 = 0.15



Decision Tree Induction - Attribute Selection Measures

Split on credit_rating:  S[9+, 5−]  →  fair [6+, 2−],  excellent [3+, 3−]

    Gain(credit_rating) = 0.94 − 8/14 [−6/8 log2(6/8) − 2/8 log2(2/8)]
                               − 6/14 [−3/6 log2(3/6) − 3/6 log2(3/6)]
                        = 0.94 − 0.89 = 0.05



Decision Tree Induction - Attribute Selection Measures

  Attribute       Information Gain
  Age             0.25
  Income          0.03
  Student         0.15
  Credit_rating   0.05

Since Age has the highest information gain, we start splitting the dataset using the age attribute.

[Figure: the dataset partitioned on age; the pure middle aged partition is labeled yes.]



Decision Tree Induction - Attribute Selection Measures

Since all records under the middle_aged branch belong to the same class (Yes), we can replace that subtree with a leaf labeled Yes.



Decision Tree Induction - Attribute Selection Measures

Now build the decision tree for the left subtree (the age = youth branch, with 2 Yes and 3 No tuples).

The entropy of this partition is

    I(2 Yes, 3 No) = I(2, 3) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.97

Split on income:   S[2+, 3−]  →  high [0+, 2−],  medium [1+, 1−],  low [1+, 0−]

    Gain(income) = 0.97 − 2/5 [−0/2 log2(0/2) − 2/2 log2(2/2)]
                        − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                        − 1/5 [−1/1 log2(1/1) − 0/1 log2(0/1)]
                 = 0.97 − 0.40 = 0.57

Split on student:  S[2+, 3−]  →  yes [2+, 0−],  no [0+, 3−]

    Gain(student) = 0.97 − 2/5 [−2/2 log2(2/2) − 0/2 log2(0/2)]
                         − 3/5 [−0/3 log2(0/3) − 3/3 log2(3/3)]
                  = 0.97 − 0.0 = 0.97



Decision Tree Induction - Attribute Selection Measures

Since student has the highest gain and each of its branches contains records of a single class, we split on student and make the branches leaf nodes with their respective class as label.





Decision Tree Induction - Attribute Selection Measures

Now build the decision tree for the right subtree (the age = senior branch, with 3 Yes and 2 No tuples).

The entropy of this partition is

    I(3, 2) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.97

Split on income:        S[3+, 2−]  →  medium [2+, 1−],  low [1+, 1−]

    Gain(income) = 0.97 − 3/5 [−2/3 log2(2/3) − 1/3 log2(1/3)]
                        − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                 = 0.97 − 0.95 = 0.02

Split on student:       S[3+, 2−]  →  yes [2+, 1−],  no [1+, 1−]

    Gain(student) = 0.97 − 3/5 [−2/3 log2(2/3) − 1/3 log2(1/3)]
                         − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                  = 0.97 − 0.95 = 0.02

Split on credit_rating: S[3+, 2−]  →  fair [3+, 0−],  excellent [0+, 2−]

    Gain(credit_rating) = 0.97 − 3/5 [−3/3 log2(3/3) − 0/3 log2(0/3)]
                               − 2/5 [−0/2 log2(0/2) − 2/2 log2(2/2)]
                        = 0.97 − 0.00 = 0.97


Decision Tree Induction - Attribute Selection Measures
• We then split the senior branch based on credit_rating.
• These splits give partitions in which all records are from the same class,
• so we make them into leaf nodes with their class label attached.

New example: age = youth (<=30), income = medium, student = yes, credit_rating = fair

Following the age = youth branch and then student = yes, we predict Class = yes, i.e. buys_computer = yes.
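For illustration, the finished tree and this prediction can be written as a short sketch (the nested-dict representation is my own, not from the slides):

# The learned buys_computer tree from the worked example, as a nested dict.
tree = {"age": {
    "youth":       {"student": {"yes": "yes", "no": "no"}},
    "middle aged": "yes",
    "senior":      {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

def classify(tree, example):
    # Walk the tree until a leaf (a class label string) is reached.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

x = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(tree, x))  # yes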



Decision Tree Induction - Tree Pruning
• Overfitting: An induced tree may overfit the training data
– Too many branches, some may reflect anomalies due to noise or outliers
– Poor accuracy for unseen samples
• Two approaches to avoid overfitting
  – Prepruning: halt tree construction early
    • do not split a node if this would result in the goodness measure falling below a threshold
    • it is difficult to choose an appropriate threshold
  – Postpruning: remove branches from a "fully grown" tree
    • this gives a sequence of progressively pruned trees
    • use a set of data different from the training data to decide which is the "best pruned tree"





Rule Extraction from a Decision Tree
▪ Rules are easier to understand than large trees.
▪ One rule is created for each path from the root to a
  leaf.
▪ Each attribute-value pair along a path forms a
  conjunction (ANDed); the leaf holds the class
  prediction (the THEN part).
▪ Rules are mutually exclusive and exhaustive.
Example: Rule extraction from our buy_computer decision-tree

R1: IF age = youth AND student = no THEN buy_computer = no


R2: IF age = youth AND student = yes THEN buy_computer = yes
R3: IF age = middle aged THEN buy_computer = yes
R4: IF age = senior AND credit rating = fair THEN buy_computer = yes
R5: IF age = senior AND credit rating = excellent THEN buy_computer = no



Rule Extraction from a Decision Tree

• Rules represent information and knowledge


• IF you study well THEN you’ll succeed

• IF you’re a student AND you have 5000 USD THEN you most probably will buy an
iPad (confidence?)

• How to assess the goodness of a rule?


    coverage(R) = n_covers / |D|

    accuracy(R) = n_correct / n_covers
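A small Python sketch of these two measures (the helper names are my own), evaluated over a list of (example, label) pairs:

def rule_stats(D, condition, predicted_class):
    # D: list of (example_dict, class_label) pairs; condition: predicate over an example.
    covered = [(x, y) for x, y in D if condition(x)]
    n_correct = sum(1 for x, y in covered if y == predicted_class)
    coverage = len(covered) / len(D)
    accuracy = n_correct / len(covered) if covered else 0.0
    return coverage, accuracy

# R2: IF age = youth AND student = yes THEN buys_computer = yes
r2 = lambda x: x["age"] == "youth" and x["student"] == "yes"
# On the 14-tuple AllElectronics data this gives coverage 2/14 (about 14.3%)
# and accuracy 2/2 = 100%, matching the next slide.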
Rule Extraction from a Decision Tree

R2: IF (age = youth) AND (student = yes)
    THEN (buys_computer = yes)

    coverage(R2) = 2/14 = 14.28%

    accuracy(R2) = 2/2 = 100%

Tuple covered by R2:
    X: (age = youth, income = medium, student = yes, credit_rating = fair)



Prediction
Section - 5
What Is Prediction?
▪ Prediction is similar to classification
• First, construct a model
• Second, use model to predict unknown value
❑ Major method for prediction is regression
• Linear and multiple regression
• Non-linear regression

▪ Prediction is different from classification
  • Classification predicts a categorical class label.
  • Prediction models continuous-valued functions.



Linear and Multiple Regression Analysis
▪ Linear regression: Y = α + β X
  • Two parameters, α and β, specify the line and are to be estimated by using the data at hand.
  • Apply the least squares criterion to the known values Y1, Y2, ..., and X1, X2, ...:

      β = Σ_{i=1..s} (x_i − x̄)(y_i − ȳ)  /  Σ_{i=1..s} (x_i − x̄)²

      α = ȳ − β x̄

▪ Multiple regression: Y = b0 + b1 X1 + b2 X2
  • Many nonlinear functions can be transformed into the above.
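A minimal Python sketch of these least-squares estimates (function and variable names are my own), demonstrated on a tiny made-up dataset rather than the slide's salary data:

def fit_line(xs, ys):
    # Least-squares estimates of beta (slope) and alpha (intercept) for y = alpha + beta * x.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    alpha = y_bar - beta * x_bar
    return alpha, beta

print(fit_line([1, 2, 3, 4], [6, 9, 12, 15]))  # (3.0, 3.0), i.e. y = 3 + 3x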
Example: Linear Regression Analysis

Given data (x = years of experience, y = salary):

  x    y     x − x̄   y − ȳ   (x − x̄)(y − ȳ)   (x − x̄)²
  3    30K   −6.1    −25.4       154.94         37.21
  8    57K   −1.1      1.6        −1.76          1.21
  9    64K   −0.1      8.6        −0.86          0.01
  13   75K    3.9     19.6        76.44         15.21
  3    36K   −6.1    −19.4       118.34         37.21
  6    43K   −3.1    −12.4        38.44          9.61
  11   59K    1.9      3.6         6.84          3.61
  21   90K   11.9     34.6       411.74        141.61
  1    20K   −8.1    −35.4       286.74         65.61
  16   83K    6.9     27.6       190.44         47.61

  x̄ = 9.1,  ȳ = 55.4,  Σ(x − x̄)(y − ȳ) = 1281.3,  Σ(x − x̄)² = 358.9

  β = 1281.3 / 358.9 ≈ 3.5
  α = 55.4 − (3.5 × 9.1) ≈ 23.6

  Fitted line: y = 23.6 + 3.5x



Example: Linear Regression Analysis

  x (years     y (salary)   Predicted ŷ          |y − ŷ|   (y − ŷ)²
  experience)               [ŷ = 23.6 + 3.5x]
  3            30K          34.1K                 4.1       16.81
  8            57K          51.6K                 5.4       29.16
  9            64K          55.1K                 8.9       79.21
  13           75K          69.1K                 5.9       34.81
  3            36K          34.1K                 1.9        3.61
  6            43K          44.6K                 1.6        2.56
  11           59K          62.1K                 3.1        9.61
  21           90K          97.1K                 7.1       50.41
  1            20K          27.1K                 7.1       50.41
  16           83K          79.6K                 3.4       11.56

  Mean Absolute Error = 4.85        Root Mean Squared Error = 5.37
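A small sketch (my own helper) that reproduces these two error measures from the fitted line and the salary data above:

import math

def regression_errors(xs, ys, alpha, beta):
    # Mean absolute error and root mean squared error of y = alpha + beta * x.
    preds = [alpha + beta * x for x in xs]
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return mae, rmse

years  = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 75, 36, 43, 59, 90, 20, 83]   # in thousands
print(regression_errors(years, salary, 23.6, 3.5))  # approximately (4.85, 5.37)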



Exercise (Do it yourself)
The data below are the midterm and final scores of 20 students from an online AI
course. We want to know how well final exam scores can be predicted based on the
midterm scores.

Midterm, x 15 35 45 50 50 55 60 60 65 70
Final, y 22 35 46 68 39 56 48 92 92 56
Midterm, x 70 70 70 80 80 85 85 85 95 100
Final, y 48 81 50 51 67 88 72 88 88 100

▪ Fit the linear regression line relating the midterm and the final scores.
▪ Plot the data and regression line.
▪ Estimate the mean absolute and root mean square errors for the predicted final
scores.



Nonlinear Regression
▪ Often the relationship between x and y
  cannot be approximated with a straight
  line; in such cases, a nonlinear regression
  technique may be used.
▪ Alternatively, the data could be
  preprocessed to make the relationship
  linear.



Logistic Regression
▪ A linear regression is not appropriate for
predicting the value of a binary variable for two
reasons:
• A linear regression will predict values outside the
acceptable range (e.g. predicting probabilities
outside the range 0 to 1).
• Since the experiments can only have one of two
possible values for each experiment, the
residuals(random errors) will not be normally
distributed about the predicted line.
▪ A logistic regression produces a logistic curve,
which is limited to values between 0 and 1.
▪ Logistic regression is similar to a linear
  regression, but the curve is constructed using the
  natural logarithm of the odds of the target variable,
  rather than the probability.
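As an illustration (parameter names are my own), the logistic curve that keeps predictions in the 0 to 1 range, with the log odds linear in x:

import math

def logistic(x, alpha, beta):
    # Logistic curve: p = 1 / (1 + exp(-(alpha + beta*x))), always between 0 and 1.
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# The log odds (logit) of the prediction is linear in x: log(p / (1 - p)) = alpha + beta*x.
print(logistic(0.0, 0.0, 1.0))   # 0.5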



Classification vs. Regression

Classification:
• Classification means to group the output into a class.
• Example: classification is used to predict the type of a tumor, i.e. harmful or not harmful, using training data.
• If the output is a discrete/categorical variable, it is a classification problem.

Regression:
• Regression means to predict the output value using training data.
• Example: regression is used to predict the house price from training data.
• If the output is a real number/continuous value, it is a regression problem.



Tutorial
Section - 6
TUTORIAL 07
1. Compare traditional programming with machine learning.
2. What is learning? Why learn?
3. What is supervised learning? What are supervised learning algorithms?
4. What is a classifier? Explain with an example.
5. What is a decision tree? What are the pros and cons of decision trees?
6. What are attribute selection measures?
7. What are the criteria to prune a tree?
8. How are rules extracted from a decision tree? Give an example.
9. How is the goodness of a rule measured?
10. What is prediction? How is it different from classification?



Artificial Intelligence CSC-325
Lecture – 7: Supervised Learning

Thank You. Any Questions?

Dr. Muhammad Tariq Siddique

Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block
