
CSC-325

Artificial Intelligence

Lecture - 7:
Supervised Learning

Dr. Muhammad Tariq Siddique


Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block
Topics to be covered

1. Learning
2. Supervised Learning
3. Classification
4. Decision Tree
5. Regression
6. Linear Regression
7. Tutorial
Overview
Section - 1
Machine learning approaches:

▪ Supervised learning (task driven): develop a predictive model based on both input and output data.
  • Classification: Decision Tree, Neural Networks, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest
  • Regression: Ordinary Least Squares Regression, Linear Regression, Logistic Regression, MARS, LOESS
▪ Unsupervised learning (data driven): discover an internal representation from input data only.
  • Clustering: k-Means, k-Medoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture
  • Dimension Reduction: PCA, LDA
▪ Reinforcement learning (algorithms learn to react to an environment): built around a decision process and a reward system, e.g. recommendation systems.
Traditional Programming vs. Machine Learning

▪ Traditional programming: write the rule explicitly, e.g. to multiply a number by itself:

    int square(int x)
    {
        return x * x;
    }

  Not an efficient approach when the rules (mathematical description) become complex!

▪ Machine learning: learn the rule from a training set of input/output examples:

    Column_1    Column_2
       2            4
      11          121
      25          625

  From these examples the model learns the rule "multiply a number by itself".



What is Learning
▪ Webster's definition of "to learn":
  "To gain knowledge or understanding of, or skill in, by study, instruction, or experience."
❑ Learning a set of new facts
❑ Learning HOW to do something
❑ Improving ability of something already learned



Why “Learn” ?
▪ There is no need to “learn” to calculate payroll
▪ Learning is used when:
• Human expertise does not exist
• Humans are unable to explain their expertise (speech recognition)
• Solution changes in time (routing on a computer network)
• Solution needs to be adapted to particular cases (user biometrics)



Learning
▪ Examples
• Walking (motor skills)
• Riding a bike (motor skills)
• Telephone number (memorizing)
• Playing backgammon (strategy)
• Develop scientific theory (abstraction)
• Language
• Recognize fraudulent credit card transactions
• Etc.



Supervised Learning
Section - 2
How Does Supervised Machine Learning Work?



Supervised Learning
▪ Supervised learning is the machine learning task of inferring a function
from supervised training data.
▪ The training data consist of a set of training examples.
▪ In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value (also called the
supervisory signal).



Supervised Learning Algorithms
A supervised learning algorithm analyzes the training data and produces an inferred function, called:

▪ A classifier, if the output is discrete,
  OR
▪ A regression function, if the output is continuous.



Working of supervised learning algorithms

Given a set of training examples of the form:


    {(x1, y1), ..., (xN, yN)},

a learning algorithm seeks a function

    g : X → Y,

where X is the input space, Y is the output space, and the function g is an element of some space of possible functions G, usually called the hypothesis space.



Classification
Section - 3
Introduction to classification
❖ Classification is a supervised learning method.
❖ It assigns items in a collection to target categories or classes.
❖ The goal of classification is to accurately predict the target class for each case in the
data.
❖ For example, a classification model could be used to identify loan applicants as low,
medium, or high credit risks.
❖ Suppose a database D is given as D = {t1, t2, ..., tn} and a set of desired classes C = {C1, ..., Cm}.
❖ The classification problem is to define a mapping m that assigns each tuple of database D to a class in C.
❖ In effect, classification divides D into equivalence classes.



Classification Example
▪ Identify individuals with credit risks (high, low, medium or unknown).
▪ In cricket (batsman, bowler, all-rounder)
▪ Websites (educational, sports, music)
▪ Teachers classify students' grades as A, B, C, D, or F.
  • How does a teacher assign grades based on the marks x a student obtained?
    ❑ If x >= 90 then A grade.
    ❑ If 80 <= x < 90 then B grade.
    ❑ If 70 <= x < 80 then C grade.
    ❑ If 60 <= x < 70 then D grade.
    ❑ If x < 60 then F grade.
  • These thresholds form a simple decision tree that repeatedly tests x against 90, 80, 70, and 60, with leaves A, B, C, D, and F (a small code sketch of this test follows below).
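A minimal Python sketch of the same threshold test (the function name and the sample mark are my own, not from the slides):

def grade(x):
    # Assign a letter grade from obtained marks, following the thresholds above.
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

print(grade(85))  # B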



Classification: a two-step process

1) Model Construction

A classification algorithm is applied to the training data to build a classifier (model).

Training data:

  Name   Rank         Years  Tenured
  Mike   Asst. Prof.    3    No
  Mary   Asst. Prof.    7    Yes
  Bill   Prof.          2    Yes
  Jim    Asso. Prof.    7    Yes
  Dave   Asst. Prof.    6    No
  Anne   Asso. Prof.    3    No

Resulting classifier (model), e.g.:

  IF Rank = 'Professor' OR Years > 6
  THEN Tenured = 'Yes'



Classification: a two-step process (cont.)

2) Model Usage

The classifier is applied first to testing data (to estimate its accuracy) and then to unseen data.

Testing data:

  Name     Rank         Years  Tenured
  Tom      Asst. Prof.    2    No
  Merlisa  Asso. Prof.    7    No
  George   Prof.          5    Yes
  Joseph   Asst. Prof.    7    Yes

Unseen data:  (Jeff, Professor, 4)  →  Tenured?  Yes



Classification: a two-step process (cont.)

1) Model Construction
   • Describing a set of predetermined classes:
     o Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
     o The set of tuples used for model construction is called the training set.
     o The model is represented as classification rules, decision trees, or mathematical formulae.
2) Model Usage
   • For classifying future or unknown objects:
     o Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model.
     o The accuracy rate is the percentage of test set samples that are correctly classified by the model.
   (A code sketch of this two-step workflow follows below.)
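As an illustration only, a minimal sketch of the two steps; the use of scikit-learn and the numeric encoding of the tenure data are my own assumptions, not part of the slides:

# Assumed illustration: scikit-learn is not mentioned in the slides.
from sklearn.tree import DecisionTreeClassifier

# Toy tenure data: rank encoded as 0 = Asst. Prof., 1 = Asso. Prof., 2 = Prof.; features = [rank, years]
X_train = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]
y_train = ["No", "Yes", "Yes", "Yes", "No", "No"]
X_test  = [[0, 2], [1, 7], [2, 5], [0, 7]]
y_test  = ["No", "No", "Yes", "Yes"]

# Step 1: model construction (build the classifier from the training set)
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)

# Step 2: model usage (estimate accuracy on the test set, then classify unseen data)
print("test accuracy:", model.score(X_test, y_test))
print("(Jeff, Professor, 4) ->", model.predict([[2, 4]])[0])  # expected "Yes", as in the slide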



Different Types of Classifiers
▪ Back propagation
▪ Bayesian Classifiers
▪ Decision Trees
▪ Density estimation methods
▪ Fuzzy set theory
▪ Linear discriminant analysis (LDA)
▪ Logistic regression
▪ Naïve Bayes classifier
▪ Nearest neighbor classification
▪ Neural networks
▪ Quadratic discriminant analysis (QDA)
▪ Support Vector Machine
▪ many more…
Decision Tree
Section - 4
Definition
• Decision tree induction is the learning of
decision trees from class-labeled training
tuples.
• A decision tree is a flowchart-like tree
  structure, where each internal node (non-leaf
  node) denotes a test on an attribute.
• Each branch represents an outcome of
the test.
• Each leaf node (or terminal node) holds a
class label.
• The topmost node in a tree is the root
node.



ID3 (Iterative Dichotomizer) Tree Algorithm
Section - 5
ID3 Tree Algorithm
• A decision tree algorithm developed by J. Ross Quinlan in the early 1980s.
• A greedy approach in which decision trees are constructed in a top-down,
  recursive, divide-and-conquer manner (a code skeleton of this procedure is sketched below).
• Top-down approach starts with a training set of tuples and their associated
class labels.
▪ The training set is recursively partitioned into smaller subsets as the tree is
being built.
▪ The algorithm is called with 3 parameters:
❑ D (the data partition),
❑ Attribute_list, and
❑ Attribute_selection_method.
• Initially, D is the complete set of training tuples and their associated class labels.
• Attribute_list is a list of attributes describing the tuples.
• Attribute_selection_method is the procedure for selecting the attribute that best discriminates
the given tuples according to class.
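As a rough illustration of this top-down, divide-and-conquer procedure, a minimal Python sketch of the ID3 recursion (function and parameter names are my own; refinements such as handling empty partitions are omitted):

from collections import Counter

def id3(D, attribute_list, select_attribute):
    # D: list of (attribute_dict, class_label) pairs.
    # attribute_list: attributes still available for splitting.
    # select_attribute: the attribute-selection method, e.g. highest information gain.
    labels = [label for _, label in D]
    if len(set(labels)) == 1:              # all tuples are of the same class -> leaf
        return labels[0]
    if not attribute_list:                 # no attributes left -> leaf with the majority class
        return Counter(labels).most_common(1)[0][0]
    best = select_attribute(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    tree = {best: {}}
    for value in set(row[best] for row, _ in D):   # one branch per outcome of the test
        Dj = [(row, label) for row, label in D if row[best] == value]
        tree[best][value] = id3(Dj, remaining, select_attribute)
    return tree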



Decision Tree Induction - Attribute Selection Measures
▪ An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class-labeled
training tuples into individual classes.
▪ Also known as splitting rules as they determine how the tuples at a given
node are to be split.
▪ The tree node created for partition D is labeled with the splitting criterion,
branches are grown for each outcome of the criterion, and the tuples are
partitioned accordingly.
▪ Three popular attribute selection measures
1. Information gain
2. Gain ratio
3. Gini index



Information Gain
▪ ID3 uses information gain as its attribute selection measure.
▪ Entropy characterizes the (im)purity of an arbitrary collection of examples.
▪ Let pi be the probability that a tuple belongs to class Ci, estimated by |Ci,D|/|D|
▪ Expected information (entropy) needed to classify a tuple in D:
    Info(D) = Entropy(D) = − Σ_{i=1..m} p_i · log2(p_i)

▪ Information needed (after using attribute A to split D into v partitions) to classify D:

    Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

▪ Information gained by branching on attribute A:

    Gain(A) = Info(D) − Info_A(D)
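To make these formulas concrete, a small Python sketch (the helper names are my own) that evaluates them on lists of class counts; the printed values approximately reproduce the worked AllElectronics example that follows:

import math

def entropy(counts):
    # Info(D): entropy of a class distribution given as a list of class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    # Info_A(D): weighted entropy of the partitions produced by splitting on A,
    # each partition given as a list of class counts.
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

def gain(parent_counts, partitions):
    # Gain(A) = Info(D) - Info_A(D)
    return entropy(parent_counts) - info_after_split(partitions)

print(round(entropy([9, 5]), 3))                         # 0.94
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # about 0.25 (the age split)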
Decision Tree Induction – Training Data and Output

Class-labelled training tuples from the AllElectronics customer database:

  RID  age          income  student  credit_rating  Class: buys_computer
  1    youth        high    no       fair           no
  2    youth        high    no       excellent      no
  3    middle aged  high    no       fair           yes
  4    senior       medium  no       fair           yes
  5    senior       low     yes      fair           yes
  6    senior       low     yes      excellent      no
  7    middle aged  low     yes      excellent      yes
  8    youth        medium  no       fair           no
  9    youth        low     yes      fair           yes
  10   senior       medium  yes      fair           yes
  11   youth        medium  yes      excellent      yes
  12   middle aged  medium  no       excellent      yes
  13   middle aged  high    yes      fair           yes
  14   senior       medium  no       excellent      no

[Figure: a decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer. Each internal (non-leaf) node represents a test on an attribute. Each leaf node represents a class (either buys_computer = yes or buys_computer = no).]
Decision Tree Induction - Attribute Selection Measures

Class P: buys_computer = "yes"  →  P = 9
Class N: buys_computer = "no"   →  N = 5

    Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940



Decision Tree Induction - Attribute Selection Measures

Split on age:  S[9+, 5−]  →  youth [2+, 3−],  middle aged [4+, 0−],  senior [3+, 2−]







Decision Tree Induction - Attribute Selection Measures

Split on age:  S[9+, 5−]  →  youth [2+, 3−],  middle aged [4+, 0−],  senior [3+, 2−]

    Gain(Age) = 0.94 − 5/14 [−2/5 log2(2/5) − 3/5 log2(3/5)]
                     − 4/14 [−4/4 log2(4/4) − 0/4 log2(0/4)]
                     − 5/14 [−3/5 log2(3/5) − 2/5 log2(2/5)]
              = 0.94 − 0.69 = 0.25



Decision Tree Induction - Attribute Selection Measures

Split on income:   S[9+, 5−]  →  high [2+, 2−],  medium [4+, 2−],  low [3+, 1−]

    Gain(income) = 0.94 − 4/14 [−2/4 log2(2/4) − 2/4 log2(2/4)]
                        − 6/14 [−4/6 log2(4/6) − 2/6 log2(2/6)]
                        − 4/14 [−3/4 log2(3/4) − 1/4 log2(1/4)]
                 = 0.94 − 0.91 = 0.03

Split on student:  S[9+, 5−]  →  yes [6+, 1−],  no [3+, 4−]

    Gain(student) = 0.94 − 7/14 [−6/7 log2(6/7) − 1/7 log2(1/7)]
                         − 7/14 [−3/7 log2(3/7) − 4/7 log2(4/7)]
                  = 0.94 − 0.79 = 0.15



Decision Tree Induction - Attribute Selection Measures

Split on credit_rating:  S[9+, 5−]  →  fair [6+, 2−],  excellent [3+, 3−]

    Gain(credit_rating) = 0.94 − 8/14 [−6/8 log2(6/8) − 2/8 log2(2/8)]
                               − 6/14 [−3/6 log2(3/6) − 3/6 log2(3/6)]
                        = 0.94 − 0.89 = 0.05



Decision Tree Induction - Attribute Selection Measures

  Attribute       Information Gain
  Age             0.25
  Income          0.03
  Student         0.15
  Credit_rating   0.05

Since Age has the highest information gain, we start splitting the dataset using the age attribute.

[Figure: the dataset partitioned on age; the pure middle aged partition is labeled yes.]



Decision Tree Induction - Attribute Selection Measures

Since all records under the middle_aged branch belong to the same class (Yes), we can replace that subtree with a leaf labeled Yes.



Decision Tree Induction - Attribute Selection Measures

Now build the decision tree for the left subtree (the age = youth branch, with 2 Yes and 3 No tuples).

The entropy of this partition is

    I(2 Yes, 3 No) = I(2, 3) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.97

Split on income:   S[2+, 3−]  →  high [0+, 2−],  medium [1+, 1−],  low [1+, 0−]

    Gain(income) = 0.97 − 2/5 [−0/2 log2(0/2) − 2/2 log2(2/2)]
                        − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                        − 1/5 [−1/1 log2(1/1) − 0/1 log2(0/1)]
                 = 0.97 − 0.40 = 0.57

Split on student:  S[2+, 3−]  →  yes [2+, 0−],  no [0+, 3−]

    Gain(student) = 0.97 − 2/5 [−2/2 log2(2/2) − 0/2 log2(0/2)]
                         − 3/5 [−0/3 log2(0/3) − 3/3 log2(3/3)]
                  = 0.97 − 0.0 = 0.97



Decision Tree Induction - Attribute Selection Measures

Since student has the highest gain and each of its branches contains records of a single class, we split on student and make the branches leaf nodes with their respective class as label.





Decision Tree Induction - Attribute Selection Measures

Now build the decision tree for the right subtree (the age = senior branch, with 3 Yes and 2 No tuples).

The entropy of this partition is

    I(3, 2) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.97

Split on income:        S[3+, 2−]  →  medium [2+, 1−],  low [1+, 1−]

    Gain(income) = 0.97 − 3/5 [−2/3 log2(2/3) − 1/3 log2(1/3)]
                        − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                 = 0.97 − 0.95 = 0.02

Split on student:       S[3+, 2−]  →  yes [2+, 1−],  no [1+, 1−]

    Gain(student) = 0.97 − 3/5 [−2/3 log2(2/3) − 1/3 log2(1/3)]
                         − 2/5 [−1/2 log2(1/2) − 1/2 log2(1/2)]
                  = 0.97 − 0.95 = 0.02

Split on credit_rating: S[3+, 2−]  →  fair [3+, 0−],  excellent [0+, 2−]

    Gain(credit_rating) = 0.97 − 3/5 [−3/3 log2(3/3) − 0/3 log2(0/3)]
                               − 2/5 [−0/2 log2(0/2) − 2/2 log2(2/2)]
                        = 0.97 − 0.00 = 0.97


Decision Tree Induction - Attribute Selection Measures
• We then split the senior branch based on credit_rating.
• These splits give partitions in which all records are from the same class,
• so we make them into leaf nodes with their class label attached.

New example: age = youth (<=30), income = medium, student = yes, credit_rating = fair

Following the age = youth branch and then student = yes, we predict Class = yes, i.e. buys_computer = yes.
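For illustration, the finished tree and this prediction can be written as a short sketch (the nested-dict representation is my own, not from the slides):

# The learned buys_computer tree from the worked example, as a nested dict.
tree = {"age": {
    "youth":       {"student": {"yes": "yes", "no": "no"}},
    "middle aged": "yes",
    "senior":      {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

def classify(tree, example):
    # Walk the tree until a leaf (a class label string) is reached.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

x = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(tree, x))  # yes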



Decision Tree Induction - Tree Pruning
• Overfitting: An induced tree may overfit the training data
– Too many branches, some may reflect anomalies due to noise or outliers
– Poor accuracy for unseen samples
• Two approaches to avoid overfitting
  – Prepruning: halt tree construction early
    • do not split a node if this would result in the goodness measure falling below a threshold
    • it is difficult to choose an appropriate threshold
  – Postpruning: remove branches from a "fully grown" tree
    • this gives a sequence of progressively pruned trees
    • use a set of data different from the training data to decide which is the "best pruned tree"





Rule Extraction from a Decision Tree
▪ Rules are easier to understand than large trees.
▪ One rule is created for each path from the root to a
  leaf.
▪ Each attribute-value pair along a path forms a
  conjunction (ANDed); the leaf holds the class
  prediction (the THEN part).
▪ Rules are mutually exclusive and exhaustive.
Example: Rule extraction from our buy_computer decision-tree

R1: IF age = youth AND student = no THEN buy_computer = no


R2: IF age = youth AND student = yes THEN buy_computer = yes
R3: IF age = middle aged THEN buy_computer = yes
R4: IF age = senior AND credit rating = fair THEN buy_computer = yes
R5: IF age = senior AND credit rating = excellent THEN buy_computer = no



Rule Extraction from a Decision Tree

• Rules represent information and knowledge


• IF you study well THEN you’ll succeed

• IF you’re a student AND you have 5000 USD THEN you most probably will buy an
iPad (confidence?)

• How to assess the goodness of a rule?


    coverage(R) = n_covers / |D|

    accuracy(R) = n_correct / n_covers
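A small Python sketch of these two measures (the helper names are my own), evaluated over a list of (example, label) pairs:

def rule_stats(D, condition, predicted_class):
    # D: list of (example_dict, class_label) pairs; condition: predicate over an example.
    covered = [(x, y) for x, y in D if condition(x)]
    n_correct = sum(1 for x, y in covered if y == predicted_class)
    coverage = len(covered) / len(D)
    accuracy = n_correct / len(covered) if covered else 0.0
    return coverage, accuracy

# R2: IF age = youth AND student = yes THEN buys_computer = yes
r2 = lambda x: x["age"] == "youth" and x["student"] == "yes"
# On the 14-tuple AllElectronics data this gives coverage 2/14 (about 14.3%)
# and accuracy 2/2 = 100%, matching the next slide.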
Rule Extraction from a Decision Tree

R2: IF (age = youth) AND (student = yes)
    THEN (buys_computer = yes)

    coverage(R2) = 2/14 = 14.28%

    accuracy(R2) = 2/2 = 100%

Tuple covered by R2:
    X: (age = youth, income = medium, student = yes, credit_rating = fair)



Prediction
Section - 5
What Is Prediction?
▪ Prediction is similar to classification
• First, construct a model
• Second, use model to predict unknown value
❑ Major method for prediction is regression
• Linear and multiple regression
• Non-linear regression

▪ Prediction is different from classification
  • Classification predicts a categorical class label.
  • Prediction models continuous-valued functions.



Linear and Multiple Regression Analysis
▪ Linear regression: Y = α + β X
  • Two parameters, α and β, specify the line and are to be estimated by using the data at hand.
  • Apply the least squares criterion to the known values Y1, Y2, ..., and X1, X2, ...:

      β = Σ_{i=1..s} (x_i − x̄)(y_i − ȳ)  /  Σ_{i=1..s} (x_i − x̄)²

      α = ȳ − β x̄

▪ Multiple regression: Y = b0 + b1 X1 + b2 X2
  • Many nonlinear functions can be transformed into the above.
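A minimal Python sketch of these least-squares estimates (function and variable names are my own), demonstrated on a tiny made-up dataset rather than the slide's salary data:

def fit_line(xs, ys):
    # Least-squares estimates of beta (slope) and alpha (intercept) for y = alpha + beta * x.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    alpha = y_bar - beta * x_bar
    return alpha, beta

print(fit_line([1, 2, 3, 4], [6, 9, 12, 15]))  # (3.0, 3.0), i.e. y = 3 + 3x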
Example: Linear Regression Analysis

Given data (x = years of experience, y = salary):

  x    y     x − x̄   y − ȳ   (x − x̄)(y − ȳ)   (x − x̄)²
  3    30K   −6.1    −25.4       154.94         37.21
  8    57K   −1.1      1.6        −1.76          1.21
  9    64K   −0.1      8.6        −0.86          0.01
  13   75K    3.9     19.6        76.44         15.21
  3    36K   −6.1    −19.4       118.34         37.21
  6    43K   −3.1    −12.4        38.44          9.61
  11   59K    1.9      3.6         6.84          3.61
  21   90K   11.9     34.6       411.74        141.61
  1    20K   −8.1    −35.4       286.74         65.61
  16   83K    6.9     27.6       190.44         47.61

  x̄ = 9.1,  ȳ = 55.4,  Σ(x − x̄)(y − ȳ) = 1281.3,  Σ(x − x̄)² = 358.9

  β = 1281.3 / 358.9 ≈ 3.5
  α = 55.4 − (3.5 × 9.1) ≈ 23.6

  Fitted line: y = 23.6 + 3.5x



Example: Linear Regression Analysis

  x (years     y (salary)   Predicted ŷ          |y − ŷ|   (y − ŷ)²
  experience)               [ŷ = 23.6 + 3.5x]
  3            30K          34.1K                 4.1       16.81
  8            57K          51.6K                 5.4       29.16
  9            64K          55.1K                 8.9       79.21
  13           75K          69.1K                 5.9       34.81
  3            36K          34.1K                 1.9        3.61
  6            43K          44.6K                 1.6        2.56
  11           59K          62.1K                 3.1        9.61
  21           90K          97.1K                 7.1       50.41
  1            20K          27.1K                 7.1       50.41
  16           83K          79.6K                 3.4       11.56

  Mean Absolute Error = 4.85        Root Mean Squared Error = 5.37
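A small sketch (my own helper) that reproduces these two error measures from the fitted line and the salary data above:

import math

def regression_errors(xs, ys, alpha, beta):
    # Mean absolute error and root mean squared error of y = alpha + beta * x.
    preds = [alpha + beta * x for x in xs]
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return mae, rmse

years  = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 75, 36, 43, 59, 90, 20, 83]   # in thousands
print(regression_errors(years, salary, 23.6, 3.5))  # approximately (4.85, 5.37)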



Exercise (Do it yourself)
The data below are the midterm and final scores of 20 students from an online AI
course. We want to know how well final exam scores can be predicted based on the
midterm scores.

Midterm, x 15 35 45 50 50 55 60 60 65 70
Final, y 22 35 46 68 39 56 48 92 92 56
Midterm, x 70 70 70 80 80 85 85 85 95 100
Final, y 48 81 50 51 67 88 72 88 88 100

▪ Fit the linear regression line relating the midterm and the final scores.
▪ Plot the data and regression line.
▪ Estimate the mean absolute and root mean square errors for the predicted final
scores.



Nonlinear Regression
▪ Often the relationship between x and y
  cannot be approximated with a straight
  line; in such cases, a nonlinear regression
  technique may be used.
▪ Alternatively, the data could be
  preprocessed to make the relationship
  linear.



Logistic Regression
▪ A linear regression is not appropriate for
predicting the value of a binary variable for two
reasons:
• A linear regression will predict values outside the
acceptable range (e.g. predicting probabilities
outside the range 0 to 1).
• Since the experiments can only have one of two
possible values for each experiment, the
residuals(random errors) will not be normally
distributed about the predicted line.
▪ A logistic regression produces a logistic curve,
which is limited to values between 0 and 1.
▪ Logistic regression is similar to a linear
  regression, but the curve is constructed using the
  natural logarithm of the odds of the target variable,
  rather than the probability.
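As an illustration (parameter names are my own), the logistic curve that keeps predictions in the 0 to 1 range, with the log odds linear in x:

import math

def logistic(x, alpha, beta):
    # Logistic curve: p = 1 / (1 + exp(-(alpha + beta*x))), always between 0 and 1.
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# The log odds (logit) of the prediction is linear in x: log(p / (1 - p)) = alpha + beta*x.
print(logistic(0.0, 0.0, 1.0))   # 0.5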



Classification vs. Regression

Classification:
• Classification means to group the output into a class.
• Example: classification is used to predict the type of a tumor, i.e. harmful or not harmful, using training data.
• If the output is a discrete/categorical variable, it is a classification problem.

Regression:
• Regression means to predict the output value using training data.
• Example: regression is used to predict the house price from training data.
• If the output is a real number/continuous value, it is a regression problem.



Tutorial
Section - 6
TUTORIAL 07
1. Compare traditional programming with machine learning.
2. What is learning? Why learn?
3. What is supervised learning? What are supervised learning algorithms?
4. What is a classifier? Explain with an example.
5. What is a decision tree? What are the pros and cons of decision trees?
6. What are attribute selection measures?
7. What are the criteria to prune a tree?
8. How are rules extracted from a decision tree? Give an example.
9. How is the goodness of a rule measured?
10. What is prediction? How is it different from classification?



Artificial Intelligence CSC-325
Lecture – 7: Supervised Learning

Thank You. Any Questions?

Dr. Muhammad Tariq Siddique

Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block
