CSC-325
Artificial Intelligence
Lecture - 7:
Supervised Learning
Dr. Muhammad Tariq Siddique
Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block
Topics to be covered
1. Learning
2. Supervised Learning
3. Classification
4. Decision Tree
5. Regression
6. Linear Regression
7. Tutorial
Overview
Section - 1
Machine Learning

Supervised (Task Driven): develop a predictive model based on both input and output data
• Classification: Decision Tree, Neural Networks, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest
• Regression: Ordinary Least Squares Regression, Linear Regression, Logistic Regression, MARS, LOESS

Unsupervised (Data Driven): discover an internal representation from input data only
• Clustering: k-Means, k-Medoids, Fuzzy C-Means, Hierarchical Clustering, SOM, Hidden Markov Model, Gaussian Mixture
• Dimension Reduction: PCA, LDA

Reinforcement (algorithms learn to react to an environment): Decision Process, Reward System, Recommendation Systems
Traditional Programming vs. Machine Learning
Traditional programming: hand-code the rule, e.g. multiply a number by itself:

int square(int x)
{
    return x * x;
}

Not an efficient approach when the rules (mathematical description) become complex!

Machine learning: learn the rule from a training set of input/output examples:

Column_1   Column_2
    2          4
   11        121
   25        625

The learned model multiplies a number by itself.
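A minimal sketch of the machine-learning alternative (assuming Python with NumPy, which is not part of the slides): instead of hand-coding square(), fit a degree-2 polynomial to the training pairs above and let the model recover the rule from data.

import numpy as np

# Training set from the slide: Column_2 = Column_1 squared
x_train = np.array([2, 11, 25])
y_train = np.array([4, 121, 625])

# Fit y = a*x^2 + b*x + c by least squares
coeffs = np.polyfit(x_train, y_train, deg=2)
print(coeffs)                    # ~[1, 0, 0], i.e. the model learned y = x^2

# Use the learned model on an unseen input
print(np.polyval(coeffs, 7))     # ~49.0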
What is Learning
▪ Webster's definition of “to learn”
“To gain knowledge or understanding of, or skill in by study, instruction or experience”
❑ Learning a set of new facts
❑ Learning HOW to do something
❑ Improving ability of something already learned
Why “Learn” ?
▪ There is no need to “learn” to calculate payroll
▪ Learning is used when:
• Human expertise does not exist
• Humans are unable to explain their expertise (speech recognition)
• Solution changes in time (routing on a computer network)
• Solution needs to be adapted to particular cases (user biometrics)
Learning
▪ Examples
• Walking (motor skills)
• Riding a bike (motor skills)
• Telephone number (memorizing)
• Playing backgammon (strategy)
• Develop scientific theory (abstraction)
• Language
• Recognize fraudulent credit card transactions
• Etc.
Supervised Learning
Section - 2
How Does Supervised Machine Learning Work?
Supervised Learning
▪ Supervised learning is the machine learning task of inferring a function
from supervised training data.
▪ The training data consist of a set of training examples.
▪ In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value (also called the
supervisory signal).
Supervised Learning Algorithms
A supervised learning algorithm analyzes the training data and produces
▪ a classifier (inferred function), if the output is discrete,
OR
▪ a regression function, if the output is continuous.
Working of supervised learning algorithms
Given a set of training examples of the form:
{ (x1,y1), . . ., (xN, yN)}
a learning algorithm seeks a function
g:X→Y
where X is the input space and Y is the output space and the function g is
an element of some space of possible functions G, usually called the
hypothesis space.
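A toy sketch of "seeking a function g from a hypothesis space G" (plain Python; the data points and candidate slopes are made up for illustration): G is a small set of candidate lines g_a(x) = a·x, and the learning algorithm picks the a with the smallest training error.

# Training examples {(x1, y1), ..., (xN, yN)}
examples = [(1, 2.1), (2, 3.9), (3, 6.2)]

# Hypothesis space G: functions g_a(x) = a * x for a few candidate slopes a
candidates = [0.5, 1.0, 1.5, 2.0, 2.5]

def training_error(a):
    # Sum of squared errors of g_a over the training examples
    return sum((a * x - y) ** 2 for x, y in examples)

# The learning algorithm: choose the g in G that best fits the training data
best_a = min(candidates, key=training_error)
print(best_a)   # 2.0 -> learned function g(x) = 2.0 * x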
Classification
Section - 3
Introduction to classification
❖ Classification is a supervised learning method.
❖ It assigns items in a collection to target categories or classes.
❖ The goal of classification is to accurately predict the target class for each case in the
data.
❖ For example, a classification model could be used to identify loan applicants as low,
medium, or high credit risks.
❖ Suppose a database D is given as D = {t1, t2, …, tn} and a set of desired classes is C = {C1, …, Cm}.
❖ The classification problem is to define a mapping m : D → C that determines which class of C each tuple of D belongs to.
❖ In effect, the mapping divides D into equivalence classes.
Classification Example
▪ Identify individuals with credit risks (high, low, medium or unknown).
▪ In cricket (batsman, bowler, all-rounder)
▪ Websites (educational, sports, music)
▪ Teachers classify students' grades as A, B, C, D, or F.
• How do teachers assign grades based on the obtained marks x?
❑ If x >= 90 then A grade.
❑ If 80 <= x < 90 then B grade.
❑ If 70 <= x < 80 then C grade.
❑ If 60 <= x < 70 then D grade.
❑ If x < 60 then F grade.
(These rules correspond to a decision tree that tests x against the thresholds 90, 80, 70, and 60 in turn.)
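The grading rules translate directly into code; a small sketch (Python, for illustration only):

def grade(x):
    # Assign a letter grade from the obtained marks x, following the rules above
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

print(grade(85))   # B
print(grade(52))   # F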
Classification : a two step process
1) Model Construction
Training data:

Name   Rank         Years   Tenured
Mike   Asst. Prof.    3     No
Mary   Asst. Prof.    7     Yes
Bill   Prof.          2     Yes
Jim    Asso. Prof.    7     Yes
Dave   Asst. Prof.    6     No
Anne   Asso. Prof.    3     No

Applying a classification algorithm to the training data yields the classifier (model):

IF Rank = ‘professor’ OR Years > 6 THEN Tenured = ‘yes’
Classification : a two step process (Cont..)
2) Model Usage
Testing data (with known labels):

Name     Rank         Years   Tenured
Tom      Asst. Prof.    2     No
Merlisa  Asso. Prof.    7     No
George   Prof.          5     Yes
Joseph   Asst. Prof.    7     Yes

Unseen data: (Jeff, Professor, 4) → Tenured? Yes
Classification : a two step process (Cont..)
1) Model Construction
• Describing a set of predetermined classes :
o Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute.
o The set of tuples used for model construction is called the training set.
o The model is represented as classification rules, decision trees, or mathematical formulae.
2) Model Usage
• For classifying future or unknown objects
o Estimate the accuracy of the model:
o The known label of each test sample is compared with the classified result from the model.
o The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
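A minimal sketch of the two-step process on the tenure example (plain Python; the classifier is written out as the rule induced on the previous slides, so no learning library is needed):

# Step 1 - Model construction: the classifier learned from the training data,
# expressed as the induced rule: IF Rank = 'Prof.' OR Years > 6 THEN Tenured = 'Yes'
def classify(rank, years):
    return "Yes" if rank == "Prof." or years > 6 else "No"

# Step 2 - Model usage: estimate accuracy on test data with known labels
test_data = [("Tom", "Asst. Prof.", 2, "No"),
             ("Merlisa", "Asso. Prof.", 7, "No"),
             ("George", "Prof.", 5, "Yes"),
             ("Joseph", "Asst. Prof.", 7, "Yes")]

correct = sum(classify(rank, years) == label for _, rank, years, label in test_data)
print(correct / len(test_data))   # 0.75 (Merlisa is misclassified)

# ... then classify future or unknown objects
print(classify("Prof.", 4))       # Yes -> (Jeff, Professor, 4) is predicted tenured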
Different Types of Classifiers
▪ Back propagation
▪ Bayesian Classifiers
▪ Decision Trees
▪ Density estimation methods
▪ Fuzzy set theory
▪ Linear discriminant analysis (LDA)
▪ Logistic regression
▪ Naïve Bayes classifier
▪ Nearest Neighbor Classification
▪ Neural networks
▪ Quadratic discriminant analysis (QDA)
▪ Support Vector Machine
▪ many more…
Decision Tree
Section - 4
Definition
• Decision tree induction is the learning of
decision trees from class-labeled training
tuples.
• A decision tree is a flowchart-like tree
structure, where each internal node (non-
leaf node) denotes a test on an attribute.
• Each branch represents an outcome of
the test.
• Each leaf node (or terminal node) holds a
class label.
• The topmost node in a tree is the root
node.
ID3 (Iterative Dichotomiser 3) Tree Algorithm
Section - 5
ID3 Tree Algorithm
• A decision tree algorithm developed by J. Ross Quinlan in the early 1980s.
• A greedy approach in which decision trees are constructed in a top-down
recursive divide-and-conquer manner.
• Top-down approach starts with a training set of tuples and their associated
class labels.
▪ The training set is recursively partitioned into smaller subsets as the tree is
being built.
▪ The algorithm is called with 3 parameters:
❑ D (the data partition),
❑ Attribute_list, and
❑ Attribute_selection_method.
• Initially, D is the complete set of training tuples and their associated class labels.
• Attribute_list is a list of attributes describing the tuples.
• Attribute_selection_method is the procedure for selecting the attribute that best discriminates
the given tuples according to class.
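To make the top-down, recursive divide-and-conquer structure concrete, here is a compact sketch in Python (an illustration only: the dict-based tree representation and helper names are ours, and information gain is used as the attribute selection method):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(D, attr, target):
    gain = entropy([t[target] for t in D])
    for value in set(t[attr] for t in D):
        Dj = [t for t in D if t[attr] == value]
        gain -= len(Dj) / len(D) * entropy([t[target] for t in Dj])
    return gain

def id3(D, attribute_list, target="class"):
    labels = [t[target] for t in D]
    if len(set(labels)) == 1:                    # all tuples in one class -> leaf
        return labels[0]
    if not attribute_list:                       # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # Attribute_selection_method: the attribute with the highest information gain
    best = max(attribute_list, key=lambda a: info_gain(D, a, target))
    remaining = [a for a in attribute_list if a != best]
    # One branch per outcome of the test; recurse on each partition Dj
    return {best: {value: id3([t for t in D if t[best] == value], remaining, target)
                   for value in set(t[best] for t in D)}}

Calling id3 on the 14 AllElectronics tuples from the following slides (each tuple as a dict with keys age, income, student, credit_rating and class) should reproduce the tree derived by hand: age at the root, student under youth, credit_rating under senior.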
Decision Tree Induction - Attribute Selection Measures
▪ An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class-labeled
training tuples into individual classes.
▪ Also known as splitting rules as they determine how the tuples at a given
node are to be split.
▪ The tree node created for partition D is labeled with the splitting criterion,
branches are grown for each outcome of the criterion, and the tuples are
partitioned accordingly.
▪ Three popular attribute selection measures
1. Information gain
2. Gain ratio
3. Gini index
Information Gain
▪ ID3 uses information gain as its attribute selection measure.
▪ Entropy characterizes the (im)purity of an arbitrary collection of examples.
▪ Let pi be the probability that a tuple belongs to class Ci, estimated by |Ci,D|/|D|
▪ Expected information (entropy) needed to classify a tuple in D:

  $Info(D) = Entropy(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$

▪ Information needed (after using A to split D into v partitions) to classify D:

  $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

▪ Information gained by branching on attribute A:

  $Gain(A) = Info(D) - Info_A(D)$
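As a quick sanity check on these formulas, a Python sketch using the class counts from the AllElectronics example on the following slides:

import math

def info(counts):
    # Expected information (entropy) for a class distribution given as counts
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# Whole data set D: 9 'yes' and 5 'no' tuples
print(f"{info([9, 5]):.3f}")               # 0.940

# Splitting D on age: youth [2+,3-], middle_aged [4+,0-], senior [3+,2-]
partitions = [[2, 3], [4, 0], [3, 2]]
info_age = sum(sum(p) / 14 * info(p) for p in partitions)
print(f"{info([9, 5]) - info_age:.2f}")    # 0.25 -> Gain(age), i.e. 0.94 - 0.69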
Decision Tree Induction – Training Data and Output
RID age income student Credit_rating Class: buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle aged medium no excellent yes
13 middle aged high yes fair yes
14 senior medium no excellent no

Class-labelled training tuples from the AllElectronics customer database.

(Figure: a decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute; each leaf node represents a class, either buys_computer = yes or buys_computer = no.)
Decision Tree Induction - Attribute Selection Measures
(Training tuples as in the AllElectronics table above.)

Class P: buys_computer = “yes” → 9 tuples
Class N: buys_computer = “no” → 5 tuples

Info(D) = I(9,5) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.940
Decision Tree Induction - Attribute Selection Measures
Splitting the whole set S [9+, 5-] on age:
  youth → [2+, 3-]
  middle_aged → [4+, 0-]
  senior → [3+, 2-]
Decision Tree Induction - Attribute Selection Measures
Splitting S [9+, 5-] on age: youth [2+, 3-], middle_aged [4+, 0-], senior [3+, 2-]

Gain(Age) = 0.94 - 5/14·[-2/5·log2(2/5) - 3/5·log2(3/5)]
                 - 4/14·[-4/4·log2(4/4) - 0/4·log2(0/4)]
                 - 5/14·[-3/5·log2(3/5) - 2/5·log2(2/5)]
          = 0.94 - 0.69 = 0.25
Decision Tree Induction - Attribute Selection Measures
Splitting S [9+, 5-] on income: high [2+, 2-], medium [4+, 2-], low [3+, 1-]

Gain(income) = 0.94 - 4/14·[-2/4·log2(2/4) - 2/4·log2(2/4)]
                    - 6/14·[-4/6·log2(4/6) - 2/6·log2(2/6)]
                    - 4/14·[-3/4·log2(3/4) - 1/4·log2(1/4)]
             = 0.94 - 0.91 = 0.03

Splitting S [9+, 5-] on student: yes [6+, 1-], no [3+, 4-]

Gain(student) = 0.94 - 7/14·[-6/7·log2(6/7) - 1/7·log2(1/7)]
                     - 7/14·[-3/7·log2(3/7) - 4/7·log2(4/7)]
              = 0.94 - 0.79 = 0.15
Decision Tree Induction - Attribute Selection Measures
Splitting S [9+, 5-] on credit_rating: fair [6+, 2-], excellent [3+, 3-]

Gain(credit_rating) = 0.94 - 8/14·[-6/8·log2(6/8) - 2/8·log2(2/8)]
                           - 6/14·[-3/6·log2(3/6) - 3/6·log2(3/6)]
                    = 0.94 - 0.89 = 0.05
Decision Tree Induction - Attribute Selection Measures
Attribute        Information Gain
Age              0.25
Income           0.03
Student          0.15
Credit_rating    0.05

Since Age has the highest information gain, we start splitting the dataset using the age attribute.
Decision Tree Induction - Attribute Selection Measures
Since all records under the middle_aged branch belong to the same class (yes), that branch becomes a leaf node labeled yes.
Decision Tree Induction - Attribute Selection Measures
Now build the decision tree for the left subtree:
The expected information (entropy) for this partition is
I(2 Yes, 3 No) = I(2,3) = -2/5·log2(2/5) - 3/5·log2(3/5) = 0.97

Splitting [2+, 3-] on income: high [0+, 2-], medium [1+, 1-], low [1+, 0-]

Gain(income) = 0.97 - 2/5·[-0/2·log2(0/2) - 2/2·log2(2/2)]
                    - 2/5·[-1/2·log2(1/2) - 1/2·log2(1/2)]
                    - 1/5·[-1/1·log2(1/1) - 0/1·log2(0/1)]
             = 0.97 - 0.40 = 0.57

Splitting [2+, 3-] on student: yes [2+, 0-], no [0+, 3-]

Gain(student) = 0.97 - 2/5·[-2/2·log2(2/2) - 0/2·log2(0/2)]
                     - 3/5·[-0/3·log2(0/3) - 3/3·log2(3/3)]
              = 0.97 - 0.0 = 0.97
Decision Tree Induction - Attribute Selection Measures
Since each of these two new branches contains records of only one class, we make them leaf nodes with their respective class as label.
Decision Tree Induction - Attribute Selection Measures
Now build the decision tree for the right subtree
The expected information (entropy) for this partition is
I(3,2) = -3/5·log2(3/5) - 2/5·log2(2/5) = 0.97

Splitting [3+, 2-] on income: medium [2+, 1-], low [1+, 1-]

Gain(income) = 0.97 - 3/5·[-2/3·log2(2/3) - 1/3·log2(1/3)]
                    - 2/5·[-1/2·log2(1/2) - 1/2·log2(1/2)]
             = 0.97 - 0.95 = 0.02

Splitting [3+, 2-] on student: yes [2+, 1-], no [1+, 1-]

Gain(student) = 0.97 - 3/5·[-2/3·log2(2/3) - 1/3·log2(1/3)]
                     - 2/5·[-1/2·log2(1/2) - 1/2·log2(1/2)]
              = 0.97 - 0.95 = 0.02

Splitting [3+, 2-] on credit_rating: fair [3+, 0-], excellent [0+, 2-]

Gain(credit_rating) = 0.97 - 3/5·[-3/3·log2(3/3) - 0/3·log2(0/3)]
                           - 2/5·[-0/2·log2(0/2) - 2/2·log2(2/2)]
                    = 0.97 - 0.00 = 0.97
Decision Tree Induction - Attribute Selection Measures
• We then split the senior branch on credit_rating.
• These splits give partitions that each contain records of the same class,
• so we make them leaf nodes with their class label attached.

New example: age = youth (age <= 30), income = medium, student = yes, credit_rating = fair
Following the branch age = youth and then student = yes, we predict buys_computer = yes.
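The finished tree can be written out directly as nested tests; a Python sketch of the tree derived above (using the youth / middle_aged / senior labels):

def buys_computer(age, income, student, credit_rating):
    # Root test: age
    if age == "middle_aged":
        return "yes"
    if age == "youth":
        # Left subtree: split on student
        return "yes" if student == "yes" else "no"
    # age == "senior": right subtree, split on credit_rating
    return "yes" if credit_rating == "fair" else "no"

# New example: age = youth, income = medium, student = yes, credit_rating = fair
print(buys_computer("youth", "medium", "yes", "fair"))   # yes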
Decision Tree Induction - Tree Pruning
• Overfitting: An induced tree may overfit the training data
– Too many branches, some may reflect anomalies due to noise or outliers
– Poor accuracy for unseen samples
• Two approaches to avoid overfitting
  – Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    • It is difficult to choose an appropriate threshold
  – Postpruning: remove branches from a “fully grown” tree, giving a sequence of progressively pruned trees
    • Use a set of data different from the training data to decide which is the “best pruned tree”
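A hedged sketch of how the two strategies look in practice, assuming scikit-learn (which the slides do not use; the parameter names below are scikit-learn's, not ID3's):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Prepruning: halt construction early via thresholds such as a maximum depth
# or a minimum number of samples required to split a node
pre = DecisionTreeClassifier(criterion="entropy", max_depth=3, min_samples_split=5).fit(X, y)

# Postpruning: grow the tree fully, then prune branches by cost-complexity pruning
post = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.02).fit(X, y)

full = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(full.get_n_leaves(), pre.get_n_leaves(), post.get_n_leaves())
# the pruned trees should end up with fewer leaves than the fully grown tree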
Rule Extraction from a Decision Tree
▪ Rules are easier to understand than large trees.
▪ One rule is created for each path from the root to a leaf.
▪ Each attribute-value pair along a path forms a conjunction (ANDed); the leaf holds the class prediction (THEN part).
▪ The rules are mutually exclusive and exhaustive.
Example: Rule extraction from our buy_computer decision-tree
R1: IF age = youth AND student = no THEN buy_computer = no
R2: IF age = youth AND student = yes THEN buy_computer = yes
R3: IF age = middle aged THEN buy_computer = yes
R4: IF age = senior AND credit rating = fair THEN buy_computer = yes
R5: IF age = senior AND credit rating = excellent THEN buy_computer = no
Rule Extraction from a Decision Tree
• Rules represent information and knowledge
• IF you study well THEN you’ll succeed
• IF you’re a student AND you have 5000 USD THEN you most probably will buy an
iPad (confidence?)
• How to assess the goodness of a rule?
$coverage(R) = \frac{n_{covers}}{|D|}$

$accuracy(R) = \frac{n_{correct}}{n_{covers}}$
Rule Extraction from a Decision Tree
R2: IF age = youth AND student = yes THEN buys_computer = yes

Over the 14 training tuples in the AllElectronics table, R2 covers 2 tuples, and both are of class yes:

coverage(R2) = 2/14 = 14.28%
accuracy(R2) = 2/2 = 100%

X: (age = youth, income = medium, student = yes, credit_rating = fair) is covered by R2, so X is classified as buys_computer = yes.
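A small sketch that checks the coverage and accuracy of R2 against the 14 training tuples (plain Python; the tuples are the AllElectronics data from the table):

# (age, income, student, credit_rating, buys_computer) for the 14 training tuples
D = [("youth","high","no","fair","no"), ("youth","high","no","excellent","no"),
     ("middle_aged","high","no","fair","yes"), ("senior","medium","no","fair","yes"),
     ("senior","low","yes","fair","yes"), ("senior","low","yes","excellent","no"),
     ("middle_aged","low","yes","excellent","yes"), ("youth","medium","no","fair","no"),
     ("youth","low","yes","fair","yes"), ("senior","medium","yes","fair","yes"),
     ("youth","medium","yes","excellent","yes"), ("middle_aged","medium","no","excellent","yes"),
     ("middle_aged","high","yes","fair","yes"), ("senior","medium","no","excellent","no")]

# R2: IF age = youth AND student = yes THEN buys_computer = yes
covered = [t for t in D if t[0] == "youth" and t[2] == "yes"]
correct = [t for t in covered if t[4] == "yes"]

print(len(covered) / len(D))        # 2/14 ~ 0.1428 -> coverage(R2)
print(len(correct) / len(covered))  # 2/2  =  1.0   -> accuracy(R2)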
Prediction
Section - 5
What Is Prediction?
▪ Prediction is similar to classification
• First, construct a model
• Second, use model to predict unknown value
❑ Major method for prediction is regression
• Linear and multiple regression
• Non-linear regression
▪ Prediction is different from classification
• Classification predicts a categorical class label
• Prediction models continuous-valued functions
Linear and Multiple Regression Analysis
▪ Linear regression: $Y = \alpha + \beta X$
• Two parameters, α and β, specify the line and are to be estimated by using the data at hand.
• They are estimated with the least squares criterion from the known values (X1, Y1), (X2, Y2), ….

  $\beta = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2}$

  $\alpha = \bar{y} - \beta \bar{x}$

▪ Multiple regression: $Y = b_0 + b_1 X_1 + b_2 X_2$.
• Many nonlinear functions can be transformed into the above.
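A direct implementation of the two estimators above (plain Python; the function name is ours):

def fit_line(xs, ys):
    # Least-squares estimates of the line y = alpha + beta * x
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
           / sum((x - x_bar) ** 2 for x in xs)
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Points lying exactly on y = 1 + 2x are recovered exactly
print(fit_line([1, 2, 3], [3, 5, 7]))   # (1.0, 2.0)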
Example: Linear Regression Analysis
• Given data:

x (Years Experience)   y (Salary)   x - x̄    y - ȳ    (x - x̄)(y - ȳ)   (x - x̄)²
        3                 30K       -6.1    -25.4        154.94        37.21
        8                 57K       -1.1      1.6         -1.76         1.21
        9                 64K       -0.1      8.6         -0.86         0.01
       13                 75K        3.9     19.6         76.44        15.21
        3                 36K       -6.1    -19.4        118.34        37.21
        6                 43K       -3.1    -12.4         38.44         9.61
       11                 59K        1.9      3.6          6.84         3.61
       21                 90K       11.9     34.6        411.74       141.61
        1                 20K       -8.1    -35.4        286.74        65.61
       16                 83K        6.9     27.6        190.44        47.61
x̄ = 9.1                ȳ = 55.4                      Σ = 1281.3      Σ = 358.9

β = 1281.3 / 358.9 = 3.5
α = 55.4 - (3.5 × 9.1) = 23.6

Fitted line: y = 23.6 + 3.5x
Example: Linear Regression Analysis
x (Years      y          Predicted ŷ         Absolute error   Squared error
experience)   (Salary)   [ŷ = 23.6 + 3.5x]   |y - ŷ|          (y - ŷ)²
     3          30K           34.1K               4.1            16.81
     8          57K           51.6K               5.4            29.16
     9          64K           55.1K               8.9            79.21
    13          75K           69.1K               5.9            34.81
     3          36K           34.1K               1.9             3.61
     6          43K           44.6K               1.6             2.56
    11          59K           62.1K               3.1             9.61
    21          90K           97.1K               7.1            50.41
     1          20K           27.1K               7.1            50.41
    16          83K           79.6K               3.4            11.56

Mean Absolute Error = 4.85        Root Mean Squared Error = 5.37
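A short sketch reproducing the error figures above (plain Python, using the fitted line y = 23.6 + 3.5x from the previous slide):

x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 75, 36, 43, 59, 90, 20, 83]      # salaries in K

pred = [23.6 + 3.5 * xi for xi in x]               # predictions from the fitted line

mae = sum(abs(yi - pi) for yi, pi in zip(y, pred)) / len(y)
rmse = (sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)) ** 0.5

print(round(mae, 2), round(rmse, 2))               # 4.85 5.37, matching the table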
Exercise (Do it yourself)
The data below are the midterm and final scores of 20 students from an online AI
course. We want to know how well final exam scores can be predicted based on the
midterm scores.
Midterm, x 15 35 45 50 50 55 60 60 65 70
Final, y 22 35 46 68 39 56 48 92 92 56
Midterm, x 70 70 70 80 80 85 85 85 95 100
Final, y 48 81 50 51 67 88 72 88 88 100
▪ Fit the linear regression line relating the midterm and the final scores.
▪ Plot the data and regression line.
▪ Estimate the mean absolute and root mean square errors for the predicted final
scores.
Nonlinear Regression
▪ Often the relationship between x and y cannot be approximated with a straight line; in that case, a nonlinear regression technique may be used.
▪ Alternatively, the data could be preprocessed to make the relationship linear.
Logistic Regression
▪ A linear regression is not appropriate for
predicting the value of a binary variable for two
reasons:
• A linear regression will predict values outside the
acceptable range (e.g. predicting probabilities
outside the range 0 to 1).
• Since the response can take only one of two possible values for each experiment, the residuals (random errors) will not be normally distributed about the predicted line.
▪ A logistic regression produces a logistic curve, which is limited to values between 0 and 1.
▪ Logistic regression is similar to a linear regression, but the curve is constructed using the natural logarithm of the odds (the log-odds) of the target variable, rather than the probability itself.
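A small sketch of the idea (plain Python; the coefficients are made up for illustration): the linear part can produce any real number, but passing it through the logistic function keeps the prediction between 0 and 1.

import math

def logistic(z):
    # Maps the linear predictor (log-odds) to a probability in (0, 1)
    return 1 / (1 + math.exp(-z))

alpha, beta = -4.0, 0.5          # hypothetical coefficients
for x in [0, 8, 20]:
    z = alpha + beta * x         # log-odds: -4.0, 0.0, 6.0 (can fall outside [0, 1])
    print(x, round(z, 1), round(logistic(z), 3))
# logistic(z) stays between 0 and 1: 0.018, 0.5, 0.998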
Classification vs Regression
• Classification means to group the output into a class.
  • e.g. classification predicts the type of tumor (harmful or not harmful) from training data.
  • If the output is a discrete/categorical variable, it is a classification problem.
• Regression means to predict an output value using training data.
  • e.g. regression predicts the house price from training data.
  • If the output is a real number/continuous value, it is a regression problem.
Tutorial
Section - 6
TUTORIAL 07
1. Compare traditional programming with machine learning.
2. What is learning? Why learn?
3. What is supervised learning? What are supervised learning algorithms?
4. What is a classifier? Explain with an example.
5. What is a decision tree? What are the pros and cons of decision trees?
6. What are attribute selection measures?
7. What are the criteria to prune a tree?
8. How are rules extracted from a decision tree? Give an example.
9. How is the goodness of a rule measured?
10. What is prediction? How is it different from classification?
Artificial Intelligence CSC-325
Lecture – 7: Supervised Learning
Thank You. Any Questions?
Dr. Muhammad Tariq Siddique
Department of Computer Sciences
Bahria University, Karachi Campus, Pakistan
tariqsiddique.bukc@bahria.edu.pk
Faculty room 14, 2nd Floor, Iqbal Block