
Classification Algorithms

• Linear Regression
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy: Ensemble Methods
• Summary
Supervised vs. Unsupervised Learning

• Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
• Unsupervised learning (clustering)
  - The class labels are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
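A minimal sketch of the contrast, using scikit-learn (the library choice and the toy data are illustrative, not from the slides):

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cluster import KMeans

    X = [[1, 2], [2, 1], [8, 9], [9, 8]]   # observations (feature vectors)
    y = ["A", "A", "B", "B"]               # class labels (the supervision)

    clf = DecisionTreeClassifier().fit(X, y)     # supervised: learns from labels
    print(clf.predict([[7, 8]]))                 # -> ['B']

    km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: no labels given
    print(km.labels_)                            # cluster ids it discovered, not class names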
Prediction Problems: Classification vs. Numeric Prediction

• Classification
  - Predicts categorical class labels (discrete or nominal)
  - Constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data
• Numeric prediction
  - Models continuous-valued functions, i.e., predicts unknown or missing values
Classification: A Two-Step Process

• Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the model's prediction
    - The accuracy rate is the percentage of test-set samples correctly classified by the model
    - The test set must be independent of the training set (otherwise the estimate reflects overfitting)
  - If the accuracy is acceptable, use the model to classify new data
• Note: if the test set is used to select among models, it is called a validation (test) set
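The two steps in code (a sketch; scikit-learn and the Iris data are stand-ins, not prescribed by the slides):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    # Step 1: model construction, on the training set only
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = DecisionTreeClassifier().fit(X_train, y_train)
    # Step 2: model usage -- accuracy estimated on the independent test set
    print(accuracy_score(y_test, model.predict(X_test)))  # fraction correctly classified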
Process (1): Model Construction

Training data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

The classification algorithm produces the classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
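The learned model here is just the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'. A direct Python transcription (function name is illustrative), applied to the test set and the unseen tuple:

    def tenured(rank, years):
        # the classification rule learned in Process (1)
        return "yes" if rank == "Professor" or years > 6 else "no"

    print(tenured("Professor", 4))       # Jeff -> 'yes'
    print(tenured("Assistant Prof", 7))  # Joseph -> 'yes', matches his test label
    print(tenured("Associate Prof", 7))  # Merlisa -> 'yes', but her true label is 'no'

Note that Merlisa is misclassified: the model gets 3 of the 4 test tuples right, which is exactly the accuracy estimation step from the previous slide.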
Chapter 8. Classification: Basic Concepts

• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy: Ensemble Methods
• Summary
Decision Tree Induction: An Example

Training data set: buys_computer

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31..40   high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31..40   low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31..40   medium   no        excellent       yes
31..40   high     yes       fair            yes
>40      medium   no        excellent       no

Resulting tree:

age?
├── <=30: student?
│   ├── no  -> no
│   └── yes -> yes
├── 31..40 -> yes
└── >40: credit_rating?
    ├── excellent -> no
    └── fair      -> yes
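A sketch of fitting this data with scikit-learn (an assumed tool; the slides do not prescribe an implementation). Categorical attributes are one-hot encoded, and criterion='entropy' matches the information-gain measure used below. Because sklearn makes binary splits over the one-hot indicators, the tree's shape differs from the multiway tree above even though age-based splits still dominate:

    from sklearn.tree import DecisionTreeClassifier, export_text
    import pandas as pd

    rows = [  # (age, income, student, credit_rating, buys_computer)
        ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
        ("31..40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
        (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
        ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
        ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
        ("<=30","medium","yes","excellent","yes"), ("31..40","medium","no","excellent","yes"),
        ("31..40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
    ]
    df = pd.DataFrame(rows, columns=["age","income","student","credit_rating","buys_computer"])
    X = pd.get_dummies(df.drop(columns="buys_computer"))  # one-hot encode categoricals
    clf = DecisionTreeClassifier(criterion="entropy").fit(X, df["buys_computer"])
    print(export_text(clf, feature_names=list(X.columns)))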
Algorithm for Decision Tree Induction

• Basic algorithm (a greedy algorithm); see the sketch after this list
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (continuous-valued attributes are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning
  - There are no samples left
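A compact sketch of this greedy top-down recursion (illustrative Python, not from the slides; select_attribute is a placeholder for the heuristic, e.g. the information-gain measure defined next):

    from collections import Counter

    def induce_tree(examples, attributes, select_attribute, default=None):
        # examples: list of (attribute_dict, class_label) pairs
        if not examples:                              # stop: no samples left
            return default
        labels = [y for _, y in examples]
        majority = Counter(labels).most_common(1)[0][0]
        if len(set(labels)) == 1 or not attributes:   # stop: pure node / no attributes left
            return majority
        A = select_attribute(examples, attributes)    # heuristic choice of test attribute
        tree = {A: {}}
        for v in {x[A] for x, _ in examples}:         # partition examples on A's values
            subset = [(x, y) for x, y in examples if x[A] == v]
            remaining = [a for a in attributes if a != A]
            tree[A][v] = induce_tree(subset, remaining, select_attribute, default=majority)
        return tree

With the buys_computer tuples and an information-gain-based select_attribute, this should reproduce the tree on the previous slide.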
Brief Review of Entropy

The entropy of a discrete distribution p1, ..., pm is H = -sum_{i=1}^{m} p_i log2(p_i): the expected number of bits needed to identify the class of a randomly drawn sample. It is 0 for a pure distribution and maximal (log2 m) when all classes are equally likely.
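The formula in code (a minimal sketch; it is the same expression used as Info(D) on the next slide):

    from math import log2

    def entropy(probs):
        # H = -sum p_i * log2(p_i), with the convention 0 * log2(0) = 0
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit: maximal uncertainty for two classes
    print(entropy([1.0]))         # 0.0: a pure node carries no uncertainty
    print(entropy([9/14, 5/14]))  # ~0.940: the buys_computer dataset used below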
Attribute Selection Measure: Information Gain (ID3/C4.5)

• Expected information (entropy) needed to classify a tuple in D:

  Info(D) = -sum_{i=1}^{m} p_i log2(p_i)

• Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = sum_{j=1}^{v} (|D_j| / |D|) x Info(D_j)

• Information gained by branching on attribute A:

  Gain(A) = Info(D) - Info_A(D)
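The three formulas translate directly into code (a minimal sketch, counts-based to match the I(.,.) notation of the worked example on the next slide):

    from math import log2

    def info(counts):
        # Info(D) = -sum_i p_i log2(p_i), with p_i = count_i / |D|
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c)

    def info_split(partitions):
        # Info_A(D) = sum_j |D_j|/|D| * Info(D_j); each partition is a tuple of class counts
        n = sum(sum(p) for p in partitions)
        return sum(sum(p) / n * info(p) for p in partitions)

    def gain(class_counts, partitions):
        return info(class_counts) - info_split(partitions)

    # Splitting the 14-tuple buys_computer data on age: partitions (2,3), (4,0), (3,2)
    print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))
    # 0.247 (the slide's 0.246 comes from subtracting the rounded values 0.940 - 0.694)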
Attribute Selection: Information Gain

Using the buys_computer training data shown earlier:

• Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)

  Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

• Splitting on age:

  Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

  Here (5/14) I(2,3) means "age <= 30" covers 5 of the 14 samples, with 2 yes's and 3 no's. Hence

  Gain(age) = Info(D) - Info_age(D) = 0.246

• Similarly:

  Gain(income)        = 0.029
  Gain(student)       = 0.151
  Gain(credit_rating) = 0.048
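All four gains can be recomputed from the raw table (a sketch; differences in the third decimal place versus the slide are rounding, not errors):

    from math import log2
    from collections import Counter, defaultdict

    rows = [  # (age, income, student, credit_rating, buys_computer)
        ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
        ("31..40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
        (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
        ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
        ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
        ("<=30","medium","yes","excellent","yes"), ("31..40","medium","no","excellent","yes"),
        ("31..40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
    ]

    def info(counts):
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c)

    def gain(col):
        total = info(list(Counter(r[-1] for r in rows).values()))  # Info(D) = 0.940
        parts = defaultdict(list)
        for r in rows:
            parts[r[col]].append(r[-1])                            # partition labels on the attribute
        split = sum(len(p) / len(rows) * info(list(Counter(p).values()))
                    for p in parts.values())                       # Info_A(D)
        return total - split

    for i, name in enumerate(["age", "income", "student", "credit_rating"]):
        print(name, round(gain(i), 3))
    # age 0.247, income 0.029, student 0.152, credit_rating 0.048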
Overfitting and Tree Pruning

• Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy on unseen samples
• Two approaches to avoid overfitting (sketched below)
  - Prepruning: halt tree construction early; do not split a node if doing so would drop the goodness measure below a threshold
    - It is difficult to choose an appropriate threshold
  - Postpruning: remove branches from a "fully grown" tree, yielding a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the "best pruned tree"
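Both strategies map onto scikit-learn's decision trees (an assumed library; the parameter names are sklearn's, not the slides', and Iris is a stand-in dataset):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Prepruning: halt construction early via thresholds (choosing them is the hard part)
    pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                 min_impurity_decrease=0.01).fit(X, y)

    # Postpruning: cost-complexity pruning yields a sequence of progressively pruned trees;
    # the "best pruned tree" would be chosen on held-out (validation) data, not shown here
    path = DecisionTreeClassifier().cost_complexity_pruning_path(X, y)
    trees = [DecisionTreeClassifier(ccp_alpha=a).fit(X, y) for a in path.ccp_alphas]
    print([t.get_n_leaves() for t in trees])  # progressively smaller trees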
Chapter 8. Classification: Basic Concepts

• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy: Ensemble Methods
• Summary
Bayesian Classification: Why?

• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: based on Bayes' theorem
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data
• Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayes' Theorem: Basics

• Total probability theorem: P(B) = sum_{i=1}^{M} P(B|A_i) P(A_i)
• Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
• Let X be a data sample ("evidence") whose class label is unknown
• Let H be the hypothesis that X belongs to class C
• Classification is to determine P(H|X), the posterior probability that the hypothesis holds given the observed data sample X
• P(H) (prior probability): the initial probability
  - E.g., that X will buy a computer, regardless of age, income, ...
• P(X): the probability that the sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X given that the hypothesis holds
  - E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
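A tiny numeric sketch of both theorems (the numbers are invented for illustration):

    # Hypothesis H: a customer buys a computer; evidence X: some observed attribute profile
    p_H = 0.6                 # prior P(H)
    p_X_given_H = 0.3         # likelihood P(X|H)
    p_X_given_notH = 0.1      # P(X|~H)

    # Total probability theorem: P(X) = P(X|H) P(H) + P(X|~H) P(~H)
    p_X = p_X_given_H * p_H + p_X_given_notH * (1 - p_H)

    # Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
    print(p_X_given_H * p_H / p_X)  # posterior, ~0.818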
Prediction Based on Bayes' Theorem

• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

  P(H|X) = P(X|H) P(H) / P(X)

• Informally: posterior = likelihood x prior / evidence
• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) over the k classes
• Practical difficulty: this requires initial knowledge of many probabilities, at significant computational cost
Classification Is to Derive the Maximum Posterior

• Let D be a training set of tuples and their associated class labels, each tuple represented by an n-dimensional attribute vector X = (x1, x2, ..., xn)
• Suppose there are m classes C1, C2, ..., Cm
• Classification derives the maximum posterior, i.e., the maximal P(Ci|X)
• This can be derived from Bayes' theorem:

  P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized
Naïve Bayes Classifier: Training Dataset

Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Training data: the 14-tuple buys_computer table shown earlier.

Data to be classified:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayes Classifier: An Example

• Priors P(Ci):
  P(buys_computer = 'yes') = 9/14 = 0.643
  P(buys_computer = 'no')  = 5/14 = 0.357

• Compute P(X|Ci) for each class:
  P(age = '<=30'           | buys_computer = 'yes') = 2/9 = 0.222
  P(age = '<=30'           | buys_computer = 'no')  = 3/5 = 0.600
  P(income = 'medium'      | buys_computer = 'yes') = 4/9 = 0.444
  P(income = 'medium'      | buys_computer = 'no')  = 2/5 = 0.400
  P(student = 'yes'        | buys_computer = 'yes') = 6/9 = 0.667
  P(student = 'yes'        | buys_computer = 'no')  = 1/5 = 0.200
  P(credit_rating = 'fair' | buys_computer = 'yes') = 6/9 = 0.667
  P(credit_rating = 'fair' | buys_computer = 'no')  = 2/5 = 0.400

• For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = 'yes') = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = 'no')  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

  P(X | buys_computer = 'yes') x P(buys_computer = 'yes') = 0.028
  P(X | buys_computer = 'no')  x P(buys_computer = 'no')  = 0.007

• Therefore, X belongs to class buys_computer = 'yes'
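The whole calculation reduces to a few lines of Python (a sketch that recomputes the numbers above from the training table; no smoothing, exactly as on the slide):

    from collections import Counter

    rows = [  # (age, income, student, credit_rating, buys_computer)
        ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
        ("31..40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
        (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
        ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
        ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
        ("<=30","medium","yes","excellent","yes"), ("31..40","medium","no","excellent","yes"),
        ("31..40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
    ]
    attrs = ["age", "income", "student", "credit_rating"]
    X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}

    n = len(rows)
    priors = Counter(r[-1] for r in rows)      # {'yes': 9, 'no': 5}
    for c in ("yes", "no"):
        cls = [r for r in rows if r[-1] == c]
        px = 1.0
        for i, a in enumerate(attrs):          # naive (conditional independence) assumption
            px *= sum(r[i] == X[a] for r in cls) / len(cls)
        print(c, round(px, 3), round(px * priors[c] / n, 3))
    # yes 0.044 0.028
    # no  0.019 0.007  -> predict buys_computer = 'yes'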
Comparison of Decision Tree and Bayesian Classification
