
Bayesian decision theory

Assume a two-class problem.


Example: an automatic system for quality inspection of a product in industry.
Acceptance class = $w_1$, reject class = $w_2$.
Based on previous records:

Probability of acceptance = $p(w_1)$, known
Probability of rejection = $p(w_2)$, known
These are the prior probabilities.

We can make a simple decision rule:

If $p(w_1) > p(w_2)$, then decide class $w_1$.

If $p(w_2) > p(w_1)$, then decide class $w_2$.


We can also find the probability density function (PDF) of the feature variable $x$ for objects belonging to class $w_1$ and to class $w_2$ separately:

$p(x \mid w_1)$ and $p(x \mid w_2)$, the class-conditional PDFs.


Our objective is to calculate:

$p(w_1 \mid x)$ and $p(w_2 \mid x)$, the posterior probabilities.

Joint probability density function:

$$p(w_i, x) = p(w_i \mid x)\, p(x) = p(x \mid w_i)\, p(w_i)$$

$$\Rightarrow p(w_i \mid x)\, p(x) = p(x \mid w_i)\, p(w_i)$$

$$\Rightarrow p(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{p(x)} \qquad \text{(Bayes rule)}$$

Posterior = (Likelihood × Prior) / Evidence

where the evidence is

$$p(x) = \sum_{i=1}^{2} p(x \mid w_i)\, p(w_i)$$

Decision rule:

If $p(w_1 \mid x) > p(w_2 \mid x)$, then decide class $w_1$.

If $p(w_2 \mid x) > p(w_1 \mid x)$, then decide class $w_2$.
Expanding with Bayes rule (the common factor $p(x)$ cancels):

If $p(x \mid w_1)\, p(w_1) > p(x \mid w_2)\, p(w_2)$, then decide $w_1$.

If $p(x \mid w_2)\, p(w_2) > p(x \mid w_1)\, p(w_1)$, then decide $w_2$.

If $p(x \mid w_1)\, p(w_1) = p(x \mid w_2)\, p(w_2)$, then the decision is based on the priors $p(w_1)$ and $p(w_2)$ alone.
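A minimal numerical sketch of this rule. The Gaussian class-conditional densities and the prior values below are illustrative assumptions, not taken from the slides:

```python
import numpy as np
from scipy.stats import norm

# Assumed example setup: Gaussian class-conditional densities and known priors.
p_w1, p_w2 = 0.7, 0.3                 # priors from previous records (assumed values)
lik1 = norm(loc=4.0, scale=1.0)       # p(x | w1), acceptance class (assumed)
lik2 = norm(loc=6.0, scale=1.5)       # p(x | w2), reject class (assumed)

def decide(x):
    """Return 'w1' or 'w2' by comparing p(x|w_i) * p(w_i)."""
    s1 = lik1.pdf(x) * p_w1
    s2 = lik2.pdf(x) * p_w2
    if s1 > s2:
        return "w1"
    if s2 > s1:
        return "w2"
    # Tie: fall back on the priors alone, as stated above.
    return "w1" if p_w1 >= p_w2 else "w2"

print(decide(4.5))   # expected: 'w1'
print(decide(6.5))   # expected: 'w2'
```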

Error in this case:

[Figure: the posteriors $p(w_1 \mid x)$ and $p(w_2 \mid x)$ plotted against $x$, with sample points $x_1$ and $x_2$, the error region, and the decision boundary marked.]

If a sample at $x_1$ actually belongs to $w_2$, the decision there is an error; if a sample at $x_2$ actually belongs to $w_1$, the decision there is an error.

If we decide in favour of class $w_1$, the probability of error is $p(w_2 \mid x)$.

If we decide in favour of class $w_2$, the probability of error is $p(w_1 \mid x)$.
Total error:

$$p(\text{error}) = \int_{-\infty}^{\infty} p(\text{error}, x)\, dx = \int_{-\infty}^{\infty} p(\text{error} \mid x)\, p(x)\, dx$$

Under the Bayes decision rule,

$$p(\text{error} \mid x) = \min\big[\, p(w_1 \mid x),\ p(w_2 \mid x) \,\big]$$
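A quick numerical check of this total-error integral, under the same assumed Gaussian setup as above; the dense grid is only an approximation of the integral over $(-\infty, \infty)$:

```python
import numpy as np
from scipy.stats import norm

# Same assumed two-class setup as above (illustrative values only).
p_w1, p_w2 = 0.7, 0.3
lik1, lik2 = norm(4.0, 1.0), norm(6.0, 1.5)

x = np.linspace(-5.0, 15.0, 20001)   # dense grid standing in for (-inf, inf)
joint1 = lik1.pdf(x) * p_w1          # p(x, w1)
joint2 = lik2.pdf(x) * p_w2          # p(x, w2)
evidence = joint1 + joint2           # p(x)

post1 = joint1 / evidence            # p(w1 | x)
post2 = joint2 / evidence            # p(w2 | x)

# p(error | x) = min[p(w1|x), p(w2|x)] under the Bayes rule,
# and the total error is the integral of p(error|x) p(x) dx.
p_error = np.trapz(np.minimum(post1, post2) * evidence, x)
print(f"Bayes error ~ {p_error:.4f}")
```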

$$p(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\sum_{i=1}^{2} p(x \mid w_i)\, p(w_i)} = \frac{p(x \mid w_i)\, p(w_i)}{p(x)}$$

If $p(w_1 \mid x) > p(w_2 \mid x)$, then decide class $w_1$.

If $p(w_2 \mid x) > p(w_1 \mid x)$, then decide class $w_2$.

Generalized Bayes classifier
 Use more than two states of nature.
 Use more than one feature.
 Consider more actions.
 Introduce a loss function.

Classes: $w_1, w_2, \ldots, w_c$, where $c$ is the number of classes.

Actions: $\alpha_1, \alpha_2, \ldots, \alpha_a$.

Loss function: $\lambda(\alpha_i \mid w_j)$ is the loss incurred for taking action $\alpha_i$ when the true class is $w_j$.

$x$ is a $d$-dimensional feature vector.

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\, p(w_j \mid x)$$

This is the risk function (also called the conditional risk or expected loss).

Minimum-risk classifier:

Two-category case: classes $w_1$ and $w_2$, actions $\alpha_1$ and $\alpha_2$.

For simplicity, write $\lambda(\alpha_i \mid w_j) = \lambda_{ij}$.

In general,

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\, p(w_j \mid x) = \sum_{j=1}^{c} \lambda_{ij}\, p(w_j \mid x)$$
For the two-class problem:

For action $\alpha_1$:  $R(\alpha_1 \mid x) = \lambda_{11}\, p(w_1 \mid x) + \lambda_{12}\, p(w_2 \mid x)$

For action $\alpha_2$:  $R(\alpha_2 \mid x) = \lambda_{21}\, p(w_1 \mid x) + \lambda_{22}\, p(w_2 \mid x)$

where $\lambda_{11} = \lambda(\alpha_1 \mid w_1)$, $\lambda_{12} = \lambda(\alpha_1 \mid w_2)$, $\lambda_{21} = \lambda(\alpha_2 \mid w_1)$, $\lambda_{22} = \lambda(\alpha_2 \mid w_2)$.

If $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, decide in favour of $\alpha_1$.

If $R(\alpha_1 \mid x) > R(\alpha_2 \mid x)$, decide in favour of $\alpha_2$.

So, for a decision in favour of $w_1$ (action $\alpha_1$), we need

$$\lambda_{21}\, p(w_1 \mid x) + \lambda_{22}\, p(w_2 \mid x) > \lambda_{11}\, p(w_1 \mid x) + \lambda_{12}\, p(w_2 \mid x)$$

$$\Rightarrow (\lambda_{21} - \lambda_{11})\, p(w_1 \mid x) > (\lambda_{12} - \lambda_{22})\, p(w_2 \mid x)$$

If both $(\lambda_{21} - \lambda_{11}) > 0$ and $(\lambda_{12} - \lambda_{22}) > 0$, and $p(w_1 \mid x) > p(w_2 \mid x)$, then decide class $w_1$.
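A small sketch of this two-category minimum-risk decision. The loss matrix values and the posteriors passed in are illustrative assumptions:

```python
import numpy as np

# Illustrative loss matrix (assumed values): lam[i][j] = loss for taking
# action alpha_{i+1} when the true class is w_{j+1}.
lam = np.array([[0.0, 5.0],    # alpha_1: lambda_11, lambda_12
                [1.0, 0.0]])   # alpha_2: lambda_21, lambda_22

def min_risk_action(post):
    """post = [p(w1|x), p(w2|x)]; return the index of the minimum-risk action."""
    risks = lam @ post          # R(alpha_i | x) = sum_j lambda_ij * p(w_j | x)
    return int(np.argmin(risks)), risks

action, risks = min_risk_action(np.array([0.6, 0.4]))
print(action, risks)   # picks alpha_1 only if R(alpha_1|x) < R(alpha_2|x)
```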
Multi-category case

[Figure: the feature vector $X$ is fed to $c$ discriminant functions $g_1(x), g_2(x), \ldots, g_c(x)$, and the class with the largest output is selected.]

$c$ = number of classes; $g(x)$ = discriminant function.

$w_1, w_2, \ldots, w_c$ are the $c$ classes, with discriminant functions $g_i(x)$, $i = 1, 2, \ldots, c$.

If $g_i(x) > g_j(x)\ \forall\, j \neq i$, decide $x \in w_i$.
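A sketch of this rule: evaluate all $c$ discriminant functions and pick the largest. The particular $g_i$ used here are hypothetical placeholders:

```python
import numpy as np

def classify(x, discriminants):
    """discriminants: list of callables g_i; decide w_i with the largest g_i(x)."""
    scores = np.array([g(x) for g in discriminants])
    return int(np.argmax(scores)) + 1   # class index i such that g_i(x) > g_j(x) for all j != i

# Hypothetical discriminant functions for c = 3 classes (illustrative only).
gs = [lambda x: -(x - 0.0) ** 2,
      lambda x: -(x - 2.0) ** 2,
      lambda x: -(x - 5.0) ** 2]
print(classify(1.8, gs))   # -> 2
```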

Minimum-risk classifier:

We can let $g_i(x) = -R(\alpha_i \mid x)$.

For the zero-one loss, $R(\alpha_i \mid x) = 1 - p(w_i \mid x)$, so we can equivalently take

$$g_i(x) = p(w_i \mid x)$$

Applying any monotonically increasing function $f$ to $g_i(x)$ leaves the classification unchanged.

Minimum error-rate classification:

$$g_i(x) = p(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\sum_{j=1}^{c} p(x \mid w_j)\, p(w_j)} = \frac{p(x \mid w_i)\, p(w_i)}{p(x)}$$

Dropping the denominator $p(x)$, which is common to all classes:

$$g_i(x) = p(x \mid w_i)\, p(w_i)$$

Since the logarithm is a monotonically increasing function, we may also use

$$g_i(x) = \ln p(x \mid w_i) + \ln p(w_i)$$

Two-category case:

We now have two discriminant functions, $g_1(x)$ and $g_2(x)$, one for class $w_1$ and one for class $w_2$.

If $g_1(x) > g_2(x)$, decide class $w_1$.

If $g_1(x) < g_2(x)$, decide class $w_2$.

The decision boundary is $g_1(x) - g_2(x) = 0$.

Single discriminant function:

$$g(x) = g_1(x) - g_2(x) = \ln p(w_1 \mid x) - \ln p(w_2 \mid x)$$
$$= \ln p(x \mid w_1) + \ln p(w_1) - \ln p(x \mid w_2) - \ln p(w_2)$$
$$= \ln \frac{p(x \mid w_1)}{p(x \mid w_2)} + \ln \frac{p(w_1)}{p(w_2)}$$
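A sketch of this single-discriminant (log-likelihood ratio) test, again assuming illustrative Gaussian class-conditionals and priors:

```python
import numpy as np
from scipy.stats import norm

# Assumed Gaussian class-conditionals and priors (illustrative values).
p_w1, p_w2 = 0.5, 0.5
lik1, lik2 = norm(4.0, 1.0), norm(6.0, 1.5)

def g(x):
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[p(w1)/p(w2)]."""
    return (lik1.logpdf(x) - lik2.logpdf(x)) + np.log(p_w1 / p_w2)

x = 4.8
print("decide w1" if g(x) > 0 else "decide w2")   # g(x) > 0  <=>  g1(x) > g2(x)
```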

The Normal Density

Univariate density: We begin with the continuous univariate normal, or Gaussian, density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where $\mu$ is the expected value of $x$,

$$\mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx$$

and $\sigma^2$ is the variance,

$$\sigma^2 = E[(x-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2\, p(x)\, dx$$

We write $p(x) \sim N(\mu, \sigma^2)$.

[Figure: the bell-shaped curve of $p(x)$ centred at $\mu$.]
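A short sketch that evaluates this univariate density directly and numerically confirms the definitions of $\mu$ and $\sigma^2$ above; the parameter values are illustrative assumptions:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Sanity checks with illustrative parameters mu = 2, sigma = 1.5.
xs = np.linspace(-10, 14, 100001)
ps = gaussian_pdf(xs, mu=2.0, sigma=1.5)
print(np.trapz(ps, xs))                   # ~ 1.0   (density integrates to 1)
print(np.trapz(xs * ps, xs))              # ~ 2.0   (E[x] = mu)
print(np.trapz((xs - 2.0) ** 2 * ps, xs)) # ~ 2.25  (variance = sigma^2)
```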

Multivariate density: The general multivariate normal density in $d$ dimensions is written as

$$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^t \Sigma^{-1} (x-\mu)\right]$$

where

$x$ = feature vector of dimension $d$

$\mu$ = mean vector of dimension $d$:

$$\mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx$$

$\Sigma$ = covariance matrix:

$$\Sigma = E[(x-\mu)(x-\mu)^t] = \int_{-\infty}^{\infty} (x-\mu)(x-\mu)^t\, p(x)\, dx$$

Since $(x-\mu)$ is $(d \times 1)$ and $(x-\mu)^t$ is $(1 \times d)$, $\Sigma$ is $(d \times d)$.

$i$-th component of the mean: $\mu_i = E[x_i]$

$(i,j)$-th component of $\Sigma$: $\sigma_{ij} = E[(x_i-\mu_i)(x_j-\mu_j)]$

Diagonal components: $\sigma_{ii} = E[(x_i-\mu_i)^2] = \sigma_i^2$
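A sketch evaluating this multivariate density from its definition; the mean vector and covariance matrix below are illustrative assumptions:

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, sigma):
    """d-dimensional normal density; x, mu are length-d vectors, sigma is d x d."""
    d = mu.shape[0]
    diff = x - mu
    maha = diff @ np.linalg.inv(sigma) @ diff   # (x-mu)^t Sigma^{-1} (x-mu)
    norm_const = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm_const

# Illustrative parameters for d = 2.
mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(multivariate_gaussian_pdf(np.array([1.5, 1.5]), mu, sigma))
```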

Bivariate normal density function:

$X$ is a two-dimensional feature vector,

$$X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

With this diagonal covariance,

$$p(x) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}\left\{\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right\}\right]$$

which factorizes as

$$p(x) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]$$

Physical interpretation:

(i) First case: $\sigma_1 = \sigma_2$

[Figure: the surface $p(x)$ over the $(x_1, x_2)$ plane, and the loci of points having constant density.]

For the bivariate density we trace the loci of constant density, i.e. all values of $x$ for which $p(x)$ is constant; when $\sigma_1 = \sigma_2$ these loci are circles.

Within these circles there is a higher probability of occurrence for a set of points drawn arbitrarily from a single population.
(ii) Second case: $\sigma_1^2 \neq \sigma_2^2$

In general,

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$$

If the components are statistically independent, $\sigma_{12} = \sigma_{21} = 0$, and then

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

The loci of constant density are now ellipses aligned with the coordinate axes.

(iii) Third case: the data are not statistically independent.

The loci of constant density are ellipses whose principal axes lie along $e_i$, the eigenvectors of the covariance matrix $\Sigma$.
Discriminant Functions for the Normal Density

We know that the discriminant functions are given (dropping terms common to all classes) by

$$g_i(x) = \ln p(x \mid w_i) + \ln p(w_i)$$

Multivariate density:

$$p(x \mid w_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i)\right]$$

Discriminant functions:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln p(w_i)$$
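A sketch of this general discriminant function evaluated directly from its definition; the class parameters below are assumed for illustration:

```python
import numpy as np

def g(x, mu, sigma, prior):
    """General-case discriminant g_i(x) for a Gaussian class-conditional density."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.inv(sigma) @ diff
    return (-0.5 * quad
            - 0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# Two classes with assumed (illustrative) parameters; decide the larger g_i(x).
x = np.array([0.5, 0.5])
g1 = g(x, np.array([0.0, 0.0]), np.eye(2), 0.6)
g2 = g(x, np.array([2.0, 2.0]), 2.0 * np.eye(2), 0.4)
print("w1" if g1 > g2 else "w2")
```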

Let us examine the discriminant functions and the resulting classification for a number of special cases.

Case: $\Sigma_i = \sigma^2 I$, where $I$ is the $(d \times d)$ identity matrix.

$\sigma_{ij} = 0$ for $i \neq j$: the different components are statistically independent.

$$|\Sigma_i| = \sigma^{2d}, \qquad \Sigma_i^{-1} = \frac{1}{\sigma^2} I$$

In

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln p(w_i)$$

the terms $-\frac{d}{2}\ln 2\pi$ and $-\frac{1}{2}\ln|\Sigma_i|$ are constants independent of $i$, so they can be ignored.

Thus we obtain the simple discriminant functions

$$g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln p(w_i)$$

where $\|\cdot\|$ is the Euclidean norm, that is,

$$\|x-\mu_i\|^2 = (x-\mu_i)^t (x-\mu_i)$$

Expansion of the quadratic form $(x-\mu_i)^t (x-\mu_i)$ yields

$$g_i(x) = -\frac{1}{2\sigma^2}\left[x^t x - 2\mu_i^t x + \mu_i^t \mu_i\right] + \ln p(w_i)$$

which appears to be a quadratic function of $x$. However, the quadratic term $x^t x$ is the same for all $i$, making it an ignorable additive constant. Then

$$g_i(x) = -\frac{1}{2\sigma^2}\left[-2\mu_i^t x + \mu_i^t \mu_i\right] + \ln p(w_i)$$

Thus we obtain the equivalent linear discriminant functions

$$g_i(x) = w_i^t x + w_{i0}$$

where

$$w_i = \frac{1}{\sigma^2}\mu_i, \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln p(w_i)$$

If $g_i(x) > g_j(x)$, then $x \in$ class $w_i$.

If $g_j(x) > g_i(x)$, then $x \in$ class $w_j$.
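A sketch computing the linear weights $w_i$ and $w_{i0}$ for this case and classifying a point; the means, priors and $\sigma^2$ below are illustrative assumptions:

```python
import numpy as np

def linear_discriminant(mu_i, prior_i, sigma2):
    """Return (w_i, w_i0) for the case Sigma_i = sigma^2 I."""
    w_i = mu_i / sigma2
    w_i0 = -0.5 * (mu_i @ mu_i) / sigma2 + np.log(prior_i)
    return w_i, w_i0

# Illustrative parameters: two classes, shared sigma^2 = 1.0.
sigma2 = 1.0
w1, w10 = linear_discriminant(np.array([0.0, 0.0]), 0.5, sigma2)
w2, w20 = linear_discriminant(np.array([3.0, 3.0]), 0.5, sigma2)

x = np.array([1.0, 1.2])
g1, g2 = w1 @ x + w10, w2 @ x + w20
print("class w1" if g1 > g2 else "class w2")   # g_i(x) = w_i^t x + w_i0
```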
The decision boundary is where $g_i(x) = g_j(x)$, i.e. $g_i(x) - g_j(x) = 0$.

With

$$g_i(x) = w_i^t x + w_{i0}, \qquad g_j(x) = w_j^t x + w_{j0},$$

$$g_i(x) - g_j(x) = 0$$

$$\Rightarrow (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0$$

$$\Rightarrow \frac{1}{\sigma^2}(\mu_i - \mu_j)^t x - \frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln p(w_i) + \frac{1}{2\sigma^2}\mu_j^t \mu_j - \ln p(w_j) = 0$$

$$\Rightarrow (\mu_i - \mu_j)^t x - \frac{1}{2}(\mu_i^t \mu_i - \mu_j^t \mu_j) + \sigma^2 \ln \frac{p(w_i)}{p(w_j)} = 0$$

$$\Rightarrow (\mu_i - \mu_j)^t \left[x - \left(\frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{p(w_i)}{p(w_j)}\,(\mu_i - \mu_j)\right)\right] = 0$$

$$\Rightarrow w^t (x - x_0) = 0$$
where

$$w = \mu_i - \mu_j, \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{p(w_i)}{p(w_j)}\,(\mu_i - \mu_j)$$

If $p(w_i) = p(w_j)$, then $x_0 = \frac{1}{2}(\mu_i + \mu_j)$.

[Figure: the hyperplane through $x_0$ with normal $w$, drawn between $\mu_i$ and $\mu_j$.]

The decision boundary passes through $x_0$ and is orthogonal to the line joining $\mu_i$ and $\mu_j$; with equal priors it is the perpendicular bisector of that line.
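A sketch computing $w$ and $x_0$ for this hyperplane and checking which side of the boundary a point falls on; the means and priors below are illustrative assumptions:

```python
import numpy as np

def boundary(mu_i, mu_j, sigma2, p_i, p_j):
    """Return (w, x0) for the hyperplane w^t (x - x0) = 0 when Sigma_i = sigma^2 I."""
    w = mu_i - mu_j
    shift = (sigma2 / np.dot(w, w)) * np.log(p_i / p_j) * w
    x0 = 0.5 * (mu_i + mu_j) - shift
    return w, x0

# Illustrative means and priors.
mu_i, mu_j = np.array([0.0, 0.0]), np.array([3.0, 3.0])
w, x0 = boundary(mu_i, mu_j, sigma2=1.0, p_i=0.5, p_j=0.5)
print(x0)                     # equal priors -> midpoint of mu_i and mu_j

x = np.array([1.0, 1.0])
side = w @ (x - x0)           # > 0: decide w_i ; < 0: decide w_j
print("w_i" if side > 0 else "w_j")
```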
