Gaussian Mixture Model - method and application
Presentation · November 2017
DOI: 10.13140/RG.2.2.32667.77602
Jesús Zambrano
The MathWorks
All content following this page was uploaded by Jesús Zambrano on 23 November 2017.
Gaussian Mixture Models
– method and applications
Jesús Zambrano
PostDoctoral Researcher
School of Business, Society and Engineering
www.mdh.se
FUDIPO project. Machine Learning course. Oct.-Dec. 2017
Outline
● Method
● Introduction to Gaussian Mixture Model (GMM)
● Standard construction of GMM
● Clustering (Silhouette and Akaike criterion)
● Case studies
● Monitoring a secondary settler tank
● Residual and fault detection criteria
● Conclusions
Gaussian Mixture Model (GMM)
- standard construction
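A linear superposition of K Gaussians
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k)$$
is called a Gaussian mixture (GM), where $\mu_k$ is the mean and $\sigma_k$ the covariance of component $k$. The mixture coefficients $\pi_k$ satisfy
$$0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1.$$
Interpretation: The density $p(x \mid k) = \mathcal{N}(x \mid \mu_k, \sigma_k)$ is the probability
of $x$, given that component $k$ was chosen. The probability of
choosing component $k$ is given by the prior probability $p(k) = \pi_k$.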
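The construction can be illustrated with a minimal sketch for the one-dimensional case, where each covariance reduces to a variance. The weights, means and variances below are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Gaussian mixture density: weighted sum of the component densities."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must be non-negative and sum to 1")
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture (illustrative values only).
weights = [0.3, 0.7]
means = [-2.0, 3.0]
variances = [1.0, 4.0]

density_at_zero = mixture_pdf(0.0, weights, means, variances)
```

Because the coefficients sum to one and each component integrates to one, the mixture is itself a valid probability density.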
GMM - standard construction (cont.)
For example, consider the following GMM:
GMM - standard construction (cont.)
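The form of the GM distribution is governed by the parameters
$\pi$, $\mu$ and $\sigma$. One way to get them is by maximum likelihood.
Given $N$ observations $x_1, \dots, x_N$, the log-likelihood function is
$$\ln p(X \mid \pi, \mu, \sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k) \right\}$$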
There is no closed-form solution available (due to the sum
inside the logarithm).
This problem can be separated into two simple problems using
the expectation-maximization (EM) algorithm.
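As a rough illustration of the quantity being maximized, the sketch below evaluates the log-likelihood of a one-dimensional Gaussian mixture. The data and parameter values are hypothetical, and the direct evaluation assumes no observation lies so far from every component that the mixture density underflows to zero.

```python
import math


def log_likelihood(data, weights, means, variances):
    """Sum over observations of the log of the mixture density."""
    total = 0.0
    for x in data:
        mixture = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        # The sum over components sits inside the logarithm,
        # which is what prevents a closed-form maximizer.
        total += math.log(mixture)
    return total


# Hypothetical observations and two candidate parameter sets.
data = [-2.1, -1.9, 3.2, 2.8]
ll_good = log_likelihood(data, [0.5, 0.5], [-2.0, 3.0], [1.0, 1.0])
ll_poor = log_likelihood(data, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

Parameters that place the components near the data (`ll_good`) yield a higher log-likelihood than parameters that do not (`ll_poor`).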
GMM - standard construction (cont.)
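Conditions to be satisfied at a maximum of the likelihood function: setting the derivative of the log-likelihood with respect to $\mu_k$ to zero,
$$0 = \sum_{n=1}^{N} \gamma_{nk} \, \sigma_k^{-1} (x_n - \mu_k), \qquad \gamma_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j)}$$
which gives
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x_n, \qquad \sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^{T}, \qquad N_k = \sum_{n=1}^{N} \gamma_{nk}$$
Maximize with respect to $\pi_k$ (using Lagrange multipliers to enforce $\sum_k \pi_k = 1$)
$$\ln p(X \mid \pi, \mu, \sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$$
gives
$$\pi_k = \frac{N_k}{N}$$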
For more details of EM and GMM see: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
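The EM iteration can be sketched for one-dimensional data as follows. This is a minimal illustration rather than the implementation behind the slides: the synthetic data, the initial guesses, the fixed iteration count and the small variance floor are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, weights, means, variances, n_iter=50, var_floor=1e-6):
    """Fixed number of EM iterations for a univariate Gaussian mixture."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, k = len(data), len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            norm = sum(joint)
            resp.append([p / norm for p in joint])
        # M-step: re-estimate the parameters from the responsibilities.
        for j in range(k):
            n_j = sum(r[j] for r in resp)  # effective number of points in component j
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, var_floor)  # floor avoids a collapsing component
            weights[j] = n_j / n
    return weights, means, variances


# Synthetic data: two hypothetical clusters of 500 points each.
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(500)]
data += [random.gauss(3.0, 1.0) for _ in range(500)]

fitted_weights, fitted_means, fitted_variances = em_gmm_1d(
    data, weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0]
)
```

A production implementation would normally monitor the log-likelihood for convergence and work in the log domain to guard against underflow.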
GMM - standard construction (cont.)
A simple Matlab example
● Matlab functions:
● fitgmdist (Fit a Gaussian mixture distribution to data)
● pdf (Density function of a specific distribution)
[Figure panels: raw data (2 clusters of 1000 points each); data model with 2 Gaussian mixture distributions]
Run: gmm_example.m
A simple Matlab example (cont.)
● Silhouette value (S)
It is a measure of how similar a point is to the points in its own cluster, compared with points in other clusters.
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
$a_i$: average distance from the $i$-th point to other points in the same cluster.
$b_i$: minimum average distance from the $i$-th point to points in a different cluster.
For a good match of $i$ to its own cluster, $b_i$ should be large and $a_i$ small.
$S_i$ ranges from -1 to +1. A high $S_i$ indicates that $i$ is well-matched to its
own cluster, and poorly-matched to neighboring clusters.
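The silhouette value can be computed directly from its definition. The sketch below uses one-dimensional points with absolute difference as the distance. The sample points are hypothetical, and giving a value of 0 to a point that is alone in its cluster is a convention assumed for this example.

```python
def silhouette_values(points, labels):
    """Silhouette value of each point, using |p - q| as the distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            values.append(0.0)  # assumed convention for a single-point cluster
            continue
        a = sum(same) / len(same)  # average distance within the own cluster
        # Smallest average distance to the points of any other cluster.
        b = min(
            sum(abs(p - q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0.0 else 0.0)
    return values


# Hypothetical, well-separated clusters.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
s = silhouette_values(points, labels)
mean_silhouette = sum(s) / len(s)
```

Averaging the values over all points gives a single score that can be compared across different choices of K.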
A simple Matlab example (cont.)
● Silhouette value (S)
[Figure: silhouette plots for the K=2 GM and the K=3 GM]
A simple Matlab example (cont.)
● Akaike’s Information Criterion (AIC)
Provides a measure of the relative quality of a model for a given set of
data.
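Then, the aim is to get:
$$\min_{n_p, \theta} \left( 1 + \frac{2 n_p}{N} \right) \sum_{t=1}^{N} \varepsilon^2(t, \theta)$$
where $n_p$ is the number of estimated parameters, $\theta$ the model parameters, $N$ the number of values in the estimation data set, and $\varepsilon$ the prediction error.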
The most accurate model has the smallest AIC.
AIC=17584 AIC=14233 AIC=14238
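A small sketch of how such a criterion trades fit against complexity is given below. The first function follows the prediction-error form stated on the slide. The second is the general likelihood-based definition, AIC = 2 n_p - 2 ln L, included for comparison. All numbers are hypothetical and unrelated to the AIC values reported above.

```python
def aic_prediction_error(residuals, n_params):
    """Criterion (1 + 2*n_p/N) * sum of squared prediction errors."""
    n = len(residuals)
    return (1.0 + 2.0 * n_params / n) * sum(e * e for e in residuals)


def aic_log_likelihood(log_lik, n_params):
    """General definition: AIC = 2*n_p - 2*ln(L)."""
    return 2.0 * n_params - 2.0 * log_lik


# Hypothetical residuals from two candidate models on the same data set.
simple_model = aic_prediction_error([1.0, -1.0, 2.0, -2.0], n_params=2)
complex_model = aic_prediction_error([0.9, -1.0, 1.9, -2.0], n_params=3)

# The model with the smaller criterion value is preferred.
preferred = "simple" if simple_model < complex_model else "complex"
```

In this example the extra parameter reduces the squared error only slightly, so the penalty term dominates and the simpler model is preferred.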
Case study
Jesús Zambrano
jesus.zambrano@mdh.se
A wastewater treatment plant
A wastewater treatment plant (cont.)
The Process
[Process diagram streams: influent, effluent, waste]
Q: flowrate
S: conc. soluble substrate
X: conc. biomass
r: recycle ratio
w: wastage ratio
The Process (cont.)
[Figure labels: clarification zone, thickening zone, sludge blanket]
Scanning a secondary settler
Sludge profile
The Problem
[Figure: scanning with the SS sensor and the resulting sludge profiles; axes: Level [m] and SS conc. [g/L]]
How to detect faulty settler profiles?
Let's apply Gaussian Mixture Models!
GMM for the settler
15 sludge profiles in non-faulty conditions
GMM for the settler (cont.)
GMM parameters $\pi_k$, $\mu_k$, $\sigma_k$:
We denote
Settler monitoring
• Sludge profiles from day 1 (blue) to day 33 (red).
• New profile every 15 minutes = 3168 profiles.
[Figure panels: Day 1-10, Day 11-20, Day 21-33]
(Red does not mean alarm!)
Residual and Fault detection criteria
The residual $r(t)$ is compared with a threshold $h$:
● $r(t) \le h$: normal
● $r(t) > h$: faulty!
where
$$h = \max_{t \in H_0} r(t)$$
This is a classical binary hypothesis testing problem.
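The decision rule can be sketched as follows, assuming that H0 denotes the set of non-faulty (training) profiles and that a profile is flagged when its residual exceeds the threshold. How the residual itself is computed from the GMM is not shown here; the residual values are hypothetical.

```python
def fit_threshold(normal_residuals):
    """Threshold h: the largest residual observed under non-faulty conditions."""
    return max(normal_residuals)


def classify(residuals, threshold):
    """Label each residual as 'faulty' if it exceeds the threshold, else 'normal'."""
    return ["faulty" if r > threshold else "normal" for r in residuals]


# Hypothetical residuals from non-faulty profiles and from new profiles.
training_residuals = [0.10, 0.40, 0.20]
h = fit_threshold(training_residuals)
decisions = classify([0.30, 0.50, 0.40], h)
```

With this choice of threshold, none of the non-faulty training profiles raises an alarm, which is the design choice implied by taking the maximum over H0.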
Settler monitoring (cont.)
[Figure: residual compared with the threshold]
Conclusions
● Valuable information can be obtained by monitoring a
secondary settler in a wastewater treatment plant.
● Gaussian Mixture Models provide a novel tool for fault
detection in this process.
● The proposed method is general and could be
implemented in settlers with different geometries and
sludge profiles.
● The method is also suitable for monitoring deviations
in a process with repetitive data profiles.
Thanks for your attention!
Jesús Zambrano
jesus.zambrano@mdh.se