Probabilistic Machine Learning
Lecture 5: Expectation maximization
Pekka Marttinen
Aalto University
February, 2025
Lecture 5 overview
Gaussian mixture models (GMMs), recap
EM algorithm
EM for Gaussian mixture models
Suggested reading: Bishop, Pattern Recognition and Machine Learning:
p. 110-113 (Section 2.3.9): Mixtures of Gaussians
p. 430-443: EM for Gaussian mixtures
simple_example.pdf
GMMs, latent variable representation
Introduce latent variables z_n = (z_n1, . . . , z_nK) which specify the
component k of observation x_n:
    z_n = (0, . . . , 0, 1, 0, . . . , 0)^T,   with the 1 in the k-th element.
Define
    p(z_n) = ∏_{k=1}^K π_k^{z_nk}   and   p(x_n | z_n) = ∏_{k=1}^K N(x_n | µ_k, Σ_k)^{z_nk}.
Then the marginal distribution p(x_n) is a GMM:
    p(x_n) = ∑_{k=1}^K π_k N(x_n | µ_k, Σ_k).
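A small NumPy sketch of sampling from this latent-variable representation (ancestral
sampling: draw z_n first, then x_n given z_n; the parameter values below are made up
for illustration and are not from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    pi = np.array([0.5, 0.3, 0.2])                          # mixing coefficients pi_k
    mu = np.array([[-2.0, 0.0], [0.0, 2.0], [3.0, -1.0]])   # component means mu_k
    Sigma = np.stack([np.eye(2)] * 3)                       # component covariances Sigma_k

    # Ancestral sampling: z_n ~ Categorical(pi), then x_n ~ N(mu_{z_n}, Sigma_{z_n})
    N = 500
    z = rng.choice(3, size=N, p=pi)                         # component indicator of each x_n
    X = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
    # Marginally, the rows of X follow the GMM p(x) = sum_k pi_k N(x | mu_k, Sigma_k)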
GMM: responsibilities, complete data
Posterior probability (responsibility) p(z_nk = 1 | x_n) that observation
x_n was generated by component k:
    γ(z_nk) ≡ p(z_nk = 1 | x_n) = π_k N(x_n | µ_k, Σ_k) / ∑_{j=1}^K π_j N(x_n | µ_j, Σ_j)
Complete data: latent variables z and data x together: (x, z)
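As a sketch, the responsibilities can be computed in a couple of NumPy lines
(reusing pi, mu, Sigma and the samples X from the snippet above):

    from scipy.stats import multivariate_normal

    # gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                            for k in range(3)])
    gamma = dens / dens.sum(axis=1, keepdims=True)   # each row sums to one: soft assignments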
Idea of the EM algorithm (1/2)
Let X denote the observed data and θ the model parameters. The goal
in maximum likelihood is to find θ̂:
    θ̂ = arg max_θ { log p(X | θ) }
If the model contains latent variables Z, the log-likelihood is given by
    log p(X | θ) = log { ∑_Z p(X, Z | θ) },
which may be difficult to maximize analytically
Possible solutions: 1) numerical optimization, 2) the EM algorithm
(expectation-maximization)
Idea of the EM algorithm (2/2)
X: observed data, Z: unobserved latent variables
{X, Z}: complete data, X: incomplete data
In the EM algorithm, we assume that the complete data log-likelihood
    log p(X, Z | θ)
is easy to maximize.
Problem: Z is not observed
Solution: maximize
    Q(θ, θ^old) ≡ E_{Z|X,θ^old}[ log p(X, Z | θ) ] = ∑_Z p(Z | X, θ^old) log p(X, Z | θ),
where p(Z | X, θ^old) is the posterior distribution of the latent variables
computed using the current parameter estimate θ^old
Illustration of the EM algorithm for GMMs
Pekka Marttinen (Aalto University) Probabilistic Machine Learning February, 2025 7 / 16
EM algorithm in detail
Goal: maximize log p(X | θ) w.r.t. θ
1 Initialize θ^old
2 E-step: Evaluate p(Z | X, θ^old), and then compute
    Q(θ, θ^old) = E_{Z|X,θ^old}[ log p(X, Z | θ) ] = ∑_Z p(Z | X, θ^old) log p(X, Z | θ)
3 M-step: Evaluate θ^new using
    θ^new = arg max_θ Q(θ, θ^old).
  Set θ^old ← θ^new
4 Repeat the E and M steps until convergence
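As a generic skeleton (a sketch only; the e_step and m_step callbacks would be supplied
for a concrete model, e.g. the GMM updates later in the lecture):

    import numpy as np

    def em(X, theta_init, e_step, m_step, n_iter=100, tol=1e-6):
        """Generic EM loop (sketch): e_step computes the posterior over the latent
        variables under theta_old; m_step maximizes Q(theta, theta_old)."""
        theta_old = np.asarray(theta_init, dtype=float)
        for _ in range(n_iter):
            posterior = e_step(X, theta_old)          # E-step: p(Z | X, theta_old)
            theta_new = np.asarray(m_step(X, posterior), dtype=float)   # M-step: arg max_theta Q
            if np.max(np.abs(theta_new - theta_old)) < tol:             # simple convergence check
                return theta_new
            theta_old = theta_new
        return theta_old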
Why EM works
Figure: 11.16 in Murphy (2012)
As a function of θ, Q(θ, θ^old) is, up to an additive constant, a lower bound
on the log-likelihood log p(X | θ) (see Bishop, Ch. 9.4).
EM iterates between 1) updating the lower bound (E-step) and 2)
maximizing the lower bound (M-step).
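A sketch of the decomposition behind this (following Bishop, Ch. 9.4): for any
distribution q(Z) over the latent variables,

    log p(X | θ) = L(q, θ) + KL(q ‖ p(Z | X, θ)),   with   L(q, θ) = ∑_Z q(Z) log [ p(X, Z | θ) / q(Z) ]

and KL(·‖·) ≥ 0, so L(q, θ) is a lower bound on log p(X | θ). The E-step sets
q(Z) = p(Z | X, θ^old), which makes the bound tight at θ = θ^old, and then
L(q, θ) = Q(θ, θ^old) + const (the entropy of q). The M-step maximizes this bound
over θ, so the log-likelihood cannot decrease from one iteration to the next.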
EM algorithm, comments
In general, Z does not have to be discrete; just replace the
summation in Q(θ, θ^old) by integration.
The EM algorithm can be used to compute the MAP (maximum a
posteriori) estimate by maximizing Q(θ, θ^old) + log p(θ) in the M-step.
In general, the EM algorithm is applicable whenever the observed data X can
be augmented into complete data {X, Z} such that log p(X, Z | θ) is
easy to maximize; Z does not have to be latent variables but can
represent, for example, unobserved values of missing or censored
observations.
EM algorithm, simple example
Consider N independent observations x = (x_1, . . . , x_N) from a
two-component mixture of univariate Gaussians
    p(x_n | θ) = (1/2) N(x_n | 0, 1) + (1/2) N(x_n | θ, 1).   (1)
One unknown parameter, θ, the mean of the second component.
Goal: estimate
    θ̂ = arg max_θ { log p(x | θ) }.
simple_example.pdf
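A sketch of EM for this example: the E-step computes γ_n, the responsibility of the
second component for x_n, and the M-step sets θ^new = ∑_n γ_n x_n / ∑_n γ_n, the
responsibility-weighted mean. The data-generating values and initialization below are
made up for illustration:

    import numpy as np
    from scipy.stats import norm

    def em_simple(x, theta=1.0, n_iter=50):
        """EM for p(x_n | theta) = 0.5 N(x_n | 0, 1) + 0.5 N(x_n | theta, 1)."""
        for _ in range(n_iter):
            # E-step: responsibility of the second component for each observation
            num = 0.5 * norm.pdf(x, loc=theta, scale=1.0)
            gamma = num / (0.5 * norm.pdf(x, loc=0.0, scale=1.0) + num)
            # M-step: maximize Q(theta, theta_old) -> responsibility-weighted mean
            theta = np.sum(gamma * x) / np.sum(gamma)
        return theta

    # Simulate data from model (1) with true theta = 3 and run EM
    rng = np.random.default_rng(1)
    x = np.where(rng.random(1000) < 0.5, rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000))
    print(em_simple(x))   # should end up near the ML estimate, close to 3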
EM algorithm for GMMs
    p(x) = ∑_{k=1}^K π_k N(x | µ_k, Σ_k)
1 Initialize the means µ_k, covariances Σ_k and mixing coefficients π_k. Repeat until
convergence:
2 E-step: Evaluate the responsibilities using the current parameter values
    γ(z_nk) = π_k N(x_n | µ_k, Σ_k) / ∑_{j=1}^K π_j N(x_n | µ_j, Σ_j)
3 M-step: Re-estimate the parameters using the current responsibilities
    µ_k^new = (1/N_k) ∑_{n=1}^N γ(z_nk) x_n
    Σ_k^new = (1/N_k) ∑_{n=1}^N γ(z_nk) (x_n − µ_k^new)(x_n − µ_k^new)^T
    π_k^new = N_k / N,   where N_k = ∑_{n=1}^N γ(z_nk)
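A minimal NumPy sketch of these updates (assuming data X of shape (N, D); names such as
em_gmm are illustrative, not from the lecture):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, n_iter=100, seed=0):
        """EM for a Gaussian mixture model (illustrative sketch)."""
        N, D = X.shape
        rng = np.random.default_rng(seed)
        # Initialization: random data points as means, identity covariances, uniform weights
        mu = X[rng.choice(N, size=K, replace=False)]
        Sigma = np.stack([np.eye(D) for _ in range(K)])
        pi = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            # E-step: gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
            dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                                    for k in range(K)])
            gamma = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate the parameters from the responsibilities
            Nk = gamma.sum(axis=0)                     # effective number of points per component
            mu = (gamma.T @ X) / Nk[:, None]           # mu_k^new
            for k in range(K):
                Xc = X - mu[k]
                Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]   # Sigma_k^new
            pi = Nk / N                                # pi_k^new
        return pi, mu, Sigma, gamma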
Derivation of the EM algorithm for GMMs
In the M-step the formulas for µ_k^new and Σ_k^new are obtained by
differentiating the expected complete data log-likelihood Q(θ, θ^old)
with respect to the particular parameters, and setting the derivatives
to zero.
The formula for π_k^new can be derived by maximizing Q(θ, θ^old) under
the constraint ∑_{k=1}^K π_k = 1. This can be done using Lagrange
multipliers, as sketched below.
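A sketch of that constrained maximization: the only terms of Q(θ, θ^old) involving the
mixing coefficients are ∑_n ∑_k γ(z_nk) log π_k, so with a Lagrange multiplier λ for the
constraint,

    ∂/∂π_k [ ∑_n ∑_j γ(z_nj) log π_j + λ (∑_j π_j − 1) ] = N_k/π_k + λ = 0   ⇒   π_k = −N_k/λ,

and summing over k with ∑_k π_k = 1 and ∑_k N_k = N gives λ = −N, hence π_k^new = N_k/N.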
EM for GMM, caveats
EM converges to a local optimum. In fact, ML estimation for
GMMs is not well-defined due to singularities: if σ_k → 0 for a
component k with a single data point, the likelihood goes to infinity
(see figure). Remedy: a prior on σ_k.
Label switching: non-identifiability due to the fact that the cluster labels
can be switched while the likelihood remains the same.
In practice it is recommended to initialize EM for the GMM with
k-means, e.g. as in the sketch below.
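A rough k-means initialization in NumPy (a sketch assuming every cluster keeps at
least a few points; the helper name kmeans_init is illustrative):

    import numpy as np

    def kmeans_init(X, K, n_iter=20, seed=0):
        """A few Lloyd iterations to get initial pi, mu, Sigma for the GMM EM."""
        N, D = X.shape
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(N, size=K, replace=False)]       # random data points as initial centres
        for _ in range(n_iter):
            dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)                   # hard assignment to the nearest centre
            mu = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
        pi = np.bincount(labels, minlength=K) / N          # cluster proportions as initial weights
        Sigma = np.stack([np.cov(X[labels == k].T) + 1e-6 * np.eye(D)   # within-cluster covariances
                          for k in range(K)])
        return pi, mu, Sigma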
GMM vs. k-means
"Why use GMMs and not just k-means?"
from Wikipedia
1 Clusters can be of different sizes and shapes
2 Probabilistic assignment of data items to clusters
3 Possibility to include prior knowledge (structure of the model/prior
distributions on the parameters)
Important points
ML-estimation of GMMs can be done using numerical optimization or
the EM algorithm.
The main idea of the EM algorithm is to maximize the expectation of
the complete data log-likelihood, where the expectation is computed
with respect to the current posterior distributions (responsibilities) of
the latent variables.