Expectation-Maximization Algorithm and Applications
Eugene Weinstein
Courant Institute of Mathematical Sciences
Nov 14th, 2006
List of Concepts
Maximum-Likelihood Estimation (MLE)
Expectation-Maximization (EM)
Conditional Probability
Mixture Modeling
Gaussian Mixture Models (GMMs)
String edit-distance
Forward-backward algorithms
Overview
Expectation-Maximization
Mixture Model Training
Learning String Edit-Distance
One-Slide MLE Review
Say I give you a coin with $P(\text{heads}) = \theta$
But I don't tell you the value of $\theta$
Now say I let you flip the coin n times
You get h heads and n-h tails
What is the natural estimate of $\theta$?
This is $\hat{\theta} = h/n$
More formally, the likelihood of $\theta$ is governed by a binomial distribution:
$P(h \text{ heads in } n \text{ flips} \mid \theta) = \binom{n}{h}\, \theta^{h} (1-\theta)^{n-h}$
Can prove $\hat{\theta} = h/n$ is the maximum-likelihood estimate of $\theta$: differentiate with respect to $\theta$, set equal to 0
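As a quick numerical sanity check of this claim, here is a small Python sketch (the flip counts n and h are made-up values) that evaluates the binomial log-likelihood on a grid of $\theta$ values and confirms the maximum sits at h/n:

    import numpy as np

    # Made-up example: n flips, h heads
    n, h = 100, 37

    # Binomial log-likelihood, up to the constant binomial coefficient:
    # log L(theta) = h*log(theta) + (n - h)*log(1 - theta)
    thetas = np.linspace(0.001, 0.999, 999)
    log_lik = h * np.log(thetas) + (n - h) * np.log(1.0 - thetas)

    print(thetas[np.argmax(log_lik)])  # ~0.37, i.e. h / n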
EM Motivation
So, to solve any ML-type problem, we analytically maximize the likelihood function?
Seems to work for 1D Bernoulli (coin toss)
Also works for 1D Gaussian (find $\mu$, $\sigma^2$)
Not quite
Distribution may not be well-behaved, or may have too many parameters
Say your likelihood function is a mixture of 1000 1000-dimensional Gaussians (1M parameters)
Direct maximization is not feasible
Solution: introduce hidden variables to
Simplify the likelihood function (more common)
Account for actual missing data
Hidden and Observed Variables
Observed variables: directly measurable from the data, e.g.
The waveform values of a speech recording
Is it raining today?
Did the smoke alarm go off?
Hidden variables: influence the data, but not trivial to measure
The phonemes that produce a given speech recording
P(rain today | rain yesterday)
Is the smoke alarm malfunctioning?
Expectation-Maximization
Model dependent random variables:
Observed variable x
Unobserved (hidden) variable y that generates x
Assume probability distributions: $p(x, y \mid \theta)$ and $p(y \mid x, \theta)$
$\theta$ represents the set of all parameters of the distribution
Repeat until convergence
E-step: Compute expectation of $Q(\theta, \theta^{-}) = E\left[\log p(x, y \mid \theta) \mid x, \theta^{-}\right]$
($\theta^{-}$, $\theta$: old, new distribution parameters)
M-step: Find $\theta$ that maximizes $Q(\theta, \theta^{-})$
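As a rough illustration of this loop (not tied to any particular model), a generic EM driver might look like the Python sketch below; e_step, m_step, and log_lik are hypothetical callables that a concrete model would supply:

    def em(x, theta0, e_step, m_step, log_lik, tol=1e-6, max_iter=200):
        """Schematic EM driver: alternate E- and M-steps until the
        observed-data log-likelihood stops improving."""
        theta, prev = theta0, float("-inf")
        for _ in range(max_iter):
            posterior = e_step(x, theta)   # E-step: p(y | x, theta) / expected statistics
            theta = m_step(x, posterior)   # M-step: parameters maximizing Q(., theta)
            cur = log_lik(x, theta)        # by the EM theorem, this never decreases
            if cur - prev < tol:
                break
            prev = cur
        return theta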
Conditional Expectation Review
Let X, Y be r.v.s drawn from the distributions P(x) and P(y)
Conditional distribution given by: $P(y \mid x) = \dfrac{P(x, y)}{P(x)}$
Then $E[Y] = \sum_{y} y\, P(y)$
For a function h(Y): $E[h(Y)] = \sum_{y} h(y)\, P(y)$
Given a particular value of X (X = x): $E[h(Y) \mid X = x] = \sum_{y} h(y)\, P(y \mid x)$
Maximum Likelihood Problem
Want to pick $\theta$ that maximizes the log-likelihood $\log p(x, y \mid \theta)$ of the observed (x) and unobserved (y) variables, given:
Observed variable x
Previous parameters $\theta^{-}$
Conditional expectation of $\log p(x, y \mid \theta)$ given x and $\theta^{-}$ is
$Q(\theta, \theta^{-}) = E\left[\log p(x, y \mid \theta) \mid x, \theta^{-}\right] = \sum_{y} p(y \mid x, \theta^{-})\, \log p(x, y \mid \theta)$
EM Derivation
Lemma (Special case of Jensen's Inequality): Let p(x), q(x) be probability distributions. Then
$\sum_{x} p(x)\, \log p(x) \;\geq\; \sum_{x} p(x)\, \log q(x)$
Proof: rewrite as $\sum_{x} p(x)\, \log \dfrac{p(x)}{q(x)} \;\geq\; 0$
Interpretation: relative entropy is non-negative
EM Derivation
EM Theorem:
If $Q(\theta, \theta^{-}) \geq Q(\theta^{-}, \theta^{-})$ then $p(x \mid \theta) \geq p(x \mid \theta^{-})$
Proof:
By some algebra and the lemma, $\log p(x \mid \theta) - \log p(x \mid \theta^{-}) \;\geq\; Q(\theta, \theta^{-}) - Q(\theta^{-}, \theta^{-})$
So, if this quantity is positive, so is $\log p(x \mid \theta) - \log p(x \mid \theta^{-})$
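The "some algebra" can be spelled out as follows (a sketch using the Q defined earlier and the relative-entropy lemma; the expectation is taken with respect to $p(y \mid x, \theta^{-})$):

    \begin{aligned}
    \log p(x \mid \theta)
      &= \sum_y p(y \mid x, \theta^{-}) \log p(x \mid \theta) \\
      &= \sum_y p(y \mid x, \theta^{-}) \log \frac{p(x, y \mid \theta)}{p(y \mid x, \theta)} \\
      &= Q(\theta, \theta^{-}) - \sum_y p(y \mid x, \theta^{-}) \log p(y \mid x, \theta)
    \end{aligned}

Subtracting the same identity evaluated at $\theta = \theta^{-}$ gives

    \log p(x \mid \theta) - \log p(x \mid \theta^{-})
      = \big[ Q(\theta, \theta^{-}) - Q(\theta^{-}, \theta^{-}) \big]
        + \sum_y p(y \mid x, \theta^{-}) \log \frac{p(y \mid x, \theta^{-})}{p(y \mid x, \theta)}

and the last sum is a relative entropy, hence non-negative by the lemma.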
EM Summary
Repeat until convergence
E-step: Compute expectation of $Q(\theta, \theta^{-}) = E\left[\log p(x, y \mid \theta) \mid x, \theta^{-}\right]$
($\theta^{-}$, $\theta$: old, new distribution parameters)
M-step: Find $\theta$ that maximizes $Q(\theta, \theta^{-})$
EM Theorem:
If $Q(\theta, \theta^{-}) \geq Q(\theta^{-}, \theta^{-})$ then $p(x \mid \theta) \geq p(x \mid \theta^{-})$
Interpretation
As long as we can improve the expectation of the log-likelihood, EM improves our model of the observed variable x
Actually, it's not necessary to maximize the expectation; we just need to make sure that it increases. This is called Generalized EM
EM Comments
In practice, x is a series of data points $x_1, \ldots, x_n$
To calculate the expectation, we can assume the points are i.i.d. and sum over all of them:
$Q(\theta, \theta^{-}) = \sum_{i=1}^{n} \sum_{y} p(y \mid x_i, \theta^{-})\, \log p(x_i, y \mid \theta)$
Problems with EM?
Local maxima
Need to bootstrap the training process (pick an initial $\theta$)
When is EM most useful?
When the model distributions are easy to maximize (e.g., Gaussian mixture models)
EM is a meta-algorithm; it needs to be adapted to each particular application
Overview
Expectation-Maximization
Mixture Model Training
Learning String Edit-Distance
EM Applications: Mixture Models
Gaussian/normal distribution
Parameters: mean $\mu$ and variance $\sigma^2$
In the multi-dimensional case, assume an isotropic Gaussian: same variance in all dimensions
We can model arbitrary distributions with density mixtures
Density Mixtures
Combine m elementary densities to model a complex data distribution:
$p(x \mid \theta) = \sum_{k=1}^{m} w_k\, N(x; \mu_k, \sigma_k^2)$, with mixture weights $w_k \geq 0$, $\sum_k w_k = 1$
kth Gaussian parametrized by $(w_k, \mu_k, \sigma_k)$
Density Mixtures
Combine m elementary densities to model a complex data distribution
Log-likelihood function of the data x given $\theta$:
$L(\theta) = \log p(x \mid \theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{m} w_k\, N(x_i; \mu_k, \sigma_k^2)$
Log of sum is hard to optimize analytically! Instead, introduce hidden variable y
$y_i = k$: $x_i$ generated by Gaussian k
EM formulation: maximize $Q(\theta, \theta^{-}) = E\left[\log p(x, y \mid \theta) \mid x, \theta^{-}\right]$
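To make the "log of sum" concrete, here is a small Python sketch that evaluates the mixture log-likelihood for isotropic Gaussians, using a log-sum-exp over components for numerical stability (the argument names w, mu, var are my own):

    import numpy as np

    def mixture_log_likelihood(X, w, mu, var):
        """log p(X | theta) = sum_i log sum_k w_k N(x_i; mu_k, var_k * I).
        X: (n, d) data, w: (m,) weights, mu: (m, d) means, var: (m,) variances."""
        d = X.shape[1]
        log_terms = np.stack([
            np.log(w[k])
            - 0.5 * (d * np.log(2.0 * np.pi * var[k])
                     + np.sum((X - mu[k]) ** 2, axis=1) / var[k])
            for k in range(len(w))
        ], axis=1)                                  # (n, m): log of w_k N(x_i; mu_k, var_k I)
        return np.logaddexp.reduce(log_terms, axis=1).sum()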
Gaussian Mixture Model EM
Goal: maximize the log-likelihood $L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$
n (observed) data points: $x_1, \ldots, x_n$
n (hidden) labels: $y_1, \ldots, y_n$
$y_i = k$: $x_i$ generated by Gaussian k
Several pages of math later, we get:
E-step: compute the likelihood of each label, $p(y_i = k \mid x_i, \theta^{-})$, for every point $x_i$ and Gaussian k
M-step: update $w_k$, $\mu_k$, $\sigma_k$ for each Gaussian k = 1..m
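A compact numpy sketch of these updates for isotropic Gaussians (variable names and the simple initialization are my own choices; a real implementation would also monitor the log-likelihood for convergence):

    import numpy as np

    def isotropic_gaussian_logpdf(X, mu, var):
        """Row-wise log N(x; mu, var * I) for d-dimensional points."""
        d = X.shape[1]
        return -0.5 * (d * np.log(2.0 * np.pi * var)
                       + np.sum((X - mu) ** 2, axis=1) / var)

    def gmm_em(X, m, n_iter=100, seed=0):
        """EM for a mixture of m isotropic Gaussians on data X of shape (n, d)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.full(m, 1.0 / m)                       # mixture weights
        mu = X[rng.choice(n, size=m, replace=False)]  # bootstrap the means from the data
        var = np.full(m, X.var())                     # shared initial variance
        for _ in range(n_iter):
            # E-step: responsibilities gamma[i, k] = p(y_i = k | x_i, theta)
            log_p = np.stack([np.log(w[k]) + isotropic_gaussian_logpdf(X, mu[k], var[k])
                              for k in range(m)], axis=1)
            gamma = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
            # M-step: re-estimate weights, means, and variances from the soft counts
            Nk = gamma.sum(axis=0)                    # expected number of points per Gaussian
            w = Nk / n
            mu = (gamma.T @ X) / Nk[:, None]
            var = np.array([(gamma[:, k] * np.sum((X - mu[k]) ** 2, axis=1)).sum()
                            / (d * Nk[k]) for k in range(m)])
        return w, mu, var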
GMM-EM Discussion
Summary: EM is naturally applicable to training probabilistic models
EM is a generic formulation; you need to do some hairy math to get to an implementation
Problems with GMM-EM?
Local maxima
Need to bootstrap the training process (pick an initial $\theta$)
GMM-EM is applicable to an enormous number of pattern recognition tasks: speech, vision, etc.
Hours of fun with GMM-EM
Overview
Expectation-Maximization
Mixture Model Training
Learning String Edit-Distance
String Edit-Distance
Notation: operate on two strings, x and y
Edit-distance: transform one string into another using three edit operations:
Substitution: kitten → bitten, with an associated substitution cost
Insertion: cop → crop, with an associated insertion cost
Deletion: learn → earn, with an associated deletion cost
Can compute the distance efficiently with a recursive dynamic program
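A minimal dynamic-programming sketch of the cost-based edit distance (unit costs by default; parameter names are my own):

    def edit_distance(x, y, sub_cost=1, ins_cost=1, del_cost=1):
        """Classic Levenshtein-style edit distance via dynamic programming."""
        n, m = len(x), len(y)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * del_cost                    # delete all of x[:i]
        for j in range(1, m + 1):
            d[0][j] = j * ins_cost                    # insert all of y[:j]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = 0 if x[i - 1] == y[j - 1] else sub_cost
                d[i][j] = min(d[i - 1][j - 1] + match,  # substitution (or free match)
                              d[i - 1][j] + del_cost,   # deletion
                              d[i][j - 1] + ins_cost)   # insertion
        return d[n][m]

    print(edit_distance("kitten", "bitten"))  # 1 (one substitution)
    print(edit_distance("learn", "earn"))     # 1 (one deletion)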
Stochastic String Edit-Distance
Instead of setting costs, model the edit operation sequence as a random process
Edit operations are selected according to a probability distribution
For an edit operation sequence $z_1 \ldots z_n$, view string edit-distance as a memoryless stochastic transducer:
memoryless (Markov): each edit operation is selected independently of the ones before it
stochastic: the random process is governed by a true probability distribution over edit operations
transducer: the operation sequence transforms an input string into an output string
Edit-Distance Transducer
Arc label a:b/0 means input a, output b, and weight 0
(The slide shows a weighted transducer encoding the substitution, insertion, and deletion operations)
Two Distances
Define the yield of an edit sequence $z^n\#$ as the set of all string pairs $\langle x, y \rangle$ such that $z^n\#$ turns x into y
Viterbi edit-distance: negative log-likelihood of the most likely edit sequence that turns x into y
Stochastic edit-distance: negative log-likelihood of all edit sequences from x to y
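In symbols (the yield notation $\nu(\cdot)$ and the edit-operation distribution $\theta$ are my shorthand for the definitions above), the two distances are

    d_V(x, y) = -\log \max_{z^n\# :\ \langle x, y \rangle \in \nu(z^n\#)} p(z^n\# \mid \theta)
    \qquad
    d_S(x, y) = -\log \sum_{z^n\# :\ \langle x, y \rangle \in \nu(z^n\#)} p(z^n\# \mid \theta)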
Evaluating Likelihood
Viterbi: requires the single most likely edit sequence; Stochastic: requires a sum over all edit sequences
Both require calculation over all possible edit sequences, of which there are exponentially many (three edit operations at each step)
However, the memoryless assumption allows us to compute the likelihood efficiently
Use the forward-backward method!
Forward
Evaluation of forward probabilities $\alpha_{ij}$: likelihood of picking an edit sequence that generates the prefix pair $\langle x_1 \ldots x_i,\; y_1 \ldots y_j \rangle$
Memoryless assumption allows efficient recursive computation:
$\alpha_{0,0} = 1, \qquad \alpha_{i,j} = p(x_i, \epsilon)\,\alpha_{i-1,j} + p(\epsilon, y_j)\,\alpha_{i,j-1} + p(x_i, y_j)\,\alpha_{i-1,j-1}$
(deletion, insertion, and substitution terms, with out-of-range terms taken to be 0)
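A direct Python sketch of this recursion (my own parameterization: p_del and p_ins map single symbols, and p_sub maps symbol pairs, to edit-operation probabilities; the stop probability of the full stochastic-transducer model is left out for brevity):

    import numpy as np

    def forward(x, y, p_sub, p_del, p_ins):
        """alpha[i, j] = likelihood of an edit sequence generating the
        prefix pair (x[:i], y[:j]) under a memoryless edit model."""
        n, m = len(x), len(y)
        alpha = np.zeros((n + 1, m + 1))
        alpha[0, 0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0:
                    alpha[i, j] += p_del[x[i - 1]] * alpha[i - 1, j]                # delete x_i
                if j > 0:
                    alpha[i, j] += p_ins[y[j - 1]] * alpha[i, j - 1]                # insert y_j
                if i > 0 and j > 0:
                    alpha[i, j] += p_sub[x[i - 1], y[j - 1]] * alpha[i - 1, j - 1]  # substitute x_i -> y_j
        return alpha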
Backward
Evaluation of backward probabilities $\beta_{ij}$: likelihood of picking an edit sequence that generates the suffix pair $\langle x_{i+1} \ldots x_n,\; y_{j+1} \ldots y_m \rangle$ (n = |x|, m = |y|)
Memoryless assumption allows efficient recursive computation:
$\beta_{n,m} = 1, \qquad \beta_{i,j} = p(x_{i+1}, \epsilon)\,\beta_{i+1,j} + p(\epsilon, y_{j+1})\,\beta_{i,j+1} + p(x_{i+1}, y_{j+1})\,\beta_{i+1,j+1}$
(again with out-of-range terms taken to be 0)
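The mirror-image sketch for the backward pass, under the same assumed parameterization as the forward sketch above:

    import numpy as np

    def backward(x, y, p_sub, p_del, p_ins):
        """beta[i, j] = likelihood of an edit sequence generating the
        suffix pair (x[i:], y[j:]) under a memoryless edit model."""
        n, m = len(x), len(y)
        beta = np.zeros((n + 1, m + 1))
        beta[n, m] = 1.0
        for i in range(n, -1, -1):
            for j in range(m, -1, -1):
                if i < n:
                    beta[i, j] += p_del[x[i]] * beta[i + 1, j]            # delete x_{i+1}
                if j < m:
                    beta[i, j] += p_ins[y[j]] * beta[i, j + 1]            # insert y_{j+1}
                if i < n and j < m:
                    beta[i, j] += p_sub[x[i], y[j]] * beta[i + 1, j + 1]  # substitute x_{i+1} -> y_{j+1}
        return beta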
EM Formulation
Edit operations are selected according to a probability distribution
So, EM has to update this distribution based on the occurrence counts of each operation (similar to the coin-tossing example)
Idea: accumulate expected counts from the forward and backward variables
$\gamma(z)$: expected count of edit operation z
EM Details
$\gamma(z)$: expected count of edit operation z
e.g., for a substitution that rewrites a as b, accumulate over all positions where $x_i = a$ and $y_j = b$:
$\gamma(a, b) = \sum_{i, j:\; x_i = a,\; y_j = b} \dfrac{\alpha_{i-1,j-1}\; p(a, b)\; \beta_{i,j}}{p(x, y \mid \theta)}$
The M-step then renormalizes the expected counts into an updated edit-operation distribution
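Putting the pieces together, a sketch of the count accumulation for one string pair, reusing the forward and backward sketches from the previous slides (and again ignoring the stop probability); the M-step would then renormalize these counts into an updated edit-operation distribution:

    from collections import defaultdict

    def expected_counts(x, y, p_sub, p_del, p_ins):
        """Expected edit-operation counts gamma(z) for one string pair."""
        n, m = len(x), len(y)
        alpha = forward(x, y, p_sub, p_del, p_ins)    # forward sketch above
        beta = backward(x, y, p_sub, p_del, p_ins)    # backward sketch above
        total = alpha[n, m]                           # p(x, y) under the current model
        gamma = defaultdict(float)
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0:            # a deletion of x_i used on the way to (i, j)
                    gamma["del", x[i - 1]] += alpha[i - 1, j] * p_del[x[i - 1]] * beta[i, j] / total
                if j > 0:            # an insertion of y_j
                    gamma["ins", y[j - 1]] += alpha[i, j - 1] * p_ins[y[j - 1]] * beta[i, j] / total
                if i > 0 and j > 0:  # a substitution of x_i by y_j
                    gamma["sub", x[i - 1], y[j - 1]] += (alpha[i - 1, j - 1]
                                                         * p_sub[x[i - 1], y[j - 1]]
                                                         * beta[i, j] / total)
        return gamma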
References
A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, 39(1), 1977, pp. 1-38.
C. F. J. Wu, "On the Convergence Properties of the EM Algorithm," The Annals of Statistics, 11(1), Mar 1983, pp. 95-103.
F. Jelinek, Statistical Methods for Speech Recognition, 1997.
M. Collins, "The EM Algorithm," 1997.
J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," Technical Report TR-97-021, U.C. Berkeley, 1998.
E. S. Ristad and P. N. Yianilos, "Learning string edit distance," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5), 1998, pp. 522-532.
L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, 77(2), 1989, pp. 257-286.
A. D'Souza, "Using EM To Estimate A Probablity [sic] Density With A Mixture Of Gaussians."
M. Mohri, "Edit-Distance of Weighted Automata," in Proc. Implementation and Application of Automata (CIAA), 2002, pp. 1-23.
J. Glass, Lecture Notes, MIT class 6.345: Automatic Speech Recognition, 2003.
C. Tomasi, "Estimating Gaussian Mixture Densities with EM: A Tutorial," 2004.
Wikipedia.