KRISHNA ENGINEERING COLLEGE
(Approved by AICTE & Affiliated to Dr. APJ Abdul Kalam Technical University (Formerly UPTU), Lucknow)
Department of CSE-Artificial Intelligence
Department of CSE-Artificial Intelligence & Machine Learning
Machine Learning Techniques (KAI601)
Unit-2:
REGRESSION: Linear Regression and Logistic Regression
BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naïve
Bayes classifier, Bayesian belief networks, EM algorithm.
SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel – (Linear kernel,
polynomial kernel, and Gaussian kernel), Hyperplane – (Decision surface), Properties of SVM, and
Issues in SVM.
Expectation-Maximization (EM) Algorithm
The EM algorithm was explained and given its name in a classic 1977 paper by
Arthur Dempster, Nan Laird, and Donald Rubin.
Explanation of the Algorithm:
The steps of the EM algorithm are as follows:
1. We start from a set of initial parameter values, given a set of incomplete (observed) data, and we
assume that the observed data come from a specific model.
2. We then use the model to "estimate" the missing data. In other words, after forming parameters from
the observed data to build a model, we use this model to guess the missing values. This step is called
the expectation step (E-step).
3. Next, we use the "complete" data that we have just estimated to update the parameters: using both the
estimated missing data and the observed data, we find the most likely modified parameters and build the
modified model. This step is called the maximization step (M-step).
4. We repeat steps 2 and 3 until convergence, that is, until the parameters of the model no longer change
and the estimated model fits the observed data. A minimal code sketch of this loop is given below.
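To make these steps concrete, here is a minimal sketch of the EM loop for a mixture of two one-dimensional Gaussians, a common textbook use of EM. The synthetic data, starting values, and variable names are illustrative assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic "observed" data drawn from two overlapping Gaussians.
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(4.0, 1.5, 300)])

# Step 1: starting parameter values (mixing weights, means, standard deviations).
w = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
sigma = np.array([1.0, 1.0])

def normal_pdf(v, m, s):
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

prev_ll = -np.inf
for _ in range(200):
    # Step 2 (E-step): "estimate" the missing data, i.e. the responsibility of
    # each component for each observation under the current model.
    dens = np.vstack([w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # Step 3 (M-step): re-estimate the parameters from the "completed" data.
    nk = resp.sum(axis=1)
    w = nk / len(x)
    mu = (resp @ x) / nk
    sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    # Step 4: stop when the observed-data log-likelihood no longer changes.
    ll = np.log(dens.sum(axis=0)).sum()
    if abs(ll - prev_ll) < 1e-8:
        break
    prev_ll = ll

print("weights:", w, "means:", mu, "std devs:", sigma)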
The major strength of the EM algorithm is its numerical stability: in every iteration of the EM
algorithm, the likelihood of the observed data increases, so we are always moving toward a
solution. In addition, EM handles parameter constraints gracefully.
On the other hand, the EM algorithm can converge very slowly on some problems, and the speed of
convergence is intimately related to the amount of missing information. It is only guaranteed to
improve the likelihood of the training data, which is different from reducing the errors directly.
The EM algorithm cannot guarantee reaching the global maximum and can get stuck at local maxima,
saddle points, etc. Essentially, the initial guess of the parameter values is very important and can
decide how long the algorithm takes to converge and which solution it reaches.
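Because of this sensitivity to the starting point, a common remedy is to run EM from several random initializations and keep the run with the highest likelihood. The sketch below assumes scikit-learn is available and uses its GaussianMixture estimator (which is fitted with EM) purely as an illustration; the data are synthetic.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(4.0, 1.5, 300)]).reshape(-1, 1)

# n_init=10 runs EM from 10 different random initializations and keeps the
# run with the highest likelihood, reducing the risk of a poor local maximum.
gmm = GaussianMixture(n_components=2, n_init=10, random_state=0).fit(X)
print("estimated means:", gmm.means_.ravel())
print("log-likelihood lower bound of the best run:", gmm.lower_bound_)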
[Worked two-coin example; the supporting figures are not reproduced here.]
When we toss coin A, there is an 80% chance that we will get heads.
When we toss coin B, there is a 45% chance that we will get heads.
When we toss coin A, there is an 80% chance that we will get heads.
When we toss coin B, there is a 52% chance that we will get heads.
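A minimal sketch of EM for a two-coin setting of this kind is given below. It assumes we observe the number of heads in several sets of tosses but not which coin produced each set; the toss counts and starting values are illustrative assumptions, not values from the notes.

import numpy as np

# Illustrative data: number of heads observed in 5 sets of 10 tosses,
# without knowing which coin (A or B) produced each set.
heads = np.array([5, 9, 8, 4, 7])
tosses = 10

theta_a, theta_b = 0.6, 0.5            # initial guesses for P(heads)
for _ in range(100):
    # E-step: how likely is each set of tosses under each coin?
    # (The binomial coefficient cancels when we normalize, so it is omitted.)
    like_a = theta_a ** heads * (1 - theta_a) ** (tosses - heads)
    like_b = theta_b ** heads * (1 - theta_b) ** (tosses - heads)
    resp_a = like_a / (like_a + like_b)   # probability each set came from coin A
    resp_b = 1.0 - resp_a
    # M-step: re-estimate each coin's bias from its expected head/toss counts.
    new_a = (resp_a * heads).sum() / (resp_a * tosses).sum()
    new_b = (resp_b * heads).sum() / (resp_b * tosses).sum()
    converged = max(abs(new_a - theta_a), abs(new_b - theta_b)) < 1e-6
    theta_a, theta_b = new_a, new_b
    if converged:
        break

print(f"estimated P(heads): coin A = {theta_a:.2f}, coin B = {theta_b:.2f}")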
K-Means and the EM Algorithm
First, we initialize the K means of the K-Means algorithm.
In the E-step we assign each point to a cluster, and in the M-step, given the clusters, we refine
the mean of each cluster.
This process is repeated until the change in the means is small, as shown in the sketch below.
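The following sketch shows plain K-Means written so that the assignment and update stages mirror the E-step and M-step just described; the two-dimensional synthetic data and parameter values are illustrative assumptions.

import numpy as np

def k_means(x, k, n_iter=100, tol=1e-6, seed=0):
    """Plain K-Means structured as an E-step / M-step loop (empty clusters not handled)."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), size=k, replace=False)]      # initialize the k means
    for _ in range(n_iter):
        # E-step: assign each point to the cluster with the nearest mean.
        dists = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: given the assignments, refine the mean of each cluster.
        new_means = np.array([x[labels == j].mean(axis=0) for j in range(k)])
        if np.linalg.norm(new_means - means) < tol:           # stop when the means barely change
            return new_means, labels
        means = new_means
    return means, labels

# Example usage on synthetic 2-D data.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
centers, labels = k_means(data, k=2)
print(centers)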
Applications of the EM Algorithm
The EM algorithm is often used for data clustering in machine learning and computer vision.
It is also used in natural language processing.
The EM algorithm is used for parameter estimation in mixed models and in quantitative genetics.
It is used in psychometrics for estimating item parameters and latent abilities in item response
theory models.
Some other applications include medical image reconstruction, structural engineering, etc.