Gaussian Mixture Models Explained

From intuition to implementation

Oscar Contreras Carrasco
Jun 3 · 12 min read
In the world of Machine Learning, we can distinguish two main areas: supervised and unsupervised learning. The main difference between the two lies in the nature of the data as well as the approaches used to deal with it. Clustering is an unsupervised learning problem in which we intend to find clusters of points in our dataset that share some common characteristics. Let's suppose we have a dataset that looks like this:
[Figure: scatter plot of the example dataset]

. . .
Definitions
A Gaussian mixture is a function composed of several Gaussians, each identified by k ∈ {1, …, K}, where K is the number of clusters in our dataset. Each Gaussian k in the mixture is described by the following parameters:

• A mean μ that defines its centre.
• A covariance Σ that defines its width. This would be equivalent to the dimensions of an ellipsoid in a multivariate scenario.
• A mixing probability π that defines how big or small the Gaussian function will be.
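For reference, these three parameters combine into the mixture density in the usual way (standard textbook notation, as in [1], rather than a formula reproduced from the figures below):

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1

Each π_k acts as the weight of its Gaussian in the overall density, which is why it controls "how big or small" that component is.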
Let us now illustrate these parameters graphically:

[Figure: three Gaussian densities plotted over the clustered data]

Here, we can see that there are three Gaussian functions, hence K = 3.
. . .
Initial derivations
We are now going to introduce some additional notation. Just a word of warning: math is coming! Don't worry, I'll try to keep the notation as clean as possible so the derivations are easier to follow. First, let's suppose we want to know the probability that a data point x_n comes from Gaussian k. We can express this as:

p(z_nk = 1 | x_n)

Which reads "given a data point x_n, what is the probability it came from Gaussian k?" In this case, z is a latent variable that takes only two possible values: it is one when x came from Gaussian k, and zero otherwise. We never actually get to observe this z variable, but knowing its probability of occurrence will be useful in helping us determine the Gaussian mixture parameters, as we discuss later.
Likewise, we can state the following:

π_k = p(z_k = 1)

Which means that the overall probability of observing a point that comes from Gaussian k is actually equivalent to the mixing coefficient for that Gaussian.
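Combining the two quantities above with Bayes' theorem gives the posterior probability of the latent variable, usually called the responsibility (again written in the standard notation of [1], not copied from the article's figures):

\gamma(z_{nk}) = p(z_k = 1 \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

This is the quantity the expectation step of EM evaluates below.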
. . .
Expectation-Maximization algorithm
Well, at this point we have derived some expressions for the probabilities that will be useful in determining the parameters of our model. However, in the previous section we saw that simply evaluating (3) to find such parameters would prove to be very hard. Fortunately, there is an iterative method we can use for this purpose: the Expectation-Maximization, or simply EM, algorithm. It is widely used for optimization problems where the objective function has complexities such as the one we've just encountered in the GMM case.
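To see where that difficulty comes from, it helps to write down the log-likelihood of the dataset under the mixture model (standard form, following [1]):

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)

The inner sum over k sits inside the logarithm, so setting the derivatives with respect to μ, Σ and π to zero does not yield closed-form solutions: each condition still involves the responsibilities, which themselves depend on the parameters. EM breaks this circularity by alternating between the two.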
Let the parameters of our model be

θ = {π, μ, Σ}

Let us now define the steps that the general EM algorithm will follow¹.

Step 1: Initialise θ accordingly. For instance, we can use the results obtained by a previous K-Means run as a good starting point for our algorithm.

Step 2 (Expectation step): Evaluate the responsibilities γ(z_nk) using the current parameter values.
. . .
Implementation in Python
Just as a side note, the full implementation is available as a
Jupyter notebook at https://bit.ly/2MpiZp4
I have used the Iris dataset for this exercise, mainly for simplicity and fast training. From our previous derivations, we stated that the EM algorithm follows an iterative approach to find the parameters of a Gaussian Mixture Model. Our first step was to initialise these parameters; in this case, we can use the output of a K-means run for this purpose. The Python code for this would look like:
import numpy as np
from sklearn.cluster import KMeans

def initialize_clusters(X, n_clusters):
    """Initialise one parameter dictionary per Gaussian, seeded by K-means."""
    clusters = []

    # Run K-means once and use its centroids as the initial means
    kmeans = KMeans(n_clusters=n_clusters).fit(X)
    mu_k = kmeans.cluster_centers_

    for i in range(n_clusters):
        clusters.append({
            'pi_k': 1.0 / n_clusters,                           # uniform mixing weights
            'mu_k': mu_k[i],                                    # K-means centroid
            'cov_k': np.identity(X.shape[1], dtype=np.float64)  # identity covariance
        })

    return clusters
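As a quick sanity check, here is one way to exercise the function on the Iris data mentioned above (this snippet is mine, not part of the linked notebook):

from sklearn.datasets import load_iris

X = load_iris().data                    # 150 samples, 4 features
clusters = initialize_clusters(X, 3)    # Iris has three species, so K = 3

for c in clusters:
    print(c['pi_k'], c['mu_k'])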
Next, we execute the expectation step. Here we calculate the responsibility γ(z_nk) of each Gaussian k for every data point x_n, using the current parameter values.
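A minimal sketch of what this expectation step can look like, reusing the cluster dictionaries built above; the helper name expectation_step and the 'gamma_nk' key are my own choices and not necessarily those of the linked notebook:

import numpy as np
from scipy.stats import multivariate_normal

def expectation_step(X, clusters):
    # Unnormalised responsibilities: pi_k * N(x_n | mu_k, cov_k) for every point
    totals = np.zeros((X.shape[0], 1), dtype=np.float64)
    for cluster in clusters:
        gamma_nk = cluster['pi_k'] * multivariate_normal.pdf(
            X, mean=cluster['mu_k'], cov=cluster['cov_k'])
        cluster['gamma_nk'] = gamma_nk.reshape(-1, 1)
        totals += cluster['gamma_nk']
    # Normalise so that the K responsibilities of each point sum to one
    for cluster in clusters:
        cluster['gamma_nk'] /= totals
    return clusters

After this step, every cluster dictionary carries a column of responsibilities that the maximization step can then use to re-estimate π, μ and Σ.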
. . .
Final remarks
Gaussian Mixture Models are a very powerful tool and are
widely used in diverse tasks that involve data clustering. I
hope you found this post useful! Feel free to reach out with questions or comments. I would also highly encourage you to
try the derivations yourself as well as look further into the
code. I look forward to creating more material like this soon.
Enjoy!
. . .
[1] Bishop, Christopher M. Pattern Recognition and Machine Learning (2006). Springer-Verlag, Berlin, Heidelberg.

[2] Murphy, Kevin P. Machine Learning: A Probabilistic Perspective (2012). MIT Press, Cambridge, MA.