Density doesn't normalise in VBGMM and DPGMM #4267

Closed
markvdw opened this issue Feb 18, 2015 · 7 comments

@markvdw

markvdw commented Feb 18, 2015

I'm having trouble using the VBGMM and DPGMM for density estimation. As far as I understand, both should have the same interface as the "normal" GMM. However, while the "normal" GMM produces a good fit, the VBGMM and DPGMM produce bad fits and non-normalised densities. This makes me wonder whether something is wrong beyond me simply using the code incorrectly.

The problem shows up in the density estimation example when appending the line:

print np.sum(np.exp(-Z)) * (x[1] - x[0]) * (y[1] - y[0])

This is approximately 1 when using a normal GMM, but much smaller when using the VB or DP GMMs.
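
For reference, a self-contained sketch of this kind of Riemann-sum check on a known density (plain NumPy, with a 2D standard normal standing in for the example's fitted model; the grid here is illustrative):

import numpy as np

# Grid covering essentially all of the probability mass.
x = np.linspace(-6, 6, 200)
y = np.linspace(-6, 6, 200)
XX, YY = np.meshgrid(x, y)

# Log-density of a 2D standard normal, standing in for -Z
# (the model's log-likelihood evaluated on the grid).
log_density = -0.5 * (XX ** 2 + YY ** 2) - np.log(2 * np.pi)

# Density times grid-cell area, summed over the grid, should come out
# close to 1 for any properly normalised density.
print(np.sum(np.exp(log_density)) * (x[1] - x[0]) * (y[1] - y[0]))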

The same behaviour is shown on a toy 1D density estimation problem:

import numpy as np
import numpy.random as rndn
import sklearn.mixture as skmix
import matplotlib.pyplot as plt

X = rnd.randn(0.7 * 300, 1) - 5
X = np.vstack((X, rnd.randn(0.3 * 300, 1) * 0.3 + 3))

# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)

x = np.linspace(-10, 10, 1000)
p = np.exp(gmm.score(x))

plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()

integral = np.sum(p) * (x[1] - x[0])
print integral

Is this behaviour just the result of a poor fit due to a local optimum or something? The fact that the predictive densities don't normalise leads me to believe it's something else.

I asked the same question on StackOverflow.

@amueller
Member

As I said there, I think there are issues with VBGMM. The variational updates are non-standard. There is another issue here: #2454

@amueller
Member

Does the above code run for you? Which version of scikit-learn are you using?

I think it should be:

import numpy as np
import numpy.random as rndn
import sklearn.mixture as skmix
import matplotlib.pyplot as plt
rnd = np.random.RandomState()

X = rnd.randn(0.7 * 300, 1) - 5
X = np.vstack((X, rnd.randn(0.3 * 300, 1) * 0.3 + 3))

# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)

x = np.linspace(-10, 10, 1000).reshape(-1, 1)
p = np.exp(gmm.score_samples(x)[0])

plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()

integral = np.sum(p) * (x[1] - x[0])
print integral

@markvdw
Author

markvdw commented Feb 20, 2015

Both the code above and the code you wrote run (though there's a typo in the import numpy.random statement). I'm using 0.15.2. Both produce the same result: bad fits and non-normalised predictions.

@amueller amueller added the Bug label Feb 23, 2015
@amueller
Member

I only looked into the module fairly recently, and I agree that there is something fishy.
I think we need to rewrite it, as the derivation of the variational updates seems non-standard. Any help is very welcome.

@markvdw
Author

markvdw commented Feb 23, 2015

I'm currently busy with a job, but I'll have some free time in April. Should be an interesting project - I'll get in touch nearer that time.

@amueller
Member

We might pose it as a GSoC project, but if no one takes it, you are very welcome to work on it. I think it would actually be fun, but I'm not sure I'll have the time.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@amueller
Member

This should be fixed in the new BayesianGaussianMixture class.
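
For anyone landing here later, a rough sketch of the same check against BayesianGaussianMixture on the toy data from above (the Dirichlet-process prior roughly plays the role of the old DPGMM; the exact settings here are illustrative, not a guaranteed reproduction):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rnd = np.random.RandomState(0)
X = np.vstack((rnd.randn(210, 1) - 5,
               rnd.randn(90, 1) * 0.3 + 3))

# "dirichlet_process" roughly corresponds to the old DPGMM;
# "dirichlet_distribution" roughly corresponds to the old VBGMM.
bgmm = BayesianGaussianMixture(
    n_components=2,
    weight_concentration_prior_type="dirichlet_process",
)
bgmm.fit(X)

# score_samples returns per-sample log-densities, so the Riemann sum
# of exp(log-density) over a fine grid should now be close to 1.
x = np.linspace(-10, 10, 1000).reshape(-1, 1)
p = np.exp(bgmm.score_samples(x))
print(np.sum(p) * (x[1, 0] - x[0, 0]))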
