Density doesn't normalise in VBGMM and DPGMM #4267

Closed
markvdw opened this issue Feb 18, 2015 · 7 comments

@markvdw

markvdw commented Feb 18, 2015

I'm having trouble using the VBGMM and DPGMM for density estimation. As far as I understand, both should have the same interface as the "normal" GMM. However, while the "normal" GMM produces a good fit, the VBGMM and DPGMM produce bad fits and non-normalised densities. This makes me wonder whether something is wrong beyond me simply using the code incorrectly.

The problem shows up in the density estimation example when appending the line:

print np.sum(np.exp(-Z)) * (x[1] - x[0]) * (y[1] - y[0])

This is approximately 1 when using a normal GMM, but much smaller when using the VB or DP GMMs.
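
For reference, a self-contained sketch of this kind of Riemann-sum check on a known density (plain NumPy, with a 2D standard normal standing in for the example's fitted model; the grid here is illustrative):

import numpy as np

# Grid covering essentially all of the probability mass.
x = np.linspace(-6, 6, 200)
y = np.linspace(-6, 6, 200)
XX, YY = np.meshgrid(x, y)

# Log-density of a 2D standard normal, standing in for -Z
# (the model's log-likelihood evaluated on the grid).
log_density = -0.5 * (XX ** 2 + YY ** 2) - np.log(2 * np.pi)

# Density times grid-cell area, summed over the grid, should come out
# close to 1 for any properly normalised density.
print(np.sum(np.exp(log_density)) * (x[1] - x[0]) * (y[1] - y[0]))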

The same behaviour is shown on a toy 1D density estimation problem:

import numpy as np
import numpy.random as rndn
import sklearn.mixture as skmix
import matplotlib.pyplot as plt

X = rnd.randn(0.7 * 300, 1) - 5
X = np.vstack((X, rnd.randn(0.3 * 300, 1) * 0.3 + 3))

# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)

x = np.linspace(-10, 10, 1000)
p = np.exp(gmm.score(x))

plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()

integral = np.sum(p) * (x[1] - x[0])
print integral

Is this behaviour just the result of a poor fit due to a local optimum or something? The fact that the predictive densities don't normalise leads me to believe it's something else.

I asked the same question on StackOverflow.

@amueller
Member

As I said there, I think there are issues with VBGMM. The variational updates are non-standard. There is another issue here: #2454

@amueller
Member

Does the above code run for you? Which version of scikit-learn are you using?

I think it should be:

import numpy as np
import numpy.random as rndn
import sklearn.mixture as skmix
import matplotlib.pyplot as plt
rnd = np.random.RandomState()

X = rnd.randn(0.7 * 300, 1) - 5
X = np.vstack((X, rnd.randn(0.3 * 300, 1) * 0.3 + 3))

# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)

x = np.linspace(-10, 10, 1000).reshape(-1, 1)
p = np.exp(gmm.score_samples(x)[0])

plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()

integral = np.sum(p) * (x[1] - x[0])
print integral

@markvdw
Author

markvdw commented Feb 20, 2015

Both the code above and the code you wrote run (though there's a typo in the import numpy.random statement). I'm using 0.15.2. Both produce the same result: bad fits and non-normalised predictions.

@amueller amueller added the Bug label Feb 23, 2015
@amueller
Member

I only looked into the module fairly recently, and I agree that there is something fishy.
I think we need to rewrite it, as the derivation of the variational updates seems non-standard. Any help is very welcome.

@markvdw
Author

markvdw commented Feb 23, 2015

I'm currently busy with a job, but I'll have some free time in April. Should be an interesting project - I'll get in touch nearer that time.

@amueller
Member

We might pose it as a GSoC project, but if no one takes it, you are very welcome to work on it. I think it would actually be fun, but I'm not sure I'll have the time.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@amueller
Member

This should be fixed in the new BayesianGaussianMixture class.
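
For anyone landing here later, a rough sketch of the same check against BayesianGaussianMixture on the toy data from above (the Dirichlet-process prior roughly plays the role of the old DPGMM; the exact settings here are illustrative, not a guaranteed reproduction):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rnd = np.random.RandomState(0)
X = np.vstack((rnd.randn(210, 1) - 5,
               rnd.randn(90, 1) * 0.3 + 3))

# "dirichlet_process" roughly corresponds to the old DPGMM;
# "dirichlet_distribution" roughly corresponds to the old VBGMM.
bgmm = BayesianGaussianMixture(
    n_components=2,
    weight_concentration_prior_type="dirichlet_process",
)
bgmm.fit(X)

# score_samples returns per-sample log-densities, so the Riemann sum
# of exp(log-density) over a fine grid should now be close to 1.
x = np.linspace(-10, 10, 1000).reshape(-1, 1)
p = np.exp(bgmm.score_samples(x))
print(np.sum(p) * (x[1, 0] - x[0, 0]))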
