Density doesn't normalise in VBGMM and DPGMM #4267
Comments
As I said there, I think there are issues with VBGMM. The variational updates are non-standard. There is another issue here: #2454
Does the above code run for you? Which version of scikit-learn are you using? I think it should be:
import numpy as np
import numpy.random as rndn
import sklearn.mixture as skmix
import matplotlib.pyplot as plt
rnd = np.random.RandomState()
X = rnd.randn(210, 1) - 5  # 70% of 300 samples; randn needs integer sizes
X = np.vstack((X, rnd.randn(90, 1) * 0.3 + 3))  # remaining 30%
# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)
x = np.linspace(-10, 10, 1000).reshape(-1, 1)
p = np.exp(gmm.score_samples(x)[0])
plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()
integral = np.sum(p) * (x[1] - x[0])
print(integral)
Both the code above and the code you wrote run (though there's a typo in the import numpy.random statement). I'm using 0.15.2. Both produce the same result: bad fits and non-normalised predictions.
I only looked into the module fairly recently, and I agree that there is something fishy.
I'm currently busy with a job, but I'll have some free time in April. Should be an interesting project. I'll get in touch nearer that time.
We might pose it as a GSoC project, but if no one takes it you are very welcome. I think it is actually fun, but I'm not sure I'll have the time.
This should be fixed in the new BayesianGaussianMixture class.
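For anyone wanting to check that claim, here is a minimal sketch, assuming scikit-learn 0.18 or later (where BayesianGaussianMixture is available). The toy data mirrors the 70/30 two-cluster set used earlier in this thread; the seed, the wider grid and the max_iter value are my own choices, not something from the thread.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy 1D data: 70% of points around -5, 30% around 3 (seed is an arbitrary choice).
rng = np.random.RandomState(0)
X = np.vstack((rng.randn(210, 1) - 5, rng.randn(90, 1) * 0.3 + 3))

model = BayesianGaussianMixture(n_components=2, max_iter=500, random_state=0)
model.fit(X)

# score_samples returns the log-density at each point, so exponentiate it.
x = np.linspace(-30, 30, 6000).reshape(-1, 1)
p = np.exp(model.score_samples(x))

# Riemann-sum approximation of the integral of the fitted density.
integral = float(np.sum(p) * (x[1, 0] - x[0, 0]))
print(integral)
```

If the fitted density is properly normalised, the printed value should be close to 1.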
I'm having trouble using the VBGMM and DPGMM for density estimation. As far as I understand, both should have the same interface as the "normal" GMM. However, while the "normal" GMM produces a good fit, the VBGMM and DPGMM produce bad fits and non-normalised densities. This makes me wonder whether something deeper is wrong than my using the code incorrectly.
The problem shows up in the density estimation example, by appending the line:
This is approximately 1 when using a normal GMM, but much smaller when using the VB or DP GMMs.
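To make the normalisation check concrete, here is a small sketch using only NumPy. It is not the line from the example above; it evaluates a known two-component Gaussian mixture on a grid and sums it, which is what a correctly normalised predictive density should reproduce. The weights, means and standard deviations are illustrative assumptions.

```python
import numpy as np

def mixture_pdf(x, weights, means, stds):
    """Density of a 1D Gaussian mixture evaluated at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (n_points, 1)
    comp = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return comp @ weights  # weighted sum over components

# Illustrative parameters (assumed): 70% of mass near -5, 30% near 3.
weights = np.array([0.7, 0.3])
means = np.array([-5.0, 3.0])
stds = np.array([1.0, 0.3])

x = np.linspace(-10, 10, 1000)
p = mixture_pdf(x, weights, means, stds)

# Riemann-sum approximation of the integral over the grid.
integral = np.sum(p) * (x[1] - x[0])
print(integral)
```

A value far from 1 on a grid that covers the data would mean the evaluated function is not a proper density, whatever the quality of the fit.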
The same behaviour is shown on a toy 1D density estimation problem:
Is this behaviour just the result of a poor fit due to a local optimum or something? The fact that the predictive densities don't normalise leads me to believe it's something else.
I asked the same question on StackOverflow.