Scaling features using MinMaxScaler makes DPGMM always have one cluster #6694
Comments
Yes, there are probably bugs in DPGMM. Search the issue tracker for DPGMM and you'll find a few.
Yes, I've noticed this issue with the old DPGMM class.
Thanks for the report. We should make this a non-regression test for the new implementation of DP Gaussian Mixtures.
@tguillemot Is there a usable new version of DPGMM now? I tried your GSoC-BayesianMixture version, and something seems wrong with the E step: the lower bound does not update, so fitting always finishes after only one iteration.
@tguillemot Sorry, I forgot to mention that my data is quite high-dimensional, around 300 dimensions. In that case, DPGMM does not work well when there are few samples. The code itself has no problem. Thanks.
@fringsoo Ok
Should be fixed with the new BayesianMixture. Please give it a shot @fringsoo.
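For anyone retrying this report against the new code, here is a minimal sketch. It assumes a scikit-learn release (0.18 or later) where the rewritten model is available as `sklearn.mixture.BayesianGaussianMixture`. The toy data and the `n_clusters_used` helper are made up for illustration and are not the data from this report. The cluster counts depend on the data and the priors, so none are claimed here.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two well-separated 2-D blobs (not the reporter's data).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.randn(200, 2) + [-4.0, -4.0],
    rng.randn(200, 2) + [4.0, 4.0],
])


def n_clusters_used(data, n_components=5):
    """Fit a Dirichlet-process mixture and count the distinct labels it assigns."""
    model = BayesianGaussianMixture(
        n_components=n_components,  # upper bound on the number of clusters
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
        random_state=0,
    )
    return len(np.unique(model.fit_predict(data)))


# Compare the unscaled data with the two scalers discussed in this issue.
results = {
    "raw": n_clusters_used(X),
    "MinMaxScaler": n_clusters_used(MinMaxScaler().fit_transform(X)),
    "StandardScaler": n_clusters_used(StandardScaler().fit_transform(X)),
}
for name, k in results.items():
    print(f"{name}: {k} cluster(s)")
```

If the three counts disagree on your own data, posting the output together with your scikit-learn version would make the problem easier to reproduce.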
I have noticed that if I scale my dataset with MinMaxScaler(), DPGMM always creates a single cluster (label), whatever value I use for alpha. This might be related to a numerical precision issue.
If I don't rescale the data or if instead of the MinMaxScaler() I use StandardScaler(), then this problem does not occur (i.e., the DPGMM creates more than one cluster).
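One concrete difference between the two scalers is the spread of the result: min-max scaling squeezes each feature into [0, 1], so its variance can be at most 0.25, while standard scaling gives unit variance. Whether this smaller scale is what collapses DPGMM to one cluster is only a guess, not a diagnosis. The pure-Python sketch below uses made-up feature values and hypothetical helper names to show the difference:

```python
from statistics import mean, pstdev, pvariance


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range, as MinMaxScaler does by default."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up 1-D feature with three loose groups of values.
feature = [-6.0, -5.5, -4.8, 0.1, 0.4, 5.2, 5.9, 6.3]

mm_var = pvariance(min_max_scale(feature))
std_var = pvariance(standardize(feature))

print(f"variance after min-max scaling:  {mm_var:.4f}")
print(f"variance after standard scaling: {std_var:.4f}")
```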
Is this a bug in sklearn.mixture.DPGMM, or did I miss something?
API is here: http://scikit-learn.org/stable/modules/generated/sklearn.mixture.DPGMM.html#sklearn.mixture.DPGMM
I have also tried it on the artificial data in this example from the official site: http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm.html#example-mixture-plot-gmm-py
It works, but if I rescale the generated dataset X by adding the following line, then DPGMM creates only one cluster: