sklearn.mixture.GaussianMixture doesn't sample properly plus a prob with fitting #7822
The last issue was fixed in #7701. The first we may not fix: generally...
I updated sklearn, and it doesn't seem to solve the sampling issue: for 3000 asked it gives back 1177. But probably I have to update in a more savvy manner, and besides, this is an issue I already found a way around. The first issue is more annoying, because I think it's fundamental for people using GMMs to be able to compare proposed methods with new data. In my case I'm defining a GMM using a novelty detection rule, so it's more complicated than just fitting to data.
It's not in the current release, but it's included in the current master, and should be in 0.18.1, to be released shortly.
Ping @tguillemot |
Thanks a lot. In the end I decided to use another package instead of sklearn: http://pypr.sourceforge.net/mog.html. I don't know if you can get inspiration from there, or how far-reaching you want your package to be. In my own interest I would want it as complete as possible, but it's clear that there are alternative solutions. Best,
In your example:
So it's normal if you have only 665 elements.
Indeed there was a problem, and it's corrected by #7702.
This issue is linked to #7701: I think what people want is a way to define a parametrised Gaussian mixture themselves and sample from it. @agramfort @raghavrv @TomDLT @jnothman, what's your point of view about the last point?
You mean people want to define a parametrised Gaussian mixture and sample from it? I'm ambivalent.
@jnothman Yes, that's what I mean. As it's the second time someone has asked for something like this, here is the code to do the sampling. In both snippets, weights_, means_ and covariances_ are the mixture parameters, rng is a numpy RandomState, and n_features is the dimensionality of the data.

For the full covariance case:

# Draw the number of points per component, then sample each component.
weights_ /= np.sum(weights_)
n_samples_comp = rng.multinomial(n_samples, weights_)
X = np.vstack([rng.multivariate_normal(mean, covariance, int(sample))
               for (mean, covariance, sample) in zip(means_, covariances_, n_samples_comp)])
y = np.concatenate([j * np.ones(sample, dtype=int)
                    for j, sample in enumerate(n_samples_comp)])

For the diagonal covariance case:

weights_ /= np.sum(weights_)
n_samples_comp = rng.multinomial(int(n_samples), weights_)
X = np.vstack([mean + rng.randn(int(sample), n_features) * np.sqrt(covariance)
               for (mean, covariance, sample) in zip(means_, covariances_, n_samples_comp)])
y = np.concatenate([j * np.ones(sample, dtype=int)
                    for j, sample in enumerate(n_samples_comp)])
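For anyone who wants to apply this recipe to a hand-specified mixture, here is a minimal self-contained sketch of the full covariance case (the weights, means and covariances below are made-up illustration values, not anything from sklearn's API):

import numpy as np

rng = np.random.RandomState(0)

# Hypothetical user-chosen parameters for a 2-component, 2-dimensional mixture.
weights_ = np.array([0.3, 0.7])
means_ = np.array([[0.0, 0.0], [5.0, 5.0]])
covariances_ = np.array([np.eye(2), 0.5 * np.eye(2)])

n_samples = 1000
weights_ = weights_ / np.sum(weights_)

# Decide how many points each component contributes, then sample each component.
n_samples_comp = rng.multinomial(n_samples, weights_)
X = np.vstack([rng.multivariate_normal(mean, covariance, int(sample))
               for (mean, covariance, sample) in zip(means_, covariances_, n_samples_comp)])
y = np.concatenate([j * np.ones(sample, dtype=int)
                    for j, sample in enumerate(n_samples_comp)])

print(X.shape, y.shape)  # (1000, 2) (1000,): exactly the requested number of samples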
Yes, I realise it's no big deal.
@jnothman Indeed, sorry.
I suppose you're right that just as we don't provide a separate function to...
@ACTLA we definitely don't want to be as far-reaching as we can be. We are already stretched very thin. Maybe also check out pomegranate: https://github.com/jmschrei/pomegranate
I vote to close. Not sure if pomegranate is where we want to send people for sampling, but it certainly addresses this use case.
+1 to close
+1
@LuCeHe The trick about using precisions_cholesky_ is really useful. It enabled me to continue using sklearn and avoid changing a large codebase to switch to another package. IMHO, one should be able to predict posteriors (my use case) without fitting, if all the parameters are provided at creation by the user.
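For context, a minimal sketch of that precisions_cholesky_ trick as I understand it (the parameter values are hypothetical; attribute names assume sklearn >= 0.18 and covariance_type='full'):

import numpy as np
from scipy import linalg
from sklearn.mixture import GaussianMixture

# Hypothetical user-defined parameters for a 2-component, 2-dimensional mixture.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covariances = np.array([np.eye(2), 2.0 * np.eye(2)])

gmm = GaussianMixture(n_components=2, covariance_type='full')
gmm.weights_ = weights
gmm.means_ = means
gmm.covariances_ = covariances
# predict_proba/score use precisions_cholesky_, so derive it from the
# covariances the same way a fitted estimator does: the (transposed) inverse
# of each covariance's lower Cholesky factor.
gmm.precisions_cholesky_ = np.array([
    linalg.solve_triangular(linalg.cholesky(cov, lower=True),
                            np.eye(cov.shape[0]), lower=True).T
    for cov in covariances])

X_new = np.array([[0.1, -0.2], [2.9, 3.1]])
print(gmm.predict_proba(X_new))  # posteriors without ever calling fit()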
Description
The new sklearn.mixture.GaussianMixture has messed up several things. Two of them:
Steps/Code to Reproduce
Expected Results
1000,1000
Actual Results
665, 665
Versions