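Description
GaussianMixture.sample(n) occasionally returns incorrectly-sized samples, possibly due to rounding error.
Expected Results
I'd expect that if n_samples are requested, n_samples are provided, even when n_samples is not an even multiple of the number of mixture distributions.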
Actual Results
Instead, the cause appears to be line 388 of mixture/base.py, since
np.round(model.weights*n_samples).astype(int)
will not necessarily sum to n_samples, depending on the weights and the number of samples requested.
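To make the rounding problem concrete, here is a minimal sketch using NumPy only. The weights are hypothetical (three equally weighted components), and `rounded_counts` is an illustrative helper, not a function from scikit-learn; it just wraps the expression quoted above.

```python
import numpy as np


def rounded_counts(weights, n_samples):
    """Per-component counts computed with the rounding expression quoted above."""
    return np.round(np.asarray(weights) * n_samples).astype(int)


# Hypothetical weights for a three-component mixture (not taken from the issue).
weights = np.array([1 / 3, 1 / 3, 1 / 3])

for n_samples in (3, 4, 5):
    counts = rounded_counts(weights, n_samples)
    # The total only matches n_samples when the rounding errors happen to cancel.
    print(n_samples, counts, counts.sum())
```

With these weights, asking for 4 samples yields 3 (each 1.33 rounds down to 1) and asking for 5 yields 6 (each 1.67 rounds up to 2); only 3 comes out exact.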
Suggested Patch
I could be mistaken on this, but I think that the correct way to sample n_samples from a mixture distribution with frequency vector weights_ is to draw a count vector, np.random.multinomial(n_samples, weights_), and then draw from each component distribution the number of samples given by that multinomial draw.
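The multinomial approach can be sketched outside of scikit-learn as follows. This is an illustration under stated assumptions, not the library's implementation: `sample_mixture` is a hypothetical helper, the components are one-dimensional Gaussians, and the weights, means, and standard deviations are made up.

```python
import numpy as np


def sample_mixture(weights, means, stds, n_samples, seed=None):
    """Draw exactly n_samples points from a 1-D Gaussian mixture (illustrative only)."""
    rng = np.random.RandomState(seed)
    # The multinomial draw always returns counts that sum to n_samples.
    counts = rng.multinomial(n_samples, weights)
    # Draw the allotted number of points from each component; a count of 0 is fine.
    X = np.concatenate([
        rng.normal(loc=mean, scale=std, size=count)
        for mean, std, count in zip(means, stds, counts)
    ])
    # Component label for each drawn point.
    y = np.repeat(np.arange(len(weights)), counts)
    return X, y


# Hypothetical mixture parameters (not taken from the issue).
weights = [0.5, 0.3, 0.2]
means = [-2.0, 0.0, 3.0]
stds = [0.5, 1.0, 0.8]

X, y = sample_mixture(weights, means, stds, n_samples=7, seed=0)
print(X.shape, y.shape)
```

Because the per-component counts come from a single multinomial draw, the total is n_samples by construction, and the component proportions match the weights in expectation rather than through rounding.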
This would involve a one-line change to mixture/base.py replacing line 388 referenced above with:
ljwolf changed the title from "GaussianMixture.sample(n) occasionally returns incorrectly-sized samples possibly due to rounding error." to "GaussianMixture.sample(n) occasionally returns incorrectly-sized samples" on Oct 18, 2016
Thanks a lot for the very detailed issue! Not an expert on Gaussian mixtures, but it seems like a problem indeed. The GaussianMixture.sample docstring does say that the returned array should have n_samples rows:
    Generate random samples from the fitted Gaussian distribution.

    Parameters
    ----------
    n_samples : int, optional
        Number of samples to generate. Defaults to 1.

    Returns
    -------
    X : array, shape (n_samples, n_features)
        Randomly generated sample