dpgmm sample not working #1637

Closed
esheldon opened this issue Jan 29, 2013 · 15 comments

@esheldon

Hi Guys -

Trying out DPGMM to see if I can get a more stable solution. Things look good so far except for one problem.

The sample() method does not work for DPGMM. The primary reason is that sample() assumes there is a self.covars_ attribute, which does not exist on DPGMM.

There is a secondary problem, however. I tried using _get_covars to obtain covariances to set, but the values it returns do not look correct, perhaps because of a different convention in how these covariances are defined.
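
Roughly, a minimal sketch of the failure (the exact arguments here are just illustrative):

import numpy as np
from sklearn import mixture

X = np.random.randn(1000, 1)
dpgmm = mixture.DPGMM(n_components=5)
dpgmm.fit(X)
# The inherited GMM.sample() looks for self.covars_, which DPGMM
# never sets, so this fails with an AttributeError.
dpgmm.sample(10)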

best, and thanks for the good work on sklearn,
-e

@amueller
Member

Thanks for reporting. I'll try to have a look soon (if no one else does before me).

@amueller
Member

By the way, thanks for trying to help us with DPGMM. Maybe one good step forward would be to improve test coverage to prevent such mishaps :-/

@amueller
Member

OK, so it seems that the sample method is inherited from GMM. I'm not sure that should be the case: the parameterization in which VBGMM and DPGMM are stored is somewhat different from GMM's, IIRC.
It would be great if GMM and VBGMM/DPGMM had a common interface. That is not really the case now :-/

@GaelVaroquaux
Member

> OK, so it seems that the sample method is inherited from GMM. I'm not sure that should be the case: the parameterization in which VBGMM and DPGMM are stored is somewhat different from GMM's, IIRC.

So the sample method should probably be rewritten in the subclass.
@esheldon do you want to have a go?

@esheldon
Author

I am interested, but I don't have much time over the next couple of weeks.

Looking at this, I think the sampling code is the easy part. We just need to have the variances present in the object, correct?

More of an issue is that the variances produced by _get_covars are not correct. I played with it some more, and I'm pretty sure that they are actually wrong and that it isn't just a misinterpretation. I don't understand the algorithm, so I'm not sure I could help.
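
For concreteness, the sampling part might look something like this rough sketch; sample_mixture is a hypothetical helper, assuming the fitted weights, means, and full per-component covariances are all available:

import numpy as np

def sample_mixture(weights, means, covars, n_samples, seed=None):
    # Hypothetical helper: draw from a Gaussian mixture given its
    # weights, component means, and full covariance matrices.
    rng = np.random.RandomState(seed)
    # Pick a component for each sample according to the mixture
    # weights, then draw from that component's Gaussian.
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    return np.array([rng.multivariate_normal(means[k], covars[k])
                     for k in comps])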

-e


@amueller
Member

This is probably the wrong function to get the covariances. The VBGMM stores precision values, not covariances, IIRC.
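
For reference, the precision matrix is just the inverse of the covariance matrix; a quick illustrative check with numpy:

import numpy as np
from scipy.linalg import pinvh

# A symmetric positive-definite covariance and its precision
# (inverse); inverting the precision recovers the covariance.
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
prec = pinvh(cov)
assert np.allclose(pinvh(prec), cov)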

@esheldon
Author

It just inverts the precisions:

def _get_covars(self):
    # pinvh is the pseudo-inverse for symmetric (Hermitian) matrices,
    # so each covariance here is the inverse of a stored precision.
    return [pinvh(c) for c in self._get_precisions()]

-e


@amueller
Member

OK, I'll have to take a closer look. What gave you the impression that they are wrong?

@esheldon
Author

Looks like they are off by a factor of two, so maybe it is just definitional.

import numpy
from sklearn import mixture

x = numpy.random.randn(1000)
gmm = mixture.DPGMM(n_components=1)
gmm.fit(x.reshape(1000, 1))

print(gmm.precs_)         # [array([[ 0.49380025]])]
print(gmm._get_covars())  # [array([[ 2.02511035]])]

@amueller
Member

Hm, that is somewhat confusing behavior, though...

@amueller
Member

_get_covars just inverts precs_, so I think this is an "off-by-two" error in VBGMM/DPGMM itself. It apparently affects all covariance types. cc @alextp

@joelkuiper

Is there any word on this, or a hint on how to fix it? 😄 I'd love to be able to sample from the DPGMM. @amueller @alextp

@HapeMask

> Looks like they are off by a factor of two, so maybe it is just definitional.
>
> gmm.precs_         # [array([[ 0.49380025]])]
> gmm._get_covars()  # [array([[ 2.02511035]])]

Maybe I'm sorely mistaken / missing something obvious, but this appears to be correct behavior to me. Precision is inverse covariance, and the covariance returned by _get_covars() is equal to the inverse precision in your example:

>>> 1 / 2.02511035
0.49380025142827405

I don't see any off-by-two error here.

@amueller
Member

I agree with @HapeMask; I think this looks OK. I fixed it in #4182.

@ogrisel
Member

ogrisel commented Sep 10, 2016

Closing: the new Dirichlet process GMM rewrite has been merged into master. Its sample method is properly tested.
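
For anyone finding this later, a minimal sketch of sampling with the rewritten estimator (assuming scikit-learn >= 0.18, where BayesianGaussianMixture with a Dirichlet process prior replaces DPGMM):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.randn(1000, 1)
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_process",
)
bgmm.fit(X)

# sample() returns the drawn points and their component labels.
samples, labels = bgmm.sample(n_samples=10)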

@ogrisel closed this as completed Sep 10, 2016