BUG: numpy.random.Generator.dirichlet should accept zeros. #22547
Comments
This was implemented in #9577. That PR description claims that it hangs when the values are 0, but I don't see how that can be the case as I think we could make
I think it might actually be important for alpha[i] to be greater than 0. The PDF of the distribution is inversely proportional to B(alpha), which is the product of all gamma(alpha[i]) divided by the gamma of the sum of alpha. If alpha[i] is 0, then gamma(alpha[i]) is inf, which breaks everything.
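For reference, the density this comment describes can be written out as below. This is the standard Dirichlet form; the symbols K (the number of components) and x_i (the sampled proportions) are notation added here and are not used elsewhere in the thread.

```latex
f(x_1, \dots, x_K; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1},
\qquad
B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}

% With the other entries held fixed and positive:
\alpha_i \to 0^{+} \;\Longrightarrow\; \Gamma(\alpha_i) \to \infty \;\Longrightarrow\; B(\alpha) \to \infty
```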
Sometimes those kinds of divergences in the PDF don't really affect our ability to draw random numbers. I think this is just such a case. When alpha[i] = 0, then we're really just drawing from a Dirichlet distribution of one dimension lower, with i removed. Then we shove a 0 back in its place when we're done.
This is analogous to the case of a multivariate normal with a singular covariance matrix. The PDF is notionally infinite on the ridge. You can transform to the lower-dimensional nonsingular space, draw the multivariate normal there, then transform back to the full space.
Both of these are coherent procedures that have practical uses.
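A minimal sketch of that lower-dimensional approach: drop the entries where alpha[i] is 0, draw from the Dirichlet over the remaining entries, then put zeros back in the dropped positions. The helper name `dirichlet_allow_zeros` is hypothetical and is not part of NumPy; it only wraps the existing `Generator.dirichlet` call and is not necessarily how a fix inside NumPy would be written.

```python
import numpy as np


def dirichlet_allow_zeros(rng, alpha, size=None):
    """Hypothetical wrapper (not a NumPy API): Dirichlet draws with zeros allowed in alpha."""
    alpha = np.asarray(alpha, dtype=float)
    if alpha.ndim != 1:
        raise ValueError("alpha must be one-dimensional")
    if np.any(alpha < 0):
        raise ValueError("alpha must not contain negative values")

    positive = alpha > 0
    if not positive.any():
        # With every entry equal to 0 there is no lower-dimensional
        # distribution left to draw from.
        raise ValueError("at least one alpha must be greater than 0")

    # Draw from the Dirichlet of lower dimension, with the zero entries removed.
    reduced = rng.dirichlet(alpha[positive], size=size)

    # Put a 0 back in each removed position.
    out = np.zeros(reduced.shape[:-1] + (alpha.size,))
    out[..., positive] = reduced
    return out


rng = np.random.default_rng(0)
samples = dirichlet_allow_zeros(rng, [2.0, 0.0, 3.0], size=5)
```

Each row of `samples` still sums to 1, and the column that corresponds to the zero entry of alpha is exactly 0.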
I will put this in terms of a simple example. If I flip a coin twice and get heads twice, the probability of tails is still greater than zero. Sorry, I don’t have any tools at the ready to say precisely what the distribution should be, but I would ballpark it at 0.25.
Dirichlet behaves reasonably if I tell it there was one head and one tail.
If I tell Dirichlet there were two heads and 0.00001 of a tail, then the sampler is convinced that the probability of a tail is very, very small.
If I tell Dirichlet there were zero tails, it just crashes.
Code examples and results below.
In its current state, Dirichlet crashes when it shouldn’t. The quick-fix of using small values instead of zeroes causes misleading results.
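A minimal sketch of the comparison described above, written for illustration and not taken from the original comment. The function name `mean_tail_probability`, the seed, and the number of draws are all assumptions. It treats index 0 as heads and index 1 as tails.

```python
import numpy as np

rng = np.random.default_rng(12345)


def mean_tail_probability(alpha, n_draws=100_000):
    """Average sampled probability of tails (index 1) for the given counts."""
    draws = rng.dirichlet(alpha, size=n_draws)
    return draws[:, 1].mean()


# One head and one tail: the sampled tail probability averages about 0.5.
balanced = mean_tail_probability([1.0, 1.0])

# Two heads and 0.00001 of a tail: the sampled tail probability is tiny,
# because the mean of a Dirichlet component is alpha[i] / sum(alpha).
lopsided = mean_tail_probability([2.0, 1e-5])

# Passing [2.0, 0.0] is the case this issue reports as rejected on
# NumPy 1.21.5, so it is left out of the sketch.
```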
Thank you for your attention to the matter. I regret I don’t have expertise to suggest a solution.
Wayne Hajas
From: Robert Kern ***@***.***>
Sent: Thursday, November 17, 2022 7:36 AM
To: numpy/numpy ***@***.***>
Cc: Hajas, Wayne ***@***.***>; Author ***@***.***>
Subject: Re: [numpy/numpy] BUG: numpy.random.Generator.dirichlet should accept zeros. (Issue #22547)
Sometimes those kinds of divergences in the PDF don't really affect our ability to draw random numbers. I think this is just such a case. When alpha[i] = 0, then we're really just drawing from a Dirichlet distribution of one dimension lower, with i removed. Then we shove a 0 back in its place when we're done.
This is analogous to the case of a multivariate normal with a singular covariance matrix. The PDF is notionally infinite on the ridge. You can transform to the lower-dimensional nonsingular space, draw the multivariate normal there, then transform back to the full space.
Both of these are coherent procedures that have practical uses.
Hello! I would like to work on this issue. From my understanding, accepting
The
Per NEP 19, the
Describe the issue:
numpy.random.mtrand.RandomState.dirichlet no longer accepts alpha(count)-values that are zero.
With older (e.g. 1.11.3) versions of numpy, dirichlet accepted zero as an alpha(count) value.
With newer (e.g. 1.21.5) versions of numpy, alpha(count) values must be greater than zero. Very small real-values are accepted.
I have some applications where alpha(count)-values are raw data and zero is a perfectly valid value. These applications worked with old versions of numpy but not with newer versions.
Reproduce the code example:
Error message:
NumPy/Python version information:
1.21.5
Context for the issue:
I have some applications where alpha(count)-values are raw data and zero is a perfectly valid value. These applications worked with old versions of numpy but not with newer versions.