BUG: numpy.random.Generator.dirichlet should accept zeros. #22547

WayneHajas · 2022-11-07T19:53:03Z

Describe the issue:

numpy.random.mtrand.RandomState.dirichlet no longer accepts alpha(count)-values that are zero.

With older (e.g. 1.11.3) versions of numpy, dirichlet accepted zero as an alpha(count) value.

>>> numpy.__version__
'1.11.3'
>>> dirichlet([5,9,0,8])
array([ 0.17970351, 0.35902845, 0. , 0.46126803])
>>> dirichlet([5,9,0,8])
array([ 0.15228294, 0.45822224, 0. , 0.38949482])

With newer (e.g. 1.21.5) versions of numpy, alpha(count) values must be greater than zero. Very small real-values are accepted.

>>> numpy.__version__
'1.21.5'
>>> dirichlet([5,9,0.000001,8])
array([0.38285451, 0.26206592, 0. , 0.35507958])
>>> dirichlet([5,9,0,8])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 4390, in numpy.random.mtrand.RandomState.dirichlet
ValueError: alpha <= 0

I have some applications where alpha(count)-values are raw data and zero is a very valid value. These applications worked with old versions of numpy but not with newer versions.

Reproduce the code example:

from numpy.random import dirichlet
dirichlet([5, 9, 0, 8])

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 4390, in numpy.random.mtrand.RandomState.dirichlet
ValueError: alpha <= 0

NumPy/Python version information:

1.21.5

Context for the issue:

I have some applications where alpha(count)-values are raw data and zero is a very valid value. These applications worked with old versions of numpy but not with newer versions.

rkern · 2022-11-07T21:58:14Z

This was implemented in #9577. That PR description claims that it hangs when the values are 0, but I don't see how that can be the case, as standard_gamma() has had special cases for shape == 0. It did appear to hang when the values were very, very low but nonzero because of the unprotected loop in that case. I suspect the main reason alpha[i] == 0 was excluded is that the Wikipedia entry claims that alpha[i] > 0 is required, but that is often dodgy.

I think we could make Generator.dirichlet() accept alpha[i] == 0 (though not RandomState, per NEP 19).

MatteoRaso · 2022-11-17T09:15:07Z

I think it might actually be important for alpha[i] to be greater than 0. The PDF of the distribution is inversely proportional to B(alpha), which is the product of all gamma(alpha[i]) divided by the gamma of the sum of alpha. If alpha[i] is 0, then gamma(alpha[i]) is inf, which breaks everything.
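To make the divergence described above concrete, here is a minimal sketch using only the Python standard library. The helper name `log_beta` is hypothetical (it is not a NumPy function); it computes the log of the normalizing constant B(alpha) and shows that it grows without bound as one entry of alpha shrinks toward zero. Small positive values are used instead of an exact zero because `math.lgamma` is not defined at 0.

```python
import math

def log_beta(alpha):
    """log B(alpha) = sum(log Gamma(alpha_i)) - log Gamma(sum(alpha))."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

# Shrink the third entry toward zero: log B(alpha) keeps growing,
# so the density's normalizing factor 1 / B(alpha) tends to zero.
for eps in (1e-3, 1e-6, 1e-9):
    print(eps, log_beta([5, 9, eps, 8]))
```

This only illustrates the behaviour of the PDF's normalizing constant; it says nothing by itself about whether samples can still be drawn.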

rkern · 2022-11-17T15:35:52Z

Sometimes those kinds of divergences in the PDF don't really affect our ability to draw random numbers. I think this is just such a case. When alpha[i] = 0, then we're really just drawing from a Dirichlet distribution of one dimension lower, with i removed. Then we shove a 0 back in its place when we're done.

This is analogous to the case of a multivariate normal with a singular covariance matrix. The PDF is notionally infinite on the ridge. You can transform to the lower-dimensional nonsingular space, draw the multivariate normal there, then transform back to the full space.

Both of these are coherent procedures that have practical uses.
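The procedure described in this comment (drop the zero entries, draw in the lower-dimensional space, put zeros back) can be sketched as follows. This is an illustration under stated assumptions, not NumPy's implementation: it uses the standard library's `random.gammavariate` as a stand-in gamma sampler, and `dirichlet_with_zeros` is a hypothetical helper name. It relies on the usual construction of a Dirichlet draw as independent gamma draws divided by their sum.

```python
import random

def dirichlet_with_zeros(alpha, rng=None):
    """Draw one Dirichlet sample, treating alpha[i] == 0 as a component fixed at 0."""
    rng = rng or random.Random()
    if any(a < 0 for a in alpha):
        raise ValueError("alpha < 0")
    if not any(a > 0 for a in alpha):
        raise ValueError("at least one alpha must be positive")
    # Gamma draws only for the strictly positive entries; zero entries
    # contribute exactly 0.0, which is the same as removing them and
    # reinserting a 0 afterwards.
    draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

sample = dirichlet_with_zeros([5, 9, 0, 8], random.Random(0))
print(sample)  # third component is exactly 0.0; the rest sum to 1
```

The sketch does not guard against every positive-alpha draw underflowing to 0.0, which can happen for extremely small alpha values and is a separate problem from exact zeros.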

WayneHajas · 2022-11-21T17:14:24Z

I will put this in terms of a simple example. If I flip a coin twice and get heads twice, the probability of tails is still greater than zero. Sorry, I don't have any tools at the ready to say precisely what the distribution should be, but I would ballpark it at 0.25.

Dirichlet behaves reasonably if I tell it there was one head and one tail. If I tell Dirichlet there were two heads and 0.00001 of a tail, then the sampler is convinced that the probability of a tail is very, very small. If I tell Dirichlet there were zero tails, it just crashes. Code examples and results below.

In its current state, Dirichlet crashes when it shouldn't. The quick fix of using small values instead of zeroes causes misleading results.

Thank you for your attention to the matter. I regret I don't have the expertise to suggest a solution.

Wayne Hajas
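The coin example can be worked through with the mean of a Dirichlet distribution, which is each alpha[i] divided by the sum of alpha. The following is an illustrative sketch, not the commenter's original code. The helper `dirichlet_mean` is hypothetical, and the last case adds one pseudo-count per outcome, which corresponds to a uniform prior; that prior is an assumption not made anywhere in this thread. Under that assumption the expected probability of tails after two heads comes out at the 0.25 ballpark mentioned above.

```python
def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): each alpha_i divided by the sum of alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

# One head, one tail: symmetric.
print(dirichlet_mean([1, 1]))  # [0.5, 0.5]

# Two heads plus a tiny stand-in value for tails: tails is almost ruled out.
print(dirichlet_mean([2, 1e-5]))

# Two heads, zero tails, plus one pseudo-count per side
# (uniform-prior assumption, not something stated in the thread).
counts = [2, 0]
print(dirichlet_mean([c + 1 for c in counts]))  # [0.75, 0.25]
```

This is why replacing a raw zero count with a very small number is not a neutral workaround: it encodes a strong belief that the outcome essentially never happens.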

pcralmeida · 2023-03-19T12:20:11Z

Hello! I would like to work on this issue. From my understanding, accepting alpha[i] == 0 would suffice, but I am open to suggestions.

WarrenWeckesser · 2023-08-14T01:13:03Z

The dirichlet method of the Generator class now allows elements of alpha to be zero (see #23440 and the follow-up #24220):

In [4]: np.__version__
Out[4]: '2.0.0.dev0+git20230813.104addf'

In [5]: rng = np.random.default_rng()

In [6]: rng.dirichlet([5, 9, 0, 8])
Out[6]: array([0.22294196, 0.51402094, 0.        , 0.26303709])

In [7]: rng.dirichlet([5, 9, 0, 8], size=8)
Out[7]: 
array([[0.31371971, 0.31066775, 0.        , 0.37561255],
       [0.25549883, 0.52689855, 0.        , 0.21760262],
       [0.15567579, 0.40443253, 0.        , 0.43989168],
       [0.18513736, 0.55825023, 0.        , 0.25661241],
       [0.25517287, 0.40680073, 0.        , 0.3380264 ],
       [0.29160739, 0.43306643, 0.        , 0.27532618],
       [0.23052236, 0.3841242 , 0.        , 0.38535344],
       [0.18530714, 0.49334535, 0.        , 0.32134751]])

Per NEP 19, the RandomState.dirichlet (aka np.random.dirichlet) won't be updated, so I'm closing this issue.

WayneHajas added the 00 - Bug label Nov 7, 2022

WayneHajas changed the title ~~BUG: <Please write a comprehensive title after the 'BUG: ' prefix>~~ BUG: numpy.random.Generator.dirichlet should accept zeros. Nov 7, 2022

WarrenWeckesser added the component: numpy.random label Nov 7, 2022

MatteoRaso mentioned this issue Mar 24, 2023

BUG: accept zeros on numpy.random dirichlet function #23440

Merged

WarrenWeckesser mentioned this issue Jul 18, 2023

BUG: random: dirichlet(alpha) can return nans in some cases. #24210

Closed

WarrenWeckesser closed this as completed Aug 14, 2023
