BUG: dtype changed from float64 to int64 in scipy discrete_rv #27054

Closed
oscarbenjamin opened this issue Jul 26, 2024 · 10 comments

@oscarbenjamin

Describe the issue:

The issue has appeared with the NumPy nightly wheels in the last few days. I don't know the exact cause because it happens inside SciPy, but an array now has a different dtype by the time it reaches the _pmf method shown below, which causes it to fail with:

  File "/home/oscar/current/active/sympy/t.py", line 6, in _pmf
    return (2/3)*3**(1 - i)
                 ~^^~~~~~~~
ValueError: Integers to negative integer powers are not allowed.

Reproduce the code example:

from scipy.stats import rv_discrete

class rv_exponential(rv_discrete):
    def _pmf(self, i):
        print(i.dtype, i.shape)
        return (2/3)*3**(1 - i)

rv = rv_exponential(a=0.0, b=float('inf'))

print(rv.rvs())

Error message:

$ python t.py 
int64 (28,)
Traceback (most recent call last):
  File "/home/oscar/current/active/sympy/t.py", line 10, in <module>
    print(rv.rvs())
          ^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 3430, in rvs
    return super().rvs(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 1108, in rvs
    vals = self._rvs(*args, size=size, random_state=random_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 1034, in _rvs
    Y = self._ppf(U, *args)
        ^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 1049, in _ppf
    return self._ppfvec(q, *args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2470, in __call__
    return self._call_as_normal(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2463, in _call_as_normal
    return self._vectorize_call(func=func, args=vargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2553, in _vectorize_call
    outputs = ufunc(*inputs)
              ^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 3057, in _drv2_ppfsingle
    qb = self._cdf(b, *args)
         ^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 3396, in _cdf
    return self._cdfvec(k, *args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2470, in __call__
    return self._call_as_normal(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2463, in _call_as_normal
    return self._vectorize_call(func=func, args=vargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 2553, in _vectorize_call
    outputs = ufunc(*inputs)
              ^^^^^^^^^^^^^^
  File "/home/oscar/.pyenv/versions/sympy-3.12.git/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py", line 3392, in _cdf_single
    return np.sum(self._pmf(m, *args), axis=0)
                  ^^^^^^^^^^^^^^^^^^^
  File "/home/oscar/current/active/sympy/t.py", line 6, in _pmf
    return (2/3)*3**(1 - i)
                 ~^^~~~~~~~
ValueError: Integers to negative integer powers are not allowed.
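
The failure in the last frame can be reproduced without SciPy. The following is a minimal sketch, assuming only NumPy is installed; the module-level `pmf` function is a stand-in for the `_pmf` method above, and the input arrays are made up for illustration.

```python
import numpy as np

def pmf(i):
    # Same expression as the _pmf method in the report.
    return (2/3) * 3 ** (1 - i)

ints = np.arange(3, dtype=np.int64)   # [0, 1, 2], so 1 - i reaches -1
floats = ints.astype(np.float64)

# A float64 input evaluates without complaint.
float_result = pmf(floats)

# An int64 input makes 3 ** (1 - i) an integer power with a negative exponent.
try:
    pmf(ints)
    int_error = None
except ValueError as exc:
    int_error = str(exc)

print(float_result.dtype)
print(int_error)
```

The expression itself is unchanged between the two calls; only the dtype of the array that reaches it differs.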

Python and NumPy Versions:

This has been seen with the NumPy nightly wheels for the last few days.

With released NumPy 2.0.1 the code runs to completion with

$ python t.py 
float64 (11,)
float64 (1,)
float64 (6,)
float64 (3,)
float64 (2,)
0

Runtime Environment:

No response

Context for the issue:

This comes from a SciPy issue: scipy/scipy#21272 and a SymPy issue sympy/sympy#26862

@oscarbenjamin
Author

Is it only possible to build numpy main with cython master right now?

I was getting this error until I installed the Cython nightly build:

          action(self, namespace, argument_values, option_string)
        File "/home/oscar/.pyenv/versions/3.12.0/envs/sympy-3.12.git/lib/python3.12/site-packages/Cython/Compiler/CmdLine.py", line 22, in __call__
          directives = Options.parse_directive_list(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/oscar/.pyenv/versions/3.12.0/envs/sympy-3.12.git/lib/python3.12/site-packages/Cython/Compiler/Options.py", line 533, in parse_directive_list
          raise ValueError('Unknown option: "%s"' % name)
      ValueError: Unknown option: "freethreading_compatible"
      [139/179] Compiling C object numpy/_core/lib_simd.dispatch.h_baseline.a.p/meson-generated__simd.dispatch.c.o
      [140/179] Compiling C object numpy/_core/lib_simd.dispatch.h_SSE42.a.p/meson-generated__simd.dispatch.c.o
      [141/179] Compiling C object numpy/_core/lib_simd.dispatch.h_AVX512_SKX.a.p/meson-generated__simd.dispatch.c.o
      ninja: build stopped: subcommand failed.
      [end of output]

@oscarbenjamin
Author

Bisected to 6c91567 from gh-26766.

CC @mtsokol

@mtsokol
Member

mtsokol commented Jul 28, 2024

I need to take a look at what is going on in SciPy, but I think SciPy relied on the output of floor, ceil, and trunc being cast to float64, which IMO is incorrect, so I think it requires a fix on the SciPy side to perform explicit casting to float64.
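
A rough sketch of what "explicit casting" could mean in practice, assuming only NumPy. The helper name `floor_as_float` is hypothetical and is not taken from SciPy; where such a cast would actually belong in SciPy is not stated in this thread.

```python
import numpy as np

def floor_as_float(k):
    # Hypothetical helper: do not rely on the dtype that np.floor returns
    # for integer input; request float64 explicitly instead.
    return np.floor(k).astype(np.float64)

k = np.array([0, 1, 2], dtype=np.int64)

# The dtype printed here depends on the NumPy version in use.
print(np.floor(k).dtype)

# The explicit cast gives float64 regardless of that version.
print(floor_as_float(k).dtype)
```

Code written this way behaves the same before and after the change in gh-26766, because it never depends on the implicit result dtype.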

@Siddharth-Latthe-07

@oscarbenjamin The issue you're experiencing appears to be due to a change in behavior in the recent NumPy nightly builds. The error message indicates that the i array, which is passed to your _pmf method, has an integer data type (int64), causing an error when a negative integer power operation is attempted.
A possible workaround is to ensure that the i array is always treated as a float within the _pmf method.
Here is the updated _pmf method with this fix:

from scipy.stats import rv_discrete
import numpy as np

class rv_exponential(rv_discrete):
    def _pmf(self, i):
        i = np.asarray(i, dtype=float)  # Ensure `i` is a float array
        print(i.dtype, i.shape)
        return (2/3)*3**(1 - i)

rv = rv_exponential(a=0.0, b=float('inf'))

print(rv.rvs())

Hope this helps
Thanks

@oscarbenjamin
Author

The workaround is not as trivial as it might seem. In context, the code here is generated by a code printer from a symbolic expression in SymPy:

In [27]: from sympy import *

In [28]: i = Symbol('i')

In [29]: e = (S(2)/3)**3 * 3**(1 - i)

In [30]: e
Out[30]: 
   1 - i
8⋅3     
────────
   27   

In [31]: f = lambdify(i, e)

In [32]: f
Out[32]: <function _lambdifygenerated(i)>

In [34]: print(f.__doc__)
Created with lambdify. Signature:

func(i)

Expression:

8*3**(1 - i)/27

Source code:

def _lambdifygenerated(i):
    return (8/27)*3**(1 - i)


Imported modules:

The code printers would have to be modified to handle this somehow and there would need to be some UI for a caller of lambdify to say if they want this to happen or not.

In any case the first question is really whether it should be expected that the array would have an integer type at all. For the _pmf method it is expected that the argument values will be integers but apart from degenerate cases with 0 or 1 the function is basically guaranteed to return non-integer values so I don't think it makes sense to use an integer array here (although others may disagree). So far this change to integer type does not seem to be intentional in SciPy's case.
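
For illustration only, here is one shape a caller-side opt-in could take. This is a sketch under stated assumptions, not SymPy's API: `with_float_args` is a hypothetical wrapper, and `_lambdifygenerated` is copied by hand from the generated source shown above so that the example needs only NumPy.

```python
import functools
import numpy as np

def _lambdifygenerated(i):
    # Stand-in for the function that lambdify generated above.
    return (8/27)*3**(1 - i)

def with_float_args(func):
    """Hypothetical opt-in: cast every positional argument to float64."""
    @functools.wraps(func)
    def wrapper(*args):
        return func(*(np.asarray(a, dtype=np.float64) for a in args))
    return wrapper

f_float = with_float_args(_lambdifygenerated)

# Integer input is converted before it reaches the generated expression.
values = f_float(np.arange(4))
print(values.dtype, values)
```

A blanket cast like this would be wrong for callers who pass complex or deliberately integer-typed arguments, which is why it would have to be something the caller asks for rather than default behaviour.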

@Siddharth-Latthe-07

Given that the _pmf method is expected to handle integer inputs, but the function generated by lambdify from SymPy uses floating-point arithmetic, there is a need to bridge the gap between the expectations of integer inputs and the resulting floating-point computations.
To address the issue without having to manually convert the array in every instance of _pmf, we can take a two-pronged approach:

  1. Modify the _pmf method to handle the type conversion transparently.
  2. Investigate and consider reporting the potential unintended change in SciPy to maintain consistency.

Transparent Type Conversion in _pmf:
Here's how you can modify the _pmf method to ensure that it handles the input conversion internally. This will make sure that the method operates correctly regardless of whether the input is an integer array or not.

from scipy.stats import rv_discrete
import numpy as np

class rv_exponential(rv_discrete):
    def _pmf(self, i):
        i = np.asarray(i, dtype=float)  # Ensure `i` is a float array
        print(i.dtype, i.shape)
        return (2/3) * 3 ** (1 - i)

rv = rv_exponential(a=0.0, b=float('inf'))

print(rv.rvs())

While modifying the code printer in SymPy or providing a user interface in lambdify for type handling would be a more extensive change, the immediate issue can be mitigated by ensuring that the _pmf method correctly handles type conversion. This approach is less intrusive and maintains compatibility with existing code.

@oscarbenjamin
Author

@Siddharth-Latthe-07 I didn't want to say anything at first but your comments are clearly generated by an LLM with minimal editing and are not relevant to the discussion in this issue.

It might be reasonable for ChatGPT to offer this kind of advice for a novice user who wants to fix some simple code but that is not the situation here. I don't need ChatGPT to tell me how to modify the repro code to avoid this issue: I wrote that code deliberately so that it would demonstrate the issue.

There has been a change in NumPy which now means that SymPy's usage of a SciPy function does not work any more. A change is now needed in at least one of NumPy, SciPy or SymPy and considering what should be changed or not is the purpose of this and the related SciPy and SymPy issues. The first thing that I want to establish is if NumPy and SciPy are going to keep this changed behaviour.

@mattip
Member

mattip commented Jul 29, 2024

@Siddharth-Latthe-07 any more of these useless comments and we will need to report you. See similar comments here and here. Many of us follow the git firehose for this repo, and do not appreciate getting spammed.

@lucascolley
Contributor

Feel free to close this issue, indeed SciPy just needs to adapt to the changed type promotion of floor 👍

@oscarbenjamin
Author

Thanks, I'll close this then.
