BUG: nanmin/nanmax with object dtype comparing nans #28839

Open
rhshadrach opened this issue Apr 27, 2025 · 5 comments

@rhshadrach

rhshadrach commented Apr 27, 2025

Describe the issue:

When using np.nanmin or np.nanmax with object dtype, it appears that NaN values are being compared to non-NaN values. Given the docstring

Return minimum of an array or minimum along an axis, ignoring any NaNs

I would expect no comparison to be made here and the operation to succeed. However I wonder if there could be pragmatic reasons to not want to support this operation.

This could be related to the following issues/PR, but it wasn't clear to me.

Reproduce the code example:

import datetime
import numpy as np

values = np.array([np.nan, datetime.date(2020, 1, 2)])
np.nanmin(values)
# TypeError: '<=' not supported between instances of 'float' and 'datetime.date'

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 5
      2 import datetime
      4 values = np.array([np.nan, datetime.date(2020, 1, 2)])
----> 5 np.nanmin(values)
      6 # TypeError: '<=' not supported between instances of 'float' and 'datetime.date'

File ~/dev/venvs/pandas/lib/python3.12/site-packages/numpy/lib/_nanfunctions_impl.py:364, in nanmin(a, axis, out, keepdims, initial, where)
    361 else:
    362     # Slow, but safe for subclasses of ndarray
    363     a, mask = _replace_nan(a, +np.inf)
--> 364     res = np.amin(a, axis=axis, out=out, **kwargs)
    365     if mask is None:
    366         return res

File ~/dev/venvs/pandas/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:3319, in amin(a, axis, out, keepdims, initial, where)
   3306 @array_function_dispatch(_min_dispatcher)
   3307 def amin(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue,
   3308          where=np._NoValue):
   3309     """
   3310     Return the minimum of an array or minimum along an axis.
   3311 
   (...)
   3317     ndarray.min : equivalent method
   3318     """
-> 3319     return _wrapreduction(a, np.minimum, 'min', axis, None, out,
   3320                           keepdims=keepdims, initial=initial, where=where)

File ~/dev/venvs/pandas/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:86, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     83         else:
     84             return reduction(axis=axis, out=out, **passkwargs)
---> 86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

TypeError: '<=' not supported between instances of 'float' and 'datetime.date'
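The traceback suggests that nanmin substitutes a float for the NaN entries (the `_replace_nan(a, +np.inf)` call) and then reduces with `np.amin`, so an ordering comparison between a float and a `datetime.date` still happens. Until that changes, one possible workaround is to drop NaN-like entries by hand before reducing. This is only a sketch: `nanmin_object` is a hypothetical helper name, not a NumPy function, and it assumes that an element counts as NaN exactly when it is not equal to itself.

```python
import datetime

import numpy as np


def nanmin_object(values):
    """Minimum of a 1-D object array, skipping elements not equal to themselves.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # At the Python level, `v != v` is True for float NaN and False for
    # ordinary objects such as datetime.date.
    kept = [v for v in values if not (v != v)]
    if not kept:
        # Assumption: mirror the all-NaN case by returning NaN.
        return float("nan")
    return min(kept)


values = np.array(
    [np.nan, datetime.date(2020, 1, 2), datetime.date(2019, 5, 1)],
    dtype=object,
)
result = nanmin_object(values)
print(result)
```

The helper never compares the NaN with a date, which is the behavior the docstring's "ignoring any NaNs" wording seems to promise.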

Python and NumPy Versions:

NumPy: 2.2.5
Python: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]

Runtime Environment:

[{'numpy_version': '2.2.5',
  'python': '3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0]',
  'uname': uname_result(system='Linux', node='brokenglass', release='6.11.0-24-generic', version='#24~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 25 20:14:34 UTC 2', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/richard/dev/venvs/pandas/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
  'internal_api': 'openblas',
  'num_threads': 32,
  'prefix': 'libscipy_openblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.28'}]

Context for the issue:

This came up in pandas-dev/pandas#61204 (comment)

@seberg
Member

seberg commented May 10, 2025

However I wonder if there could be pragmatic reasons to not want to support this operation.

The reason is that we would have to define "isnan" for arbitrary objects in some sane way. I am actually pretty OK with defining it as something like obj != obj, while I don't really like doing something like if isinstance(obj, float): return math.isnan(obj).

So if you like that idea, you may be able to help nudge NumPy towards doing that :).

(Of course obj != obj may fail easily, e.g. if an array is included, but that would also fail now, just earlier.)
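To make the self-comparison idea concrete, here is a minimal sketch of what such a check might look like in pure Python. The name `is_nan_like` is hypothetical, and the rule used (an object is NaN-like when it does not compare equal to itself) is one reading of the proposal, not a confirmed NumPy design.

```python
import datetime
from decimal import Decimal


def is_nan_like(obj):
    """Treat obj as NaN-like when it does not compare equal to itself.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # bool() may raise when `obj != obj` returns something without a single
    # truth value, for example a multi-element array.
    return bool(obj != obj)


samples = [
    float("nan"),
    complex("nan"),
    Decimal("NaN"),
    1.5,
    None,
    datetime.date(2020, 1, 2),
]
flags = [is_nan_like(s) for s in samples]
print(flags)
```

No type is special-cased here: float, complex and Decimal NaNs are all caught by the same rule, while dates and None are left alone.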

@seberg
Member

seberg commented May 10, 2025

Hmmm, there may also be an option of defining the fmin/fmax loop itself for objects, via something like:

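def fmin(lhs, rhs):
    # Ordinary case: both operands are comparable and ordered.
    if lhs <= rhs:
        return lhs
    elif rhs < lhs:
        return rhs
    # Neither comparison held, so at least one operand is NaN-like.
    elif rhs == rhs:
        # rhs is not NaN, so prefer it.
        return rhs
    else:
        return lhs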

EDIT: Sorry, I realized that this still requires rhs == rhs or an equivalent lhs != lhs.
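As a rough illustration of why the self-comparison is still needed: in the reproducer above, an ordering comparison between a float NaN and a `datetime.date` raises `TypeError` before any fallback branch is reached. A sketch that tests for NaN first might look like the following. `object_fmin` is a hypothetical name, and the NaN test via `!=` is an assumption carried over from the discussion.

```python
import datetime
from functools import reduce


def object_fmin(lhs, rhs):
    """Pairwise minimum that skips NaN-like operands (illustrative sketch)."""
    # Check NaN-ness first so that no ordering comparison is attempted
    # between a NaN and an unrelated type.
    if rhs != rhs:
        return lhs
    if lhs != lhs:
        return rhs
    return lhs if lhs <= rhs else rhs


items = [float("nan"), datetime.date(2020, 1, 2), datetime.date(2019, 5, 1)]
smallest = reduce(object_fmin, items)
print(smallest)
```

If both operands are NaN, the function returns the left one, so a reduction over all-NaN input yields NaN.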

@rhshadrach
Author

while I don't really like doing something like if isinstance(obj, float): return math.isnan(obj).

Curious what the reason is here - whether it's behavior or performance.

@seberg
Member

seberg commented May 11, 2025

I just don't really like special casing for a potentially long list of types. Next thing we notice that we have to also check for complex, then decimal. And after adding all three, someone will open a bug report saying that pd.NA should also be skipped (or None, but that isn't NaN-like).

EDIT: There may be an argument that this special casing is less bad in fmin/fmax compared to isnan.
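For comparison, a sketch of the type-by-type approach being argued against might look like this. The function name `isnan_by_type` and the particular set of types are assumptions chosen to mirror the examples in the comment above.

```python
import cmath
import math
from decimal import Decimal


def isnan_by_type(obj):
    """NaN check with one branch per supported type (illustrative sketch)."""
    if isinstance(obj, float):
        return math.isnan(obj)
    if isinstance(obj, complex):
        return cmath.isnan(obj)
    if isinstance(obj, Decimal):
        return obj.is_nan()
    # Anything not listed above is treated as "not NaN".
    return False


checks = [
    isnan_by_type(float("nan")),
    isnan_by_type(complex("nan")),
    isnan_by_type(Decimal("NaN")),
    isnan_by_type(1.0),
    isnan_by_type(None),
]
print(checks)
```

Each new type needs another branch. For instance, `numpy.float32` does not subclass Python's `float`, so it would fall through to the final `return False` here.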

@rhshadrach
Author

rhshadrach commented May 11, 2025

I would propose the following definition of isnan for object dtypes:

A value v in an object dtype array will result in True from np.isnan if and only if arr = np.array(v) is dimension 0, not object dtype, and np.isnan(arr) is True.
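A minimal sketch of that definition, applied to a single element, could look like this. The name `isnan_object_element` is hypothetical. The `TypeError` guard is an extra assumption not stated in the definition: it covers dtypes such as strings, for which `np.isnan` has no implementation.

```python
import datetime

import numpy as np


def isnan_object_element(v):
    """Apply the proposed rule to one element of an object array (sketch)."""
    arr = np.array(v)
    # Must be zero-dimensional and must not fall back to object dtype.
    if arr.ndim != 0 or arr.dtype.kind == "O":
        return False
    try:
        return bool(np.isnan(arr))
    except TypeError:
        # Assumption: dtypes without an isnan loop count as "not NaN".
        return False


candidates = [
    float("nan"),
    np.float32("nan"),
    1.0,
    "abc",
    None,
    datetime.date(2020, 1, 2),
    [float("nan")],
]
flags = [isnan_object_element(v) for v in candidates]
print(flags)
```

Under this rule, anything NumPy can only represent as an object scalar, such as `None` or a `datetime.date`, is never considered NaN.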

I would certainly be opposed to np.isnan returning True on pd.NA 😄

That said, I'm not opposed to your proposal of obj != obj. While it can perform poorly on nested data or even fail, I think it would be an improvement over the status quo.

Edit: One negative consequence of obj != obj that I expect could be relevant to pandas users: currently bool(pd.NA != pd.NA) will raise, so any array with pd.NA in it will raise.

pandas-dev/pandas#38224
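The failure mode can be illustrated without pandas by using a stand-in object whose comparisons do not produce a plain bool. `AmbiguousNA` below is a hypothetical class written for this sketch. It only imitates the behavior described above and is not how pd.NA is implemented.

```python
class AmbiguousNA:
    """Hypothetical stand-in for a missing-value marker (not pd.NA itself)."""

    def __eq__(self, other):
        # Comparisons propagate the marker instead of returning a bool.
        return self

    def __ne__(self, other):
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


def is_nan_like(obj):
    """Self-inequality check, as discussed above (illustrative sketch)."""
    return bool(obj != obj)


try:
    is_nan_like(AmbiguousNA())
    raised = False
except TypeError:
    raised = True
print(raised)
```

Any reduction that applies the self-inequality check to every element would therefore raise as soon as it meets such a marker.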

charris changed the title from "BUG: nanmin/nanax with object dtype comparing nans" to "BUG: nanmin/nanmax with object dtype comparing nans" on May 11, 2025