BUG: Masked array default fill value can overflow #25677

ngoldbaum · 2024-01-24T15:47:52Z

Describe the issue:

For both signed and unsigned integers the default fill value is 99999, while for floats it is 1e20.

This is problematic for (u)int[8,16] as well as half floats, which do not contain the default fill value in their valid range.

Reproduce the code example:

>>> arr = np.ma.array([1, 2, 3], mask=[1, 0, 1], dtype=np.int8)
>>> arr.filled()
array([63,  2, 63], dtype=int8)

>>> arr = np.ma.array([1, 2, 3], mask=[1, 0, 1], dtype=np.float16)
>>> arr.filled()
/Users/goldbaum/Documents/numpy/build-install/usr/lib/python3.11/site-packages/numpy/ma/core.py:3873: RuntimeWarning: overflow encountered in cast
  np.copyto(result, fill_value, where=m)
Out[16]: array([inf,  2., inf], dtype=float16)

Error message:

N/A

Python and NumPy Versions:

Numpy 2.0 dev on python 3.11.

Runtime Environment:

N/A

Context for the issue:

I don't think this is an urgent but didn't see an issue describing this behavior so I'm filing this for future searchers.

That said, this does complicate the NEP 50 implementation because we need to have a number of workarounds so that this continues to work. I think it would be better to choose a default fill value that fits in the range of the data ([i,f]info.max?) for these types, but I have no idea what that entails for backward compatibility.

The text was updated successfully, but these errors were encountered:

msavinash · 2024-01-25T18:01:41Z

If a specific fill value is absent in filled, it gets the default fill value for that dtype. The catch is that it gets the default fill value for that "kind" of dtype. below code from numpy.ma core code is given for reference:
default_filler = {'b': True, 'c': 1.e20 + 0.0j, 'f': 1.e20, 'i': 999999, 'O': '?', 'S': b'N/A', 'u': 999999, 'V': b'???', 'U': 'N/A' }

You can see that for any kind of float (float16, float32, float64) the same fill value is being used. For a fill value of "1.e20" you always get the runtime error while typecasting (say to float16). I tried the below code to reproduce that issue:
converted_value = np.array(1e+20, dtype=np.float64).astype(np.float16)

One workaround is to define fill values to each specific dtype. Another solution I tried that worked was to replace default value of float to 'inf', which did not raise any issue for any specific float dtypes.
fill_value = np.inf

ngoldbaum · 2024-01-25T18:19:36Z

The main issue here isn't so much the implementation of changing the default fill value, or even coming up with better default fill values, it's whether or not changing this will break user code. The default fill value is used in a number of places implicitly in operations so it's not clear to me if changing the default will have unintended consequences elsewhere.

msavinash · 2024-01-25T19:54:19Z

Since the fill value is not determined by the user, we should be able to change the default values without major code breakage. But there are chances of user code that depends on the default values (can't think of an example right off the bat). Maybe change and document it?

msavinash · 2024-01-25T21:51:46Z

Also this warning is not seen in earlier versions of Numpy (I tried with v1.23.5). So it can be changed, although I need to verify how it's being handled in earlier versions.

BielStela · 2024-07-18T09:44:43Z

Hello, I just want to add to the discussion that this issue is causing real problems in some cases. Since 2.0 does not accept out of bounds python ints this causes problems when serializing and deserializing masked arrays with default fill values. For example when multiprocessing with dask, it serializes the masked array by parts and deserializes the fill value to python integer causing an overflow exception when rebuilding the masked array. I can't tell what would be the sane default for fill_value but having something that is well behaved under any circumstance would be ideal.

Hope this helps and thank you!

ngoldbaum · 2024-07-18T12:20:03Z

Hi, I don't think this interaction with dask has been reported. Can you make a short runnable example that demonstrates the issue?

BielStela · 2024-07-18T12:51:27Z

Sure,

In [1]: import numpy as np
   ...: import dask
   ...: import dask.distributed

In [2]: client = dask.distributed.Client()

In [3]: def f():
   ...:     return np.ma.masked_array([1,1], [0,1], dtype=np.uint8)
   ...:

In [4]: task = dask.delayed(f)()

In [5]: client.compute(task).result()
2024-07-18 14:48:24,570 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File ".venv/lib/python3.12/site-packages/numpy/ma/core.py", line 489, in _check_fill_value
    fill_value = np.asarray(fill_value, dtype=ndtype)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 999999 out of bounds for uint8

using:
Python 3.12.4
numpy 2.0.0
dask 2024.7.0

ngoldbaum · 2024-07-18T18:48:47Z

ping @seberg in case you weren't aware of this interaction between dask, masked arrays, and the NEP 50 changes

Maybe we should use the minimum or maximum int as the default fill values for ints that are too small to store 999999? Or whatever value it overflows to that we were accidentally using before?

seberg · 2024-07-19T05:26:45Z

I was faintly aware, but not that it was an issue here. A solution may be to make sure that the default fill value is typed to e.g. int64. That way NumPy will keep force-casting it. Not 100% sure if that has other fallout, but Ithink the fill-value is always an array so it should be fine.

BielStela · 2024-07-21T15:24:13Z

The solution we apply goes in that line: we force-cast the fill_value with numpy with something like this to force a controlled overflow

def f():
    arr = np.ma.masked_array([1,1], [0,1], dtype=np.uint8)
    arr.fill_value = arr.fill_value
    return arr

fill_value will only overflow if is the default value so It does the job avoiding serialization errors because it overflows in the setter, becoming 63~ and then it is safe to move it around. Not the prettiest thing I must say but it imitates the old (bad) behavior. Luckily in our case we can live with a trashy fill_value since we never use it to fill the array.

fengluoqiuwu · 2024-10-20T12:39:09Z

With type np.uint8, default fill value can be fixed by calling _check_fill_value to force casting its type.
But np.float16 can't it will always get a overflow runtime warning when use the default_value.
And another issue is, _check_fill_value is define to get a (valid) fill value of a certain type, but it doesn't with np.float16.

>>> check = np.ma.core._check_fill_value
>>> test = check(None,np.float16)
>>> test
array(inf, dtype=float16)
>>> test = check(test,np.float16)
C:\Users\Eden_\Desktop\Coder\MachineLearning\my_numpy\numpy\numpy\ma\core.py:492: RuntimeWarning: overflow encountered in cast  fill_value = np.asarray(fill_value, dtype=ndtype)
>>> test
array(inf, dtype=float16)

Kirill888 · 2024-12-03T12:17:55Z

This also breaks aggregation operations for masked arrays with small dtype:

this raises an error

import numpy as np
x = np.arange(10, dtype="uint8")
a = np.ma.MaskedArray(x, x & 1, fill_value=111)
a.max(keepdims=True)

Due to fill_value being too large, error is raised from here (line 6089):

numpy/numpy/ma/core.py

Lines 6082 to 6089 in 5047f7b

    
           result = self.filled(fill_value).max( 
        
               axis=axis, out=out, **kwargs).view(type(self)) 
        
           if result.ndim: 
        
               # Set the mask 
        
               result.__setmask__(newmask) 
        
               # Get rid of Infs 
        
               if newmask.ndim: 
        
                   np.copyto(result, result.fill_value, where=newmask)

Essentially result array is produced with .view(type(self)), which ends up losing fill_value=111 parameter, so gets replaced by 99999, then it raises an error from np.copy as you can't fit that.

I think it should be

result = self.filled(fill_value).max( 
     axis=axis, out=out, **kwargs).view(type(self))
result.fill_value = self.fill_value

seberg · 2024-12-03T12:31:16Z

@Kirill888 there is a hot-fix that will be in 2.1.x (soon) and 2.2. (also not far future).

Although, it would be nice to figure out a clean way to handle this. Maybe something like: unless explicitly set don't usually propagate mask values (right now "explicitly set" is not defined, because the value is cached and thus considered "set" on random operations).

ngoldbaum added the 00 - Bug label Jan 24, 2024

20revsined mentioned this issue Apr 16, 2024

Possible Solutions to Issues 26216, 25677, and 25589 #26276

Closed

fengluoqiuwu mentioned this issue Oct 20, 2024

DISCUSS: About issue of masked array #27588

Open

remi-braun mentioned this issue Nov 27, 2024

'save_cog_with_dask' fails with numpy > 2 opendatacube/odc-geo#189

Closed

Kirill888 mentioned this issue Dec 3, 2024

Masked Array reductions and numpy 2+ dask/dask#11574

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Masked array default fill value can overflow #25677

BUG: Masked array default fill value can overflow #25677

ngoldbaum commented Jan 24, 2024

msavinash commented Jan 25, 2024 •

edited

Loading

ngoldbaum commented Jan 25, 2024 •

edited

Loading

msavinash commented Jan 25, 2024

msavinash commented Jan 25, 2024

BielStela commented Jul 18, 2024

ngoldbaum commented Jul 18, 2024

BielStela commented Jul 18, 2024 •

edited

Loading

ngoldbaum commented Jul 18, 2024

seberg commented Jul 19, 2024

BielStela commented Jul 21, 2024 •

edited

Loading

fengluoqiuwu commented Oct 20, 2024

Kirill888 commented Dec 3, 2024 •

edited

Loading

seberg commented Dec 3, 2024

BUG: Masked array default fill value can overflow #25677

BUG: Masked array default fill value can overflow #25677

Comments

ngoldbaum commented Jan 24, 2024

Describe the issue:

Reproduce the code example:

Error message:

Python and NumPy Versions:

Runtime Environment:

Context for the issue:

msavinash commented Jan 25, 2024 • edited Loading

ngoldbaum commented Jan 25, 2024 • edited Loading

msavinash commented Jan 25, 2024

msavinash commented Jan 25, 2024

BielStela commented Jul 18, 2024

ngoldbaum commented Jul 18, 2024

BielStela commented Jul 18, 2024 • edited Loading

ngoldbaum commented Jul 18, 2024

seberg commented Jul 19, 2024

BielStela commented Jul 21, 2024 • edited Loading

fengluoqiuwu commented Oct 20, 2024

Kirill888 commented Dec 3, 2024 • edited Loading

seberg commented Dec 3, 2024

msavinash commented Jan 25, 2024 •

edited

Loading

ngoldbaum commented Jan 25, 2024 •

edited

Loading

BielStela commented Jul 18, 2024 •

edited

Loading

BielStela commented Jul 21, 2024 •

edited

Loading

Kirill888 commented Dec 3, 2024 •

edited

Loading