BUG: Masked array default fill value can overflow #25677
Comments
If a specific fill value is absent in
You can see that for every kind of float (float16, float32, float64) the same fill value is used. With a fill value of "1.e20" you always get the runtime error when casting to a narrower type (say float16). I tried the below code to reproduce that issue:
One workaround is to define a fill value for each specific dtype. Another solution I tried that worked was to replace the default float value with 'inf', which did not raise any issue for any of the float dtypes.
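The first workaround mentioned above (a fill value chosen per dtype) could look roughly like the sketch below. The helper name `explicit_fill_value` is hypothetical and not part of NumPy, and using the largest representable value is only one possible choice.

```python
import numpy as np


def explicit_fill_value(dtype):
    """Hypothetical helper: largest finite value representable by `dtype`."""
    dtype = np.dtype(dtype)
    if dtype.kind in "iu":
        return np.iinfo(dtype).max
    if dtype.kind == "f":
        return np.finfo(dtype).max
    raise TypeError(f"no explicit fill value chosen for {dtype}")


data = np.array([1.0, 2.0, 3.0], dtype=np.float16)

# Pass the fill value explicitly so the 1e20 default is never used.
arr = np.ma.masked_array(
    data,
    mask=[False, True, False],
    fill_value=explicit_fill_value(data.dtype),
)

filled = arr.filled()  # masked slot replaced by the float16 maximum
print(arr.fill_value, filled)
```

Because the fill value already fits in float16, no cast of an out-of-range number is needed when the array is filled.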
The main issue here isn't so much the implementation of changing the default fill value, or even coming up with better default fill values, it's whether or not changing this will break user code. The default fill value is used implicitly in a number of operations, so it's not clear to me whether changing the default will have unintended consequences elsewhere.
Since the fill value is not determined by the user, we should be able to change the default values without major code breakage. But there is a chance that some user code depends on the default values (I can't think of an example right off the bat). Maybe change it and document the change?
Also, this warning is not seen in earlier versions of NumPy (I tried with v1.23.5). So it can be changed, although I need to verify how it's handled in earlier versions.
Hello, I just want to add to the discussion that this issue is causing real problems in some cases. Since 2.0 does not accept out-of-bounds Python ints, this causes problems when serializing and deserializing masked arrays with default fill values. For example, when multiprocessing with dask, it serializes the masked array in parts and deserializes the fill value to a Python integer, causing an overflow exception when rebuilding the masked array. I can't tell what a sane default for fill_value would be, but having something that is well behaved under any circumstance would be ideal. Hope this helps, and thank you!
Hi, I don't think this interaction with dask has been reported. Can you make a short runnable example that demonstrates the issue?
Sure,
using:
ping @seberg in case you weren't aware of this interaction between dask, masked arrays, and the NEP 50 changes.
Maybe we should use the minimum or maximum int as the default fill value for int types that are too small to store 999999? Or whatever value it overflows to that we were accidentally using before?
I was faintly aware, but not that it was an issue here. A solution may be to make sure that the default fill value is typed, e.g. as int64. That way NumPy will keep force-casting it. Not 100% sure if that has other fallout, but I think the fill value is always an array, so it should be fine.
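A minimal sketch of the difference being discussed, assuming 999999 as the integer default quoted above: a value that already carries a NumPy integer type is force-cast (it wraps), while an untyped Python int is range-checked. This only illustrates the casting behavior and is not the change made in NumPy itself.

```python
import warnings

import numpy as np

DEFAULT_INT_FILL = 999999  # integer default quoted in this thread

# Typed value: astype() between integer types wraps silently.
typed = np.array(DEFAULT_INT_FILL, dtype=np.int64)
wrapped = typed.astype(np.uint8)
print(wrapped)  # 63, i.e. 999999 modulo 256

# Untyped Python int: NumPy 2.0 raises OverflowError,
# older releases may only warn and wrap instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        np.array(DEFAULT_INT_FILL, dtype=np.uint8)
        python_int_accepted = True
    except OverflowError:
        python_int_accepted = False

print(python_int_accepted)
```

The wrapped value 63 matches the "whatever value it overflows to" mentioned earlier in the thread for uint8.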
The solution we apply goes along those lines: we force-cast the

    def f():
        arr = np.ma.masked_array([1,1], [0,1], dtype=np.uint8)
        arr.fill_value = arr.fill_value
        return arr

fill_value will only overflow if it is the default value, so it does the job of avoiding serialization errors: it overflows in the setter, becoming 63~, and then it is safe to move it around. Not the prettiest thing, I must say, but it imitates the old (bad) behavior. Luckily in our case we can live with a trashy
With type

    >>> check = np.ma.core._check_fill_value
    >>> test = check(None,np.float16)
    >>> test
    array(inf, dtype=float16)
    >>> test = check(test,np.float16)
    C:\Users\Eden_\Desktop\Coder\MachineLearning\my_numpy\numpy\numpy\ma\core.py:492: RuntimeWarning: overflow encountered in cast
      fill_value = np.asarray(fill_value, dtype=ndtype)
    >>> test
    array(inf, dtype=float16)
This also breaks aggregation operations for masked arrays with small
This raises an error:

    import numpy as np
    x = np.arange(10, dtype="uint8")
    a = np.ma.MaskedArray(x, x & 1, fill_value=111)
    a.max(keepdims=True)

Due to Lines 6082 to 6089 in 5047f7b
Essentially I think it should be

    result = self.filled(fill_value).max(
        axis=axis, out=out, **kwargs).view(type(self))
    result.fill_value = self.fill_value
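For reference, the fill-then-reduce step in that snippet can be reproduced by hand on a plain ndarray, which sidesteps the default fill value of the result entirely. This is only a sketch of what the reduction computes, not the fix that went into NumPy; it assumes `np.ma.maximum_fill_value` returns the smallest value of the dtype, which is what makes masked entries lose the max.

```python
import numpy as np

x = np.arange(10, dtype="uint8")
a = np.ma.MaskedArray(x, x & 1, fill_value=111)  # odd entries are masked

# Value that can never win a max: the smallest uint8.
neutral = np.ma.maximum_fill_value(a)

# filled() returns a plain ndarray, so the reduction below involves
# no masked-array fill value at all.
plain = a.filled(neutral)
result = plain.max(keepdims=True)
print(result)  # [8]
```

The largest unmasked entry is 8, and `keepdims=True` keeps it as a one-element array.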
@Kirill888 there is a hot-fix that will be in 2.1.x (soon) and 2.2 (also not far off). Still, it would be nice to figure out a clean way to handle this. Maybe something like: unless explicitly set, don't usually propagate mask values (right now "explicitly set" is not defined, because the value is cached and thus considered "set" on random operations).
Describe the issue:
For both signed and unsigned integers the default fill value is 999999, while for floats it is 1e20.
This is problematic for (u)int[8,16] as well as half floats, which do not contain the default fill value in their valid range.
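A quick way to check which dtypes are affected is to compare the defaults against `np.iinfo` / `np.finfo`. The sketch below hard-codes the two defaults (999999 as quoted in the comments, 1e20 from this description) rather than reading them from NumPy, and `can_hold` is a hypothetical helper.

```python
import numpy as np

INT_DEFAULT = 999999   # integer default quoted in the comments
FLOAT_DEFAULT = 1e20   # float default quoted in this description


def can_hold(dtype, value):
    """Hypothetical helper: True if `value` is within the finite range of `dtype`."""
    dtype = np.dtype(dtype)
    if dtype.kind in "iu":
        info = np.iinfo(dtype)
        return info.min <= value <= info.max
    info = np.finfo(dtype)
    # Convert to Python floats so the comparison itself cannot overflow.
    return float(info.min) <= value <= float(info.max)


int_types = [np.int8, np.uint8, np.int16, np.uint16,
             np.int32, np.uint32, np.int64, np.uint64]
float_types = [np.float16, np.float32, np.float64]

ints_too_small = [np.dtype(t).name for t in int_types
                  if not can_hold(t, INT_DEFAULT)]
floats_too_small = [np.dtype(t).name for t in float_types
                    if not can_hold(t, FLOAT_DEFAULT)]

print(ints_too_small)    # ['int8', 'uint8', 'int16', 'uint16']
print(floats_too_small)  # ['float16']
```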
Reproduce the code example:
Error message:
Python and NumPy Versions:
NumPy 2.0 dev on Python 3.11.
Runtime Environment:
N/A
Context for the issue:
I don't think this is urgent, but I didn't see an issue describing this behavior, so I'm filing this for future searchers.
That said, this does complicate the NEP 50 implementation because we need to have a number of workarounds so that this continues to work. I think it would be better to choose a default fill value that fits in the range of the data ([i,f]info.max?) for these types, but I have no idea what that entails for backward compatibility.