BUG: Silencing errors of fill_value's casting in __array_finalize__ in MaskedArray #27596

Open: wants to merge 8 commits into base: main

fengluoqiuwu (Contributor) commented Oct 19, 2024

Silences errors raised when casting fill_value in MaskedArray.__array_finalize__, and falls back to the default value if the original fill_value cannot be cast, even with _check_fill_value.

While this approach carries some risk, it is reasonable to assume that users who set fill_value themselves are aware that it may lead to undefined behavior under certain type casts. It is unfortunate that casting a masked array to another dtype fails simply because fill_value was indirectly assigned a default value. This not only breaks methods that take dtype= directly, but also affects any ufunc call whose result dtype differs from the original dtype (the ufunc is not applied to the stored fill_value), which makes the situation worse.

I believe this is the most effective way to address these issues, rather than designing a new ufunc for masked_array or tracking whether the fill value was set by the user.
fixes #27165
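A minimal sketch of the approach described above, assuming a scalar, non-structured fill value. `strict_cast` is a hypothetical stand-in for numpy.ma's private `_check_fill_value` (not its real implementation), and `inherit_fill_value` models only the fill-value step of `__array_finalize__`. The `default` argument replaces numpy.ma's own default table, which this sketch does not reproduce.

```python
import numpy as np


def strict_cast(fill_value, dtype):
    """Hypothetical stand-in for _check_fill_value: cast a scalar fill value
    to `dtype`, reporting every failure as TypeError."""
    dtype = np.dtype(dtype)
    try:
        value = np.asarray(fill_value)
        # Reject out-of-range integers explicitly; depending on the NumPy
        # version a plain cast may wrap around instead of raising.
        # (Float-to-int overflow is not checked in this sketch.)
        if dtype.kind in "iu" and value.dtype.kind in "biu" and value.ndim == 0:
            info = np.iinfo(dtype)
            if not info.min <= int(value) <= info.max:
                raise OverflowError(f"{int(value)} is out of bounds for {dtype}")
        return value.astype(dtype)
    except (OverflowError, ValueError) as exc:
        raise TypeError(f"cannot convert {fill_value!r} to {dtype}") from exc


def inherit_fill_value(inherited, dtype, default):
    """Fill value for a new view/result (sketch of the __array_finalize__ step).

    `inherited` is the parent's stored fill value (or None). If it cannot be
    cast to the new dtype, the error is silenced and `default` is used."""
    if inherited is None:
        return None
    try:
        return strict_cast(inherited, dtype)
    except TypeError:
        return np.asarray(default, dtype=dtype)


# A parent fill value of 999999 does not fit uint8, so the default is used.
print(inherit_fill_value(999999, np.uint8, default=255))  # 255
```

Catching only TypeError at the call site is enough here because the helper funnels OverflowError and ValueError into TypeError, which matches how the thread below describes `_check_fill_value`.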

melissawm added the component: numpy.ma masked arrays label on Oct 24, 2024
seberg (Member) commented Oct 30, 2024

@fengluoqiuwu two things here:

  1. It would be nice to explain where the TypeError comes from. As a reviewer, even if I accept that catching the TypeError (and probably also OverflowError?) may be the best approach, the next question will be whether this is the right place to do it.
  2. We really can't merge even an obvious bug fix without a test, and a test is also very helpful for reviewers.

fengluoqiuwu (Contributor, Author)

> It would be nice to explain where the TypeError comes from.

The TypeError is raised by the _check_fill_value method, which converts the various errors we want to catch, such as OverflowError, into TypeError.

> We really can't merge even an obvious bug fix without a test and a test is also very helpful for reviewers.

I will add it later.

seberg (Member) commented Nov 4, 2024

Thanks! Looking a bit closer, I suspect this is right and we could probably just do it. We don't know that the dtype matches, and we have to just fall back to the default when the "inherited" fill-value fails (this is part of __array_finalize__).

The reason I ask where it comes from is that it may be better to handle this in _check_fill_value itself. I see a similar branch in one place in compare, where it catches (TypeError, ValueError).

In this case, I would actually like it if we could audit and clean things up a bit more. That probably means pushing this logic into _check_fill_value, but adding a fallback=False argument to it.
(I hope this achieves two things: first, it makes it more obvious why the try/except is there; second, it ensures we add this logic everywhere it should be used, not just in the one place where the problem happened to be noticed.)

I think in most places we should probably fall back to None. But when a user does arr.fill_value = fill_value, I don't think we should (it is the user's job to pass a reasonable fill value).

(It is unfortunate that this was always weird, but the integer casting is now bringing these issues to the surface...)

fengluoqiuwu (Contributor, Author)

> I think in most places probably we should fall back to None.

I think using the default value is better. The comment for _check_fill_value says that it only returns an ndarray, so callers use the return value without checking it first. Additionally, _comparison already uses the default value in the same situation as this one.
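A sketch combining the reviewer's `fallback=False` suggestion with the point above about returning the default rather than None. The function name, the `default` parameter, and the `fallback` keyword are all hypothetical here; numpy.ma's real `_check_fill_value` has its own signature. Only scalar, non-structured fill values are handled.

```python
import numpy as np


def check_fill_value(fill_value, dtype, default, fallback=False):
    """Hypothetical _check_fill_value variant with a `fallback` keyword.

    fallback=False: a value that cannot be cast raises TypeError (e.g. when a
    user assigns ``arr.fill_value = ...``).
    fallback=True: the value is replaced by `default`, so the result is always
    an ndarray and callers never have to handle None."""
    dtype = np.dtype(dtype)
    try:
        value = np.asarray(fill_value)
        # Explicit bounds check for integer targets (sketch only; float-to-int
        # overflow is not detected).
        if dtype.kind in "iu" and value.dtype.kind in "biu" and value.ndim == 0:
            info = np.iinfo(dtype)
            if not info.min <= int(value) <= info.max:
                raise OverflowError(f"{int(value)} is out of bounds for {dtype}")
        return value.astype(dtype)
    except (OverflowError, ValueError) as exc:
        if fallback:
            return np.asarray(default, dtype=dtype)
        raise TypeError(f"cannot convert {fill_value!r} to {dtype}") from exc


# Internal path (e.g. __array_finalize__): silently use the default.
print(check_fill_value(300, np.uint8, default=255, fallback=True).item())  # 255
```

Under this design a `fill_value` setter would call it with `fallback=False`, so bad user input still raises, while internal paths would pass `fallback=True` and can use the result unconditionally (for example via `.item()`), which a None return would not allow.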

seberg added a commit to seberg/numpy that referenced this pull request Nov 20, 2024
This is the most minimal fix I could think of... If we have uints,
still use the out-of-bounds value, but make it uint. This is because
the code uses copyto with the default same-kind casting in places...

This doesn't remove all the other quirks, and it is surely a behavior
change in some awkward situations (since you now get a uint, someone
might mix the new uint value with an integer and get float64 results,
etc.)

But... it is by far the most minimal fix I could think of after
spending a long time on it.

A more thorough fix may be to always store the exact dtype, but
it would propagate differently, e.g. when casting. A start for this
is currently in numpygh-27596.

Closes numpygh-27269, numpygh-27580
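The commit message mentions that the code uses `np.copyto` with its default `same_kind` casting. The snippet below is a minimal illustration of that casting rule only, not code from the commit: copying a signed int64 default into a uint8 array is refused, while the same out-of-bounds value stored as uint64 is accepted and wraps. The 999999 value and the uint8 target are chosen for illustration, and a one-element array is used instead of a 0-d one to sidestep the value-based casting NumPy 1.x applied to 0-d inputs.

```python
import numpy as np

# Stand-in for the data buffer of a uint8 masked array.
dst = np.zeros(3, dtype=np.uint8)

# The out-of-bounds integer default 999999, stored two ways.
signed_default = np.array([999999], dtype=np.int64)
unsigned_default = signed_default.astype(np.uint64)

# np.copyto defaults to casting="same_kind"; int64 -> uint8 crosses from the
# signed to the unsigned kind, so the copy is refused with a TypeError.
try:
    np.copyto(dst, signed_default)
    signed_ok = True
except TypeError:
    signed_ok = False

# uint64 -> uint8 stays within the unsigned kind, so it is allowed; the value
# silently wraps to its low byte (999999 == 0xF423F, low byte 0x3F == 63).
np.copyto(dst, unsigned_default)
print(signed_ok, dst)  # False [63 63 63]
```

This is why keeping the out-of-bounds value but giving it an unsigned dtype is enough to get past `same_kind` checks, at the cost of the silent wrap-around shown above.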
ArvidJB pushed a commit to ArvidJB/numpy that referenced this pull request Jan 8, 2025
Projects
Status: Awaiting a code review
Development

Successfully merging this pull request may close these issues.

BUG: inconsistent fill_value casting for masked arrays
3 participants