BUG: numpy.ma.min/max fails for uint and float16 dtypes #27584

Open pull request: 19 commits to be merged into main

fengluoqiuwu
Contributor

@fengluoqiuwu fengluoqiuwu commented Oct 17, 2024

Delete the inf check.
The documentation says that we should use the output of minimum_fill_value, which is inf when the dtype is floating point.
Fixes #27580

@eendebakpt
Contributor

@fengluoqiuwu I am not sure this is the right fix (changing the return type of default_fill_value seems like a viable option as well). But could you:

  • Update the description in the first post to match the commits
  • Add a unit test for the case that is solved by this PR

Thanks!

@fengluoqiuwu
Contributor Author

fengluoqiuwu commented Oct 17, 2024

@eendebakpt

Apologies for the confusion. I changed the fix to a different approach but forgot to update the description accordingly. I'll make sure the details match next time.

I'll add the unit test now to cover the case solved by this PR. Thanks for the suggestion!

@ngoldbaum
Member

I agree that using unsafe casting here is incorrect.

@fengluoqiuwu
Contributor Author

Unsigned int works with the patch below, but np.float16 still fails while np.float32 works as expected. It's a bit puzzling, and I'm currently working on resolving the issue.

@fengluoqiuwu
Contributor Author

Considering that masked arrays in NumPy use a mask to ignore certain values during calculations, do we really need to be concerned about the actual fill value within the array when the mask is applied, given that the fill_value property stores the value elsewhere? Specifically, does the fill value affect any operations or computations on the masked elements?

I removed the code handling inf values, and the ma tests still ran without issues. The problem arises when the dtype's maximum is smaller than the default fill value, because the masked array then cannot be filled with the default. For example, a masked array with dtype=np.float16 cannot be filled with 1.e20. So what should we actually fill the masked array with if getting rid of inf is necessary?
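
To make the float16 concern concrete, here is a minimal sketch that compares 1.e20 (the default float fill value cited above) against the representable range of np.float16 and np.float32. It uses only np.finfo and a plain cast and does not call into numpy.ma, so it illustrates the range problem rather than the patched code path.

```python
import numpy as np

# 1e20 is the default float fill value cited in the comment above.
default_float_fill = 1.0e20

# Largest finite values each dtype can hold.
f16_max = float(np.finfo(np.float16).max)  # 65504.0
f32_max = float(np.finfo(np.float32).max)

# The default fill is far outside the float16 range but inside float32.
fits_f16 = abs(default_float_fill) <= f16_max
fits_f32 = abs(default_float_fill) <= f32_max

# Casting the out-of-range value to float16 overflows to inf.
# NumPy may also emit a RuntimeWarning, which is silenced here.
with np.errstate(over="ignore"):
    as_f16 = np.array(default_float_fill).astype(np.float16)

print(f16_max, fits_f16, fits_f32, np.isinf(as_f16))
```

This is consistent with the earlier observation that np.float32 works while np.float16 does not.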

@fengluoqiuwu
Contributor Author

I agree with you. Using unsafe casting here doesn't seem right. It might trigger a runtime warning with dtype=np.float16, whereas the original approach would likely raise an error.

@fengluoqiuwu
Contributor Author

fengluoqiuwu commented Oct 18, 2024

As mentioned in the documentation, the masked array should be filled with the result of the specific function. Therefore, I would prefer not to include the inf check, as it could alter the fill value. I will open a new issue to discuss what value should be used when encountering inf or values outside the expected range.

@fengluoqiuwu fengluoqiuwu changed the title from "BUG: Fix bug-27580" to "BUG: numpy.ma.min/max fails for uint and float16 dtypes" on Oct 19, 2024

@eendebakpt
Contributor

The PR addresses the issue by using _check_fill_value at the place where the value becomes problematic. But would it perhaps be better to change the fill value?

Currently:

import numpy as np
from numpy.ma.core import default_fill_value, _check_fill_value

value = default_fill_value(np.dtype(np.uint8))  # returns 999999; should it be 63?
_check_fill_value(value, np.uint8)  # returns array(63, dtype=uint8)
_check_fill_value(None, np.uint8)  # returns 999999; should it be array(63, dtype=uint8)?

I would expect _check_fill_value(None, dtype) and _check_fill_value(default_fill_value(dtype), dtype) to give the same result, and default_fill_value(dtype) to return a fill value that is valid for the specified dtype. Changing these may cause too much trouble elsewhere (I have not checked that).
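
The value 63 quoted above is consistent with modular wraparound, since 999999 mod 256 = 63. A minimal sketch of that arithmetic follows; it uses a plain astype cast (which defaults to unsafe casting) rather than the private _check_fill_value helper, so it only illustrates where the number comes from.

```python
import numpy as np

default_int_fill = 999999  # integer default fill value quoted above

# uint8 can hold 0..255, so 999999 is out of range.
u8_max = int(np.iinfo(np.uint8).max)

# An unsafe cast keeps only the low 8 bits, i.e. the value modulo 256.
wrapped = int(np.array(default_int_fill, dtype=np.int64).astype(np.uint8))

print(u8_max, wrapped, default_int_fill % 256)
```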

@fengluoqiuwu
Contributor Author

I actually agree with this approach, as the original design causes many issues and bugs. I would prefer to take a more aggressive approach, which is to enforce that the fill value matches the dtype of the array. However, I think it would be better to address these issues along with other design problems in MaskedArray in the same version to avoid making multiple changes to its behavior. (The more aggressive approach would be to refactor the current MaskedArray implementation, as I’ve found that many bugs due to design flaws cannot be easily fixed.)
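
As a rough illustration of what "the fill value matches the dtype of the array" could mean in practice, the sketch below checks whether a scalar lies inside a dtype's range. The helper name fill_value_fits is hypothetical and is not part of NumPy or of this PR; it handles only integer and floating-point dtypes and ignores floating-point precision.

```python
import numpy as np

def fill_value_fits(value, dtype):
    """Hypothetical helper: True if `value` lies in the range of `dtype`.

    Only integer and floating-point dtypes are handled in this sketch.
    """
    dtype = np.dtype(dtype)
    if dtype.kind in "iu":
        info = np.iinfo(dtype)
        # Must be a whole number inside [min, max]; inf and nan fail here.
        return float(value).is_integer() and info.min <= value <= info.max
    if dtype.kind == "f":
        info = np.finfo(dtype)
        # inf and nan exist in every float dtype; finite values must
        # not exceed the largest finite magnitude.
        return bool(not np.isfinite(value) or abs(value) <= float(info.max))
    raise TypeError(f"unsupported dtype kind: {dtype.kind!r}")

print(fill_value_fits(999999, np.uint8))   # integer default vs uint8
print(fill_value_fits(1e20, np.float16))   # float default vs float16
print(fill_value_fits(1e20, np.float32))   # float default vs float32
```

A check of this kind could either reject an incompatible fill value or trigger a fallback; which behavior is appropriate is exactly the open design question in this thread.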

Status: Awaiting a code review
Development

Successfully merging this pull request may close these issues.

BUG: numpy.ma.min fails for uint dtypes
4 participants