DISCUSS: About issue of masked array #27588
As I delve deeper into the masked array code, my concerns continue to grow. It contains numerous issues that could have serious consequences. Recently I have been trying to fix some bugs in this module, but I find that my fixes often merely suppress the bugs rather than eliminate them at their source. To truly eradicate these bugs, we must first establish appropriate standards, and then test and modify all methods to ensure they comply with them. This is an extremely challenging task, particularly where it involves changing the API. I believe that addressing these underlying issues is crucial for the long-term stability and usability of masked arrays. Without a thorough examination and a systematic approach to standardization, we may keep running into recurring problems that undermine the integrity of this component.
Any suggestions regarding masked arrays would be greatly appreciated. I would also like to collect all related bug reports that have been encountered. This month, I will be organizing the issues related to the
I think that the fundamental issue with masked arrays is that Therefore, I suspect it is fundamentally impossible to ensure that
I understand that it's not feasible to test every NumPy method comprehensively for every scenario. However, within the current architecture of masked arrays, I believe it would still be valuable to focus on testing certain key areas. For instance, Additionally, I think it's important to verify that the functions in Particularly, I believe that within the current masked array, we might want to focus on testing edge cases like That said, I do think it's important to ensure that most methods work reliably under most conditions, and that, at least for a given version, we have a set of tests that pass.
To throw some oil on the fire:
You may be interested in If there are other features of NumPy (beyond the Standard) that are needed, consider suggesting that they be added to the Standard (if they are not efficient or easy to write in terms of Standard functionality). If that doesn't work, we could consider library-specific extensions.
Thank you very much!
It seems that it is not taking advantage of acceleration. Moreover, the time difference appears to grow as the dimensionality increases.
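A small timing harness makes this kind of scaling observation easy to reproduce. The sketch below is only illustrative: it compares a plain `ndarray` matrix product with the same product on an `np.ma` array as a stand-in baseline (the library benchmarked in this thread is not shown), and it varies the matrix size `n`, since whether "dimensionality" meant size or number of axes is not stated. `time_matmul`, the 10% mask density, and the chosen sizes are assumptions.

```python
import timeit

import numpy as np


def time_matmul(n, repeats=5):
    """Best-of-`repeats` time in seconds for an n x n product: plain ndarray vs. np.ma."""
    rng = np.random.default_rng(0)
    x = rng.random((n, n))
    # Mask roughly 10% of the entries at random (an illustrative choice).
    mx = np.ma.masked_array(x, mask=rng.random((n, n)) < 0.1)

    # number=1 times a single product per run; take the fastest of `repeats` runs.
    plain = min(timeit.repeat(lambda: x @ x, number=1, repeat=repeats))
    masked = min(timeit.repeat(lambda: mx @ mx, number=1, repeat=repeats))
    return plain, masked


for n in (50, 100, 200):
    plain, masked = time_matmul(n)
    print(f"n={n}: ndarray {plain:.2e} s, np.ma {masked:.2e} s, "
          f"ratio {masked / plain:.1f}")
```

Taking the minimum over repeats reduces noise from other processes; absolute numbers will depend on the machine and the BLAS NumPy is linked against.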
Interesting! Well, that's clearly something to fix. Fortunately, it's just code, so there's always a way. If I were to guess, the slowdown could be due to use of BTW the syntax can be a bit more convenient:
Explicit use of
mdhaber/marray#86 should mostly address this. It is still not quite as fast as NumPy's masked array (whose performance is surprisingly close to that of a regular array), but it was what I could do before heading to work!
I was wondering how even after mdhaber/marray#86, NumPy's masked array Simple: ignore the mask.

```python
import numpy as np

A = np.ma.masked_array(np.ones((2, 2)), ~np.eye(2, dtype=bool))
print(A)
# [[1.0 --]
#  [-- 1.0]]
print(A @ A)
# [[2.0 --]
#  [-- 2.0]]
```

It seems to just Using
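For contrast with the `A @ A` result above, here is a sketch of a product that takes the mask into account. It assumes one particular semantics, which is not something `np.ma` promises: masked entries contribute nothing to each dot product, and an output entry is masked only when every term of its sum involved a masked entry. `masked_matmul` is a hypothetical helper, not part of `np.ma`.

```python
import numpy as np


def masked_matmul(a, b):
    """Hypothetical mask-aware matrix product (a sketch, not np.ma's behavior)."""
    a = np.ma.asarray(a)
    b = np.ma.asarray(b)
    # Replace masked entries by 0 so they add nothing to the sums.
    data = np.ma.filled(a, 0) @ np.ma.filled(b, 0)
    # For each output entry, count the (a[i, k], b[k, j]) pairs that are both unmasked.
    valid_a = (~np.ma.getmaskarray(a)).astype(np.intp)
    valid_b = (~np.ma.getmaskarray(b)).astype(np.intp)
    n_valid = valid_a @ valid_b
    # Mask an output entry only if no unmasked pair contributed to it.
    return np.ma.masked_array(data, mask=(n_valid == 0))


A = np.ma.masked_array(np.ones((2, 2)), ~np.eye(2, dtype=bool))
print(masked_matmul(A, A))
# [[1.0 --]
#  [-- 1.0]]
```

The diagonal is `1.0` rather than `2.0` because only one unmasked pair contributes to each diagonal entry. The extra integer matrix product roughly doubles the cost, which is one reason an implementation might choose to ignore the mask instead.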
While reading the code and addressing some bugs related to `numpy/ma`, I encountered a few questions:

1. **Filling values in masked arrays:** Do we actually care about the exact fill value in masked arrays, given that the filled positions are hidden by the mask anyway? If not, I believe I can resolve bug #27580 simply by removing the check for `inf`.
2. **Default fill value for masked arrays:** Currently, the default fill value for masked arrays is defined by `default_filler` in terms of Python scalars. However, Python has no unsigned integer type, so for `np.uint` arrays the default fill value ends up stored as `np.int64(999999)`. This causes problems in operations like `copyto(..., casting='same_kind')`, as seen in bug #27580 and bug #27269. Should we use NumPy data types for the default fill value, so that the fill value always matches the dtype of the array (e.g., an unsigned integer fill value for unsigned integer arrays)?
3. **Large default fill values:** Some default fill values lie far outside the range of their type, such as `999999` for `np.int8` and `1.e20` for `np.float16`. What would be an appropriate default fill value for masked arrays, particularly for small data types like `int8` and `float16`? (bug #25677)
4. **Reviewing `copyto` in masked arrays:** Should we perform a comprehensive review of how masked arrays use `copyto`? Similar bugs likely exist elsewhere, since they share the same root cause.
5. **Testing small data types:** Should we extend the test suite to cover small data types (e.g., `int8` and `float16`) to make sure functions handle them correctly?
6. **Checking method consistency:** Should we check that methods behave consistently between an unmasked masked array and an `ndarray`? There are some differences in methods and behavior between the two; see, for example, bug #27258.
7. **Making the standard clear:** For some methods, the expected behavior is not clearly specified. For example, should invalid results be masked automatically? Some functions (such as `sqrt` and `std`) do this, but others (such as `median` and `mean`) do not. Worse, the documentation does not match the behavior: some functions do not mention it but change the mask anyway (`sqrt`, `std`), while others mention it but do not change the mask (`mean`).

   Worse still, some important methods have no clear explanation in either the documentation or the docstring. For example, `__array_wrap__` is invoked by most ufunc calls, and I think it may be the cause of bug #25635.

Since I'm not sure where to place these questions, I've marked this as a discussion for now.
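Questions 2 and 3 could be explored with a helper that returns the default fill value as a scalar of the array's own dtype. The sketch below is one possible policy, not NumPy's current behavior: `dtype_matched_fill_value` is hypothetical, and clipping the legacy `999999` / `1.e20` defaults to the dtype's maximum is an assumption. It also shows why an `int64` fill value fails under `copyto(..., casting='same_kind')` on an unsigned array.

```python
import numpy as np

# Legacy defaults quoted in the discussion: 999999 for integers, 1e20 for floats.
LEGACY_INT_FILL = 999999
LEGACY_FLOAT_FILL = 1.0e20


def dtype_matched_fill_value(dtype):
    """Hypothetical helper: a default fill value returned as a scalar of `dtype`.

    The legacy value is clipped to the largest representable value, so int8 gets
    127 instead of 999999 and float16 gets 65504 instead of 1e20 (which would
    overflow to inf).
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "b":
        return np.bool_(True)
    if dtype.kind in "iu":
        return dtype.type(min(LEGACY_INT_FILL, int(np.iinfo(dtype).max)))
    if dtype.kind == "f":
        return dtype.type(min(LEGACY_FLOAT_FILL, float(np.finfo(dtype).max)))
    raise TypeError(f"this sketch does not handle dtype {dtype}")


# An int64 value cannot be cast to uint8 under "same_kind": int and uint are different kinds.
print(np.can_cast(np.int64, np.uint8, casting="same_kind"))  # False

# A dtype-matched scalar copies without any cross-kind cast.
arr = np.zeros(3, dtype=np.uint8)
np.copyto(arr, dtype_matched_fill_value(arr.dtype), casting="same_kind")
print(arr)  # [255 255 255]
```

Clipping keeps the familiar values wherever they fit (for example `999999` for `uint64`), but whether a "large sentinel" is the right default at all is exactly the open question in item 3.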
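For question 5, a dtype sweep is cheap to write. The sketch below runs a few masked operations on small dtypes with an explicit fill value; in NumPy's own test suite it would more naturally be a `pytest.mark.parametrize` over dtypes. `check_small_dtype`, `SMALL_DTYPES`, and the chosen operations are illustrative assumptions.

```python
import numpy as np

SMALL_DTYPES = (np.int8, np.uint8, np.int16, np.float16)


def check_small_dtype(dtype):
    """Run a few masked operations on `dtype` and return the filled result."""
    marr = np.ma.masked_array(np.array([1, 2, 3], dtype=dtype),
                              mask=[False, True, False])
    # Reductions must skip the masked element.
    assert marr.sum() == 4
    assert marr.count() == 2
    # Compressing must return only the unmasked values.
    assert marr.compressed().tolist() == [1, 3]
    # Filling with an explicit, representable value must keep the dtype.
    filled = marr.filled(0)
    assert filled.dtype == np.dtype(dtype)
    assert filled.tolist() == [1, 0, 3]
    return filled


for dt in SMALL_DTYPES:
    check_small_dtype(dt)
print("small-dtype checks passed")
```

Calling `filled()` with no argument would exercise the default fill values from item 3 instead; that case is left out here because its result depends on how the fill-value question is resolved.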
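Question 6 could start from a differential check: build a masked array with no mask and compare each method's result against the plain `ndarray`. `compare_with_ndarray` is a hypothetical helper, and the method list is only a starting point. Methods that take arguments, or that are expected to differ by design, would need case-by-case handling.

```python
import numpy as np

METHODS = ("sum", "mean", "std", "var", "min", "max",
           "argmin", "argmax", "cumsum", "prod")


def compare_with_ndarray(data, method_names=METHODS):
    """Return names of methods whose no-mask MaskedArray result differs from ndarray's."""
    plain = np.asarray(data, dtype=float)
    masked = np.ma.masked_array(plain)  # no mask given, so the mask is np.ma.nomask
    mismatches = []
    for name in method_names:
        expected = getattr(plain, name)()
        got = getattr(masked, name)()
        # np.ma.getdata strips the MaskedArray wrapper before comparing values.
        if not np.allclose(np.ma.getdata(got), expected):
            mismatches.append(name)
    return mismatches


print(compare_with_ndarray([[3.0, 1.0, 4.0], [1.0, 5.0, 9.0]]))  # []
```

Comparing values only is deliberately loose; a stricter version could also compare return types, shapes, and dtypes, which is where differences like the one in bug #27258 may show up.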
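The inconsistency described in question 7 can be reproduced in a few lines, assuming a recent NumPy (behavior may differ between versions): `np.ma.sqrt` masks the invalid result of `sqrt(-1)`, while the `nan` that `mean` produces for `[inf, -inf]` comes back unmasked.

```python
import numpy as np

# np.ma.sqrt masks entries outside its domain (here, the negative input).
root = np.ma.sqrt(np.ma.masked_array([-1.0, 4.0]))
print(root)       # [-- 2.0]
print(root.mask)  # [ True False]

# mean on an array without a mask returns nan for inf + (-inf) and does not mask it.
with np.errstate(invalid="ignore"):
    avg = np.ma.masked_array([np.inf, -np.inf]).mean()
print(avg, np.ma.is_masked(avg))  # nan False
```

A written standard would need to decide which of these two behaviors is intended, and then document it for every function.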