MAINT,ENH: Rewrite scalar math logic #21188
Conversation
Force-pushed from 55c2247 to 5c5e2d8.
Hmmmpf, somehow going to ... EDIT: My bad, I bet it really needs to be force-inlined, due to the complex values being structs passed by value.
Puh... Added (tricky) tests and fixed some other bugs:
I did not fix that complex comparisons behave differently for arrays and scalars (ufuncs currently use a NaN-aware order?!). It is still a change, since the scalar path will be used a bit more often, but... Otherwise, these are the scalar-math changes as discussed, @mattip. It is still based off gh-21178, though.
(outp)->real = (in1r*rat + in1i)*scl; \
(outp)->imag = (in1i*rat - in1r)*scl; \
} \
} while(0)
Just to note, since this felt a bit more complex with the branches, I just decided to use the ufunc-loop (and include that) directly. But happy to go the other way again.
We should reactivate this, but I am not sure if the test failures were real, so closing/reopening. (I guess there may have been something about complex numbers or so.)
Force-pushed from b99a043 to 053b933.
This commit tries to redo the scalar math logic to take some more care about subclasses, but most of all to introduce logic to defer to `other` if `self` can be cast to it safely (i.e. it is the correct promotion). This makes things much faster and more reliable, since we now defer to `ufuncs` indirectly much less often. This ensures that integer overflows are reported more reliably. Another major point is that this reorganizes the coercion of Python int, float, complex (and bool). This should help a bit with switching to "weak" Python scalars. This may just be a first step on a longer path...
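Roughly, the deferral rule amounts to the following minimal Python sketch of the idea only — the real logic lives in C, also special-cases Python int/float/complex/bool and subclasses, and the helper name here is made up:

```python
import numpy as np

def should_defer(self_type, other_type):
    # Defer to `other` when `self` can be safely cast to `other`'s type,
    # i.e. when `other_type` already is the correct promotion of the two.
    return (self_type is not other_type
            and np.can_cast(self_type, other_type, casting="safe"))

print(should_defer(np.int32, np.float64))   # True:  the int32 scalar defers
print(should_defer(np.float64, np.int32))   # False: float64 handles the op itself
```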
This significantly speeds up pure integer operations, since checking floating point errors is somewhat slow. It seems to slow down some clongdouble math a tiny bit, maybe because the functions don't always get fully inlined.
The function was only used for the scalarmath and is not even documented, so schedule it for deprecation.
The assert doesn't make sense for richcompare, since it has no notion of "forward". It is also problematic in the other cases, because e.g. complex remainder defers (because complex remainder is undefined). So `complex % bool` will ask bool, which will then try to defer to `complex` (even though it is not a forward op). That is correct, since both should defer to tell Python that the operation is undefined.
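As a small illustration of both sides deferring (a hedged example; the exact exception message depends on the NumPy version):

```python
import numpy as np

try:
    np.complex128(1j) % np.bool_(True)   # both operands return NotImplemented
except TypeError as exc:
    # Python turns the double deferral into the usual "unsupported operand" error,
    # which is the intended outcome for an undefined complex remainder.
    print(type(exc).__name__, exc)
```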
For some reason, some of the clang/macOS CI is failing in the multiply operation because it fails to report floating point overflow errors. Maybe using `NPY_FINLINE` is enough to keep the old behaviour, since this did not seem to have been a problem when the function was a macro.
IIRC, these have to be inlined, because otherwise we pass structs by value, which does not always work correctly.
... also undefine any complex floor division, because it is also not defined. Note that the errors for some of these may be a bit less instructive, but for now I assume that is OK.
However, note that complex comparisons currently do NOT agree, but the problem is that we should possibly consider changing the ufunc rather than the scalar, so not changing it in this PR.
These are complicated, and modifications could probably be allowed here. The complexities arise not just from the asymmetric behaviour of Python binary operators, but also because we additionally have our own logic for deferring sometimes (for arrays). That is, we may coerce the other object to an array when it is an "unknown" object. This may assume that subclasses of our scalars are always valid "arrays" already (so they never need to be coerced explicitly). That should be a sound assumption, I think?
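For reference, the asymmetry of Python binary operators mentioned here is plain Python behaviour, independent of NumPy: if the right operand's type is a subclass of the left operand's type and overrides the reflected method, Python tries the reflected method first. A tiny standalone example:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"
    def __radd__(self, other):
        return "Base.__radd__"

class Sub(Base):
    def __radd__(self, other):
        return "Sub.__radd__"

print(Base() + Base())  # Base.__add__
print(Base() + Sub())   # Sub.__radd__  (the subclass's reflected op is tried first)
```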
PyPy does not seem to replace tp_richcompare in the "C wrapper" object for subclasses defining `__le__`, etc. That is understandable, but means that we cannot (easily) figure out that the subclass should be preferred. In general, this is a bit of a best effort try anyway, and this is probably simply OK. Hopefully, subclassing is rare and comparing two _different_ subclasses even more so.
Force-pushed from babb657 to 3b60eff.
I somewhat think the change for subclass logic is so esoteric, that I am not even sure a release note is useful...
Btw., when it comes to the floating point error issues on clang... It turns out the problem was moving the ... Considering that it seemed like it sometimes was correct, I am even wondering whether FPEs can be checked out-of-band and it is possible to create race conditions... But that is more of a curiosity; I am not worried that the fix of moving the FPE checking is unreasonable.
…subclasses — This seems just complicated, PyPy doesn't fully allow it, and CPython's `int` class does not even attempt it... So, just remove most of this logic (although this keeps the "is subclass" information around, even if it is effectively unused).
Ping @mattip, not sure how to proceed. Do you want to have another look (pair programming?) or can we go ahead?
Thanks for the ping. I'll try to understand this a bit better, was quite lost the first time around :)
*out = a / b;
return NPY_FPE_OVERFLOW;
Help me out: when is `a < 0 && a == -a` for bytes?
Oh, `abs(np.int8(-128)) == -128`.
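(Context for readers: `np.int8` is an 8-bit two's-complement integer, so its range is -128..127 and +128 is not representable; negating the minimum wraps back onto itself. A small demonstration — note that newer NumPy versions may additionally emit an overflow RuntimeWarning here:)

```python
import numpy as np

a = np.int8(-128)    # the minimum 8-bit two's-complement value
# +128 does not fit in int8, so -a wraps back to -128.
print(-a)            # -128
print(abs(a))        # -128 -> the one byte value with a < 0 and a == -a
```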
    ret = OTHER_IS_UNKNOWN_OBJECT;
}
Py_DECREF(dtype);
return ret;
I am fine with this big table for now. At some point when we deprecate value based casting for scalars we can rethink this.
Right, created the issue for that!
scalar_res = op(scalar1, scalar2)
assert_array_equal(scalar_res, res)
Yay for hypothesis-based tests
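For context, a self-contained sketch of this kind of hypothesis-based check (not the actual test from the PR; the dtype and strategy choices are only illustrative) — the property is that a scalar op matches the equivalent array op:

```python
import operator
import numpy as np
from hypothesis import given, strategies as st
from numpy.testing import assert_array_equal

@given(st.integers(-100, 100), st.integers(-100, 100))
def test_scalar_matches_array(x, y):
    scalar1, scalar2 = np.int64(x), np.int64(y)
    res = operator.add(np.array(x, dtype=np.int64), np.array(y, dtype=np.int64))
    scalar_res = operator.add(scalar1, scalar2)
    assert_array_equal(scalar_res, res)

test_scalar_matches_array()  # hypothesis runs many randomized cases
```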
LGTM. The refactoring looks OK, and the speedups are great. I assume the benchmarks at the top of the PR are still valid?
Could you add some benchmarks for these and quantify the slowdown?
The big slowdown is for `scalar op 0-d array` when the array can be cast to the scalar: that case used to be much faster (the scalar path kicked in), but now both go through the array path. So this is a huge slowdown in that very specific case. I have added a benchmark for it.
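A rough way to observe the effect being described (a hedged sketch, not the benchmark added to the suite; absolute numbers are machine-dependent):

```python
import timeit
import numpy as np

s = np.float64(1.0)
a0 = np.array(2.0)   # 0-d array

# scalar + scalar stays on the fast scalar path;
# scalar + 0-d array now takes the (much slower) array path.
print("scalar + scalar   :", timeit.timeit(lambda: s + s, number=100_000))
print("scalar + 0-d array:", timeit.timeit(lambda: s + a0, number=100_000))
```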
Sorry, I am not following. What was the new benchmark performance before the PR and what is it afterwards? How does the time for the new benchmarks ...
OK, here is the rerun benchmark:
That slowdown is massive of course, since the array path is much, much slower than the scalar path (especially for scalars, unfortunately). But I somewhat doubt that ...
Let's put this in since it does clean up the error handling and makes some parts of the code cleaner. We may need to back it out if it turns out 0d arrays are used more than we think, but I agree scalars are likely to be much more common. I will wait a little while to see if anyone else has an opinion.
Yeah, or re-add special cases for 0-D arrays. Although, in that case it would be nice to at least also make ...
Thanks @seberg
Checks the condition `a == NPY_MIN_@NAME@` to determine whether an overflow error has occurred for the np.int8 type. See #21289 and #21188 (comment) for reference. This also adds integer overflow error handling to the `-scalar` paths and "activates" a test for the unsigned versions. A few tests are skipped, because the tests were buggy (they never ran). These paths require followups to fix.
This tries to redo the scalar math logic to take some more care about subclasses, but most of all to introduce logic to defer to `other` if `self` can be cast to it safely (i.e. it is the correct promotion). This makes things much faster and more reliable, since we now defer to `ufuncs` indirectly much less often. This ensures that integer overflows are reported more reliably.

Another major point is that this reorganizes the coercion of Python int, float, complex (and bool). This should help a bit with switching to "weak" Python scalars.
Further, it contains a commit to turn all the macros into inline functions and move the floating point overflow flag handling to a return value. Checking floating point flags is not insanely slow, but it is pretty slow on the scale of the integer operations here (~30% on my computer).
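To illustrate the shape of that change (a conceptual Python model only — the real helpers are C inline functions, and the names below are made up): the arithmetic helper reports overflow through its return value, so the caller only touches the floating-point error machinery when the flag is set.

```python
# Hypothetical names mirroring the C-side idea, for illustration only.
FPE_NONE, FPE_OVERFLOW = 0, 1

def int8_add(a, b):
    """Wrap like a two's-complement int8 and report overflow via the return value."""
    wrapped = (a + b + 128) % 256 - 128
    return wrapped, (FPE_OVERFLOW if wrapped != a + b else FPE_NONE)

res, err = int8_add(120, 10)
if err:  # only the (rare) overflow path needs to raise a warning/error
    print("overflow; wrapped result:", res)   # overflow; wrapped result: -126
```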
It should fix some bugs around subclasses, but then subclassing scalars should still be pretty prone to issues (similar to subclassing arrays I guess). Complicated subclass cases may still end up in the generic (array) path, but we catch a few more ahead of time.
This is the 2-3 approach I have been thinking about. Another one would be to have (more or less) a single function dealing with any scalar inputs. It does "inline" the casting logic here. I do not like that, but it seemed somewhat straightforward – it would be nice to create `npy_cast_@from@_to_@to@` or so functions to use here more generically. The alternative would be logic with the existing "cast" functionality, but it should be slower, and while verbose (and the current macros are ugly), I am not sure that would actually end up much better.
The `PyArray_ScalarFromObject` seemed only useful for the old scalar paths, so I added a deprecation: neither were even documented, and both would probably need some work to transition to the new DTypes well. `PyArray_CastScalarDirect` may also be a target, but it is still used in at least one place, so likely should just fix it.

Benchmarks:
(Some corner cases may be significantly slower, mainly certain `scalar + 0d_array` ops, but I am not sure I want to worry about those much.)

This PR became quite big, I may split it up. At this time it relies on gh-21178. It is currently missing some additional tests for the subclass behavior at least, but I would like to check code-coverage on that front as well.