BUG: fix uint alignment asserts in lowlevel loops #12626
f9edc06
to
8cc9d9b
Compare
Hmm the updated test caught some bugs... |
Good test :) |
At some point we should experiment with not copying using ints. I think the original need was to work around gcc memcpy/memmove etc., which used to be performance disasters and were supposedly fixed some years ago. With SIMD et al., it may be that compilers can do a better job than we can these days. |
The 32-bit failing test hits the assert on line 812 in Here is the stack trace from a non-debug python so I do not get the nice python stack arguments.
|
assert(N == 0 || npy_is_aligned(dst, _ALIGN(_TYPE2))); | ||
# endif | ||
assert(N == 0 || npy_is_aligned(src, _UINT_ALIGN(_TYPE1))); | ||
assert(N == 0 || npy_is_aligned(dst, _UINT_ALIGN(_TYPE2))); |
This now fails on npy_longdouble on 32 bit, where sizeof(npy_longdouble) == 12.
Shouldn't this be _UINT_ALIGN(dtype->alignment), not _UINT_ALIGN(dtype->elsize)?
As written, this can never succeed when sizeof(_TYPE2) == 12
Edit: add qualifier sizeof
@ahaldane We have continuing failures in both the master branches and 1.16.x due to merging the earlier fix. It would be good to get this finished. |
See also #12638. |
I'll get to it tonight, I have the fix mostly done from a few days ago, just need to check it over. |
848fa04
to
791c5f4
Compare
All right, should be ready for review now.

I now realized that any arrays which want to use the "aligned" code-paths in the lowlevel loops must be both "uint" and "true" aligned. This is because casting paths need true alignment, and copy paths need uint alignment, but the dtype-dispatch funcs ( So the solution here is to make the alignment flag be computed using both types of alignment on those cases.

I also made the assert statements in the lowlevel loops check for the appropriate kind of alignment, depending on the next few lines after the assert. I grepped all the source so I think I got all the places that need to check both. There are a few places that only need to check one or the other, eg the On my system, it passes tests with USE_DEBUG both on x64 and in a x86 chroot.

The updated test here should be more thorough and deterministic than before, too. I also documented some cases where uint alignment is larger than true alignment and vice-versa, which are useful for sanity checking. |
|
||
The ndarray is guranteed *not* aligned to twice the requested alignment. | ||
Eg, if align=4, guarantees it is not aligned to 8. If align=None uses | ||
dtype.alignment.""" |
Nice.
"guranteed" spelling.
LGTM. Newly allocated ndarrays will always be both uint aligned and true aligned, correct? |
Only true alignment is guaranteed, as uint alignment depends on the itemsize. Arrays whose itemsize is not 2,4,8,16 count as "not uint aligned", and will generally go down the unaligned code paths. |
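The rule described above can be sketched in Python as follows. This is only an illustration, not NumPy's C implementation: the function name `uint_alignment` is hypothetical, the treatment of 1-byte items as trivially aligned is an assumption, and the 16-byte case is mapped to `uint64` because the PR description says 16-byte types are copied with two uint64 assignments.

```python
import ctypes

# Assumed mapping from item size to the unsigned integer type used to copy it.
# 16-byte items are assumed to be copied as two uint64 values.
_UINT_FOR_SIZE = {
    2: ctypes.c_uint16,
    4: ctypes.c_uint32,
    8: ctypes.c_uint64,
    16: ctypes.c_uint64,
}


def uint_alignment(itemsize):
    """Return the alignment needed for a uint-based copy of `itemsize` bytes.

    A return value of 0 means "not uint aligned": no pointer qualifies,
    so such items would take the unaligned code path.
    """
    if itemsize == 1:
        return 1  # assumption: single bytes are always aligned
    ctype = _UINT_FOR_SIZE.get(itemsize)
    if ctype is None:
        return 0  # e.g. itemsize 12 (long double on 32-bit x86)
    return ctypes.alignment(ctype)


if __name__ == "__main__":
    for size in (1, 2, 4, 8, 12, 16):
        print(size, uint_alignment(size))
    # True alignment of long double on this platform, for comparison.
    print("long double:", ctypes.sizeof(ctypes.c_longdouble),
          ctypes.alignment(ctypes.c_longdouble))
```

Under this sketch an item size of 12 yields 0, which matches the earlier observation that an assert on uint alignment can never succeed when the size is 12.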
/* New arrays are aligned and need no cast */ | ||
op_itflags[iop] |= NPY_OP_ITFLAG_ALIGNED; | ||
/* | ||
* New arrays are guranteed true-aligned, but copy/cast code |
"guranteed" spelling.
@@ -2888,11 +2894,17 @@ npyiter_allocate_arrays(NpyIter *iter, | |||
PyArray_DATA(op[iop]), NULL); | |||
|
|||
/* | |||
* New arrays are aligned need no cast, and in the case | |||
* New arrays are guranteed true-aligned, but copy/cast code |
"guranteed" spelling.
/* The temporary copy is aligned and needs no cast */ | ||
op_itflags[iop] |= NPY_OP_ITFLAG_ALIGNED; | ||
/* | ||
* New arrays are guranteed true-aligned, but copy/cast code |
"guranteed" spelling.
numpy/core/tests/test_multiarray.py
Outdated
""" | ||
Allocate a new ndarray with aligned memory. | ||
|
||
The ndarray is guranteed *not* aligned to twice the requested alignment. |
"guranteed" spelling.
791c5f4
to
812e359
Compare
Typos fixed. |
Great. Thanks Allan. |
alignments. | ||
Note that the strided-copy and strided-cast code are deeply intertwined and so | ||
any arrays being processed by them must be both uint and true aligned, even | ||
though te copy-code only needs uint alignment and the cast code only true |
aligned = raw_array_is_aligned(ndim, shape, dst_data, dst_strides, | ||
npy_uint_alignment(dst_dtype->elsize)) && | ||
raw_array_is_aligned(ndim, shape, dst_data, dst_strides, |
This is a belated comment, but this does seem rather inefficient: one should check the larger alignment first and then the smaller only if it isn't an integer factor of the larger.
Given how often this recurs, probably should have a routine that checks both...
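A combined routine along the lines suggested here might look like the sketch below. It is written in Python for illustration only; the names `is_aligned` and `is_uint_and_true_aligned` are hypothetical, the pointer is passed as a plain integer address, and alignments are assumed to be powers of two, with 0 meaning "never aligned".

```python
def is_aligned(data_ptr, shape, strides, alignment):
    """Check that the data pointer and every stride that is actually
    stepped over are multiples of `alignment` (assumed a power of two)."""
    if alignment == 0:
        return False  # 0 is used here to mean "never aligned"
    if alignment == 1:
        return True
    if any(n == 0 for n in shape):
        return True  # empty array: no element is ever accessed
    check = data_ptr
    for n, stride in zip(shape, strides):
        if n > 1:
            # OR-ing keeps every low bit that is set anywhere, so a single
            # test covers the pointer and all relevant strides at once.
            check |= abs(stride)
    return check % alignment == 0


def is_uint_and_true_aligned(data_ptr, shape, strides, uint_align, true_align):
    """Check both kinds of alignment, testing the larger one first."""
    if uint_align == 0:
        return False
    larger = max(uint_align, true_align)
    smaller = min(uint_align, true_align)
    if not is_aligned(data_ptr, shape, strides, larger):
        return False
    if larger % smaller == 0:
        return True  # the smaller alignment is implied by the larger
    return is_aligned(data_ptr, shape, strides, smaller)


if __name__ == "__main__":
    # 16-byte items that need 8-byte uint alignment and 16-byte true alignment.
    print(is_uint_and_true_aligned(0x1008, (4,), (16,), 8, 16))
    print(is_uint_and_true_aligned(0x1010, (4,), (16,), 8, 16))
```

With power-of-two alignments the smaller value always divides the larger, so in practice the second scan is skipped and only one pass over the strides is needed.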
Further correction to the debug assert statements in `lowlevel_stride_loops.c.src` to account for uint alignment, see #12618.

This also updates the unit test so it always fails if the alignment is incorrectly calculated, instead of sporadically failing depending on what malloc gives. That's done by making `_aligned_zeros` align to the requested alignment yet not twice the alignment.

The particular case that was failing was for 16-byte longdouble, which is 8-byte "uint aligned" but 16-byte "true aligned". (The copy-code copies 16-byte types with two uint64 assignments). So an 8-byte-aligned ptr would go into the uint aligned copy code, but would trip the 16-byte assert statement.
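The "aligned to the request but not to twice the request" idea can be sketched with the standard library alone. This is not the test helper from the PR: `aligned_offset` and `aligned_buffer` are hypothetical names, and `ctypes` is used instead of NumPy only to keep the example self-contained.

```python
import ctypes


def aligned_offset(ptr, align):
    """Smallest non-negative offset such that ptr + offset is a multiple
    of `align` but not a multiple of 2 * align."""
    # (ptr + offset) % (2 * align) == align by construction.
    return (align - ptr) % (2 * align)


def aligned_buffer(size, align):
    """Allocate raw memory and return (owner, address).

    `address` points at `size` usable bytes inside `owner`, aligned to
    `align` and deliberately misaligned with respect to 2 * align.
    Keep `owner` alive for as long as `address` is used.
    """
    owner = ctypes.create_string_buffer(size + 2 * align)
    base = ctypes.addressof(owner)
    return owner, base + aligned_offset(base, align)


if __name__ == "__main__":
    owner, addr = aligned_buffer(64, 8)
    print(addr % 8, addr % 16)
```

Because the result no longer depends on where the allocator happens to place the block, a test built on such a helper exercises the 8-byte-aligned, not 16-byte-aligned case on every run.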