BUG: fix uint alignment asserts in lowlevel loops #12626

Merged (1 commit) on Jan 3, 2019

ahaldane
Member

Further correction to the debug assert statements in lowlevel_strided_loops.c.src to account for uint alignment, see #12618.

This also updates the unit test so that it always fails if the alignment is calculated incorrectly, instead of failing sporadically depending on what malloc returns. This is done by making _aligned_zeros return memory that is aligned to the requested alignment but never to twice that alignment.
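
As a rough illustration of that idea (not the actual `_aligned_zeros` body), the pointer adjustment can be sketched in pure Python. The function name `offset_for_exact_alignment` is hypothetical, and the sketch assumes the buffer was over-allocated by at least `2 * align` bytes so that the offset always fits.

```python
def offset_for_exact_alignment(ptr: int, align: int) -> int:
    """Return the smallest byte offset such that ptr + offset is a
    multiple of `align` but NOT a multiple of 2 * align.

    Hypothetical helper for illustration only.
    """
    if align <= 0:
        raise ValueError("align must be positive")
    # Step forward to the next multiple of `align` (0 if already there).
    offset = (-ptr) % align
    # If that address is also a multiple of 2 * align, skip one more step.
    if (ptr + offset) % (2 * align) == 0:
        offset += align
    return offset


# Simulated allocator results: whatever address comes back, the adjusted
# address is 4-byte aligned and never 8-byte aligned.
for ptr in (0x1000, 0x1003, 0x1004, 0x1008, 0x100C):
    addr = ptr + offset_for_exact_alignment(ptr, 4)
    print(hex(ptr), "->", hex(addr), addr % 4 == 0, addr % 8 != 0)
```

Because the result no longer depends on how generously malloc happened to align the block, a loop that wrongly assumes the larger alignment fails on every run rather than occasionally.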

The particular case that was failing was for 16-byte longdouble, which is 8-byte "uint aligned" but 16-byte "true aligned". (The copy-code copies 16-byte types with two uint64 assignments). So an 8-byte-aligned ptr would go into the uint aligned copy code, but would trip the 16-byte assert statement.
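
A small sketch of the failing case, under stated assumptions: the alignment table below assumes a typical x86-64 build where uint16/uint32/uint64 are aligned to 2/4/8 bytes, and the helper names are made up for this example. The convention that 0 means "never uint aligned" is also an assumption of the sketch.

```python
# Assumed alignments of the unsigned integer types used by the copy code
# (typical x86-64 values; other platforms may differ).
ASSUMED_UINT_ALIGN = {1: 1, 2: 2, 4: 4, 8: 8}


def uint_alignment(itemsize: int) -> int:
    """Alignment needed to copy one item with unsigned-integer stores.

    Returns 0 when no suitable unsigned integer copy exists.
    """
    if itemsize == 16:
        # 16-byte items are copied as two uint64 assignments.
        return ASSUMED_UINT_ALIGN[8]
    return ASSUMED_UINT_ALIGN.get(itemsize, 0)


def is_aligned(ptr: int, alignment: int) -> bool:
    return alignment > 0 and ptr % alignment == 0


LONGDOUBLE_ITEMSIZE = 16        # assumed x86-64 longdouble
LONGDOUBLE_TRUE_ALIGNMENT = 16  # assumed x86-64 longdouble

ptr = 0x1008  # 8-byte aligned, but not 16-byte aligned
uint_ok = is_aligned(ptr, uint_alignment(LONGDOUBLE_ITEMSIZE))
true_ok = is_aligned(ptr, LONGDOUBLE_TRUE_ALIGNMENT)
print(uint_ok, true_ok)  # uint-aligned copy path is taken, 16-byte assert trips
```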

@ahaldane ahaldane force-pushed the further_uint_align_fix branch from f9edc06 to 8cc9d9b Compare December 28, 2018 03:34
@ahaldane ahaldane added this to the 1.16.0 release milestone Dec 28, 2018
@ahaldane
Member Author

Hmm the updated test caught some bugs...

@charris
Member

charris commented Dec 28, 2018

> Hmm the updated test caught some bugs...

Good test :)

@charris
Member

charris commented Dec 28, 2018

At some point we should experiment with not copying using ints. I think the original need was to work around gcc's memcpy/memmove etc., which used to be performance disasters and were supposedly fixed some years ago. With SIMD et al., it may be that compilers can do a better job than we can these days.

@mattip
Member

mattip commented Dec 31, 2018

The 32-bit failing test hits the assert on line 812 in _aligned_cast_bool_to_longdouble, when running numpy/core/tests/test_einsum.py::TestEinsum::test_einsum_sums_longdouble

Here is the stack trace from a non-debug python so I do not get the nice python stack arguments.

#3  0xf7e1bd07 in __assert_fail_base (fmt=0xf7f56258 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0xf7966618 "N == 0 || npy_is_aligned(dst, _UINT_ALIGN(_TYPE2))", 
    file=0xf796645c "numpy/core/src/multiarray/lowlevel_strided_loops.c.src", line=812, 
    function=0xf79675e0 <__PRETTY_FUNCTION__.17102> "_aligned_cast_bool_to_longdouble") at assert.c:92
#4  0xf7e1bd8b in __GI___assert_fail (assertion=0xf7966618 "N == 0 || npy_is_aligned(dst, _UINT_ALIGN(_TYPE2))", 
    file=0xf796645c "numpy/core/src/multiarray/lowlevel_strided_loops.c.src", line=812, 
    function=0xf79675e0 <__PRETTY_FUNCTION__.17102> "_aligned_cast_bool_to_longdouble") at assert.c:101
#5  0xf77ef5e9 in _aligned_cast_bool_to_longdouble (dst=0xffff7938 "\354\201\377\377ln\365\367\002", dst_stride=0, 
    src=0xffff7a2b "", src_stride=0, N=1, __NPY_UNUSED_TAGGEDsrc_itemsize=1, __NPY_UNUSED_TAGGEDdata=0x0)
    at numpy/core/src/multiarray/lowlevel_strided_loops.c.src:812
#6  0xf77aac6c in PyArray_CastRawArrays (count=1, src=0xffff7a2b "", dst=0xffff7938 "\354\201\377\377ln\365\367\002", src_stride=0, 
    dst_stride=0, src_dtype=0xf7a2cd40 <BOOL_Descr>, dst_dtype=0xf7a2c9c0 <LONGDOUBLE_Descr>, move_references=0)
    at numpy/core/src/multiarray/dtype_transfer.c:3785
#7  0xf7777cc4 in PyArray_AssignRawScalar (dst=0xf4ab1ac0, src_dtype=0xf7a2cd40 <BOOL_Descr>, src_data=0xffff7a2b "", wheremask=0x0, 
    casting=NPY_SAFE_CASTING) at numpy/core/src/multiarray/array_assign_scalar.c:248
#8  0xf7784781 in PyArray_AssignZero (dst=0xf4ab1ac0, wheremask=0x0) at numpy/core/src/multiarray/convert.c:539
#9  0xf77be7ea in PyArray_EinsteinSum (subscripts=<optimized out>, nop=1, op_in=0xffff929c, dtype=0x0, order=NPY_KEEPORDER, 
    casting=NPY_SAFE_CASTING, out=0x0) at numpy/core/src/multiarray/einsum.c.src:2772
#10 0xf7813b3c in array_einsum (__NPY_UNUSED_TAGGEDdummy=0xf7a76e64, args=0xf4eb7fac, kwds=0xf4ac810c)
    at numpy/core/src/multiarray/multiarraymodule.c:2663

assert(N == 0 || npy_is_aligned(dst, _ALIGN(_TYPE2)));
# endif
assert(N == 0 || npy_is_aligned(src, _UINT_ALIGN(_TYPE1)));
assert(N == 0 || npy_is_aligned(dst, _UINT_ALIGN(_TYPE2)));
Member

This now fails on npy_longdouble on 32 bit, where sizeof(npy_longdouble) == 12.
Shouldn't this be _UINT_ALIGN(dtype->alignment), not _UINT_ALIGN(dtype->elsize)?

Member

@mattip mattip Dec 31, 2018

As written, this can never succeed when sizeof(_TYPE2) == 12

Edit: add qualifier sizeof
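
To make the 12-byte case concrete, here is a minimal sketch. It assumes that only item sizes with a matching unsigned-integer copy can be uint aligned (2, 4, 8 and 16 per this thread; 1 is added here as an assumption), and that longdouble on 32-bit x86 Linux has size 12 and true alignment 4. The names are hypothetical.

```python
# Item sizes for which a uint-based copy exists (1 is an assumption here).
UINT_COPYABLE_SIZES = {1, 2, 4, 8, 16}


def can_be_uint_aligned(itemsize: int) -> bool:
    """True if some pointer could satisfy a uint-alignment check."""
    return itemsize in UINT_COPYABLE_SIZES


def is_true_aligned(ptr: int, alignment: int) -> bool:
    return ptr % alignment == 0


DST = 0xFFFF7938             # dst pointer from the stack trace above
LONGDOUBLE_SIZE_X86 = 12     # assumed i386 value
LONGDOUBLE_ALIGN_X86 = 4     # assumed i386 value

# No pointer can pass a uint-alignment assert keyed on a 12-byte size ...
print(can_be_uint_aligned(LONGDOUBLE_SIZE_X86))
# ... even though this dst is fine for a cast, which needs true alignment.
print(is_true_aligned(DST, LONGDOUBLE_ALIGN_X86))
```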

@charris
Member

charris commented Dec 31, 2018

@ahaldane We have continuing failures in both the master branches and 1.16.x due to merging the earlier fix. It would be good to get this finished.

@charris
Member

charris commented Jan 1, 2019

See also #12638.

@ahaldane
Member Author

ahaldane commented Jan 2, 2019

I'll get to it tonight, I have the fix mostly done from a few days ago, just need to check it over.

@ahaldane ahaldane force-pushed the further_uint_align_fix branch 3 times, most recently from 848fa04 to 791c5f4 Compare January 3, 2019 05:25
@ahaldane
Member Author

ahaldane commented Jan 3, 2019

All right, should be ready for review now.

I now realize that any arrays which want to use the "aligned" code-paths in the lowlevel loops must be both "uint" and "true" aligned. This is because casting paths need true alignment and copy paths need uint alignment, but the dtype-dispatch funcs (PyArray_GetDtypeTransferFunction) don't yet know which path we will go down when we supply the aligned flag.

So the solution here is to compute the alignment flag using both types of alignment in those cases. I also made the assert statements in the lowlevel loops check for the appropriate kind of alignment, depending on what the next few lines after the assert do.

I grepped all the source, so I think I got all the places that need to check both. There are a few places that only need to check one or the other, e.g. the mapiter_* functions only need uint alignment, so I left some IsUintAligned checks alone (not visible in diff).

On my system, it passes tests with USE_DEBUG both on x64 and in a x86 chroot. The updated test here should be more thorough and deterministic than before, too. I also documented some cases where uint alignment is larger than true alignment and vice-versa, which are useful for sanity checking.
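
The reasoning above can be sketched as a toy dispatcher. This is only a model of the idea, not the NumPy implementation: `transfer`, `copy_loop` and `cast_loop` are hypothetical names, and the alignment values in the usage lines are assumed examples.

```python
def is_aligned(ptr: int, alignment: int) -> bool:
    # An alignment of 0 is used here to mean "can never be aligned".
    return alignment > 0 and ptr % alignment == 0


def copy_loop(ptr: int, uint_align: int) -> str:
    assert is_aligned(ptr, uint_align), "copy path needs uint alignment"
    return "aligned copy"


def cast_loop(ptr: int, true_align: int) -> str:
    assert is_aligned(ptr, true_align), "cast path needs true alignment"
    return "aligned cast"


def transfer(ptr: int, uint_align: int, true_align: int, needs_cast: bool) -> str:
    # The flag is computed before the path is known, so it must cover both.
    aligned = is_aligned(ptr, uint_align) and is_aligned(ptr, true_align)
    if not aligned:
        return "unaligned fallback"
    return cast_loop(ptr, true_align) if needs_cast else copy_loop(ptr, uint_align)


# uint alignment smaller than true alignment (e.g. 8 vs 16):
print(transfer(0x1008, 8, 16, needs_cast=True))
print(transfer(0x1010, 8, 16, needs_cast=False))
# uint alignment larger than true alignment (e.g. 8 vs 4):
print(transfer(0x1004, 8, 4, needs_cast=False))
```

With the combined flag, neither assert can trip regardless of which loop the dispatcher ends up selecting; the cost is that some arrays which would be safe on one specific path take the unaligned fallback.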


The ndarray is guranteed *not* aligned to twice the requested alignment.
Eg, if align=4, guarantees it is not aligned to 8. If align=None uses
dtype.alignment."""
Member

Nice.

Member

"guranteed" spelling.

@mattip
Member

mattip commented Jan 3, 2019

LGTM. Newly allocated ndarrays will always be both uint aligned and true aligned, correct?

@ahaldane
Member Author

ahaldane commented Jan 3, 2019

Only true alignment is guaranteed, as uint alignment depends on the itemsize. Arrays whose itemsize is not 2,4,8,16 count as "not uint aligned", and will generally go down the unaligned code paths.

/* New arrays are aligned and need no cast */
op_itflags[iop] |= NPY_OP_ITFLAG_ALIGNED;
/*
* New arrays are guranteed true-aligned, but copy/cast code
Member

"guranteed" spelling.

@@ -2888,11 +2894,17 @@ npyiter_allocate_arrays(NpyIter *iter,
PyArray_DATA(op[iop]), NULL);

/*
* New arrays are aligned need no cast, and in the case
* New arrays are guranteed true-aligned, but copy/cast code
Member

"guranteed" spelling.

/* The temporary copy is aligned and needs no cast */
op_itflags[iop] |= NPY_OP_ITFLAG_ALIGNED;
/*
* New arrays are guranteed true-aligned, but copy/cast code
Member

"guranteed" spelling.

"""
Allocate a new ndarray with aligned memory.

The ndarray is guranteed *not* aligned to twice the requested alignment.
Member

"guranteed" spelling.

@ahaldane ahaldane force-pushed the further_uint_align_fix branch from 791c5f4 to 812e359 Compare January 3, 2019 16:27
@ahaldane
Member Author

ahaldane commented Jan 3, 2019

Typos fixed.

@charris charris merged commit fd89a41 into numpy:master Jan 3, 2019
@charris
Member

charris commented Jan 3, 2019

Great. Thanks Allan.

alignments.
Note that the strided-copy and strided-cast code are deeply intertwined and so
any arrays being processed by them must be both uint and true aligned, even
though te copy-code only needs uint alignment and the cast code only true
Contributor

@ahaldane - in #12677 could you fix the typo here (te -> the)

aligned = raw_array_is_aligned(ndim, shape, dst_data, dst_strides,
npy_uint_alignment(dst_dtype->elsize)) &&
raw_array_is_aligned(ndim, shape, dst_data, dst_strides,
Contributor

This is a belated comment, but this does seem rather inefficient: one should check the larger alignment first, and then the smaller one only if it isn't an integer factor of the larger.

Contributor

Given how often this recurs, probably should have a routine that checks both...
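
One possible shape for such a routine, sketched in Python rather than C. Both function names are hypothetical, the pointer/shape/strides model is a simplification of what a raw-array check would look at, and an alignment of 0 is assumed to mean "can never be aligned".

```python
def raw_is_aligned(data: int, shape, strides, alignment: int) -> bool:
    """Hypothetical stand-in for a raw-array alignment check."""
    if alignment == 0:
        return False
    if any(n == 0 for n in shape):
        return True  # empty array: no element is ever accessed
    # The base pointer and every stride that is actually stepped along
    # must be multiples of the alignment.
    values = [data] + [s for n, s in zip(shape, strides) if n > 1]
    return all(v % alignment == 0 for v in values)


def is_uint_and_true_aligned(data, shape, strides, uint_align, true_align) -> bool:
    """Check both alignments, testing the larger one first."""
    small, large = sorted((uint_align, true_align))
    if small == 0:
        return False
    if not raw_is_aligned(data, shape, strides, large):
        return False
    # If the smaller alignment divides the larger, it is already implied.
    if large % small == 0:
        return True
    return raw_is_aligned(data, shape, strides, small)


print(is_uint_and_true_aligned(0x1010, (3,), (16,), 8, 16))
print(is_uint_and_true_aligned(0x1008, (3,), (16,), 8, 16))
```

Since alignments are normally powers of two, the second check would rarely run in practice, so the common case costs a single pass over the strides.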
