MAINT: do not use copyswap in where internals #23770


Merged: 7 commits into numpy:main on May 18, 2023

Conversation

@ngoldbaum (Member) commented May 16, 2023

This makes it possible to do e.g. np.where(condition, string_array, other_string_array) for StringDType.
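For illustration, a minimal sketch of the call this enables, using NumPy's built-in unicode dtype as a stand-in for the (then-experimental) StringDType; where() already worked for the unicode dtype, but the same call shape now works for dtypes without a copyswap implementation:

    import numpy as np

    # Sketch: selecting elementwise between two string arrays.
    cond = np.array([True, False, True])
    x = np.array(["apple", "banana", "cherry"])
    y = np.array(["kiwi", "mango", "plum"])
    print(np.where(cond, x, y))  # ['apple' 'mango' 'cherry']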

I also added a new benchmark for the code path I modified. It ends up being 33% slower for object arrays and 10% slower in the non-object case:

       before           after         ratio
     [9d08632f]       [f8b1a3e4]
     <rm-copyswap-in-where~1>       <rm-copyswap-in-where>
+      82.8±0.8μs        110±0.8μs     1.33  bench_function_base.Where.time_2_object
+      14.2±0.2μs       15.6±0.2μs     1.10  bench_function_base.Where.time_2
-      75.5±0.7μs       71.5±0.6μs     0.95  bench_function_base.Where.time_interleaved_zeros_x8

I'm not sure about the last change; I think that might be noise?

@ngoldbaum (Member, Author)

The latest push no longer has a slowdown in the fast path:

       before           after         ratio
     [9d08632f]       [46cf47d1]
     <rm-copyswap-in-where~2>       <rm-copyswap-in-where>
+      80.9±0.5μs          110±2μs     1.36  bench_function_base.Where.time_2_object
-        78.2±6μs       73.7±0.8μs     0.94  bench_function_base.Where.time_all_ones
-        16.0±1μs       14.7±0.2μs     0.92  bench_function_base.Where.time_2_broadcast

@seberg (Member) left a comment

> The latest push no longer has a slowdown in the fast path:

I would think that is just random noise. That code shouldn't add 4µs of overhead; I would buy up to 100ns if you insist. I don't care about removing it, but the iterator construction will add much more overhead anyway.

It might be nice to clean up that needs_api, but I think consolidating the two dtype transfer functions into one is useful. There is no swapping. You can add NPY_ITER_ALIGNED if you don't like using unaligned access (it will force buffering if an operand is unaligned, but ensures you can assume alignment).
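As an aside, unaligned operands are easy to construct from Python; a minimal sketch of what NPY_ITER_ALIGNED would force the iterator to buffer (the odd byte offset makes the view unaligned on typical allocators):

    import numpy as np

    # Sketch: float64 values read from an odd byte offset give an
    # unaligned view; with NPY_ITER_ALIGNED the iterator would buffer
    # this operand so the inner loop can assume alignment.
    buf = bytearray(8 * 4 + 1)
    unaligned = np.frombuffer(buf, dtype=np.float64, count=4, offset=1)
    print(unaligned.flags["ALIGNED"])  # False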

        &y_cast_info, &y_transfer_flags) != NPY_SUCCEED) {
    goto fail;
}

NPY_BEGIN_THREADS_NDITER(iter);
@seberg (Member)

Hmmm, maybe we can make this precise here:

transfer_flags = PyArrayMethod_COMBINED_FLAGS(x_transfer_flags, y_transfer_flags);
transfer_flags = PyArrayMethod_COMBINED_FLAGS(
        transfer_flags, PyArrayNpyIter_GetTransferFlags(iter));
if (!(cast_flags & NPY_METH_REQUIRES_PYAPI)) {
    NPY_BEGIN_THREADS_THRESHOLDED(NpyIter_GetIterSize(iter));
}

In principle we should do this here (and probably in a few other places).

        y_is_aligned, ystrides[0], ystrides[1], dty, common_dt, 0,
        &y_cast_info, &y_transfer_flags) != NPY_SUCCEED) {
    goto fail;
}
@seberg (Member)

Theoretically a fair point if one of them is not aligned, but the iterator probably ensures that as well.

We really only need a single transfer function here, I think. The old code is unnecessarily complex: the iterator ensures that swapping cannot possibly be necessary. (That may not always be ideal, but it is what the iterator does.)
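To make the swapping point concrete, a hedged sketch from the Python side: the iterator casts operands to the common, native-order dtype before the inner loop runs, so where() behaves identically for byte-swapped and native inputs:

    import numpy as np

    # Sketch: a byte-swapped operand behaves exactly like the native
    # one, because the cast to the common dtype handles byte order;
    # the where inner loop itself never needs to swap.
    x = np.arange(5.0)
    x_swapped = x.astype(x.dtype.newbyteorder())
    cond = x > 2
    assert (np.where(cond, x_swapped, -x) == np.where(cond, x, -x)).all()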

@ngoldbaum (Member, Author)

Thanks for the suggestions, especially the tip to use NPY_ITER_ALIGNED; it didn't occur to me to use the iterator to ensure the casts are exactly the same.

@ngoldbaum (Member, Author)

Here are the asv timings on my laptop with the latest push:

+        79.1±2μs          106±6μs     1.34  bench_function_base.Where.time_2_object
+      11.6±0.2μs       12.4±0.1μs     1.07  bench_function_base.Where.time_1
+     13.9±0.09μs      14.7±0.07μs     1.06  bench_function_base.Where.time_2

@seberg (Member) left a comment

Thanks for taking the trouble to simplify the code! Unfortunately, the strides part is (technically) simplifying it too much.

You can try that test (or just add it anyway), but I don't want to waste time trying to find a test that fails reliably...

Comment on lines 3440 to 3442
npy_intp cstride = strides[1];
npy_intp xstride = strides[2];
npy_intp ystride = strides[3];
@seberg (Member)

Argg, sorry, I had not noticed this. The strides pointer above is good; the iterator keeps using that. But we cannot do this part, because the iterator may mutate the values.

(I doubt this can happen for this iterator, and it would require broadcasting at the very least. I will also go on record that I suspect the mechanism for why it changes, using the buffers differently, is probably a lot of complexity for almost no, or even negative, gain.)

@seberg (Member)

I have ideas about what kind of test would be necessary, something like:

a = np.ones((100, 10000), dtype="f4")
b = np.ones(50000)[::5]  # non-itemsize stride

because it doesn't cast, but it's larger than the buffer size along the last dimension, and the normal stride is larger than the buffer stride.
(The wrong dtype for a is to force a buffer/cast on that operand, so that growing the loop isn't trivial; dunno if it matters.)

But you can get lost for a week in that logic, so no worries either way; the above is just a guess...
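For concreteness, a hedged sketch of how those two arrays might be driven through np.where (the condition array and the assertion are invented for illustration, not taken from the discussion):

    import numpy as np

    # Hedged sketch of the test described above: `a` is float32 to
    # force a cast on that operand, `b` has a non-itemsize stride, and
    # the last axis is longer than the iterator's buffer.
    a = np.ones((100, 10000), dtype="f4")
    b = np.ones(50000)[::5]     # non-itemsize stride, shape (10000,)
    cond = np.zeros((100, 10000), dtype=bool)
    cond[:, ::2] = True
    res = np.where(cond, a, b)  # exercises the buffered inner loop
    assert res.shape == (100, 10000) and res.dtype == np.float64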

Comment on lines 3434 to 3437
PyArray_Descr **dts = NpyIter_GetDescrArray(iter);
PyArray_Descr *dtx = dts[2];
PyArray_Descr *dty = dts[3];
npy_intp itemsize = dts[0]->elsize;
@seberg (Member)

Suggested change:

- PyArray_Descr **dts = NpyIter_GetDescrArray(iter);
- PyArray_Descr *dtx = dts[2];
- PyArray_Descr *dty = dts[3];
- npy_intp itemsize = dts[0]->elsize;
+ npy_intp itemsize = common_dt->elsize;

Use common_dt everywhere. The iterator enforces that, after all (if it didn't, we would actually have to do the cast below; to be fair, when we don't use the "trivial copy" fast path, that could even be faster, but that is a different thing from this PR).
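Seen from Python, the enforcement being relied on is just dtype promotion: where() casts both value operands to their common dtype up front, e.g.:

    import numpy as np

    # Sketch: float32 and float64 operands promote to a common
    # float64, which is why the inner loop can take a single itemsize
    # from common_dt rather than per-operand descriptors.
    x = np.ones(3, dtype="f4")
    y = np.ones(3, dtype="f8")
    print(np.where([True, False, True], x, y).dtype)  # float64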


NPY_ARRAYMETHOD_FLAGS transfer_flags = 0;

npy_intp transfer_strides[2] = {xstride, itemsize};
@seberg (Member)

Suggested change:

- npy_intp transfer_strides[2] = {xstride, itemsize};
+ npy_intp transfer_strides[2] = {itemsize, itemsize};

We copy only a single item anyway, so it doesn't matter.

@ngoldbaum (Member, Author)

Applied your suggestions. I tried a bit to get a failing test, but I think the iterator already enforces that the strides are the same before we get to the where inner loop. In any case, your suggestions make things clearer.

@seberg (Member) commented May 17, 2023

Can you please change the stride array unpacking back, so that it is inside the iternext() loop? I understand that finding an example is either too hard or even impossible, but the strides are not guaranteed to be fixed by the iterator (there is a reason a GetFixedStridesArray function exists).
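One easy-to-see reason strides are not constants of the operands alone, sketched from Python (the C-level point is stronger: buffering may change the strides between iternext() calls):

    import numpy as np

    # Sketch: a broadcast operand is iterated with stride 0; the
    # stride an inner loop sees depends on broadcasting and on how the
    # iterator sets up each buffered chunk, so it must be re-read
    # inside the iteration loop.
    b = np.broadcast_to(np.float64(1.0), (4,))
    print(b.strides)  # (0,)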

@ngoldbaum (Member, Author)

> Can you please change the stride array unpacking back, so that it is inside the iternext() loop?

Ah, sorry, I didn't appreciate that was the issue.

Comment on lines 3435 to 3437

int swap = PyDataType_ISBYTESWAPPED(common_dt);
int native = (swap == 0) && !needs_api;
@seberg (Member)

Suggested change:

- int swap = PyDataType_ISBYTESWAPPED(common_dt);
- int native = (swap == 0) && !needs_api;
+ int has_ref = PyDataType_REFCHK(common_dt);

Oh, let's just delete it; this check is meaningless. It can never be swapped.

The dtype is forced to be the same, so swapping is impossible; but even if it were possible, it wouldn't matter here.
@seberg (Member) left a comment

Thanks, LGTM. I took the liberty of pushing a (tiny) follow-up commit to clean things up a bit (no need for the swap check, and we use refchk everywhere else for the trivial copy paths, so I think it's nice to do that explicitly here too).

(will probably just merge tomorrow, unless anyone beats me to it)

@mattip (Member) commented May 17, 2023

Could one of you rerun the relevant benchmarks on the final version?

if (cast_info.func(
        &cast_info.context, args, &one,
        transfer_strides, cast_info.auxdata) < 0) {
    goto fail;
Member

Isn't there a path where has_ref is false but we get here because of a strange itemsize, and we then call this function without the GIL (after NPY_BEGIN_THREADS_THRESHOLDED)? (Also cast_info below.)

Member

If we get here, we account for whether or not the GIL is needed (e.g. the GIL should be released for the string dtype prototype). That is done via the transfer_flags.

The only thing we assume is that if !has_ref is true, then a value of that dtype can be copied via memcpy (or really pointer assignment). At this point the function call never does a cast, only a copy.
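For context, a hedged sketch from the Python side of the distinction has_ref draws: object arrays hold references, so they skip the trivial memcpy path and go through the transfer function (the slower path the new time_2_object benchmark measures):

    import numpy as np

    # Sketch: an object array stores PyObject pointers, so copying its
    # elements means managing reference counts rather than doing a
    # plain memcpy; where() still works, just via the transfer path.
    cond = np.array([True, False])
    x = np.array([{"a": 1}, {"b": 2}], dtype=object)
    y = np.array([None, None], dtype=object)
    print(np.where(cond, x, y))  # [{'a': 1} None]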

@seberg (Member) commented May 17, 2023

> Could one of you rerun the relevant benchmarks on the final version?

We didn't really touch the core, so I think any fluctuations will be random, but maybe @ngoldbaum you can re-run it?

@ngoldbaum (Member, Author) commented May 17, 2023

> Could one of you rerun the relevant benchmarks on the final version?

I only see a change in the new benchmark I added:


       before           after         ratio
     [9d08632f]       [01a251ba]
     <rm-copyswap-in-where~6>       <rm-copyswap-in-where>
+        81.3±1μs          106±3μs     1.31  bench_function_base.Where.time_2_object

This was with Python 3.11 on my laptop, with no special configuration for benchmarking.

@mattip (Member) commented May 18, 2023

I guess the slowdown makes sense, since there is now a function call.

@mattip merged commit cebb7a6 into numpy:main on May 18, 2023
@mattip (Member) commented May 18, 2023

Thanks @ngoldbaum, @seberg
