ENH: introduce a notion of "compatible" stringdtype instances #26261 #26351

charris · 2024-04-26T17:16:52Z

Backport of #26261.

This is a follow-up to #26198 and accomplishes the same thing I wanted in that PR.

See the change to the NEP for the details and explanation.

In short, this relaxes the error checking in stringdtype ufuncs and changes the common_instance logic, allowing operations between distinct stringdtype instances as long as the result isn't ambiguous. This makes it much simpler to work with non-default stringdtype instances: in the most common cases, like passing a Python string as an argument, users no longer need to zealously convert all ufunc arguments to the same dtype before passing them to numpy.

For all operations that take more than one string argument, we now only raise an error if the inputs have distinct na_object settings. We allow distinct coerce settings and just choose coerce=False for string outputs if any input dtype had coerce=False set.

Also added a test. There was one spot in the existing tests where we were doing equality comparisons between arrays with distinct na_object settings, so I updated that test to account for the behavior change.

Ideally we could get this merged in time to be included with 2.0 RC2. I'd like to have this in NumPy 2.0 because it will eliminate a lot of boilerplate argument sanitizing in pandas when it calls numpy ufuncs. I totally understand if this is coming too late in the game though.

charris · 2024-04-26T19:02:06Z

@ngoldbaum I can fix the errors here several ways:

1. Bring in more code from main
2. Remove code from the backport
3. Don't do the backport.

The problem with 1. is, IIRC, that we don't intend to implement all the string functions available in 2.1 in 2.0. Does that still stand?

ngoldbaum · 2024-04-26T19:23:35Z

numpy/_core/src/umath/stringdtype_ufuncs.cpp

+
+    NpyString_release_allocators(5, allocators);
+    return -1;
+}


I think you just need to delete everything from line 1509 to here. This is all code supporting ufuncs we're not backporting to 2.0.

charris · 2024-04-27T01:22:35Z

Thanks @ngoldbaum, that fixes the problem.

charris added the 01 - Enhancement and 08 - Backport labels Apr 26, 2024

charris added this to the 2.0.0 release milestone Apr 26, 2024

ngoldbaum and others added 8 commits April 26, 2024 11:57

MNT: fix copy/paste error for NA type extracted from Pandas

d4875be

ENH: introduce 'compatible' stringdtype instances

8acdf6f

MNT: refactor stringdtype compatibility checking out of common_instance

4bb1e22

MNT: refactor tortured logic in test

f07f36d

MNT: refactor stringdtype_setitem following marten's suggestion

99e7f71

MAINT: respond to marten's comments

e7f972c

MNT: respond to minor comments from marten

c25651c

MAINT: Update some files from main

1682195

ngoldbaum reviewed Apr 26, 2024

MAINT: Remove some ufuncs not wanted in 2.0.x

1028749

charris force-pushed the backport-26261 branch from 0527cf7 to 1028749 Compare April 26, 2024 22:06

charris merged commit 9c913ee into numpy:maintenance/2.0.x Apr 27, 2024
57 checks passed

charris deleted the backport-26261 branch April 27, 2024 01:18
