ENH: introduce a notion of "compatible" stringdtype instances #26261 #26351

Merged: 9 commits, Apr 27, 2024

@charris (Member) commented Apr 26, 2024

Backport of #26261.

This is a follow-up to #26198; it accomplishes the same thing I wanted in that PR.

See the change to the NEP for the details and explanation.

In short, this relaxes the error checking in stringdtype ufuncs and changes the common_instance logic, allowing operations between distinct stringdtype instances as long as the result isn't ambiguous. This makes it much simpler to work with non-default stringdtype instances: in the most common cases, such as passing a Python string as an argument, users no longer need to zealously convert all ufunc arguments to the same dtype before passing them to NumPy.

For all operations that take more than one string argument, we now only raise an error if the inputs have distinct na_object settings. We allow distinct coerce settings and just choose coerce=False for string outputs if any input dtype had coerce=False set.
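The rule described above can be sketched in pure Python. This is only an illustrative model, not NumPy's implementation: `StringSpec`, `common_spec`, and the `_UNSET` sentinel are hypothetical names invented for this sketch. It also assumes that an instance with no na_object set is compatible with any other instance, which is a reading of the NEP rather than something stated in this thread.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical sentinel meaning "no na_object was configured".
_UNSET = object()


@dataclass(frozen=True)
class StringSpec:
    """Hypothetical stand-in for the settings of one stringdtype instance."""
    na_object: Any = _UNSET
    coerce: bool = True


def common_spec(a: StringSpec, b: StringSpec) -> StringSpec:
    """Return the settings a string result would use, or raise TypeError."""
    both_set = a.na_object is not _UNSET and b.na_object is not _UNSET
    # Only two *different* explicit na_object settings are treated as an error.
    if both_set and not (a.na_object is b.na_object or a.na_object == b.na_object):
        raise TypeError(
            f"incompatible na_object settings: {a.na_object!r} and {b.na_object!r}"
        )
    # Assumption: if only one side sets na_object, the result inherits it.
    na = a.na_object if a.na_object is not _UNSET else b.na_object
    # coerce=False wins if any input had coerce=False.
    return StringSpec(na_object=na, coerce=a.coerce and b.coerce)
```

Under this model, `common_spec(StringSpec(coerce=False), StringSpec())` yields a spec with `coerce=False`, while combining `na_object=None` with `na_object="NA"` raises `TypeError`.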

Also added a test. There was one spot in the existing tests where we were doing equality comparisons between arrays with distinct na_object settings, so I updated that test to account for the behavior change.

Ideally we could get this merged in time to be included with 2.0 RC2. I'd like to have this in NumPy 2.0 because it will eliminate a lot of boilerplate argument sanitizing in pandas when it calls NumPy ufuncs. I totally understand if this is coming too late in the game, though.

@charris added the 01 - Enhancement and 08 - Backport labels Apr 26, 2024
@charris added this to the 2.0.0 release milestone Apr 26, 2024
@charris (Member, Author) commented Apr 26, 2024

@ngoldbaum I can fix the errors here in several ways:

  1. Bring in more code from main.
  2. Remove code from the backport.
  3. Don't do the backport.

The problem with 1. is, IIRC, that we don't intend to implement all the string functions available in 2.1 in 2.0. Does that still stand?


NpyString_release_allocators(5, allocators);
return -1;
}
A Member commented on this code:

I think you just need to delete everything from line 1509 to here. This is all code supporting ufuncs we're not backporting to 2.0.

@charris merged commit 9c913ee into numpy:maintenance/2.0.x Apr 27, 2024
57 checks passed
@charris deleted the backport-26261 branch April 27, 2024 01:18
@charris (Member, Author) commented Apr 27, 2024

Thanks @ngoldbaum, that fixes the problem.
