BUG: Fix string to bool cast regression #16068

Merged (2 commits) on May 12, 2020

seberg
Member

@seberg seberg commented Apr 24, 2020

While the legacy behaviour of casting strings to booleans by
first converting the string to an integer is undesirable in
general, changing it will require a Deprecation/FutureWarning
for the transition.
Changing this accidentally was thus a regression.

Closes gh-16023
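
To make the legacy rule concrete, here is a minimal pure-Python sketch of the two-step conversion described above (string to integer, then integer to bool). It is only a model of the semantics, not NumPy's C implementation; the function names and the transition helper are hypothetical, and the warning text is an assumption about what a FutureWarning-based transition could look like.

```python
import warnings


def legacy_str_to_bool(text: str) -> bool:
    """Model of the legacy two-step cast: string -> integer -> bool."""
    # int() raises ValueError for text that is not an integer literal,
    # so in this model only integer-like strings can be cast at all.
    return bool(int(text))


def truthiness_str_to_bool(text: str) -> bool:
    """Python-style alternative: any non-empty string is True."""
    return bool(text)


def transitional_str_to_bool(text: str) -> bool:
    """Hypothetical transition helper (not a NumPy API).

    Keeps the legacy result, but warns whenever the Python-style rule
    would give a different answer for the same input.
    """
    legacy = legacy_str_to_bool(text)
    if legacy != truthiness_str_to_bool(text):
        warnings.warn(
            f"casting {text!r} to bool may change meaning in the future",
            FutureWarning,
            stacklevel=2,
        )
    return legacy
```

In this model the string "0" is the interesting case: the legacy rule gives False, while plain truthiness gives True, which is why switching between the two silently would change results for existing users.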

@seberg
Member Author

seberg commented Apr 27, 2020

@eric-wieser maybe you can give this a quick, careful review? We need to get this in within the next few days to get 1.18.4 out this month (it's a weird limit due to the Rackspace account used in the 1.18.x release process).

@eric-wieser
Member

Code looks fine. It's hard to tell if this really restores exactly the behavior we care about, but it looks like it does at least do that for the behavior people are somehow relying upon.

@eric-wieser
Member

I assume what matters is this line:

https://github.com/numpy/numpy/pull/15766/files#diff-f3387afa660b73f8f37f0978193799d9L1512

Which encoded this two-step conversion.

@seberg
Member Author

seberg commented Apr 27, 2020

Hmmmpf, yeah, there are definitely changes in UnicodeError, but I do not care about those.

It is hard to be quite sure that there are no changes for structured dtypes, due to existing inconsistencies in the change that nobody has noticed yet. I guess if we want to be on the safe side, there is no way but to revert the whole thing. I will try and see if there is anything...

@eric-wieser
Member

Can you add a test for a structured type containing strings being cast to one containing bools, just to be sure?

@seberg
Member Author

seberg commented Apr 27, 2020

Ah, never mind, str->structured never actually hit the old code, since convert is 0 in that case. And I guess string -> number conversion always uses Long, float or complex, and where it did not, this was a fix (i.e. longdouble).

The only thing that is a bit strange is datetimes... I cannot find a change in behaviour there, but I did not yet understand why not...

@seberg seberg force-pushed the string-to-bool-regression branch from 1584f89 to e0bdf74 Compare April 27, 2020 19:08
@seberg
Member Author

seberg commented Apr 27, 2020

Arrg, no wonder I am confused... We have a _strided_to_strided_string_to_datetime as a second implementation for the datetime casts. So while this one was actually incorrect and was accidentally fixed as well, it was never used within NumPy.
So, there is a second, very subtle change here: when using datetime/timedelta with direct access to the descr->f->cast functions, or through PyArray_CastScalarToCtype, PyArray_GetCastFunc, or PyArray_CastScalarDirect, the cast from string -> datetime/timedelta was accidentally fixed to be in line with NumPy...

@eric-wieser added structured string -> structured bool to the test. I am inclined to ignore that annoying point about the datetime, but we could make a release note about it. Or, I can undo more of the commit for backporting purposes, just to make a point of not modifying even the most subtle thing in a point release.

@seberg
Member Author

seberg commented Apr 27, 2020

Anyway, I am happy to also just revert if we have any doubts for the backport. As for master, I think this is definitely OK.

@charris
Member

charris commented Apr 28, 2020

@eric-wieser Ping.

@charris
Member

charris commented Apr 29, 2020

@seberg What is the argument for not backporting this?

@seberg
Member Author

seberg commented Apr 29, 2020

  1. I may be missing some weird corner case.
  2. Manual string to datetime casts in C-extensions will have their behaviour silently fixed.

@seberg
Member Author

seberg commented Apr 29, 2020

I am not overly worried about that cast; it seems like a corner case of a corner case of a corner case. But if we want to be sure that nothing changes in the point release, I can revert the change more fully and only keep the changes for floating point numbers (and maybe integers).

@charris
Member

charris commented Apr 29, 2020

Let's play this conservatively and not add anything new for the backport besides the fixes we backported in 1.18.3. I'll be branching 1.19 in about two weeks and would like to get 1.18.4 out before Rackspace runs out on us.

@seberg
Member Author

seberg commented Apr 29, 2020

Oh, those changes are already part of that backport, but I can undo more of the backport without affecting the initial bugfix.

@charris
Member

charris commented Apr 29, 2020

> I can undo more of the backport

Let's do that, it is the least risky thing to do.

@charris
Member

charris commented Apr 29, 2020

Note that I am only talking about the backport.

@charris
Member

charris commented Apr 29, 2020

If the easy thing is to just revert the original backport, I'm open to that also.

@seberg
Member Author

seberg commented Apr 29, 2020

No, it's fine, I will do this now on the 1.18.x branch.

@seberg seberg removed the 09 - Backport-Candidate PRs tagged should be backported label Apr 29, 2020
@seberg
Member Author

seberg commented Apr 29, 2020

OK, see gh-16109 for the maximum possible revert approach (while retaining the original bug fix which only affects the behaviour of longdoubles).

@charris charris modified the milestones: 1.18.4 release, 1.19.0 release May 2, 2020
Member

@anirudh2290 anirudh2290 left a comment

LGTM, verified that only STRING_TO_BOOL and UNICODE_TO_BOOL trigger this new code path. If I understand correctly, the final decision is that this goes into 1.19 with a release note? If so, it probably needs a "needs release note" tag.

@charris charris added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes and removed 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes labels May 6, 2020
@charris
Member

charris commented May 12, 2020

@seberg Ping. Does this need a release note?

@seberg
Member Author

seberg commented May 12, 2020

Added a release note, not sure if nice. It is a very subtle C-API bug fix in a sense... The datetime thing is the only change I am aware of.

@charris charris merged commit 84a4c4b into numpy:master May 12, 2020
@charris
Member

charris commented May 12, 2020

Thanks Sebastian.

@seberg seberg deleted the string-to-bool-regression branch May 12, 2020 02:53
Successfully merging this pull request may close these issues.

Change in bool type coersion?
4 participants