BUG,MAINT: Remove incorrect special case in string to number casts #15766
The string to number casts fall back to using the scalars and the type setitem function to do the cast. However, before calling setitem, they sometimes already called the Python function for string coercion. This is unnecessary. Closes gh-15608
Hmmm, seems the reverse direction
I think @ahaldane had some functions for handling that. Don't recall if doubledouble was among the supported types, but quad precision was.
Ah, the problem is probably just that NumPy assumes that the string will fit into a length of 32, which is probably only correct for extended precision (or also IEEE quads?) longdoubles. I am not even sure there is a reasonable maximum string length for double-double numbers though... I could solve the issue here by hardcoding the correct length, or by just disabling that part of the test, since it is a bit orthogonal to the other issue...
Quad precision is about 32 digits for the mantissa, so I expect about 40 characters for exponential format, bit more for
I can get to 29 characters repr for extended precision longdoubles (most numbers 28), so maybe we should just up the string length to hold quad correctly. 40 decimals mantissa sounds right, which would give around 46-48 probably (probably 47).
Trying with 48 length just out of curiosity. I would be happy to do that, but it would probably need a release note, and is possibly not good for backporting.
@seberg - what a lovely finding that just removing code solves a bug! That part looks good (well, at least removing something that I don't comprehend does).
I would prefer not to make another 1.18 release unless we uncover a bad regression. That release has been unusually trouble free so far and the 1.19 branch isn't that far off.
In that case no need to worry about backcompat for that PR, I guess. The main question is if 48 is actually a good length then...
Hmmm, I think I misjudged with 48. It seems to me that the length necessary for the 113-bit mantissa is 35 since:
with the additional characters EDIT: Hmmm, but double precision sometimes needs 17 digits, not 16... so I guess it needs +1 in some cases. In any case 48 would be plenty...
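The "17 digits, not 16" remark can be checked with plain Python floats, which are IEEE double precision on common platforms. The following is only an illustrative sketch; the helper name `round_trips` is made up for this example and is not part of NumPy.

```python
def round_trips(x, digits):
    """Return True if printing x with `digits` significant digits
    and parsing the result gives back exactly the same float."""
    text = format(x, ".{}g".format(digits))
    return float(text) == x


# 0.1 + 0.2 is a well-known double that needs all 17 significant digits.
x = 0.1 + 0.2
print(round_trips(x, 16))  # 16 digits lose the last bit of information
print(round_trips(x, 17))  # 17 digits are enough for any double
```

So a digit count derived from the mantissa width alone can be one short, which is the "+1" mentioned above.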
We may have to revisit this when we start to handle double-double on ppc64le, but otherwise LGTM.
Thanks, @seberg - you win the prize for fix-by-removal!
Double-double on ppc64 only needs 107 bits (53+54) mantissa, so 48 chars should be enough there too. I think the quad (113-bit) mantissa is the biggest longdouble there is for anything in the foreseeable future, so 48 should be enough per analysis above.
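A rough way to sanity-check the 48-character figure is to count the characters of a worst-case exponential representation. This sketch assumes the usual rule that a binary mantissa of p bits needs ceil(1 + p*log10(2)) significant decimal digits to round-trip, and a layout of sign, digits, decimal point, `e`, exponent sign, and exponent digits. The function names are hypothetical and the code does not reflect how NumPy sizes its buffer.

```python
import math


def max_digits10(mantissa_bits):
    """Decimal digits needed so that any value with this many
    mantissa bits survives a print/parse round trip."""
    return math.ceil(1 + mantissa_bits * math.log10(2))


def repr_length_bound(mantissa_bits, exponent_digits):
    """Upper bound on the length of '-d.ddd...e-XXXX' (assumed layout)."""
    sign = 1
    point = 1
    e_char = 1
    exp_sign = 1
    return (sign + max_digits10(mantissa_bits) + point
            + e_char + exp_sign + exponent_digits)


# (name, mantissa bits, decimal exponent digits), values assumed here
formats = [
    ("double", 53, 3),
    ("x87 extended", 64, 4),
    ("ppc double-double", 107, 3),
    ("IEEE quad", 113, 4),
]
for name, bits, exp_digits in formats:
    print(name, max_digits10(bits), repr_length_bound(bits, exp_digits))
```

Under these assumptions the bound comes out as 24 characters for double, 29 for 80-bit extended precision, and 44 for quad, all below 48.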
Ah, good @ahaldane I was not sure if numbers such as
gcc and glibc typically drop bits past the 106th, so we are in good company to assume that too. There is some debate about whether those non-"normalized" values are valid, and the standard appears to be actively being updated. E.g., see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399 among others. gcc and glibc actually also have inconsistent treatments of ppc-double-double: They use different LDBL_MAX last I checked, and glibc sometimes uses the 107th bit in printing routines. It's a little bit messy!
Decided to make another release of 1.18 before 1.19, so marked this.
For a second I thought that the old code was at least a speed optimization, but it seems to make things slower. So I did not dig into why the code exists; I assume it is unnecessary, but I may be missing something... And the tests run perfectly without it.