
BUG,MAINT: Remove incorrect special case in string to number casts #15766


Merged: 2 commits merged into numpy:master from seberg:simplify-specialized-casts on Mar 18, 2020

Conversation

@seberg (Member) commented Mar 16, 2020

The string to number casts fall back to using the scalars and
the type setitem function to do the cast.
However, before calling setitem, they sometimes already called
the Python function for string coercion. This is unnecessary.

Closes gh-15608


For a second I thought the old code was at least a speed optimization, but it seems to make things slower. So I did not dig into why the code exists; I assume it is unnecessary, but I may be missing something... And the tests pass without it.
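The failure mode the linked issue (gh-15608) describes can be sketched as follows. This is an illustration, not code from the PR; it assumes NumPy with this fix applied, and the digit string is an arbitrary example:

```python
import numpy as np

# A string with more significant digits than a 64-bit double can hold.
s = "0.3333333333333333333333333333"

# Cast path touched by this PR: string array -> longdouble via astype.
# Before the fix, an unnecessary Python-level float coercion could
# truncate the value to double precision on the way through.
via_astype = np.array([s], dtype=np.str_).astype(np.longdouble)[0]

# Direct parse with the longdouble scalar constructor.
direct = np.longdouble(s)

# With the redundant coercion removed, both paths agree.
assert via_astype == direct
```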

@seberg (Member, Author) commented Mar 17, 2020

Hmmm, it seems the reverse direction, `longdouble_arr.astype(str)`, is failing on some platforms.
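The reverse direction being discussed is a string round-trip like the following. A minimal sketch, assuming a NumPy build where the fixed buffer length is large enough for the platform's longdouble:

```python
import numpy as np

# One-third is not exactly representable, so its shortest repr carries
# the full longdouble precision of the platform.
a = np.array([1], dtype=np.longdouble) / 3

# Cast to string and back; a hardcoded 32-character buffer can truncate
# the repr of extended-precision or quad longdoubles, breaking this.
roundtrip = a.astype(str).astype(np.longdouble)

assert roundtrip[0] == a[0]
```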

@charris (Member) commented Mar 17, 2020

I think @ahaldane had some functions for handling that. I don't recall if double-double was among the supported types, but quad precision was.

@seberg (Member, Author) commented Mar 17, 2020

Ah, the problem is probably just that NumPy assumes the string will fit into a length of 32, which is probably only correct for extended-precision (or also IEEE quad?) longdoubles.

I am not even sure there is a reasonable maximum string length for double-double numbers though... I could solve the issue here by hardcoding the correct length, or by just disabling that part of the test, since it is a bit orthogonal to the other issue...

@charris (Member) commented Mar 17, 2020

Quad precision has about 32 digits in the mantissa, so I expect about 40 characters for exponential format, and a bit more for __repr__.

@seberg (Member, Author) commented Mar 17, 2020

I can get a 29-character repr for extended-precision longdoubles (28 for most numbers), so maybe we should just increase the string length to hold quads correctly. 40 decimal digits for the mantissa sounds about right, which would give around 46-48 in total (probably 47).
There may be some errors with weird double-doubles, but it seems that at least quad precision should work correctly.

@seberg (Member, Author) commented Mar 17, 2020

Trying with length 48 just out of curiosity. I would be happy to do that, but it would probably need a release note, and it is possibly not good for backporting.

@mhvk (Contributor) commented Mar 17, 2020

@seberg - what a lovely finding that just removing code solves a bug! That part looks good (well, at least removing something I don't comprehend does).
The question about the change in string length is slightly trickier, as it could break currently working code that expects 32-character strings for long doubles, which would seem inappropriate for a bug fix. Since this is a bug only on platforms where float128 is actually float128, might it be possible to make this platform-specific? Otherwise, my sense would be not to backport that change.

@charris (Member) commented Mar 17, 2020

I would prefer not to make another 1.18 release unless we uncover a bad regression. That release has been unusually trouble-free so far, and the 1.19 branch isn't that far off.

@seberg (Member, Author) commented Mar 17, 2020

In that case there is no need to worry about backward compatibility for this PR, I guess. The main question is whether 48 is actually a good length then...

@seberg (Member, Author) commented Mar 17, 2020

Hmmm, I think I misjudged with 48. It seems to me that the length necessary for the 113-bit mantissa is 35, since:

In [30]: np.log2(10**-35)
Out[30]: -116.26748332105768
In [31]: np.log2(10**-34)
Out[31]: -112.94555522617031

with the additional characters -.e adding 3, and the exponent adding 6 (sign + 5 digits). Which in total gives 44. Although the current 32 is also larger than strictly necessary, so maybe 48 is fine.

EDIT: Hmmm, but double precision sometimes needs 17 digits, not 16... so I guess it needs +1 in some cases. In any case 48 would be plenty...
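The arithmetic above can be sketched with the standard round-trip digit bound (an illustration, not code from the PR; the helper name is made up here). The "+1" in the EDIT corresponds to the `+ 1` in the classic formula, which is why double sometimes needs 17 digits rather than 16:

```python
import math

def roundtrip_decimal_digits(mantissa_bits):
    # Digits needed so binary -> decimal -> binary round-trips:
    # the classic bound 1 + ceil(p * log10(2)) for a p-bit mantissa.
    return math.ceil(mantissa_bits * math.log10(2)) + 1

for bits, name in [(53, "double"), (64, "x87 extended"), (113, "IEEE quad")]:
    digits = roundtrip_decimal_digits(bits)
    # sign + digits + decimal point + 'e' + exponent sign + up to 4 exponent digits
    total = 1 + digits + 1 + 1 + 1 + 4
    print(f"{name}: {digits} digits, ~{total} chars")
```

For IEEE quad this gives 36 digits and about 44 characters total, matching the estimate in the comment, with 48 leaving comfortable headroom.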

@mattip (Member) commented Mar 18, 2020

We may have to revisit this when we start to handle double-double on ppc64le, but otherwise LGTM.

@mattip merged commit 20f22d8 into numpy:master on Mar 18, 2020
@mhvk (Contributor) commented Mar 18, 2020

Thanks, @seberg - you win the prize for fix-by-removal!

@ahaldane (Member) commented

Double-double on ppc64 only needs a 107-bit (53+54) mantissa, so 48 chars should be enough there too. I think the quad (113-bit) mantissa is the largest longdouble for anything in the foreseeable future, so 48 should be enough per the analysis above.

@seberg (Member, Author) commented Mar 18, 2020

Ah, good to know @ahaldane. I was not sure whether numbers such as 1e300 + 1e-300 are valid double-doubles (even if the added precision is rather useless), or whether ppc64 forces the second double to directly follow the first in precision...
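The "weird" double-double in question can be illustrated with the classic error-free two-sum transformation (an illustrative sketch in plain Python, not NumPy or PR code): the pair (1e300, 1e-300) is representable as two doubles even though plain double addition discards the small term.

```python
def two_sum(a, b):
    # Error-free transformation (Knuth): returns (s, e) such that
    # s == fl(a + b) and s + e == a + b exactly.
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

# Plain double addition loses the small component entirely...
assert 1e300 + 1e-300 == 1e300

# ...but the (hi, lo) pair keeps both components, i.e. it is a
# representable "non-normalized" double-double in the sense above.
hi, lo = two_sum(1e300, 1e-300)
assert hi == 1e300 and lo == 1e-300
```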

@seberg deleted the simplify-specialized-casts branch on March 18, 2020 at 15:40
@ahaldane (Member) commented

gcc and glibc typically drop bits past the 106th, so we are in good company if we assume that too.

There is some debate about whether those non-"normalized" values are valid, and the standard appears to be under active revision. E.g., see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399 among others. gcc and glibc actually also have inconsistent treatments of ppc double-double: they use different LDBL_MAX last I checked, and glibc sometimes uses the 107th bit in printing routines. It's a little bit messy!

@charris added the "09 - Backport-Candidate" label (PRs tagged should be backported) on Mar 31, 2020
@charris (Member) commented Mar 31, 2020

I decided to make another 1.18 release before 1.19, so I marked this for backporting.

Merging this pull request closed the issue: Loss of precision with np.str_ input to np.longdouble