
BUG,MAINT: Remove incorrect special case in string to number casts #15766


Merged: 2 commits merged into numpy:master from seberg:simplify-specialized-casts on Mar 18, 2020

Conversation

@seberg (Member) commented Mar 16, 2020

The string to number casts fall back to using the scalars and
the type setitem function to do the cast.
However, before calling setitem, they sometimes already called
the Python function for string coercion. This is unnecessary.

Closes gh-15608


For a second I thought the old code was at least a speed optimization, but it seems to make things slower. So I did not dig into why the code exists; I assume it is unnecessary, but I may be missing something... And the tests pass without it.
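The failure mode the linked issue (gh-15608) describes can be sketched as follows. This is an illustration, not code from the PR; it assumes NumPy with this fix applied, and the digit string is an arbitrary example:

```python
import numpy as np

# A string with more significant digits than a 64-bit double can hold.
s = "0.3333333333333333333333333333"

# Cast path touched by this PR: string array -> longdouble via astype.
# Before the fix, an unnecessary Python-level float coercion could
# truncate the value to double precision on the way through.
via_astype = np.array([s], dtype=np.str_).astype(np.longdouble)[0]

# Direct parse with the longdouble scalar constructor.
direct = np.longdouble(s)

# With the redundant coercion removed, both paths agree.
assert via_astype == direct
```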

@seberg (Member, Author) commented Mar 17, 2020

Hmmm, it seems the reverse direction, `longdouble_arr.astype(str)`, is failing on some platforms.
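The reverse direction being discussed is a string round-trip like the following. A minimal sketch, assuming a NumPy build where the fixed buffer length is large enough for the platform's longdouble:

```python
import numpy as np

# One-third is not exactly representable, so its shortest repr carries
# the full longdouble precision of the platform.
a = np.array([1], dtype=np.longdouble) / 3

# Cast to string and back; a hardcoded 32-character buffer can truncate
# the repr of extended-precision or quad longdoubles, breaking this.
roundtrip = a.astype(str).astype(np.longdouble)

assert roundtrip[0] == a[0]
```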

@charris (Member) commented Mar 17, 2020

I think @ahaldane had some functions for handling that. I don't recall if double-double was among the supported types, but quad precision was.

@seberg (Member, Author) commented Mar 17, 2020

Ah, the problem is probably just that NumPy assumes the string will fit into a length of 32, which is probably only correct for extended-precision (or also IEEE quad?) longdoubles.

I am not even sure there is a reasonable maximum string length for double-double numbers though... I could solve the issue here by hardcoding the correct length, or by just disabling that part of the test, since it is a bit orthogonal to the other issue...

@charris (Member) commented Mar 17, 2020

Quad precision has about 32 digits in the mantissa, so I expect about 40 characters for exponential format, and a bit more for __repr__.

@seberg (Member, Author) commented Mar 17, 2020

I can get a 29-character repr for extended-precision longdoubles (28 for most numbers), so maybe we should just increase the string length to hold quads correctly. 40 decimal digits for the mantissa sounds about right, which would give around 46-48 in total (probably 47).
There may be some errors with weird double-doubles, but it seems that at least quad precision should work correctly.

@seberg (Member, Author) commented Mar 17, 2020

Trying with length 48 just out of curiosity. I would be happy to do that, but it would probably need a release note, and it is possibly not good for backporting.

@mhvk (Contributor) commented Mar 17, 2020

@seberg - what a lovely finding that just removing code solves a bug! That part looks good (well, at least removing something I don't comprehend does).
The question about the change in string length is slightly trickier, as it could break currently working code that expects 32-character strings for long doubles, which would seem inappropriate for a bug fix. Since this is a bug only on platforms where float128 is actually float128, might it be possible to make this platform-specific? Otherwise, my sense would be not to backport that change.

@charris (Member) commented Mar 17, 2020

I would prefer not to make another 1.18 release unless we uncover a bad regression. That release has been unusually trouble-free so far, and the 1.19 branch isn't that far off.

@seberg (Member, Author) commented Mar 17, 2020

In that case there is no need to worry about backward compatibility for this PR, I guess. The main question is whether 48 is actually a good length then...

@seberg (Member, Author) commented Mar 17, 2020

Hmmm, I think I misjudged with 48. It seems to me that the length necessary for the 113-bit mantissa is 35, since:

In [30]: np.log2(10**-35)
Out[30]: -116.26748332105768
In [31]: np.log2(10**-34)
Out[31]: -112.94555522617031

with the additional characters -.e adding 3, and the exponent adding 6 (sign + 5 digits). Which in total gives 44. Although the current 32 is also larger than strictly necessary, so maybe 48 is fine.

EDIT: Hmmm, but double precision sometimes needs 17 digits, not 16... so I guess it needs +1 in some cases. In any case 48 would be plenty...
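The arithmetic above can be sketched with the standard round-trip digit bound (an illustration, not code from the PR; the helper name is made up here). The "+1" in the EDIT corresponds to the `+ 1` in the classic formula, which is why double sometimes needs 17 digits rather than 16:

```python
import math

def roundtrip_decimal_digits(mantissa_bits):
    # Digits needed so binary -> decimal -> binary round-trips:
    # the classic bound 1 + ceil(p * log10(2)) for a p-bit mantissa.
    return math.ceil(mantissa_bits * math.log10(2)) + 1

for bits, name in [(53, "double"), (64, "x87 extended"), (113, "IEEE quad")]:
    digits = roundtrip_decimal_digits(bits)
    # sign + digits + decimal point + 'e' + exponent sign + up to 4 exponent digits
    total = 1 + digits + 1 + 1 + 1 + 4
    print(f"{name}: {digits} digits, ~{total} chars")
```

For IEEE quad this gives 36 digits and about 44 characters total, matching the estimate in the comment, with 48 leaving comfortable headroom.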

@mattip (Member) commented Mar 18, 2020

We may have to revisit this when we start to handle double-double on ppc64le, but otherwise LGTM.

@mattip merged commit 20f22d8 into numpy:master on Mar 18, 2020
@mhvk (Contributor) commented Mar 18, 2020

Thanks, @seberg - you win the prize for fix-by-removal!

@ahaldane (Member) commented

Double-double on ppc64 only needs a 107-bit (53+54) mantissa, so 48 chars should be enough there too. I think the quad (113-bit) mantissa is the largest longdouble for anything in the foreseeable future, so 48 should be enough per the analysis above.

@seberg (Member, Author) commented Mar 18, 2020

Ah, good to know @ahaldane. I was not sure whether numbers such as 1e300 + 1e-300 are valid double-doubles (even if the added precision is rather useless), or whether ppc64 forces the second double to directly follow the first in precision...
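The "weird" double-double in question can be illustrated with the classic error-free two-sum transformation (an illustrative sketch in plain Python, not NumPy or PR code): the pair (1e300, 1e-300) is representable as two doubles even though plain double addition discards the small term.

```python
def two_sum(a, b):
    # Error-free transformation (Knuth): returns (s, e) such that
    # s == fl(a + b) and s + e == a + b exactly.
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

# Plain double addition loses the small component entirely...
assert 1e300 + 1e-300 == 1e300

# ...but the (hi, lo) pair keeps both components, i.e. it is a
# representable "non-normalized" double-double in the sense above.
hi, lo = two_sum(1e300, 1e-300)
assert hi == 1e300 and lo == 1e-300
```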

@seberg deleted the simplify-specialized-casts branch on March 18, 2020 at 15:40
@ahaldane (Member) commented

gcc and glibc typically drop bits past the 106th, so we are in good company if we assume that too.

There is some debate about whether those non-"normalized" values are valid, and the standard appears to be under active revision. E.g., see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399 among others. gcc and glibc actually also have inconsistent treatments of ppc double-double: they use different LDBL_MAX last I checked, and glibc sometimes uses the 107th bit in printing routines. It's a little bit messy!

@charris added the "09 - Backport-Candidate" label (PRs tagged should be backported) on Mar 31, 2020
@charris (Member) commented Mar 31, 2020

I decided to make another 1.18 release before 1.19, so I marked this for backporting.

Merging this pull request closed the issue: Loss of precision with np.str_ input to np.longdouble