BUG 4381 Longdouble from string without precision loss #6199


Merged 2 commits into numpy:master on Aug 28, 2015

Conversation

aarchiba
Contributor

Fixes #4381 using sscanf. Also adds tests for #2376 and related loss of precision in string conversions.
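
For context, a minimal illustration of the precision loss being fixed (a sketch, assuming an x86 80-bit long double; before this change the string was routed through Python's 64-bit float, so the trailing digits were discarded):

>>> import numpy as np
>>> s = "1.0000000000000000001"            # more digits than a double can hold
>>> float(s)                               # a double rounds this to exactly 1.0
1.0
>>> np.longdouble(s) == np.longdouble(1)   # with this fix: False on 80-bit platforms
False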

@jaimefrio
Member

Python 3 is not your friend: it's complaining about a mismatch in the 19th decimal!

@njsmith
Member

njsmith commented Aug 12, 2015

Awesome, and looks good to me modulo test failures...

@juliantaylor
Contributor

Probably won't work on Windows, though there should be some other function with a different name that does the same thing.

@juliantaylor
Contributor

Oh no, I confused this with strtoll, which we fixed a while ago. It should still be tested on Windows.

@aarchiba
Contributor Author

I don't have access to a Windows machine. With MSVC I think we have double==long double, which means this just has to not break anything. I know even less about Intel and other compilers on Windows. But shouldn't Travis check those?
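
A quick way to check this on a given platform (a sketch; it prints True when long double carries no more mantissa bits than double, as with MSVC):

import numpy as np

# True when long double is effectively just double (e.g. MSVC builds),
# in which case the new parsing path only has to avoid breaking anything.
print(np.finfo(np.longdouble).nmant == np.finfo(np.double).nmant)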

@njsmith
Member

njsmith commented Aug 12, 2015

Travis is Linux-only, unfortunately. There are other services like AppVeyor that could be used in principle, but no one has done the work to set them up for numpy yet, so in practice we mostly ignore Windows ourselves and then heroic people show up with Windows patches before each release.

In principle Windows could do something horrible like deciding to error out on long double specifiers in scanf (because who needs them, eh?), but I don't think this fear should block merging this -- it's not really any more risky than most PRs, and if it does happen then the fix would be easy.

@aarchiba
Contributor Author

See the discussion on bug #4381 for details, but locales are going to be a problem for this code. It is unclear how to work around the problem.
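
A sketch of the concern (hypothetical test; assumes a locale with a comma decimal separator, such as de_DE.UTF-8, is installed -- a locale-sensitive sscanf/strtod would misread '.'-separated input under such a locale):

import locale
import numpy as np

old = locale.setlocale(locale.LC_NUMERIC)                # remember the current setting
try:
    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")   # ',' is the decimal separator here
    # Parsing of '.'-separated input must not depend on the process locale.
    assert np.longdouble("1.5") == 1.5
finally:
    locale.setlocale(locale.LC_NUMERIC, old)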

@njsmith
Member

njsmith commented Aug 13, 2015

Ugh, right, sscanf is also locale-sensitive and everything is terrible.

I guess we might be able to get away with using sscanf_l these days? Someone would have to double-check that it's on whatever the oldest supported OS X version is, and also CentOS 5, because I think that's effectively the oldest Linux we support. But it might well be on both by now; it's been around a while.

Also, we should probably add a test to Travis checking that the test suite passes even in some locale with as many weird settings as possible, because there's a lot of room here for bugs that silently give the wrong answer in a way that we devs might never notice until it eats someone's data. (See: how close we came to merging this PR.)

@aarchiba
Contributor Author

The _l functions would solve the problem, if present; that would probably imply strtold_l as well.

Fortunately foreign locale I/O is tested, for floats, in test_print.py and test_multiarray.py.

@njsmith
Member

njsmith commented Aug 16, 2015

My concern is that it sounds like the changes in this PR were broken in foreign locales, but we didn't notice because whatever tests we have aren't enough to catch it... maybe this just means extending those tests to make sure they cover all the different float widths?

@aarchiba
Contributor Author

That's just a shortcoming of this PR. All the other float widths go through Python's float object, so there is no need to test them separately. Long doubles did too, which is why they lost precision on conversion. The only thing to check is str/repr of long doubles, because that is supposed to retain full precision and therefore must have its own conversion code. %-printing, format(), and conversion from strings potentially suffer the same problem, but at the moment they go through floats.
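
A sketch of the round-trip property in question, mirroring the test_repr_roundtrip test discussed later in this thread (assumes extra long-double precision is available; note that scalar repr formatting has changed in later numpy releases):

import numpy as np

# 1 + 2**-60 is exactly representable as an 80-bit long double but not as a
# 64-bit double, so a full-precision round trip has to bypass Python floats.
o = np.longdouble(1) + np.longdouble(2) ** -60
assert np.longdouble(repr(o)) == o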

@aarchiba
Contributor Author

The _l functions appeared in Darwin 8.0, FreeBSD 9.1, and were added to glibc in 1997, as far as I can tell. I would be astonished if they were missing from any Linux system people were still using. Windows I don't know about.

@njsmith
Member

njsmith commented Aug 16, 2015

Sweet. Windows we can live without (possibly with some annoying #ifdef) given its lack of longdouble.


@aarchiba
Contributor Author

Okay, mostly what is needed now is to correctly detect whether strtold_l is available. I have all the ifdefs in place, but I currently just unconditionally define it. Presumably it should be defined, if appropriate, in numpy/config.h, but I don't know how that file is generated.

* handled too, because python accepts bytes as input to
* float() (why?).
*/
#if PY_MAJOR_VERSION >= 3
Contributor Author

I have no experience with the Python C API, so I couldn't figure out how to write version-agnostic string code. But I think an ifdef is probably necessary to handle both bytes and str objects in Python 3.

Member

Look in numpy/core/include/numpy/py3_compat.h for string routines. The ones available are the union of the py3k and py2k functions. My guess is you will need to check for unicode, convert it to a byte string, and then use PyString_AS_STRING to get the C ASCII version.

The detection of strtold_l would be part of the configuration, which you can find scattered about in numpy/core/setup_common.py, numpy/core/setup.py and numpy/core/bscript. I haven't checked whether the existing check functions will work with those particular functions. It might be easier to just copy the functions and make a library, but it looks like even dtoa.c has platform-dependent bits.

EDIT: You can grep for the functions in numpy/core/include/numpy/py3_compat.h to find examples of their use.

@aarchiba changed the title from "BUG 4381 Longdouble scanf" to "BUG 4381 Longdouble from string without precision loss" on Aug 17, 2015
@aarchiba
Contributor Author

I'm not sure how to debug the bento build, since I haven't had any luck getting bento to work on my machine. It looks like bento is detecting strtold_l but not #defining the correct macro, but I can't see inside the generated config.h.

@charris
Member

charris commented Aug 18, 2015

Yeah, bento makes things difficult. Current waf crashes bento on my setup (Fedora 22) and can't find atlas in any case. @rgommers Thoughts? I'd like to take another shot at removing bento from numpy, it doesn't buy much for speed, doesn't support all optimizations, and AFAIK, you are the only person who wants to keep it.

@rgommers
Member

> Yeah, bento makes things difficult. Current waf crashes bento on my setup (Fedora 22) and can't find atlas in any case. @rgommers Thoughts? I'd like to take another shot at removing bento from numpy, it doesn't buy much for speed, doesn't support all optimizations, and AFAIK, you are the only person who wants to keep it.

@cournape wants to keep it as well, but given that he's put Bento development on hold, I'm OK with this. I think it's a shame that we're going back to a single build system that no one likes, believes in as a future-proof solution or even wants to maintain - but I don't really have the time and energy required to improve that situation. So go ahead.

@pv
Member

pv commented Aug 18, 2015 via email

Also, are the one-file builds still really necessary? That's also something that could be scrapped...

@njsmith
Member

njsmith commented Aug 18, 2015

IIRC the one-file builds are right now the only way to make CPython+numpy static binaries. (Or part of the only way; I assume this requires additional hacks as well.) There are other options that probably would work (e.g. using objcopy, http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063120.html), but I don't think anyone's tried. Probably because no one who statically links numpy ever talks to us about it, so it's hard to know what matters or how to support them...

@aarchiba
Contributor Author

Don't kill bento support on my account. If it comes to that I can submit debugging code to Travis.

@charris
Member

charris commented Aug 18, 2015

@aarchiba It's not you, it's me ;) The topic comes up regularly because we all run into the same difficulty supporting two build systems. If David were still working regularly to maintain and improve Bento there would be no problem, but he is out earning a living and having a life. While I agree with Ralf that a better system is desirable, in practice it is more difficult to maintain the Bento build than the distutils version, and Bento has not fully supported numpy for a while.

@rgommers If we decide at some point that there is a better build system I'll happily support it. @mwiebe chose CMake as the least evil alternative, but he doesn't like it. @eric-s-raymond likes scons, but David had to work hard, including upstream fixes, to get it working with numpy. It's an unfortunate situation and I don't know what the future holds, but I think we can keep distutils limping along for a few more years.

@charris
Member

charris commented Aug 18, 2015

I've posted to the list a proposal to remove Bento support.

@njsmith
Member

njsmith commented Aug 18, 2015

All build systems are terrible; it's like choosing between thumbscrews and the rack. The sooner one comes to terms with this reality the happier one will be.

I think the key enabler for better python build systems will be when it becomes possible to tell pip 'here's a script that when run will spit out a wheel, treat it as a black box otherwise'. Then one can stop worrying about distutils entirely, let pip worry about actually installing, etc., and just write a script/makefile/whatever to do this one well defined thing.

@rgommers
Member

> in practice it is more difficult to maintain the Bento build than the distutils version

Since this thread may be referenced in the future, a correction: this isn't actually the case. The only reason you see it that way is that numpy.distutils is much more heavily used and therefore gets more regular fixes. But it's harder to fix something in numpy.distutils, due to its worse architecture. And distutils patches not getting merged into Python makes that even worse (recent example: http://bugs.python.org/issue16296).


import sys
import locale
import nose
Member

Looks unused.

@charris
Member

charris commented Aug 27, 2015

Looks about ready. Could you squash the commits? Note that GitHub doesn't notify on branch updates, so it helps to make a comment.

What was the fix for the failing test?

@aarchiba
Contributor Author

Squash down to one, or to a few meaningful ones?

The failing test was testing tofile's behaviour with a specific separator but looking for a specific string representation (3.51 -> '3.51'). Python 2.6 didn't repr 3.51 as '3.51' but as 3.5099...98. So for that test I checked the commas and that the numbers round-tripped correctly. The next test checks what happens with a specific number of digits; that one I didn't need to change.
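
For reference, a sketch of the reworked check described above (hypothetical test body; it verifies the separator and the exact round trip rather than a particular decimal string):

import os
import tempfile
import numpy as np

a = np.array([3.51, 1.25], dtype=np.double)
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    a.tofile(path, sep=",")                    # text output, comma-separated
    with open(path) as f:
        assert "," in f.read()                 # the separator is honoured
    b = np.fromfile(path, sep=",", dtype=np.double)
    assert (a == b).all()                      # values round-trip exactly
finally:
    os.remove(path)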

@aarchiba
Contributor Author

I would like to make "{0:.40f}".format(o) preserve the accuracy (there is no way to fix %-formatting, and array printing already preserves it), but that requires a conversion from format() specifications to %-sequences, which will be a decidedly nontrivial function. So I'll make it a separate PR.
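
To illustrate the gap: at the moment format() effectively detours through a 64-bit Python float, roughly equivalent to the first print below (a sketch; assumes extra long-double precision):

import numpy as np

o = np.longdouble(1) + np.longdouble(2) ** -60   # differs from 1.0 only below double precision
print("{0:.40f}".format(float(o)))   # what the detour through a Python float produces
print(repr(o))                       # the longdouble repr keeps the extra bits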

@charris
Member

charris commented Aug 28, 2015

Something like 1-3 commits, use your judgment. There are no rules for that.

The squashed commit message reads:

Avoid going through python floats when converting string to
longdouble. This makes it dramatically easier to produce
full-precision long double numbers. Fixed are the constructor
(np.longdouble("1.01")), np.fromfile, np.fromstring, np.loadtxt,
and np.genfromtxt (and functions based on it). Also fixed is
precision loss when using np.tofile.

This also fixes numpy#1481, poor handling of bad data in fromfile
and fromstring.

If the function strtod_l is not available, almost none of this
will work, and many tests will fail.
@aarchiba
Contributor Author

Squashed down to one commit.

@charris
Member

charris commented Aug 28, 2015

> If the function strtod_l is not available, almost none of this will work, and many tests will fail.

That is strtold_l, right? Hmm, we need to figure out a way around this. Is there a simple check whose failure would indicate the absence of strtold_l, so that it could be used to skip the failing tests?

@charris
Member

charris commented Aug 28, 2015

If there is no easy check, I'm thinking we really ought to have the defined macros somewhere accessible, say in numpy.config, but that is another project. If that looks like the proper solution this can go in and we can fix it up later.

@aarchiba
Contributor Author

A quick hack that checks whether the constructor loses precision lets me set a variable. But yes, it would be nice if the results of config.h were also made visible at the Python level.
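
A sketch of such a hack (hypothetical flag name): convert a string carrying more digits than a double can hold and see whether the extra precision survives; tests that need full precision can then be skipped when the flag is set.

import numpy as np

# True when string-to-longdouble conversion falls back to double precision
# (strtold_l unavailable, or long double is just double on this platform).
string_to_longdouble_inaccurate = (
    np.longdouble("1.0000000000000000001") == np.longdouble(1)
)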

charris added a commit that referenced this pull request on Aug 28, 2015: "BUG 4381 Longdouble from string without precision loss"

@charris merged commit 5d6a9f0 into numpy:master on Aug 28, 2015
@charris
Member

charris commented Aug 28, 2015

OK, in it goes. Thanks Anne. I spent so much time nitpicking that I was beginning to feel bad, thanks for your patience ;)

* decimal point is encountered. The only case I can think of
* where this would matter is if the decimal separator were
* something that could occur in the middle of a valid number,
* in which case this function does the wrong thing.
Member

Under Python 2.6, for the input "1,2345" under the "fi_FI.UTF-8" locale, PyOS_ascii_strtod refuses to parse the string at all and returns an error instead of "1", which would have been the correct best-effort result.

Member

If this was not clear from the comment below, it perhaps should be updated.

@pv
Member

pv commented Aug 28, 2015

The PyOS_ascii_strtod issue seems to affect the long double parser:

>>> import sys
>>> sys.version_info
(2, 6, 8, 'final', 0)
>>> import numpy as np
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'fi_FI.UTF-8')
'fi_FI.UTF-8'
>>> np.fromstring('1,2345', sep=' ', dtype=np.longdouble)
array([], dtype=float128)

@pv
Member

pv commented Aug 28, 2015

Sorry, scratch that, I had some wrong version compiled. It does work:

>>> np.fromstring('1,2345', sep=' ', dtype=np.double)
array([ 1.])

EDIT: although the code path used when strtold_l is not available might not be safe.

@aarchiba
Contributor Author

@charris well, I was getting embarrassed about all the stupid mistakes. Thanks for the attention, I'm getting my fingers retrained to numpy's coding style.

@charris
Member

charris commented Aug 28, 2015

@pv Looks like edits don't trigger GitHub notifications either. What do you think needs to be done for the case where strtold_l is not available?

@aarchiba
Contributor Author

@pv I tried where possible to fall back to existing code. I have written basically that test in #6264 and can confirm that it works if I disable the strtod_l path. Also only the basic precision-loss test fails; the rest are knownfail.

@pv
Member

pv commented Aug 28, 2015

@aarchiba: did you try it on Python 2.6 (the only version affected; note you need the French locale available)? I get test failures there if I disable the strtold_l path. The easiest fix would be to call NumPyOS_ascii_strtod instead of NumPyOS_ascii_strtod_plain in the fallback path.

Without strtold_l, in addition to the test_longdouble.test_fromstring_foreign_sep and test_longdouble.test_fromstring_foreign_value failures, I also get:

======================================================================
FAIL: test_longdouble.test_repr_roundtrip
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../numpy/core/tests/test_longdouble.py", line 36, in test_repr_roundtrip
    "repr was %s" % repr(o))
  File ".../numpy/testing/utils.py", line 354, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal: repr was 1.0000000000000000001
 ACTUAL: 1.0
 DESIRED: 1.0000000000000000001

@aarchiba
Contributor Author

The test_repr_roundtrip failure is intentional: users who lose precision should be informed about it. But all the other tests are switched off, so they're not swamped with test failures. I could turn this one off too, but then people without strtold_l would have no warning about a silent loss of precision that doesn't apply to other, more normal platforms. And they're presumably using long doubles because they know they need the extra precision. The important question is: how common is a missing strtold_l going to be? It's been around a long time.

@njsmith
Member

njsmith commented Aug 28, 2015

The simplest approach would just be to impose a hard dependency on strtold_l, and deal with the fallout if/when someone finds a platform where this causes compilation to fail...


@jaimefrio
Member

I'm getting the exact same failure as @pv on master using OS X Yosemite. Should I just get used to seeing it because it is a legit failure on this platform? I'm guessing a sizeable amount of development happens on OS X, so this may produce a lot of noise.

@aarchiba
Contributor Author

@jaimefrio that is obviously not okay. But Yosemite should have strtold_l. I have noticed that config.h is not always regenerated; could you try wiping your build directory and building again? If you still get test failures please file a bug report.

@jaimefrio
Member

Ah, much better now! ;-) A clean build did the trick for me. Thanks!
