BUG 4381 Longdouble from string without precision loss #6199


Merged 2 commits into numpy:master on Aug 28, 2015

Conversation

aarchiba
Contributor

Fixes #4381 using sscanf. Also adds tests for #2376 and related loss of precision in string conversions.
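
For context, a minimal illustration of the precision loss being fixed (a sketch, assuming an x86 80-bit long double; before this change the string was routed through Python's 64-bit float, so the trailing digits were discarded):

>>> import numpy as np
>>> s = "1.0000000000000000001"            # more digits than a double can hold
>>> float(s)                               # a double rounds this to exactly 1.0
1.0
>>> np.longdouble(s) == np.longdouble(1)   # with this fix: False on 80-bit platforms
False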

@jaimefrio
Member

Python 3 is not your friend: it's complaining about a mismatch in the 19th decimal!

@njsmith
Member

njsmith commented Aug 12, 2015

Awesome, and looks good to me modulo test failures...

@juliantaylor
Contributor

Probably won't work on Windows, though there should be some other function with a different name that does the same thing.

@juliantaylor
Contributor

Oh no, I confused this with strtoll, which we fixed a while ago. It should still be tested on Windows.

@aarchiba
Contributor Author

I don't have access to a Windows machine. With MSVC I think we have double==long double, which means this just has to not break anything. I know even less about Intel and other compilers on Windows. But shouldn't Travis check those?
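
A quick way to check this on a given platform (a sketch; it prints True when long double carries no more mantissa bits than double, as with MSVC):

import numpy as np

# True when long double is effectively just double (e.g. MSVC builds),
# in which case the new parsing path only has to avoid breaking anything.
print(np.finfo(np.longdouble).nmant == np.finfo(np.double).nmant)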

@njsmith
Member

njsmith commented Aug 12, 2015

Travis is Linux-only, unfortunately. There are other services like AppVeyor that could be used in principle, but no one has done the work to set them up for numpy yet, so in practice we mostly ignore Windows ourselves and then heroic people show up with Windows patches before each release.

In principle Windows could do something horrible like deciding to error out on long double specifiers in scanf (because who needs them, eh?), but I don't think this fear should block merging this -- it's not really any more risky than most PRs, and if it does happen then the fix would be easy.

@aarchiba
Contributor Author

See the discussion on bug #4381 for details, but locales are going to be a problem for this code. It is unclear how to work around the problem.
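
A sketch of the concern (hypothetical test; assumes a locale with a comma decimal separator, such as de_DE.UTF-8, is installed -- a locale-sensitive sscanf/strtod would misread '.'-separated input under such a locale):

import locale
import numpy as np

old = locale.setlocale(locale.LC_NUMERIC)                # remember the current setting
try:
    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")   # ',' is the decimal separator here
    # Parsing of '.'-separated input must not depend on the process locale.
    assert np.longdouble("1.5") == 1.5
finally:
    locale.setlocale(locale.LC_NUMERIC, old)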

@njsmith
Member

njsmith commented Aug 13, 2015

Ugh, right, sscanf is also locale-sensitive and everything is terrible.

I guess we might be able to get away with using sscanf_l these days? Someone would have to double-check that it's on whatever the oldest supported OS X version is, and also CentOS 5, because I think that's effectively the oldest Linux we support. But it might well be on both by now; it's been around a while.

Also, we should probably add a test to Travis checking that the test suite passes even in some locale with as many weird settings as possible, because there's a lot of room here for bugs that silently give the wrong answer in a way that we devs might never notice until it eats someone's data. (See: how close we came to merging this PR.)

@aarchiba
Contributor Author

The _l functions would solve the problem, if present; that would probably imply strtold_l as well.

Fortunately foreign locale I/O is tested, for floats, in test_print.py and test_multiarray.py.

@njsmith
Member

njsmith commented Aug 16, 2015

My concern is that it sounds like the changes in this PR were broken in foreign locales, but we didn't notice because whatever tests we have aren't enough to catch it... maybe this just means extending those tests to make sure they cover all the different float widths?

@aarchiba
Contributor Author

That's just a shortcoming of this PR. All the other float widths go through Python's float object, so there is no need to test them separately. Long doubles did too, which is why they lost precision on conversion. The only thing to check is str/repr of long doubles, because that is supposed to retain full precision and therefore must have its own conversion code. %-printing, format(), and conversion from strings potentially suffer the same problem, but at the moment they go through floats.
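
A sketch of the round-trip property in question, mirroring the test_repr_roundtrip test discussed later in this thread (assumes extra long-double precision is available; note that scalar repr formatting has changed in later numpy releases):

import numpy as np

# 1 + 2**-60 is exactly representable as an 80-bit long double but not as a
# 64-bit double, so a full-precision round trip has to bypass Python floats.
o = np.longdouble(1) + np.longdouble(2) ** -60
assert np.longdouble(repr(o)) == o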

@aarchiba
Contributor Author

The _l functions appeared in Darwin 8.0, FreeBSD 9.1, and were added to glibc in 1997, as far as I can tell. I would be astonished if they were missing from any Linux system people were still using. Windows I don't know about.

@njsmith
Member

njsmith commented Aug 16, 2015

Sweet. Windows we can live without (possibly with some annoying #ifdef) given its lack of longdouble.


@aarchiba
Contributor Author

Okay, mostly what is needed now is to correctly detect whether strtold_l is available. I have all the ifdefs in place, but I currently just unconditionally define it. Presumably it should be defined, if appropriate, in numpy/config.h, but I don't know how that file is generated.

* handled too, because python accepts bytes as input to
* float() (why?).
*/
#if PY_MAJOR_VERSION >= 3
Contributor Author

I have no experience with the Python C API, so I couldn't figure out how to write version-agnostic string code. But I think an ifdef is probably necessary to handle both bytes and str objects in Python 3.

Member

Look in numpy/core/include/numpy/py3_compat.h for string routines. The ones available are the union of the py3k and py2k functions. My guess is you will need to check for unicode, convert it to a byte string, and then use PyString_AS_STRING to get the C ASCII version.

The detection of strtold_l would be part of the configuration, which you can find scattered about in numpy/core/setup_common.py, numpy/core/setup.py and numpy/core/bscript. I haven't checked whether the existing check functions will work with those particular functions. It might be easier to just copy the functions and make a library, but it looks like even dtoa.c has platform-dependent bits.

EDIT: You can grep for the functions in numpy/core/include/numpy/py3_compat.h to find examples of their use.

@aarchiba changed the title from "BUG 4381 Longdouble scanf" to "BUG 4381 Longdouble from string without precision loss" on Aug 17, 2015
@aarchiba
Contributor Author

I'm not sure how to debug the bento build, since I haven't had any luck getting bento to work on my machine. It looks like bento is detecting strtold_l but not #defining the correct macro, but I can't see inside the generated config.h.

@charris
Member

charris commented Aug 18, 2015

Yeah, bento makes things difficult. Current waf crashes bento on my setup (Fedora 22) and can't find atlas in any case. @rgommers Thoughts? I'd like to take another shot at removing bento from numpy, it doesn't buy much for speed, doesn't support all optimizations, and AFAIK, you are the only person who wants to keep it.

@rgommers
Member

> Yeah, bento makes things difficult. Current waf crashes bento on my setup (Fedora 22) and can't find atlas in any case. @rgommers Thoughts? I'd like to take another shot at removing bento from numpy, it doesn't buy much for speed, doesn't support all optimizations, and AFAIK, you are the only person who wants to keep it.

@cournape wants to keep it as well, but given that he's put Bento development on hold, I'm OK with this. I think it's a shame that we're going back to a single build system that no one likes, believes in as a future-proof solution or even wants to maintain - but I don't really have the time and energy required to improve that situation. So go ahead.

@pv
Member

pv commented Aug 18, 2015 via email

Also, are the one-file builds still really necessary? That's also something that could be scrapped...

@njsmith
Member

njsmith commented Aug 18, 2015

IIRC the one-file builds are right now the only way to make CPython+numpy static binaries. (Or part of the only way; I assume this requires additional hacks as well.) There are other options that probably would work (e.g. using objcopy, http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063120.html), but I don't think anyone's tried. Probably because no one who statically links numpy ever talks to us about it, so it's hard to know what matters or how to support them...

@aarchiba
Contributor Author

Don't kill bento support on my account. If it comes to that I can submit debugging code to Travis.

@charris
Member

charris commented Aug 18, 2015

@aarchiba It's not you, it's me ;) The topic comes up regularly because we all run into the same difficulty supporting two build systems. If David were still working regularly to maintain and improve Bento there would be no problem, but he is out earning a living and having a life. While I agree with Ralf that a better system is desirable, in practice it is more difficult to maintain the Bento build than the distutils version, and Bento has not fully supported numpy for a while.

@rgommers If we decide at some point that there is a better build system I'll happily support it. @mwiebe chose CMake as the least evil alternative, but he doesn't like it. @eric-s-raymond likes scons, but David had to work hard, including upstream fixes, to get it working with numpy. It's an unfortunate situation and I don't know what the future holds, but I think we can keep distutils limping along for a few more years.

@charris
Member

charris commented Aug 18, 2015

I've posted to the list a proposal to remove Bento support.

@njsmith
Member

njsmith commented Aug 18, 2015

All build systems are terrible; it's like choosing between thumbscrews and the rack. The sooner one comes to terms with this reality the happier one will be.

I think the key enabler for better python build systems will be when it becomes possible to tell pip 'here's a script that when run will spit out a wheel, treat it as a black box otherwise'. Then one can stop worrying about distutils entirely, let pip worry about actually installing, etc., and just write a script/makefile/whatever to do this one well defined thing.

@rgommers
Member

> in practice it is more difficult to maintain the Bento build than the distutils version

Since this thread may be referenced in the future, a correction: this isn't actually the case. The only reason you see it that way is that numpy.distutils is much more heavily used and therefore gets more regular fixes. But it's harder to fix something in numpy.distutils, due to its worse architecture. And distutils patches not getting merged into Python makes that even worse (recent example: http://bugs.python.org/issue16296).


import sys
import locale
import nose
Member

Looks unused.

@charris
Member

charris commented Aug 27, 2015

Looks about ready. Could you squash the commits? Note that GitHub doesn't notify on branch updates, so it helps to make a comment.

What was the fix for the failing test?

@aarchiba
Contributor Author

Squash down to one, or to a few meaningful ones?

The failing test was testing tofile's behaviour with a specific separator but looking for a specific string representation (3.51 -> '3.51'). Python 2.6 didn't repr 3.51 as '3.51' but as 3.5099...98. So for that test I checked the commas and that the numbers round-tripped correctly. The next test checks what happens with a specific number of digits; that one I didn't need to change.
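
For reference, a sketch of the reworked check described above (hypothetical test body; it verifies the separator and the exact round trip rather than a particular decimal string):

import os
import tempfile
import numpy as np

a = np.array([3.51, 1.25], dtype=np.double)
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    a.tofile(path, sep=",")                    # text output, comma-separated
    with open(path) as f:
        assert "," in f.read()                 # the separator is honoured
    b = np.fromfile(path, sep=",", dtype=np.double)
    assert (a == b).all()                      # values round-trip exactly
finally:
    os.remove(path)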

@aarchiba
Contributor Author

I would like to make "{0:.40f}".format(o) preserve the accuracy (there is no way to fix %-formatting, and array printing already preserves it), but that requires a conversion from format() specifications to %-sequences, which will be a decidedly nontrivial function. So I'll make it a separate PR.
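
To illustrate the gap: at the moment format() effectively detours through a 64-bit Python float, roughly equivalent to the first print below (a sketch; assumes extra long-double precision):

import numpy as np

o = np.longdouble(1) + np.longdouble(2) ** -60   # differs from 1.0 only below double precision
print("{0:.40f}".format(float(o)))   # what the detour through a Python float produces
print(repr(o))                       # the longdouble repr keeps the extra bits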

@charris
Member

charris commented Aug 28, 2015

Something like 1-3 commits, use your judgment. There are no rules for that.

The squashed commit message reads:

Avoid going through python floats when converting string to
longdouble. This makes it dramatically easier to produce
full-precision long double numbers. Fixed are the constructor
(np.longdouble("1.01")), np.fromfile, np.fromstring, np.loadtxt,
and np.genfromtxt (and functions based on it). Also fixed is
precision loss when using np.tofile.

This also fixes numpy#1481, poor handling of bad data in fromfile
and fromstring.

If the function strtod_l is not available, almost none of this
will work, and many tests will fail.
@aarchiba
Contributor Author

Squashed down to one commit.

@charris
Member

charris commented Aug 28, 2015

> If the function strtod_l is not available, almost none of this will work, and many tests will fail.

That is strtold_l, right? Hmm, we need to figure out a way around this. Is there a simple check whose failure would indicate the absence of strtold_l, so that it could be used to skip the failing tests?

@charris
Member

charris commented Aug 28, 2015

If there is no easy check, I'm thinking we really ought to have the defined macros somewhere accessible, say in numpy.config, but that is another project. If that looks like the proper solution this can go in and we can fix it up later.

@aarchiba
Contributor Author

A quick hack that checks whether the constructor loses precision lets me set a variable. But yes, it would be nice if the results of config.h were also made visible at the Python level.
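
A sketch of such a hack (hypothetical flag name): convert a string carrying more digits than a double can hold and see whether the extra precision survives; tests that need full precision can then be skipped when the flag is set.

import numpy as np

# True when string-to-longdouble conversion falls back to double precision
# (strtold_l unavailable, or long double is just double on this platform).
string_to_longdouble_inaccurate = (
    np.longdouble("1.0000000000000000001") == np.longdouble(1)
)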

charris added a commit that referenced this pull request on Aug 28, 2015: "BUG 4381 Longdouble from string without precision loss"

@charris merged commit 5d6a9f0 into numpy:master on Aug 28, 2015
@charris
Member

charris commented Aug 28, 2015

OK, in it goes. Thanks Anne. I spent so much time nitpicking that I was beginning to feel bad, thanks for your patience ;)

* decimal point is encountered. The only case I can think of
* where this would matter is if the decimal separator were
* something that could occur in the middle of a valid number,
* in which case this function does the wrong thing.
Member

Under Python 2.6, for the input "1,2345" under the "fi_FI.UTF-8" locale, PyOS_ascii_strtod refuses to parse the string at all and returns an error instead of "1", which would have been the correct best-effort result.

Member

If this was not clear from the comment below, it perhaps should be updated.

@pv
Member

pv commented Aug 28, 2015

The PyOS_ascii_strtod issue seems to affect the long double parser:

>>> import sys
>>> sys.version_info
(2, 6, 8, 'final', 0)
>>> import numpy as np
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'fi_FI.UTF-8')
'fi_FI.UTF-8'
>>> np.fromstring('1,2345', sep=' ', dtype=np.longdouble)
array([], dtype=float128)

@pv
Member

pv commented Aug 28, 2015

Sorry, scratch that, I had some wrong version compiled. It does work:

>>> np.fromstring('1,2345', sep=' ', dtype=np.double)
array([ 1.])

EDIT: although the code path used when strtold_l is not available might not be safe.

@aarchiba
Contributor Author

@charris well, I was getting embarrassed about all the stupid mistakes. Thanks for the attention, I'm getting my fingers retrained to numpy's coding style.

@charris
Member

charris commented Aug 28, 2015

@pv Looks like edits don't trigger GitHub notifications either. What do you think needs to be done for the case where strtold_l is not available?

@aarchiba
Contributor Author

@pv I tried where possible to fall back to existing code. I have written basically that test in #6264 and can confirm that it works if I disable the strtod_l path. Also only the basic precision-loss test fails; the rest are knownfail.

@pv
Member

pv commented Aug 28, 2015

@aarchiba: did you try it on Python 2.6 (the only version affected; note you need the French locale available)? I get test failures there if I disable the strtold_l path. The easiest fix would be to call NumPyOS_ascii_strtod instead of NumPyOS_ascii_strtod_plain in the fallback path.

Without strtold_l, in addition to the test_longdouble.test_fromstring_foreign_sep and test_longdouble.test_fromstring_foreign_value failures, I also get:

======================================================================
FAIL: test_longdouble.test_repr_roundtrip
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../numpy/core/tests/test_longdouble.py", line 36, in test_repr_roundtrip
    "repr was %s" % repr(o))
  File ".../numpy/testing/utils.py", line 354, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal: repr was 1.0000000000000000001
 ACTUAL: 1.0
 DESIRED: 1.0000000000000000001

@aarchiba
Contributor Author

The test_repr_roundtrip failure is intentional: users who lose precision should be informed about it. But all the other tests are switched off, so they're not swamped with test failures. I could turn this one off too, but then people without strtold_l would have no warning about a silent loss of precision that doesn't apply to other, more normal platforms. And they're presumably using long doubles because they know they need the extra precision. The important question is: how common is a missing strtold_l going to be? It's been around a long time.

@njsmith
Member

njsmith commented Aug 28, 2015

The simplest approach would just be to impose a hard dependency on strtold_l, and deal with the fallout if/when someone finds a platform where this causes compilation to fail...


@jaimefrio
Member

I'm getting the exact same failure as @pv on master using OS X Yosemite. Should I just get used to seeing it because it is a legit failure on this platform? I'm guessing a sizeable amount of development happens on OS X, so this may produce a lot of noise.

@aarchiba
Contributor Author

@jaimefrio that is obviously not okay. But Yosemite should have strtold_l. I have noticed that config.h is not always regenerated; could you try wiping your build directory and building again? If you still get test failures please file a bug report.

@jaimefrio
Member

Ah, much better now! ;-) A clean build did the trick for me. Thanks!
