BUG 4381 Longdouble from string without precision loss #6199
Conversation
Python 3 is not your friend: it's complaining about a mismatch in the 19th decimal!
Awesome, and looks good to me modulo test failures...
Probably won't work on Windows, though there should be some other function with a different name which does the same.
Oh no, I confused this with strtoll, which we fixed a while ago. Still, this should be tested on Windows.
I don't have access to a Windows machine. With MSVC I think we have double == long double, which means this just has to not break anything. I know even less about Intel and other compilers on Windows. But shouldn't Travis check those?
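As an aside, one quick way to check whether long double is just an alias for double on a given platform (as it is with MSVC) is to compare the machine precision numpy reports for the two types. This is a generic sketch, not part of the PR:

import numpy as np

# If long double == double, finfo reports identical precision for both types.
print(np.finfo(np.longdouble).eps == np.finfo(np.float64).eps)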
Travis is Linux only, unfortunately. There are other services like AppVeyor.
See the discussion on bug #4381 for details, but locales are going to be a problem for this code. It is unclear how to work around the problem.
Ugh, right, sscanf is also locale sensitive and everything is terrible.
The _l functions would solve the problem, if present; that would probably imply strtold_l as well. Fortunately foreign locale I/O is tested, for floats, in test_print.py and test_multiarray.py.
My concern is that it sounds like the changes in this PR were broken in foreign locales, but we didn't notice because whatever tests we have aren't enough to catch it... maybe this just means extending those tests to make sure they cover all the different float widths?
That's just a shortcoming of this PR. All the other float widths go through Python's float object, so there is no need to test them separately. So did long doubles, which is why they lose precision upon conversion. The only thing to check is str/repr of long doubles, because that is supposed to retain full precision and therefore must have its own conversion code. %-printing, format(), and conversion from strings potentially suffer the same problem, but at the moment they go through floats.
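To illustrate the point about repr and the float round trip, a small sketch, assuming a platform where long double is wider than double (where the two types are the same size both comparisons are trivially True):

import numpy as np

x = np.longdouble(1) / np.longdouble(3)   # has more precision than a 64-bit double
print(x == np.longdouble(float(x)))       # the round trip through float truncates
print(x == np.longdouble(repr(x)))        # only True if repr carries full precision and the parser keeps it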
The _l functions appeared in Darwin 8.0, FreeBSD 9.1, and were added to glibc in 1997, as far as I can tell. I would be astonished if they were missing from any Linux system people were still using. Windows I don't know about.
Sweet. Windows we can live without (possibly with some annoying #ifdef).
Okay, mostly what is needed now is to correctly detect whether strtold_l is available. I have all the ifdefs in place but I currently just unconditionally define it. Presumably it should be defined, if appropriate, in numpy/config.h, but I don't know how that is made.
* handled too, because python accepts bytes as input to
* float() (why?).
*/
#if PY_MAJOR_VERSION >= 3
I have no experience with the Python C API, so I couldn't figure out how to write version-agnostic string code. But I think an ifdef is probably necessary to handle both bytes and str objects in Python 3.
Look in numpy/core/include/numpy/py3_compat.h for string routines. The ones available are the union of the py3k and py2k functions. My guess is you will need to check for unicode and convert it to a byte string, then use PyString_AS_STRING to get the C ASCII version.

The detection of strtold_l would be part of the configuration, which you can find scattered about in numpy/core/setup_common.py, numpy/core/setup.py and numpy/core/bscript. I haven't checked the check functions to see if they will work with those particular functions. It might be easier to just copy the functions and make a library, but it looks like even dtoa.c has platform-dependent bits.

EDIT: You can grep for the functions in numpy/core/include/numpy/py3_compat.h to find examples of their use.
I'm not sure how to debug the bento build, since I haven't had any luck getting bento to work on my machine. It looks like bento is detecting strtold_l but not #defining the correct macro, but I can't see inside the generated config.h.
Yeah, bento makes things difficult. Current waf crashes bento on my setup (Fedora 22) and can't find ATLAS in any case. @rgommers Thoughts? I'd like to take another shot at removing bento from numpy: it doesn't buy much for speed, doesn't support all optimizations, and AFAIK you are the only person who wants to keep it.
@cournape wants to keep it as well, but given that he's put Bento development on hold, I'm OK with this. I think it's a shame that we're going back to a single build system that no one likes, believes in as a future-proof solution, or even wants to maintain - but I don't really have the time and energy required to improve that situation. So go ahead.
Also, are the one-file builds still really necessary? That's also something that could be scrapped...
IIRC the one file builds are right now the only way to make cpython+numpy …
Don't kill bento support on my account. If it comes to that I can submit debugging code to Travis.
@aarchiba It's not you, it's me ;) The topic comes up regularly because we all run into the same difficulty supporting two build systems. If David were still working regularly to maintain and improve Bento there would be no problem, but he is out earning a living and having a life. While I agree with Ralf that a better system is desirable, in practice it is more difficult to maintain the Bento build than the distutils version, and Bento has not fully supported Numpy for a while. @rgommers If we decide at some point that there is a better build system I'll happily support it. @mwiebe chose CMake as the least evil alternative, but he doesn't like it. @eric-s-raymond likes scons, but David had to work hard, including upstream fixes, to get it working with numpy. It's an unfortunate situation and I don't know what the future holds, but I think we can keep distutils limping along for a few more years.
I've posted to the list a proposal to remove Bento support.
All build systems are terrible; it's like choosing between thumbscrews and the rack. The sooner one comes to terms with this reality the happier one will be. I think the key enabler for better python build systems will be when it becomes possible to tell pip 'here's a script that when run will spit out a wheel, treat it as a black box otherwise'. Then one can stop worrying about distutils entirely, let pip worry about actually installing, etc., and just write a script/makefile/whatever to do this one well defined thing.
Since this thread may be referenced in the future, a correction: this isn't actually the case. The only reason you see it that way is that …
import sys
import locale
import nose
Looks unused.
Looks about ready. Could you squash the commits? Note that GitHub doesn't notify on branch updates, so it helps to make a comment. What was the fix for the failing test?
Squash down to one, or to a few meaningful ones? The failing test was testing tofile's behaviour with a specific separator but looking for a specific string representation (3.51 -> '3.51'). Python 2.6 didn't repr 3.51 as '3.51' but as 3.5099...98. So for that test I checked the commas and that the numbers round-tripped correctly. The next test checks what happens with a specific number of digits; that one I didn't need to change.
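For reference, a rough sketch of that kind of round-trip check; this is illustrative, not the actual test in numpy's suite, and the file name is arbitrary:

import os
import tempfile
import numpy as np

x = np.array([3.51, 1.0, -2.5])
fname = os.path.join(tempfile.mkdtemp(), "sep_test.txt")
x.tofile(fname, sep=",")                  # text output with an explicit separator
with open(fname) as f:
    text = f.read()
assert text.count(",") == len(x) - 1      # the separator is actually used
y = np.fromstring(text, sep=",")          # compare values instead of a fixed repr string
assert np.array_equal(x, y)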
I would like to make "{0:.40f}".format(o) preserve the accuracy (there is no way to fix %; array printing already does preserve it), but that requires a conversion from format() strings to %-sequences, which will be a decidedly nontrivial function. So I'll make it a separate PR.
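A small illustration of why %-formatting cannot preserve the extra precision while repr can, assuming a platform with an extended-precision long double:

import numpy as np

x = np.longdouble("0.1234567890123456789")
print("%.40f" % x)    # the %-operator converts through a 64-bit float, so extra digits are lost
print(repr(x))        # repr has its own conversion path and keeps the long double precision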
Something like 1-3 commits, use your judgment. There are no rules for that.
Force-pushed from bb48c3b to 875b2dd
Avoid going through Python floats when converting string to longdouble. This makes it dramatically easier to produce full-precision long double numbers. Fixed are the constructor (np.longdouble("1.01")), np.fromfile, np.fromstring, np.loadtxt, and np.genfromtxt (and functions based on it). Also fixed is precision loss when using np.tofile. This also fixes numpy#1481, poor handling of bad data in fromfile and fromstring. If the function strtold_l is not available, almost none of this will work, and many tests will fail.
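A brief usage sketch of the behaviour this change is after (illustrative values; the extra precision is only observable where long double is wider than double and the strtold_l-based path is available):

import numpy as np

s = "1.0123456789012345678901234567"
a = np.longdouble(s)                                   # constructor parses the string directly
b = np.fromstring(s, dtype=np.longdouble, sep=" ")[0]  # fromstring / fromfile text parsing
print(a == b)                           # both keep the full precision
print(a == np.longdouble(float(s)))     # the old route through float truncates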
Force-pushed from 875b2dd to 6cbd724
Squashed down to one commit.
That is …
If there is no easy check, I'm thinking we really ought to have the defined macros somewhere accessible, say in numpy.config, but that is another project. If that looks like the proper solution this can go in and we can fix it up later.
A quick hack that checks whether the constructor loses precision lets me set a variable. But yes, it would be nice if the results of config.h also got made visible at the Python level.
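For concreteness, a hedged sketch of what such a quick hack might look like; the function and variable names are illustrative, not numpy API:

import numpy as np

def _longdouble_parsing_has_extra_precision():
    # True only if parsing a string yields more precision than a round trip
    # through a 64-bit float; False where long double == double, or where the
    # strtold_l-based path is unavailable and precision is lost anyway.
    s = "0.1234567890123456789"
    return np.longdouble(s) != np.longdouble(float(s))

LONGDOUBLE_STRING_PARSING_IS_EXACT = _longdouble_parsing_has_extra_precision()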
OK, in it goes. Thanks Anne. I spent so much time nitpicking that I was beginning to feel bad, thanks for your patience ;)
* decimal point is encountered. The only case I can think of
* where this would matter is if the decimal separator were
* something that could occur in the middle of a valid number,
* in which case this function does the wrong thing.
Under Python 2.6, for input "1,2345" under the "fi_FI.UTF-8" locale, PyOS_ascii_strtod refuses to parse the string at all, and returns an error instead of "1", which would have been the correct best-effort result.
If this was not clear from the comment below, it perhaps should be updated.
The PyOS_ascii_strtod issue seems to affect the long double parser: …
Sorry, scratch that, I had some wrong version compiled. It does work. EDIT: although the code path used when …
@charris well, I was getting embarrassed about all the stupid mistakes. Thanks for the attention, I'm getting my fingers retrained to numpy's coding style.
@pv Looks like edits don't trigger GitHub notifications either. What do you think needs to be done for the …
@aarchiba: did you try it on Python 2.6 (the only version affected; note you need the French locale available)? I get test failures on that if I disable the … Without the …
The …
The simplest approach would just be to impose a hard dependency on …
I'm getting the exact same failure as @pv on master using OS X Yosemite. Should I just get used to seeing it because it is a legit failure on this platform? I'm guessing a sizeable amount of development happens on OS X, so this may produce a lot of noise.
@jaimefrio that is obviously not okay. But Yosemite should have strtold_l. I have noticed that config.h is not always regenerated; could you try wiping your build directory and building again? If you still get test failures, please file a bug report.
Ah, much better now! ;-) A clean build did the trick for me. Thanks!
Fixes #4381 using sscanf. Also adds tests for #2376 and related loss of precision in string conversions.