Allowing unicode fmt in savetxt. #4053
Discussion thread at the mailing list: http://comments.gmane.org/gmane.comp.python.numeric.general/55886
@Dapid : when you mean "specifying an encoding for the whole file." in 2bd27c7, do you mean adding
at the beginning of the |
The |
c1 = BytesIO()
c2 = BytesIO()
np.savetxt(c1, a, fmt='%02d : %3.1f')
np.savetxt(c2, a, fmt=u'%02d : %3.1f')
Do sixu('%02d : %3.1f') since the u prefix is new for python3 in 3.3. You will also need from numpy.compat import sixu up top.
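For readers unfamiliar with the helper, here is a minimal sketch of what a `sixu`-style compatibility function does. The name `sixu_sketch` is hypothetical and this is not necessarily the exact code in `numpy.compat`; it only illustrates the idea of getting a text string from one literal on both Python 2 and Python 3.

```python
import sys


def sixu_sketch(s):
    """Return `s` as a text (unicode) string on Python 2 and Python 3.

    Hypothetical stand-in for a sixu-style helper, for illustration only.
    """
    if sys.version_info[0] >= 3:
        # Python 3: every str literal is already unicode.
        return s
    # Python 2 only: decode the byte string, honouring \uXXXX escapes.
    return unicode(s, 'unicode_escape')  # noqa: F821 (Python 2 builtin)


fmt = sixu_sketch('%02d : %3.1f')
print(type(fmt).__name__, repr(fmt))
```

With a helper like this, the test can use one spelling of the literal on interpreters that reject the `u` prefix (Python 3.0 to 3.2).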
Ok, this seems to fix the issue. Thanks for the tip, @charris. I didn't know about the commit message conventions at the time, so maybe the commits have to be squashed or rebased.
Well, since you sort of volunteered, squashing and rewriting the commit message would be nice ;) You can use Otherwise, LGTM. |
elif isinstance(fmt, basestring):
    if isinstance(fmt, unicode):
        try:
            fmt = fmt.format('ascii')
This doesn't do anything to the string in my experiments. Perhaps fmt.encode('ascii')?
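The difference between the two calls can be checked with a short, self-contained snippet. This sketch uses Python 3 semantics, where `encode` returns `bytes`; the variable names are illustrative and do not come from the patch.

```python
fmt = u'%02d : %3.1f'

# str.format only substitutes {}-style fields. A %-style string has none,
# so the call returns an equal string and the 'ascii' argument is ignored.
formatted = fmt.format('ascii')

# str.encode converts the text to bytes using the named codec.
encoded = fmt.encode('ascii')

# Encoding fails loudly when the text contains non-ASCII characters.
try:
    u'%3.1f \u00b0C'.encode('ascii')
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True

print(formatted == fmt, encoded, non_ascii_rejected)
```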
Oops, I tried to rebase, and accidentally added commits by other people from master. Should I try to rebase again and get rid of them, or is it better if I leave it alone?
Looks like you did a merge somewhere that screwed it up. Try a straight Now that I look more closely at this, I'm not clear on what you are doing as it looks like |
If the fmt is unicode, it will be re-encoded in ASCII before use. Raises an error if the conversion is not possible.
I fixed the rebase. I think. @charris, you were right, format was the wrong function, now it uses encode. In the case of a list, it goes via numpy.compat.asstr, that seems to allow unicode. In Python 2.7: |
You still have three commits here, two not yours. Let's see if we can clean it up. What have you been trying?
OK, commits look good now ;)
I am trying to allow fmt in unicode, as is the case with from __future__ import unicode_literals. Perhaps there is no need to explicitly check for the failure when a unicode fmt cannot be cast to ASCII, as it would produce a wrong fmt anyway.
Now the test fails. It bothers me that it didn't fail before. In order for the format to work, it needs to be unicode, i.e. a string, in python3. In python2 either unicode or ascii will work, but it is probably best to stick with ascii. What was the problem that motivated this fix? I assume you had trouble with something.
@charris I was the one complaining on the ML about unicode fmt ;-) (http://comments.gmane.org/gmane.comp.python.numeric.general/55886). It was initially a combination of 2 problems: 1) unicode fmt fails 2) the error message is a failure by itself. But 2) was fixed in 1.8 so it's already better: it fails with a clearer error message.
@pierre-haessig @Dapid A solution that seems to work is to change
to
at the top of the function. Probably the tuples/lists of formats should also be so converted to start with. |
To deal with list/tuple/scalar cases, maybe
Oops, that doesn't work for scalar strings. Guess you need to check for list, etc. and handle them separately. |
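One way to handle the scalar and list/tuple cases separately is sketched below. The helper name `normalize_fmt` is hypothetical and is not part of NumPy. The sketch assumes Python 3, where the goal is to end up with text (`str`) format strings whether the caller passed bytes or unicode.

```python
def normalize_fmt(fmt):
    """Return `fmt` with every format string converted to text (str).

    Hypothetical helper for illustration; accepts a single format string
    or a list/tuple of format strings, each given as str or bytes.
    """
    def to_text(item):
        if isinstance(item, bytes):
            # Bytes formats are decoded; non-ASCII bytes raise an error.
            return item.decode('ascii')
        if isinstance(item, str):
            return item
        raise ValueError('invalid fmt: %r' % (item,))

    # Sequences are converted element by element; scalars directly.
    if isinstance(fmt, (list, tuple)):
        return [to_text(item) for item in fmt]
    return to_text(fmt)


print(normalize_fmt(b'%12.8e'))
print(normalize_fmt(['%02d', b'%3.1f']))
```

Checking for list and tuple first avoids the problem noted above, where a conversion written for sequences would iterate over the characters of a scalar string.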
I just ran into this same issue, see here: To give a minimal example, what we're doing is passing unicode for
>>> import numpy as np
>>> from __future__ import unicode_literals
>>> np.savetxt('/tmp/test.txt', np.arange(10), fmt='%12.8e')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1143, in savetxt
    raise ValueError('invalid fmt: %r' % (fmt,))
ValueError: invalid fmt: u'%12.8e'
Passing a unicode
np.savetxt('/tmp/test.txt', np.arange(10), fmt='%12.8e')
Passing bytes works on Python 2 and 3 (I only tested 2.7 and 3.5, and don't care about older versions for the project where the issue occurred)
np.savetxt('/tmp/test.txt', np.arange(10), fmt=b'%12.8e')
I guess using bytes literals is the recommended workaround for now? The docstring says:
This pull request seems to have stalled in 2013. I'm still seeing the issue with numpy @Dapid or any of the other numpy devs -- do you have time to revive this PR or make a new one? I think I've run into such an issue before with some other numpy function. It seems that this is something that could be automated to a large degree: use type annotations others have built (like the PyCharm folk) for numpy to make a list of all such arguments that should work with bytes and unicode input on Python 2 and 3, and try to auto-generate tests that call them with all four cases to get a list of functions / parameters that have this issue? |
The test in the comment above fails on 1.14.2. The small pure-Python PR looks like it could still be applied to master. @Dapid would you like to rebase this and try again?
If the format string is unicode, then I'd argue that either:
Does it succeed for both of these cases? |
By now, all that is needed is to also allow unicode strings to pass through. Adds a test for the support which already succeeds on python3. Closes numpygh-4053 (replaces the old PR)
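A test along these lines could look like the sketch below. It is modelled on the `BytesIO` snippet reviewed earlier in this thread, not copied from the replacement PR. The helper name `savetxt_to_bytes` is hypothetical, and the sketch assumes a NumPy release that already includes the fix, running on Python 3.

```python
from io import BytesIO

import numpy as np


def savetxt_to_bytes(data, fmt):
    """Write `data` with np.savetxt into memory and return the raw bytes."""
    buf = BytesIO()
    np.savetxt(buf, data, fmt=fmt)
    return buf.getvalue()


a = np.array([(1, 2), (3, 4)])

# A single unicode multi-format string covering both columns.
single = savetxt_to_bytes(a, u'%02d : %3.1f')

# A list of unicode formats, one per column.
per_column = savetxt_to_bytes(a, [u'%02d', u'%3.1f'])

print(single)
print(per_column)
```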
I had forgotten about this problem a long time ago, but anyway it's nice to know it's been fixed! Thanks @seberg!
If the fmt is unicode, it will be re-encoded in ASCII before use.
Raises an error if the conversion is not possible.
I still have to add a test, but I don't know where the tests live.