Allowing unicode fmt in savetxt. #4053

Closed
wants to merge 2 commits into from

@Dapid Dapid commented Nov 14, 2013

If the fmt is unicode, it will be re-encoded in ASCII before use.

Raises an error if the conversion is not possible.

I still have to add a test, but I don't know where the tests live.

@Dapid
Author

Dapid commented Nov 14, 2013

Discussion thread at the mailing list: http://comments.gmane.org/gmane.comp.python.numeric.general/55886

@WarrenWeckesser
Member

@pierre-haessig

@Dapid: when you say "specifying an encoding for the whole file." in 2bd27c7, do you mean adding

    # -*- coding: utf-8 -*-

at the beginning of the test_io.py file?

@charris
Member

charris commented Nov 14, 2013

The u prefix does not work in python3 < 3.3, so the test is failing.

    c1 = BytesIO()
    c2 = BytesIO()
    np.savetxt(c1, a, fmt='%02d : %3.1f')
    np.savetxt(c2, a, fmt=u'%02d : %3.1f')
Member

Use sixu('%02d : %3.1f') instead, since the u prefix only became valid again in Python 3.3. You will also need from numpy.compat import sixu at the top of the file.
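
For readers unfamiliar with this kind of helper, the idea can be sketched roughly as follows. This is an illustrative stand-in, not NumPy's actual sixu implementation; the name portable_u is hypothetical and the ASCII decoding on Python 2 is an assumption made for the sketch.

```python
import sys


def portable_u(s):
    """Return a text (unicode) string built from a plain string literal.

    Hypothetical stand-in for a sixu-style helper; not NumPy's code.
    """
    if sys.version_info[0] >= 3:
        # On Python 3 every plain str literal is already unicode text.
        return s
    # On Python 2 a plain literal is a byte string, so decode it.
    # Assumption: the literal contains only ASCII characters.
    return s.decode('ascii')


fmt = portable_u('%02d : %3.1f')
print(type(fmt).__name__, repr(fmt))
```

Because the call is spelled the same way on every interpreter, a test written like this avoids the u'' literal that Python 3.0 to 3.2 reject at parse time.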

@Dapid
Author

Dapid commented Dec 3, 2013

Ok, this seems to fix the issue. Thanks for the tip, @charris

I didn't know about the commit message conventions at the time, so maybe the commits have to be squashed or rebased.

@charris
Member

charris commented Dec 3, 2013

Well, since you sort of volunteered, squashing and rewriting the commit message would be nice ;) You can use git rebase -i ... for doing both followed by a force push of your branch.

Otherwise, LGTM.

    elif isinstance(fmt, basestring):
        if isinstance(fmt, unicode):
            try:
                fmt = fmt.format('ascii')
Member

This doesn't do anything to the string in my experiments. Perhaps fmt.encode('ascii')?
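
The difference between the two methods can be checked in a few lines. This is a minimal sketch assuming Python 3.3 or later; the variable names are chosen only for illustration.

```python
fmt = u'%02d : %3.1f'

# str.format() fills in '{}' replacement fields. This string has none,
# so the argument is ignored and the string comes back unchanged.
same = fmt.format('ascii')

# str.encode() converts the text to bytes using the named codec.
encoded = fmt.encode('ascii')

# Text containing non-ASCII characters cannot be encoded as ASCII.
try:
    u'%d \u00b0C'.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

print(same == fmt, type(encoded).__name__, failed)
```

So format('ascii') is a silent no-op here, while encode('ascii') performs the conversion the pull request describes and raises when the conversion is not possible.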

@Dapid
Author

Dapid commented Dec 3, 2013

Oops, I tried to rebase, and accidentally added commits by other people from master. Should I try to rebase again and get rid of them, or is it better if I leave it alone?

@charris
Member

charris commented Dec 3, 2013

Looks like you did a merge somewhere that screwed it up. Try a straight git rebase master to see if that cleans it up any. If all else fails, there is always git reflog that has pointers to earlier states of git.

Now that I look more closely at this, I'm not clear on what you are doing as it looks like fmt needs to be a string. The string conversion of fmt at the top of the function looks incomplete and should probably deal with the sequence case also.

@Dapid
Author

Dapid commented Dec 3, 2013

I fixed the rebase. I think.

@charris, you were right, format was the wrong function, now it uses encode.

In the case of a list, it goes via numpy.compat.asstr, which seems to allow unicode.

In Python 2.7:

    >>> numpy.compat.asstr(u'%02d : %3.1f')
    '%02d : %3.1f'

@charris
Member

charris commented Dec 3, 2013

You still have three commits here, two not yours. Let's see if we can clean it up. What have you been trying?

asstr produces ascii in python 2 and unicode in python 3. Can you be more explicit as to what you are trying to fix?

@charris
Member

charris commented Dec 3, 2013

OK, commits look good now ;)

@Dapid
Author

Dapid commented Dec 3, 2013

I am trying to allow a unicode fmt, as is the case when using from __future__ import unicode_literals.

Perhaps there is no need to explicitly check for the failure when fmt cannot be cast to ASCII, as it would produce a wrong fmt anyway.

@charris
Member

charris commented Dec 3, 2013

Now the test fails. It bothers me that it didn't fail before. In order for the format to work, it needs to be unicode, i.e. a string, in python3. In python2 either unicode or ascii will work, but it is probably best to stick with ascii.

What was the problem that motivated this fix? I assume you had trouble with something.

@pierre-haessig

@charris I was the one complaining on the ML about unicode fmt ;-) (http://comments.gmane.org/gmane.comp.python.numeric.general/55886).

It was initially a combination of 2 problems: 1) unicode fmt fails 2) the error message is a failure by itself. But 2) was fixed in 1.8 so it's already better: it fails with a clearer error message.

@charris
Member

charris commented Dec 3, 2013

@pierre-haessig @Dapid A solution that seems to work is to change

    if isinstance(fmt, bytes):
        fmt = asstr(fmt)

to

    fmt = asstr(fmt)

at the top of the function.

Probably the tuples/lists of formats should also be so converted to start with.

@charris
Member

charris commented Dec 3, 2013

To deal with list/tuple/scalar cases, maybe

    fmt = [asstr(i) for i in list(fmt)]

Oops, that doesn't work for scalar strings. Guess you need to check for list, etc. and handle them separately.
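
One way to handle the cases separately is sketched below. It is not the code that ended up in NumPy: to_text and normalize_fmt are hypothetical names, the sketch targets Python 3, and decoding bytes with latin1 is an assumption made for illustration.

```python
def to_text(value):
    """Decode bytes to str; leave str untouched (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: latin1 is used because it can decode any byte.
        return value.decode('latin1')
    return value


def normalize_fmt(fmt):
    """Return fmt as a str, or as a list of str for list/tuple input."""
    if isinstance(fmt, (list, tuple)):
        return [to_text(item) for item in fmt]
    # A scalar string is handled on its own: list('%d') would split it
    # into single characters instead of wrapping it.
    return to_text(fmt)


print(normalize_fmt(b'%12.8e'))
print(normalize_fmt(['%d', b'%.2f']))
```

Checking for list and tuple first keeps the scalar branch simple and avoids the character-splitting problem noted above.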

@cdeil

cdeil commented Nov 30, 2016

I just ran into this same issue, see here:
https://travis-ci.org/astropy/regions/jobs/179816687#L1057

To give a minimal example, what we're doing is passing unicode for fmt, which fails on Python 2:

>>> import numpy as np
>>> from __future__ import unicode_literals
>>> np.savetxt('/tmp/test.txt', np.arange(10), fmt='%12.8e')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1143, in savetxt
    raise ValueError('invalid fmt: %r' % (fmt,))
ValueError: invalid fmt: u'%12.8e'

Passing a unicode fmt already works on Python 3:

np.savetxt('/tmp/test.txt', np.arange(10), fmt='%12.8e')

Passing bytes works on Python 2 and 3 (I only tested 2.7 and 3.5, and don't care about older versions for the project where the issue occurred):

np.savetxt('/tmp/test.txt', np.arange(10), fmt=b'%12.8e')

I guess using bytes literals is the recommended workaround for now?

The docstring says:

fmt : str or sequence of strs, optional
https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html


This pull request seems to have stalled in 2013. I'm still seeing the issue with numpy 1.11.2.

@Dapid or any of the other numpy devs -- do you have time to revive this PR or make a new one?


I think I've run into such an issue before with some other numpy function. It seems that this is something that could be automated to a large degree: use type annotations others have built (like the PyCharm folk) for numpy to make a list of all such arguments that should work with bytes and unicode input on Python 2 and 3, and try to auto-generate tests that call them with all four cases to get a list of functions / parameters that have this issue?
Is it worth making a separate issue to propose / pursue this idea?

@mattip
Member

mattip commented Apr 18, 2018

The test in the comment above fails on 1.14.2, the small pure python PR looks like it could still be applied to master. @Dapid would you like to rebase this and try again?

@eric-wieser
Member

eric-wieser commented Apr 18, 2018

If the format string is unicode, then I'd argue that either:

  • The (new) encoding argument should be specified
  • A pre-opened unicode file should be passed

Does it succeed for both of these cases?

seberg added a commit to seberg/numpy that referenced this pull request Apr 26, 2019
By now, all that is needed is to also allow unicode strings to
pass through. Adds a test for the support which already succeeds
on python3.

Closes numpygh-4053 (replaces the old PR)
@pierre-haessig

I had forgotten about this problem a long time ago, but anyway it's nice to know it's been fixed!

Thanks @seberg !
