Unicode byteorder seems mostly broken #3939

Closed
seberg opened this issue Oct 17, 2013 · 4 comments

Comments

@seberg
Member

seberg commented Oct 17, 2013

Is it supposed to be possible to use non-native unicode byteorder?

The unicode comparison functions cannot handle non-native byteorder, and the same applies to the dtype transfer functions. The copyswap functions do anticipate it, but only for 4-byte wide unicode (and I think there can be 2-byte wide as a compile option?), so maybe that is why printing works...

In [19]: a = np.array(['asdf']).astype(unicode)

In [20]: a.byteswap().newbyteorder()
Out[20]: 
array([u'asdf'], 
      dtype='>U4')

In [21]: a.astype('>U4')
Out[21]: 
array([u'\U61000000\U73000000\U64000000\U66000000'], 
      dtype='>U4')

In [22]: a == a.byteswap().newbyteorder()
Out[22]: array([False], dtype=bool)
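
The garbled code points in Out[21] are consistent with each 4-byte character being copied without a swap and then read in the opposite byte order. The following is a minimal stdlib-only sketch of that reading, assuming a little-endian machine; `misread_code_point` is a hypothetical helper for illustration, not anything NumPy provides.

```python
def misread_code_point(ch: str) -> int:
    """Encode one character as little-endian UCS4, then read it as big-endian."""
    raw = ch.encode("utf-32-le")        # 4 bytes, least significant byte first
    return int.from_bytes(raw, "big")   # same bytes, interpreted in the wrong order


for ch in "asdf":
    # Prints the real code point next to the value a wrong-endian read produces.
    print(ch, hex(ord(ch)), "->", hex(misread_code_point(ch)))
```

For `'a'` (0x61) this gives 0x61000000, which matches the first escape shown in Out[21].
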
@charris
Member

charris commented Oct 17, 2013

NumPy only uses 32-bit unicode, while Python can be either 16 or 32 bits depending on the configuration.
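
A quick way to see the fixed 4-byte width is to look at the item size and raw buffer of a unicode array. This is only a sketch and assumes a recent NumPy release in which non-native unicode dtypes are handled correctly; the variable names are illustrative.

```python
import numpy as np

a = np.array(["asdf"])                 # native-order unicode array, 4 characters
print(a.dtype.itemsize)                # 16: 4 characters * 4 bytes each

# Build the same string with explicit big- and little-endian unicode dtypes
# and compare the raw buffers against Python's UTF-32 encoders (no BOM).
be = np.array(["asdf"], dtype=">U4")
le = np.array(["asdf"], dtype="<U4")
print(be.tobytes() == "asdf".encode("utf-32-be"))
print(le.tobytes() == "asdf".encode("utf-32-le"))
```
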

@seberg
Member Author

seberg commented Oct 17, 2013

ah ok, that makes sense... So things are a bit simpler. We will need a dedicated unicode dtype transfer function, and either use the new iterator or force a copy in the comparison functions.
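
The "force a copy" option can be sketched at the Python level as below. This is not the C-level change discussed here, only an illustration of the idea; `to_native` and `compare_unicode` are hypothetical helpers, and the sketch assumes the cast itself swaps bytes correctly, which is exactly the transfer-function part that needs fixing.

```python
import numpy as np


def to_native(arr: np.ndarray) -> np.ndarray:
    """Return arr in native byte order, copying only when it is non-native."""
    if arr.dtype.isnative:
        return arr
    # "=" requests the machine's native byte order for the same dtype.
    return arr.astype(arr.dtype.newbyteorder("="))


def compare_unicode(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Compare elementwise after bringing both operands to native order."""
    return to_native(x) == to_native(y)


x = np.array(["asdf"])
y = x.astype(x.dtype.newbyteorder("S"))   # "S" swaps to the non-native order
print(compare_unicode(x, y))
```

The copy costs extra memory for non-native inputs, which is the trade-off against using the iterator to do the conversion in buffered chunks.
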

@seberg
Member Author

seberg commented May 23, 2016

@jreback no idea if it interests you, but I just opened gh-7664 to hopefully fix this soonish. Oops, I had forgotten about the "numpy only uses 32bit" part, haha. Makes the code much nicer...

@jreback

jreback commented May 24, 2016

thanks @seberg, unicode is something I mostly prefer not to deal with :)
