DEP: Letting fromstring pretend to be frombuffer is a bad idea #9487

Merged
merged 3 commits into from
Oct 22, 2017

Conversation

eric-wieser
Member

Interpreting a unicode string as raw binary data is a terrible idea, especially if the encoding is determined by the system (Python 2).

Fixes #9484 by deprecating that use of the function, which didn't make any sense in the first place.
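
To illustrate the point about encodings, here is a minimal sketch using only the Python 3 standard library. The sample character and the three encodings are chosen for illustration and do not come from the PR. They show that the same text has several byte representations, so reading a unicode string as raw binary has no single well-defined answer.

```python
# Illustration only: the sample text and encodings are assumptions, not from the PR.
text = "\u00e9"  # one character: LATIN SMALL LETTER E WITH ACUTE

# The same character maps to different byte sequences under different encodings.
as_utf8 = list(text.encode("utf-8"))
as_latin1 = list(text.encode("latin-1"))
as_utf16le = list(text.encode("utf-16-le"))

print(as_utf8)     # [195, 169]
print(as_latin1)   # [233]
print(as_utf16le)  # [233, 0]
```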

@charris charris added this to the 1.14.0 release milestone Sep 25, 2017
@@ -3611,6 +3611,12 @@ PyArray_FromString(char *data, npy_intp slen, PyArray_Descr *dtype,

binary = ((sep == NULL) || (strlen(sep) == 0));
if (binary) {
if (DEPRECATE(
"The binary mode of fromstring is deprecated, and behaved"
Member

Needs space or newline at end here.

Member

Maybe "it" instead of "and".

Member

I think we should handle this in array_fromstring. In particular, the problem is in the "s#" format. We can check if the string argument is unicode and, if so, raise a FutureWarning that in the future only byte strings will be accepted. i.e., the user has to use an encoding to convert unicode to bytes. I suppose we could add an encoding keyword, but in this case I think only accepting byte strings should be OK. Note that frombuffer fails for unicode strings in Python 3, so we might also want to change that so that only byte strings are accepted.
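
A rough sketch of what "use an encoding to convert unicode to bytes" could look like from the caller's side in Python 3. The sample string and the choice of ASCII are assumptions made for illustration; the point is only that the caller names the encoding instead of leaving it to the system.

```python
import numpy as np

text = "abc"  # sample input, chosen for illustration

# Passing a str straight to frombuffer is rejected in Python 3.
try:
    np.frombuffer(text, dtype=np.uint8)
    str_rejected = False
except TypeError:
    str_rejected = True

# The caller picks the encoding explicitly, then views the resulting bytes.
raw = text.encode("ascii")
codes = np.frombuffer(raw, dtype=np.uint8)

print(str_rejected, codes.tolist())  # True [97, 98, 99]
```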


.. deprecated:: 1.14
If this argument is not provided, this function falls back on the
behaviour of `frombuffer` - but first, encoding unicode strings in
Member

Maybe "after" instead of "- but first,".

@charris
Member

charris commented Oct 18, 2017

Tests are easily fixed, but I think we should handle this a bit differently.

@eric-wieser
Member Author

Tests and messages fixed up

@eric-wieser
Member Author

Error check moved. I suppose PyArray_FromString is convenient when writing C code, as a somewhat lazy way to construct a new array.

@mhvk
Contributor

mhvk commented Oct 21, 2017

This made me realize I had been doing this wrong quite often. So, 👍 on putting a warning with a helpful solution!

@@ -2098,6 +2098,17 @@ array_fromstring(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *keywds
Py_XDECREF(descr);
return NULL;
}

// binary mode, condition copied from PyArray_FromString
Member

C++ comments are not allowed, use /* ... */. This might change when we eventually require C99, but because of Windows that will not happen until we drop Python 2.7 and 3.4.

I'll fix this and merge.

@charris charris merged commit 5d54ba0 into numpy:master Oct 22, 2017
@charris
Member

charris commented Oct 22, 2017

Thanks Eric.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Oct 25, 2017
jreback pushed a commit to pandas-dev/pandas that referenced this pull request Oct 27, 2017
peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger added a commit to pandas-dev/pandas that referenced this pull request Dec 11, 2017
@sem-geologist

I can't understand the logic behind this. This change breaks working with byte data. The main problem is that frombuffer returns an immutable array, while fromstring was returning a new array.
Such an array was ready for, e.g., .byteswap(), while the immutable one (from frombuffer) is not, and needs an additional copy() call before its bytes can be modified.
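
The difference described here can be sketched as follows. The byte values are made up for illustration; the sketch assumes the input is a `bytes` object, which is immutable, so the array returned by `frombuffer` is read-only until it is copied.

```python
import numpy as np

raw = b"\x01\x00\x02\x00"  # two little-endian uint16 values: 1 and 2

# frombuffer shares memory with the immutable bytes object, so the result is read-only.
view = np.frombuffer(raw, dtype="<u2")
try:
    view[0] = 7
    write_failed = False
except ValueError:
    write_failed = True

# An explicit copy owns its data and can be modified in place.
owned = view.copy()
owned.byteswap(inplace=True)

print(view.flags.writeable, write_failed, owned.tolist())  # False True [256, 512]
```

The extra `copy()` is the step that the binary mode of `fromstring` used to perform implicitly.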

@eric-wieser
Member Author

Looks like the DeprecationWarning needs to mention the removed copy, good catch.

The rationale is that "string" means "unicode" in the brave new py3k world, and viewing unicode as binary is nonsensical.

Even though the behavior is well-defined on bytes, it's still a completely separate concern from parsing text, which is the other thing this function can do when the separator is provided, so it made sense to me to push people to frombuffer.
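
A small sketch of the two separate concerns, with sample inputs chosen for illustration: `fromstring` with a separator parses text into numbers, while `frombuffer` reinterprets bytes that already exist.

```python
import numpy as np

# Text mode: a separator is given, so the string is parsed as numbers.
parsed = np.fromstring("1 2 3", dtype=np.int64, sep=" ")

# Binary data: existing bytes are reinterpreted without any parsing.
viewed = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)

print(parsed.tolist(), viewed.tolist())  # [1, 2, 3] [1, 2, 3]
```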
