DEP: Letting fromstring pretend to be frombuffer is a bad idea #9487
3e9f009
to
d3f4c25
Compare
numpy/core/src/multiarray/ctors.c
Outdated
@@ -3611,6 +3611,12 @@ PyArray_FromString(char *data, npy_intp slen, PyArray_Descr *dtype,

    binary = ((sep == NULL) || (strlen(sep) == 0));
    if (binary) {
        if (DEPRECATE(
                "The binary mode of fromstring is deprecated, and behaved"
Needs space or newline at end here.
Maybe "it" instead of "and".
I think we should handle this in `array_fromstring`. In particular, the problem is in the `"s#"` format. We can check if the string argument is unicode and, if so, raise a `FutureWarning` that in the future only byte strings will be accepted, i.e., the user has to use an encoding to convert unicode to bytes. I suppose we could add an encoding keyword, but in this case I think only accepting byte strings should be OK. Note that `frombuffer` fails for unicode strings in Python 3, so we might also want to change that so that only byte strings are accepted.
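The check proposed above can be sketched in pure Python. This is only an illustration of the idea, not the C code in `array_fromstring`: the helper name `check_binary_input`, the warning text, and the choice of ASCII as the fallback encoding are all assumptions made for the example.

```python
import warnings


def check_binary_input(data):
    """Hypothetical helper: warn when text is passed where bytes are expected.

    Assumption: text is encoded as ASCII for now; the real function may
    behave differently.
    """
    if isinstance(data, str):
        warnings.warn(
            "text input is deprecated here; encode it to bytes explicitly",
            FutureWarning,
            stacklevel=2,
        )
        return data.encode("ascii")
    # Byte strings (and other bytes-like objects) pass through unchanged.
    return bytes(data)


# The caller-side fix is to choose the encoding explicitly.
payload = check_binary_input("abc".encode("ascii"))
```

Passing `"abc"` directly would still work in this sketch but would emit the `FutureWarning`, which is the transition path the comment describes.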
numpy/add_newdocs.py
Outdated
.. deprecated:: 1.14
    If this argument is not provided, this function falls back on the
    behaviour of `frombuffer` - but first, encoding unicode strings in
Maybe "after" instead of "- but first,".
Tests are easily fixed, but I think we should handle this a bit differently.
d3f4c25
to
bbc9983
Compare
Tests and messages fixed up
Interpreting a unicode string as raw binary data is a terrible idea, especially if the encoding is determined by the system (python 2)
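To make the concern concrete, here is a small standard-library sketch (not NumPy code) showing that the same text yields different raw bytes, and therefore different integers, depending on the encoding. The sample string and the choice of little-endian 16-bit integers are assumptions for illustration.

```python
import struct

text = "\u00e9\u00e9"  # two "e with acute accent" characters

# The same text produces different byte sequences under different encodings.
as_utf8 = text.encode("utf-8")      # 4 bytes
as_latin1 = text.encode("latin-1")  # 2 bytes

# Reinterpreting those bytes as little-endian unsigned 16-bit integers
# gives results that differ in both value and length.
values_utf8 = struct.unpack("<2H", as_utf8)
values_latin1 = struct.unpack("<1H", as_latin1)

print(values_utf8, values_latin1)
```

If the encoding is picked implicitly by the system, the caller cannot know which of these results to expect.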
bbc9983
to
6bc01b4
Compare
Error check moved. I suppose
This made me realize I had been doing this wrong quite often. So, 👍 on putting a warning with a helpful solution!
@@ -2098,6 +2098,17 @@ array_fromstring(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *keywds
        Py_XDECREF(descr);
        return NULL;
    }

    // binary mode, condition copied from PyArray_FromString
C++ comments not allowed, use /* ... */. This might change when we eventually require C99, but due to Windows that will not happen until we drop Python 2.7 and 3.4.
I'll fix this and merge.
[ci skip]
Thanks Eric.
Closes pandas-dev#17986 xref numpy/numpy#9487 (cherry picked from commit dff5109)
Closes #17986 xref numpy/numpy#9487 (cherry picked from commit dff5109)
I can't understand the logic behind this. It breaks code that worked fine with byte data. The main problem is that
Looks like the DeprecationWarning needs to mention the removed copy, good catch. The rationale is that "string" means "unicode" in the brave new py3k world, and viewing unicode as binary is nonsensical. Even though the behavior is well-defined on bytes, it's still a completely separate concern from parsing text, which is the other thing this function can do when the separator is provided, so it made sense to me to push people to frombuffer
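The two concerns mentioned above can be shown side by side with a rough standard-library sketch. The helpers `parse_text` and `view_binary` are hypothetical stand-ins, not NumPy functions, and the fixed little-endian 16-bit layout is an assumption made for the example.

```python
import struct


def parse_text(text, sep):
    """Text mode: split on a separator and parse each field as an integer."""
    return [int(field) for field in text.split(sep) if field.strip()]


def view_binary(raw):
    """Binary mode: reinterpret raw bytes as little-endian unsigned 16-bit ints."""
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw[: count * 2]))


# Text in, numbers out: the separator drives the parsing.
parsed = parse_text("1,2,3", ",")

# Bytes in, numbers out: no parsing, only a fixed memory layout.
viewed = view_binary(b"\x01\x00\x02\x00\x03\x00")

print(parsed, viewed)
```

Both calls happen to produce the same numbers here, but they take different input types and share no logic, which is the argument for keeping them in separate functions.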
Interpreting a unicode string as raw binary data is a terrible idea, especially if the encoding is determined by the system (python 2)
Fixes #9484, by just deprecating that use of the function which didn't make any sense in the first place