-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
bpo-29803: remove some redandunt ops in unicodeobject.c #660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Objects/unicodeobject.c
Outdated
if (unicode_decode_call_errorhandler_writer( | ||
errors, &errorHandler, | ||
"unicodeescape", message, | ||
&starts, &end, &startinpos, &endinpos, &exc, &s, | ||
&writer)) { | ||
goto onError; | ||
} | ||
if (_PyUnicodeWriter_Prepare(&writer, writer.min_length, 127) < 0) { | ||
goto onError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The necessary widen has been done in unicode_decode_call_errorhandler_writer
. So I think this is not a must.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how about this? How could it lead to crash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is even more important. Since WRITE_CHAR
doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos
characters past the last written character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. But isn't the widen done in unicode_decode_call_errorhandler_writer
? When the error handler generates only one character, we don't need more space since we have already got enough space. But when the error handler generates more, unicode_decode_call_errorhandler_writer
allocates the spaces for you. You did the change to unicode_decode_call_errorhandler_writer
to avoid crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remove all details, but when I wrote this code the call of _PyUnicodeWriter_Prepare()
was needed.
Maybe something was changed since that time. I will examine thу code in detail some time later. But now I have no confidence that the removal of this call is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add assert(writer.min_length <= writer.size - writer.pos)
and see how Python crashes when run tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot understand. :-( writer.min_length
means the least needed space here and it means the least space _PyUnicodeWriter
will allocate for you. And writer.size
is the actually allocated size. So shouldn't the right assertion here is just assert(writer.min_length <= writer.size)
. If you minus writer.pos
, it means the left space, then should use assert(end - s <= writer.size - writer.pos)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. The right assert is assert(end - s <= writer.size - writer.pos)
. Seems _PyUnicodeWriter_Prepare()
is incorrectly used here and in unicode_decode_call_errorhandler_writer()
. And min_length
may be inconsistently used in different decoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think it's not necessary but not an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one legitimate fix of copy-paste error, one questionable change and few incorrect removes.
@@ -3922,10 +3922,6 @@ PyUnicode_FSDecoder(PyObject* arg, void* addr) | |||
} | |||
|
|||
if (PyUnicode_Check(path)) { | |||
if (PyUnicode_READY(path) == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an error but not a must. After the if ... else ...
we get a code path to ready it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
Objects/unicodeobject.c
Outdated
@@ -6086,17 +6082,13 @@ _PyUnicode_DecodeUnicodeEscape(const char *s, | |||
|
|||
error: | |||
endinpos = s-starts; | |||
writer.min_length = end - s + writer.pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is needed. See also comments on Rietveld: https://bugs.python.org/review/16334/diff/17685/Objects/unicodeobject.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, it's discussed. I know this may avoid unnecessary reallocation. But honestly I doubt how useful it could be.
@@ -6439,7 +6427,7 @@ PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode) | |||
if (ch < 0x100) { | |||
*p++ = (char) ch; | |||
} | |||
/* U+0000-U+00ff range: Map 16-bit characters to '\uHHHH' */ | |||
/* U+0100-U+ffff range: Map 16-bit characters to '\uHHHH' */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -3922,10 +3922,6 @@ PyUnicode_FSDecoder(PyObject* arg, void* addr) | |||
} | |||
|
|||
if (PyUnicode_Check(path)) { | |||
if (PyUnicode_READY(path) == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
Objects/unicodeobject.c
Outdated
if (unicode_decode_call_errorhandler_writer( | ||
errors, &errorHandler, | ||
"unicodeescape", message, | ||
&starts, &end, &startinpos, &endinpos, &exc, &s, | ||
&writer)) { | ||
goto onError; | ||
} | ||
if (_PyUnicodeWriter_Prepare(&writer, writer.min_length, 127) < 0) { | ||
goto onError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is even more important. Since WRITE_CHAR
doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos
characters past the last written character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed code to update writer.min_length and to call _PyUnicodeWriter_Prepare(). I don't think that this change is correct. I wrote this code long time ago, and I don't recall the rationale, and the code was carefully written for best performances, and also for correctness. If the buffer is too small, you create a buffer overflow...
I admit updating writer.min_length has its effect. It's something I am gonna restore. But I still don't understand why buffer overflow could happen. I'll wait for Serhiy's comment as an answer. |
request another round
I dismissed your reviews to request another round. No offense. |
@serhiy-storchaka , does this look correct now? Or I still mix things up? |
Created separate #5636 for incorrect use of _PyUnicodeWriter_Prepare(). After merging that PR and merging this PR with master the rest of it LGTM. |
@zhangyangyu: Please replace |
Thanks @serhiy-storchaka ! I was going to rebase it after getting home. :-) |
https://bugs.python.org/issue29803