bpo-29803: remove some redundant ops in unicodeobject.c #660

zhangyangyu · 2017-03-13T06:16:56Z

https://bugs.python.org/issue29803

zhangyangyu · 2017-03-13T06:18:06Z

Objects/unicodeobject.c

        if (unicode_decode_call_errorhandler_writer(
                errors, &errorHandler,
                "unicodeescape", message,
                &starts, &end, &startinpos, &endinpos, &exc, &s,
                &writer)) {
            goto onError;
        }
-        if (_PyUnicodeWriter_Prepare(&writer, writer.min_length, 127) < 0) {
-            goto onError;


The necessary widening has already been done in unicode_decode_call_errorhandler_writer, so I think this call is not required.

And how about this one? How could removing it lead to a crash?

This is even more important. Since WRITE_CHAR doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos characters past the last written character.

Of course. But isn't the widening done in unicode_decode_call_errorhandler_writer? When the error handler generates only one character, we don't need more space, since we already have enough. When the error handler generates more, unicode_decode_call_errorhandler_writer allocates the space for you. You made that change to unicode_decode_call_errorhandler_writer to avoid a crash.

I don't remember all the details, but when I wrote this code the call to _PyUnicodeWriter_Prepare() was needed.

Maybe something has changed since then. I will examine the code in detail some time later. But for now I have no confidence that removing this call is safe.

Just add assert(writer.min_length <= writer.size - writer.pos) and see how Python crashes when running the tests.

I cannot understand. :-( Here writer.min_length is the least space needed, that is, the least space _PyUnicodeWriter will allocate for you, and writer.size is the size actually allocated. So shouldn't the right assertion be just assert(writer.min_length <= writer.size)? If you subtract writer.pos, you get the remaining space, and then the assertion should be assert(end - s <= writer.size - writer.pos).

You are right. The right assert is assert(end - s <= writer.size - writer.pos). It seems _PyUnicodeWriter_Prepare() is used incorrectly here and in unicode_decode_call_errorhandler_writer(). And min_length may be used inconsistently in different decoders.

I just think it's not necessary but not an error.
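
The invariant agreed on in this thread can be modeled outside CPython. The following is a minimal sketch, not the real _PyUnicodeWriter: the toy_writer struct and the toy_* functions are hypothetical names invented for illustration, and the growth policy is deliberately simplistic. It only shows why an unchecked write needs end - s free slots past writer.pos, and how updating min_length to end - s + writer.pos before preparing the buffer restores that guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for _PyUnicodeWriter: only the fields discussed here. */
typedef struct {
    char *buf;
    size_t size;       /* allocated capacity */
    size_t pos;        /* characters written so far */
    size_t min_length; /* lower bound on the final output length */
} toy_writer;

/* Ensure room for `extra` more characters past pos, and never less than
   min_length in total. Returns 0 on success, -1 on allocation failure. */
static int toy_writer_prepare(toy_writer *w, size_t extra)
{
    size_t needed = w->pos + extra;
    if (needed < w->min_length)
        needed = w->min_length;
    if (needed > w->size) {
        char *p = realloc(w->buf, needed);
        if (p == NULL)
            return -1;
        w->buf = p;
        w->size = needed;
    }
    return 0;
}

/* Unchecked write in the spirit of WRITE_CHAR: the caller must have
   reserved the space beforehand. The assert only documents that contract. */
static void toy_write_char(toy_writer *w, char ch)
{
    assert(w->pos < w->size);
    w->buf[w->pos++] = ch;
}

/* The invariant from the thread: end - s <= writer.size - writer.pos. */
static int toy_has_room(const toy_writer *w, const char *s, const char *end)
{
    return (size_t)(end - s) <= w->size - w->pos;
}
```

In this model, if an error handler consumes two input characters but emits five, toy_has_room() can become false even though the buffer was large enough at the start; setting min_length = (end - s) + pos and calling toy_writer_prepare() again makes it true.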

serhiy-storchaka

There is one legitimate fix of a copy-paste error, one questionable change, and a few incorrect removals.

serhiy-storchaka · 2017-03-13T06:23:56Z

Objects/unicodeobject.c

@@ -3922,10 +3922,6 @@ PyUnicode_FSDecoder(PyObject* arg, void* addr)
    }

    if (PyUnicode_Check(path)) {
-        if (PyUnicode_READY(path) == -1) {


Why is this removed?

It's not an error, but it's not required either. After the if ... else ..., there is a code path that readies it.

serhiy-storchaka · 2017-03-13T06:36:06Z

Objects/unicodeobject.c

@@ -6086,17 +6082,13 @@ _PyUnicode_DecodeUnicodeEscape(const char *s,

      error:
        endinpos = s-starts;
-        writer.min_length = end - s + writer.pos;


This code is needed. See also comments on Rietveld: https://bugs.python.org/review/16334/diff/17685/Objects/unicodeobject.c.

Oh, this was already discussed. I know it may avoid an unnecessary reallocation, but honestly I doubt how useful that is.

serhiy-storchaka · 2017-03-13T06:37:01Z

Objects/unicodeobject.c

@@ -6439,7 +6427,7 @@ PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
        if (ch < 0x100) {
            *p++ = (char) ch;
        }
-        /* U+0000-U+00ff range: Map 16-bit characters to '\uHHHH' */
+        /* U+0100-U+ffff range: Map 16-bit characters to '\uHHHH' */
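
The corrected comment describes the middle of three ranges used by the raw-unicode-escape encoding. As a rough sketch of that mapping, assuming a hypothetical helper raw_escape_char() (not a CPython function) that writes into a caller-supplied buffer of at least 10 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the raw-unicode-escape form of `ch` into `out`
   (which must hold at least 10 bytes) and returns the number of bytes written. */
static size_t raw_escape_char(uint32_t ch, char *out)
{
    static const char hexdigits[] = "0123456789abcdef";
    char *p = out;

    assert(ch <= 0x10ffff);
    if (ch < 0x100) {
        /* U+0000-U+00ff range: copied as a single Latin-1 byte */
        *p++ = (char)ch;
    }
    else if (ch < 0x10000) {
        /* U+0100-U+ffff range: map 16-bit characters to '\uHHHH' */
        *p++ = '\\';
        *p++ = 'u';
        for (int shift = 12; shift >= 0; shift -= 4)
            *p++ = hexdigits[(ch >> shift) & 0xf];
    }
    else {
        /* U+010000-U+10ffff range: map 32-bit characters to '\UHHHHHHHH' */
        *p++ = '\\';
        *p++ = 'U';
        for (int shift = 28; shift >= 0; shift -= 4)
            *p++ = hexdigits[(ch >> shift) & 0xf];
    }
    return (size_t)(p - out);
}
```

The first branch is why the old comment was a copy-paste error: characters below U+0100 never reach the '\uHHHH' branch.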


serhiy-storchaka · 2017-03-13T07:03:19Z

Objects/unicodeobject.c

@@ -3922,10 +3922,6 @@ PyUnicode_FSDecoder(PyObject* arg, void* addr)
    }

    if (PyUnicode_Check(path)) {
-        if (PyUnicode_READY(path) == -1) {


serhiy-storchaka · 2017-03-13T07:08:08Z

Objects/unicodeobject.c

        if (unicode_decode_call_errorhandler_writer(
                errors, &errorHandler,
                "unicodeescape", message,
                &starts, &end, &startinpos, &endinpos, &exc, &s,
                &writer)) {
            goto onError;
        }
-        if (_PyUnicodeWriter_Prepare(&writer, writer.min_length, 127) < 0) {
-            goto onError;


This is even more important. Since WRITE_CHAR doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos characters past the last written character.

vstinner

You removed the code that updates writer.min_length and calls _PyUnicodeWriter_Prepare(). I don't think this change is correct. I wrote this code a long time ago and don't recall the rationale, but it was carefully written for best performance and for correctness. If the buffer is too small, you create a buffer overflow...

zhangyangyu · 2017-03-14T03:10:54Z

I admit updating writer.min_length has its effect. It's something I am gonna restore. But I still don't understand why buffer overflow could happen. I'll wait for Serhiy's comment as an answer.

request another round

zhangyangyu · 2017-03-31T03:10:18Z

I dismissed your reviews to request another round. No offense.

zhangyangyu · 2017-04-27T08:10:08Z

@serhiy-storchaka, does this look correct now? Or am I still mixing things up?

serhiy-storchaka · 2018-02-12T11:55:22Z

Created separate #5636 for incorrect use of _PyUnicodeWriter_Prepare(). After merging that PR and merging this PR with master the rest of it LGTM.

bedevere-bot · 2018-02-13T10:33:35Z

@zhangyangyu: Please replace # with GH- in the commit message next time. Thanks!

zhangyangyu · 2018-02-13T10:34:06Z

Thanks @serhiy-storchaka ! I was going to rebase it after getting home. :-)

remove redandunt ops in unicodeobject.c

c53c234

zhangyangyu added the type-feature (A feature request or enhancement) label Mar 13, 2017

zhangyangyu requested review from vstinner and serhiy-storchaka March 13, 2017 06:16

the-knights-who-say-ni added the CLA signed label Mar 13, 2017

zhangyangyu commented Mar 13, 2017

serhiy-storchaka requested changes Mar 13, 2017

serhiy-storchaka previously requested changes Mar 13, 2017

vstinner previously requested changes Mar 13, 2017

zhangyangyu added 2 commits March 29, 2017 21:02

restore writer.min_length update

b78d981

add invariant assertions

c2b6c08

serhiy-storchaka self-requested a review April 27, 2017 11:32

Merge branch 'master' into unicode-cleanup

48bfd51

serhiy-storchaka approved these changes Feb 13, 2018

bedevere-bot added the awaiting merge label Feb 13, 2018

serhiy-storchaka added the skip news label Feb 13, 2018

zhangyangyu merged commit 2b77a92 into python:master Feb 13, 2018

bedevere-bot removed the awaiting merge label Feb 13, 2018

zhangyangyu deleted the unicode-cleanup branch February 13, 2018 10:33
