Raise `ValueError` if `\x00` character exists for `eval` argument #4052

moreal · 2022-08-12T17:28:25Z

This isn't a complete solution for the eval implementation, but it fixes two things:

- eval now accepts bytes as well.
- eval raises ValueError before compiling when the source contains a null (\x00) character.
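
As a rough illustration of these two fixes, the following standalone Rust sketch models a source that is either a string or bytes and rejects a null byte before any compile step would run. `Source` and `check_source` are hypothetical names chosen for illustration and are not RustPython's types; the error text mirrors the message CPython uses for this case.

```rust
/// Hypothetical stand-in for the source objects `eval` accepts.
/// RustPython's real argument types are different; this is only a model.
enum Source<'a> {
    Str(&'a str),
    Bytes(&'a [u8]),
}

impl Source<'_> {
    /// Views either variant as raw bytes so one check covers both.
    fn as_bytes(&self) -> &[u8] {
        match *self {
            Source::Str(s) => s.as_bytes(),
            Source::Bytes(b) => b,
        }
    }
}

/// Returns an error message (standing in for a ValueError) if the source
/// contains a null byte. Intended to run before compilation.
fn check_source(source: &Source) -> Result<(), String> {
    if source.as_bytes().contains(&0) {
        return Err("source code string cannot contain null bytes".to_owned());
    }
    Ok(())
}
```

Converting both variants to bytes first keeps the null check in one place, which is roughly the shape the diff later in this thread takes.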

See also the corresponding call in CPython:

    str = _Py_SourceAsString(source, "eval", "string, bytes or code", &cf, &source_copy);

P.S. The error message says it expects a string, bytes, or code, but it accepts bytearray as well. 🤔

fanninpm · 2022-08-12T18:18:39Z

extra_tests/snippets/syntax_non_utf8.py

@@ -5,8 +5,7 @@

 dir_path = os.path.dirname(os.path.realpath(__file__))

-# TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError
-error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError
+error = ValueError
 with assert_raises(error):


Suggested change

-with assert_raises(error):
+with assert_raises(ValueError):

I applied your suggestion in 7e88c8f. Thank you! 🙏🏻

moreal · 2022-08-13T03:11:35Z

The "CI / Check Rust code with rustfmt and clippy (pull_request)" check is broken; see #4051.

youknowone · 2022-08-13T08:01:21Z

vm/src/stdlib/builtins.rs

+        source: Either<
+            Either<PyStrRef, crate::builtins::PyBytesRef>,
+            PyRef<crate::builtins::PyCode>,
+        >,


Is this exactly (str|bytes), or ArgStrOrBytesLike?
ArgStrOrBytesLike accepts more types, such as bytearray. Some functions take exactly bytes, while others also take bytearray.

When I ran RustPython with ArgStrOrBytesLike, it fit this case well. "string, bytes or code" is only the error message, and eval also accepts a bytearray buffer. 🤔 So I'll push a commit that applies ArgStrOrBytesLike. Thanks! 🙏🏻

youknowone · 2022-08-13T08:01:58Z

vm/src/stdlib/builtins.rs

+                }
+
+                Ok(Either::A(
+                    vm.ctx.new_str(std::str::from_utf8(source).unwrap()),


What happens if the bytes include non-ASCII bytes?
What happens if the bytes include invalid UTF-8 characters?

> What happens if the bytes include invalid UTF-8 characters?

It panics. I tested with the following Python code:

    eval(b'\xff')

And it prints:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
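
The `valid_up_to` and `error_len` fields in this panic message come from `std::str::Utf8Error`. A minimal sketch, assuming the source is available as a byte slice (the helper name `inspect_invalid_utf8` is hypothetical), shows how they can be read without unwrapping:

```rust
/// Returns `(valid_up_to, error_len)` for invalid UTF-8 input,
/// or `None` when the input decodes cleanly. Never panics.
fn inspect_invalid_utf8(source: &[u8]) -> Option<(usize, Option<usize>)> {
    std::str::from_utf8(source)
        .err()
        .map(|err| (err.valid_up_to(), err.error_len()))
}
```

For `b"\xff"` this yields `Some((0, Some(1)))`, the same values shown in the panic above.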

Because C uses the `const char *` type for both strings and byte arrays, CPython can handle them as a single type and switch behavior with a compiler flag. I'll look further into the compiler and parser sections. If you have useful documentation to recommend, please leave it in a comment 🙇🏻‍♂️

    // _PyPegen_run_parser_from_string
    if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
        tok = PyTokenizer_FromUTF8(str, exec_input);
    }
    else {
        tok = PyTokenizer_FromString(str, exec_input);
    }

My point was that .unwrap() should be avoided, because CPython would not panic but raise a SyntaxError.
The major difference between FromUTF8 and FromString is whether a decoder is used. The decode error seems to be wrapped in a SyntaxError somewhere.

Ah, I agree. 🙏🏻 I'll work on wrapping the decode error.

Running eval(b'\xff') in CPython produces the following output:

    >>> eval(b"\xff")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1
        �
        ^
    SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I used the message "SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" as the format and applied it in 714ce4d.

After the commit, in RustPython:

    >>>>> eval(b'\xff')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

youknowone

Looks great! I left a few minor change requests.

vm/src/stdlib/builtins.rs

youknowone · 2022-08-13T18:27:53Z

vm/src/stdlib/builtins.rs

+                match std::str::from_utf8(source) {
+                    Ok(s) => Ok(Either::A(vm.ctx.new_str(s))),
+                    Err(err) => {
+                        let msg = format!(
+                            "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
+                            source[err.valid_up_to()],
+                            err.valid_up_to()
+                        );
+                        Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg))
+                    }


I prefer to handle the error first so that it doesn't interrupt reading the main control flow.

Suggested change

-match std::str::from_utf8(source) {
-    Ok(s) => Ok(Either::A(vm.ctx.new_str(s))),
-    Err(err) => {
-        let msg = format!(
-            "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
-            source[err.valid_up_to()],
-            err.valid_up_to()
-        );
-        Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg))
-    }
+let source = std::str::from_utf8(source).map_err(|err| {
+    let msg = format!(
+        "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
+        source[err.valid_up_to()],
+        err.valid_up_to()
+    );
+    vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)
+})?;
+Ok(Either::A(vm.ctx.new_str(source)))

I applied it in c672d8f.
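
The error-first shape of this suggestion can be tried outside the VM. The sketch below is a simplified stand-in built on assumptions: `decode_source` is a hypothetical helper, and a plain `String` takes the place of the VM's SyntaxError object. The closure passed to `map_err` returns the error value itself, and `?` performs the early return.

```rust
/// Decodes `source` as UTF-8, handling the failure first so the success
/// path reads straight through. The `String` error stands in for a
/// SyntaxError; a real VM would build an exception object instead.
fn decode_source(source: &[u8]) -> Result<&str, String> {
    let text = std::str::from_utf8(source).map_err(|err| {
        // `valid_up_to()` is the index of the first byte that failed to decode.
        // The "invalid start byte" reason is fixed, as in the PR's message.
        format!(
            "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
            source[err.valid_up_to()],
            err.valid_up_to()
        )
    })?;
    Ok(text)
}
```

With this shape, `decode_source(b"\xff")` returns an error instead of panicking, and valid input passes through unchanged.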

youknowone · 2022-08-13T18:40:13Z

vm/src/stdlib/builtins.rs

+        let code = match source {
+            Either::A(either) => {
+                let source: &[u8] = &either.borrow_bytes();
+                if source.iter().any(|&b| b == 0) {


Suggested change

-if source.iter().any(|&b| b == 0) {
+if source.contains(&0) {

Sorry, I forgot about this earlier, but slices have a contains method.

I applied it in 879ed6e
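
A quick standalone check, assuming the source is a plain byte slice, that the two forms agree. The function names are hypothetical and exist only for this comparison:

```rust
/// Null-byte check written with an explicit iterator predicate.
fn has_nul_any(source: &[u8]) -> bool {
    source.iter().any(|&b| b == 0)
}

/// The same check using `slice::contains`.
fn has_nul_contains(source: &[u8]) -> bool {
    source.contains(&0)
}
```

Both return the same result for any input; `contains` simply states the intent more directly.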

Co-authored-by: Jeong YunWon <[email protected]>

youknowone

Thank you!

moreal added 3 commits August 13, 2022 02:22

Make eval able to receive bytes also

56f08a1

Raise ValueError if null character exists for eval argument

bb8526a

Remove todo comment of resolved issue

c7689c9

fanninpm reviewed Aug 12, 2022

moreal added 3 commits August 13, 2022 07:29

Apply cargo fmt

20e7752

Inline error type

7e88c8f

Unmark resolved tests

6d01742

moreal requested a review from fanninpm August 13, 2022 03:12

youknowone reviewed Aug 13, 2022

moreal added 2 commits August 13, 2022 19:08

Use ArgStrOrBytesLike python argument type

0bd702d

Wrap unicode error as syntax error

714ce4d

moreal requested a review from youknowone August 13, 2022 18:21

youknowone reviewed Aug 13, 2022

moreal and others added 2 commits August 14, 2022 03:47

Make loop simple with `slice::contains`

879ed6e

Co-authored-by: Jeong YunWon <[email protected]>

Make mapping error simple with map_err

c672d8f

moreal force-pushed the correct-eval branch from 6f8a7d4 to c672d8f Compare August 13, 2022 18:48

moreal requested a review from youknowone August 13, 2022 18:49

youknowone approved these changes Aug 13, 2022

youknowone added the z-ca-2022 Tag to track contrubution-academy 2022 label Aug 13, 2022

youknowone merged commit d82c2b0 into RustPython:main Aug 13, 2022
