Raise ValueError if \x00 character exists for eval argument #4052

Merged
merged 10 commits into from
Aug 13, 2022

Conversation

Contributor

@moreal moreal commented Aug 12, 2022

This isn't a complete solution for the eval implementation, but it fixes the following:

  • eval now accepts bytes as well.
  • eval raises ValueError before compiling when the source contains a null (\x00) character.

See also the corresponding call in CPython:

    str = _Py_SourceAsString(source, "eval", "string, bytes or code", &cf, &source_copy);

P.S. The message suggests that eval accepts only a string, bytes, or code, but it also accepts bytearray. 🤔
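
The null-character check described in the second bullet can be sketched in plain Rust as follows. This is a minimal illustration, not the code from this PR: `EvalError` and `reject_nul` are hypothetical names, and the message text is assumed to mirror CPython's ValueError wording.

```rust
/// Hypothetical stand-in for the Python exception kind involved;
/// this is not RustPython's real exception type.
#[derive(Debug, PartialEq)]
enum EvalError {
    Value(String),
}

/// Rejects source bytes that contain a NUL byte before any decoding
/// or compiling takes place.
fn reject_nul(source: &[u8]) -> Result<(), EvalError> {
    if source.contains(&0) {
        // Assumed wording, modeled on CPython's ValueError message.
        return Err(EvalError::Value(
            "source code string cannot contain null bytes".to_owned(),
        ));
    }
    Ok(())
}
```

Because the check runs on the raw bytes, it applies equally to str and bytes input once both are viewed as a byte slice.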

@@ -5,8 +5,7 @@

dir_path = os.path.dirname(os.path.realpath(__file__))

-# TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError
-error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError
+error = ValueError
with assert_raises(error):
Contributor

Suggested change
-with assert_raises(error):
+with assert_raises(ValueError):

Contributor Author

I applied your suggestion in 7e88c8f. Thank you! 🙏🏻

Contributor Author

moreal commented Aug 13, 2022

Regarding the broken "CI / Check Rust code with rustfmt and clippy (pull_request)" check, see #4051.

@moreal moreal requested a review from fanninpm August 13, 2022 03:12
Comment on lines 251 to 254
source: Either<
    Either<PyStrRef, crate::builtins::PyBytesRef>,
    PyRef<crate::builtins::PyCode>,
>,
Member

@youknowone youknowone Aug 13, 2022

Is this exactly (str|bytes), or ArgStrOrBytesLike?
ArgStrOrBytesLike accepts more types, such as bytearray. Some functions take exactly bytes, while others also take bytearray.

Contributor Author

When I ran RustPython with ArgStrOrBytesLike, it seemed well suited to this case. "string, bytes or code" is just the error message, and eval also accepts bytearray buffers 🤔. So I'll push a commit that applies ArgStrOrBytesLike. Thanks! 🙏🏻
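
As a rough analogy for the difference between an exact (str|bytes) parameter and a bytes-like one, the sketch below uses a plain std trait bound. `source_bytes` is a hypothetical helper and does not reflect how ArgStrOrBytesLike is implemented in RustPython; it only shows the idea of accepting anything that can be viewed as a byte slice.

```rust
/// Returns the raw source bytes for any value that can be viewed as a
/// byte slice (str, String, Vec<u8>, byte arrays, and so on).
/// Hypothetical helper: a std-only analogy for a "bytes-like" argument.
fn source_bytes<S: AsRef<[u8]> + ?Sized>(source: &S) -> &[u8] {
    source.as_ref()
}
```

A call such as `source_bytes("1+1")` and a call such as `source_bytes(&vec![b'1', b'+', b'1'])` both yield the same three bytes, so the rest of the pipeline only has to deal with `&[u8]`.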

}

Ok(Either::A(
    vm.ctx.new_str(std::str::from_utf8(source).unwrap()),
Member

What happens if the bytes include non-ASCII bytes?
What happens if the bytes include an invalid UTF-8 character?

Contributor Author

What happens if the bytes include an invalid UTF-8 character?

It causes a panic. I tested with the following Python code:

eval(b'\xff')

And it prints:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
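
The two fields in the panic message above (`valid_up_to` and `error_len`) come from `std::str::Utf8Error` and can be inspected without panicking. A minimal sketch, where `try_decode` is a hypothetical wrapper name:

```rust
use std::str::Utf8Error;

/// Decodes source bytes as UTF-8 and hands the error back to the caller
/// instead of unwrapping it.
fn try_decode(source: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(source)
}
```

For the input `b"\xff"`, the returned error reports `valid_up_to() == 0` and `error_len() == Some(1)`, matching the values printed in the panic.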

Because C uses the const char * type for both strings and byte arrays, CPython can represent both as a single type and tell them apart with a compiler flag. So I'll look further into the compiler and parser sections. If you have useful documentation to recommend, please leave it in a comment 🙇🏻‍♂️

// _PyPegen_run_parser_from_string
    if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
        tok = PyTokenizer_FromUTF8(str, exec_input);
    } else {
        tok = PyTokenizer_FromString(str, exec_input);
    }

Member

My point was that .unwrap() needs to be avoided, because CPython would not panic but raise a SyntaxError.
The major difference between FromUTF8 and FromString is whether a decoder is used. The decode error seems to be wrapped in a SyntaxError somewhere.

Contributor Author

Ah, I agree. 🙏🏻 I'll work on wrapping the decode error.

Contributor Author

@moreal moreal Aug 13, 2022

When you execute eval(b'\xff') in CPython, you see the following output:

>>> eval(b"\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    �
    ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I used the message SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte as the format and tried to apply it in 714ce4d.

After the commit, in RustPython:

>>>>> eval(b'\xff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
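
How such a message can be derived from the decode error is sketched below with std only. `decode_or_message` is a hypothetical name, and the error is returned as a plain `String` rather than a real SyntaxError object. Note the assumption baked in: the reason "invalid start byte" is hard-coded, so other failure kinds (for example a truncated sequence, which CPython reports as "unexpected end of data") would get the wrong reason text.

```rust
/// Decodes source bytes as UTF-8, or returns a CPython-style message
/// describing the first offending byte.
fn decode_or_message(source: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(source).map_err(|err| {
        // Index of the first byte that is not part of valid UTF-8.
        let pos = err.valid_up_to();
        format!(
            "(unicode error) 'utf-8' codec can't decode byte 0x{:02x} in position {}: invalid start byte",
            source[pos], pos
        )
    })
}
```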

@moreal moreal requested a review from youknowone August 13, 2022 18:21
Member

@youknowone youknowone left a comment

looks great! I left a few minor change requests

Comment on lines 267 to 276
match std::str::from_utf8(source) {
    Ok(s) => Ok(Either::A(vm.ctx.new_str(s))),
    Err(err) => {
        let msg = format!(
            "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
            source[err.valid_up_to()],
            err.valid_up_to()
        );
        Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg))
    }
Member

I prefer to handle the error first so that it doesn't get in the way of reading the main control flow.

Suggested change
-match std::str::from_utf8(source) {
-    Ok(s) => Ok(Either::A(vm.ctx.new_str(s))),
-    Err(err) => {
-        let msg = format!(
-            "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
-            source[err.valid_up_to()],
-            err.valid_up_to()
-        );
-        Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg))
-    }
+let source = std::str::from_utf8(source).map_err(|err| {
+    let msg = format!(
+        "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte",
+        source[err.valid_up_to()],
+        err.valid_up_to()
+    );
+    // The map_err closure returns the exception itself; `?` wraps it in Err.
+    vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)
+})?;
+Ok(Either::A(vm.ctx.new_str(source)))
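
The two control-flow shapes in this suggestion behave the same; only the position of the error handling differs. A std-only sketch, in which `describe`, `with_match`, and `with_map_err` are hypothetical names and a `String` stands in for the exception:

```rust
use std::str::Utf8Error;

/// Converts a decode error into a simple message (stand-in for an exception).
fn describe(err: Utf8Error) -> String {
    format!("decode failed at byte {}", err.valid_up_to())
}

/// Shape 1: success and failure are handled side by side in a match.
fn with_match(source: &[u8]) -> Result<usize, String> {
    match std::str::from_utf8(source) {
        Ok(s) => Ok(s.chars().count()),
        Err(err) => Err(describe(err)),
    }
}

/// Shape 2: the error is converted and returned early with `?`,
/// leaving the main path unindented.
fn with_map_err(source: &[u8]) -> Result<usize, String> {
    let s = std::str::from_utf8(source).map_err(describe)?;
    Ok(s.chars().count())
}
```

In the second shape the closure passed to `map_err` returns the error value itself; the `?` operator is what turns it into an early `Err` return.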

Contributor Author

I applied it in c672d8f.

let code = match source {
    Either::A(either) => {
        let source: &[u8] = &either.borrow_bytes();
        if source.iter().any(|&b| b == 0) {
Member

@youknowone youknowone Aug 13, 2022

Suggested change
-if source.iter().any(|&b| b == 0) {
+if source.contains(&0) {

Sorry, I forgot about this earlier, but slices have a contains method.
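
A quick way to convince yourself that the two spellings agree is to put them side by side. The function names below are hypothetical and exist only for this comparison:

```rust
/// NUL check written with an explicit iterator and closure.
fn has_nul_any(source: &[u8]) -> bool {
    source.iter().any(|&b| b == 0)
}

/// The same check using the slice's built-in `contains` method.
fn has_nul_contains(source: &[u8]) -> bool {
    source.contains(&0)
}
```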

Contributor Author

I applied it in 879ed6e

Member

@youknowone youknowone left a comment

Thank you!

@youknowone youknowone added the z-ca-2022 Tag to track contrubution-academy 2022 label Aug 13, 2022
@youknowone youknowone merged commit d82c2b0 into RustPython:main Aug 13, 2022
3 participants