-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Raise ValueError if \x00 character exists for eval argument
#4052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError | ||
| error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError | ||
| error = ValueError | ||
| with assert_raises(error): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with assert_raises(error): | |
| with assert_raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied your suggestion in 7e88c8f. Thank you! 🙏🏻
|
About the |
vm/src/stdlib/builtins.rs
Outdated
| source: Either< | ||
| Either<PyStrRef, crate::builtins::PyBytesRef>, | ||
| PyRef<crate::builtins::PyCode>, | ||
| >, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this exactly (str|bytes) or ArgStrOrBytesLike?
ArgStrOrBytesLike accepts more types like bytearray. Some functions take exactly bytes, while other functions take also bytearray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run rustpython with ArgStrOrBytesLike, it seems very suit for this case. string, bytes or code is just error message and the eval also receives bytearray Buffer 🤔 . So I'll push a commit apply ArgStrOrBytesLike. Thanks! 🙏🏻
vm/src/stdlib/builtins.rs
Outdated
| } | ||
|
|
||
| Ok(Either::A( | ||
| vm.ctx.new_str(std::str::from_utf8(source).unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if bytes include non-ascii bytes?
what happens if bytes include invalid utf8 character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if bytes include invalid utf8 character?
It causes panic. I tested with the below python code:
eval(b'\xff')And it prints:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Because the C language const char * type for string and also byte array, it can be represented as a single type with a compiler option. So I'll see more the compiler, parser section. If you have useful documentations to recommend, please leave them as comments 🙇🏻♂️
// _PyPegen_run_parser_from_string
if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
tok = PyTokenizer_FromUTF8(str, exec_input);
} else {
tok = PyTokenizer_FromString(str, exec_input);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I thought was .unwrap() need to be avoided because CPython would not panic but raise a SyntaxError.
The major difference between FromUTF8 and FromString is using decoder or not. The decode error seems to be wrapped by SyntaxError somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I agree with your thought. 🙏🏻 I'll work for wrapping the decode error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you execute eval(b'\xff') in CPython implementation, you will see the below output:
>>> eval(b"\xff")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
�
^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I set SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte message as format and I tried to apply it in 714ce4d.
After the commit, in RustPython:
>>>>> eval(b'\xff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! I left a few minor change requests
vm/src/stdlib/builtins.rs
Outdated
| match std::str::from_utf8(source) { | ||
| Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | ||
| Err(err) => { | ||
| let msg = format!( | ||
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | ||
| source[err.valid_up_to()], | ||
| err.valid_up_to() | ||
| ); | ||
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to handle error first not to disturb to read main control flow
| match std::str::from_utf8(source) { | |
| Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | |
| Err(err) => { | |
| let msg = format!( | |
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
| source[err.valid_up_to()], | |
| err.valid_up_to() | |
| ); | |
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
| } | |
| let source = std::str::from_utf8(source).map_err(|err| { | |
| let msg = format!( | |
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
| source[err.valid_up_to()], | |
| err.valid_up_to() | |
| ); | |
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
| })?; | |
| Either::A(vm.ctx.new_str(s))), | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied it in c672d8f.
vm/src/stdlib/builtins.rs
Outdated
| let code = match source { | ||
| Either::A(either) => { | ||
| let source: &[u8] = &either.borrow_bytes(); | ||
| if source.iter().any(|&b| b == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if source.iter().any(|&b| b == 0) { | |
| if source.contains(&0) { |
I am sorry, I was forgetting about this but slice has contains method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied it in 879ed6e
Co-authored-by: Jeong YunWon <[email protected]>
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
It isn't a full perfect solution for
evalimplementation but it fixes:evalbecame to receivebytesalso.evalraisesValueErrorwhen there isnull(\x00) character before compiling.You can see also:
_Py_SourceAsStringp.s. The message says like it should receive
stringorbytesorcodebut it also receivesbytearraywell. 🤔