-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Raise ValueError
if \x00
character exists for eval
argument
#4052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -5,8 +5,7 @@ | |||
|
|||
dir_path = os.path.dirname(os.path.realpath(__file__)) | |||
|
|||
# TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError | |||
error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError | |||
error = ValueError | |||
with assert_raises(error): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with assert_raises(error): | |
with assert_raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied your suggestion in 7e88c8f. Thank you! 🙏🏻
About the |
vm/src/stdlib/builtins.rs
Outdated
source: Either< | ||
Either<PyStrRef, crate::builtins::PyBytesRef>, | ||
PyRef<crate::builtins::PyCode>, | ||
>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this exactly (str
|bytes
) or ArgStrOrBytesLike
?
ArgStrOrBytesLike
accepts more types like bytearray
. Some functions take exactly bytes
, while other functions take also bytearray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run rustpython with ArgStrOrBytesLike
, it seems very suit for this case. string, bytes or code
is just error message and the eval
also receives bytearray
Buffer
🤔 . So I'll push a commit apply ArgStrOrBytesLike
. Thanks! 🙏🏻
vm/src/stdlib/builtins.rs
Outdated
} | ||
|
||
Ok(Either::A( | ||
vm.ctx.new_str(std::str::from_utf8(source).unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if bytes
include non-ascii bytes?
what happens if bytes
include invalid utf8 character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if bytes include invalid utf8 character?
It causes panic
. I tested with the below python code:
eval(b'\xff')
And it prints:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Because the C
language const char *
type for string and also byte array, it can be represented as a single type with a compiler option. So I'll see more the compiler, parser section. If you have useful documentations to recommend, please leave them as comments 🙇🏻♂️
// _PyPegen_run_parser_from_string
if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
tok = PyTokenizer_FromUTF8(str, exec_input);
} else {
tok = PyTokenizer_FromString(str, exec_input);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I thought was .unwrap()
need to be avoided because CPython would not panic but raise a SyntaxError
.
The major difference between FromUTF8 and FromString is using decoder or not. The decode error seems to be wrapped by SyntaxError somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I agree with your thought. 🙏🏻 I'll work for wrapping the decode error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you execute eval(b'\xff')
in CPython implementation, you will see the below output:
>>> eval(b"\xff")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
�
^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I set SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
message as format and I tried to apply it in 714ce4d.
After the commit, in RustPython:
>>>>> eval(b'\xff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! I left a few minor change requests
vm/src/stdlib/builtins.rs
Outdated
match std::str::from_utf8(source) { | ||
Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | ||
Err(err) => { | ||
let msg = format!( | ||
"(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | ||
source[err.valid_up_to()], | ||
err.valid_up_to() | ||
); | ||
Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to handle error first not to disturb to read main control flow
match std::str::from_utf8(source) { | |
Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | |
Err(err) => { | |
let msg = format!( | |
"(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
source[err.valid_up_to()], | |
err.valid_up_to() | |
); | |
Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
} | |
let source = std::str::from_utf8(source).map_err(|err| { | |
let msg = format!( | |
"(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
source[err.valid_up_to()], | |
err.valid_up_to() | |
); | |
Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
})?; | |
Either::A(vm.ctx.new_str(s))), | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied it in c672d8f.
vm/src/stdlib/builtins.rs
Outdated
let code = match source { | ||
Either::A(either) => { | ||
let source: &[u8] = &either.borrow_bytes(); | ||
if source.iter().any(|&b| b == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if source.iter().any(|&b| b == 0) { | |
if source.contains(&0) { |
I am sorry, I was forgetting about this but slice has contains method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied it in 879ed6e
Co-authored-by: Jeong YunWon <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
It isn't a full perfect solution for
eval
implementation but it fixes:eval
became to receivebytes
also.eval
raisesValueError
when there isnull
(\x00
) character before compiling.You can see also:
_Py_SourceAsString
p.s. The message says like it should receive
string
orbytes
orcode
but it also receivesbytearray
well. 🤔