expr: fix some multibyte issues #8606
de55ab3 to 82fde69
Should make tests/expr/expr-multibyte pass
82fde69 to 2322bc7
GNU testsuite comparison:
let pattern_str = match encoding {
    UEncoding::Utf8 => {
        // In UTF-8 locale, try to parse as UTF-8
        match String::from_utf8(right_bytes.clone()) {
            Ok(s) => {
                check_posix_regex_errors(&s)?;
                s
            }
            Err(_) => {
                // Invalid UTF-8 pattern in UTF-8 locale - use lossy conversion
                let s = String::from_utf8_lossy(&right_bytes).into_owned();
                check_posix_regex_errors(&s)?;
                s
            }
        }
    }
    UEncoding::Ascii => {
        // In C locale, validate pattern and use lossy conversion
        let s = String::from_utf8_lossy(&right_bytes).into_owned();
        check_posix_regex_errors(&s)?;
        s
    }
};
Unless I'm missing something, I think you don't have to distinguish between `UEncoding::Utf8` and `UEncoding::Ascii`. It would allow you to simplify the block to something like:
let pattern_str = String::from_utf8(right_bytes.clone())
    .unwrap_or_else(|_| String::from_utf8_lossy(&right_bytes).into());
check_posix_regex_errors(&pattern_str)?;
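A possible further simplification, offered as a sketch rather than as what the PR does: `String::from_utf8_lossy` already returns the input unchanged when the bytes are valid UTF-8, so the two-step fallback and a single lossy conversion should produce the same string. The helper names `pattern_from_bytes` and `pattern_from_bytes_two_step` are hypothetical and exist only for this comparison.

```rust
/// Hypothetical helper (name is an assumption, not from the PR):
/// converts raw pattern bytes to a `String`, replacing invalid UTF-8
/// sequences with U+FFFD.
fn pattern_from_bytes(bytes: &[u8]) -> String {
    // `from_utf8_lossy` borrows when the input is valid UTF-8 and only
    // builds a repaired copy otherwise; `into_owned` yields a `String`
    // in both cases.
    String::from_utf8_lossy(bytes).into_owned()
}

/// The two-step form from the review suggestion, kept for comparison.
fn pattern_from_bytes_two_step(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|_| String::from_utf8_lossy(bytes).into())
}
```

Both helpers return the same `String` for valid and invalid input; the single-call form also avoids cloning the byte vector up front.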
    Some(&mut region),
);

if let Some(_pos) = pos {
I would use `is_some()` as you don't care about the value:
if pos.is_some() {
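To illustrate the point of this suggestion, here is a small sketch comparing the two forms. The `search` function is a stand-in built on `str::find`, not the regex search call used in the PR, and all function names are hypothetical.

```rust
/// Stand-in for the search call in the PR (illustrative only):
/// returns the byte offset of the first occurrence of `needle`, if any.
fn search(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}

/// Original style: binds `_pos` even though the value is never read.
fn matched_if_let(haystack: &str, needle: &str) -> bool {
    let pos = search(haystack, needle);
    if let Some(_pos) = pos {
        true
    } else {
        false
    }
}

/// Suggested style: only the presence of a match is checked.
fn matched_is_some(haystack: &str, needle: &str) -> bool {
    search(haystack, needle).is_some()
}
```

The two functions behave identically; `is_some()` simply states the intent without introducing an unused binding.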
    Some(&mut region),
);

if let Some(_pos) = pos {
Same here:
if pos.is_some() {
    Some(&mut region),
);

if let Some(_pos) = pos {
Same here:
if pos.is_some() {
@@ -369,6 +277,257 @@ fn check_posix_regex_errors(pattern: &str) -> ExprResult<()> {
    }
}

/// Evaluate a match expression with locale-aware regex matching
fn evaluate_match_expression(left_bytes: Vec<u8>, right_bytes: Vec<u8>) -> ExprResult<NumOrStr> {
I would split this function into multiple functions.
I think it can be split into a `build_regex` function and a `find_match` function.
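One way the proposed split might look, as a rough sketch under loud assumptions: the `Pattern` type, the `String` error type, and the literal-prefix matching below are stand-ins so the example needs no external crates. The real function would wrap the regex engine's types and return `ExprResult<NumOrStr>`. Only the names `build_regex` and `find_match` come from the review comment.

```rust
/// Hypothetical compiled pattern. A literal prefix stands in for a
/// compiled regex so the sketch stays dependency-free.
struct Pattern {
    literal: String,
}

/// Step 1: turn the raw right-hand operand into a pattern.
/// All pattern validation lives here. The trailing-backslash check is
/// only an illustrative example of such validation.
fn build_regex(right_bytes: &[u8]) -> Result<Pattern, String> {
    let literal = String::from_utf8_lossy(right_bytes).into_owned();
    let trailing = literal.chars().rev().take_while(|&c| c == '\\').count();
    if trailing % 2 == 1 {
        return Err("Trailing backslash".to_string());
    }
    Ok(Pattern { literal })
}

/// Step 2: match anchored at the start of `left_bytes`, returning the
/// number of characters matched (0 when there is no match).
fn find_match(pattern: &Pattern, left_bytes: &[u8]) -> usize {
    let left = String::from_utf8_lossy(left_bytes);
    if left.starts_with(pattern.literal.as_str()) {
        pattern.literal.chars().count()
    } else {
        0
    }
}

/// The top-level function then only wires the two steps together.
fn evaluate_match(left_bytes: &[u8], right_bytes: &[u8]) -> Result<usize, String> {
    let pattern = build_regex(right_bytes)?;
    Ok(find_match(&pattern, left_bytes))
}
```

With this shape, pattern errors can only originate in `build_regex`, and `find_match` can be tested on its own against a pre-built pattern.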
Should make tests/expr/expr-multibyte pass