sylvestre
Contributor

Should make tests/expr/expr-multibyte pass


GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/expr/expr-multibyte is no longer failing!

@sylvestre marked this pull request as ready for review September 10, 2025 20:14
@sylvestre requested a review from cakebaker September 10, 2025 20:14
Comment on lines +289 to +311
let pattern_str = match encoding {
    UEncoding::Utf8 => {
        // In UTF-8 locale, try to parse as UTF-8
        match String::from_utf8(right_bytes.clone()) {
            Ok(s) => {
                check_posix_regex_errors(&s)?;
                s
            }
            Err(_) => {
                // Invalid UTF-8 pattern in UTF-8 locale - use lossy conversion
                let s = String::from_utf8_lossy(&right_bytes).into_owned();
                check_posix_regex_errors(&s)?;
                s
            }
        }
    }
    UEncoding::Ascii => {
        // In C locale, validate pattern and use lossy conversion
        let s = String::from_utf8_lossy(&right_bytes).into_owned();
        check_posix_regex_errors(&s)?;
        s
    }
};
Contributor

@cakebaker Sep 11, 2025

Unless I'm missing something, I think you don't have to distinguish between UEncoding::Utf8 and UEncoding::Ascii. It would allow you to simplify the block to something like:

Suggested change
-let pattern_str = match encoding {
-    UEncoding::Utf8 => {
-        // In UTF-8 locale, try to parse as UTF-8
-        match String::from_utf8(right_bytes.clone()) {
-            Ok(s) => {
-                check_posix_regex_errors(&s)?;
-                s
-            }
-            Err(_) => {
-                // Invalid UTF-8 pattern in UTF-8 locale - use lossy conversion
-                let s = String::from_utf8_lossy(&right_bytes).into_owned();
-                check_posix_regex_errors(&s)?;
-                s
-            }
-        }
-    }
-    UEncoding::Ascii => {
-        // In C locale, validate pattern and use lossy conversion
-        let s = String::from_utf8_lossy(&right_bytes).into_owned();
-        check_posix_regex_errors(&s)?;
-        s
-    }
-};
+let pattern_str = String::from_utf8(right_bytes.clone())
+    .unwrap_or_else(|_| String::from_utf8_lossy(&right_bytes).into());
+check_posix_regex_errors(&pattern_str)?;
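A minimal sketch of why the two encodings can share one code path, assuming only the standard library: `String::from_utf8_lossy` returns the input unchanged when it is valid UTF-8 and substitutes U+FFFD only for invalid sequences, so the strict-then-lossy form and a plain lossy conversion should yield the same `String`. The helper names below are hypothetical and do not come from the PR.

```rust
/// Hypothetical helper (not from the PR): the shape used in the suggestion,
/// a strict conversion first with a lossy fallback on error.
fn pattern_strict_then_lossy(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|_| String::from_utf8_lossy(bytes).into())
}

/// Hypothetical helper (not from the PR): a single lossy conversion.
/// For valid UTF-8 this just copies the text; invalid sequences
/// become U+FFFD (the replacement character).
fn pattern_lossy_only(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Whether the single lossy call is preferable in the PR depends on details not shown here (for example, whether the extra clone matters), so treat this as an observation rather than a recommendation.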

Some(&mut region),
);

if let Some(_pos) = pos {
Contributor

I would use is_some() as you don't care about the value:

Suggested change
-if let Some(_pos) = pos {
+if pos.is_some() {
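To illustrate the point, here is a small sketch showing that the two forms behave identically when the bound value is never used. The `search` function is a hypothetical stand-in for the regex search call in the PR; it uses `str::find` only so that the example is self-contained.

```rust
/// Hypothetical stand-in for the search call in the PR: returns the byte
/// offset of `needle` in `haystack`, or None if it does not occur.
fn search(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}

/// Binds the position to `_pos` but never reads it.
fn matched_with_if_let(haystack: &str, needle: &str) -> bool {
    let pos = search(haystack, needle);
    if let Some(_pos) = pos {
        true
    } else {
        false
    }
}

/// Same check without the unused binding.
fn matched_with_is_some(haystack: &str, needle: &str) -> bool {
    search(haystack, needle).is_some()
}
```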

Some(&mut region),
);

if let Some(_pos) = pos {
Contributor

Same here:

Suggested change
-if let Some(_pos) = pos {
+if pos.is_some() {

Some(&mut region),
);

if let Some(_pos) = pos {
Contributor

Same here:

Suggested change
-if let Some(_pos) = pos {
+if pos.is_some() {

@@ -369,6 +277,257 @@ fn check_posix_regex_errors(pattern: &str) -> ExprResult<()> {
}
}

/// Evaluate a match expression with locale-aware regex matching
fn evaluate_match_expression(left_bytes: Vec<u8>, right_bytes: Vec<u8>) -> ExprResult<NumOrStr> {
Contributor

I would split this function into multiple functions.

Collaborator

I think that can be split into "build_regex" and "find_match" functions.
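A rough sketch of what such a split might look like, under heavy assumptions: the signatures are simplified (`&[u8]` in, `Result<usize, String>` out instead of `ExprResult<NumOrStr>`), `CompiledPattern` is a hypothetical stand-in that does literal prefix matching rather than real regex matching, and the trailing-backslash check is only a placeholder for the PR's pattern validation. Only the overall shape (build first, then match) is the point.

```rust
/// Hypothetical stand-in for a compiled regex. The PR uses a real regex
/// engine; this sketch only matches a literal prefix.
struct CompiledPattern {
    literal: String,
}

/// Step 1: decode and validate the pattern bytes.
/// The trailing-backslash check is a placeholder for real validation.
fn build_regex(right_bytes: &[u8]) -> Result<CompiledPattern, String> {
    let pattern = String::from_utf8_lossy(right_bytes).into_owned();
    if pattern.ends_with('\\') {
        return Err("Trailing backslash".to_string());
    }
    Ok(CompiledPattern { literal: pattern })
}

/// Step 2: run the anchored match and report how many characters matched
/// (0 when there is no match).
fn find_match(left_bytes: &[u8], pattern: &CompiledPattern) -> usize {
    let text = String::from_utf8_lossy(left_bytes);
    if text.starts_with(pattern.literal.as_str()) {
        pattern.literal.chars().count()
    } else {
        0
    }
}

/// The top-level function then only wires the two steps together.
fn evaluate_match_expression(left_bytes: &[u8], right_bytes: &[u8]) -> Result<usize, String> {
    let pattern = build_regex(right_bytes)?;
    Ok(find_match(left_bytes, &pattern))
}
```

With this shape, pattern errors surface from `build_regex` before any matching happens, and `find_match` can be tested on its own against a prebuilt pattern.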
