-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(sort): add locale-aware numeric sorting support(sort-h-thousands-sep.sh) #9848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
Footnotes
|
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
it needs tests |
4e5498a to
be1ea65
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
ff7e69e to
22fa140
Compare
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
9688011 to
cfbcf7e
Compare
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
|
Hey Matt! #10339 (comment) we had a discussion here about the two different PR's for the thousands seperator here, do you think it would be possible to rebase your PR to the latest mainline to fix the issues related to humanreadable suffixes and the tokenizer issues. |
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, improving parsing of locale-specific numbers. Also refactor numeric locale detection for safety/readability and clean up related initialization/spell-checker ignore.
β¦in sv_SE locale Add a new test function `test_human_numeric_blank_thousands_sep_locale` to verify that the sort utility correctly handles human-readable numeric sorting when the locale's thousands separator is a blank space (e.g., in sv_SE.UTF-8 or sv_SE). This ensures proper behavior of the `-h` flag with key-based sorting in such locales, preventing potential sorting errors with space-separated numeric strings.
Use array slice for trim_end_matches and String::len for length check to improve readability and efficiency in test_human_numeric_blank_thousands_sep_locale.
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, improving parsing of locale-specific numbers. Also refactor numeric locale detection for safety/readability and clean up related initialization/spell-checker ignore.
6d50fc4 to
9dc0648
Compare
Use struct literal initialization instead of creating a mutable default and assigning fields, improving code conciseness and readability without changing functionality.
|
GNU testsuite comparison: |
- Ignore thousands separators in debug annotations to match GNU output - Simplify NumInfo parsing by removing redundant thousands separator logic - Enhance detection of numeric locale settings to handle multibyte separators like NBSP correctly, maintaining single-byte behavior for compatibility with upstream GNU coreutils
|
GNU testsuite comparison: |
|
Any idea how we broke sort-debug-keys and sort-debug-warn in the last merge? I think its because the default grouping for C is an empty string instead of whats provided by the ICU library |
src/uu/sort/src/sort.rs
Outdated
| let ignore_non_printing = matches.get_flag(options::IGNORE_NONPRINTING); | ||
| let ignore_case = matches.get_flag(options::IGNORE_CASE); | ||
|
|
||
| if ordering_incompatible(mode_flags, dictionary_order, ignore_non_printing) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the
!matches.contains_id(options::KEY)
&&
For fixing the issue with sort-debug-keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked out this pr locally and I think this is the only thing stopping the debug warn gnu test from passing, mind adding it to this pr too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked out this pr locally and I think this is the only thing stopping the debug warn gnu test from passing, mind adding it to this pr too?
ok add
|
In the i18n helper function for GROUPING_SEP can we override the default grouping seperator for the C locale? I was thinking something like this? |
- Update detect_numeric_locale to check for C locale (ASCII encoding and "und" locale) - In C locale, set thousands_sep to None to avoid incorrect grouping separators - Adjust test expectations to match new sorting behavior for numeric fields in C locale
The assignment of NumInfo::parse result was reformatted by splitting it across two lines to enhance code readability and adhere to line length guidelines.
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
β¦ixture Update the expected output for the multiple decimals numeric sort test to reflect the proper ascending order. The values "576,446.88800000" and "576,446.890" were misplaced and have been repositioned to their correct locations in the sorted sequence, ensuring the test accurately validates the sorting logic. The debug fixture was updated accordingly.
|
GNU testsuite comparison: |
|
maybe do we it in a separate PR ? |
Previously, the ordering_incompatible check was performed unconditionally, causing errors even when the --key option was used, where such incompatibilities might not apply. This change adds a condition to skip the check if --key is present, ensuring correct behavior for key-based sorting.
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, ensuring proper parsing of numbers with locale-specific formatting. This enhances compatibility with international number representations.