
Optimize String#casecmp? for ASCII-only strings #13711


Open: sferik wants to merge 2 commits into master


sferik (Contributor) commented Jun 26, 2025

Optimized the routine for case-insensitive string comparison to avoid unnecessary allocations for ASCII-only strings. The check now uses ENC_CODERANGE and a direct comparison before falling back to Unicode case folding.
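The idea described above can be sketched in Ruby. This is only an illustration, not the code from the pull request, which changes the C implementation and uses ENC_CODERANGE. The method name `casecmp_p_sketch` is hypothetical, `String#ascii_only?` stands in for the code range check, and `downcase(:fold)` stands in for the Unicode case-folding fallback.

```ruby
# Hypothetical helper, not the pull request's code: an ASCII-only fast path
# followed by a Unicode case-folding fallback.
def casecmp_p_sketch(a, b)
  # String#casecmp? returns nil when the two strings are incomparable.
  return nil unless Encoding.compatible?(a, b)

  if a.ascii_only? && b.ascii_only?
    # Fast path: no new strings are allocated.
    return false unless a.bytesize == b.bytesize

    a.each_byte.with_index do |x, i|
      y = b.getbyte(i)
      next if x == y

      # Fold A-Z (0x41..0x5A) to a-z by setting bit 0x20.
      x |= 0x20 if x.between?(0x41, 0x5A)
      y |= 0x20 if y.between?(0x41, 0x5A)
      return false unless x == y # stop at the first real mismatch
    end
    return true
  end

  # Fallback: full Unicode case folding, which allocates two folded copies.
  a.downcase(:fold) == b.downcase(:fold)
end

p casecmp_p_sketch("Hello", "hELLO")
p casecmp_p_sketch("äöü", "ÄÖÜ")
```

The fast path only touches bytes already in memory, which is where any saving on short ASCII strings would come from in a sketch like this.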

nobu (Member) commented Jun 26, 2025

Have you tried the benchmark, `make benchmark ITEM=string_casecmp_p.yml`?

It doesn't seem to show significant differences.

compare-ruby: ruby 3.5.0dev (2025-06-24T10:29:39Z master 45a2c95) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-06-25T13:41:55Z master 077dbb8) +PRISM [arm64-darwin24]

| | compare-ruby | built-ruby |
|:--|--:|--:|
| casecmp_p-1 | 10.536M | 10.520M |
| | 1.00x | - |
| casecmp_p-10 | 4.052M | 4.051M |
| | 1.00x | - |
| casecmp_p-100 | 402.662k | 391.737k |
| | 1.03x | - |
| casecmp_p-1000 | 38.399k | 38.077k |
| | 1.01x | - |
| casecmp_p-nonascii1 | 1.507M | 1.489M |
| | 1.01x | - |
| casecmp_p-nonascii10 | 184.192k | 183.097k |
| | 1.01x | - |
| casecmp_p-nonascii100 | 19.785k | 19.762k |
| | 1.00x | - |
| casecmp_p-nonascii1000 | 2.002k | 2.002k |
| | 1.00x | - |

sferik (Contributor, Author) commented Jun 26, 2025

Are you sure you ran the benchmark against my branch? It looks like compare-ruby and built-ruby both point to commits on the master branch, just different ones.

Here are my results when comparing my branch against Ruby 3.5.0preview1:

compare-ruby: ruby 3.5.0preview1 (2025-04-18 master d06ec25be4) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-06-26T03:14:35Z optimize-string-ca.. 5bc4a83d30) +PRISM [arm64-darwin25]

| | compare-ruby | built-ruby |
|:--|--:|--:|
| casecmp_p-1 | 8.952M | 15.267M |
| | - | 1.71x |
| casecmp_p-10 | 3.575M | 2.906M |
| | 1.23x | - |
| casecmp_p-100 | 412.446k | 321.833k |
| | 1.28x | - |
| casecmp_p-1000 | 47.028k | 33.194k |
| | 1.42x | - |
| casecmp_p-nonascii1 | 1.358M | 1.400M |
| | - | 1.03x |
| casecmp_p-nonascii10 | 175.101k | 168.288k |
| | 1.04x | - |
| casecmp_p-nonascii100 | 18.316k | 17.500k |
| | 1.05x | - |
| casecmp_p-nonascii1000 | 1.836k | 1.760k |

My code is around 1.7× faster for short strings (36 characters). My version is less efficient for longer strings (1.23× slower for 360-character strings, 1.28× slower for 3,600-character strings, and 1.42× slower for 36,000-character strings). In my testing, the break-even point is around 180 characters. My intuition is that most string comparisons are for strings shorter than 180 characters, especially if those strings are all ASCII characters. Also, if the strings are not equivalent, the comparison will be faster, since it will short-circuit as soon as two characters are not the same.
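The short-circuit point can be made concrete with a small Ruby sketch that counts how many byte positions an ASCII case-insensitive comparison inspects before it can return. The helper `bytes_examined` and the 36-character test string are assumptions made for illustration. They are not taken from the patch or from the benchmark file.

```ruby
# Hypothetical helper: counts the byte positions inspected before the
# comparison can stop. Assumes both inputs are ASCII-only and equal in length.
def bytes_examined(a, b)
  examined = 0
  a.each_byte.with_index do |x, i|
    examined += 1
    y = b.getbyte(i)
    x |= 0x20 if x.between?(0x41, 0x5A) # fold A-Z to a-z
    y |= 0x20 if y.between?(0x41, 0x5A)
    break unless x == y                 # stop at the first mismatch
  end
  examined
end

base = [*"a".."z", *"0".."9"].join # 36 ASCII characters (assumed test data)
long = base * 10                   # 360 characters

puts bytes_examined(long, long.upcase)     # equal ignoring case: full scan
puts bytes_examined(long, "#" + long[1..]) # differs at the first byte
```

With these inputs the first call scans all 360 bytes and the second stops after 1, so only strings that match (or differ late) pay the full per-byte cost.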

I think this patch will make most programs run faster. That said, it is not a pure win, so I will understand if you don't accept it. Thank you for taking the time to consider it.

sferik force-pushed the optimize-string-casecmp-for-ascii branch from 651a611 to f774071 on June 30, 2025 at 13:14