Optimized Unicode.simpleFold() … #101

mykeul · 2019-12-06T11:37:49Z

… which was the hot spot (35.8%) in my use case. VisualVM no longer reports it after this change.

coveralls · 2019-12-06T11:42:34Z

Pull Request Test Coverage Report for Build 283

0 of 0 changed or added relevant lines in 0 files are covered.
2 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.006%) to 94.043%

Files with Coverage Reduction	New Missed Lines	%
/home/travis/build/google/re2j/java/com/google/re2j/Unicode.java	1	66.67%
/home/travis/build/google/re2j/java/com/google/re2j/UnicodeTables.java	1	99.86%

Totals
Change from base Build 280:	-0.006%
Covered Lines:	2984
Relevant Lines:	3173

💛 - Coveralls

sjamesr

Very sorry for the delay in this review. Since UnicodeTables.java is generated, the changes to that file should be implemented by modifying the make_unicode_tables.awk script.

I think it might be worth writing the Unicode tables generator in Java; I'm going to have a stab at that right now.

This change does give a nice speedup in compilation and matching, probably worth the 35kb sparse array that it creates.

alandonovan

There's no need to rewrite the generator script. The necessary change can easily be made to the awk script.

alandonovan · 2020-05-31T01:22:17Z

java/com/google/re2j/UnicodeTables.java

+      {0x212A, 0x004B},
+      {0x212B, 0x00C5},
+    };
+    final char[] result = new char[tmp[tmp.length - 1][0] + 1];


A comment is necessary here.

// Precompute the case folding mapping to avoid binary search at run time.
// The 'result' array maps each cased char to the next char in its orbit.
// The orbit is a cycle such as K -> k -> KELVIN SIGN (U+212A) -> K.
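
A minimal sketch of the precomputation that the suggested comment describes, for readers following the thread. The class name `OrbitTableSketch` and the three-entry `PAIRS` excerpt are illustrative assumptions, not the PR's actual code; only the `{0x212A, 0x004B}` pair and the array-sizing expression are taken from the diff above.

```java
// Sketch only: class name and the three-entry excerpt are hypothetical.
final class OrbitTableSketch {
  // (char, next char in its orbit) pairs, sorted by the first element.
  static final int[][] PAIRS = {
    {0x004B, 0x006B}, // K -> k
    {0x006B, 0x212A}, // k -> KELVIN SIGN
    {0x212A, 0x004B}, // KELVIN SIGN -> K (this pair appears in the diff)
  };

  // Expands the sorted pairs into an array indexed directly by char value.
  // Slots for chars that have no orbit entry keep the default value 0.
  static char[] buildOrbit(int[][] pairs) {
    char[] result = new char[pairs[pairs.length - 1][0] + 1];
    for (int[] pair : pairs) {
      result[pair[0]] = (char) pair[1];
    }
    return result;
  }

  public static void main(String[] args) {
    char[] orbit = buildOrbit(PAIRS);
    // Walk the cycle starting at 'K' until it comes back to 'K'.
    int c = 'K';
    do {
      System.out.printf("U+%04X%n", c);
      c = orbit[c];
    } while (c != 'K');
  }
}
```

The array is sized by the largest key, so most slots are unused; that is the memory cost traded for a constant-time lookup.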

sjamesr · 2020-05-31T06:27:48Z

No need to rewrite it to support this change, but it could definitely use rewriting. I have a prototype that uses ICU4J to emit the same information.

@mykeul

This implements the approach taken by @mykeul in #101. Instead of a dense array of (codepoint, case-folded codepoint) mappings, CASE_ORBIT becomes a sparse array whose indices represent the key in this map. The previous pull request was written before UnicodeTablesGenerator existed.
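
To illustrate the difference between the two layouts, here is a hedged sketch of a lookup against each. The class name, method names, and the `NOT_FOUND` sentinel are assumptions made for this example; they are not re2j's API. In both cases a miss would presumably leave the caller to fall back to ordinary upper/lower case conversion.

```java
// Sketch only: names and the NOT_FOUND sentinel are hypothetical.
final class CaseOrbitLookupSketch {
  static final int NOT_FOUND = -1;

  // Dense layout: sorted (codepoint, folded codepoint) pairs, searched by bisection.
  static int lookupSorted(int[][] pairs, int r) {
    int lo = 0;
    int hi = pairs.length;
    while (lo < hi) {
      int mid = lo + (hi - lo) / 2;
      if (pairs[mid][0] < r) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return (lo < pairs.length && pairs[lo][0] == r) ? pairs[lo][1] : NOT_FOUND;
  }

  // Sparse layout: the codepoint itself is the index; 0 marks an empty slot.
  static int lookupIndexed(char[] orbit, int r) {
    if (r >= 0 && r < orbit.length && orbit[r] != 0) {
      return orbit[r];
    }
    return NOT_FOUND;
  }
}
```

The indexed variant replaces a logarithmic search with a bounds check and one array read, which is consistent with the hot spot disappearing from the profile.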

sjamesr · 2020-06-09T16:36:46Z

Thank you. I implemented the same approach with the new UnicodeTablesGenerator in #114.

googlebot added the cla: yes label Dec 6, 2019

Optimized Unicode.simpleFold() which was the hot spot (35.8%) in my u…

234765d

…se-case. VisualVM no more sees it with this change.

mykeul force-pushed the optimized-simpleFold branch from 68d56a0 to 234765d Compare December 6, 2019 12:55

sjamesr reviewed May 31, 2020

alandonovan reviewed May 31, 2020

sjamesr mentioned this pull request Jun 9, 2020

eliminate runtime binary search in simpleCodeFold #114

Merged

sjamesr closed this Jun 9, 2020

mykeul deleted the optimized-simpleFold branch June 10, 2020 13:14
