mykeul (Contributor) commented Dec 6, 2019

… which was the hot spot (35.8%) in my use case. VisualVM no longer shows it with this change.
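Judging by the branch name (optimized-simpleFold), the hot spot is `simpleFold`. The sketch below shows roughly the lookup being optimized. It assumes the pre-change code mirrors Go's `unicode.SimpleFold`: every call binary-searches a sorted table of (code point, next code point in its orbit) pairs, and code points missing from the table fall back to plain lower/upper case. The six-entry table is a hypothetical excerpt, not the real table. `java.lang.Character` stands in for re2j's own case tables, so results may differ for some code points.

```java
final class BinarySearchFold {
  // Hypothetical excerpt of a CASE_ORBIT-style table: sorted {codePoint, nextInOrbit} pairs.
  // Only the K / k / KELVIN SIGN and Å / å / ANGSTROM SIGN orbits are included.
  static final int[][] CASE_ORBIT = {
    {0x004B, 0x006B}, // K -> k
    {0x006B, 0x212A}, // k -> KELVIN SIGN
    {0x00C5, 0x00E5}, // Å -> å
    {0x00E5, 0x212B}, // å -> ANGSTROM SIGN
    {0x212A, 0x004B}, // KELVIN SIGN -> K
    {0x212B, 0x00C5}, // ANGSTROM SIGN -> Å
  };

  // Returns the next code point in r's case-folding orbit.
  // Every call pays for an O(log n) binary search over CASE_ORBIT.
  static int simpleFold(int r) {
    int lo = 0;
    int hi = CASE_ORBIT.length;
    while (lo < hi) {
      int m = lo + (hi - lo) / 2;
      if (CASE_ORBIT[m][0] < r) {
        lo = m + 1;
      } else {
        hi = m;
      }
    }
    if (lo < CASE_ORBIT.length && CASE_ORBIT[lo][0] == r) {
      return CASE_ORBIT[lo][1];
    }
    // Not in the table: the orbit is just r plus its lower- and upper-case forms.
    // Assumption: java.lang.Character stands in for re2j's own case tables.
    int l = Character.toLowerCase(r);
    if (l != r) {
      return l;
    }
    return Character.toUpperCase(r);
  }

  public static void main(String[] args) {
    // Walk the orbit of 'k' until it cycles back to the start.
    int r = 'k';
    do {
      System.out.printf("U+%04X%n", r);
      r = simpleFold(r);
    } while (r != 'k');
  }
}
```

Running `main` walks the orbit of `k` and prints U+006B, U+212A, U+004B. A regex engine may call a function like this for every case-insensitive literal and character class, so the per-call binary search adds up.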

coveralls commented Dec 6, 2019

Pull Request Test Coverage Report for Build 283

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.006%) to 94.043%

Files with Coverage Reduction | New Missed Lines | %
/home/travis/build/google/re2j/java/com/google/re2j/Unicode.java | 1 | 66.67%
/home/travis/build/google/re2j/java/com/google/re2j/UnicodeTables.java | 1 | 99.86%

Totals
Change from base Build 280: -0.006%
Covered Lines: 2984
Relevant Lines: 3173

💛 - Coveralls

mykeul force-pushed the optimized-simpleFold branch from 68d56a0 to 234765d on December 6, 2019 12:55

sjamesr (Contributor) left a comment

Very sorry for the delay in this review. Since UnicodeTables.java is generated, the changes to that file should be implemented by modifying the make_unicode_tables.awk script.

I think it might be worth writing the Unicode tables generator in Java; I'm going to have a stab at that right now.

This change does give a nice speedup in compilation and matching, probably worth the 35kb sparse array that it creates.

alandonovan (Contributor) left a comment

There's no need to rewrite the generator script. The necessary change can easily be made to the awk script.

{0x212A, 0x004B},
{0x212B, 0x00C5},
};
final char[] result = new char[tmp[tmp.length - 1][0] + 1];

A comment is necessary here.

// Precompute the case folding mapping to avoid binary search at run time.
// The 'result' array maps each cased char to the next char in its orbit.
// The orbit is a cycle such as k -> K [Kelvin] -> K -> k.
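A minimal sketch of the precomputation that comment describes. It assumes the sorted `{codePoint, next}` pairs from the diff (called `tmp` there) contain only BMP code points, so `char` is wide enough, and that `0` can mark an empty slot. `PrecomputedFold`, `PAIRS`, `ORBIT`, and `buildOrbit` are hypothetical names, the six pairs are an illustrative excerpt, and `java.lang.Character` again stands in for re2j's own case tables in the fallback.

```java
final class PrecomputedFold {
  // Hypothetical excerpt of the sorted {codePoint, nextInOrbit} pairs (the "tmp" array in the diff).
  private static final int[][] PAIRS = {
    {0x004B, 0x006B}, // K -> k
    {0x006B, 0x212A}, // k -> KELVIN SIGN
    {0x00C5, 0x00E5}, // Å -> å
    {0x00E5, 0x212B}, // å -> ANGSTROM SIGN
    {0x212A, 0x004B}, // KELVIN SIGN -> K
    {0x212B, 0x00C5}, // ANGSTROM SIGN -> Å
  };

  // Sparse table indexed directly by code point; 0 means "no entry".
  private static final char[] ORBIT = buildOrbit(PAIRS);

  private static char[] buildOrbit(int[][] pairs) {
    // The pairs are sorted, so the last key is the largest one.
    char[] result = new char[pairs[pairs.length - 1][0] + 1];
    for (int[] pair : pairs) {
      result[pair[0]] = (char) pair[1];
    }
    return result;
  }

  // Same results as a binary search over PAIRS, but a table hit is a single array read.
  static int simpleFold(int r) {
    if (r >= 0 && r < ORBIT.length && ORBIT[r] != 0) {
      return ORBIT[r];
    }
    // Not in the table: fall back to the plain lower/upper-case pair.
    int l = Character.toLowerCase(r);
    return l != r ? l : Character.toUpperCase(r);
  }

  public static void main(String[] args) {
    // Walk the orbit of 'k' until it cycles back to the start.
    int r = 'k';
    do {
      System.out.printf("U+%04X%n", r);
      r = simpleFold(r);
    } while (r != 'k');
  }
}
```

The array reserves one `char` for every code point up to the largest key, whether or not that slot is used. That is the memory cost the review weighs against the faster compilation and matching.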

sjamesr (Contributor) commented May 31, 2020

No need to rewrite it to support this change, but it could definitely use rewriting. I have a prototype that uses ICU4J to emit the same information.

sjamesr added a commit to sjamesr/re2j that referenced this pull request Jun 9, 2020
This implements the approach taken by @mykeul in
google#101. Instead of a dense array of
(codepoint, case-folded codepoint) mappings, CASE_ORBIT becomes a sparse
array whose indices represent the key in this map.
sjamesr added a commit to sjamesr/re2j that referenced this pull request Jun 9, 2020
This implements the approach taken by @mykeul in
google#101. Instead of a dense array of
(codepoint, case-folded codepoint) mappings, CASE_ORBIT becomes a sparse
array whose indices represent the key in this map.

The previous pull request was written before UnicodeTablesGenerator
existed.
sjamesr added a commit that referenced this pull request Jun 9, 2020
This implements the approach taken by @mykeul in
#101. Instead of a dense array of
(codepoint, case-folded codepoint) mappings, CASE_ORBIT becomes a sparse
array whose indices represent the key in this map.

The previous pull request was written before UnicodeTablesGenerator
existed.

sjamesr (Contributor) commented Jun 9, 2020

Thank you. I implemented the same approach with the new UnicodeTablesGenerator in #114.

sjamesr closed this Jun 9, 2020
mykeul deleted the optimized-simpleFold branch June 10, 2020 13:14