WiP New encoding for the Unicode database. #5289
Draft
+2,479
−3,821
The Unicode database used by `jl.Character.getType` and the various `jl.Character.isXYZ` methods is now entirely generated by an sbt task.

The relevant methods are isolated in a dedicated object `UnicodeData`, to allow DCE without slowing down access to them.

More importantly, we use a new encoding that is both more code size-efficient and run-time-efficient.

For run-time, we store each range as a pair `(firstCP, data)` in a single array. The data contain the type and a bit-field of properties that cannot directly be derived from the type. A single, efficient method `dataHasAnyFlag` can test for a set of types and a set of properties. For example, `isAlphabetic` returns `true` when either the type is any of the five major `XYZ_LETTER` types or `LETTER_NUMBER`, or when the property `OtherAlphabetic` is on.

For code size, we compress each pair as a single `Int`. It contains the `data` (minus "computed" fields of the bit-field that are computed at load time) as well as a `diff` from the previous range's `firstCP` to the current one. Some diffs are too large to fit in the available bits, and are special-cased.

Since the logic is bit hackery-heavy, we also generate exhaustive test data with straightforward logic.
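
To make the two encodings concrete, here is a minimal sketch of how they could fit together. It is not the code in this PR: the bit layout (type in the low 5 bits, one property bit above it), the object name `UnicodeDataSketch`, the helpers `pack`, `decode` and `dataOf`, and the signature of `dataHasAnyFlag` are all assumptions made for illustration. It also leaves out the special-casing of large diffs and the load-time "computed" fields. The five ranges cover only U+0300 to U+0371.

```scala
object UnicodeDataSketch {
  // General category values; these match the java.lang.Character constants.
  final val UppercaseLetter = 1
  final val LowercaseLetter = 2
  final val TitlecaseLetter = 3
  final val ModifierLetter = 4
  final val OtherLetter = 5
  final val NonSpacingMark = 6
  final val LetterNumber = 10

  // Hypothetical layout of a `data` word: type in bits 0-4, properties above.
  final val TypeBits = 5
  final val TypeMask = (1 << TypeBits) - 1
  final val PropBits = 1
  final val DataBits = TypeBits + PropBits
  final val DataMask = (1 << DataBits) - 1
  final val OtherAlphabetic = 1 << 0 // hypothetical property bit

  /** Compressed form: one Int per range, holding `diff` and `data`. */
  def pack(diff: Int, tpe: Int, props: Int): Int = {
    require(diff >= 0 && diff < (1 << (32 - DataBits)),
        "large diffs are not handled in this sketch")
    (diff << DataBits) | (props << TypeBits) | tpe
  }

  /** Expands the compressed Ints into flat (firstCP, data) pairs. */
  def decode(packed: Array[Int]): Array[Int] = {
    val out = new Array[Int](packed.length * 2)
    var cp = 0
    var i = 0
    while (i < packed.length) {
      val p = packed(i)
      cp += p >>> DataBits          // diff from the previous range's firstCP
      out(2 * i) = cp
      out(2 * i + 1) = p & DataMask // type and property bits
      i += 1
    }
    out
  }

  // A small excerpt of ranges, valid only for U+0300..U+0371.
  val packed: Array[Int] = Array(
    pack(0x300, NonSpacingMark, 0),               // U+0300..U+0344
    pack(0x45, NonSpacingMark, OtherAlphabetic),  // U+0345
    pack(0x01, NonSpacingMark, 0),                // U+0346..U+036F
    pack(0x2A, UppercaseLetter, 0),               // U+0370
    pack(0x01, LowercaseLetter, 0)                // U+0371
  )

  val table: Array[Int] = decode(packed)

  /** Binary search for the data of the last range whose firstCP <= cp. */
  def dataOf(cp: Int): Int = {
    require(cp >= table(0), "code point below the first range of the excerpt")
    var lo = 0
    var hi = table.length / 2 - 1
    while (lo < hi) {
      val mid = (lo + hi + 1) >>> 1
      if (table(2 * mid) <= cp) lo = mid
      else hi = mid - 1
    }
    table(2 * lo + 1)
  }

  /** True if the type is in `typeSet` (a bit set indexed by type value)
   *  or if any property in `propSet` is on.
   */
  def dataHasAnyFlag(data: Int, typeSet: Int, propSet: Int): Boolean =
    (((1 << (data & TypeMask)) & typeSet) | ((data >>> TypeBits) & propSet)) != 0

  final val AlphabeticTypes =
    (1 << UppercaseLetter) | (1 << LowercaseLetter) | (1 << TitlecaseLetter) |
    (1 << ModifierLetter) | (1 << OtherLetter) | (1 << LetterNumber)

  def isAlphabetic(cp: Int): Boolean =
    dataHasAnyFlag(dataOf(cp), AlphabeticTypes, OtherAlphabetic)
}
```

In this sketch, `UnicodeDataSketch.isAlphabetic(0x0345)` is true through the `OtherAlphabetic` bit even though the type is `NON_SPACING_MARK`, while `isAlphabetic(0x0370)` is true through the type set alone. Passing the type set as a 32-bit mask indexed by type value is what lets a single expression cover both tests.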