write FractionalUCA_blanked.txt #1155
Merged
... like running blankweights.sed over FractionalUCA.txt.
The generated FractionalUCA.txt contains all of the data for building the CLDR root collation data. A diff of this file across Unicode versions is nearly unusable, because inserting any character into the sort order changes the primary weights of every following character. In addition, the fractional byte sequence weights are subject to manual and automatic tweaks in their computation.
Therefore, for many years I have used the blankweights.sed script to generate a modified file with “blanked weights” that preserves the sort order and the number of non-zero weights. I find diffing this file across versions valuable. See the UCA tools docs for some more details.
For example:
With these code changes, the Java code writes FractionalUCA_blanked.txt directly, just as it already writes a FractionalUCA_SHORT.txt file (which omits the inline comments with allkeys.txt weights and character types & names). This should make the UCA-CLDR update workflow easier.
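To illustrate the general idea of "blanking" weights, here is a minimal Java sketch. It is not the code from this PR and does not reproduce the exact output of blankweights.sed. The class name `BlankWeightsSketch`, the method `blankWeights`, the `_` placeholder, and the assumed line shape (`code point(s); [primary, secondary, tertiary] # comment`, with each weight written as space-separated hex bytes) are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the PR's implementation, not blankweights.sed.
class BlankWeightsSketch {
    // Assumed shape of one collation element: hex bytes, spaces and commas
    // inside square brackets, e.g. "[7C 30, 05, 05]" or "[, 8A, 05]".
    private static final Pattern CE = Pattern.compile("\\[([0-9A-Fa-f ,]*)\\]");

    // Replaces every non-empty weight with "_" and keeps empty (zero) weights
    // empty, so the number of non-zero weights per element is preserved.
    // Line order is untouched, so the sort order is preserved as well.
    static String blankWeights(String line) {
        // Leave any trailing "# ..." comment exactly as it is.
        int hash = line.indexOf('#');
        String data = hash < 0 ? line : line.substring(0, hash);
        String comment = hash < 0 ? "" : line.substring(hash);

        Matcher m = CE.matcher(data);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // limit -1 keeps trailing empty weights such as in "[05,,]".
            String[] weights = m.group(1).split(",", -1);
            StringBuilder ce = new StringBuilder("[");
            for (int i = 0; i < weights.length; i++) {
                if (i > 0) {
                    ce.append(", ");
                }
                if (!weights[i].trim().isEmpty()) {
                    ce.append('_');
                }
            }
            ce.append(']');
            m.appendReplacement(out, Matcher.quoteReplacement(ce.toString()));
        }
        m.appendTail(out);
        return out + comment;
    }
}
```

Under these assumptions, `blankWeights("0061; [2A, 05, 05] # LATIN SMALL LETTER A")` returns `0061; [_, _, _] # LATIN SMALL LETTER A`. Two versions of the file blanked this way differ only where a mapping was added, removed, or reordered, or where the count of non-zero weights changed.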
I am adding this file to the CLDR root collation data files.
I suggest reviewing one commit at a time. I intend to rebase & merge the three commits.