This repo generates a data file and is designed to be part of an ETL workflow.
The output is an object with hanzi-radical key-value pairs.
The keys are Hanzi (Chinese characters), and the values are the radical numbers (1-214).
Here is an example of the JSON output:
{
...
"㗑": 30,
"㗒": 30,
"㗓": 30,
"㗔": 30,
"㗕": 30,
...
}
The data source is the Unicode Consortium's Unihan Database (Unihan.zip).
The Unicode Character Database (UCD) provides important metadata for Unicode Han characters.
The files used in this generator repo are archived in the zip file called Unihan.zip.
Unihan.zip is available in the UCD.
- Extract data from Unihan (UCD).
- Transform data into key-value pairs.
- Load the JSON file into a modern application.
+ Import into a JavaScript module.
+ Filter with jq.
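The "Load" step can be sketched in Node.js as follows. This is a minimal illustration, not code from this repo: the helper names `radicalOf` and `charactersWithRadical` are hypothetical, and the inline `sampleIndex` stands in for the generated hr.json so that the snippet runs on its own.

```javascript
// In a real project you would load the generated file instead, e.g.:
//   const index = require("./hr.json");
// The small object below reuses entries from the example output above.
const sampleIndex = { "㗑": 30, "㗒": 30, "㗓": 30 };

// Hypothetical helper: return the radical number for one character,
// or undefined when the character is not in the index.
function radicalOf(index, hanzi) {
  return Object.prototype.hasOwnProperty.call(index, hanzi)
    ? index[hanzi]
    : undefined;
}

// Hypothetical helper: list every character filed under a given radical.
function charactersWithRadical(index, radical) {
  return Object.keys(index).filter((hanzi) => index[hanzi] === radical);
}

console.log(radicalOf(sampleIndex, "㗑"));
console.log(charactersWithRadical(sampleIndex, 30));
```

Because the index is a plain object, a lookup by character is a single property access, while the reverse query (all characters for one radical) scans every key.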
Download the zip file:
$ wget https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip
Unzip the data files into the Unihan folder:
$ unzip Unihan.zip -d Unihan
Extract the radical-stroke (kRSUnicode) data from the text file:
$ cat Unihan/Unihan_IRGSources.txt | grep kRSUnicode > rs.txt
Build the index file (hr.json):
$ node mkhrjson.js
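The transform performed by this step can be sketched as follows. This is an assumption about what such a script might do, not the actual contents of mkhrjson.js. It assumes each data line is tab-separated as `U+XXXX<TAB>kRSUnicode<TAB>radical.strokes`, that a value may hold several space-separated entries (only the first is used here), and that apostrophes after the radical number mark a variant radical form. The names `parseLine` and `buildIndex` are hypothetical.

```javascript
// Hypothetical sketch; the real mkhrjson.js may differ.

// Parse one line such as "U+4E00\tkRSUnicode\t1.0".
// Returns [hanzi, radicalNumber], or null for comments and other fields.
function parseLine(line) {
  if (line.startsWith("#")) return null;
  const [codePoint, field, value] = line.trim().split("\t");
  if (field !== "kRSUnicode" || !value) return null;
  if (!codePoint.startsWith("U+")) return null;
  // Take the first entry and capture the digits before the optional
  // apostrophes and the dot; those digits are the radical number.
  const match = /^(\d+)'*\.-?\d+/.exec(value.split(" ")[0]);
  if (!match) return null;
  const hanzi = String.fromCodePoint(parseInt(codePoint.slice(2), 16));
  return [hanzi, Number(match[1])];
}

// Build the character-to-radical object from the whole text.
function buildIndex(text) {
  const index = {};
  for (const line of text.split(/\r?\n/)) {
    const pair = parseLine(line);
    if (pair) index[pair[0]] = pair[1];
  }
  return index;
}

// Inline sample standing in for rs.txt, so the sketch runs on its own.
const sample = [
  "# kRSUnicode sample",
  "U+4E00\tkRSUnicode\t1.0",
  "U+4E00\tkTotalStrokes\t1",
  "U+4E2D\tkRSUnicode\t2.3",
  "U+8BA1\tkRSUnicode\t149'.2",
].join("\n");

console.log(JSON.stringify(buildIndex(sample)));
// prints {"一":1,"中":2,"计":149}
```

To apply the same idea to real data, the sample string would be replaced by the contents of rs.txt and the resulting object written out as hr.json.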
- OUTPUTS: hr.json
- INTERMEDIATE FILES: rs.txt
- MIT License