Also available: 韓國語 (Korean).
Gukhanmun converts Korean text written in mixed script (國漢文混用體, hanja characters interleaved with hangul) into hangul-only text. It is the successor to Seonbi, narrowed to the hanja-to-hangul conversion pipeline and extended along several axes: streaming I/O, pluggable dictionaries, lattice-based segmentation, and a wider range of output formats. The library is implemented in Rust and exposed as a Rust library, a command-line tool, a WebAssembly package, and a native Node-API addon.
- Lattice segmentation finds the best split rather than greedily taking the longest match. 行事場所 segments as 行事 + 場所, not 行事場 + 所.
- Pluggable dictionaries: in-memory map, mmap-friendly FST files, or CDB files, composable via `ChainDictionary`.
- The bundled Standard Korean Language Dictionary (標準國語大辭典) and Open Korean Dictionary (우리말샘) ship as compiled FST/CDB, so there is nothing extra to download.
- Format adapters for plain text, HTML fragments, and Markdown. The engine is format-neutral; adapters handle parsing and serialization.
- Five rendering modes: hangul-only, hangul(hanja) parentheses, hanja(hangul) parentheses, ruby markup, and original mixed script with selective glossing.
- Streaming-first: the engine buffers only within a single hanja conversion span, not the whole document.
- Initial sound law (頭音法則) for South Korean orthography, applied to fallback readings. Dictionary entries already encode the correct reading.
- The core crate (`gukhanmun-core`) is `no_std` + `alloc`, suitable for embedded targets.
- JavaScript and TypeScript bindings ship in two flavours: a WebAssembly package that runs in browsers, Deno, Node.js, Bun, and edge runtimes, and a native Node-API addon for higher server-side throughput.
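To make the lattice-versus-greedy distinction concrete, here is a minimal, std-only sketch. It is not gukhanmun's segmenter: the cost model (1 per dictionary word, 3 per hanja that falls back to a per-character reading), the `Segment` type, and the function names are assumptions made for illustration, and the toy dictionary deliberately omits 所 so that a greedy split leaves it unmatched.

```rust
use std::collections::HashMap;

// Hypothetical cost model, chosen for illustration only.
const WORD_COST: u32 = 1; // any dictionary entry
const FALLBACK_COST: u32 = 3; // one hanja read character by character

#[derive(Debug, PartialEq)]
enum Segment {
    Word(String),   // matched a dictionary entry
    Fallback(char), // no entry; needs a per-character reading
}

// best[i] = (total cost, start of the last edge, edge is a word) for the cheapest path to i.
type Best = Vec<Option<(u32, usize, bool)>>;

fn relax(best: &mut Best, end: usize, cost: u32, start: usize, is_word: bool) {
    if best[end].map_or(true, |(old, _, _)| cost < old) {
        best[end] = Some((cost, start, is_word));
    }
}

/// Dynamic programming over a lattice whose nodes are character positions.
fn segment_lattice(text: &str, dict: &HashMap<&str, &str>) -> Vec<Segment> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    let mut best: Best = vec![None; n + 1];
    best[0] = Some((0, 0, false));
    for start in 0..n {
        let Some((cost, _, _)) = best[start] else { continue };
        // A fallback edge always exists, so every position stays reachable.
        relax(&mut best, start + 1, cost + FALLBACK_COST, start, false);
        // One edge per dictionary entry that matches at `start`.
        for end in start + 1..=n {
            let word: String = chars[start..end].iter().collect();
            if dict.contains_key(word.as_str()) {
                relax(&mut best, end, cost + WORD_COST, start, true);
            }
        }
    }
    // Follow back-pointers from the end to recover the cheapest path.
    let mut segments = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start, is_word) = best[end].expect("reachable via fallback edges");
        segments.push(if is_word {
            Segment::Word(chars[start..end].iter().collect())
        } else {
            Segment::Fallback(chars[start])
        });
        end = start;
    }
    segments.reverse();
    segments
}

/// Greedy longest match, for comparison.
fn segment_greedy(text: &str, dict: &HashMap<&str, &str>) -> Vec<Segment> {
    let chars: Vec<char> = text.chars().collect();
    let mut segments = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        // Longest dictionary entry starting at `start`, if any.
        let longest = (start + 1..=chars.len()).rev().find(|&end| {
            let word: String = chars[start..end].iter().collect();
            dict.contains_key(word.as_str())
        });
        match longest {
            Some(end) => {
                segments.push(Segment::Word(chars[start..end].iter().collect()));
                start = end;
            }
            None => {
                segments.push(Segment::Fallback(chars[start]));
                start += 1;
            }
        }
    }
    segments
}

fn main() {
    let dict = HashMap::from([("行事", "행사"), ("場所", "장소"), ("行事場", "행사장")]);
    println!("{:?}", segment_greedy("行事場所", &dict)); // [Word("行事場"), Fallback('所')]
    println!("{:?}", segment_lattice("行事場所", &dict)); // [Word("行事"), Word("場所")]
}
```

Under this cost model the path 行事 + 場所 costs 2, while 行事場 + 所 costs 4 because 所 needs a fallback reading, so the lattice rejects the split that greedy matching commits to.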
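The streaming behaviour can be illustrated with a small sketch that holds back only the current run of hanja and passes everything else through immediately. The `stream_convert` function, its two callback parameters, and the set of Unicode blocks treated as hanja are assumptions, not gukhanmun's API. A real streaming reader would also have to decode UTF-8 incrementally, which this sketch sidesteps by taking an iterator of `char`.

```rust
/// Treats CJK Unified Ideographs, Extension A, and the compatibility block as hanja
/// (an assumption; a real implementation may use a different character set).
fn is_hanja(c: char) -> bool {
    matches!(c, '\u{3400}'..='\u{4DBF}' | '\u{4E00}'..='\u{9FFF}' | '\u{F900}'..='\u{FAFF}')
}

/// Passes non-hanja characters to `emit` as soon as they arrive and buffers
/// only the current run of hanja, which `convert_span` turns into hangul.
fn stream_convert<I, F, E>(input: I, mut convert_span: F, mut emit: E)
where
    I: IntoIterator<Item = char>,
    F: FnMut(&str) -> String,
    E: FnMut(&str),
{
    let mut span = String::new();
    let mut utf8 = [0u8; 4];
    for c in input {
        if is_hanja(c) {
            span.push(c);
            continue;
        }
        // A non-hanja character ends the span: flush the span, then the character.
        if !span.is_empty() {
            emit(convert_span(span.as_str()).as_str());
            span.clear();
        }
        emit(&*c.encode_utf8(&mut utf8));
    }
    // Flush a span that runs to the end of the input.
    if !span.is_empty() {
        emit(convert_span(span.as_str()).as_str());
    }
}

fn main() {
    let mut out = String::new();
    // Stand-in converter: brackets each span instead of looking it up.
    stream_convert("漢字 北京".chars(), |span| format!("[{span}]"), |chunk| out.push_str(chunk));
    println!("{out}"); // [漢字] [北京]
}
```

Because output leaves the function per character or per span, memory use is bounded by the longest hanja run rather than by the document length.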
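For the initial sound law, the following sketch shows one way a fallback reading's first syllable could be rewritten using Unicode Hangul syllable arithmetic. It encodes the three Sino-Korean rules of South Korean orthography (녀/뇨/뉴/니 → 여/요/유/이; 랴/려/례/료/류/리 → 야/여/예/요/유/이; 라/래/로/뢰/루/르 → 나/내/노/뇌/누/느). The function name is hypothetical, and the sketch ignores the rule's exceptions (for example bound nouns such as 년) and compound-word subtleties.

```rust
// Unicode Hangul Syllables: code = 0xAC00 + (initial * 21 + medial) * 28 + final.
const BASE: u32 = 0xAC00;
const LAST: u32 = 0xD7A3;
const MEDIAL_COUNT: u32 = 21;
const FINAL_COUNT: u32 = 28;

// Initial consonant indices.
const NIEUN: u32 = 2; // ㄴ
const RIEUL: u32 = 5; // ㄹ
const IEUNG: u32 = 11; // ㅇ (silent)

// Medial vowel indices: ㅑ=2, ㅕ=6, ㅖ=7, ㅛ=12, ㅠ=17, ㅣ=20.
const NIEUN_DROPS_BEFORE: [u32; 4] = [6, 12, 17, 20]; // 녀 뇨 뉴 니
const RIEUL_DROPS_BEFORE: [u32; 6] = [2, 6, 7, 12, 17, 20]; // 랴 려 례 료 류 리

/// Rewrites the first syllable of a Sino-Korean reading per the initial sound law.
fn apply_initial_sound_law(reading: &str) -> String {
    let mut rest = reading.chars();
    let Some(first) = rest.next() else {
        return String::new();
    };
    let code = first as u32;
    if !(BASE..=LAST).contains(&code) {
        return reading.to_string(); // not a precomposed Hangul syllable
    }
    let index = code - BASE;
    let initial = index / (MEDIAL_COUNT * FINAL_COUNT);
    let medial = index / FINAL_COUNT % MEDIAL_COUNT;
    let final_ = index % FINAL_COUNT;
    let new_initial = match initial {
        NIEUN if NIEUN_DROPS_BEFORE.contains(&medial) => IEUNG, // 녀 → 여
        RIEUL if RIEUL_DROPS_BEFORE.contains(&medial) => IEUNG, // 력 → 역
        RIEUL => NIEUN,                                         // 래 → 내
        other => other,
    };
    let new_code = BASE + (new_initial * MEDIAL_COUNT + medial) * FINAL_COUNT + final_;
    let mut out = String::with_capacity(reading.len());
    out.push(char::from_u32(new_code).expect("stays inside the Hangul Syllables block"));
    out.extend(rest);
    out
}

fn main() {
    for reading in ["력사", "래일", "녀자", "한자"] {
        // Prints 력사 → 역사, 래일 → 내일, 녀자 → 여자, 한자 → 한자
        println!("{reading} → {}", apply_initial_sound_law(reading));
    }
}
```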
If you use mise, install a prebuilt binary with a single command:

```sh
mise use -g aqua:dahlia/gukhanmun
```

The `-g` flag installs it globally. Omit it to activate the tool only in the current project directory.

On Windows, install via winget:

```powershell
winget install HongMinhee.Gukhanmun
```

If you have a Rust toolchain installed, install from crates.io:

```sh
cargo install gukhanmun-cli gukhanmun-mkdict
```

This compiles the binaries and places them in `~/.cargo/bin/`. Make sure that directory is on your `PATH`.
Prebuilt binaries for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) are attached to each release on GitHub:
https://github.com/dahlia/gukhanmun/releases
Download the archive for your platform, extract it, and place the `gukhanmun` and `gukhanmun-mkdict` executables somewhere on your `PATH`.
To use the library from Rust, add the core crate to `Cargo.toml`:

```sh
cargo add gukhanmun-core
```

Optionally add format adapters and dictionary backends:

```sh
cargo add gukhanmun-html gukhanmun-markdown
cargo add gukhanmun-stdict gukhanmun-opendict
cargo add gukhanmun-fst gukhanmun-cdb
```

Install the WebAssembly package for most JavaScript environments, using the package manager of your choice:
```sh
npm add @gukhanmun/wasm @gukhanmun/stdict-fst
pnpm add @gukhanmun/wasm @gukhanmun/stdict-fst
yarn add @gukhanmun/wasm @gukhanmun/stdict-fst
bun add @gukhanmun/wasm @gukhanmun/stdict-fst
deno add --jsr @gukhanmun/wasm @gukhanmun/stdict-fst
```

If you need better server-side performance and don't mind a native dependency, install the Node-API package instead:
```sh
npm add @gukhanmun/napi @gukhanmun/stdict-fst
pnpm add @gukhanmun/napi @gukhanmun/stdict-fst
yarn add @gukhanmun/napi @gukhanmun/stdict-fst
bun add @gukhanmun/napi @gukhanmun/stdict-fst
deno add npm:@gukhanmun/napi jsr:@gukhanmun/stdict-fst
```

Pipe text into the `gukhanmun` command to convert it:

```sh
echo "漢字 北京 標識" | gukhanmun
# → 한자 베이징 표지
echo "漢字" | gukhanmun --rendering hangul-hanja-parens
# → 한자(漢字)
echo "<p>漢字</p>" | gukhanmun --format text/html
# → <p>한자</p>
```

From Rust, build a dictionary and convert a string:

```rust
use gukhanmun_core::{MapDictionary, RenderMode, convert_plain_text};

fn main() {
    // An in-memory dictionary with one entry.
    let mut dict = MapDictionary::new();
    dict.insert("漢字", "한자");
    // Render every hanja span as hangul only.
    let output = convert_plain_text("漢字", &dict, RenderMode::HangulOnly);
    assert_eq!(output, "한자");
}
```

For the full guide, including HTML/Markdown adapters, rendering modes, presets, and the JavaScript API, visit https://gukhanmun.org/.
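The five rendering modes differ only in how each converted span is written out. The sketch below uses a hypothetical `Mode` enum rather than the crate's `RenderMode`, whose variant names and output may differ. In particular, the exact ruby markup and the idea that selective glossing means writing hanja(hangul) for an explicit list of words are assumptions.

```rust
/// Hypothetical stand-in for the five rendering modes; not gukhanmun's `RenderMode`.
enum Mode<'a> {
    HangulOnly,
    HangulHanjaParens,
    HanjaHangulParens,
    Ruby,
    /// Keep the mixed script, glossing only the listed words (assumed behaviour).
    Original { glossed: &'a [&'a str] },
}

/// Renders one converted span: `hanja` is the source text, `hangul` its reading.
fn render(hanja: &str, hangul: &str, mode: &Mode<'_>) -> String {
    match mode {
        Mode::HangulOnly => hangul.to_string(),
        Mode::HangulHanjaParens => format!("{hangul}({hanja})"),
        Mode::HanjaHangulParens => format!("{hanja}({hangul})"),
        Mode::Ruby => format!("<ruby>{hanja}<rp>(</rp><rt>{hangul}</rt><rp>)</rp></ruby>"),
        Mode::Original { glossed } if glossed.iter().any(|g| *g == hanja) => {
            format!("{hanja}({hangul})")
        }
        Mode::Original { .. } => hanja.to_string(),
    }
}

fn main() {
    let glossed = ["北京"];
    let modes = [
        Mode::HangulOnly,
        Mode::HangulHanjaParens,
        Mode::HanjaHangulParens,
        Mode::Ruby,
        Mode::Original { glossed: &glossed },
    ];
    for mode in &modes {
        println!("{} / {}", render("漢字", "한자", mode), render("北京", "베이징", mode));
    }
}
```

Keeping rendering as a separate step after segmentation is what lets one engine serve all five modes.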
The project is a Cargo workspace. All crates share the same version.
| Crate | Description |
|---|---|
| `gukhanmun-core` | Format-neutral IR, engine, dictionary trait, lattice segmenter, fallback phoneticizer. `no_std` + `alloc`. |
| `gukhanmun-html` | HTML fragment reader and writer; `HtmlScopeData` with `lang` inheritance and preserved-tag handling. |
| `gukhanmun-markdown` | Markdown adapter over pulldown-cmark; inline HTML is re-scanned for `lang` attributes. |
| `gukhanmun-fst` | FST-backed `HanjaDictionary` implementation for mmap-friendly on-disk dictionaries. |
| `gukhanmun-cdb` | CDB-trie `HanjaDictionary` implementation; trivially auditable on-disk format. |
| `gukhanmun-stdict` | The bundled Standard Korean Language Dictionary as an embedded FST byte array. |
| `gukhanmun-opendict` | The bundled Open Korean Dictionary (우리말샘) data. |
| `gukhanmun-dict-extract` | Shared dictionary dump extraction helpers. |
| `gukhanmun-mkdict` | CLI tool to build FST or CDB dictionary files from TSV, CSV, or JSON Lines input. |
| `gukhanmun-cli` | The `gukhanmun` command-line binary. |
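Backends compose because each one implements the same dictionary trait. The following sketch illustrates the chaining idea behind `ChainDictionary` with a stand-in `Dictionary` trait and `Chain` type, both hypothetical: the actual methods of `HanjaDictionary` are not described here, and the sample readings are illustrative data only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary trait such as `HanjaDictionary`.
trait Dictionary {
    /// Returns the hangul reading for an exact hanja key, if known.
    fn lookup(&self, hanja: &str) -> Option<&str>;
}

impl Dictionary for HashMap<String, String> {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.get(hanja).map(String::as_str)
    }
}

/// Tries each dictionary in order; the first hit wins.
struct Chain {
    dicts: Vec<Box<dyn Dictionary>>,
}

impl Dictionary for Chain {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.dicts.iter().find_map(|d| d.lookup(hanja))
    }
}

fn main() {
    // Illustrative data: a user override layered over a base dictionary.
    let user = HashMap::from([("北京".to_string(), "베이징".to_string())]);
    let base = HashMap::from([
        ("北京".to_string(), "북경".to_string()),
        ("漢字".to_string(), "한자".to_string()),
    ]);
    let chain = Chain { dicts: vec![Box::new(user), Box::new(base)] };
    println!("{:?}", chain.lookup("北京")); // Some("베이징")
    println!("{:?}", chain.lookup("漢字")); // Some("한자")
    println!("{:?}", chain.lookup("標識")); // None
}
```

Putting a small user dictionary first lets it override readings from a large bundled one without rebuilding either.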
The project also publishes seven JavaScript packages, all sharing the same version as the Rust crates.
| Package | JSR | npm | Description |
|---|---|---|---|
| `@gukhanmun/types` | JSR | npm | TypeScript type declarations shared by the WASM and NAPI packages. No runtime code. |
| `@gukhanmun/wasm` | JSR | npm | WebAssembly build. Runs in browsers, Deno, Node.js, and Bun. |
| `@gukhanmun/napi` | | npm | Native Node.js addon via napi-rs. Faster than WASM for server-side use. |
| `@gukhanmun/stdict-fst` | JSR | npm | Bundled Standard Korean Language Dictionary in FST format. |
| `@gukhanmun/stdict-cdb` | JSR | npm | Bundled Standard Korean Language Dictionary in CDB format. |
| `@gukhanmun/opendict-fst` | JSR | npm | Bundled Open Korean Dictionary categories in FST format. |
| `@gukhanmun/opendict-cdb` | JSR | npm | Bundled Open Korean Dictionary categories in CDB format. |
DESIGN.md covers the full architecture: intermediate representation, lattice segmentation algorithm, dictionary trait design, middleware system, and format adapter internals.
Distributed under GPL 3.0. See LICENSE.