
1 unstable release

0.1.0 Nov 28, 2022



Used in idfy_common

GPL-3.0 license

1MB
2K SLoC

kakasi


kakasi is a Rust library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet).

It was ported from the pykakasi library, which is itself a port of the original kakasi library written in C.

Usage

Transliterate:

let res = kakasi::convert("こんにちは世界!");
assert_eq!(res.hiragana, "こんにちはせかい!");
assert_eq!(res.romaji, "konnichiha sekai!");
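At the kana level, transliteration can be pictured as a per-character table lookup. The sketch below is a toy illustration only and is not how kakasi is implemented: the function kana_to_romaji and its eight-entry table are hypothetical, and it handles none of the kanji readings or the word spacing seen in "konnichiha sekai!". A character-by-character mapping has no notion of grammatical particles, so は always becomes "ha", which matches the "konnichiha" output above.

```rust
use std::collections::HashMap;

/// Toy, hypothetical transliterator: looks up each character in a tiny
/// hiragana-to-Hepburn table and copies any character it does not know.
fn kana_to_romaji(input: &str) -> String {
    // Hypothetical mini-table covering only the characters in the example.
    let table: HashMap<char, &str> = HashMap::from([
        ('こ', "ko"),
        ('ん', "n"),
        ('に', "ni"),
        ('ち', "chi"),
        ('は', "ha"), // no particle handling: は is always "ha"
        ('せ', "se"),
        ('か', "ka"),
        ('い', "i"),
    ]);

    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match table.get(&c) {
            Some(romaji) => out.push_str(romaji),
            None => out.push(c), // unknown characters such as '!' pass through
        }
    }
    out
}

fn main() {
    println!("{}", kana_to_romaji("こんにちは")); // konnichiha
    println!("{}", kana_to_romaji("せかい!")); // sekai!
}
```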

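Check whether a string is Japanese. Kana mark a text as Japanese, while kanji alone could also be Chinese, so a kanji-only string yields IsJapanese::Maybe: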

use kakasi::IsJapanese;

assert_eq!(kakasi::is_japanese("Abc"), IsJapanese::False);
assert_eq!(kakasi::is_japanese("日本"), IsJapanese::Maybe);
assert_eq!(kakasi::is_japanese("ラスト"), IsJapanese::True);
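The three-way result can be approximated with Unicode block checks. The following sketch assumes a simple rule that reproduces the three examples above: any kana means Japanese, ideographs without kana are ambiguous (the same characters are used in Chinese), and anything else is not Japanese. The enum Verdict and the function classify are hypothetical names; kakasi's own is_japanese may use different ranges or rules.

```rust
/// Hypothetical three-way verdict, modeled on the shape of kakasi's IsJapanese.
#[derive(Debug, PartialEq)]
enum Verdict {
    No,
    Maybe,
    Yes,
}

/// Hiragana (U+3040..U+309F) and katakana (U+30A0..U+30FF) blocks.
fn is_kana(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{30FF}')
}

/// Basic CJK Unified Ideographs block (U+4E00..U+9FFF); extensions are ignored.
fn is_cjk_ideograph(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// Assumed rule: kana => Yes, ideographs only => Maybe, otherwise No.
fn classify(text: &str) -> Verdict {
    if text.chars().any(is_kana) {
        Verdict::Yes // kana are specific to Japanese
    } else if text.chars().any(is_cjk_ideograph) {
        Verdict::Maybe // ideographs alone could also be Chinese
    } else {
        Verdict::No
    }
}

fn main() {
    for s in ["Abc", "日本", "ラスト"] {
        println!("{s}: {:?}", classify(s));
    }
}
```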

CLI

$ cargo install kakasi

## Convert to romaji
$ kakasi こんにちは世界!
konnichiha sekai!

## Convert to hiragana
$ kakasi -k こんにちは世界!
こんにちはせかい!

## Read from file
$ kakasi -f rust_article.txt

## Read from STDIN
$ echo "こんにちは世界!" | kakasi
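The four invocations above come down to three input sources (a text argument, a file passed with -f, or standard input) plus the -k switch for hiragana output. A hypothetical sketch of how a Rust CLI might model that choice follows. Input, Options and parse_args are illustrative names, and the precedence rules (a -f path wins over text arguments; no arguments means stdin) are assumptions, not documented kakasi behavior.

```rust
/// Hypothetical model of the input sources shown above.
#[derive(Debug, PartialEq)]
enum Input {
    Text(String), // kakasi こんにちは世界!
    File(String), // kakasi -f rust_article.txt
    Stdin,        // echo "..." | kakasi
}

#[derive(Debug, PartialEq)]
struct Options {
    hiragana: bool, // -k: output hiragana instead of romaji
    input: Input,
}

/// Assumed precedence: -f beats text arguments; no text means read stdin.
fn parse_args(args: &[&str]) -> Result<Options, String> {
    let mut hiragana = false;
    let mut file: Option<String> = None;
    let mut words: Vec<&str> = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "-k" => hiragana = true,
            "-f" => {
                let path = iter.next().ok_or("-f requires a file path")?;
                file = Some(path.to_string());
            }
            text => words.push(text),
        }
    }
    let input = match file {
        Some(path) => Input::File(path),
        None if !words.is_empty() => Input::Text(words.join(" ")),
        None => Input::Stdin,
    };
    Ok(Options { hiragana, input })
}

fn main() {
    // Collect the real command-line arguments, skipping the program name.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let refs: Vec<&str> = args.iter().map(String::as_str).collect();
    match parse_args(&refs) {
        Ok(opts) => println!("{opts:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```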

Performance

CPU: AMD Ryzen 7 5700G

Text                               Conversion time   Speed
Sentence (161 B)                   7.0911 µs         22.70 MB/s
Rust Wikipedia article (31705 B)   1.5055 ms         21.06 MB/s
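The Speed column is input size divided by conversion time, in decimal megabytes (1 MB = 10^6 bytes): for example, 161 B / 7.0911 µs ≈ 22.70 MB/s. The sketch below encodes that formula and adds a naive Instant-based timing loop. It only illustrates the arithmetic; it is not the harness used to produce these numbers, and mb_per_s and measure are hypothetical helpers that time a stand-in workload rather than kakasi.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Throughput in decimal megabytes per second (1 MB = 1_000_000 bytes).
fn mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1e6
}

/// Hypothetical, naive timing loop: runs `f` on `input` `iters` times and
/// returns the mean throughput. No warm-up and no outlier handling.
fn measure<F: FnMut(&str)>(input: &str, iters: u32, mut f: F) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f(black_box(input));
    }
    let mean = start.elapsed() / iters;
    mb_per_s(input.len(), mean)
}

fn main() {
    // Recompute the Speed column from the size and time columns.
    let sentence = mb_per_s(161, Duration::from_secs_f64(7.0911e-6));
    let article = mb_per_s(31_705, Duration::from_secs_f64(1.5055e-3));
    println!("sentence: {sentence:.2} MB/s, article: {article:.2} MB/s");

    // Time a stand-in workload (counting chars), not kakasi itself.
    let mbps = measure("こんにちは世界!", 10_000, |s| {
        black_box(s.chars().count());
    });
    println!("char counting: {mbps:.1} MB/s");
}
```

A real benchmark would add warm-up runs and report variance across samples, which is why dedicated benchmarking tools are preferable to a hand-rolled loop.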

CLI comparison

Time to convert a 100 KB test file using the CLI:

Library             Time       Speed
kakasi (Rust)       7.4 ms     13.5 MB/s
kakasi (C)          33.5 ms    2.99 MB/s
pykakasi (Python)   810.6 ms   0.123 MB/s
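The Speed column is consistent with a 100,000-byte file (for example, 100,000 B / 7.4 ms ≈ 13.5 MB/s). Dividing each time by the Rust time gives the relative speedup. A small sketch of that arithmetic, with the table values hard-coded and speedup as a hypothetical helper:

```rust
/// CLI timings from the comparison table, in milliseconds.
const RUST_MS: f64 = 7.4;
const C_MS: f64 = 33.5;
const PYTHON_MS: f64 = 810.6;

/// How many times longer `other_ms` took than `baseline_ms`.
fn speedup(other_ms: f64, baseline_ms: f64) -> f64 {
    other_ms / baseline_ms
}

fn main() {
    println!("Rust vs C:      {:.1}x", speedup(C_MS, RUST_MS)); // 4.5x
    println!("Rust vs Python: {:.0}x", speedup(PYTHON_MS, RUST_MS)); // 110x
}
```

By these figures, the Rust CLI was roughly 4.5 times faster than the C version and roughly 110 times faster than pykakasi on this file and machine.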

CLI performance was measured with hyperfine using the following commands:

hyperfine --warmup 3 'cat 100K.txt | kakasi-rs'
hyperfine --warmup 3 'cat 100K.txt | kakasi -i utf-8 -Ka -Ha -Ja -Sa -s'
hyperfine --warmup 3 'cat 100K.txt | python bin/kakasi -Ka -Ha -Ja -Sa -s'

License

kakasi is published under the GNU GPL-3.0 license.

The kakasi dictionaries (files: codegen/dict/kakasidict.utf8, codegen/dict/itajidict.utf8, codegen/dict/hepburn.utf8) were taken from the pykakasi project, published under the GNU GPL-3.0 license.

pykakasi

Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)

The dictionaries originate from the kakasi project, published under the GNU GPL-2.0 license.

original kakasi

Copyright (C) 1992, 1993, 1994
Hironobu Takahashi,
Masahiko Sato,
Yukiyoshi Kameyama, Miki Inooka, Akihiko Sasaki, Dai Ando, Junichi Okukawa,
Katsushi Sato and Nobuhiro Yamagishi

For testing, I included a copy of the Japanese Wikipedia article on Rust (tests/rust_article.txt). The article is published under the Creative Commons Attribution-ShareAlike License 3.0.

Dependencies