3 unstable releases
| Version | Release date |
|---|---|
| 0.4.0 | Aug 21, 2025 |
| 0.3.1 | Aug 6, 2025 |
| 0.3.0 | Jul 12, 2025 |
#375 in Text processing
167 downloads per month
Used in ib-pinyin
1.5MB, 16K SLoC
ib-matcher
A multilingual, flexible and fast string, glob and regex matcher. Supports Chinese pinyin matching (拼音匹配) and Japanese romaji matching (ローマ字検索).
Features
- Unicode support
  - Full UTF-8 support and limited support for UTF-16 and UTF-32.
  - Unicode case insensitivity (simple case folding).
- Chinese pinyin matching (拼音匹配)
  - Support characters with multiple readings (i.e. heteronyms, 多音字).
  - Support multiple pinyin notations, including Quanpin (全拼), Jianpin (简拼) and many Shuangpin (双拼) notations.
  - Support mixing multiple notations during matching.
- Japanese romaji matching (ローマ字検索)
  - Support characters with multiple readings (i.e. heteronyms, 同形異音語).
  - Support only the Hepburn romanization system at the moment.
- glob()-style pattern matching (i.e. `?`, `*`, `[]` and `**`)
  - Support different anchor modes, treating surrounding wildcards as anchors and special anchors in file paths.
  - Support two separators (`//`) or a complement separator (`\`) as a glob star (`*`/`**`).
- Regular expression
  - Support the same syntax as `regex`, including wildcards, repetitions, alternations, groups, etc.
  - Support custom matching callbacks, which can be used to implement ad hoc look-around, backreferences, balancing groups/recursion/subroutines, combining domain-specific parsers, etc.
- Relatively high performance
  - Generally on par with the `regex` crate; depending on the case it can be faster or slower.
All of the above features are optional, so you don't pay the performance and binary size cost for features you don't use.
See documentation for details.
If you only need Chinese pinyin matching, you can also use ib-pinyin, which is simpler and more stable.
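To make the pinyin bullets in the feature list more concrete, here is a minimal std-only sketch of how Quanpin (full reading), Jianpin (initial letter) and heteronyms can be mixed while matching. It is not how ib-matcher is implemented: `readings` is a hypothetical five-character table and `pinyin_prefix_match` is a hypothetical helper that only checks whether the pattern can be consumed by a prefix of the text.

```rust
/// Hypothetical toy table of readings; a real matcher ships a full dictionary.
fn readings(c: char) -> &'static [&'static str] {
    match c {
        '拼' => &["pin"],
        '音' => &["yin"],
        '搜' => &["sou"],
        '索' => &["suo"],
        '乐' => &["le", "yue"], // heteronym: more than one reading
        _ => &[],
    }
}

/// Returns true if `pattern` can be fully consumed by a prefix of `text`.
fn pinyin_prefix_match(pattern: &str, text: &str) -> bool {
    if pattern.is_empty() {
        return true; // the whole pattern has been consumed
    }
    let mut chars = text.chars();
    let c = match chars.next() {
        Some(c) => c,
        None => return false, // pattern left over, but no text remains
    };
    let rest = chars.as_str();
    let rs = readings(c);
    if rs.is_empty() {
        // No known reading: compare the character itself (ASCII case-insensitive).
        let mut p = pattern.chars();
        let matched = p.next().map_or(false, |pc| pc.eq_ignore_ascii_case(&c));
        return matched && pinyin_prefix_match(p.as_str(), rest);
    }
    // Try every reading, both in full (Quanpin) and as its initial (Jianpin).
    rs.iter().any(|&r| {
        [r, &r[..1]].iter().any(|cand| {
            pattern
                .strip_prefix(*cand)
                .map_or(false, |p| pinyin_prefix_match(p, rest))
        })
    })
}
```

With this table, `pinyin_prefix_match("pysousuoeve", "拼音搜索Everything")` succeeds by reading 拼音 as initials and 搜索 in full, and both `"le"` and `"yue"` match 乐.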
Usage
// cargo add ib-matcher --features pinyin,romaji
use ib_matcher::matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig};

let matcher = IbMatcher::builder("la vie est drôle").build();
assert!(matcher.is_match("LA VIE EST DRÔLE"));

let matcher = IbMatcher::builder("βίος").build();
assert!(matcher.is_match("Βίοσ"));
assert!(matcher.is_match("ΒΊΟΣ"));

let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::default())
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));
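The Greek example above depends on case folding treating final sigma (ς) and regular sigma (σ) as the same letter. The following std-only sketch approximates simple case folding to show the idea. It is an assumption-laden illustration, not the crate's implementation: `simple_fold` and `contains_folded` are hypothetical helpers, and real simple case folding uses the Unicode CaseFolding table rather than `char::to_lowercase`.

```rust
/// Approximates Unicode simple case folding using only std (an assumption of
/// this sketch): map each char to a single lowercase char, then fold final sigma.
fn simple_fold(s: &str) -> String {
    s.chars()
        .map(|c| {
            // Keep only the first char of the lowercase mapping so the fold
            // stays one-to-one.
            let lower = c.to_lowercase().next().unwrap_or(c);
            // U+03C2 (final sigma) folds to U+03C3 (sigma).
            if lower == '\u{03C2}' { '\u{03C3}' } else { lower }
        })
        .collect()
}

/// Case-insensitive substring test built on the fold above.
fn contains_folded(haystack: &str, pattern: &str) -> bool {
    simple_fold(haystack).contains(simple_fold(pattern).as_str())
}
```

Because both sides are folded before comparison, a pattern ending in ς matches text written with σ or Σ, which is why `"βίος"` matches both `"Βίοσ"` and `"ΒΊΟΣ"` in the example.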
glob()-style pattern matching
See glob module for more details. Here is a quick example:
// cargo add ib-matcher --features syntax-glob,regex,romaji
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::builder().romaji(Default::default()).build())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call("wifi**miku"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\ja-jp\WiFiTask\ミク.exe"));
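As a rough illustration of why `**` is needed to cross path separators in the example above, here is a std-only sketch of anchored wildcard matching. It assumes the common glob convention that `?` and `*` stop at a separator while `**` does not; `wildcard_match` is a hypothetical helper, and the crate's anchor modes, `[]` classes and language matching are not modelled.

```rust
/// Hypothetical helper: anchored wildcard match.
/// `?`  matches one non-separator char,
/// `*`  matches any run of non-separator chars,
/// `**` matches any run of chars, separators included.
fn wildcard_match(pattern: &str, text: &str, sep: char) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_from(&p, &t, sep)
}

fn match_from(p: &[char], t: &[char], sep: char) -> bool {
    match p {
        [] => t.is_empty(),
        // `**`: try every split point, including ones past a separator.
        ['*', '*', rest @ ..] => (0..=t.len()).any(|i| match_from(rest, &t[i..], sep)),
        // `*`: only try split points before the first separator.
        ['*', rest @ ..] => {
            let max = t.iter().position(|&c| c == sep).unwrap_or(t.len());
            (0..=max).any(|i| match_from(rest, &t[i..], sep))
        }
        ['?', rest @ ..] => match t {
            [c, tail @ ..] if *c != sep => match_from(rest, tail, sep),
            _ => false,
        },
        // Literal character: must match exactly.
        [c, rest @ ..] => match t {
            [d, tail @ ..] if c == d => match_from(rest, tail, sep),
            _ => false,
        },
    }
}
```

For instance, `wildcard_match("*.txt", "dir/notes.txt", '/')` is false because `*` cannot consume the `/`, whereas `"**.txt"` matches the same path. The recursion is exponential in the worst case, which is acceptable for a sketch but not for production use.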
Regular expression
See regex module for more details. Here is a quick example:
// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{cp::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
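The spans in the assertions above (`3..18`, `0..22`, `0..24`) are consistent with UTF-8 byte offsets rather than character counts, since each CJK character in these haystacks takes three bytes. A small std-only sketch for converting a character range into such a byte range follows; `byte_span` is a hypothetical helper, not part of ib-matcher.

```rust
use std::ops::Range;

/// Converts a range of character indices into the corresponding
/// UTF-8 byte range of `hay`. Indices past the end clamp to `hay.len()`.
fn byte_span(hay: &str, chars: Range<usize>) -> Range<usize> {
    let offset = |n: usize| hay.char_indices().nth(n).map_or(hay.len(), |(i, _)| i);
    offset(chars.start)..offset(chars.end)
}
```

For `"「らき☆すた」"`, characters 1 to 5 (everything between the brackets) map to bytes `3..18`, which is the span reported for `raki.suta`.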
// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;

let re = Regex::builder()
    // The callback is invoked at byte offset `at` and reports a match length via `push`.
    .callback("ascii", |input, at, push| {
        let haystack = &input.haystack()[at..];
        if haystack.len() > 0 && haystack[0].is_ascii() {
            push(1);
        }
    })
    .build(r"(ascii)+\d(ascii)+")
    .unwrap();
// `Ｕ` is a fullwidth (non-ASCII) letter, so no match can start in `that4Ｕ`.
let hay = "that4Ｕ this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");
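To illustrate the callback contract used above (inspect the haystack at an offset, then report each candidate match length through `push`), here is a std-only sketch. The `Callback` type, `ascii` and `one_or_more` are hypothetical stand-ins written for this example; they only mimic the shape of the closure and say nothing about how the crate's regex engine drives callbacks internally.

```rust
/// Hypothetical callback shape: look at `hay` from byte offset `at` and
/// call `push(len)` for every length (in bytes) that matches there.
type Callback = fn(&[u8], usize, &mut dyn FnMut(usize));

/// Matches exactly one ASCII byte, like the `ascii` callback in the example.
fn ascii(hay: &[u8], at: usize, push: &mut dyn FnMut(usize)) {
    if hay.get(at).map_or(false, |b| b.is_ascii()) {
        push(1);
    }
}

/// Greedy `callback+`: applies `cb` repeatedly from `at` and returns the
/// end offset of the run, or `None` if it did not match even once.
fn one_or_more(hay: &[u8], at: usize, cb: Callback) -> Option<usize> {
    let mut pos = at;
    loop {
        // Keep the longest length the callback reported at this position.
        let mut longest: Option<usize> = None;
        cb(hay, pos, &mut |len: usize| longest = longest.max(Some(len)));
        match longest {
            Some(len) if len > 0 => pos += len,
            _ => break,
        }
    }
    if pos > at { Some(pos) } else { None }
}
```

Here `one_or_more(b"this4me", 0, ascii)` consumes all seven bytes, while a haystack that starts with a fullwidth letter yields `None` because its first byte is not ASCII.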
Test
cargo build
cargo test --features pinyin,romaji
Dependencies
~1.8–4.5MB, ~85K SLoC