1 unstable release

Uses new Rust 2024

0.1.0	Jan 5, 2026

mecrab

Core runtime library for the MeCrab morphological analyzer.

Features

MeCab Compatible: Works with IPADIC/UniDic dictionaries
High Performance: Memory-mapped dictionaries, SIMD-optimized Viterbi (AVX2)
Thread-safe: Supports concurrent access from multiple threads
Live Updates: Add/remove words at runtime
Semantic Linking: Wikidata URI attachment with JSON-LD/RDF export
N-best Search: A* search that returns multiple candidate analysis paths
Streaming: Sentence boundary detection for large text processing
Phonetic Transduction: Kana/Romaji/X-SAMPA/IPA conversion
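
The Viterbi search named in the feature list picks the segmentation with the lowest total cost through a lattice of candidate tokens. The following is a minimal, self-contained sketch of that idea, not mecrab's implementation: the `Node` struct, the flat `connection_cost`, the `sample_lattice` helper, and every cost value are assumptions made for illustration, and no SIMD is involved.

```rust
/// One candidate token in the lattice. Hypothetical layout for illustration;
/// this is not mecrab's `Token` type.
#[derive(Debug, Clone)]
struct Node {
    start: usize, // start offset in characters
    end: usize,   // end offset in characters (exclusive, must be > start)
    surface: &'static str,
    word_cost: i32,
}

/// Placeholder connection cost. A MeCab-style dictionary would look this up
/// in a matrix indexed by the context IDs of the two adjacent tokens.
fn connection_cost(_left: Option<&Node>, _right: &Node) -> i32 {
    100
}

/// Returns the cheapest segmentation covering `0..len`, or `None` if no
/// sequence of nodes spans the whole input.
fn viterbi(nodes: &[Node], len: usize) -> Option<(i32, Vec<&'static str>)> {
    // Visit nodes by start offset so every predecessor is settled first.
    let mut order: Vec<usize> = (0..nodes.len()).collect();
    order.sort_by_key(|&i| nodes[i].start);

    // best[i] = (cost of the cheapest path ending at node i, previous node).
    let mut best: Vec<Option<(i32, Option<usize>)>> = vec![None; nodes.len()];
    for &i in &order {
        let node = &nodes[i];
        if node.start == 0 {
            best[i] = Some((connection_cost(None, node) + node.word_cost, None));
            continue;
        }
        for j in 0..nodes.len() {
            if nodes[j].end != node.start {
                continue;
            }
            if let Some((prev_cost, _)) = best[j] {
                let cost = prev_cost + connection_cost(Some(&nodes[j]), node) + node.word_cost;
                if best[i].map_or(true, |(b, _)| cost < b) {
                    best[i] = Some((cost, Some(j)));
                }
            }
        }
    }

    // Pick the cheapest node that ends exactly at the end of the input.
    let (mut cur, total) = (0..nodes.len())
        .filter(|&i| nodes[i].end == len)
        .filter_map(|i| best[i].map(|(c, _)| (i, c)))
        .min_by_key(|&(_, c)| c)?;

    // Walk the back-pointers to recover the token sequence.
    let mut path = vec![nodes[cur].surface];
    while let Some((_, Some(prev))) = best[cur] {
        path.push(nodes[prev].surface);
        cur = prev;
    }
    path.reverse();
    Some((total, path))
}

/// Sample lattice for "東京都" with made-up costs.
fn sample_lattice() -> Vec<Node> {
    vec![
        Node { start: 0, end: 1, surface: "東", word_cost: 6000 },
        Node { start: 0, end: 2, surface: "東京", word_cost: 3000 },
        Node { start: 1, end: 2, surface: "京", word_cost: 7000 },
        Node { start: 1, end: 3, surface: "京都", word_cost: 3500 },
        Node { start: 2, end: 3, surface: "都", word_cost: 4500 },
    ]
}
```

With these made-up costs, `viterbi(&sample_lattice(), 3)` selects 東京 / 都 (total 7700) over 東 / 京都 (total 9700), because each node keeps only the cheapest way of reaching it.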

Installation

[dependencies]
mecrab = "0.1"

Feature Flags

Feature	Description
`json`	JSON output format
`parallel`	Parallel batch processing (rayon)
`simd`	SIMD optimizations (AVX2)
`wasm`	WebAssembly bindings
`python`	Python bindings (PyO3)
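
Optional features are enabled through Cargo's standard `features` key. A minimal sketch, assuming `json` and `parallel` are not already part of the crate's default feature set (the table above does not say which features are on by default):

```toml
[dependencies]
mecrab = { version = "0.1", features = ["json", "parallel"] }
```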

Usage

use mecrab::MeCrab;
use mecrab::phonetic::PhoneticTransducer;
use mecrab::viterbi::NbestSearch;

// Box<dyn Error> assumes mecrab's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic parsing
    let mecrab = MeCrab::new()?;
    let result = mecrab.parse("すもももももももものうち")?;
    println!("{}", result);

    // Add custom words
    mecrab.add_word("ChatGPT", "チャットジーピーティー", "チャットジーピーティー", 5000);

    // N-best paths
    let nbest = NbestSearch::new(&mecrab);
    for path in nbest.search("東京", 5)? {
        println!("Cost: {}", path.total_cost);
    }

    // Phonetic conversion
    let transducer = PhoneticTransducer::new();
    println!("{}", transducer.to_romaji("こんにちは")); // konnichiha

    Ok(())
}
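
The `konnichiha` in the comment above is what a kana-by-kana transliteration produces: は maps to `ha` even where the particle is pronounced "wa". The sketch below illustrates that behavior with a deliberately tiny table. It is not the code behind `PhoneticTransducer`; the function names and the table contents are assumptions for illustration only.

```rust
/// Maps a single hiragana to Hepburn-style romaji. Only the kana needed for
/// the example are listed; a real table would cover the full syllabary,
/// digraphs such as きゃ, the sokuon っ, and long vowels.
fn kana_to_romaji(c: char) -> Option<&'static str> {
    Some(match c {
        'こ' => "ko",
        'ん' => "n",
        'に' => "ni",
        'ち' => "chi",
        'は' => "ha",
        'わ' => "wa",
        _ => return None,
    })
}

/// Transliterates character by character, passing unknown characters through
/// unchanged. No morphological context is used, so は is always "ha".
fn to_romaji(input: &str) -> String {
    let mut out = String::new();
    for c in input.chars() {
        match kana_to_romaji(c) {
            Some(r) => out.push_str(r),
            None => out.push(c),
        }
    }
    out
}
```

Here `to_romaji("こんにちは")` yields `konnichiha`. Producing `konnichiwa` would require knowing that the final は is a particle, which a plain character table cannot tell.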

Module Structure

mecrab/src/
├── lib.rs           # Public API
├── dict/            # Dictionary loading
│   ├── mod.rs       # Token, SysDic, OverlayDictionary
│   └── user_dict.rs # User dictionary persistence
├── lattice/         # Lattice construction
├── viterbi/         # Viterbi algorithm
│   ├── mod.rs       # Core Viterbi
│   ├── simd.rs      # AVX2 acceleration
│   ├── nbest.rs     # N-best A* search
│   └── analysis.rs  # Cost analysis
├── semantic/        # Semantic enrichment
│   ├── mod.rs       # SemanticEntry, EntityType
│   ├── pool.rs      # SemanticPool (5-byte entries)
│   ├── jsonld.rs    # JSON-LD export
│   ├── rdf.rs       # RDF/Turtle/N-Triples export
│   ├── disambiguation.rs  # Disambiguation strategies
│   └── extension.rs # TokenExtension
├── phonetic/        # Phonetic processing
│   ├── mod.rs       # Reading extraction
│   └── transducer.rs # Kana/Romaji/X-SAMPA/IPA
├── stream.rs        # Streaming text processing
├── normalize.rs     # Text normalization
├── bench.rs         # Benchmarking utilities
└── error.rs         # Error types
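
The streaming module is described as detecting sentence boundaries so that large inputs can be processed piece by piece. A minimal sketch of that idea follows, assuming a buffer that accumulates chunks and emits a sentence whenever a terminator is seen. `SentenceSplitter`, its methods, and the terminator set are hypothetical and are not the API of `stream.rs`.

```rust
/// Buffers incoming text chunks and yields complete sentences.
/// Hypothetical type for illustration only.
struct SentenceSplitter {
    buffer: String,
}

impl SentenceSplitter {
    fn new() -> Self {
        Self { buffer: String::new() }
    }

    /// Characters treated as sentence terminators in this sketch. The ASCII
    /// period is left out so that decimals such as "3.14" are not split.
    fn is_terminator(c: char) -> bool {
        matches!(c, '。' | '！' | '？' | '!' | '?' | '\n')
    }

    /// Appends a chunk and returns every sentence completed so far.
    /// Text after the last terminator stays buffered for the next call.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buffer.push_str(chunk);
        let mut sentences = Vec::new();
        let mut start = 0;
        for (i, c) in self.buffer.char_indices() {
            if Self::is_terminator(c) {
                // `end` is a byte offset just past the terminator.
                let end = i + c.len_utf8();
                let sentence = self.buffer[start..end].trim();
                if !sentence.is_empty() {
                    sentences.push(sentence.to_string());
                }
                start = end;
            }
        }
        // Drop the emitted prefix, keeping only the unfinished tail.
        self.buffer.drain(..start);
        sentences
    }

    /// Returns whatever is left once the input ends.
    fn finish(self) -> Option<String> {
        let rest = self.buffer.trim();
        if rest.is_empty() {
            None
        } else {
            Some(rest.to_string())
        }
    }
}
```

Because a sentence is only emitted once its terminator arrives, a sentence split across two chunks is still returned whole, and memory use is bounded by the longest sentence rather than by the size of the input.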

License

MIT OR Apache-2.0
