This project provides Rust FFI bindings for the libmarisa library, a space-efficient trie data structure implementation.
- Lookup: Check whether a string exists in the dictionary
- Reverse lookup: Restore a key from its ID
- Common prefix search: Find keys from prefixes of a given string
- Predictive search: Find keys starting with a given string
- Memory efficient: Significantly more compact than other trie implementations
- Cross-platform: Works on Linux, macOS, and Windows
- Rust 1.70+
- CMake 3.22+
- C++ compiler with C++20 support
Add this to your Cargo.toml:
[dependencies]
marisa-ffi = "0.3.1"- Clone the repository:
git clone https://github.com/shinecrazy/marisa-ffi.git
cd marisa-ffi- Build the FFI library:
cargo build --releaseAdd this to your Cargo.toml:
[dependencies]
marisa-ffi = "0.3.1"use marisa_ffi::{Trie, Keyset, Agent};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create a keyset and add some keys
let mut keyset = Keyset::new()?;
keyset.push("hello")?;
keyset.push("world")?;
keyset.push("rust")?;
// Build the trie
let trie = Trie::build(&keyset)?;
// Lookup a key
if let Some(id) = trie.lookup("hello") {
println!("Found 'hello' with ID: {}", id);
}
// Predictive search
let mut agent = Agent::new()?;
if trie.predictive_search("h", &mut agent)? {
println!("Found key: {}", agent.key()?);
}
Ok(())
}The main trie structure for efficient string storage and lookup.
impl Trie {
/// Create a new empty trie
pub fn new() -> Result<Self, MarisaError>;
/// Build a trie from a keyset
pub fn build(keyset: &Keyset) -> Result<Self, MarisaError>;
/// Load a trie from a file
pub fn load(&mut self, filename: &str) -> Result<(), MarisaError>;
/// Save the trie to a file
pub fn save(&self, filename: &str) -> Result<(), MarisaError>;
/// Lookup a key and return its ID if found
pub fn lookup(&self, key: &str) -> Option<u32>;
/// Perform predictive search starting with the given prefix
pub fn predictive_search(&self, prefix: &str, agent: &mut Agent) -> Result<bool, MarisaError>;
/// Perform reverse lookup to get a key by its ID
pub fn reverse_lookup(&self, id: u32, agent: &mut Agent) -> Result<(), MarisaError>;
/// Perform common prefix search
pub fn common_prefix_search(&self, prefix: &str, agent: &mut Agent) -> Result<bool, MarisaError>;
}A collection of keys used to build tries.
impl Keyset {
/// Create a new empty keyset
pub fn new() -> Result<Self, MarisaError>;
/// Add a key to the keyset
pub fn push(&mut self, key: &str) -> Result<(), MarisaError>;
/// Add a key with a specific ID to the keyset
pub fn push_with_id(&mut self, key: &str, id: u32) -> Result<(), MarisaError>;
/// Reset the keyset
pub fn reset(&mut self);
}An agent for performing search operations and retrieving results.
impl Agent {
/// Create a new agent
pub fn new() -> Result<Self, MarisaError>;
/// Get the current key as a string
pub fn key(&self) -> Result<String, MarisaError>;
/// Get the current key ID
pub fn id(&self) -> u32;
/// Move to the next result
pub fn next(&mut self) -> Result<bool, MarisaError>;
}See examples/basic_usage.rs for a comprehensive example demonstrating all features.
See examples/benchmark.rs for performance testing and comparison.
Marisa trie is significantly more space-efficient than other implementations:
| Implementation | Size (bytes) | Remarks |
|---|---|---|
| darts-clone | 376,613,888 | Compacted double-array trie |
| tx-trie | 127,727,058 | LOUDS-based trie |
| marisa-trie | 50,753,560 | MARISA trie |
Source: English Wikipedia page titles (9,805,576 keys)
All operations return Result types with detailed error information:
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MarisaError {
StateError,
NullError,
BoundError,
RangeError,
CodeError,
ResetError,
SizeError,
MemoryError,
IoError,
FormatError,
KeyNotFound,
NoResults,
Utf8Error(std::str::Utf8Error),
}This project is licensed under the same terms as libmarisa:
- BSD-2-Clause OR LGPL-2.1-or-later
Contributions are welcome! Please feel free to submit issues and pull requests.
- s-yata for the original libmarisa implementation
- The Rust community for excellent FFI tooling and documentation