refactor!: Remove DictionaryFetcher and DictionaryStore, and move to Tokenizer #397

phenylshima · 2025-02-02T03:22:11Z

Purpose

Decouple jpreprocess and lindera, providing users with a way to use tokenizers other than lindera.

Main Breaking Changes

`jpreprocess` Crate:

- The trait bound on the type parameter of `JPreprocess` has changed from `DictionaryFetcher` to `Tokenizer`. If you are using `JPreprocess<DefaultFetcher>`, replace it with `JPreprocess<DefaultTokenizer>`.
- The method `JPreprocess::with_dictionary_fetcher` has been removed. For advanced use cases that previously required `with_dictionary_fetcher`, use `from_tokenizer` instead; the calling code will need some modification.

`jpreprocess-dictionary` Crate:

- The `DictionaryFetcher` and `DictionaryStore` traits have been removed. The `Tokenizer` and `Token` traits now serve a similar purpose and still allow precise control over dictionary loading, but implementors must also provide the tokenization step.
- `DefaultFetcher` has been removed. `DefaultTokenizer` provides almost the same functionality but does not detect older dictionaries.
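To illustrate the shape of this change, here is a minimal, self-contained sketch of a front end that is generic over a tokenizer. It does not use the real crates: `TokenLike`, `TokenizerLike`, `WhitespaceTokenizer`, and `Pipeline` are hypothetical stand-ins invented for this example, and the actual `Tokenizer` and `Token` traits in `jpreprocess-dictionary` may have different methods and signatures.

```rust
// Hypothetical stand-ins only; these are NOT the jpreprocess traits.

/// A single token: its surface text plus some derived detail.
pub trait TokenLike {
    fn surface(&self) -> &str;
    fn reading(&self) -> String;
}

/// Anything that can split text into tokens.
pub trait TokenizerLike {
    type Token: TokenLike;
    fn tokenize(&self, text: &str) -> Vec<Self::Token>;
}

/// Toy token produced by the whitespace tokenizer below.
pub struct WhitespaceToken {
    surface: String,
}

impl TokenLike for WhitespaceToken {
    fn surface(&self) -> &str {
        &self.surface
    }

    fn reading(&self) -> String {
        // No dictionary is involved here: the "reading" is just the
        // uppercased surface, to keep the example self-contained.
        self.surface.to_uppercase()
    }
}

/// A tokenizer that owns both the splitting and the per-token lookup,
/// which is the extra work a custom implementation has to take on.
pub struct WhitespaceTokenizer;

impl TokenizerLike for WhitespaceTokenizer {
    type Token = WhitespaceToken;

    fn tokenize(&self, text: &str) -> Vec<WhitespaceToken> {
        text.split_whitespace()
            .map(|s| WhitespaceToken { surface: s.to_string() })
            .collect()
    }
}

/// Front end whose type parameter is bounded by the tokenizer trait.
pub struct Pipeline<T: TokenizerLike> {
    tokenizer: T,
}

impl<T: TokenizerLike> Pipeline<T> {
    /// Constructor taking any tokenizer implementation.
    pub fn from_tokenizer(tokenizer: T) -> Self {
        Self { tokenizer }
    }

    /// Tokenizes the text and collects one reading per token.
    pub fn readings(&self, text: &str) -> Vec<String> {
        self.tokenizer
            .tokenize(text)
            .iter()
            .map(|t| t.reading())
            .collect()
    }
}
```

With this layout, swapping in a different tokenizer only changes the argument passed to `Pipeline::from_tokenizer`; the rest of the pipeline depends on the trait alone.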

`jpreprocess-njd` Crate:

- `NJD::from_tokens` now accepts any type implementing the `Token` trait and no longer takes a `DictionaryFetcher` argument.
- `NJDNode::load` now accepts `WordEntry` by immutable borrow.
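The two signature changes above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the crate's code: `Entry`, `EntryToken`, `Node`, `Nodes`, and `FixedToken` are hypothetical types made up for this example, and the real `NJD::from_tokens` and `NJDNode::load` may differ in parameters and return types.

```rust
// Hypothetical stand-ins only; these are NOT the jpreprocess-njd types.

/// Stand-in for a dictionary entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub pos: String,
    pub reading: String,
}

/// Stand-in token trait: a token can produce its surface and its entry
/// without any separate fetcher object.
pub trait EntryToken {
    fn fetch(&self) -> (String, Entry);
}

#[derive(Debug, PartialEq)]
pub struct Node {
    pub surface: String,
    pub pos: String,
    pub reading: String,
}

impl Node {
    /// Takes the entry by shared reference, so the caller keeps ownership
    /// and can reuse the entry afterwards.
    pub fn load(surface: &str, entry: &Entry) -> Node {
        Node {
            surface: surface.to_string(),
            pos: entry.pos.clone(),
            reading: entry.reading.clone(),
        }
    }
}

pub struct Nodes {
    pub nodes: Vec<Node>,
}

impl Nodes {
    /// Generic over any token type; no fetcher argument is needed because
    /// each token resolves its own entry.
    pub fn from_tokens<T: EntryToken>(tokens: impl IntoIterator<Item = T>) -> Nodes {
        let nodes = tokens
            .into_iter()
            .map(|token| {
                let (surface, entry) = token.fetch();
                Node::load(&surface, &entry)
            })
            .collect();
        Nodes { nodes }
    }
}

/// Example token with fixed data: (surface, part of speech, reading).
pub struct FixedToken(pub &'static str, pub &'static str, pub &'static str);

impl EntryToken for FixedToken {
    fn fetch(&self) -> (String, Entry) {
        (
            self.0.to_string(),
            Entry {
                pos: self.1.to_string(),
                reading: self.2.to_string(),
            },
        )
    }
}
```

Because `load` only borrows the entry, a caller can build a node from an entry and still inspect or reuse that entry afterwards without cloning it first.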

phenylshima force-pushed the fetcher-to-tokenizer branch from ea8b768 to f19e8eb on February 2, 2025 06:58

phenylshima marked this pull request as ready for review February 2, 2025 07:01

cm-ayf reviewed Feb 2, 2025

crates/jpreprocess-dictionary/src/serializer/mod.rs (resolved)

crates/jpreprocess-njd/src/lib.rs (resolved)

crates/jpreprocess-njd/src/lib.rs (outdated, resolved)

phenylshima added 9 commits February 2, 2025 16:56

add tokenizer

aecd323

DictionaryFetcher, DictionaryStore -> Tokenizer

15dae1d

update examples, binaries, bindings

a1b1fbe

clippy fix

2dcce2e

fix dict ident

e98da95

cloudy tea

28eecb6

fix jpreprocess dictionary parser

df34c35

bump msrv

74b571c

move from_entries to FromIterator

1342709

phenylshima force-pushed the fetcher-to-tokenizer branch from 98aa56c to 1342709 on February 2, 2025 07:56

cm-ayf reviewed Feb 2, 2025

crates/jpreprocess-njd/src/lib.rs (outdated, resolved)

rm from_entries

6462014

cm-ayf approved these changes Feb 2, 2025

phenylshima merged commit 73561a8 into main Feb 2, 2025
15 checks passed

phenylshima deleted the fetcher-to-tokenizer branch February 2, 2025 09:26

phenylshima mentioned this pull request Feb 2, 2025

refactor!: Dictionary builder #395

Closed
