Tags: blaa/fuzzdex


1.2.0

Take should score into account in early scanning break.

With a very common 'must' token, the 'should' score wasn't taken into account
enough, and with a small enough `limit` the best result wasn't returned.
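
A minimal pure-Python sketch of the fixed ranking (the names and data layout
are illustrative, not fuzzdex's actual internals): the 'should' score is folded
into each candidate's score before any truncation to `limit` happens, so a
small `limit` can no longer hide the best result.

```python
def top_results(candidates, should_tokens, limit):
    """candidates: phrases already matched by the common 'must' token."""
    scored = []
    for phrase in candidates:
        # Fold the 'should' score in *before* ranking and truncating,
        # so an early break on `limit` cannot drop the best result.
        bonus = sum(1.0 for tok in should_tokens if tok in phrase["tokens"])
        scored.append((phrase["trigram_score"] + bonus, phrase["text"]))
    scored.sort(key=lambda pair: -pair[0])
    return scored[:limit]

hits = top_results(
    [{"text": "may street", "tokens": {"may", "street"}, "trigram_score": 2.0},
     {"text": "main street", "tokens": {"main", "street"}, "trigram_score": 2.0}],
    should_tokens={"may"}, limit=1)
assert hits[0][1] == "may street"
```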

1.1.0

Migrate to pyo3 version 0.17 + performance tweaks.

- Rustc 1.62 from Debian testing couldn't build pyo3 0.17.
- I tested the newer pyo3 using nightly Rust, and after small fixes it works.
- Migrated to a different Levenshtein algorithm; in internal tests it
  seems faster by around 700 queries/s.
- Updated all other packages.
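
The notes don't name the new algorithm. For reference, a common fast
formulation is the two-row dynamic-programming Levenshtein, sketched below;
this is an assumption about the general technique, not necessarily what
fuzzdex adopted.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with two rolling rows: O(len(a) * len(b)) time,
    but only O(min(len(a), len(b))) memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```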

1.0

Change `must token` API and prepare a 1.0 release.

- Breaking API change; bumping the major version from 0 to 1.
- Handle clippy suggestions.
  - There's only one error, but make it explicit.
  - Use `.cloned()`.
- Add a `get_index` helper to make use of `?` and unindent the code.

0.6.1

Don't optimize scanning if a good enough edit distance is not yet achieved.

- Disable scan_cutoff unless a 0-edit-distance entry was already found.
- Disable the result limit optimization unless the edit distance is achieved.
- Add a test that catches the problem.
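
A rough sketch of the described guard, in Python for illustration (the real
logic lives in fuzzdex's Rust scanner; `distance` here is any edit-distance
function, e.g. the Levenshtein sketch above):

```python
def scan(phrases, query, limit, distance):
    """Collect (edit_distance, phrase) pairs; only break early once an
    exact (0 edit distance) entry exists, so the limit optimization
    can no longer skip the best result."""
    results = []
    exact_seen = False
    for phrase in phrases:
        dist = distance(phrase, query)
        results.append((dist, phrase))
        exact_seen = exact_seen or dist == 0
        if exact_seen and len(results) >= limit:
            break  # safe now: nothing can beat an exact match
    results.sort()
    return results[:limit]
```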

TODO:
- Handle the score a bit better so edit_distance can be shifted from 0 to 1.

0.6

Add support for 1-2 letter tokens.

Searching for "1 may" street would ignore the "1", which is pretty important
for distinguishing it from other streets with very short tokens. This should
mostly be used for should tokens, but it works with must tokens as well.

This change increases memory usage and might slow fuzzdex down a bit.
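
One plausible way to support sub-trigram tokens is to index short tokens whole
instead of dropping them; the sketch below shows that idea (an assumption,
since the notes don't describe the exact mechanism):

```python
def trigrams(token: str) -> list[str]:
    """Split a token into trigrams; 1-2 letter tokens are kept whole
    so a query like "1 may" can still match on the "1"."""
    if len(token) < 3:
        return [token]  # previously such tokens produced no trigrams
    return [token[i:i + 3] for i in range(len(token) - 2)]

assert trigrams("1") == ["1"]
assert trigrams("may") == ["may"]
assert trigrams("street") == ["str", "tre", "ree", "eet"]
```

Keeping whole short tokens alongside trigrams would also explain the increased
memory usage mentioned above: each 1-2 letter token needs its own index entry.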

0.4

Change sorting and early finishing in the main algorithms.

Previously there was a "bug" where the wrong score was compared and result
scanning finished too early. Now the cutoff is configurable, data is sorted
first, and phrase scanning stops once the limit is reached. For each phrase we
sort the tokens and stop measuring Levenshtein distance on the first valid
match.

Data is usually sorted by score (decreasing), then by length (increasing), as
it's best to have a high score from a shorter phrase. Maybe trigram scores
should always be divided by the number of letters or trigrams they come from.
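
That ordering corresponds to a compound sort key, roughly like this (a sketch
of the idea, not the actual Rust code):

```python
# Score decreasing, then phrase length increasing: a high score earned
# by a short phrase is the strongest signal.
phrases = [
    {"text": "1 may street", "score": 2.5},
    {"text": "may street", "score": 2.5},
    {"text": "main street", "score": 1.0},
]
ranked = sorted(phrases, key=lambda p: (-p["score"], len(p["text"])))
assert ranked[0]["text"] == "may street"  # same score, shorter wins
```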

0.2

Add a distance function and fix a parallel execution error.

- Tests for the Python side of the API.
- Reproduced the AlreadyBorrowed error and "fixed" it by removing parallel
  execution for now.
- Add missing .lock files.
- Unify `.query`/`.search` across Rust and Python.

0.1

Working code, version 0.1.

- 21x faster than the Elasticsearch solution for my dataset (a small one).
- Untested and probably broken in a few ways.