1.2.0

Take the should score into account in the early scanning break.

With a very common 'must' token, the 'should score' wasn't taken into account
enough, and with a small enough `limit` the best result wasn't returned.

Dec 2, 2022
caad63b

1.1.0

Migrate to pyo3 version 0.17 + performance tweaks.

- Rustc 1.62 from Debian testing couldn't install pyo3 0.17.
- I tested newer pyo3 using nightly and after small fixes it works.
- Migrated to a different Levenshtein algorithm; in internal tests it
  seems faster by around 700 queries/s.
- Updated all other packages.
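
For reference, the edit distance mentioned above can be sketched with the classic two-row dynamic-programming formulation. This is a generic Python illustration, not the algorithm fuzzdex migrated to (the release note does not name it); the function name `levenshtein` is chosen here for the example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance (illustrative only)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

Faster variants typically avoid filling the whole table, for example by bailing out once a maximum allowed distance is exceeded.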

Nov 11, 2022
ad02664

1.0

Change `must token` API and prepare a 1.0 release.

- Breaking API change, bumping major from 0 to 1.
- Handle clippy suggestions.
  - There's only one error, but make it explicit.
  - .cloned()
- Add a `get_index` helper to make use of `?` and unindent code.

Nov 9, 2022
9b7d0f2

0.6.1

Don't optimize scanning if a good enough edit distance is not yet achieved.

- Disable scan_cutoff unless there is already an entry with edit distance 0.
- Disable the result limit optimization unless a good enough edit distance has
  been achieved.
- Add a test that catches the problem.
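
One way to read the two conditions above is as a guard in front of the limit-based early stop. The sketch below is a hypothetical Python illustration and not fuzzdex's code: `may_stop_early`, its parameters, and the `wanted_distance` default are assumptions made for the example.

```python
from typing import Optional

def may_stop_early(results_count: int, limit: int,
                   best_edit_distance: Optional[int],
                   wanted_distance: int = 0) -> bool:
    """Allow the limit-based early stop only once a good enough match exists.

    All names here are hypothetical; `best_edit_distance` is None when no
    match has been found yet.
    """
    if best_edit_distance is None or best_edit_distance > wanted_distance:
        return False  # keep scanning: no good enough entry so far
    return results_count >= limit
```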

TODO:
- Handle score a bit better so edit_distance can be shifted to 1 from 0.

Aug 17, 2022
83dce80

0.6

Add support for 1-2 letter long tokens.

Searching for "1 may" street would ignore the "1", which is pretty important for
distinguishing it from other streets with very short tokens. It should be mostly
used for should-tokens, but works with must-tokens as well.

This change will increase the memory usage and might slow fuzzdex down a bit.

Jul 21, 2022
d86011f

0.4

Change sorting and early finishing in the main algorithms.

Previously there was a "bug" where the wrong score was compared and result
scanning finished early. Now the cutoff is configurable, data is sorted first,
and when the limit is reached the phrase scanning stops. For each phrase we sort
tokens and stop measuring Levenshtein distance on the first valid match.

Data is usually sorted by score (decreasing), then length (increasing), as it's
best to have a high score from a shorter phrase. Maybe trigram scores should
always be divided by the number of letters or trigrams they come from.
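
The ordering and early stop described above can be sketched as follows. This is a simplified Python illustration, not fuzzdex's actual code: the `Candidate` type, its fields, and the `rank_and_cut` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    phrase: str
    score: float  # hypothetical trigram score, higher is better

def rank_and_cut(candidates: List[Candidate], limit: int) -> List[Candidate]:
    """Sort by score (decreasing), then phrase length (increasing); stop at `limit`."""
    ordered = sorted(candidates, key=lambda c: (-c.score, len(c.phrase)))
    results = []
    for candidate in ordered:
        if len(results) >= limit:
            break  # limit reached: stop scanning further phrases
        results.append(candidate)
    return results
```

Because the data is sorted before scanning, stopping at the limit cannot skip a candidate that would have ranked higher under this sort key.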

Jul 5, 2022
1d17ffb

0.2

Add distance function and fix parallel execution error.

- Tests for the Python side of the API.
- Reproduced AlreadyBorrowed error and "fixed" it by removing parallel execution
  for now.
- Add missing .lock files.
- Unify .query/.search across Rust and Python

Jun 21, 2022
66636b0

0.1

Working code, version 0.1.

- 21x faster than Elasticsearch solution for my dataset (a small one).
- Untested and probably broken in a few ways.

Jun 21, 2022
6ea25b9

Tags: blaa/fuzzdex