Tags: blaa/fuzzdex
Migrate to pyo3 version 0.17 + performance tweaks.
- Rustc 1.62 from Debian testing couldn't install pyo3 0.17.
- I tested the newer pyo3 using nightly, and after small fixes it works.
- Migrated to a different Levenshtein algorithm; in internal tests it seems faster by around 700 queries/s.
- Updated all other packages.
Don't optimize scanning if a good enough edit distance is not yet achieved.
- Disable scan_cutoff unless you already have an entry with 0 edit distance.
- Disable the result limit optimization unless the edit distance is achieved.
- Add a test that catches the problem.
TODO:
- Handle the score a bit better so edit_distance can be shifted to 1 from 0.
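The gating described above can be sketched roughly as follows. This is an illustration only, not fuzzdex's actual code: the `Candidate` type, the `scan` function, and its parameters are hypothetical names, and the sketch assumes both shortcuts wait for a zero-distance entry.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str
    distance: int  # edit distance between the query token and the matched token


def scan(candidates, limit, scan_cutoff):
    """Scan candidates in index order; allow early exit only after an exact match."""
    results = []
    have_exact = False
    for scanned, cand in enumerate(candidates, start=1):
        results.append(cand)
        if cand.distance == 0:
            have_exact = True
        # Assumption: both the scan cutoff and the result limit are ignored
        # until at least one zero-distance entry has been seen.
        if have_exact and (scanned >= scan_cutoff or len(results) >= limit):
            break
    results.sort(key=lambda c: c.distance)  # best (smallest) distance first
    return results[:limit]
```

With this gating, a low scan_cutoff can no longer end the scan while only distant matches have been collected.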
Add support for 1-2 letter long tokens. Searching for the "1 may" street would ignore the "1", which is pretty important for distinguishing it from other streets with very short tokens. It should mostly be used for should-tokens, but works with must-tokens as well. This change will increase memory usage and might slow fuzzdex down a bit.
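A small sketch of why dropping short tokens loses information. The `tokenize` helper and its `min_len` parameter are hypothetical and only mimic the effect described here; they are not part of the fuzzdex API.

```python
def tokenize(phrase, min_len):
    """Split a phrase on whitespace and keep tokens of at least min_len characters."""
    return [token for token in phrase.lower().split() if len(token) >= min_len]


# With a 3-character minimum, two different streets collapse to the same tokens.
old_a = tokenize("1 may", min_len=3)
old_b = tokenize("3 may", min_len=3)

# Keeping 1-2 letter tokens preserves the distinguishing number.
new_a = tokenize("1 may", min_len=1)
new_b = tokenize("3 may", min_len=1)
```

The extra short tokens are what increase the index size mentioned above.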
Change sorting and early finishing in the main algorithms. Previously there was a "bug" where the wrong score was compared and result scanning finished early. Now the cutoff is configurable, data is sorted first, and phrase scanning stops when the limit is reached. For each phrase we sort tokens and stop measuring Levenshtein distance on the first valid match. Data is usually sorted by score (decreasing), then length (increasing), as it's best to have a high score from a shorter phrase. Maybe trigram scores should always be divided by the number of letters or trigrams they come from.
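The ordering and early stop could look roughly like this. It is a minimal sketch under stated assumptions: `rank`, `first_valid_match`, and `levenshtein` are hypothetical names, and ordering tokens by length difference from the query is an assumption, since the commit does not say how tokens are sorted.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, kept one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def first_valid_match(query, tokens, max_distance):
    """Try tokens in sorted order and stop at the first one within max_distance."""
    # Assumption: tokens closest in length to the query are tried first.
    for token in sorted(tokens, key=lambda t: abs(len(t) - len(query))):
        distance = levenshtein(query, token)
        if distance <= max_distance:
            return token, distance
    return None


def rank(hits, limit):
    """Order (phrase, score) pairs by score descending, then phrase length ascending."""
    ordered = sorted(hits, key=lambda hit: (-hit[1], len(hit[0])))
    return [phrase for phrase, _score in ordered[:limit]]
```

Sorting before applying the limit means the cut removes the weakest entries instead of whichever ones happened to be scanned last.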