feat: make Compressor::train 2x faster with bitmap index #16
Merged
The slowest part of `Compressor::train` is the doubly nested loop over codes. Now, when `compress_count` records code pairs, it also populates a bitmap index, where `pairs_index[code1].set(code2)` indicates that code2 followed code1 in the compressed output. In the `optimize` loop, we can eliminate tight loop iterations by accessing `pairs_index[code1].second_codes()`, which yields only the valid code2 values. This results in a speedup from ~1ms -> ~500 micros.
a10y
commented
Aug 20, 2024
Comment on lines
-35
to
-40
pub fn reset(&mut self) {
    for idx in 0..COUNTS1_SIZE {
        self.counts1[idx] = 0;
    }
    for idx in 0..COUNTS2_SIZE {
        self.counts2[idx] = 0;
Contributor
Author
this was slower than just building a new Counter b/c of the vec![0] change made in the previous PR
i don't want to lose my 30s CI checks
64beee7 to
f74d185
Compare
Merged
a10y
pushed a commit
that referenced
this pull request
Aug 20, 2024
## 🤖 New release
* `fsst-rs`: 0.2.0 -> 0.2.1

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.2.1](v0.2.0...v0.2.1) - 2024-08-20

### Added
- make Compressor::train 2x faster with bitmap index ([#16](#16))
</blockquote>

</p></details>

---
This PR was generated with [release-plz](https://github.com/MarcoIeni/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closed
AdamGS
reviewed
Aug 21, 2024
Comment on lines
+61
to
+63
if self.block == 0 {
    return None;
}
Contributor
Shouldn't it be possible to skip this check?
The slowest part of `Compressor::train` is the doubly nested loop over codes.

Now, when `compress_count` records code pairs, it also populates a bitmap index, where `pairs_index[code1].set(code2)` indicates that code2 followed code1 in the compressed output.

In the `optimize` loop, we can eliminate tight loop iterations by accessing `pairs_index[code1].second_codes()`, which yields only the valid code2 values.

This results in a speedup from ~1ms -> ~500 micros for the training benchmark. We're sub-millisecond!

This also makes Miri somewhat palatable to run for all but `test_large`, so I've re-enabled it for CI (currently it runs in 2.5 minutes. Far cry from the < 30s build+test step but I guess it's for a good cause)
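For readers who want to see the shape of the idea, here is a minimal sketch of a per-code bitmap index, not the code from this PR. The names `CodeSet`, `SecondCodes`, `PairIndex`, `record`, and `observed_pairs`, as well as the 512-code space, are assumptions made for illustration; only `set` and `second_codes` mirror names used in the description above.

```rust
/// Hypothetical size of the code space for this sketch; the real crate may differ.
const MAX_CODES: usize = 512;
/// Number of 64-bit blocks needed to hold one bit per code.
const BLOCKS: usize = MAX_CODES / 64;

/// One bitmap per first code: bit `code2` is set if `code2` ever followed it.
#[derive(Clone, Copy)]
struct CodeSet {
    blocks: [u64; BLOCKS],
}

impl CodeSet {
    fn new() -> Self {
        Self { blocks: [0; BLOCKS] }
    }

    /// Mark `code` as having been observed.
    fn set(&mut self, code: usize) {
        self.blocks[code / 64] |= 1u64 << (code % 64);
    }

    /// Iterate over the observed codes in ascending order.
    fn second_codes(&self) -> SecondCodes<'_> {
        SecondCodes { set: self, block_idx: 0, block: self.blocks[0] }
    }
}

/// Iterator that visits set bits only, skipping empty 64-bit blocks.
struct SecondCodes<'a> {
    set: &'a CodeSet,
    block_idx: usize,
    block: u64,
}

impl Iterator for SecondCodes<'_> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Advance past exhausted blocks; stop once the last block is drained.
        while self.block == 0 {
            if self.block_idx + 1 >= BLOCKS {
                return None;
            }
            self.block_idx += 1;
            self.block = self.set.blocks[self.block_idx];
        }
        let bit = self.block.trailing_zeros() as usize;
        // Clear the lowest set bit so the next call moves on.
        self.block &= self.block - 1;
        Some(self.block_idx * 64 + bit)
    }
}

/// Pair counts plus the bitmap index that says which pairs are non-zero.
struct PairIndex {
    counts: Vec<u32>,
    index: Vec<CodeSet>,
}

impl PairIndex {
    fn new() -> Self {
        Self {
            counts: vec![0; MAX_CODES * MAX_CODES],
            index: vec![CodeSet::new(); MAX_CODES],
        }
    }

    /// Record that `code2` followed `code1`, updating the count and the index.
    fn record(&mut self, code1: usize, code2: usize) {
        self.counts[code1 * MAX_CODES + code2] += 1;
        self.index[code1].set(code2);
    }

    /// Collect `(code1, code2, count)` for observed pairs only, instead of
    /// scanning all MAX_CODES * MAX_CODES combinations.
    fn observed_pairs(&self) -> Vec<(usize, usize, u32)> {
        let mut out = Vec::new();
        for code1 in 0..MAX_CODES {
            for code2 in self.index[code1].second_codes() {
                out.push((code1, code2, self.counts[code1 * MAX_CODES + code2]));
            }
        }
        out
    }
}
```

In this sketch the inner loop does work proportional to the number of pairs that actually occurred plus a handful of block checks per first code, which is the kind of saving the description attributes to `second_codes()`. Calling `record(3, 70)` twice and `record(3, 5)` once, for instance, makes `observed_pairs()` visit only those two pairs for code 3.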