feat: make Compressor::train 2x faster with bitmap index #16

Merged
a10y merged 6 commits into develop from aduffy/train-speedup on Aug 20, 2024
@a10y (Contributor) commented Aug 20, 2024

The slowest part of `Compressor::train` is the doubly nested loop over codes.

Now, when `compress_count` records code pairs, it also populates a bitmap index: `pairs_index[code1].set(code2)` indicates that code2 followed code1 in the compressed output.

In the `optimize` loop, we can eliminate tight loop iterations by accessing `pairs_index[code1].second_codes()`, which yields only the code2 values that were recorded for that code1.
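The idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `set`, `second_codes`, and `pairs_index` come from the description above, while `MAX_CODES`, `CodeBitmap`, and `observed_pairs` are hypothetical and the code-space size of 512 is an assumption.

```rust
/// Number of distinct codes tracked by this sketch (assumed value, not taken from the PR).
const MAX_CODES: usize = 512;
/// Number of 64-bit words needed to hold one bit per code.
const WORDS: usize = MAX_CODES / 64;

/// One row of the pair index: bit `code2` is set if `code2` followed this row's `code1`.
#[derive(Clone, Copy)]
struct CodeBitmap {
    words: [u64; WORDS],
}

impl CodeBitmap {
    fn new() -> Self {
        Self { words: [0; WORDS] }
    }

    /// Mark `code2` as having been seen after this row's `code1`.
    fn set(&mut self, code2: usize) {
        self.words[code2 / 64] |= 1u64 << (code2 % 64);
    }

    /// Yield every recorded `code2` in ascending order, skipping unset bits.
    fn second_codes(&self) -> impl Iterator<Item = usize> + '_ {
        self.words.iter().enumerate().flat_map(|(w, &word)| {
            let mut block = word;
            std::iter::from_fn(move || {
                if block == 0 {
                    return None;
                }
                let bit = block.trailing_zeros() as usize;
                block &= block - 1; // clear the lowest set bit
                Some(w * 64 + bit)
            })
        })
    }
}

/// Record the given pairs, then visit only the pairs that actually occurred
/// instead of scanning all MAX_CODES * MAX_CODES combinations.
fn observed_pairs(pairs: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut pairs_index = vec![CodeBitmap::new(); MAX_CODES];
    for &(code1, code2) in pairs {
        pairs_index[code1].set(code2);
    }

    let mut out = Vec::new();
    for (code1, row) in pairs_index.iter().enumerate() {
        for code2 in row.second_codes() {
            out.push((code1, code2));
        }
    }
    out
}
```

With this layout, the inner loop does work proportional to the number of distinct pairs seen rather than to the square of the code space, which is presumably where the saving comes from when most pairs never occur.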

This speeds up the training benchmark from ~1ms to ~500µs. We're sub-millisecond!

This also makes Miri tolerable to run for everything except `test_large`, so I've re-enabled it in CI. It currently runs in about 2.5 minutes, a far cry from the < 30s build+test step, but I guess it's for a good cause.

a10y added 2 commits August 20, 2024 16:44
Comment thread src/builder.rs
Comment on lines -35 to -40
    pub fn reset(&mut self) {
        for idx in 0..COUNTS1_SIZE {
            self.counts1[idx] = 0;
        }
        for idx in 0..COUNTS2_SIZE {
            self.counts2[idx] = 0;

Contributor Author

this was slower than just building a new Counter b/c of the vec![0] change made in the previous PR
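A rough sketch of the alternative described in this comment, replacing the counter instead of zeroing it in place. The field names and the `COUNTS1_SIZE`/`COUNTS2_SIZE` constants appear in the diff above, but their values here are assumptions, and `record_count1`, `record_count2`, and `run_rounds` are hypothetical helpers. The likely reason a fresh allocation wins is that `vec![0; n]` can request already-zeroed memory from the allocator rather than writing every element, though that is an inference and not something the thread states.

```rust
// Sizes are assumptions for this sketch; the PR excerpt does not show the real values.
const COUNTS1_SIZE: usize = 512;
const COUNTS2_SIZE: usize = 512 * 512;

struct Counter {
    counts1: Vec<usize>,
    counts2: Vec<usize>,
}

impl Counter {
    /// Build a counter with all counts at zero.
    fn new() -> Self {
        Self {
            counts1: vec![0; COUNTS1_SIZE],
            counts2: vec![0; COUNTS2_SIZE],
        }
    }

    /// Hypothetical helper: count one occurrence of a single code.
    fn record_count1(&mut self, code: usize) {
        self.counts1[code] += 1;
    }

    /// Hypothetical helper: count one occurrence of the pair (code1, code2).
    fn record_count2(&mut self, code1: usize, code2: usize) {
        self.counts2[code1 * COUNTS1_SIZE + code2] += 1;
    }
}

/// Run several rounds, building a fresh `Counter` each time instead of
/// calling a `reset` method that loops over both vectors.
fn run_rounds(rounds: usize) -> usize {
    let mut total = 0;
    for round in 0..rounds {
        let mut counter = Counter::new(); // replaces the old reset() call
        counter.record_count1(round % COUNTS1_SIZE);
        counter.record_count2(1, 2);
        total += counter.counts1.iter().sum::<usize>();
        total += counter.counts2.iter().sum::<usize>();
    }
    total
}
```

Whether this is faster in practice depends on the allocator and the vector sizes, so it is worth confirming with a benchmark, as the author did.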

@a10y force-pushed the aduffy/train-speedup branch from 64beee7 to f74d185 on August 20, 2024 21:58
@a10y merged commit d7e836c into develop on Aug 20, 2024
@a10y deleted the aduffy/train-speedup branch on August 20, 2024 22:04
@github-actions (Bot) mentioned this pull request on Aug 20, 2024
a10y pushed a commit that referenced this pull request on Aug 20, 2024
## 🤖 New release
* `fsst-rs`: 0.2.0 -> 0.2.1

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.2.1](v0.2.0...v0.2.1) - 2024-08-20

### Added
- make Compressor::train 2x faster with bitmap index ([#16](#16))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/MarcoIeni/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@a10y mentioned this pull request on Aug 20, 2024
Comment thread src/builder.rs
Comment on lines +61 to +63
    if self.block == 0 {
        return None;
    }

Contributor

Shouldn't it be possible to skip this check?

Contributor Author

Good catch! #18
