Decompress in 8-byte blocks #82
CodSpeed Performance Report: Merging #82 will improve performance by 31.01%.
    while out_ptr.cast_const() <= block_out_end && in_ptr < block_in_end {
        // Note that we load a little-endian u64 here.
        let next_block = in_ptr.cast::<u64>().read_unaligned();
        let escape_mask = (next_block & 0x8080808080808080)
Claude did a great job of explaining this:
This bit manipulation trick detects bytes equal to 0xFF, which is the value FSST uses as its escape code. The walkthrough below uses 32-bit constants for brevity; the code in this PR applies the same steps to a `u64`, with each constant widened to 64 bits. Let me break it down step by step:

    let escape_mask = (next_block & 0x80808080)
        & ((((!next_block) & 0x7F7F7F7F) + 0x7F7F7F7F) ^ 0x80808080);

1. `next_block & 0x80808080`: This extracts the most significant bit (MSB) from each byte in the word. The MSB is 1 for bytes with values ≥ 128.
2. `(!next_block) & 0x7F7F7F7F`: This inverts all bits in `next_block`, then masks off the MSBs, keeping only the inverted lower 7 bits of each byte. The result for a byte is zero only if its lower 7 bits in `next_block` were all ones.
3. `((!next_block) & 0x7F7F7F7F) + 0x7F7F7F7F`: This adds 0x7F (127) to each byte. Any byte whose value from step 2 is nonzero carries into its own MSB position. Each per-byte sum is at most 0xFE, so a carry never spills into the neighboring byte.
4. `(...) ^ 0x80808080`: This XORs with 0x80808080, which flips the MSB of each byte. As a result, bytes that didn't cause a carry now have their MSB set to 1.
5. The final `&` operation combines the results: it only keeps bits set in both expressions. This identifies bytes that:
   - Have their MSB set in the original data (from step 1)
   - Did NOT cause a carry in step 3 (which happens only when the lower 7 bits are all ones)

Both conditions hold only for a byte equal to 0xFF. The trick belongs to the same family of word-at-a-time techniques used in optimized string processing (for example, to find null terminators), where every byte of a word is tested without byte-by-byte comparisons.

In essence, it creates a bit mask in which the MSB of a byte position (0x80) is set exactly when the corresponding byte of the word is an escape byte.
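The mask computation discussed in this comment can be checked with a small sketch. It is an illustration rather than the code from this PR: the function names `escape_mask`, `escape_mask_naive`, and `first_escape_index` are made up here, and the constants are the 64-bit ones matching the `u64` load in the diff. The sketch cross-checks the expression against a byte-by-byte scan for 0xFF and, assuming the block was loaded as little-endian, uses `trailing_zeros` to locate the first flagged byte.

```rust
/// 64-bit form of the expression from the diff: the result has 0x80 in
/// every byte lane whose input byte is 0xFF, and 0x00 everywhere else.
fn escape_mask(block: u64) -> u64 {
    const HI: u64 = 0x8080_8080_8080_8080;
    const LO: u64 = 0x7F7F_7F7F_7F7F_7F7F;
    // Each per-byte sum is at most 0x7F + 0x7F = 0xFE, so the addition
    // never carries across byte lanes and never overflows the u64.
    (block & HI) & ((((!block) & LO) + LO) ^ HI)
}

/// Byte-by-byte reference, used only to cross-check the trick.
fn escape_mask_naive(block: u64) -> u64 {
    let mut mask = 0u64;
    for (i, b) in block.to_le_bytes().iter().enumerate() {
        if *b == 0xFF {
            mask |= 0x80u64 << (8 * i);
        }
    }
    mask
}

/// Index (0..8) of the first 0xFF byte in memory order, assuming the
/// block was loaded as a little-endian u64.
fn first_escape_index(block: u64) -> Option<usize> {
    let mask = escape_mask(block);
    if mask == 0 {
        None
    } else {
        Some((mask.trailing_zeros() / 8) as usize)
    }
}

fn main() {
    // Bytes 2 and 6 are escapes; 0x7F, 0x80, and 0xFE must not be flagged.
    let bytes = [0x01, 0x7F, 0xFF, 0x80, 0x00, 0xFE, 0xFF, 0x41];
    let block = u64::from_le_bytes(bytes);
    assert_eq!(escape_mask(block), escape_mask_naive(block));
    assert_eq!(first_escape_index(block), Some(2));
}
```

The near misses in the test input (0x7F, 0x80, 0xFE) are the interesting cases: each fails exactly one of the two conditions that the final `&` combines.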
## 🤖 New release

* `fsst-rs`: 0.5.0 -> 0.5.1 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.5.1](v0.5.0...v0.5.1) - 2025-03-12

### Other

- Decompress in 8-byte blocks ([#82](#82))
- *(deps)* lock file maintenance ([#83](#83))
- Assert enough room in decoded buffer ([#79](#79))
- *(deps)* update rust crate criterion to v2.9.1 ([#80](#80))
- *(deps)* update mozilla-actions/sccache-action action to v0.0.8 ([#78](#78))
- *(deps)* lock file maintenance ([#77](#77))
- Add codspeed ([#76](#76))

</blockquote>

</p></details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
NOTE: we may want to fall back to a 4-byte unrolling for WASM / 4-byte architectures, but it may not matter.
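One way such a fallback could be selected at compile time is sketched below. This is a hypothetical illustration, not part of the PR: the `Word` alias, the `find_first_escape` helper, and the choice of `target_pointer_width` as the switch are all assumptions, and the function only searches for the escape byte rather than decompressing. Note that `wasm32` has native 64-bit integer operations even though its pointers are 32 bits wide, which may be one reason the fallback does not matter there.

```rust
use std::convert::TryInto;

const ESCAPE: u8 = 0xFF;

// Hypothetical switch: 8-byte blocks on 64-bit targets, 4-byte otherwise.
#[cfg(target_pointer_width = "64")]
type Word = u64;
#[cfg(not(target_pointer_width = "64"))]
type Word = u32;

const WIDTH: usize = std::mem::size_of::<Word>();
// Word::MAX / 0xFF is 0x01 repeated in every byte; scale it to get the
// 0x80... and 0x7F... patterns at whichever width was selected.
const HI: Word = Word::MAX / 0xFF * 0x80;
const LO: Word = Word::MAX / 0xFF * 0x7F;

/// Same expression as in the diff, at the selected word width.
fn escape_mask(block: Word) -> Word {
    (block & HI) & ((((!block) & LO) + LO) ^ HI)
}

/// Offset of the first escape byte in `input`, scanning one word at a time.
fn find_first_escape(input: &[u8]) -> Option<usize> {
    let mut chunks = input.chunks_exact(WIDTH);
    let mut offset = 0;
    for chunk in &mut chunks {
        // Little-endian load, so the lowest set mask bit is the first byte.
        let block = Word::from_le_bytes(chunk.try_into().unwrap());
        let mask = escape_mask(block);
        if mask != 0 {
            return Some(offset + (mask.trailing_zeros() / 8) as usize);
        }
        offset += WIDTH;
    }
    // Tail shorter than one word: plain byte-by-byte scan.
    chunks
        .remainder()
        .iter()
        .position(|&b| b == ESCAPE)
        .map(|i| offset + i)
}

fn main() {
    let data = [b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', 0xFF, b'k'];
    assert_eq!(find_first_escape(&data), Some(9));
}
```

Because the constants are derived from `Word::MAX`, the mask expression itself is identical at both widths; only the type alias changes.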