a10y · 2024-08-20T23:50:20Z

Implements a new codec, FSSTCompressor, using the latest version of the fsst-rs library.
Adds a new metadata field on CompressionTree to allow reuse between the sampling and compressing stages. For example, we can save the ALP exponents so we don't have to calculate them twice. This is especially important for FSST, since it saves the overhead of training the table twice.
Adds a new compression benchmark using the lineitem table's l_comment column with scalefactor=1, which is just over 6 million rows. By default this is loaded as a ChunkedArray with 733 partitions. Compressing with FSST enabled takes 1.6s. Compressing the canonicalized array takes ~550ms. We should be able to speed this up by at least ~2x (see FSSTCompressor #664 (comment)), and we can potentially do even better. We probably want to be able to FSST compress a ChunkedArray directly so that we avoid the overhead of training/compressing each chunk from scratch.
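
To illustrate the metadata-reuse idea described above, here is a minimal sketch. Every name in it (`Tree`, `Trained`, `train`, `compress`) is hypothetical and is not the Vortex or fsst-rs API; the "training" is a toy stand-in for an expensive step such as building an FSST symbol table. The point is only that the tree returned from the sampling pass carries state that the full-array pass can pick up instead of recomputing.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for expensive-to-compute state
/// (e.g. an FSST symbol table or ALP exponents).
#[derive(Debug, PartialEq)]
pub struct Trained {
    pub most_common_byte: u8,
}

/// Hypothetical, simplified compression tree: the codec name plus optional
/// metadata produced while compressing the sample.
#[derive(Clone, Debug, Default)]
pub struct Tree {
    pub codec: &'static str,
    pub metadata: Option<Arc<Trained>>,
}

/// Toy "training": find the most frequent byte and record that training ran.
pub fn train(data: &[u8], train_calls: &mut usize) -> Trained {
    *train_calls += 1;
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let most_common_byte = (0..=255u8)
        .max_by_key(|&b| counts[b as usize])
        .unwrap_or(0);
    Trained { most_common_byte }
}

/// Compress `data`, reusing the metadata stored on `like` when it is present
/// and training from scratch otherwise. The returned tree can be handed to a
/// later call so that the training step is skipped.
pub fn compress(data: &[u8], like: Option<&Tree>, train_calls: &mut usize) -> (Vec<u8>, Tree) {
    let trained = match like.and_then(|t| t.metadata.clone()) {
        Some(existing) => existing,
        None => Arc::new(train(data, train_calls)),
    };
    // Toy encoding: XOR with the most common byte so that byte becomes 0.
    let encoded: Vec<u8> = data.iter().map(|b| b ^ trained.most_common_byte).collect();
    let tree = Tree { codec: "toy-xor", metadata: Some(trained) };
    (encoded, tree)
}
```

Calling `compress` on a sample with `like = None` and then on the full input with the returned tree runs `train` only once. Whether reusing sample-trained state is a good idea for a given codec is a separate question.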

a10y · 2024-08-21T14:47:00Z

Ran the compress_taxi benchmark and it got ~80% slower. I am a bit surprised that the biggest culprit seems to be creating new counters in the FSST training loop. That doesn't even scale w.r.t. the size of the input array; it's just a flat 2MB allocation. The zeroing of the vector seems to be the biggest problem. I think we can avoid that with a second bitmap; let me try that out.
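
One way a bitmap can avoid re-zeroing a large counter vector is sketched below. This is an illustration under assumptions, not the change that landed in fsst-rs: `SparseCounters` and its methods are hypothetical names. A bit records whether each counter currently holds a valid value, so resetting only clears the bitmap, which at one bit per 32-bit counter is 32x smaller than the counters themselves.

```rust
/// Counters that can be reset without zeroing the full counter vector.
pub struct SparseCounters {
    counts: Vec<u32>,
    /// One bit per counter: set when the counter holds a valid value.
    touched: Vec<u64>,
}

impl SparseCounters {
    pub fn new(len: usize) -> Self {
        Self {
            counts: vec![0; len],
            touched: vec![0; (len + 63) / 64],
        }
    }

    pub fn increment(&mut self, idx: usize) {
        let (word, bit) = (idx / 64, 1u64 << (idx % 64));
        if self.touched[word] & bit == 0 {
            // First touch since the last clear: overwrite the stale value.
            self.touched[word] |= bit;
            self.counts[idx] = 1;
        } else {
            self.counts[idx] += 1;
        }
    }

    pub fn get(&self, idx: usize) -> u32 {
        let (word, bit) = (idx / 64, 1u64 << (idx % 64));
        if self.touched[word] & bit != 0 {
            self.counts[idx]
        } else {
            // Untouched counters read as zero even if stale data remains.
            0
        }
    }

    /// Reset all counters by clearing only the bitmap.
    pub fn clear(&mut self) {
        self.touched.fill(0);
    }
}
```

The trade-off is an extra bitmap lookup on every increment and read, in exchange for a much cheaper reset between training rounds.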

a10y · 2024-08-21T15:44:27Z

Alright, using the change in spiraldb/fsst#21 helped a lot.

New benchmark result:

end to end - taxi/compress
                        time:   [100.73 ms 101.72 ms 102.96 ms]
                        change: [-45.073% -43.470% -42.057%] (p = 0.00 < 0.05)
                        Performance has improved.

Which is about 10ms or ~11% slower than running without FSST.

a10y · 2024-08-21T15:45:18Z

And I think we can go even lower; ideally we'd just use the compressor trained over the samples to compress the full array.

robert3005 · 2024-08-21T16:07:58Z

Just bear in mind that the samples can be very small compared to the data, i.e. 1024 elements. I would say just retrain it.

a10y · 2024-08-23T03:19:46Z

Ok I've done a few things today

Introduced a way to reuse compressors in our samplingcompressor code.
Kept tweaking some things on the FSST side, including matching how the paper's authors sample the full input, and tried to reduce memory accesses/extraneous instructions as much as possible (in draft at spiraldb/fsst#24, "feat: port in more from the C++ code").
I'm trying to run some comparisons against the C++ code. Here's a screenshot comparing a microbenchmark for compressing the comments column from the TPCH orders table (1.5M rows) using Rust vs the C++ implementation. We seem roughly on par with the C++ implementation here; the timings seemed consistent after several runs.

a10y · 2024-08-23T04:09:46Z

Ok, I've now added a new benchmark which just compresses the comments column in-memory via Vortex, and I'm seeing it take ~500ms, which is roughly 2-3x longer than just doing the compression without Vortex.

I think the root of the performance difference is the chunking. Here's a comparison between running FSST over the comments column chunked as per our TPC loading infra (nchunks=192) and the canonicalized version of the comments array, which is not chunked:

So I guess there's some fixed-size overhead somewhere in FSST training (probably a combo of allocations and doubly nested tight loops over 0...511) that starts to add up and can skew your results when you run FSST hundreds of times.

I'm curious how DuckDB and other folks deal with FSST + chunking. It seems like we might want to treat it as a special thing that can do its own sampling and have a shared symbol table across chunks.
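
The shared-table idea could look roughly like the sketch below: train once on a few strings drawn from every chunk, then compress each chunk with that one table, so the fixed training cost is paid once rather than once per chunk. `PairTable` is a toy stand-in (it replaces only the single most frequent byte pair with the code 0xFF and assumes ASCII input); it is not FSST and not the fsst-rs API, and `compress_chunked_shared` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Toy "symbol table": the single most frequent byte pair is encoded as 0xFF.
/// Assumes inputs are ASCII, so 0xFF never occurs in the raw data.
pub struct PairTable {
    pair: Option<[u8; 2]>,
}

impl PairTable {
    /// Train over all provided samples at once.
    pub fn train<'a>(samples: impl IntoIterator<Item = &'a [u8]>) -> Self {
        let mut counts: HashMap<[u8; 2], usize> = HashMap::new();
        for s in samples {
            for w in s.windows(2) {
                *counts.entry([w[0], w[1]]).or_insert(0) += 1;
            }
        }
        // Break ties on the pair bytes so the result is deterministic.
        let pair = counts
            .into_iter()
            .max_by_key(|&(p, n)| (n, p))
            .map(|(p, _)| p);
        Self { pair }
    }

    pub fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(input.len());
        let mut i = 0;
        while i < input.len() {
            if let Some(p) = self.pair {
                if i + 1 < input.len() && input[i] == p[0] && input[i + 1] == p[1] {
                    out.push(0xFF);
                    i += 2;
                    continue;
                }
            }
            out.push(input[i]);
            i += 1;
        }
        out
    }
}

/// Train one table on up to `samples_per_chunk` strings from each chunk,
/// then compress every chunk with that shared table.
pub fn compress_chunked_shared(
    chunks: &[Vec<&str>],
    samples_per_chunk: usize,
) -> Vec<Vec<Vec<u8>>> {
    let table = PairTable::train(
        chunks
            .iter()
            .flat_map(|c| c.iter().take(samples_per_chunk).map(|s| s.as_bytes())),
    );
    chunks
        .iter()
        .map(|c| c.iter().map(|s| table.compress(s.as_bytes())).collect())
        .collect()
}
```

A shared table also means the chunks are no longer independently decodable without it, which is part of why this would need special handling.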

a10y · 2024-08-23T17:33:31Z

I'm currently blocking this on some work in spiraldb/fsst#24

a10y · 2024-09-03T19:00:01Z

Currently 59% of fsst_compress time is spent actually compressing; we break out of the fast loop to do push_null and data copying. Something to improve on in a follow-up.

a10y · 2024-09-03T19:24:09Z

+        //  so we transmute to kill the lifetime complaints.
+        //  This is fine because the returned `Decompressor`'s lifetime is tied to the lifetime
+        //  of these same arrays.
+        let symbol_lengths = unsafe { std::mem::transmute::<&[u8], &[u8]>(symbol_lengths) };


Curious for a sanity check here, or whether there's another way I should be doing this. It feels a bit wrong, but I think it is currently the best way to do the thing I want...

nvm, this is wrong: if we actually canonicalize, this pointer is invalid.

Ok, this should be fixed now: instead of returning a decompressor, this constructs one on the fly and passes it to a provided function.
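
The closure-passing shape described in the last comment can be sketched as follows. All types here (`Decompressor`, `ToyFsstArray`) are hypothetical toys rather than the Vortex or fsst-rs definitions. The decompressor borrows from a temporary that is materialized inside the method, so it cannot be returned, but it can safely be lent to a caller-supplied function without any `transmute`.

```rust
/// Hypothetical decompressor that borrows its symbol table.
pub struct Decompressor<'a> {
    symbols: &'a [[u8; 8]],
    lengths: &'a [u8],
}

impl Decompressor<'_> {
    /// Toy decoding: each code byte indexes one symbol of the given length.
    pub fn decompress(&self, codes: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        for &c in codes {
            let len = self.lengths[c as usize] as usize;
            out.extend_from_slice(&self.symbols[c as usize][..len]);
        }
        out
    }
}

/// Toy array whose symbols are stored packed and must be materialized
/// (a stand-in for canonicalizing a child array) before use.
pub struct ToyFsstArray {
    pub packed_symbols: Vec<u8>, // 8 bytes per symbol
    pub lengths: Vec<u8>,
}

impl ToyFsstArray {
    /// Build a decompressor on the fly and lend it to `f`.
    /// The temporary `symbols` buffer outlives the borrow because it is
    /// dropped only after `f` returns.
    pub fn with_decompressor<R>(&self, f: impl FnOnce(Decompressor<'_>) -> R) -> R {
        let symbols: Vec<[u8; 8]> = self
            .packed_symbols
            .chunks_exact(8)
            .map(|chunk| {
                let mut symbol = [0u8; 8];
                symbol.copy_from_slice(chunk);
                symbol
            })
            .collect();
        f(Decompressor {
            symbols: &symbols,
            lengths: &self.lengths,
        })
    }
}
```

Because the borrow checker can see that the decompressor never escapes the closure, the lifetime problem that motivated the `transmute` does not arise in this shape.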

a10y added 2 commits August 20, 2024 18:14

re-enable miri on FSSTArray tests

7e99d94

FSSTCompressor

335636a

robert3005 reviewed Aug 21, 2024

Comment thread vortex-sampling-compressor/src/compressors/fsst.rs Outdated

robert3005 reviewed Aug 21, 2024

Comment thread vortex-sampling-compressor/src/compressors/fsst.rs Outdated

robert3005 reviewed Aug 21, 2024

Comment thread vortex-sampling-compressor/src/compressors/fsst.rs

fixes

fc81829

a10y commented Aug 21, 2024

Comment thread bench-vortex/benches/random_access.rs

a10y commented Aug 21, 2024

Comment thread encodings/fsst/tests/fsst_tests.rs

a10y added 2 commits August 22, 2024 12:53

some things

47c21a2

weird clippy

23f32ce

more

f7f7429

a10y commented Aug 23, 2024

Comment thread Cargo.toml Outdated

save

536871a

a10y added 3 commits September 3, 2024 15:00

more

3517e59

Merge remote-tracking branch 'origin/develop' into aduffy/fsst-compre…

cefb8ce

…ssor

better

aff74c1

a10y commented Sep 3, 2024

Comment thread bench-vortex/src/tpch/mod.rs

prints

f55092b

a10y commented Sep 3, 2024

Comment thread encodings/fsst/src/array.rs

a10y commented Sep 3, 2024

a10y marked this pull request as ready for review September 3, 2024 19:30

decompressor -> with_decompressor

1d1d56a

robert3005 reviewed Sep 3, 2024

Comment thread bench-vortex/benches/compress_benchmark.rs Outdated

robert3005 reviewed Sep 3, 2024

Comment thread vortex-sampling-compressor/src/compressors/mod.rs Outdated

be slightly safer

79197e9

a10y force-pushed the aduffy/fsst-compressor branch from 58a58eb to 79197e9 Compare September 3, 2024 21:26

a10y commented Sep 3, 2024

Comment thread vortex-sampling-compressor/src/compressors/mod.rs

comment

87e881c

robert3005 reviewed Sep 3, 2024

Comment thread vortex-sampling-compressor/src/compressors/fsst.rs Outdated

fix comment

cffa478

robert3005 approved these changes Sep 3, 2024

a10y enabled auto-merge (squash) September 3, 2024 22:20

bump fsst-rs

9800ea7

a10y merged commit fd49140 into develop Sep 3, 2024

a10y deleted the aduffy/fsst-compressor branch September 3, 2024 22:29

robert3005 mentioned this pull request Sep 5, 2024

Encoding: FSST #9

Closed
