
perf(parquet): improve decoding performance for parquet#6759

Open
RaphDal wants to merge 77 commits into `master` from `rd_parquet_decode`

Conversation

@RaphDal (Contributor) commented Feb 9, 2026

This PR brings several improvements to Parquet support that substantially improve decoding performance.

Changes

  • Add a benchmark measuring decoding throughput across column type/encoding combinations.
  • Remove unnecessary allocation/zeroing in the decompression path.
  • Skip definition-level decoding (null bitmap) when null_count == 0.
  • Rework the Plain, DeltaBinaryPacked, Rle, and RleDictionary decoder implementations for better performance.
  • Add a lookup table for boolean unpacking.
  • Specialize UUID byte swapping.
  • Add a batch decoder for DeltaBinaryPacked.
  • Batch nullable-bitmap processing.
  • Bulk-copy values in PlainPrimitiveDecoder.
  • Add comprehensive decoder tests.
  • Fix a bug that produced incorrect null sentinel values.
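The boolean lookup-table item above lends itself to a compact illustration. The following is a from-scratch sketch of the idea, assuming Parquet's PLAIN boolean layout (bit-packed, LSB first, 8 values per byte); `BOOL_LUT` and `unpack_bools` are invented names for illustration, not the PR's actual code:

```rust
/// 256-entry table: each input byte maps to its 8 decoded boolean bytes,
/// built once at compile time.
const fn build_lut() -> [[u8; 8]; 256] {
    let mut lut = [[0u8; 8]; 256];
    let mut b = 0usize;
    while b < 256 {
        let mut bit = 0usize;
        while bit < 8 {
            lut[b][bit] = ((b >> bit) & 1) as u8;
            bit += 1;
        }
        b += 1;
    }
    lut
}

static BOOL_LUT: [[u8; 8]; 256] = build_lut();

/// Decode `count` booleans from `src` into `dst`.
fn unpack_bools(src: &[u8], dst: &mut Vec<u8>, count: usize) {
    for (i, &byte) in src.iter().enumerate() {
        if i * 8 >= count {
            break;
        }
        let take = (count - i * 8).min(8);
        // One table lookup and bulk copy replaces eight shift/mask
        // operations per input byte.
        dst.extend_from_slice(&BOOL_LUT[byte as usize][..take]);
    }
}
```

The table trades 2 KiB of static data for a branch-light inner loop, which is consistent with the boolean plain/rle rows in the benchmark table below showing the largest speedups.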

Benchmarks

Ran on an Apple M5; there is a lot of jitter in these numbers.

Decoding

Iteration: decode 100k values (default page size).

  • Median change: -87.10%
Type | Encoding | Cardinality (dict encodings only) | Null % | Master (ns/iter) | Patched (ns/iter) | Patched (M elements/s) | Change (±)
array delta_len 0 1,142,737 959,031 104.3 -16.08% ± 1.70%
array delta_len 20 916,331 786,131 127.2 -14.21% ± 1.29%
array plain 0 577,010 386,182 258.9 -33.07% ± 0.85%
array plain 20 493,360 339,179 294.8 -31.25% ± 1.22%
binary delta_len 0 1,056,638 916,466 109.1 -13.27% ± 1.29%
binary delta_len 20 936,500 757,334 132.0 -19.13% ± 2.10%
binary plain 0 536,423 403,817 247.6 -24.72% ± 1.36%
binary plain 20 485,270 352,759 283.5 -27.31% ± 0.69%
boolean plain 0 188,961 4,454 22,452.4 -97.64% ± 0.27%
boolean rle 0 193,054 2,042 48,980.6 -98.94% ± 0.03%
byte delta_bp 0 642,125 47,010 2,127.2 -92.68% ± 0.13%
byte dict 1000 0 340,960 55,071 1,815.8 -83.85% ± 1.11%
byte dict 1000 20 304,034 46,475 2,151.7 -84.71% ± 0.59%
byte dict 100 0 387,551 51,795 1,930.7 -86.64% ± 0.99%
byte dict 100 20 312,892 46,274 2,161.0 -85.21% ± 0.63%
byte dict 10 0 401,985 46,837 2,135.1 -88.35% ± 0.40%
byte dict 10 20 304,061 51,545 1,940.1 -83.05% ± 2.37%
byte dict 256 0 353,187 54,374 1,839.1 -84.60% ± 0.95%
byte dict 256 20 291,609 48,125 2,077.9 -83.50% ± 0.95%
byte plain 0 164,133 3,034 32,963.8 -98.15% ± 0.01%
char delta_bp 0 633,527 28,487 3,510.3 -95.50% ± 0.13%
char plain 0 171,893 3,381 29,576.7 -98.03% ± 0.09%
date delta_bp 0 428,989 33,741 2,963.8 -92.13% ± 0.12%
date delta_bp 20 477,614 63,639 1,571.4 -86.68% ± 0.17%
date dict 1000 0 413,985 49,344 2,026.6 -88.08% ± 0.51%
date dict 1000 20 416,532 55,279 1,809.0 -86.73% ± 1.41%
date dict 100 0 403,009 52,289 1,912.5 -87.03% ± 1.01%
date dict 100 20 393,288 49,782 2,008.8 -87.34% ± 1.16%
date dict 10 0 362,224 44,267 2,259.0 -87.78% ± 0.43%
date dict 10 20 522,817 44,528 2,245.8 -91.48% ± 0.73%
date dict 256 0 461,130 44,842 2,230.0 -90.28% ± 1.09%
date dict 256 20 370,965 43,664 2,290.2 -88.23% ± 0.87%
date plain 0 9,747 10,366 9,647.3 +6.34% ± 2.41%
date plain 20 107,260 15,165 6,594.3 -85.86% ± 0.28%
double dict 1000 0 677,757 50,126 1,995.0 -92.60% ± 0.66%
double dict 1000 20 484,193 50,134 1,994.6 -89.65% ± 0.90%
double dict 100 0 435,922 49,901 2,004.0 -88.55% ± 0.35%
double dict 100 20 445,821 49,194 2,032.8 -88.97% ± 0.61%
double dict 10 0 415,168 44,460 2,249.2 -89.29% ± 0.25%
double dict 10 20 424,133 54,424 1,837.4 -87.17% ± 0.86%
double dict 256 0 427,713 72,421 1,380.8 -83.07% ± 1.25%
double dict 256 20 818,640 40,276 2,482.9 -95.08% ± 0.80%
double plain 0 9,729 10,299 9,709.3 +5.86% ± 4.84%
double plain 20 113,138 16,379 6,105.4 -85.52% ± 0.75%
float dict 1000 0 413,319 58,917 1,697.3 -85.75% ± 1.30%
float dict 1000 20 318,630 63,288 1,580.1 -80.14% ± 1.20%
float dict 100 0 392,572 55,509 1,801.5 -85.86% ± 0.73%
float dict 100 20 351,555 51,252 1,951.1 -85.42% ± 0.42%
float dict 10 0 359,874 53,893 1,855.5 -85.02% ± 0.70%
float dict 10 20 338,140 55,581 1,799.2 -83.56% ± 1.11%
float dict 256 0 335,418 54,712 1,827.8 -83.69% ± 1.25%
float dict 256 20 329,172 44,429 2,250.8 -86.50% ± 1.30%
float plain 0 5,512 5,768 17,336.2 +4.65% ± 10.40%
float plain 20 108,429 8,574 11,662.6 -92.09% ± 0.15%
geobyte delta_bp 0 729,096 48,870 2,046.2 -93.30% ± 0.10%
geobyte delta_bp 20 609,874 48,104 2,078.8 -92.11% ± 0.14%
geobyte plain 0 256,284 3,054 32,739.5 -98.81% ± 0.01%
geobyte plain 20 207,817 6,415 15,587.6 -96.91% ± 0.14%
geoint delta_bp 0 528,998 28,456 3,514.2 -94.62% ± 0.03%
geoint delta_bp 20 471,780 39,560 2,527.8 -91.61% ± 0.13%
geoint plain 0 102,989 4,893 20,437.5 -95.25% ± 0.05%
geoint plain 20 106,993 8,306 12,039.8 -92.24% ± 0.12%
geolong delta_bp 0 527,806 33,864 2,953.0 -93.58% ± 0.14%
geolong delta_bp 20 482,536 64,064 1,561.0 -86.72% ± 0.18%
geolong plain 0 108,600 9,758 10,248.1 -91.01% ± 0.42%
geolong plain 20 107,572 15,292 6,539.6 -85.78% ± 0.44%
geoshort delta_bp 0 693,874 28,721 3,481.8 -95.86% ± 0.02%
geoshort delta_bp 20 596,573 39,209 2,550.4 -93.43% ± 0.09%
geoshort plain 0 245,274 3,412 29,309.3 -98.61% ± 0.03%
geoshort plain 20 200,868 6,469 15,458.4 -96.78% ± 0.11%
int delta_bp 0 427,934 28,223 3,543.2 -93.40% ± 0.03%
int delta_bp 20 476,162 38,776 2,578.9 -91.86% ± 0.17%
int dict 1000 0 383,453 50,049 1,998.0 -86.95% ± 0.64%
int dict 1000 20 359,803 45,818 2,182.6 -87.27% ± 0.41%
int dict 100 0 389,366 56,420 1,772.4 -85.51% ± 1.01%
int dict 100 20 359,534 47,294 2,114.4 -86.85% ± 0.38%
int dict 10 0 376,230 58,866 1,698.8 -84.35% ± 0.94%
int dict 10 20 404,406 45,211 2,211.8 -88.82% ± 0.77%
int dict 256 0 454,350 35,644 2,805.5 -92.15% ± 0.54%
int dict 256 20 359,195 35,288 2,833.8 -90.18% ± 0.38%
int plain 0 4,830 4,938 20,250.2 +2.24% ± 3.53%
int plain 20 108,091 8,452 11,831.1 -92.18% ± 0.19%
ipv4 delta_bp 0 568,732 33,423 2,991.9 -94.12% ± 0.13%
ipv4 delta_bp 20 500,635 39,199 2,551.1 -92.17% ± 0.16%
ipv4 plain 0 102,344 5,068 19,731.8 -95.05% ± 0.21%
ipv4 plain 20 111,545 8,278 12,080.1 -92.58% ± 0.15%
long128 plain 0 118,782 20,165 4,959.1 -83.02% ± 0.72%
long128 plain 20 117,344 29,347 3,407.5 -74.99% ± 0.43%
long256 plain 0 140,733 40,287 2,482.2 -71.37% ± 0.63%
long256 plain 20 132,028 61,505 1,625.9 -53.41% ± 1.08%
long delta_bp 0 432,822 32,881 3,041.3 -92.40% ± 0.08%
long delta_bp 20 476,923 63,834 1,566.6 -86.62% ± 0.12%
long dict 1000 0 446,012 46,252 2,162.0 -89.63% ± 0.47%
long dict 1000 20 380,850 53,871 1,856.3 -85.86% ± 0.95%
long dict 100 0 401,225 55,817 1,791.6 -86.09% ± 0.86%
long dict 100 20 435,891 47,888 2,088.2 -89.01% ± 0.59%
long dict 10 0 444,497 43,183 2,315.7 -90.28% ± 0.64%
long dict 10 20 372,291 43,396 2,304.4 -88.34% ± 0.42%
long dict 256 0 383,830 42,174 2,371.1 -89.01% ± 0.58%
long dict 256 20 362,753 43,076 2,321.5 -88.13% ± 0.67%
long plain 0 9,706 10,088 9,912.7 +3.94% ± 3.57%
long plain 20 109,789 14,723 6,792.0 -86.59% ± 0.21%
short delta_bp 0 604,397 31,904 3,134.4 -94.72% ± 0.16%
short dict 1000 0 392,296 55,739 1,794.1 -85.79% ± 1.12%
short dict 1000 20 297,554 47,489 2,105.8 -84.04% ± 0.65%
short dict 100 0 476,567 51,239 1,951.6 -89.25% ± 1.17%
short dict 100 20 315,858 52,093 1,919.6 -83.51% ± 1.08%
short dict 10 0 332,836 50,613 1,975.8 -84.79% ± 0.44%
short dict 10 20 368,221 45,360 2,204.6 -87.68% ± 1.26%
short dict 256 0 367,668 34,535 2,895.6 -90.61% ± 0.41%
short dict 256 20 313,221 51,961 1,924.5 -83.41% ± 0.91%
short plain 0 166,834 3,322 30,105.9 -98.01% ± 0.06%
string delta_len 0 1,302,978 1,028,769 97.2 -21.04% ± 2.72%
string delta_len 20 1,048,197 891,060 112.2 -14.99% ± 1.80%
string plain 0 827,243 643,596 155.4 -22.20% ± 1.99%
string plain 20 702,703 573,463 174.4 -18.39% ± 1.94%
symbol dict 0 300,938 36,910 2,709.3 -87.74% ± 0.21%
symbol dict 20 272,354 38,294 2,611.4 -85.94% ± 0.52%
timestamp delta_bp 0 441,311 37,593 2,660.1 -91.48% ± 1.05%
timestamp delta_bp 20 494,362 69,732 1,434.1 -85.89% ± 0.38%
timestamp dict 1000 0 424,871 48,015 2,082.7 -88.70% ± 0.82%
timestamp dict 1000 20 431,323 46,542 2,148.6 -89.21% ± 0.80%
timestamp dict 100 0 379,408 51,073 1,958.0 -86.54% ± 0.66%
timestamp dict 100 20 387,867 49,104 2,036.5 -87.34% ± 0.79%
timestamp dict 10 0 353,380 42,609 2,346.9 -87.94% ± 0.29%
timestamp dict 10 20 399,984 45,093 2,217.6 -88.73% ± 0.47%
timestamp dict 256 0 397,179 41,690 2,398.7 -89.50% ± 0.69%
timestamp dict 256 20 393,792 36,533 2,737.3 -90.72% ± 0.44%
timestamp int96_dict 1000 0 446,371 64,686 1,545.9 -85.51% ± 0.20%
timestamp int96_dict 1000 20 419,823 64,478 1,550.9 -84.64% ± 1.54%
timestamp int96_dict 100 0 471,702 76,241 1,311.6 -83.84% ± 1.11%
timestamp int96_dict 100 20 418,980 60,006 1,666.5 -85.68% ± 0.55%
timestamp int96_dict 10 0 440,494 89,664 1,115.3 -79.64% ± 0.98%
timestamp int96_dict 10 20 413,822 59,628 1,677.1 -85.59% ± 0.57%
timestamp int96_dict 256 0 468,948 52,987 1,887.3 -88.70% ± 0.73%
timestamp int96_dict 256 20 418,506 49,424 2,023.3 -88.19% ± 0.33%
timestamp int96_plain 0 284,787 28,342 3,528.3 -90.05% ± 0.11%
timestamp int96_plain 20 273,188 42,800 2,336.5 -84.33% ± 1.45%
timestamp plain 0 9,910 10,835 9,229.8 +9.33% ± 9.33%
timestamp plain 20 107,857 15,945 6,271.4 -85.22% ± 0.31%
uuid plain 0 419,607 29,917 3,342.6 -92.87% ± 0.31%
uuid plain 20 322,229 33,022 3,028.3 -89.75% ± 0.38%
varchar delta_len 0 1,286,717 955,456 104.7 -25.74% ± 2.36%
varchar delta_len 20 1,012,126 786,820 127.1 -22.26% ± 2.11%
varchar plain 0 766,155 640,637 156.1 -16.38% ± 2.71%
varchar plain 20 648,563 507,611 197.0 -21.73% ± 1.78%

Clickbench

LZ4 - No JIT

Query Native (ms) Parquet (patch) (ms) Parquet (master) (ms)
q0 0.05 0.05 0.05
q1 7.46 11.13 32.72
q2 3.33 48.38 166.10
q3 8.24 29.27 41.29
q4 582.27 598.28 752.78
q5 236.61 426.64 405.08
q6 0.13 24.40 27.13
q7 46.73 39.71 70.52
q8 705.45 828.19 893.44
q9 974.27 1076.46 1306.47
q10 98.72 130.94 161.66
q11 100.14 160.07 189.64
q12 291.10 415.05 536.96
q13 423.31 617.65 631.93
q14 353.74 457.48 542.48
q15 627.46 688.69 739.17
q16 1314.12 1358.33 1641.19
q17 1323.06 1428.14 1500.67
q18 2155.66 1862.12 2326.58
q19 25.93 42.72 57.27
q20 324.15 1208.27 1205.13
q21 333.52 1216.86 1448.97
q22 268.88 1984.54 2699.37
q23 13.39 60.51 71.94
q24 1.97 3.65 2.50
q25 49.45 513.64 592.55
q26 1.93 3.01 3.21
q27 389.67 1432.94 1375.18
q28 2706.48 3262.49 3082.35
q29 1.84 27.48 65.07
q30 244.69 368.89 438.08
q31 244.16 383.39 485.58
q32 3347.58 3068.47 6290.27
q33 2013.55 3027.17 4268.67
q34 1587.28 3010.46 4345.07
q35 418.00 457.46 626.09
q36 51.79 937.83 1075.45
q37 51.98 755.10 839.10
q38 42.64 117.24 199.13
q39 68.16 186.15 228.63
q40 44.26 115.41 183.28
q41 40.82 100.58 176.53
q42 8.53 14.76 21.00

UNCOMPRESSED - JIT

JIT Enabled (raw):

Query Native (ms) Parquet (ms)
q0 0.05 0.05
q1 7.73 14.45
q2 3.46 23.13
q3 8.07 20.98
q4 473.92 745.61
q5 234.85 486.49
q6 0.13 34.24
q7 10.04 17.97
q8 726.16 978.19
q9 971.30 1450.31
q10 80.67 140.04
q11 82.14 153.96
q12 269.65 421.44
q13 392.52 583.65
q14 324.46 437.91
q15 588.61 728.95
q16 1410.78 1398.66
q17 1409.21 1566.13
q18 1932.34 2290.84
q19 9.70 30.00
q20 320.46 1179.17
q21 335.79 1202.78
q22 317.57 10922.80
q23 12.80 45.73
q24 1.16 2.17
q25 40.68 411.32
q26 1.15 2.58
q27 323.99 1358.67
q28 2199.80 2959.43
q29 1.73 7.93
q30 223.81 376.08
q31 235.53 418.54
q32 3253.44 4678.71
q33 1655.19 10661.52
q34 1591.12 11665.26
q35 443.15 612.40
q36 24.15 2250.11
q37 21.23 2107.05
q38 11.60 116.60
q39 41.49 190.49
q40 12.71 106.88
q41 12.03 94.94
q42 3.66 13.15

@coderabbitai bot commented Feb 9, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


RaphDal and others added 27 commits February 9, 2026 14:56
@RaphDal (Contributor, Author) commented Feb 24, 2026

@coderabbitai please review

@kafka1991 (Collaborator) commented Feb 26, 2026

@coderabbitai pls do a critical review

@coderabbitai bot commented Feb 26, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

I'll conduct a thorough critical review of this PR. Let me start by examining the code changes, particularly focusing on the areas with low test coverage and the bug fix mentioned.


🏁 Script executed:

#!/bin/bash
# Get the list of changed files in this PR
gh pr view 6759 --json files --jq '.files[].path' | head -20

Length of output: 757


🏁 Script executed:

#!/bin/bash
# Look at the delta_bitpacked decoder that has 0% coverage
fd -t f "decoder.rs" qdbr/parquet2/src/encoding/delta_bitpacked/ --exec cat {}

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Examine the bug fix around null sentinel values mentioned in PR description
rg -n "sentinel" --type rust -C 5

Length of output: 919


🏁 Script executed:

#!/bin/bash
# Check for unsafe code blocks in the modified decoders
rg -n "unsafe" qdbr/src/parquet_read/decoders/ --type rust -C 3

Length of output: 71


🏁 Script executed:

#!/bin/bash
# Look at the slice_reader with low coverage
cat qdbr/parquet2/src/read/page/slice_reader.rs

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Examine the test files to understand what's being tested
fd -t f "test" qdbr/src/parquet_read/decoders/ --exec basename {}

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Look at how null_count=0 optimization is implemented
rg -n "null_count.*0" qdbr/src/parquet_read/ --type rust -C 5

Length of output: 62


🏁 Script executed:

#!/bin/bash
# Fix paths - examine the delta_bitpacked decoder with 0% coverage
cat core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs

Length of output: 15334


🏁 Script executed:

#!/bin/bash
# Check for unsafe code blocks in the modified decoders
rg -n "unsafe" core/rust/qdbr/src/parquet_read/decoders/ --type rust -C 3

Length of output: 36574


🏁 Script executed:

#!/bin/bash
# Look at the slice_reader with low coverage
cat core/rust/qdbr/parquet2/src/read/page/slice_reader.rs

Length of output: 6224


🏁 Script executed:

#!/bin/bash
# Look at how null_count=0 optimization is implemented
rg -n "null_count" core/rust/qdbr/src/parquet_read/ --type rust -C 3 | head -80

Length of output: 4946


🏁 Script executed:

#!/bin/bash
# Examine the main decode.rs file which has 78% coverage - what's missing?
cat core/rust/qdbr/src/parquet_read/decode.rs | head -200

Length of output: 6223


🏁 Script executed:

#!/bin/bash
# Look at the benchmark implementation to understand what's being measured
cat core/rust/qdbr/benches/decode_page.rs | head -150

Length of output: 3790


🏁 Script executed:

#!/bin/bash
# Check the diff for the null sentinel bug fix
gh pr diff 6759 --patch | grep -A 10 -B 10 "sentinel\|null.*=" | head -50

Length of output: 143


🏁 Script executed:

#!/bin/bash
# Look at test files to understand test coverage
fd -t f "test" core/rust/qdbr/src/parquet_read/ -e rs | head -10

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search for tests related to delta_binary_packed
rg -n "mod tests" core/rust/qdbr/src/parquet_read/decoders/delta_binary_packed.rs -A 20

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check what null sentinel values are used
rg -n "null_value|NULL" core/rust/qdbr/src/parquet_read/decoders/ --type rust | grep -E "(const|let|fn|null_value)" | head -40

Length of output: 3106


🏁 Script executed:

#!/bin/bash
# Look at the delta_bitpacked batch decoder implementation more carefully for edge cases
cat core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs | grep -A 30 "decode_batch"

Length of output: 1231


🏁 Script executed:

#!/bin/bash
# Check if there are any overflow or boundary checks in decoders
rg -n "overflow|saturating|checked|wrapping" core/rust/qdbr/src/parquet_read/decoders/delta_binary_packed.rs -C 2

Length of output: 4582


🏁 Script executed:

#!/bin/bash
# Look for error handling in the new batch decoder
rg -n "Error|Result|unwrap" core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs -C 2 | head -80

Length of output: 2994


🏁 Script executed:

#!/bin/bash
# Check how the null_count optimization is used - potential for bugs if metadata is wrong
rg -n "null_count\(\)" core/rust/qdbr/src/parquet_read/decode.rs -B 3 -A 10 | head -50

Length of output: 427


@kafka1991, conducting a critical review as requested. I have serious concerns about this PR:

🚨 Critical: Zero Test Coverage on Core Performance Path

The delta_bitpacked decoder has 0% coverage with 65 new lines. This is unacceptable for code that's in the critical decode path. The new decode_batch method in core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs has zero tests despite:

  1. Complex state management across miniblocks
  2. Manual buffer management with a hardcoded 128-element stack array
  3. Wrapping arithmetic for delta encoding
  4. Multiple early returns and edge cases

Question: Why is there no test coverage here? The existing tests only cover the iterator interface, not the batch decoder.

🚨 Trusting Parquet Metadata Without Validation

Lines 2100-2103 in decode.rs:

let nc = page.header.null_count();
if nc == Some(0) {
    return Ok(None);
}

You're trusting parquet metadata to skip definition level decoding. What happens with:

  • Corrupted parquet files where null_count=0 but nulls exist?
  • Maliciously crafted files?
  • Files from buggy writers?

This optimization trades correctness for performance. Have you validated this against malformed inputs? Where are the fuzz tests?
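One defensive middle ground is to keep the fast path but spell out exactly when it is taken. A hypothetical sketch (the `PageStats` type and `can_skip_def_levels` function are invented for illustration, not the PR's code; note this still trusts the writer's `null_count`, so it formalizes rather than eliminates the trade-off):

```rust
/// Stand-in for the page-header statistics visible to the decoder.
struct PageStats {
    null_count: Option<usize>,
    max_def_level: i16,
}

/// Skip definition-level decoding only in the two unambiguous cases:
/// required columns carry no levels at all, and optional columns are
/// skipped only when metadata explicitly reports zero nulls. Any other
/// case (missing or nonzero null_count) decodes levels normally.
fn can_skip_def_levels(stats: &PageStats) -> bool {
    stats.max_def_level == 0 || stats.null_count == Some(0)
}
```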

🚨 Undocumented Bug Fix

The PR description mentions:

"Fixing bug with incorrect null sentinel values"

Where is this bug fix? I can't find:

  • What the incorrect values were
  • What they should be
  • Test cases that would have caught this
  • Verification the fix is complete

This is a data corruption bug and needs full documentation and regression tests.

⚠️ Unsafe Code with Fragile Assumptions

Lines 313-316, 334-337, 348-351, 358-363 in delta_binary_packed.rs and throughout other decoders:

unsafe {
    *self.buffers_ptr.add(self.buffers_offset) = self.null_value;
}

Every decoder relies on the comment:

"SAFETY: destination pointer stays in-bounds because decode paths reserve output upfront."

This assumes:

  1. Reservation calculations are always correct
  2. Definition levels match value counts perfectly
  3. Parquet metadata is accurate

What validates these assumptions? One off-by-one error and you have UB. Where are the debug assertions? Why not use safe slice indexing in debug builds?
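The debug-assertion pattern the review asks about is cheap to add. A minimal sketch, with invented names (`OutBuf`, `write_at`) standing in for the PR's actual buffer type:

```rust
/// Stand-in for a decoder's cached output pointer plus its reserved capacity.
struct OutBuf {
    ptr: *mut i64,
    cap: usize,
}

impl OutBuf {
    fn write_at(&mut self, offset: usize, value: i64) {
        // In release builds this compiles down to a single store; in debug
        // builds the assertion catches a reservation miscalculation before
        // it becomes undefined behaviour.
        debug_assert!(offset < self.cap, "decode wrote past reserved output");
        unsafe { *self.ptr.add(offset) = value; }
    }
}
```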

⚠️ Panic Points in Production

Line 286 in delta_bitpacked/decoder.rs:

let current_block = self.current_block.as_mut().unwrap();

Line 85:

Some(bitpacked::Decoder::try_new(miniblock, num_bits, length).unwrap())

Line 111, 144-145:

out[written + i] = self.min_delta + mb.next().unwrap_or_default() as i64;
// ...
.map(|x| x.next().unwrap_or_default())
.unwrap_or(0) as i64;

Multiple unwrap() calls in hot paths. Why are these safe? What prevents panics? unwrap_or_default() will silently return 0 on error - is that the intended behavior or are you hiding bugs?

⚠️ Wrapping Arithmetic Everywhere

Lines 327, 331, 391-396, 410-411, 431-436, 486-490 in delta_binary_packed.rs:

self.current_value = self.current_value.wrapping_add(...)

Why is wrapping arithmetic correct here? Delta encoding typically uses zigzag encoding for signed values. Document why overflow wrapping is semantically correct, or prove these operations can't overflow with valid inputs.
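For reference, the spec-level arithmetic can be sketched from scratch. In DELTA_BINARY_PACKED only the block header's `min_delta` is zigzag-encoded; the bit-packed per-value offsets are unsigned, and each value is the running sum of `min_delta + offset`. This is an illustration of that layout, not the PR's code:

```rust
/// Zigzag decode: maps 0, 1, 2, 3, ... back to 0, -1, 1, -2, ...
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

/// Rebuild values from the first value, the block's min_delta, and the
/// unsigned offsets unpacked from the miniblocks. Wrapping adds match the
/// two's-complement arithmetic a writer uses when encoding deltas.
fn reconstruct(first_value: i64, min_delta: i64, offsets: &[u64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(offsets.len() + 1);
    let mut value = first_value;
    out.push(value);
    for &off in offsets {
        value = value.wrapping_add(min_delta.wrapping_add(off as i64));
        out.push(value);
    }
    out
}
```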

⚠️ Magic Number: 128-Element Buffer

Line 237 in delta_bitpacked/decoder.rs:

let mut delta_buf = [0i64; 128];

Why 128? Is this:

  • Related to miniblock size?
  • A cache line optimization?
  • Arbitrary?

This should be a named constant with documentation. What's the performance impact of different sizes?

Missing Test Coverage for Edge Cases

I found no tests for:

  • ❌ Single-value pages
  • ❌ Empty miniblocks (bitwidth=0)
  • ❌ Miniblock boundary conditions
  • ❌ Maximum value counts (overflow scenarios)
  • ❌ Mismatched null_count metadata
  • ❌ Truncated page data
  • ❌ Invalid bitwidths (>64)

The benchmarks only test the happy path with well-formed data. Where's the adversarial testing?

Questionable: SliceReader at 16.67% Coverage

slice_reader.rs has 16.67% coverage with only 1 of 6 new lines covered. This code handles page header parsing and could be an attack vector. Why is this untested?

Performance Claims Need Validation

The benchmarks show impressive numbers, but:

  1. Jitter acknowledged: "there is a lot of jitter" - are results statistically significant?
  2. Apple M5: These are unreleased chips. Do you mean M4/M3?
  3. No error bars: the ± percentages are reported, but confidence intervals are missing
  4. Clickbench has regressions: q5 (native 236ms → parquet 426ms), q20-22 show large slowdowns
  5. Plain encoding regressions: Several plain-encoded types show slowdowns (date +6.34%, double +5.86%, float +4.65%, int +2.24%)

Challenge: Have you profiled where the remaining bottlenecks are? The string/varchar/array improvements are modest (-14% to -33%) compared to primitives (-85%+). Why?

Recommendations

Before merging, you must:

  1. Add comprehensive tests for delta_bitpacked batch decoder (target 80%+ coverage)
  2. Document and test the null sentinel bug fix with regression tests
  3. Add validation for the null_count optimization or document the security trade-off
  4. Add debug assertions for all unsafe pointer arithmetic
  5. Replace unwrap() with proper error propagation or document panic conditions
  6. Fuzz test the decoders with malformed parquet files
  7. Document wrapping arithmetic semantics
  8. Add edge case tests for all decoders
  9. Clarify the M5 benchmark environment
  10. Investigate Clickbench regressions (q5, q20-22)

This PR has impressive performance gains but sacrifices safety and correctness without adequate testing. The 74% overall coverage is insufficient for parser code handling untrusted input.

What's your plan to address these concerns?


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, javier prefers hard‑core, critical reviews that challenge design choices and edge cases; respond with deep, push-back style analyses for javier’s requests.

@coderabbitai bot commented Feb 26, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the full PR diff to analyze the changes
gh pr diff 6759 --repo questdb/questdb > /tmp/pr_diff.txt

# Show first part of the diff to understand the scope
head -n 500 /tmp/pr_diff.txt

# Get total line count
wc -l /tmp/pr_diff.txt

Length of output: 16326


🏁 Script executed:

#!/bin/bash
# Extract key decoder implementation files from the diff
echo "=== Delta Binary Packed Decoder ==="
sed -n '/^diff.*delta_binary_packed\/decoder.rs/,/^diff.*[^delta_binary_packed]/p' /tmp/pr_diff.txt | head -n 400

echo -e "\n=== Plain Decoder ==="
sed -n '/^diff.*decoders\/plain.rs/,/^diff/p' /tmp/pr_diff.txt | head -n 400

echo -e "\n=== RLE Dictionary Decoder ==="
sed -n '/^diff.*decoders\/rle_dictionary.rs/,/^diff/p' /tmp/pr_diff.txt | head -n 400

Length of output: 26789


🏁 Script executed:

#!/bin/bash
# Extract more decoder implementations and critical sections
echo "=== Delta Bitpacked Encoding Decoder (parquet2 library) ==="
sed -n '/^diff.*parquet2.*delta_bitpacked/,/^diff/p' /tmp/pr_diff.txt | head -n 500

echo -e "\n=== Decompression/Page Reader Changes ==="
sed -n '/^diff.*slice_reader.rs/,/^diff/p' /tmp/pr_diff.txt | head -n 300

echo -e "\n=== Main Decode Logic ==="
sed -n '/^diff.*parquet_read\/decode.rs/,/^diff/p' /tmp/pr_diff.txt | head -n 600

Length of output: 32314


🏁 Script executed:

#!/bin/bash
# Look for unsafe blocks and potential issues
echo "=== Count of unsafe blocks ==="
grep -c "unsafe" /tmp/pr_diff.txt

echo -e "\n=== Unsafe pointer operations in plain decoder ==="
grep -A5 -B5 "unsafe" /tmp/pr_diff.txt | grep -A5 -B5 "PlainPrimitiveDecoder" | head -n 100

echo -e "\n=== Unsafe operations in RLE dictionary ==="
grep -A5 -B5 "unsafe" /tmp/pr_diff.txt | grep -A5 -B5 "RleDictionary" | head -n 100

echo -e "\n=== Bounds checking patterns ==="
grep -E "(bounds|overflow|len\(\)|checked_|saturating_)" /tmp/pr_diff.txt | head -n 80

echo -e "\n=== Test files added ==="
grep -E "^\+.*#\[test\]|^\+.*mod tests" /tmp/pr_diff.txt | wc -l

Length of output: 4559


@kafka1991, I'll provide a critical review as requested. This PR has significant performance improvements but introduces substantial risk through extensive unsafe code with inadequate testing and questionable error handling patterns.

Critical Issues

1. Zero Test Coverage on Critical Performance Path ⚠️

The new delta_bitpacked/decoder.rs file has 0% coverage despite adding 65 lines including the performance-critical decode_batch() and decode_deltas_batch() methods. This is unacceptable for code that:

  • Uses complex state machines with invariants
  • Performs prefix-sum accumulation on untrusted input
  • Claims wrapping arithmetic is correct

Required: Comprehensive fuzzing and unit tests covering:

  • Empty blocks
  • Miniblock boundary conditions
  • Integer overflow in prefix sums
  • Malformed block headers
  • `values_remaining` edge cases

2. Unsafe Pointer Arithmetic Relies on "Trusted" Malicious Input

In PlainPrimitiveDecoder::push() and push_slice():

unsafe {
    *self.buffers_ptr.add(self.buffers_offset) = self
        .converter
        .convert(self.values.add(self.values_offset).read_unaligned());
}

The comment states: "We rely on trusted parquet metadata/level streams to keep source in-bounds". This is fundamentally broken - parquet files can be malicious or corrupted. You're trusting attacker-controlled metadata to prevent memory safety violations.

What happens when:

  • A malicious parquet file claims 1000 values but only provides 100 bytes?
  • Level streams are corrupted and values_offset exceeds buffer bounds?
  • Dictionary page promises N entries but contains fewer?

3. Silent Error Handling Creates Data Corruption Risk

RleDictionaryDecoder stores errors but continues writing to output buffers:

fn push(&mut self) -> ParquetResult<()> {
    if self.inner.error.is_some() {
        return Ok(());  // Returns "success" despite error!
    }
    // ... writes data ...
    if idx >= self.dict.len() {
        self.inner.error = Some(fmt_err!(...));
        return Ok(());  // Silently fails, may leave garbage in buffer
    }
}

This pattern means:

  1. Error occurs on row 1000 of 100K
  2. Decoder returns Ok(()) for remaining 99K rows
  3. Output buffer contains 1K valid + 99K uninitialized/garbage data
  4. Error only surfaces when someone calls result() - if they call it

Fix: Either fail-fast on errors OR explicitly document that callers MUST call result() and handle partial data correctly.
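A fail-fast shape for the dictionary-index check might look like the following; the types and names here are illustrative stand-ins, not the PR's actual fields:

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    IndexOutOfRange { idx: usize, dict_len: usize },
}

/// Resolve dictionary indices into output values, propagating the first
/// out-of-range index immediately instead of stashing the error and
/// continuing to report Ok(()) for the rest of the page.
fn push_indices(
    dict: &[i64],
    indices: &[usize],
    out: &mut Vec<i64>,
) -> Result<(), DecodeError> {
    for &idx in indices {
        let v = *dict.get(idx).ok_or(DecodeError::IndexOutOfRange {
            idx,
            dict_len: dict.len(),
        })?;
        out.push(v);
    }
    Ok(())
}
```

With this shape, callers cannot forget to check a deferred `result()`: the error surfaces at the point of failure, and the output buffer is valid up to the rows already pushed.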

4. Batch Decoder State Invariant Violation

In delta_bitpacked/decoder.rs, the decode_batch() method claims:

// Maintain the invariant: next_value must hold the next value to emit.
// Load the next delta so next_value is ready for subsequent calls.
if self.values_remaining > 0 {
    let delta = self.load_delta()?;
    self.next_value = self.next_value.wrapping_add(delta);
}

But this is after the main loop exits. If load_delta() fails here, what state is the decoder in? The invariant is broken - next_value doesn't contain the next value, and values_remaining > 0 but we can't emit anything. The next decode_batch() call will emit stale data.

5. Pointer Invalidation Not Guarded

Decoders cache raw pointers:

buffers_ptr: *mut U,

Only refreshed in reserve():

self.buffers_ptr = self.buffers.data_vec.as_mut_ptr().cast();

Problem: If ColumnChunkBuffers is ever passed to other code that might reallocate the vec (growing in another decoder, for example), this cached pointer becomes dangling. You're betting the entire memory safety on no interleaved operations touching the same buffer.

6. Dictionary Bounds Checking Insufficient

In RleDictionaryDecoder::push_slice(), byte indices fast path:

RleIterator::ByteIndices { data, pos } => {
    for i in 0..n {
        let idx = unsafe { *bytes.get_unchecked(i) } as u32;
        if idx >= dict_len {
            self.inner.error = Some(fmt_err!(...));
            return Ok(());  // Partial write completed!
        }
    }
}

You checked avail = data.len() - *pos and n = remaining.min(avail), but then use get_unchecked(i) where i < n. If *pos isn't at the start of bytes, bytes[i] != data[*pos + i]. The slicing appears incorrect.

7. Test Coverage on Other Critical Files Inadequate

  • row_groups.rs: 51.35% coverage
  • array.rs: 61.75% coverage
  • rle_dictionary.rs: 66.53% coverage

For decoder code handling untrusted input, anything below 90% is insufficient. You need:

  • Malformed input tests
  • Boundary condition tests
  • Error path coverage
  • Fuzzing integration

8. No Miri Validation Mentioned

With 132 unsafe blocks, you absolutely need Miri testing to detect undefined behavior like use-after-free or uninitialized memory reads. Is this PR miri-clean? Have you run it?

Design Challenges

Question: Why silent error accumulation instead of fail-fast? The pattern where decoders store errors and return Ok(()) is unprecedented in the codebase and creates subtle bugs. What's the justification?

Question: Why cache raw pointers instead of recalculating on each access? The microsecond saving isn't worth the memory safety risk of dangling pointers.

Question: What's the threat model? Are you assuming parquet files are always trusted? If so, document it. If not, every "trusted metadata" comment is a critical bug.

Required Before Merge

  1. Achieve >90% test coverage on all decoder files
  2. Add fuzzing for all decoders with malformed inputs
  3. Run Miri and fix all UB findings
  4. Add explicit bounds checks or prove mathematically why every pointer deref is in-bounds
  5. Document error handling: Must callers check result()? What's in output buffers on error?
  6. Fix state invariant in batch decoder error path
  7. Fix dictionary byte indices slicing bug

The performance gains are impressive, but this code is a memory safety minefield. Every unsafe block needs a proof comment, every pointer deref needs bounds justification, and every error path needs coverage.


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, javier prefers hard‑core, critical reviews that challenge design choices and edge cases; respond with deep, push-back style analyses for javier’s requests.

@kafka1991
Collaborator

@coderabbitai pls do a comprehensive review, appreciate your prompt response.

@coderabbitai

coderabbitai bot commented Feb 26, 2026


🧩 Analysis chain

@kafka1991, I'll conduct a comprehensive deep-dive review now. Let me examine the actual implementation files in detail.


🏁 Script executed:

#!/bin/bash
# Fetch the actual PR diff to analyze specific changes
gh pr diff 6759 --repo questdb/questdb > /tmp/pr_6759_diff.txt
echo "=== Diff size ==="
wc -l /tmp/pr_6759_diff.txt

# Get list of all changed files
echo -e "\n=== Changed files ==="
gh pr diff 6759 --repo questdb/questdb --name-only

Length of output: 2591


🏁 Script executed:

#!/bin/bash
# Examine the critical delta_bitpacked decoder with 0% coverage
echo "=== Delta Bitpacked Decoder (0% coverage) ==="
cat core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs

Length of output: 15380


🏁 Script executed:

#!/bin/bash
# Look at the main decode.rs file to understand null_count optimization
echo "=== Main decode.rs - null_count usage ==="
rg -n "null_count" core/rust/qdbr/src/parquet_read/decode.rs -B 5 -A 10

Length of output: 507


🏁 Script executed:

#!/bin/bash
# Examine unsafe code patterns in delta_binary_packed.rs
echo "=== Delta Binary Packed unsafe blocks ==="
cat core/rust/qdbr/src/parquet_read/decoders/delta_binary_packed.rs | head -500

Length of output: 18293


🏁 Script executed:

#!/bin/bash
# Look at plain decoder implementation
echo "=== Plain Decoder Implementation ==="
cat core/rust/qdbr/src/parquet_read/decoders/plain.rs

Length of output: 14671


🏁 Script executed:

#!/bin/bash
# Examine RLE dictionary decoder
echo "=== RLE Dictionary Decoder ==="
cat core/rust/qdbr/src/parquet_read/decoders/rle_dictionary.rs

Length of output: 49855


🏁 Script executed:

#!/bin/bash
# Check what tests exist for decoders
echo "=== Decoder test files ==="
fd -e rs test core/rust/qdbr/src/parquet_read/decoders/
fd -e rs . core/rust/qdbr/src/parquet_read/decoders/ --exec basename {}

echo -e "\n=== Test modules in decoder files ==="
rg "#\[cfg\(test\)\]|mod tests" core/rust/qdbr/src/parquet_read/decoders/ -n

Length of output: 580


🏁 Script executed:

#!/bin/bash
# Look at the benchmark file to understand what's being measured
echo "=== Benchmark implementation ==="
cat core/rust/qdbr/benches/decode_page.rs

Length of output: 62618


🏁 Script executed:

#!/bin/bash
# Check for the null sentinel bug fix mentioned in PR description
echo "=== Searching for null sentinel changes ==="
rg -n "null_value|sentinel|NULL_" core/rust/qdbr/src/parquet_read/decoders/ --type rust -C 3 | head -100

Length of output: 7971


🏁 Script executed:

#!/bin/bash
# Look at slice_reader with low coverage
echo "=== Slice Reader (16.67% coverage) ==="
cat core/rust/qdbr/parquet2/src/read/page/slice_reader.rs

Length of output: 6263


🏁 Script executed:

#!/bin/bash
# Check error handling patterns across decoders
echo "=== Error handling patterns ==="
rg -n "self\.error|self\.inner\.error" core/rust/qdbr/src/parquet_read/decoders/ --type rust -C 3 | head -150

Length of output: 10721


🏁 Script executed:

#!/bin/bash
# Look for any existing fuzz tests or property-based tests
echo "=== Fuzz/Property tests ==="
fd -e rs . core/rust/qdbr/ | xargs rg "proptest|quickcheck|arbitrary|fuzz" -l

Length of output: 70


🏁 Script executed:

#!/bin/bash
# Examine the unpack lookup table for booleans
echo "=== Boolean unpack implementation ==="
cat core/rust/qdbr/src/parquet_read/decoders/unpack.rs

Length of output: 4301


🏁 Script executed:

#!/bin/bash
# Check UUID byte swapping specialization
echo "=== UUID byte swapping ==="
rg -n "uuid|UUID" core/rust/qdbr/src/parquet_read/decoders/converters.rs -B 5 -A 10

Length of output: 1296


🏁 Script executed:

#!/bin/bash
# Look at the actual diff for decode.rs to see the null_count optimization changes
echo "=== decode.rs changes in diff ==="
sed -n '/^diff.*decode\.rs/,/^diff/p' /tmp/pr_6759_diff.txt | head -800

Length of output: 34550


@kafka1991, here's my comprehensive deep-dive review. I appreciate your patience.

🔴 CRITICAL: Zero Test Coverage on Batch Decoder

File: core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs (0% coverage, 65 new lines)

Lines 99-147: decode_batch() method

This is your primary performance optimization path, yet it has zero test coverage. Critical issues:

Line 111: out[written + i] = self.min_delta + mb.next().unwrap_or_default() as i64;

  • unwrap_or_default() silently returns 0 on miniblock exhaustion
  • Is this semantically correct or hiding bugs? Delta encoding shouldn't have "default" values

Lines 113-126: Batch delta accumulation

let mut written = 0;
while written < count && self.values_remaining > 0 {
    if let Some(ref mut block) = self.current_block {
        let batch_count = (count - written).min(self.values_remaining);
        let mut delta_buf = [0i64; 128];
  • Magic number 128: Why this size? Document relationship to miniblock/block structure
  • Stack allocation of 128 i64s (1KB) - is this safe for deeply nested call stacks?
  • What happens if values_remaining is corrupted in malicious files?

Lines 136-141: Wrapping arithmetic prefix sum

let mut acc = self.next_value;
for i in 0..decoded {
    acc = acc.wrapping_add(delta_buf[i]);
    out[written + i] = acc;
}
  • Why is wrapping semantically correct? Delta encoding with zigzag should never overflow with valid inputs
  • Document why overflow wrapping is acceptable, or prove it can't happen
  • Wrapping hides potential corruption in malicious files

Lines 150-161: State invariant maintenance

if self.values_remaining > 0 {
    let delta = self.load_delta()?;
    self.next_value = self.next_value.wrapping_add(delta);
}
  • CRITICAL: If load_delta() fails here (line 160), what state is decoder in?
  • Comment claims "next_value must hold the next value to emit" but it won't after error
  • Next decode_batch() call will emit stale/incorrect data

Required Actions:

  1. Add unit tests for: empty blocks, single value, miniblock boundaries, max values, truncated data
  2. Add property-based tests verifying: decode_batch() == repeated next()
  3. Document or eliminate wrapping arithmetic
  4. Fix error handling in invariant maintenance
  5. Replace magic number 128 with named constant
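A property test along these lines would pin down the batch/scalar equivalence. This is sketched against a plain prefix sum rather than the PR's decoder API, whose names and signatures are not reproduced here.

```rust
// Reference scalar path: accumulate one delta at a time.
fn prefix_sum_scalar(first: i64, deltas: &[i64]) -> Vec<i64> {
    let mut acc = first;
    let mut out = Vec::with_capacity(deltas.len());
    for &d in deltas {
        acc = acc.wrapping_add(d);
        out.push(acc);
    }
    out
}

// Batch path mirroring the decode_batch-style accumulation loop.
fn prefix_sum_batch(first: i64, deltas: &[i64]) -> Vec<i64> {
    let mut out = vec![0i64; deltas.len()];
    let mut acc = first;
    for (slot, &d) in out.iter_mut().zip(deltas) {
        acc = acc.wrapping_add(d);
        *slot = acc;
    }
    out
}
```

The invariant to assert is that both paths agree on every input, including deltas that wrap.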

🔴 CRITICAL: Trusting Untrusted Parquet Metadata

File: core/rust/qdbr/src/parquet_read/decode.rs

Lines 2100-2103: null_count optimization

let nc = page.header.null_count();
if nc == Some(0) {
    return Ok(None);
}

This is fundamentally broken for untrusted input. You're skipping definition level decoding based solely on parquet metadata. Consider:

  • Malicious file: Claims null_count=0, actually has nulls → reads uninitialized memory or wrong data
  • Corrupted file: Metadata corrupted → silent data corruption
  • Buggy writer: Incorrect metadata → wrong results

Similar issue throughout decoders: Comments like "SAFETY: destination pointer stays in-bounds because decode paths reserve output upfront" (plain.rs:164-167, delta_binary_packed.rs:313-316) trust that:

  1. Parquet value counts are accurate
  2. Definition levels match value counts
  3. Dictionary indices are in-bounds
  4. Page sizes are correct

Every one of these assumptions can be violated by malicious/corrupt files.

Challenge: What's your threat model? If you accept only trusted parquet files, document it prominently. If you accept arbitrary files, every unsafe block needs validation.

Required Actions:

  1. Document threat model (trusted vs untrusted sources)
  2. If untrusted: Add validation before skipping decoding based on metadata
  3. Add debug assertions checking all unsafe assumptions
  4. Add fuzz tests with malformed/malicious parquet files
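If untrusted input is in scope, one cheap cross-check is to verify the metadata claim against the definition levels before skipping them. The helper below is hypothetical, not code from the PR.

```rust
// Hypothetical validation: metadata says null_count == 0, so every definition
// level must equal max_def; any smaller level is an undeclared null.
fn verify_no_nulls(def_levels: &[u16], max_def: u16) -> Result<(), String> {
    match def_levels.iter().position(|&l| l < max_def) {
        Some(i) => Err(format!(
            "metadata claims null_count=0 but value {i} has def level below {max_def}"
        )),
        None => Ok(()),
    }
}
```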

🔴 CRITICAL: Where's the Null Sentinel Bug Fix?

PR description states:

"Fixing bug with incorrect null sentinel values"

I cannot find this bug fix in any of the code changes. Searched for:

  • Changes to null sentinel constants
  • Changes to null value initialization
  • Tests verifying null values
  • Documentation of what was wrong

This is a data corruption bug. Where are:

  1. The actual fix?
  2. Regression tests demonstrating the bug?
  3. Documentation of which null sentinels were wrong?
  4. Verification that all null sentinels are now correct?

Required Actions:

  1. Document what null values were incorrect
  2. Add regression tests
  3. Verify all null sentinel values across all types

🟡 HIGH: Silent Error Accumulation Anti-Pattern

File: core/rust/qdbr/src/parquet_read/decoders/rle_dictionary.rs

Lines 112-130: push() method

fn push(&mut self) -> ParquetResult<()> {
    if self.inner.error.is_some() {
        return Ok(());  // ❌ Returns success despite error!
    }
    // ...
    if idx >= self.dict.len() {
        self.inner.error = Some(fmt_err!(...));
        return Ok(());  // ❌ Silently fails, continues writing
    }
}

This pattern is dangerous:

  1. Error occurs on row 1000 of 100K
  2. Decoder returns Ok(()) for remaining 99K rows
  3. Output buffer contains mixed valid/garbage data
  4. Error only surfaces when someone calls result() - if they remember

The same pattern exists in:

  • rle.rs lines 184-186
  • plain.rs lines 86-94

Question: Why not fail-fast? The performance cost of checking error state is negligible compared to continuing to write potentially incorrect data.

Required Actions:

  1. Either: Fail-fast on first error (recommended)
  2. Or: Document that callers must check result() and how to handle partial data
  3. Add tests verifying error propagation doesn't produce partial garbage output
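The fail-fast shape recommended above could look like this, with illustrative names rather than the decoder's real signature:

```rust
// Fail-fast sketch: an out-of-range dictionary index returns Err immediately,
// so nothing is written after the first failure and no error state lingers.
fn push(dict: &[u64], idx: usize, out: &mut Vec<u64>) -> Result<(), String> {
    let v = *dict
        .get(idx)
        .ok_or_else(|| format!("dict index {idx} out of bounds (len {})", dict.len()))?;
    out.push(v);
    Ok(())
}
```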

🟡 HIGH: Pointer Invalidation Hazard

All decoder files using cached pointers (plain.rs:50, delta_binary_packed.rs:237, rle_dictionary.rs:101, etc.):

buffers_ptr: *mut T,

Refreshed only in reserve():

self.buffers_ptr = self.buffers.data_vec.as_mut_ptr().cast();

Problem: If ColumnChunkBuffers is ever reallocated elsewhere (e.g., another decoder operating on same buffer, external code), this cached pointer becomes dangling.

You're betting memory safety on:

  1. No interleaved operations touching the same buffer
  2. No external code reallocating the vec
  3. Perfect discipline in calling reserve() before every write

Challenge: Why cache pointers instead of recalculating? The nanosecond saved isn't worth the memory safety risk.

Required Actions:

  1. Add debug assertions validating cached pointer still matches vec pointer
  2. Or: Remove caching, recalculate pointer each time (safe)
  3. Run Miri on all tests to detect UB
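A debug assertion along these lines makes a stale cached pointer fail loudly in debug builds while costing nothing in release. The struct is a simplified stand-in, not the PR's ColumnChunkBuffers.

```rust
// Simplified sketch: cache the pointer, but assert in debug builds that it
// still matches the Vec's current backing allocation before every access.
struct Sink {
    buf: Vec<u64>,
    cached: *mut u64,
}

impl Sink {
    fn with_capacity(cap: usize) -> Self {
        let mut buf = Vec::with_capacity(cap);
        let cached = buf.as_mut_ptr();
        Sink { buf, cached }
    }

    fn write(&mut self, i: usize, v: u64) {
        debug_assert_eq!(self.cached, self.buf.as_mut_ptr(), "cached pointer is stale");
        assert!(i < self.buf.capacity(), "write past reserved capacity");
        // SAFETY: i is within the reserved capacity and the pointer was just validated.
        unsafe { self.cached.add(i).write(v) };
    }

    fn read(&self, i: usize) -> u64 {
        debug_assert_eq!(self.cached, self.buf.as_ptr() as *mut u64, "cached pointer is stale");
        // SAFETY: only called for slots previously written via `write`.
        unsafe { self.cached.add(i).read() }
    }
}
```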

🟡 MODERATE: Multiple unwrap() in Hot Paths

File: core/rust/qdbr/parquet2/src/encoding/delta_bitpacked/decoder.rs

  • Line 86: bitpacked::Decoder::try_new(miniblock, num_bits, length).unwrap()
  • Line 111: mb.next().unwrap_or_default()
  • Line 144-145: .next().unwrap_or_default()
  • Line 286: self.current_block.as_mut().unwrap()

For line 286: Comment says "At this point we must have at least one block" but what if:

  • Corrupted total_count in header?
  • Truncated page data?
  • Bug in state machine?

Panics in production are unacceptable, especially in parsers handling untrusted input.

Required Actions:

  1. Replace unwrap() with proper error propagation
  2. Or: Document invariants that make unwrap safe with debug assertions
  3. Add tests triggering each unwrap path with invalid data

🟡 MODERATE: Test Coverage Insufficient for Parsers

Coverage Report:

  • delta_bitpacked/decoder.rs: 0% (65 lines)
  • slice_reader.rs: 16.67% (6 lines)
  • row_groups.rs: 51.35%
  • array.rs: 61.75%
  • rle_dictionary.rs: 66.53% (but has extensive tests in reality)

For code handling untrusted input, <90% coverage is inadequate.

I found NO tests for:

  • ❌ Malformed headers (truncated, oversized, invalid encodings)
  • ❌ Mismatched metadata (null_count != actual nulls)
  • ❌ Dictionary index out of bounds
  • ❌ Invalid bitwidths (0, >64)
  • ❌ Truncated page data mid-miniblock
  • ❌ Empty pages, single-value pages
  • ❌ Overflow scenarios in delta accumulation

Note: rle_dictionary.rs actually has excellent test coverage starting line 357 (I see 30+ test functions covering many edge cases). This is the quality level all decoders need.

Required Actions:

  1. Achieve >90% coverage on all decoder files
  2. Add adversarial/fuzz tests for malformed input
  3. Property-based tests verifying decoder invariants
  4. Integration tests with real-world malicious parquet files

🟢 GOOD: Boolean Lookup Table Optimization

File: core/rust/qdbr/src/parquet_read/decoders/plain.rs lines 29-44

const BOOLEAN_BITMAP_LUT: [[u8; 8]; 256] = {
    let mut lut = [[0u8; 8]; 256];
    let mut i = 0u16;
    while i < 256 {
        let b = i as u8;
        lut[i as usize] = [
            b & 1, (b >> 1) & 1, ...
        ];
        i += 1;
    }
    lut
};

This is excellent - compile-time lookup table generation. Explains the 97-98% performance improvement for boolean decoding.
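A self-contained version of the idea, re-deriving the table here rather than quoting the PR's constant, shows why it is fast: each input byte becomes eight output bytes with one table lookup and one 8-byte copy.

```rust
// Compile-time LUT: entry b holds the 8 boolean bytes for packed byte b, LSB first.
const BOOLEAN_LUT: [[u8; 8]; 256] = {
    let mut lut = [[0u8; 8]; 256];
    let mut i = 0usize;
    while i < 256 {
        let mut j = 0;
        while j < 8 {
            lut[i][j] = ((i >> j) & 1) as u8;
            j += 1;
        }
        i += 1;
    }
    lut
};

fn unpack_booleans(packed: &[u8], out: &mut Vec<u8>) {
    for &b in packed {
        // One lookup + one 8-byte copy instead of eight shift/mask iterations.
        out.extend_from_slice(&BOOLEAN_LUT[b as usize]);
    }
}
```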


🟢 GOOD: UUID Byte Swapping

File: core/rust/qdbr/src/parquet_read/decoders/converters.rs lines 130-145

Properly reverses byte order from Parquet big-endian to QuestDB little-endian. Explains 92.87% improvement for UUID decoding.


🟢 GOOD: RLE Dictionary Test Coverage

File: core/rust/qdbr/src/parquet_read/decoders/rle_dictionary.rs lines 357+

Exemplary test coverage: 30+ test functions covering:

  • Basic operations (push, push_slice, skip)
  • Null handling
  • Zero bit width (single-value dictionary)
  • Boundary conditions
  • Mixed operations
  • Out-of-bounds indices
  • Large data sets
  • Different bit widths
  • Consistency checks

This is the standard all other decoders should meet.


📊 Benchmark Questions

  1. "Apple M5" - This chip doesn't exist yet (current is M4). Do you mean M3/M4?

  2. Jitter acknowledged but no statistical significance testing. Are the improvements real or noise?

  3. Clickbench regressions:

    • q5: native 236ms → parquet 426ms (+80%)
    • q20-22: native 320-270ms → parquet 1200-2000ms (+275-640%)

    What's causing these regressions? Are they expected?

  4. Plain encoding regressions for some types:

    • date/double/float/int: +2% to +9%

    Why are plain primitives slower?


Required Before Merge

Testing

  1. ✅ Achieve >90% test coverage on all decoder files
  2. ✅ Add comprehensive tests for delta_bitpacked batch decoder
  3. ✅ Add fuzz tests for all decoders with malformed input
  4. ✅ Property-based tests verifying decode invariants
  5. ✅ Run Miri on all tests, fix all UB findings

Documentation

  1. ✅ Document threat model (trusted vs untrusted input)
  2. ✅ Document or fix null sentinel bug mentioned in PR description
  3. ✅ Document why wrapping arithmetic is correct for delta encoding
  4. ✅ Document silent error handling pattern or fix it
  5. ✅ Add proof comments for all unsafe blocks

Code Quality

  1. ✅ Replace unwrap() with proper error handling or document safety
  2. ✅ Add validation for null_count optimization or document security trade-off
  3. ✅ Fix state invariant bug in delta_bitpacked decode_batch error path
  4. ✅ Add debug assertions for all pointer arithmetic
  5. ✅ Explain Clickbench regressions (q5, q20-22)

The performance gains are impressive (median -87%), but this PR sacrifices correctness and safety. The delta_bitpacked batch decoder with 0% coverage is going into production handling potentially malicious input with extensive unsafe code. That's unacceptable.

What's your plan to address these concerns? Happy to discuss the architectural questions (error handling pattern, wrapping arithmetic, null_count optimization) if you have justifications I'm missing.


🧠 Learnings used
  • kafka1991 and team expect rigorous, push-back style reviews
  • Always include line-cited findings and concrete follow-ups
  • Challenge design decisions and scrutinize edge cases
  • Don't accept superficial validation


@bluestreak01
Member

@RaphDal — reviewed the Rust changes in this PR. Each finding was verified against the code on the PR branch.

Bugs

1. PlainVarSlicer::next()/skip() — unchecked read_unaligned (slicer/mod.rs)

next() reads a u32 length via ptr::read_unaligned(self.data.as_ptr().add(self.pos)) without checking self.pos + 4 <= self.data.len(). On a truncated Parquet file, self.pos can exceed the slice, causing UB before the subsequent slice indexing would panic. Same issue in skip().

Fix: bounds-check before the read, e.g.:

if self.pos + size_of::<u32>() > self.data.len() {
    // return error
}
let len = u32::from_le_bytes(self.data[self.pos..self.pos + 4].try_into().unwrap()) as usize;
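Fleshed out into a runnable form with illustrative names, the bounds check plus read could look like:

```rust
// Bounds-checked little-endian u32 length read: errors on truncated input
// instead of reading out of bounds.
fn read_u32_len(data: &[u8], pos: &mut usize) -> Result<usize, String> {
    let end = pos
        .checked_add(4)
        .filter(|&e| e <= data.len())
        .ok_or_else(|| "truncated page: length prefix out of bounds".to_string())?;
    let len = u32::from_le_bytes(data[*pos..end].try_into().unwrap()) as usize;
    *pos = end;
    Ok(len)
}
```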

2. PlainPrimitiveDecoder — no source buffer bounds checking (plain.rs)

new_with() converts the values slice to a raw *const T and discards the length. push() and push_slice() read via self.values.add(self.values_offset).read_unaligned() without verifying values_offset is in bounds. If page metadata inflates num_values beyond what the buffer contains, this causes OOB reads. (PlainBooleanDecoder is fine — it has can_read_bits().)

Fix: store the values length and validate up front or in push paths.

3. Copy-paste error message (page.rs)

split_buffer_v1, has_def branch, first error says "rep levels" instead of "def levels":

Error::oos("The number of bytes declared in v1 rep levels is higher than the page size")
//                                              ^^^ should be "def levels"

Improvements

4. BaseVarDictDecoder uses Vec instead of AcVec (dictionary.rs)

dict_values: Vec<&[u8]> bypasses QdbAllocator memory tracking. Same pattern in DecodeContext's decompress buffers.

5. RleDictionaryDecoder — uninitialized memory on error paths (rle_dictionary.rs)

reserve() calls set_len() without initializing new bytes. On sticky error, push()/push_slice() return early leaving bytes uninitialized. RleBooleanDecoder handles this by zeroing on error (ptr::write_bytes(out, 0, count)) — RleDictionaryDecoder should do the same.
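The suggested mitigation, in safe-Rust form as a sketch rather than the PR's code:

```rust
// On a sticky error, zero the region that reserve() exposed via set_len(),
// mirroring RleBooleanDecoder's ptr::write_bytes(out, 0, count).
fn finish_on_error(out: &mut [u8], errored: bool) {
    if errored {
        out.fill(0); // no uninitialized bytes reach the caller
    }
}
```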

6. Duplicated BOOLEAN_BITMAP_LUT (plain.rs, rle.rs)

Byte-identical 256-entry LUT in both files. Could be extracted to a shared module.

7. Test coverage gaps

Good combinatorial coverage for single-column, single-row-group files. Missing:

  • Multi-row-group files
  • Multi-column files
  • All-nulls columns
  • Zero-row files

@bluestreak01
Member

@RaphDal The two unreachable!() calls in rle.rs (lines 336 and 373) on RleBooleanRun::None arms would panic if the decoder state machine ever reached them due to a bug or malformed parquet input. Would it be safer to return an error here instead, so that corrupt files produce a graceful error rather than a process-killing panic?

@RaphDal
Contributor Author

RaphDal commented Mar 1, 2026

Thanks for the review!
500c2f8 addresses most of this feedback (mostly by adding manual bounds checks).
A few improvements I did not make:

  • 4: This is unnecessary; QdbAllocator doesn't bring any clear advantage here. It may be useful for long-lived allocations (e.g. RowGroupBuffers), but otherwise it's pure overhead.
  • 5: If the decoder bails out early, the error will be consumed rather than the RowGroupBuffer.
  • 7: There are existing Java tests covering these cases; the new tests are here to validate the behavior of the decoders themselves.

@RaphDal The two unreachable!() calls in rle.rs (lines 336 and 373) on RleBooleanRun::None arms would panic if the decoder state machine ever reached them due to a bug or malformed parquet input. Would it be safer to return an error here instead, so that corrupt files produce a graceful error rather than a process-killing panic?

These branches are truly unreachable: run is only set to RleBooleanRun::None when ensure_run fails or there are no remaining runs to decode, and both cases prevent the match block from being evaluated:

if !self.ensure_run() { // 1st case - returns if it fails
    return;
}

let run_available = self.run.remaining().min(self.remaining_values);
if run_available == 0 { // 2nd case - iterate again if it fails
    self.run = RleBooleanRun::None;
    continue;
}
let take = count.min(run_available);

match &mut self.run {
    RleBooleanRun::Rle { remaining, .. } => {
        // Repeated run skip is just counter arithmetic.
        *remaining -= take;
    }
    RleBooleanRun::Bitpacked { byte_offset, bit_offset, remaining, .. } => {
        // Bitpacked run skip advances bit/byte cursors.
        Self::skip_bitpacked(byte_offset, bit_offset, take);
        *remaining -= take;
    }
    RleBooleanRun::None => unreachable!(),
}

Unless I'm mistaken, returning an error here would only add unnecessary overhead for the caller.

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 3364 / 4476 (75.16%)

file detail

| path | covered lines | new lines | coverage |
|---|---:|---:|---:|
| 🔵 qdbr/parquet2/src/read/page/slice_reader.rs | 1 | 6 | 16.67% |
| 🔵 qdbr/src/parquet_read/row_groups.rs | 267 | 520 | 51.35% |
| 🔵 qdbr/src/parquet_read/decode/array.rs | 339 | 549 | 61.75% |
| 🔵 qdbr/src/parquet_read/decoders/rle_dictionary.rs | 163 | 245 | 66.53% |
| 🔵 qdbr/src/parquet_read/slicer/mod.rs | 23 | 32 | 71.88% |
| 🔵 qdbr/src/parquet_read/page.rs | 51 | 69 | 73.91% |
| 🔵 qdbr/src/parquet_read/decoders/rle.rs | 209 | 282 | 74.11% |
| 🔵 qdbr/src/parquet_read/decoders/dictionary.rs | 91 | 120 | 75.83% |
| 🔵 qdbr/src/parquet_read/decode.rs | 851 | 1085 | 78.43% |
| 🔵 qdbr/src/parquet_read/decoders/delta_binary_packed.rs | 294 | 340 | 86.47% |
| 🔵 qdbr/src/parquet_read/decoders/plain.rs | 229 | 266 | 86.09% |
| 🔵 qdbr/src/parquet_read/decode/decimal.rs | 720 | 834 | 86.33% |
| 🔵 qdbr/src/parquet_read/decoders/unpack.rs | 47 | 49 | 95.92% |
| 🔵 qdbr/src/allocator.rs | 17 | 17 | 100.00% |
| 🔵 qdbr/src/parquet_read/column_sink/var.rs | 8 | 8 | 100.00% |
| 🔵 qdb-core/src/col_type.rs | 6 | 6 | 100.00% |
| 🔵 qdbr/parquet2/src/encoding/bitpacked/decode.rs | 6 | 6 | 100.00% |
| 🔵 qdbr/src/parquet_read/decoders/converters.rs | 31 | 31 | 100.00% |
| 🔵 qdbr/src/parquet_read/mod.rs | 6 | 6 | 100.00% |
| 🔵 qdbr/parquet2/src/page/mod.rs | 5 | 5 | 100.00% |
