DictStrategy dict-fit probe ignores the configured compressor #8405

@tomsanbear

Description

What happened?

DictStrategy decides whether to apply a dictionary layout to a column by probe-compressing the column's first chunk and checking whether the cascade chose a dictionary encoding (compressed.is::<Dict>()). That probe is hardcoded to a stock BtrBlocksCompressor::default():

// vortex-layout/src/layouts/dict/writer.rs
let compressed = BtrBlocksCompressor::default().compress(&chunk, &mut exec_ctx)?;
!compressed.is::<Dict>()

So the dict-fit decision ignores the compressor the caller configured via WriteStrategyBuilder::with_btrblocks_builder / with_compressor. A caller who customizes the cascade — e.g. excluding a dictionary scheme — still gets a layout decision made by a compressor they didn't ask for.
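The difference between a hardcoded probe and a configured one can be sketched with a toy model. Everything below (`Encoding`, `Compressor`, `ToyCompressor`, the two `dict_fits_*` functions, and the 1/8 cardinality threshold) is hypothetical and stands in for the real Vortex types; it only illustrates the shape of the problem, not the actual implementation or fix.

```rust
/// Hypothetical stand-in for the encoding a compressor cascade picked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Dict,
    Plain,
}

/// Hypothetical stand-in for a configurable compressor (not Vortex's API).
trait Compressor {
    fn compress(&self, chunk: &[&str]) -> Encoding;
}

/// Toy cascade: picks Dict for low-cardinality chunks unless dict is excluded.
struct ToyCompressor {
    allow_dict: bool,
}

impl Default for ToyCompressor {
    fn default() -> Self {
        Self { allow_dict: true }
    }
}

impl Compressor for ToyCompressor {
    fn compress(&self, chunk: &[&str]) -> Encoding {
        let mut distinct: Vec<&str> = chunk.to_vec();
        distinct.sort_unstable();
        distinct.dedup();
        // Arbitrary threshold for this sketch: at most 1/8 distinct values.
        let low_cardinality = !chunk.is_empty() && distinct.len() * 8 <= chunk.len();
        if self.allow_dict && low_cardinality {
            Encoding::Dict
        } else {
            Encoding::Plain
        }
    }
}

/// Buggy shape: the probe builds its own default compressor,
/// so the caller's configuration never reaches the decision.
fn dict_fits_hardcoded(chunk: &[&str]) -> bool {
    ToyCompressor::default().compress(chunk) == Encoding::Dict
}

/// Intended shape: the probe uses whatever compressor the caller configured.
fn dict_fits_configured(compressor: &dyn Compressor, chunk: &[&str]) -> bool {
    compressor.compress(chunk) == Encoding::Dict
}
```

In this model, a caller who passes `ToyCompressor { allow_dict: false }` gets a "no dict" answer from `dict_fits_configured`, while `dict_fits_hardcoded` still reports that a dictionary fits for the same chunk.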

This is observable in-tree: vortex-cuda/gpu-scan-cli writes with BtrBlocksCompressorBuilder::default().only_cuda_compatible(), which deliberately excludes StringDictScheme / BinaryDictScheme / FSSTScheme. Because the probe ignores that, a vortex.dict layout still fires for low-cardinality string columns; the GPU scan then skips Dict fields (gpu-scan-cli/src/main.rs, if field.is::<Dict>() { continue; }), so those columns silently drop off the pure-GPU path the config was built to keep them on.

Steps to reproduce

  1. Build a low-cardinality string column (e.g. 32,768 rows cycling ["alpha","beta","gamma"]).
  2. Write it twice through SESSION.write_options().with_strategy(...):
    • A: WriteStrategyBuilder::default().build()
    • B: WriteStrategyBuilder::default().with_btrblocks_builder(BtrBlocksCompressorBuilder::default().exclude_schemes([StringDictScheme.id()])).build()
  3. Walk each file's layout tree (footer().layout()) for a node whose encoding_id() == "vortex.dict".
  4. Expected: A has a dict layout, B falls back (no dict layout — StringDict was excluded). Actual: both A and B contain a vortex.dict layout, because the probe uses the hardcoded default regardless of B's configuration.
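Steps 1 and 3 above can be sketched without depending on the Vortex API. `LayoutNode` here is a hypothetical stand-in for whatever `footer().layout()` returns, and `low_cardinality_column` and `contains_encoding` are illustrative helper names; a real reproduction would walk the actual layout type and compare its `encoding_id()` instead.

```rust
/// Hypothetical stand-in for a layout tree node; not Vortex's real layout type.
struct LayoutNode {
    encoding_id: String,
    children: Vec<LayoutNode>,
}

/// Step 1: a low-cardinality string column of `rows` values cycling three strings.
fn low_cardinality_column(rows: usize) -> Vec<&'static str> {
    ["alpha", "beta", "gamma"]
        .iter()
        .copied()
        .cycle()
        .take(rows)
        .collect()
}

/// Step 3: depth-first search for any node whose encoding id matches `id`.
fn contains_encoding(node: &LayoutNode, id: &str) -> bool {
    node.encoding_id == id
        || node
            .children
            .iter()
            .any(|child| contains_encoding(child, id))
}
```

With helpers like these, the check in step 4 amounts to asserting that `contains_encoding(&layout_a, "vortex.dict")` is true and that the same call on file B's layout is false.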

Environment

  • Vortex version: develop @ 9444d20ae (source-level logic bug; not release-specific)
  • Python/Java version: n/a
  • OS: n/a (platform-independent)

Additional context

I have a pull request/branch with this fix on it, will post shortly.

I ran into this myself while experimenting. I reproduced it, tracked it down, and found the fix using coding agents.
