feat(vortex-datafusion): struct scalar conversion + extension-over-struct scan#8453

Open
HarukiMoriarty wants to merge 4 commits into develop from nemo/geo-q1

@HarukiMoriarty

Contributor

Summary

  1. DataFusion and Vortex can now exchange struct-shaped scalars.
  2. Scan can resolve columns whose type is an extension over a struct.

Testing

Added round-trip tests for null and non-null struct scalars.

@HarukiMoriarty HarukiMoriarty requested a review from a team June 16, 2026 15:45
Comment thread vortex-datafusion/src/convert/schema.rs Outdated

/// The struct fields of `dtype` if it is a struct, or of an extension type whose storage is
/// (eventually) a struct -- e.g. the native geo Point over `Struct<x, y>`.
fn struct_fields(dtype: &DType) -> Option<&StructFields> {

Contributor

nit - this is only used once, can we just inline it?
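
For readers following the thread, here is a minimal sketch of what "looking through" extension layers to reach struct fields could look like. The `DType` and `StructFields` types below are toy stand-ins invented for illustration; they are not the real Vortex types, and the actual helper in the PR may differ.

```rust
// Toy stand-ins for illustration only; NOT the real Vortex `DType` / `StructFields`.
#[derive(Debug, PartialEq)]
struct StructFields {
    names: Vec<String>,
}

#[derive(Debug)]
enum DType {
    Primitive,
    Struct(StructFields),
    // An extension type wraps a storage dtype, which may itself be an extension.
    Extension { id: String, storage: Box<DType> },
}

/// Returns the struct fields of `dtype`, looking through any number of
/// extension layers until a struct (or a non-struct storage type) is reached.
fn struct_fields(dtype: &DType) -> Option<&StructFields> {
    let mut current = dtype;
    loop {
        match current {
            DType::Struct(fields) => return Some(fields),
            DType::Extension { storage, .. } => current = storage.as_ref(),
            DType::Primitive => return None,
        }
    }
}
```

Whether this stays a named helper or is inlined at its single call site is the style question raised above; the logic is the same either way.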

@codspeed-hq

codspeed-hq Bot commented Jun 16, 2026

Merging this PR will improve performance by 16.42%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1544 untouched benchmarks
⏩ 10 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | varbinview_large | 131.2 µs | 112.7 µs | +16.42% |
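
The reported figure is numerically consistent with computing `BASE / HEAD - 1`. That is an inference from the two timings above, not a statement of CodSpeed's documented formula; the function name below is hypothetical.

```rust
/// Relative speedup in percent, assuming the figure is BASE / HEAD - 1.
/// (Assumption inferred from the reported numbers, not from CodSpeed docs.)
fn efficiency_percent(base_us: f64, head_us: f64) -> f64 {
    (base_us / head_us - 1.0) * 100.0
}
```

Calling `efficiency_percent(131.2, 112.7)` gives roughly 16.4, which matches the table after rounding.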


Comparing nemo/geo-q1 (46c1d5c) with develop (67a2b22)

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

Comment on lines +123 to +125
DType::Struct(struct_fields, _) => {
    let scalar = self.as_struct();
    let (fields, arrays): (Vec<Field>, Vec<_>) = struct_fields

@connortsui20 connortsui20 Jun 16, 2026

Member

nit: might as well split this whole branch out into a separate function?

}
ScalarValue::Dictionary(_, v) => Scalar::from_df(v.as_ref()),
ScalarValue::Struct(array) => {
    let nullable = array.is_null(0);

Contributor

the struct array here has a DataType with the nullability info, I think using it is much clearer.

Contributor

OK, reading further down, this is not nullability; it is an actual check of whether the value is null.

Contributor

I think you're using it for both. These should be two different things: one derived from the DataType and one from the data itself.
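
A minimal sketch of the distinction being asked for, using a simplified model. `StructColumn`, `StructScalar`, and `scalar_at` are hypothetical names invented here; none of this is the Arrow, DataFusion, or Vortex API. The point is only that "may be null" (a property of the type) and "is null" (a property of the value at a row) are read from different places.

```rust
// Hypothetical, simplified model; not the Arrow or Vortex API.
struct StructColumn {
    /// Declared by the type: may this column ever hold nulls?
    nullable: bool,
    /// Observed in the data: per-row validity (`true` means the row is valid).
    validity: Vec<bool>,
}

#[derive(Debug, PartialEq)]
struct StructScalar {
    /// Part of the scalar's dtype.
    nullable: bool,
    /// Part of the scalar's value.
    is_null: bool,
}

/// Builds a scalar for `row`, taking nullability from the declared type and
/// nullness from the data, rather than deriving both from one null check.
fn scalar_at(column: &StructColumn, row: usize) -> StructScalar {
    StructScalar {
        nullable: column.nullable,
        is_null: !column.validity[row],
    }
}
```

With a single flag, a non-null value drawn from a nullable column would be typed as non-nullable, which is the mismatch the comment above points at.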

.map(|column| {
    Scalar::from_df(
        &ScalarValue::try_from_array(&**column, 0).unwrap_or_else(|e| {
            unimplemented!(

Contributor

This should return an error. `unimplemented!` semantically marks code that hasn't been implemented yet (and it panics).

Contributor Author

from_df is infallible by design: `fn from_df(value: &ScalarValue) -> Scalar` returns a
`Scalar`, not a `Result`, so we can't return an `Err` here. For the (effectively unreachable)
"struct child can't convert" case, a panic is the only option.

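One possible way to reconcile the two positions, sketched with toy types: keep the infallible entry point, but have it delegate to a fallible core so the failure is a described error rather than `unimplemented!`. The name `try_from_df` and the `Value`, `Scalar`, and `ConvertError` types are hypothetical; the PR does not say such a function exists in the codebase.

```rust
use std::fmt;

// Toy stand-ins; not the DataFusion `ScalarValue` or Vortex `Scalar`.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Unsupported,
}

#[derive(Debug, PartialEq)]
struct Scalar(i64);

#[derive(Debug, PartialEq)]
struct ConvertError(String);

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Fallible core (hypothetical): callers that can handle failure use this.
fn try_from_df(value: &Value) -> Result<Scalar, ConvertError> {
    match value {
        Value::Int(v) => Ok(Scalar(*v)),
        Value::Unsupported => Err(ConvertError("unsupported scalar type".to_string())),
    }
}

/// Infallible wrapper with the existing signature shape: it still panics on
/// failure, but with an explicit conversion message instead of `unimplemented!`.
fn from_df(value: &Value) -> Scalar {
    try_from_df(value).unwrap_or_else(|e| panic!("scalar conversion failed: {e}"))
}
```

This keeps existing call sites unchanged while giving new code a path that propagates the error.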
.iter()
.map(|column| {
    Scalar::from_df(
        &ScalarValue::try_from_array(&**column, 0).unwrap_or_else(|e| {

Contributor

&**column - might be correct, but there's almost certainly a more idiomatic way to express that.
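
For context, a small standard-library-only sketch of the two spellings. The `Array` trait and `Ints` type are toy stand-ins, not the Arrow types; the assumption is only that `column` is a reference to an `Arc<dyn Trait>`, as the `&**column` in the diff suggests.

```rust
use std::sync::Arc;

// Toy trait and implementor standing in for a dynamically typed array.
trait Array {
    fn len(&self) -> usize;
}

struct Ints(Vec<i64>);

impl Array for Ints {
    fn len(&self) -> usize {
        self.0.len()
    }
}

fn takes_dyn(array: &dyn Array) -> usize {
    array.len()
}

/// Passes the first column to `takes_dyn` using both spellings.
fn lengths(columns: &[Arc<dyn Array>]) -> (usize, usize) {
    let column: &Arc<dyn Array> = &columns[0];
    // Explicit double deref and reborrow: &Arc<dyn Array> -> &dyn Array.
    let explicit = takes_dyn(&**column);
    // Same result via `AsRef`, which `Arc<T>` implements for `T`.
    let idiomatic = takes_dyn(column.as_ref());
    (explicit, idiomatic)
}
```

Both forms produce the same `&dyn Array`; `as_ref()` tends to read more clearly at call sites.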

@HarukiMoriarty HarukiMoriarty added the changelog/feature A new feature label Jun 16, 2026
@a10y a10y self-requested a review June 16, 2026 17:31
Signed-off-by: Nemo Yu <[email protected]>

Labels

changelog/feature A new feature

4 participants