feat(vortex-datafusion): struct scalar conversion + extension-over-struct scan #8453
HarukiMoriarty wants to merge 4 commits into
Signed-off-by: Nemo Yu <[email protected]>
c5f96ff to ffbbfdd
/// The struct fields of `dtype` if it is a struct, or of an extension type whose storage is
/// (eventually) a struct -- e.g. the native geo Point over `Struct<x, y>`.
fn struct_fields(dtype: &DType) -> Option<&StructFields> {
nit: this is only used once; can we just inline it?
Merging this PR will improve performance by 16.42%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | varbinview_large | 131.2 µs | 112.7 µs | +16.42% |
Comparing nemo/geo-q1 (46c1d5c) with develop (67a2b22)
Footnote: 10 benchmarks were skipped, so the baseline results were used instead.
DType::Struct(struct_fields, _) => {
    let scalar = self.as_struct();
    let (fields, arrays): (Vec<Field>, Vec<_>) = struct_fields
nit: might as well split this whole branch out into a separate function?
}
ScalarValue::Dictionary(_, v) => Scalar::from_df(v.as_ref()),
ScalarValue::Struct(array) => {
    let nullable = array.is_null(0);
The struct array here has a DataType that carries the nullability info; I think using that is much clearer.
OK, reading further down, this is not nullability; it is an actual check of whether the value is null.
I think you're using it for both. These should be two different things: one derived from the DataType and one from the data itself.
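To make the distinction in this thread concrete, here is a minimal, std-only sketch. Every name in it (`StructScalarSketch`, `struct_scalar_sketch`, `declared_nullable`, `row_is_null`) is hypothetical and stands in for the real Arrow and Vortex types; it is not the PR's code. The rule that widens the type when a null value arrives under a non-nullable declaration is just one possible policy, picked for illustration.

```rust
/// Hypothetical stand-in for a struct scalar built from a one-row struct array.
#[derive(Debug, Clone, PartialEq)]
struct StructScalarSketch {
    /// Type-level property: may values of this type be null at all?
    type_nullable: bool,
    /// Value-level property: `None` means this particular scalar is null.
    fields: Option<Vec<i64>>,
}

/// Builds the sketch from two independent inputs:
/// - `declared_nullable` would come from the type (the schema side),
/// - `row_is_null` would come from the data (for example a validity check on row 0).
fn struct_scalar_sketch(
    declared_nullable: bool,
    row_is_null: bool,
    row: &[i64],
) -> StructScalarSketch {
    StructScalarSketch {
        // Assumption for illustration only: a null value forces a nullable type.
        type_nullable: declared_nullable || row_is_null,
        // The value is null only when the data says so, whatever the type allows.
        fields: if row_is_null { None } else { Some(row.to_vec()) },
    }
}
```

Keeping the two inputs as separate parameters means a non-null value of a nullable type stays representable, which a single `nullable` flag derived from the data cannot express.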
.map(|column| {
    Scalar::from_df(
        &ScalarValue::try_from_array(&**column, 0).unwrap_or_else(|e| {
            unimplemented!(
This should return an error; `unimplemented!` semantically marks code that hasn't been implemented yet (and it panics).
from_df is infallible by design: fn from_df(value: &ScalarValue) -> Scalar returns a
Scalar, not a Result, so we can't return an Err here. For the (effectively unreachable)
"struct child can't convert" case, a panic is the only option.
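One common way to reconcile an infallible signature with the request for an error is a fallible core wrapped by an infallible entry point. The sketch below illustrates that pattern with std only. `SourceValue`, `TargetScalar`, `ConversionError`, `try_convert` and `convert` are all hypothetical names, not part of vortex or DataFusion, and nothing here claims the PR adopts this shape.

```rust
/// Hypothetical input value; `Unsupported` models a child that cannot be converted.
#[derive(Debug, PartialEq)]
enum SourceValue {
    Int(i64),
    Unsupported(&'static str),
}

/// Hypothetical converted scalar.
#[derive(Debug, PartialEq)]
struct TargetScalar(i64);

/// Hypothetical error type carrying a human-readable reason.
#[derive(Debug, PartialEq)]
struct ConversionError(String);

/// Fallible core: callers that can propagate errors use this directly.
fn try_convert(value: &SourceValue) -> Result<TargetScalar, ConversionError> {
    match value {
        SourceValue::Int(v) => Ok(TargetScalar(*v)),
        SourceValue::Unsupported(kind) => {
            Err(ConversionError(format!("cannot convert {}", kind)))
        }
    }
}

/// Infallible wrapper that keeps the existing signature shape.
/// It still panics on failure, but with an explicit message rather than
/// `unimplemented!`, which signals missing code instead of a failed conversion.
fn convert(value: &SourceValue) -> TargetScalar {
    try_convert(value)
        .unwrap_or_else(|e| panic!("scalar conversion failed: {}", e.0))
}
```

With this split, only the wrapper panics, and new call sites can opt into error handling through the `try_` variant without changing existing callers.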
.iter()
.map(|column| {
    Scalar::from_df(
        &ScalarValue::try_from_array(&**column, 0).unwrap_or_else(|e| {
`&**column` might be correct, but there's almost certainly a more idiomatic way to express that.
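For reference, when iterating a slice of `Arc<dyn Trait>` values, `&**column` and `column.as_ref()` produce the same `&dyn Trait`. The std-only sketch below shows both spellings. The `Column` trait, `IntColumn`, and the helper functions are hypothetical stand-ins for the array types in the diff, which are assumed here to be `Arc`-wrapped trait objects.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an array trait object.
trait Column {
    fn row_count(&self) -> usize;
}

/// Hypothetical concrete column.
struct IntColumn(Vec<i64>);

impl Column for IntColumn {
    fn row_count(&self) -> usize {
        self.0.len()
    }
}

/// Takes a plain trait-object reference, like an API that accepts `&dyn Array`.
fn column_len(column: &dyn Column) -> usize {
    column.row_count()
}

/// Explicit double dereference: `&Arc<dyn Column>` -> `Arc<dyn Column>` -> `dyn Column`.
fn lengths_explicit(columns: &[Arc<dyn Column>]) -> Vec<usize> {
    columns.iter().map(|column| column_len(&**column)).collect()
}

/// Same result via `Arc`'s `AsRef` implementation, which states the intent directly.
fn lengths_as_ref(columns: &[Arc<dyn Column>]) -> Vec<usize> {
    columns.iter().map(|column| column_len(column.as_ref())).collect()
}
```

Both functions borrow the same underlying value without cloning the `Arc`, so the choice between them is purely about readability.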
eaa9ca7 to f52a2c4
Summary
Testing
Add null and non-null struct scalar round-trip tests.