feat: add array_exists with lambda expression support #3611
Open · andygrove wants to merge 3 commits into apache:main
Conversation
Add native support for `array_exists(arr, x -> predicate(x))` in SQL and DataFrame API. This is the first general-purpose lambda expression infrastructure, which can later be extended to support `array_filter`, `array_transform`, and `array_forall`. The lambda body is serialized as a regular expression tree where `NamedLambdaVariable` leaf nodes are serialized as `LambdaVariable` proto messages. On the Rust side, `ArrayExistsExpr` evaluates the lambda body vectorized over all elements in a single pass: it flattens list values, expands the batch with repeat indices, appends elements as a `__comet_lambda_var` column, evaluates once, and reduces per row with SQL three-valued logic semantics. Unsupported lambda bodies (e.g. containing UDFs) fall back to Spark. Closes apache#3149
- Remove unused element_type proto field from ArrayExists
- Add LargeListArray support via decompose_list helper
- Use column index instead of name for lambda variable lookup
- Add TimestampNTZType to supported element types
- Restore CometNamedLambdaVariable as standalone serde object
- Remove SQL-based Scala tests (covered by SQL file tests)
- Add DataFrame tests for decimal and date element types
- Add negative test for unsupported element type fallback
- Add multi-column batch Rust unit test
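The repeat indices used to expand the batch can be derived from the list offsets: row i is repeated once per element of its list, so that a `take` over each top-level column aligns every flattened element with the row it came from. A hypothetical sketch over plain offsets (the real code builds an Arrow index array and handles LargeListArray via the decompose_list helper):

```rust
/// Build repeat indices from list offsets: row i appears
/// (offsets[i + 1] - offsets[i]) times, so taking each batch column
/// at these indices aligns it with the flattened list elements.
fn repeat_indices(offsets: &[usize]) -> Vec<usize> {
    let mut out = Vec::new();
    for i in 0..offsets.len().saturating_sub(1) {
        let len = offsets[i + 1] - offsets[i];
        out.extend(std::iter::repeat(i).take(len));
    }
    out
}

fn main() {
    // Three rows with lists of length 2, 0, and 3.
    assert_eq!(repeat_indices(&[0, 2, 2, 5]), vec![0, 0, 2, 2, 2]);
    println!("ok");
}
```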
gstvg (Contributor) reviewed Mar 2, 2026, commenting on lines +159 to +163:
```rust
for (i, col) in batch.columns().iter().enumerate() {
    let expanded = take(col.as_ref(), &repeat_indices_array, None)?;
    expanded_columns.push(expanded);
    expanded_fields.push(Arc::new(batch.schema().field(i).clone()));
}
```
non-blocking: I believe this will also expand uncaptured columns (those not referenced in the lambda body).
To avoid that costly expansion, it is possible to:
- Use a NullArray, since its creation is O(1) regardless of length, or
- Include only the captured columns and the lambda variable in the batch, and rewrite the lambda body to adjust column indices, as done in http://github.com/apache/datafusion/pull/18329/changes#diff-ac23ff0fe78acd71875341026dd5907736e3e3f49e2c398a69e6b33cb6394ae8R92-R139
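The second suggestion amounts to projecting the batch down to the captured columns before expansion and remapping column references in the lambda body. A minimal sketch of that index remapping (hypothetical helper name; the linked DataFusion PR performs the equivalent rewrite on physical expressions):

```rust
use std::collections::HashMap;

/// Map old column indices (in the full batch) to new indices in a
/// reduced batch containing only the columns captured by the lambda.
/// Column references in the lambda body would then be rewritten
/// through this map before evaluation.
fn capture_projection(captured: &[usize]) -> HashMap<usize, usize> {
    captured
        .iter()
        .enumerate()
        .map(|(new_idx, &old_idx)| (old_idx, new_idx))
        .collect()
}

fn main() {
    // Lambda captures columns 3 and 1 of a wider batch.
    let proj = capture_projection(&[3, 1]);
    assert_eq!(proj[&3], 0); // old column 3 becomes column 0
    assert_eq!(proj[&1], 1); // old column 1 becomes column 1
    println!("ok");
}
```

Uncaptured columns are then never expanded at all, which sidesteps the cost the NullArray workaround only masks.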