Thanks to visit codestin.com
Credit goes to github.com

Skip to content

feat: add FFI support for user defined functions #1145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

timsaucer
Copy link
Contributor

@timsaucer timsaucer commented Jun 9, 2025

Which issue does this PR close?

Closes #1017

Rationale for this change

Now that we have user defined scalar, aggregate, and window functions in the upstream datafusion 48.0.0, we can add support in datafusion-python. This allows for greater code reuse and extends the available options for Rust implementation of functions to be exposed to python.

What changes are included in this PR?

For all three types of functions, add pycapsule support in the datafusion-python libraries and add the supporting methods to initialize these as UDFs to register with the session context.

Integration tests are included.

Are there any user-facing changes?

This is only addition. All previous code is not impacted.

@timsaucer timsaucer self-assigned this Jun 17, 2025
@timsaucer timsaucer marked this pull request as ready for review June 17, 2025 10:36
@timsaucer timsaucer changed the title Draft: feat: Add FFI support for user defined functions feat: add FFI support for user defined functions Jun 17, 2025
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I checked the PyO3 docs to convince myself the unsafe blocks were good and everything checks out.

result = [r.column(0) for r in result]
expected = [
pa.array([3], type=pa.int64()),
pa.array([3], type=pa.int64()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely unrelated to this PR, but why isn't this null instead of 3?

> select 1 + null;
+-----------------+
| Int64(1) + NULL |
+-----------------+
|                 |
+-----------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

I'm more than open to the possibility that the group by has a subtly different behavior than normal addition, but it just caught me by surprise because I expected to see null there. Poking a bit, none of the other aggregates I thought of off the top of my head return null so I guess its at least consistent.

Anywho, just a curiosity while I was reviewing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just reusing the Sum aggregate function, not writing my own. And by default Sum will skip null values in an aggregation. Sum and addition are actually different in this way!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, absolutely! I was just wearing my review hat while reading the test so was predicting that it’d be null and then got surprised so I noted it.

And, “aha!” on sum skipping null values, so thanks for that! I just have the “math on nulls returns nulls” in my brain so knowing that behavior isn’t consistent is a TIL.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wat? That was supposed to be an approval.

+1 for real this time.

@davisp
Copy link
Member

davisp commented Jun 19, 2025

Ok, that's twice now so I assume that's a config thing on this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it easier to use rust DataFusion UDFs in datafusion-python
2 participants