feat: add FFI support for user defined functions #1145
+1
I checked the PyO3 docs to convince myself the `unsafe` blocks were good, and everything checks out.
result = [r.column(0) for r in result]
expected = [
    pa.array([3], type=pa.int64()),
    pa.array([3], type=pa.int64()),
Completely unrelated to this PR, but why isn't this `null` instead of 3?
> select 1 + null;
+-----------------+
| Int64(1) + NULL |
+-----------------+
| |
+-----------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
I'm more than open to the possibility that the group by behaves subtly differently from normal addition, but it caught me by surprise because I expected to see `null` there. Poking a bit, none of the other aggregates I thought of off the top of my head return `null`, so I guess it's at least consistent.
Anywho, just a curiosity while I was reviewing.
I'm just reusing the `Sum` aggregate function, not writing my own. And by default `Sum` will skip null values in an aggregation. Sum and addition are actually different in this way!
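To make the difference concrete, here is a minimal sketch using SQLite through Python's standard library. SQLite is an assumption chosen only because it needs no install; it is not DataFusion, and the table `t` and column `x` are made up for the illustration. It shows addition propagating a null while the aggregate skips it.

```python
import sqlite3

# In-memory SQLite database; hypothetical table t with a single column x.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (2,)])

# Scalar addition: any NULL operand makes the result NULL.
added = conn.execute("SELECT 1 + NULL").fetchone()[0]

# Aggregation: SUM ignores NULL inputs and adds up the rest.
summed = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]

# SUM only yields NULL when it has no non-NULL input at all.
all_null = conn.execute("SELECT SUM(x) FROM t WHERE x IS NULL").fetchone()[0]

print(added, summed, all_null)
```

Here `added` and `all_null` come back as `None` and `summed` is 3, which is the same shape as the test expectation being discussed.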
Oh, absolutely! I was just wearing my review hat while reading the test so was predicting that it’d be null and then got surprised so I noted it.
And, “aha!” on sum skipping null values, so thanks for that! I just have the “math on nulls returns nulls” in my brain so knowing that behavior isn’t consistent is a TIL.
Wat? That was supposed to be an approval.
+1 for real this time.
Ok, that's twice now so I assume that's a config thing on this repo.
Which issue does this PR close?
Closes #1017
Rationale for this change
Now that we have user-defined scalar, aggregate, and window functions in the upstream `datafusion` 48.0.0, we can add support in `datafusion-python`. This allows for greater code reuse and extends the available options for exposing Rust-implemented functions to Python.
What changes are included in this PR?
For all three types of functions, add PyCapsule support in the datafusion-python libraries, and add the supporting methods to initialize these as UDFs that can be registered with the session context.
Integration tests are included.
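For readers unfamiliar with the mechanism, the following is a rough sketch of the general PyCapsule hand-off pattern, not the code in this PR. Everything named here is hypothetical: the capsule name `example_udf`, the `__example_udf__` method, and the `Producer` and `consume` helpers. It reaches CPython's capsule functions through `ctypes` purely so the example runs without a compiled extension; a real producer would be a Rust or C library.

```python
import ctypes

# CPython C-API capsule functions, accessed via ctypes for illustration only.
_new = ctypes.pythonapi.PyCapsule_New
_new.restype = ctypes.py_object
_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_is_valid.restype = ctypes.c_int
_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
_get_pointer.restype = ctypes.c_void_p
_get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical capsule name; producer and consumer must agree on it.
CAPSULE_NAME = b"example_udf"


class Producer:
    """Stands in for a foreign library that owns some native data."""

    def __init__(self, value):
        # The producer keeps the payload alive; the capsule only borrows it.
        self._payload = ctypes.c_int64(value)

    def __example_udf__(self):
        # Wrap the raw address in a named capsule (no destructor).
        return _new(ctypes.addressof(self._payload), CAPSULE_NAME, None)


def consume(obj):
    """Stands in for the receiving library: detect, validate, unwrap."""
    if not hasattr(obj, "__example_udf__"):
        raise TypeError("object does not export the expected capsule")
    capsule = obj.__example_udf__()
    if not _is_valid(capsule, CAPSULE_NAME):
        raise ValueError("capsule has an unexpected name")
    ptr = _get_pointer(capsule, CAPSULE_NAME)
    return ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int64)).contents.value


value = consume(Producer(42))
```

The name check is what lets the receiving side reject capsules that wrap some other kind of pointer before it dereferences anything.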
Are there any user-facing changes?
This change is purely additive. Existing code is not affected.