
Conversation


@jleibs jleibs commented Jul 17, 2025

@jleibs jleibs requested a review from timsaucer July 17, 2025 20:47
@jleibs jleibs added sdk-python Python logging API dataplatform Rerun Data Platform integration include in changelog labels Jul 17, 2025

@ntjohnson1 ntjohnson1 left a comment


Can confirm we use `pyarrow.compute`, so importing it makes sense. Why did none of our type checkers, linters, etc. save us?


ntjohnson1 commented Jul 17, 2025

FYI, nothing saved us because pyarrow isn't typed:
apache/arrow#32609
(I also got a no-types warning from datafusion, so either that isn't packaged correctly or it was just my local setup.) We should consider using the third-party stubs.

Note: the datafusion error I saw was because rerun imports datafusion but we don't specify it in our dependencies 💥
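
For context on the failure mode discussed above, here is a minimal sketch of the general Python behavior that appears to be behind it: a submodule is not an attribute of its parent package until something imports it. The package `demo_pkg` and its `compute` submodule are hypothetical stand-ins built in a temporary directory; this is not pyarrow's actual layout, only an illustration of why code that reaches for `package.compute` can work or fail depending on what else has already been imported.

```python
import importlib
import pathlib
import sys
import tempfile


def submodule_visibility() -> tuple[bool, bool]:
    """Report whether `demo_pkg.compute` is reachable before and after an explicit import."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build a throwaway package (hypothetical, for illustration only).
        pkg_dir = pathlib.Path(tmp) / "demo_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # does NOT import .compute
        (pkg_dir / "compute.py").write_text("def add(a, b):\n    return a + b\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            # Importing the parent alone does not load the submodule.
            before = hasattr(pkg, "compute")
            # An explicit import binds the submodule onto the parent package.
            importlib.import_module("demo_pkg.compute")
            after = hasattr(pkg, "compute")
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.compute", None)
            sys.modules.pop("demo_pkg", None)
    return before, after


print(submodule_visibility())  # (False, True)
```

The second value only becomes `True` because of the explicit import, which is the same kind of dependency on import order that the linked issue describes for `.filter`. Without type information for the package, a static checker has no way to flag the missing import.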

@jleibs jleibs added this to the 0.24.1 (maybe) milestone Jul 18, 2025
@jleibs jleibs merged commit 02a8eaa into main Jul 18, 2025
45 of 48 checks passed
@jleibs jleibs deleted the jleibs/udf_missing_arrow_compute branch July 18, 2025 11:57
grtlr pushed a commit that referenced this pull request Jul 21, 2025
ntjohnson1 added a commit that referenced this pull request Jul 30, 2025
### Related
Our typing doesn't do what we want. We are running mypy in our base
environment, which doesn't have `rerun` installed, so we can't properly
check all of our transitive types. See
#10690 and
#10695 for a few footguns.

### What
Splits `py-lint` into two targets: `py-lint-non-sdk`, which now uses a
separate `.ini` file since we care about that differently, and
`py-lint-rerun`, which we still run in the default environment, so
coverage is imperfect but we catch more now.

I also went through and "fixed" the broken pieces. Somewhat haphazard
combination of casts, bug fixes, and typing updates.

NOTES:
1. `pixi run -e py py-lint-rerun` isn't integrated anywhere yet. Unclear
where in our CI stack this goes.
2. #10695 will help here but is not a prerequisite (however, the
combination of both may complicate things if we go the route of optional
dependencies and testing).
3. General question around `BoolArrayLike` and `FloatArrayLike`: for a
single value not wrapped in an array, is there a reason we don't take
the Python equivalent too?
emilk pushed a commit that referenced this pull request Aug 6, 2025
@emilk emilk mentioned this pull request Aug 7, 2025
9 tasks
Labels
dataplatform Rerun Data Platform integration include in changelog sdk-python Python logging API
Development

Successfully merging this pull request may close these issues.

Datafusion UDFs depend on arrow.compute being enabled/injected by .filter
2 participants