Add `rerun-sdk[datafusion]` and `rerun-sdk[all]` #10696

ntjohnson1 · 2025-07-18T00:32:53Z

What

This adds the datafusion version we specify in our examples as a lower bound, to make sure datafusion gets installed with rerun before we use it. I added a simple smoke test to reproduce the issue, then verified that it passed with the install specification.

Right now we specify some optional dependencies for our package, but we don't handle them very carefully, so importing certain parts of the package can blow up. Taking inspiration from pandas, I deferred the imports, so we should be able to import anything and only hit errors when we ACTUALLY use something that needs the dependency. This also makes it clearer how to resolve missing dependencies: it wasn't obvious to me that our notebook was already set up as an optional dependency, and `python -c "import rerun.notebook"` just errors.
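A minimal sketch of the deferred-import idea, in the spirit of pandas' internal `import_optional_dependency` helper. The names `import_optional_dependency`, `MissingOptionalDependency`, and `load_notebook_support` are hypothetical, as is the assumption that the notebook extra is backed by a `rerun_notebook` module. The point is that the import only runs when a feature is actually used, and a missing package produces an error that names the extra to install.

```python
import importlib
from types import ModuleType


class MissingOptionalDependency(ImportError):
    """Hypothetical error type that carries an install hint."""


def import_optional_dependency(name: str, extra: str) -> ModuleType:
    """Import `name` on demand, or raise an error naming the extra to install."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        # Only translate the error when it is about `name` (or its top-level
        # package); other missing modules are re-raised unchanged.
        if err.name not in (name, name.split(".")[0]):
            raise
        raise MissingOptionalDependency(
            f"Missing optional dependency '{name}'. "
            f"Install it with: pip install 'rerun-sdk[{extra}]'"
        ) from err


def load_notebook_support() -> ModuleType:
    # Hypothetical call site: the import is deferred into the function body,
    # so importing the enclosing module never fails on its own.
    return import_optional_dependency("rerun_notebook", extra="notebook")
```

Because `MissingOptionalDependency` subclasses `ImportError`, existing `except ImportError` handlers keep working.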

This adds `rerun-sdk[datafusion]` for the datafusion dependencies and `rerun-sdk[all]` to get all of the non-testing features.

If we like this approach, we should be able to update the few internal places where we duplicate the notebook check so that they use this global check instead.
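One way such a global check could look, as a sketch: a single table mapping each extra to the module it needs, plus a cached helper that asks `importlib.util.find_spec` whether that module is installed, without importing it. `OPTIONAL_MODULES` and `has_feature` are hypothetical names, and the extra-to-module mapping is an assumption for illustration.

```python
import importlib.util
from functools import lru_cache
from typing import Dict

# Hypothetical mapping from an extra's name to the top-level module it provides.
OPTIONAL_MODULES: Dict[str, str] = {
    "datafusion": "datafusion",
    "notebook": "rerun_notebook",
}


@lru_cache(maxsize=None)
def has_feature(extra: str) -> bool:
    """Return True if the module behind `extra` is installed (it is not imported)."""
    module = OPTIONAL_MODULES.get(extra)
    if module is None:
        raise KeyError(f"Unknown optional feature: {extra!r}")
    return importlib.util.find_spec(module) is not None
```

Each duplicated notebook check could then become a call like `has_feature("notebook")`. The cache means a package installed mid-session is not picked up until `has_feature.cache_clear()` is called.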

github-actions · 2025-07-18T00:33:20Z

Latest documentation preview deployed successfully.

| Result | Commit | Link |
| --- | --- | --- |
| ✅ | `3a34bec` | https://landing-5rgn0tp4f-rerun.vercel.app/docs |

Note: This comment is updated whenever you push a commit.

github-actions · 2025-07-18T00:33:47Z

Web viewer built successfully. If applicable, you should also test it:

I have tested the web viewer

| Result | Commit | Link | Manifest |
| --- | --- | --- | --- |
| ✅ | `3a34bec` | https://rerun.io/viewer/pr/10696 | `+nightly` `+main` |

Note: This comment is updated whenever you push a commit.

timsaucer · 2025-07-18T14:55:40Z

rerun_py/pyproject.toml

  "pillow>=8.0.0",          # Used for JPEG encoding. 8.0.0 added the `format` arguments to `Image.open`
  "pyarrow>=18.0.0",
  "typing_extensions>=4.5", # Used for PEP-702 deprecated decorator
+  "datafusion>=45.0.0",


For now, the best practice is to keep the datafusion-python version the same as the DataFusion Rust version we use for its FFI crate. They are currently cross-version compatible, but the most recent release included some breaking changes.

That does mean we'd want users to be on 47.0.0.
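To make the "same version" rule concrete, here is a sketch of a runtime check that compares the installed datafusion-python version with the version the Rust FFI crate was built against. It assumes that matching major versions is sufficient and hardcodes 47 from the comment above; the constant and function names are hypothetical. `importlib.metadata.version` is the standard-library way to read an installed distribution's version.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Assumption: the Rust FFI crate was built against DataFusion 47.
EXPECTED_DATAFUSION_MAJOR = 47


def major_version(version_string: str) -> int:
    """Return the major component of a release string such as '47.0.0'."""
    return int(version_string.split(".", 1)[0])


def datafusion_version_matches(installed: str, expected_major: int = EXPECTED_DATAFUSION_MAJOR) -> bool:
    """True if `installed` has the same major version as the FFI crate."""
    return major_version(installed) == expected_major


def check_installed_datafusion() -> bool | None:
    """None if datafusion is not installed, else whether its version matches."""
    try:
        installed = version("datafusion")
    except PackageNotFoundError:
        return None
    return datafusion_version_matches(installed)
```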

More generally, I don't think we want this as a required dependency of rerun, because it means any of our open-source viewer users who don't use datafusion would have to pull in this fairly large package. This is why it hasn't been added as a dependency before.

I know some packages support things like `pip install rerun-sdk[datafusion]` to pull in additional dependencies, but I am not sure how that is done.
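For reference on how extras work: with a PEP 621 `pyproject.toml`, they are declared under `[project.optional-dependencies]`, and the build backend records each entry in the wheel metadata as a `Requires-Dist` line carrying an `extra == "..."` environment marker; `pip install rerun-sdk[datafusion]` then also installs the entries tagged with that extra. The sketch below groups such requirement strings by extra. The sample strings are hypothetical, and the regex is only for illustration; the `packaging` library is the proper tool for parsing PEP 508 markers. On an installed package, `importlib.metadata.requires("rerun-sdk")` returns the real list (or `None` if there are no requirements).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches an extras marker such as `extra == "datafusion"` (single or double quotes).
EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements: List[str]) -> Dict[str, List[str]]:
    """Group Requires-Dist strings by extra; unmarked ones go under '<base>'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else "<base>"].append(spec.strip())
    return dict(groups)


# Hypothetical metadata, shaped like the output of importlib.metadata.requires().
sample = [
    "pyarrow>=18.0.0",
    'datafusion>=47.0.0 ; extra == "datafusion"',
    'rerun-notebook ; extra == "notebook"',
]
print(group_by_extra(sample))
```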

Gotcha, I'll look at adding it as an optional dependency and hiding the component when it isn't installed. I'm sure it's just a maturin flag. Apparently we have a similar issue with our notebook, so I'm doing a larger cleanup of some of the other things this turned up.


abey79

lgtm, thanks!

abey79 · 2025-07-24T11:12:47Z

rerun_py/rerun_sdk/rerun/utilities/datafusion/functions/url_generation.py

+from rerun.error_utils import RerunOptionalDependencyError
+
+HAS_DATAFUSION = True
+try:
+    from datafusion import Expr, ScalarUDF, col, udf
+except ModuleNotFoundError:
+    HAS_DATAFUSION = False
+


Is the type checker able to detect if a new method accesses, e.g., `Expr` without first checking `HAS_DATAFUSION`?

Good question. Our type checking is SO BROKEN right now (#10704). Maybe I'll merge this and then pull main into that branch to check.
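The runtime side of the guard in this diff can be sketched as follows: the module-level `try`/`except` records whether datafusion is importable, and a decorator raises a descriptive error when a function that needs it is called. `OptionalDependencyError` is a local stand-in because the signature of rerun's `RerunOptionalDependencyError` isn't shown here, and `requires` and `make_url_udf` are hypothetical names.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class OptionalDependencyError(ImportError):
    """Stand-in for an error that names the missing package and the extra to install."""

    def __init__(self, package: str, extra: str) -> None:
        super().__init__(f"'{package}' is required here. Install it with: pip install 'rerun-sdk[{extra}]'")
        self.package = package
        self.extra = extra


# Same shape as the diff: record availability once, at import time.
HAS_DATAFUSION = True
try:
    import datafusion  # noqa: F401
except ModuleNotFoundError:
    HAS_DATAFUSION = False


def requires(available: bool, package: str, extra: str) -> Callable[[F], F]:
    """Decorator that raises OptionalDependencyError at call time when `available` is False."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not available:
                raise OptionalDependencyError(package, extra)
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@requires(HAS_DATAFUSION, "datafusion", "datafusion")
def make_url_udf() -> str:
    # Hypothetical body; real code would build a datafusion ScalarUDF here.
    return "udf built"
```

Centralizing the check in one decorator means a new function only has to opt in with `@requires(...)` rather than repeating an `if not HAS_DATAFUSION` test in its body.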


Closes #10695

~~This adds the datafusion version we specify in our examples as a lower bound, to make sure datafusion gets installed with rerun before we use it. I added a simple smoke test to reproduce the issue, then verified that it passed with the install specification.~~

Right now we specify some optional dependencies for our package, but we don't handle them very carefully, so importing certain parts of the package can blow up. Taking inspiration from pandas, I deferred the imports, so we should be able to import anything and only hit errors when we ACTUALLY use something that needs the dependency. This also makes it clearer how to resolve missing dependencies: it wasn't obvious to me that our notebook was already set up as an optional dependency, and `python -c "import rerun.notebook"` just errors.

Adds `rerun-sdk[datafusion]` for the datafusion dependencies and `rerun-sdk[all]` to get all of the non-testing features.

If we like this approach, we should be able to update the few [internal places](https://github.com/rerun-io/rerun/blob/7484e03f9a98341114c30abad49895258288df76/rerun_py/rerun_sdk/rerun/recording_stream.py#L900) where we duplicate the notebook check so that they use this global check instead.

ntjohnson1 added 2 commits July 17, 2025 20:29

Add smoke test

9b2c986

Fix failing test with dependency specification

c763f06

ntjohnson1 added the `exclude from changelog` label Jul 18, 2025

timsaucer reviewed Jul 18, 2025


ntjohnson1 added 3 commits July 18, 2025 15:52

Clear up some ambiguity in optional dependencies

ad1db00

Some typing issues found me

aceb0d4

Fix error string mixup

b666d64

ntjohnson1 changed the title ~~Require Datafusion Since We Import It~~ Make Optional Dependencies Clearer Jul 18, 2025

ntjohnson1 mentioned this pull request Jul 18, 2025

Try to Fix Python Typing #10704

Merged

Merge branch 'main' of github.com:rerun-io/rerun into nick/datafusion…

c8ca5e7

…_import

ntjohnson1 mentioned this pull request Jul 23, 2025

Make Datafusion dependency explicit #10695

Closed

ntjohnson1 requested a review from abey79 July 24, 2025 09:19

abey79 approved these changes Jul 24, 2025


Merge branch 'main' of github.com:rerun-io/rerun into nick/datafusion…

3a34bec

…_import

ntjohnson1 added this to the 0.24.1 (maybe) milestone Jul 24, 2025

ntjohnson1 merged commit c5e2c66 into main Jul 24, 2025
47 checks passed

ntjohnson1 deleted the nick/datafusion_import branch July 24, 2025 12:14

emilk changed the title ~~Make Optional Dependencies Clearer~~ Carify optional rerun-sdk dependencies Aug 6, 2025

emilk changed the title ~~Carify optional rerun-sdk dependencies~~ Clarify optional rerun-sdk dependencies Aug 6, 2025

emilk added the sdk-python Python logging API label Aug 6, 2025

emilk changed the title ~~Clarify optional rerun-sdk dependencies~~ Add rerun-sdk[datafusion] and rerun-sdk[all] Aug 6, 2025

emilk added the `include in changelog` label and removed the `exclude from changelog` label Aug 6, 2025

emilk mentioned this pull request Aug 6, 2025

Release 0.24.1 #10808

Closed

9 tasks
