Add rerun-sdk[datafusion] and rerun-sdk[all] #10696
rerun_py/pyproject.toml (outdated)
    "pillow>=8.0.0", # Used for JPEG encoding. 8.0.0 added the `format` arguments to `Image.open`
    "pyarrow>=18.0.0",
    "typing_extensions>=4.5", # Used for PEP-702 deprecated decorator
    "datafusion>=45.0.0",
For now, the best practice is to keep the datafusion Python version the same as the datafusion Rust version we use for its FFI crate. They are currently cross-version compatible, but there have been some breaking changes in the most recent release.
That does mean we'd want users to be on 47.0.0.
Also, more generally, I don't think we want this as a dependency in rerun, because it means any of our open source viewer users who are not using datafusion would now have to bring in this fairly large package. This is why it hasn't been added as a dependency before.
I know some packages support things like `pip install rerun-sdk[datafusion]` to pull in additional dependencies, but I am not sure how that is done.
Gotcha, I'll take a look at adding it as optional and hiding the component without it. I'm sure it's just a maturin flag. We apparently have a similar issue in our notebook, so I'm doing a larger cleanup of some of the things this kicked up.
lgtm, thanks!
from rerun.error_utils import RerunOptionalDependencyError

HAS_DATAFUSION = True
try:
    from datafusion import Expr, ScalarUDF, col, udf
except ModuleNotFoundError:
    HAS_DATAFUSION = False
Is the type checker able to detect if a new method accesses e.g. `Expr` without first testing `HAS_DATAFUSION`?
Good question. Our type checking is SO BROKEN right now (#10704). Maybe I'll merge this and then pull main into that branch to check it out.
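One common way to keep such a flag manageable is to route every use of the optional module through a single guard function and to import the types only under `TYPE_CHECKING`. This is a minimal sketch, not the code in this PR: `OptionalDependencyError`, `require_datafusion`, and `column` are hypothetical names, and the pattern does not by itself make a type checker enforce that the guard is called.

```python
from __future__ import annotations  # annotations are not evaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by the type checker, so annotations can mention Expr
    # even when datafusion is not installed at runtime.
    from datafusion import Expr

HAS_DATAFUSION = True
try:
    import datafusion
except ModuleNotFoundError:
    HAS_DATAFUSION = False


class OptionalDependencyError(ImportError):
    """Hypothetical stand-in for a project-specific error type."""


def require_datafusion() -> None:
    """Raise a clear error if the optional dependency is missing."""
    if not HAS_DATAFUSION:
        raise OptionalDependencyError(
            "datafusion is required for this feature; "
            "install it with: pip install 'rerun-sdk[datafusion]'"
        )


def column(name: str) -> Expr:
    """Build a column expression, failing only when actually called."""
    require_datafusion()
    return datafusion.col(name)
```

Importing a module written this way never fails; the error surfaces only when `column` is called without datafusion installed.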
Closes #10695

~This adds the datafusion version we specify in our examples as a lower bound to make sure datafusion gets installed with rerun before we use it. Added a simple smoke test to repro the issue, then verified it passed with the install specification.~

Right now we have some optional dependencies specified with our package, but we don't handle them very carefully, so importing different parts of the package can blow up. Taking inspiration from pandas, I deferred the imports so that we should be able to import anything and not hit errors unless we ACTUALLY use the thing with the dependency. It also makes it clearer how to resolve things, since it wasn't obvious to me that our notebook was already incorporated as an optional dependency: `python -c "import rerun.notebook"` just errors.

Adds `rerun-sdk[datafusion]` for the datafusion dependencies and `rerun-sdk[all]` to get all of the non-testing features.

If we like this approach, we should be able to update the couple of different [internal places](https://github.com/rerun-io/rerun/blob/7484e03f9a98341114c30abad49895258288df76/rerun_py/rerun_sdk/rerun/recording_stream.py#L900) where we duplicate notebook checking to use this global check approach.
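The deferred-import idea can be sketched as a small helper that imports a module by name at the point of use and turns a missing module into an actionable message. This is an illustration under assumptions, loosely modelled on the pandas approach mentioned above: `import_optional` and its parameters are hypothetical and are not the helper added in this PR.

```python
import importlib
from types import ModuleType


def import_optional(
    module_name: str, extra: str, package: str = "rerun-sdk"
) -> ModuleType:
    """Import `module_name` lazily, pointing at the extra if it is missing.

    Hypothetical helper: the name, signature, and message are illustrative.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"'{module_name}' is an optional dependency that is not installed. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from err


def make_column(name: str):
    # The import happens here, at call time, so importing the enclosing
    # module never fails just because datafusion is absent.
    datafusion = import_optional("datafusion", extra="datafusion")
    return datafusion.col(name)
```

A single helper like this could also replace duplicated per-feature checks, since each call site only names the module and the extra that provides it.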