Support polars dataframe/series hashing #10408
a7748aa to 10c6bdd
- name: Install integration dependencies
  run: |
    source venv/bin/activate
    uv pip install polars
We install only polars here, not integration-requirements.txt, because ray (another integration requirement) does not support Python 3.13.
except TypeError:
    # Use pickle if polars cannot hash the object for example if
    # it contains unhashable objects.
    return b"%s" % pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
Should we log a warning here, since I assume this could potentially be a lot slower?
Warnings added. I am not sure this can actually happen, but let's monitor it and improve the warnings if we see an actual failure.
I think it might be useful to set exc_info=True for all the new warnings to include a bit more context about the issue.
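For reference, the fallback-with-warning pattern discussed in this thread could look roughly like the sketch below. It is not the code from this PR: `hash_with_pickle_fallback` and `primary_hasher` are hypothetical names, with `primary_hasher` standing in for the polars-based hashing, and the log message is made up for illustration.

```python
import logging
import pickle

_LOGGER = logging.getLogger(__name__)


def hash_with_pickle_fallback(obj, primary_hasher):
    """Return bytes identifying obj, preferring primary_hasher.

    primary_hasher is a hypothetical callable (a stand-in for the
    polars-based hashing) that returns bytes and may raise TypeError.
    """
    try:
        return primary_hasher(obj)
    except TypeError:
        # exc_info=True attaches the traceback of the exception currently
        # being handled to the log record, which gives more context.
        _LOGGER.warning(
            "Primary hashing failed for %s; falling back to pickle, "
            "which may be slower.",
            type(obj).__name__,
            exc_info=True,
        )
        return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
```

With `exc_info=True`, the warning carries the original `TypeError` traceback, so a slow fallback seen in production can be traced back to the object that triggered it.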
@lukasmasuch Regarding hash determinism: I tested it, and the hash is now stable across app reruns and restarts (for example, when the cache is persistent). Polars hashing is not guaranteed to be deterministic across Polars versions (see Notes here), but I don't think that is a big deal: it only means the cached function is recomputed after a Polars version upgrade.
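The reasoning above can be illustrated with a toy model that uses only the standard library. Everything in it is an assumption for illustration: `make_hasher`, `KeyedCache`, and the `salt` argument are hypothetical, and the salt merely stands in for a library version whose hashing algorithm changed. It does not reflect how Streamlit or Polars work internally.

```python
import hashlib


def make_hasher(salt: bytes):
    """Build a hash function; a different salt mimics a changed algorithm."""
    def hasher(data: bytes) -> str:
        return hashlib.sha256(salt + data).hexdigest()
    return hasher


class KeyedCache:
    """Toy cache keyed by the hash of the input."""

    def __init__(self):
        self.store = {}
        self.computations = 0

    def get_or_compute(self, data: bytes, hasher, fn):
        key = hasher(data)
        if key not in self.store:
            # Cache miss: run the function and remember its result.
            self.computations += 1
            self.store[key] = fn(data)
        return self.store[key]


cache = KeyedCache()
old_hasher = make_hasher(b"v1")
new_hasher = make_hasher(b"v2")

cache.get_or_compute(b"frame", old_hasher, len)  # miss, computes
cache.get_or_compute(b"frame", old_hasher, len)  # hit, same key
cache.get_or_compute(b"frame", new_hasher, len)  # miss after the "upgrade"
```

In this model a changed hash only costs one extra computation per cached entry; the returned result is still correct.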
lukasmasuch left a comment
LGTM 👍 Maybe one change related to the logger warning (see other comment).
Describe your changes
GitHub Issue Link (if applicable)
Closes: #10347
Testing Plan
Script for timing it:
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.