Conversation

@kajarenc
Collaborator

@kajarenc kajarenc commented Feb 15, 2025

Describe your changes

GitHub Issue Link (if applicable)

Closes: #10347

Testing Plan

Script used for timing:

import streamlit as st
import polars as pl
import numpy as np
import random

from streamlit.runtime.caching.cache_type import CacheType
from streamlit.runtime.caching.hashing import update_hash
import hashlib
import timeit


random.seed(0)
np.random.seed(0)

def my_function():
    my_hasher = hashlib.new("md5", usedforsecurity=False)

    polars_df = pl.DataFrame(
        np.random.randint(0, 100, size=(100_000, 5)),
        schema=["a", "b", "c", "d", "g"]
    )
    update_hash(polars_df, my_hasher, cache_type=CacheType.DATA)
    return my_hasher.digest()




st.write(my_function())

execution_time = timeit.timeit(
    my_function,
    number=1_000
)
st.write(f"Execution time: {execution_time:.6f} seconds")
  • Explanation of why no additional tests are needed
  • Unit Tests (JS and/or Python)
  • E2E Tests
  • Any manual testing needed?

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

- name: Install integration dependencies
  run: |
    source venv/bin/activate
    uv pip install polars
Collaborator Author

We install only polars here, not integration-requirements.txt, because ray (another integration requirement) is not supported on Python 3.13.

@kajarenc kajarenc added the labels security-assessment-completed (Security assessment has been completed for PR), impact:users (PR changes affect end users), and change:feature (PR contains new feature or enhancement implementation) Feb 21, 2025
@kajarenc kajarenc marked this pull request as ready for review February 21, 2025 14:53
@kajarenc kajarenc requested a review from a team as a code owner February 21, 2025 14:53
@kajarenc kajarenc changed the title from "[WIP] start work on supporting polars dataframe/series hashing" to "Support polars dataframe/series hashing" Feb 21, 2025
except TypeError:
    # Use pickle if polars cannot hash the object for example if
    # it contains unhashable objects.
    return b"%s" % pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
Collaborator

Should we log a warning here, since I assume this could potentially be a lot slower?

Collaborator Author

Warnings added. I am not sure this can actually happen, but let's monitor it and improve the warnings if we see an actual failure.

Collaborator

I think it might be useful to set exc_info=True for all the new warnings, so the log includes the traceback and a bit more context about the issue.
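To make the suggestion in this thread concrete, here is a minimal sketch of the fallback pattern with a logged warning. It is not the code merged in this PR: `hash_with_pickle_fallback`, `fast_hash`, and the warning text are hypothetical names chosen for illustration, and `fast_hash` stands in for whatever Polars-based hashing the real implementation uses.

```python
import logging
import pickle

_LOGGER = logging.getLogger(__name__)


def hash_with_pickle_fallback(obj, fast_hash):
    """Return bytes identifying obj.

    fast_hash is a hypothetical callable standing in for the
    library-specific hashing path; it may raise TypeError.
    """
    try:
        return fast_hash(obj)
    except TypeError:
        # exc_info=True attaches the active traceback to the log record,
        # so the log shows why the fast path failed.
        _LOGGER.warning(
            "Fast hashing failed; falling back to pickle (may be slower).",
            exc_info=True,
        )
        return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
```

Because the warning is emitted inside the `except` block, `exc_info=True` picks up the `TypeError` that triggered the fallback without it having to be passed explicitly.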

@kajarenc
Collaborator Author

@lukasmasuch related to hash determinism, I tested it, and now it is stable across app reruns and restarts (when cache is persistent, for example).

Polars hashing is not guaranteed to be deterministic across Polars versions (see Notes here), but I don't think that is a big deal: it only means the cached function is recomputed after a Polars version upgrade.
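The reasoning above can be illustrated with a toy cache. This is only a sketch under stated assumptions: `make_key`, `cached_compute`, and the `hash_scheme` argument are hypothetical, with `hash_scheme` standing in for "the hashing behaviour of a given Polars version". It is not how Streamlit builds its cache keys.

```python
import hashlib

cache = {}
calls = []  # records every time the "expensive" function actually runs


def make_key(payload: bytes, hash_scheme: str) -> str:
    # A different hash_scheme yields a different key for the same payload,
    # mimicking a hash that changes between library versions.
    h = hashlib.new("md5", usedforsecurity=False)
    h.update(hash_scheme.encode())
    h.update(payload)
    return h.hexdigest()


def cached_compute(payload: bytes, hash_scheme: str) -> int:
    key = make_key(payload, hash_scheme)
    if key not in cache:
        calls.append(key)           # cache miss: recompute
        cache[key] = len(payload)   # stand-in for the real work
    return cache[key]
```

Calling `cached_compute(b"abc", "v1")` twice runs the work once; switching to `"v2"` produces a miss and one recomputation, but the returned value is unchanged. A hash that changes across versions costs time, not correctness.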

Collaborator

@lukasmasuch lukasmasuch left a comment

LGTM 👍 Maybe one change related to the logger warning (see other comment).

@kajarenc kajarenc merged commit f10c9f1 into develop Feb 25, 2025
32 checks passed
@kajarenc kajarenc deleted the polars-hashing branch March 12, 2025 21:14

Labels

change:feature (PR contains new feature or enhancement implementation); impact:users (PR changes affect end users); security-assessment-completed (Security assessment has been completed for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Polars objects for cache hashing

5 participants