Support Polars objects for cache hashing #10347

@BartSchuurmans

Checklist

  • I have searched the existing issues for similar feature requests.
  • I added a descriptive title and summary to this issue.

Summary

When passing a Polars dataframe to a function that is decorated with @st.cache_data, you get the following error:

UnhashableParamError: Cannot hash argument 'my_df' (of type polars.dataframe.frame.DataFrame) in 'my_func'.

Follow-up to #5088 (comment)

Why?

I want to be able to cache functions that have a Polars dataframe as input.

How?

There is already support for Pandas dataframes:

    elif type_util.is_type(obj, "pandas.core.frame.DataFrame"):
        import pandas as pd

        obj = cast(pd.DataFrame, obj)
        self.update(h, obj.shape)
        if len(obj) >= _PANDAS_ROWS_LARGE:
            obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=0)
        try:
            column_hash_bytes = self.to_bytes(
                pd.util.hash_pandas_object(obj.dtypes)
            )
            self.update(h, column_hash_bytes)
            values_hash_bytes = self.to_bytes(pd.util.hash_pandas_object(obj))
            self.update(h, values_hash_bytes)
            return h.digest()
        except TypeError:
            # Use pickle if pandas cannot hash the object for example if
            # it contains unhashable objects.
            return b"%s" % pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)

Support for Polars dataframes could be implemented the same way.
Alternative: Use https://github.com/narwhals-dev/narwhals for a dataframe-agnostic implementation.

