Add session scoping to caches. #13482
Add a "scope" parameter through the caching stack. Create session-scoped caches as appropriate. Add methods to clear caches by session ID. Call those methods when sessions expire.
✅ Snyk checks have passed. No issues have been found so far.
✅ PR preview is ready!
Pull request overview
This PR adds session-scoped caching to st.cache_data and st.cache_resource, allowing cache entries to be scoped either globally (default) or per-session. Session-scoped caches are automatically cleared when sessions disconnect or shut down, enabling resource cleanup and per-session initialization patterns.
Key changes:
- Adds a new scope parameter ("global" or "session") to both caching decorators
- Refactors cache storage from flat dictionaries to nested session-to-function-key mappings
- Implements clear_session() methods to clean up session-specific caches
- Integrates cache clearing into the session disconnect and shutdown lifecycle
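A minimal sketch of the nested layout described in the key changes, assuming a toy `ToyCaches` class (hypothetical; it is not Streamlit's `DataCaches` or `ResourceCaches`). Storing global entries under a `None` session key is an assumption made for illustration, and the PR's actual key layout may differ.

```python
from __future__ import annotations

from typing import Any, Literal

CacheScope = Literal["global", "session"]  # mirrors the alias named in the PR


class ToyCaches:
    """Hypothetical stand-in: session key -> function key -> cached values."""

    def __init__(self) -> None:
        # Assumption: None is the key used for globally scoped caches.
        self._caches: dict[str | None, dict[str, dict[Any, Any]]] = {}

    def get_cache(
        self, function_key: str, scope: CacheScope, session_id: str | None = None
    ) -> dict[Any, Any]:
        if scope not in ("global", "session"):
            raise ValueError(f"Invalid scope: {scope!r}")
        if scope == "session" and session_id is None:
            raise RuntimeError("A session-scoped cache needs an active session.")
        session_key = session_id if scope == "session" else None
        per_session = self._caches.setdefault(session_key, {})
        return per_session.setdefault(function_key, {})

    def clear_session(self, session_id: str) -> None:
        """Drop every cache that belongs to one session."""
        self._caches.pop(session_id, None)


caches = ToyCaches()
caches.get_cache("load", "global")["x"] = 1
caches.get_cache("load", "session", "session-a")["x"] = 2
caches.get_cache("load", "session", "session-b")["x"] = 3

caches.clear_session("session-a")  # e.g. when session-a disconnects
```

Clearing one session leaves the global cache and the other session's cache untouched, which is the property the nested mapping is meant to provide.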
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/streamlit/runtime/caching/cache_utils.py | Adds CacheScope type alias and get_session_id_or_throw() helper function; updates CachedFuncInfo to accept scope parameter |
| lib/streamlit/runtime/caching/cache_data_api.py | Refactors DataCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/caching/cache_resource_api.py | Refactors ResourceCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/app_session.py | Adds clear_session_caches() method and integrates it into shutdown and disconnect flows |
| lib/streamlit/runtime/websocket_session_manager.py | Calls clear_session_caches() when sessions disconnect |
| lib/tests/streamlit/runtime/caching/common_cache_test.py | Adds comprehensive tests for session-scoped cache lookup, clearing, and invalid scope handling |
| lib/tests/streamlit/runtime/caching/cache_utils_test.py | New test file for get_session_id_or_throw() utility function |
| lib/tests/streamlit/runtime/app_session_test.py | Updates shutdown tests to verify cache clearing and adds test for clear_session_caches() method |
Summary
This PR adds a new
This feature addresses several GitHub issues (#8545, #6703) by enabling session-scoped resource management and disconnect hooks.
Code Quality
The implementation is clean and follows existing codebase patterns well:
Strengths:
Minor Issues:
Test Coverage
The test coverage is comprehensive and well-structured:
Covered scenarios:
Tests follow best practices:
Potential additions (optional):
Backwards Compatibility
✅ Fully backwards compatible:
Security & Risk
No security concerns identified:
Low regression risk:
Note on documented behavior:
Recommendations
Verdict
APPROVED: This is a well-implemented feature that addresses a real user need for session-scoped caching. The code quality is high, test coverage is comprehensive, and the implementation follows existing patterns in the codebase. The minor issues noted (unused test function, missing return type annotation) are not blockers and can be addressed in a follow-up if desired.
This is an automated AI review. Please verify the feedback and use your judgment.
    if session_caches is not None:
        for cache in session_caches.values():
            cache.clear()
Storage not closed when session caches are cleared
The clear_session method in DataCaches calls cache.clear() but does not call cache.storage.close() afterwards. This is inconsistent with clear_all() (which calls both clear() and storage.close() in its fallback path) and get_cache() (which calls storage.close() when replacing a cache). For disk-persisted session-scoped caches, this could result in file handles not being released when sessions are disconnected, leading to a resource leak.
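The concern can be illustrated with a small sketch, assuming hypothetical `ToyStorage` and `ToyCache` classes that stand in for the real storage and cache objects (the names and fields are not Streamlit's). Here, clearing a session's caches also closes each backing storage so that handles are released.

```python
from __future__ import annotations


class ToyStorage:
    """Hypothetical backing store that holds a releasable handle."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}
        self.closed = False

    def clear(self) -> None:
        self.entries.clear()

    def close(self) -> None:
        self.closed = True  # a real store would release file handles here


class ToyCache:
    def __init__(self) -> None:
        self.storage = ToyStorage()

    def clear(self) -> None:
        self.storage.clear()


def clear_session(
    caches_by_session: dict[str, dict[str, ToyCache]], session_id: str
) -> None:
    # Remove the session's caches from the mapping first, then clear and
    # close each one so the backing storage is not leaked.
    session_caches = caches_by_session.pop(session_id, None)
    if session_caches is not None:
        for cache in session_caches.values():
            cache.clear()
            cache.storage.close()


cache = ToyCache()
cache.storage.entries["key"] = b"value"
caches_by_session = {"session-a": {"load_data": cache}}
clear_session(caches_by_session, "session-a")
```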
Good catch! Updating.
        """Clear all cache_resource caches."""
        _resource_caches.clear_all()

    def clear_session(self, session_id: str) -> None:
question: is clear_session supposed to be exposed to users via the public API? I think there isn't even an official API to retrieve the session ID. In case it should be exposed, it would be good to add a gather_metrics decorator to track its usage. cc @jrieke
Maybe it's also cleaner to support this via a flag on the .clear method, e.g. clear(scope="session")
I'd casually considered this suitable for a public API - but I hadn't really thought through the fact that session ID isn't public info in an app. I don't think this is especially useful for a user - even clear has pretty limited utility - so it should likely just be made private.
I'll pull this into a helper function at the module level instead, so that it's not an exported symbol by default.
API lifted to a non-public namespace.
This actually matches the scope of this function better, since it's operating on a module-level object ...
FWIW, if we wanted to give users the ability to clear caches, it could just be done through the AppSession method.
Clear backing store for session cache for the data cache.
sfc-gh-jkinkead
left a comment
PTAL!
lukasmasuch
left a comment
LGTM 👍
## Describe your changes
Add session-level scoping to `st.cache_data` and `st.cache_resource`.
Add the "scope" parameter through the caching stack. Create session-scoped caches as appropriate. Add methods to clear caches by session ID. Call those methods when sessions are disconnected.
**Note**: This clears caches on disconnect _and_ shutdown. In the current websocket session manager, sessions only appear to be shut down when the backend process terminates - and so disconnection is the only hook that's actually invoked in the typical session lifecycle. I'm not sure if this is a bug or a design choice. Either way, like the docs note, this means that session caches might populate multiple times for a single user session in some edge-cases. I think this is fine.
## GitHub Issue Link (if applicable)
- Fix for streamlit#8545. An `on_release` hook for a session-scoped resource can be used for disconnect hooks.
- Implements one of the suggested fixes for streamlit#6703.
## Testing Plan
See unit tests.
---
**Contribution License Agreement**
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.
## Describe your changes
Fixes an issue with Snowflake connections not getting re-initialized
after having been closed.
<details>
<summary>Claude issue analysis</summary>
# Issue Report: Snowflake Connection "Connection is closed" Error
## Summary
After recent PRs adding `on_release` to `st.cache_resource` and
session-scoped connection support, users may encounter a
`snowflake.connector.errors.DatabaseError: 250002 (08003): Connection is
closed` error when using `st.connection("snowflake")` with cached data
queries.
## Related PRs
- **PR #13439**: Add `on_release` to `st.cache_resource`
- **PR #13482**: Add session scoping to caches
- **PR #13538**: Add `SnowflakeCallersRightsConnection`
- **PR #13506**: Add session-scoped connection support
## Error Details
```
Traceback (most recent call last):
File ".../streamlit/runtime/scriptrunner/exec_code.py", line 129, in exec_func_with_error_handling
result = func()
...
File ".../snowflake/snowpark/_internal/server_connection.py", line 205, in _cursor
self._thread_store.cursor = self._conn.cursor()
File ".../snowflake/connector/connection.py", line 1270, in cursor
Error.errorhandler_wrapper(...)
snowflake.connector.errors.DatabaseError: 250002 (08003): Connection is closed
```
## Root Cause Analysis
### Background
The PRs mentioned added important functionality:
1. **PR #13439** added the `on_release` callback to `st.cache_resource`,
which is called when cache entries are evicted
2. **PR #13506** modified `connection_factory.py` to use this
`on_release` callback to call `connection.close()` when a connection is
evicted from the cache
This is the relevant code in `connection_factory.py`:
```python
def on_release_wrapped(connection: ConnectionClass) -> None:
    connection.close()
__create_connection = cache_resource(
    max_entries=max_entries,
    show_spinner="Running `st.connection(...)`.",
    ttl=ttl,
    scope=scope,
    on_release=on_release_wrapped,  # Calls close() when evicted
)(__create_connection)
```
### The Bug
In `BaseSnowflakeConnection.close()`, after calling
`self._raw_instance.close()`, the `_raw_instance` attribute was **NOT**
reset to `None`:
```python
def close(self) -> None:
    """Closes the underlying Snowflake connection."""
    if self._raw_instance is not None:
        self._raw_instance.close()
        # BUG: _raw_instance was NOT set to None!
```
This caused the following issue:
1. When `close()` was called (e.g., via `on_release` when a cache entry
is evicted), the underlying connection was closed
2. However, `_raw_instance` still referenced the **closed** connection
object
3. The `_instance` property checks `if self._raw_instance is None` to
decide whether to create a new connection:
```python
@property
def _instance(self) -> RawConnectionT:
    if self._raw_instance is None:
        self._raw_instance = self._connect(**self._kwargs)
    return self._raw_instance
```
4. Since `_raw_instance` wasn't `None`, subsequent access to `_instance`
returned the **CLOSED** connection
5. Any operations on the closed connection failed with "Connection is
closed"
### When This Bug Manifests
The `on_release` callback (which calls `close()`) is triggered when:
- Cache entries expire due to TTL
- Cache is full and oldest entries are evicted (`max_entries`)
- `st.cache_resource.clear()` is called
- For session-scoped caches: when a session disconnects
For global-scoped connections like `st.connection("snowflake")`, this
typically only happens if:
- `st.cache_resource.clear()` is called explicitly
- TTL is set and expires
- `max_entries` is set and exceeded
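The eviction triggers listed above can be sketched with a toy cache, assuming a hypothetical `ToyResourceCache` class (not Streamlit's implementation). It models only the `max_entries` and `clear()` triggers; TTL expiry and session disconnects are left out for brevity.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any, Callable


class ToyResourceCache:
    """Hypothetical cache that notifies a callback when entries leave it."""

    def __init__(self, max_entries: int, on_release: Callable[[Any], None]) -> None:
        self._max_entries = max_entries
        self._on_release = on_release
        self._entries: OrderedDict[str, Any] = OrderedDict()

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            # Evict the oldest entry and release it.
            _, evicted = self._entries.popitem(last=False)
            self._on_release(evicted)

    def clear(self) -> None:
        # Clearing releases every remaining entry, oldest first.
        while self._entries:
            _, value = self._entries.popitem(last=False)
            self._on_release(value)


released: list[str] = []
cache = ToyResourceCache(max_entries=2, on_release=released.append)
cache.set("a", "conn-a")
cache.set("b", "conn-b")
cache.set("c", "conn-c")  # exceeds max_entries, so "conn-a" is released
cache.clear()  # releases "conn-b" and "conn-c"
```

In the connection factory shown earlier, the callback passed as `on_release` calls `connection.close()`, so each of these release points closes a connection.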
### Additional Consideration: Snowpark Sessions
When users call `conn.session()`, they get a Snowpark Session that
internally references `self._instance`. If the underlying connection is
closed:
```python
def session(self) -> Session:
    if running_in_sis():
        return get_active_session()
    return Session.builder.configs({"connection": self._instance}).create()
```
Any Snowpark Sessions created from the connection will also fail because
they hold a reference to the now-closed underlying connection object.
## Fix
The fix is simple: reset `_raw_instance` to `None` after closing the
connection:
```python
def close(self) -> None:
    """Closes the underlying Snowflake connection."""
    if self._raw_instance is not None:
        self._raw_instance.close()
        self._raw_instance = None  # Added this line
```
This ensures that after `close()` is called, the next access to
`_instance` will create a new connection instead of returning the closed
one.
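The difference between the buggy and fixed behavior can be reproduced without Snowflake. This is a simplified model, assuming hypothetical `FakeRawConnection` and `ToyConnection` classes and a `reset_on_close` flag that exists only to show both variants side by side.

```python
from __future__ import annotations


class FakeRawConnection:
    """Hypothetical stand-in for the raw Snowflake connection object."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ToyConnection:
    """Simplified model of the lazy `_instance` / `close()` pattern."""

    def __init__(self, reset_on_close: bool) -> None:
        self._raw_instance: FakeRawConnection | None = None
        self._reset_on_close = reset_on_close

    @property
    def _instance(self) -> FakeRawConnection:
        if self._raw_instance is None:
            self._raw_instance = FakeRawConnection()  # stands in for _connect()
        return self._raw_instance

    def close(self) -> None:
        if self._raw_instance is not None:
            self._raw_instance.close()
            if self._reset_on_close:
                self._raw_instance = None  # the one-line fix


buggy = ToyConnection(reset_on_close=False)
_ = buggy._instance
buggy.close()
buggy_reuses_closed = buggy._instance.closed  # stale, closed object is returned

fixed = ToyConnection(reset_on_close=True)
first = fixed._instance
fixed.close()
second = fixed._instance  # a fresh object is created lazily
```

Note that `first` stays closed after the fix; only later accesses through `_instance` get a new object, which matches the caveat about Snowpark Sessions holding on to the old connection.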
## Files Changed
1. **`lib/streamlit/connections/snowflake_connection.py`**
   - Fixed `close()` method to reset `_raw_instance = None` after closing
2. **`lib/tests/streamlit/connections/snowflake_connection_test.py`**
   - Added `TestSnowflakeConnectionClose` test class with:
     - `test_close_resets_raw_instance`: Verifies that `close()` closes the
       connection AND resets `_raw_instance`
     - `test_close_is_noop_when_not_connected`: Verifies that `close()`
       doesn't fail when `_raw_instance` is already `None`
## Testing
```bash
PYTHONPATH=lib pytest lib/tests/streamlit/connections/snowflake_connection_test.py::TestSnowflakeConnectionClose -v
```
Output:
```
lib/tests/streamlit/connections/snowflake_connection_test.py::TestSnowflakeConnectionClose::test_close_resets_raw_instance PASSED
lib/tests/streamlit/connections/snowflake_connection_test.py::TestSnowflakeConnectionClose::test_close_is_noop_when_not_connected PASSED
```
## Recommendations for Users
Until this fix is released, users experiencing this issue can:
1. **Avoid storing Snowpark Sessions long-term**: Instead of caching
Snowpark Sessions, create them fresh when needed
2. **Check if using `st.cache_resource.clear()`**: If calling this
anywhere in the app, it will close all cached connections
3. **Consider connection TTL settings**: If TTL is set on the
connection, it may expire and close
## Impact
- **Affected**: Users of `st.connection("snowflake")` and
`st.connection("snowflake-callers-rights")` who experience cache
eviction scenarios
- **Severity**: Medium - The bug causes operations to fail with a
confusing error message, but the workaround (restarting the app or
avoiding cache clears) is available
- **Scope**: Only affects `SnowflakeConnection` and its subclasses;
other connection types (`SQLConnection`, `SnowparkConnection`) inherit
the no-op `close()` from `BaseConnection` and are not affected
</details>
## GitHub Issue Link (if applicable)
## Testing Plan
- Added unit test.
---
**Contribution License Agreement**
By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
Describe your changes
Add session-level scoping to st.cache_data and st.cache_resource.
Add the "scope" parameter through the caching stack.
Create session-scoped caches as appropriate.
Add methods to clear caches by session ID. Call those methods when sessions are disconnected.
Note: This clears caches on disconnect and shutdown. In the current websocket session manager, sessions only appear to be shut down when the backend process terminates - and so disconnection is the only hook that's actually invoked in the typical session lifecycle. I'm not sure if this is a bug or a design choice. Either way, like the docs note, this means that session caches might populate multiple times for a single user session in some edge-cases. I think this is fine.
GitHub Issue Link (if applicable)
- Fix for #8545. An on_release hook for a session-scoped resource can be used for disconnect hooks.
- Implements one of the suggested fixes for #6703.
- Can be used to implement Session State convenience function for initialization #10089. A session-scoped @st.cache_data or @st.cache_resource function invoked at the start of a script will work as an init function. Reading this more closely, the issue is for a helper function to make this easy, not for means to init at session start.
Testing Plan
See unit tests.
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.