Fix bug in stale session cleanup #10415
| key = (session.tid, endpoint) | ||
| if key not in self.tid_endpoint_to_session and agent_statuses[endpoint] != AgentStatus.paused: | ||
| existing = self.tid_endpoint_to_session.get(key) | ||
| if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] != AgentStatus.paused: |
Should this be `up` instead? As written, it will also trigger for downed agents.
| if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] != AgentStatus.paused: | |
| if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] == AgentStatus.up: |
I think this is intended. That way downed agents (i.e. those that don't have a primary) can recover and use this new primary.
(cc @sanderr, correct?)
That's correct. This method does a failover, so we can fail over to any agent (whether it's a primary or not) as long as the agent is not paused.
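For readers following the thread, the condition under discussion can be sketched in isolation. This is only an illustration under assumptions: `AgentStatus`, `Session` and `endpoints_to_fail_over` below are simplified, hypothetical stand-ins, not the actual classes or methods of the code base. It shows that an endpoint is picked up when it has no live primary (no mapping, or a mapping to a session that is no longer registered) and the agent is not paused, which includes downed agents.

```python
from dataclasses import dataclass
from enum import Enum


class AgentStatus(Enum):
    # Simplified stand-in for the real status enum (assumption).
    up = "up"
    down = "down"
    paused = "paused"


@dataclass(frozen=True)
class Session:
    # Hypothetical minimal session: an id and the environment (tid) it belongs to.
    id: str
    tid: str


def endpoints_to_fail_over(session, endpoints, tid_endpoint_to_session, sessions, agent_statuses):
    """Return the endpoints for which `session` may take over as primary."""
    result = []
    for endpoint in endpoints:
        key = (session.tid, endpoint)
        existing = tid_endpoint_to_session.get(key)
        # A mapping is stale when it points to a session that is no longer live.
        has_live_primary = existing is not None and existing.id in sessions
        # Downed agents qualify on purpose; only paused agents are excluded.
        if not has_live_primary and agent_statuses[endpoint] != AgentStatus.paused:
            result.append(endpoint)
    return result
```

With `== AgentStatus.up` instead, a downed agent whose mapping is stale or missing would never be offered the new primary, which is the recovery case described above.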
sanderr left a comment
@arnaudsjs I'm shifting this to you because I'm trying to catch up on other review work, and I believe you know this code better than I do in any case. If I'm wrong, feel free to pass it back.
| issue-nr: 10351 | ||
| change-type: patch | ||
| destination-branches: | ||
| - iso8 |
Reminder to add iso9 as well
| # can be correctly promoted to primary. | ||
| for endpoint_name in endpoint_names_snapshot: | ||
| key = (session.tid, endpoint_name) | ||
| if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].id == session.id: |
The first part of this method uses `get_id()` to get the id of a session, while the new part uses the `id` property. It would improve readability to use one style within the same method.
| if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].id == session.id: | |
| if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].get_id() == sid: |
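A minimal sketch of the cleanup loop with the suggested style applied, for illustration only. `Session` and `expire_session` are hypothetical stand-ins, and the assumption that a matching entry is simply deleted is mine; the diff fragment above does not show the loop body.

```python
class Session:
    # Hypothetical minimal session exposing get_id(), as in the suggestion above.
    def __init__(self, sid, tid):
        self._sid = sid
        self.tid = tid

    def get_id(self):
        return self._sid


def expire_session(session, endpoint_names_snapshot, tid_endpoint_to_session, sessions):
    """Drop an expired session and every endpoint mapping that still points to it."""
    sid = session.get_id()
    sessions.pop(sid, None)
    for endpoint_name in endpoint_names_snapshot:
        key = (session.tid, endpoint_name)
        current = tid_endpoint_to_session.get(key)
        # Only remove the entry if it still belongs to the expiring session;
        # another session may already have taken over this endpoint.
        if current is not None and current.get_id() == sid:
            del tid_endpoint_to_session[key]
```

Comparing ids before deleting leaves a mapping that a newer session already claimed untouched, so that session stays primary.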
| @pytest.mark.parametrize("auto_start_agent", [False]) # prevent autostart to keep agent under control | ||
| async def test_session_expiration(server, environment, async_finalizer, caplog): |
This test case is missing the part that verifies what happens if the database comes back online. Can the agent reconnect without any issues?
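The shape of the requested check could look roughly like the sketch below. Everything in it is hypothetical: `FakeDatabase` and `FakeServer` are toy stand-ins written for this illustration, not the server fixtures or the `data` module used by the real test. It only demonstrates the three phases: healthy, database down (session expires), database back (a new session is accepted).

```python
import asyncio


class FakeDatabase:
    """Toy stand-in for the database connection (assumption, not the real API)."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def connect(self):
        self.connected = True

    async def record_heartbeat(self, sid):
        if not self.connected:
            raise ConnectionError("database unavailable")


class FakeServer:
    """Toy session manager: a session stays live while heartbeats can be stored."""

    def __init__(self, db):
        self.db = db
        self.sessions = set()

    async def heartbeat(self, sid):
        try:
            await self.db.record_heartbeat(sid)
        except ConnectionError:
            self.sessions.discard(sid)  # treat the session as expired
            return False
        self.sessions.add(sid)
        return True


async def scenario():
    db = FakeDatabase()
    server = FakeServer(db)
    assert await server.heartbeat("agent-session")
    db.disconnect()  # phase 2: database outage expires the session
    assert not await server.heartbeat("agent-session")
    assert "agent-session" not in server.sessions
    db.connect()  # phase 3: database is back, the agent reconnects
    assert await server.heartbeat("agent-session-2")
    return server.sessions
```

Running `asyncio.run(scenario())` exercises all three phases; in the real test the last phase would assert that the agent gets a working session again after connectivity is restored.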
Failed to merge changes into iso9 due to merge conflict. Please open a pull request for these branches separately by cherry-picking the commit that was made on the branch iso8 (`git cherry-pick 6f3b48c`).
Merged into branches iso8 in 6f3b48c |
…y towards the database could sometimes cause the scheduler to miss new versions notifications. (Issue #10351, PR #10415)

# Description

Iso8 + 9 only since the websocket refactor on iso10 removed most of the session management logic.

closes #10351

# Self Check:

Strike through any lines that are not applicable (`~~line~~`) then check the box

- [ ] Attached issue to pull request
- [ ] Changelog entry
- [ ] Type annotations are present
- [ ] Code is clear and sufficiently documented
- [ ] No (preventable) type errors (check using make mypy or make mypy-diff)
- [ ] Sufficient test cases (reproduces the bug/tests the requested feature)
- [ ] Correct, in line with design
- [ ] End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
- [ ] If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branche(s) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)
Not closing this pull request due to previously commented issues for some of the destination branches. Please open a separate pull request for those branches by cherry-picking the relevant commit. You can safely close this pull request and delete the source branch.
This branch was not deleted as it seems to still be in use.
…y towards the database could sometimes cause the scheduler to miss new versions notifications. (Issue #10351, PR #10448)

# Description

iso9 PR for #10415

Only diff is in the `test_session_expiration` test: the methods to break and re-enable connectivity with the db are `data.connect_pool` and `data.disconnect_pool` (vs iso8 `data.connect` and `data.disconnect`)

# Self Check:

Strike through any lines that are not applicable (`~~line~~`) then check the box

- [ ] Attached issue to pull request
- [ ] Changelog entry
- [ ] Type annotations are present
- [ ] Code is clear and sufficiently documented
- [ ] No (preventable) type errors (check using make mypy or make mypy-diff)
- [ ] Sufficient test cases (reproduces the bug/tests the requested feature)
- [ ] Correct, in line with design
- [ ] End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
- [ ] If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branche(s) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)