Fix bug in stale session cleanup #10415

Open
Hugo-Inmanta wants to merge 10 commits into iso8 from issue/10351-stale-session-bugfix-iso8

Conversation

@Hugo-Inmanta Hugo-Inmanta commented May 27, 2026

Contributor

Description

This fix targets iso8 and iso9 only, since the websocket refactor on iso10 removed most of the session management logic.

closes #10351

Self Check:

Strike through any lines that are not applicable (~~line~~) then check the box

  • Attached issue to pull request
  • Changelog entry
  • Type annotations are present
  • Code is clear and sufficiently documented
  • No (preventable) type errors (check using make mypy or make mypy-diff)
  • Sufficient test cases (reproduces the bug/tests the requested feature)
  • Correct, in line with design
  • End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
  • If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see test-fixes for more info)

@Hugo-Inmanta Hugo-Inmanta changed the title wip Fix bug in stale session cleanup May 28, 2026
  key = (session.tid, endpoint)
- if key not in self.tid_endpoint_to_session and agent_statuses[endpoint] != AgentStatus.paused:
+ existing = self.tid_endpoint_to_session.get(key)
+ if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] != AgentStatus.paused:

Contributor

Should this check for `up` instead? As written, it will also trigger for agents that are down.

Suggested change
- if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] != AgentStatus.paused:
+ if (existing is None or existing.id not in self.sessions) and agent_statuses[endpoint] == AgentStatus.up:

Contributor Author

I think this is intended. That way, agents that are down (i.e. those that don't have a primary) can recover and use this new primary.
(cc @sanderr, correct?)

Contributor

That's correct. This method does a failover, so we can fail over to any agent (whether it currently has a primary or not) as long as the agent is not paused.
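
The condition discussed in this thread can be sketched in isolation. The following is a minimal, self-contained model and not the actual `agentmanager.py` code: `Session`, `AgentStatus`, and `needs_failover` are simplified stand-ins introduced here for illustration, and only the shape of the condition is taken from the diff above.

```python
import enum
import uuid
from dataclasses import dataclass, field


class AgentStatus(enum.Enum):
    # Simplified stand-in for the status enum referenced in the diff.
    up = "up"
    down = "down"
    paused = "paused"


@dataclass(frozen=True)
class Session:
    # Hypothetical minimal session: only the fields the condition needs.
    tid: uuid.UUID
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def needs_failover(
    session: Session,
    endpoint: str,
    tid_endpoint_to_session: dict,
    sessions: dict,
    agent_statuses: dict,
) -> bool:
    """Return True if `session` may become the primary for `endpoint`."""
    existing = tid_endpoint_to_session.get((session.tid, endpoint))
    # A mapping whose session is no longer live is treated as "no primary".
    stale_or_missing = existing is None or existing.id not in sessions
    # Any non-paused agent qualifies, including one that is currently down.
    return stale_or_missing and agent_statuses[endpoint] != AgentStatus.paused
```

With the old check (`key not in self.tid_endpoint_to_session`), a leftover mapping to an expired session would block the failover; in this sketch, the same situation is treated as if no primary existed. Requiring `AgentStatus.up` instead would exclude agents that are down, which is the case the failover is meant to recover.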

@sanderr sanderr left a comment

Contributor

@arnaudsjs I'm shifting this to you because I'm trying to catch up on other review work, and I believe you know this code better than I do in any case. If I'm wrong, feel free to pass it back.

issue-nr: 10351
change-type: patch
destination-branches:
- iso8

Contributor

Reminder to add iso9 as well

@sanderr sanderr requested a review from arnaudsjs June 3, 2026 14:25
Comment thread src/inmanta/server/agentmanager.py Outdated
# can be correctly promoted to primary.
for endpoint_name in endpoint_names_snapshot:
    key = (session.tid, endpoint_name)
    if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].id == session.id:

Contributor

The first part of this method uses get_id() to get the id of a session. The new part uses the id property. It would improve readability if we use one style within the same method.

Suggested change
- if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].id == session.id:
+ if key in self.tid_endpoint_to_session and self.tid_endpoint_to_session[key].get_id() == sid:
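
To illustrate the guard in this loop, here is a rough sketch under stated assumptions. The body of the `if` is not shown in the diff, so this sketch assumes it removes the mapping; `Session` and `drop_mappings_of` are hypothetical names, and the real method in `agentmanager.py` does more than this. It uses `get_id()` throughout, in line with the suggestion above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Session:
    # Hypothetical minimal session; the real class has many more members.
    tid: uuid.UUID
    endpoint_names: frozenset
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def get_id(self) -> uuid.UUID:
        return self.id


def drop_mappings_of(session: Session, tid_endpoint_to_session: dict) -> list:
    """Remove only the entries for which `session` is still the mapped session."""
    sid = session.get_id()
    removed = []
    # Iterate over a snapshot so the endpoint set cannot change mid-loop.
    for endpoint_name in list(session.endpoint_names):
        key = (session.tid, endpoint_name)
        current = tid_endpoint_to_session.get(key)
        # Leave entries alone if another session has already taken over.
        if current is not None and current.get_id() == sid:
            del tid_endpoint_to_session[key]
            removed.append(endpoint_name)
    return sorted(removed)
```

The identity check matters when two sessions share an endpoint: expiring one session must not remove a mapping that already points to the other.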

Comment thread tests/test_agent_manager.py Outdated


@pytest.mark.parametrize("auto_start_agent", [False]) # prevent autostart to keep agent under control
async def test_session_expiration(server, environment, async_finalizer, caplog):

Contributor

This test case is missing the part that verifies what happens when the database comes back online. Can the agent reconnect without any issue?

Hugo-Inmanta commented Jun 4, 2026

Contributor Author

Superseded by #10427

@Hugo-Inmanta Hugo-Inmanta reopened this Jun 4, 2026
@Hugo-Inmanta Hugo-Inmanta added the merge-tool-ready ("This ticket is ready to be merged in") label Jun 8, 2026
@inmantaci

Contributor

Processing this pull request

@inmantaci inmantaci removed the merge-tool-ready ("This ticket is ready to be merged in") label Jun 8, 2026
@inmantaci

Contributor

Failed to merge changes into iso9 due to merge conflict. Please open a pull request for these branches separately by cherry-picking the commit that was made on the branch iso8 (git cherry-pick 6f3b48c).

@inmantaci

Contributor

Merged into branches iso8 in 6f3b48c

inmantaci pushed a commit that referenced this pull request Jun 8, 2026
…y towards the database could sometimes cause the scheduler to miss new versions notifications.

 (Issue #10351, PR #10415)

# Description

Iso8 + 9 only since the websocket refactor on iso10 removed most of the session management logic.

closes #10351

# Self Check:

Strike through any lines that are not applicable (`~~line~~`) then check the box

- [ ] Attached issue to pull request
- [ ] Changelog entry
- [ ] Type annotations are present
- [ ] Code is clear and sufficiently documented
- [ ] No (preventable) type errors (check using make mypy or make mypy-diff)
- [ ] Sufficient test cases (reproduces the bug/tests the requested feature)
- [ ] Correct, in line with design
- [ ] End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
- [ ] If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)
@inmantaci

Contributor

Not closing this pull request due to previously commented issues for some of the destination branches. Please open a separate pull request for those branches by cherry-picking the relevant commit. You can safely close this pull request and delete the source branch.

@inmantaci

Contributor

This branch was not deleted as it seems to still be in use.

inmantaci pushed a commit that referenced this pull request Jun 8, 2026
…y towards the database could sometimes cause the scheduler to miss new versions notifications.

 (Issue #10351, PR #10448)

# Description

iso9 PR for #10415

The only difference is in the `test_session_expiration` test: the methods to break and re-enable connectivity with the database are `data.disconnect_pool` and `data.connect_pool` (vs. `data.disconnect` and `data.connect` on iso8).

# Self Check:

Strike through any lines that are not applicable (`~~line~~`) then check the box

- [ ] Attached issue to pull request
- [ ] Changelog entry
- [ ] Type annotations are present
- [ ] Code is clear and sufficiently documented
- [ ] No (preventable) type errors (check using make mypy or make mypy-diff)
- [ ] Sufficient test cases (reproduces the bug/tests the requested feature)
- [ ] Correct, in line with design
- [ ] End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
- [ ] If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)
5 participants