39429/OfflineSessionPersistenceTest testPersistenceMultipleNodesClientSessionsAtRandomNode #40459
Conversation
@ahus1 we changed the defaults when we enabled the persistent user sessions feature here: af53af1#diff-6003597969d1c5eba2b531cb775d0fa212a6e1ca68a9dd5ae4a2ae79cdf7cf22. Based on Ryan's findings, it seems to be an unsafe configuration. If a user is using volatile sessions, they will end up with caches with num_owners=1 and incur data loss. Instead of creating a new XML for the test, we would need to change the default back to 2. This also affects KC 26.2.
Thank you for the analysis. I didn't realize back then that one owner would be problematic during rebalancing. When we implemented it, we added the following statement to https://www.keycloak.org/server/caching:

This test is obviously not doing that, and we also make it very complicated for users to do the right thing. We already have a check that prints a warning here: "Number of owners is one for cache %s, and no persistence is configured." (Lines 226 to 232 in eafe08a)
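The check behind that warning could look roughly like the following sketch. The class and method names here are my own assumptions for illustration, not the actual Keycloak code at the referenced lines:

```java
import java.util.Optional;

public class OwnerCheck {
    // Hypothetical sketch of the startup validation quoted above; the names
    // are assumptions. A single-owner distributed cache without persistence
    // risks losing the only copy of an entry on rebalance, hence the warning.
    public static Optional<String> ownerWarning(String cacheName, int numOwners,
                                                boolean persistenceConfigured) {
        if (numOwners == 1 && !persistenceConfigured) {
            return Optional.of(String.format(
                "Number of owners is one for cache %s, and no persistence is configured.",
                cacheName));
        }
        return Optional.empty();
    }
}
```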
Instead of having the manual instructions and the extra XML file, I suggest automatically updating one owner to two owners, and removing that manual procedure from the documentation. Let me know your thoughts. If you think an additional discussion is needed, please schedule a meeting for next week, for example Monday.
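The suggested automatic update amounts to something like this sketch (the method name and wiring are hypothetical, not the merged code):

```java
public class OwnerDefaults {
    // Sketch of the suggested behaviour: instead of only warning, raise a
    // single-owner cache to two owners whenever volatile (non-persistent)
    // sessions are in use, so a rebalance cannot drop the last copy.
    public static int effectiveOwners(int configuredOwners, boolean volatileSessions) {
        if (volatileSessions && configuredOwners < 2) {
            return 2; // enforce at least two copies of every session entry
        }
        return configuredOwners;
    }
}
```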
Another thought: writing to the cache and reading back is a strange implementation anyway. Still, I hesitate to change this in the old code. Let's talk next week about whether this should be touched as well.
Me neither. The only scenario I have in mind is when the originator has an old topology and routes the read to a node that is no longer the owner. It is worth investigating; I'll try to check.
I'm ok with it, but I bet there will be a user who prioritizes speed over consistency, complaining they can no longer use num_owners=1.
I discussed this with @jabolina. The issue in this case is that even though
+1 to making the configuration always set num_owners=2 with volatile sessions. The user may prioritize speed, but it's at the expense of correctness, and they may not realise the implications of this configuration. I think we should cater for the majority of users here and see if "power" users have issues with this before reconsidering allowing num_owners=1 with volatile sessions.
I have pushed a commit to make it so that when volatile sessions are configured, we always configure at least two owners. It's important that we distinguish between a shared and a non-shared store, as a non-shared store will not prevent data loss even with a StatefulSet: we're not using global state, so restarted nodes will appear as new cluster members, potentially causing segments to be remapped on rebalance.
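For context, restarted nodes keep their previous identity only when global state is enabled. In Infinispan XML that is configured roughly as follows (a minimal sketch; the path is a placeholder):

```xml
<infinispan>
  <cache-container>
    <!-- Global state lets a restarted node rejoin with its previous identity,
         so its locally persisted segments are recognised instead of the node
         appearing as a new member and triggering segment remapping. -->
    <global-state>
      <persistent-location path="/var/lib/infinispan/state"/>
    </global-state>
  </cache-container>
</infinispan>
```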
I'm surprised by this.
Unreported flaky test detected, please review
If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.
- org.keycloak.testsuite.cluster.ClientScopeInvalidationClusterTest#crudWithFailover
- org.keycloak.testsuite.cluster.RealmInvalidationClusterTest#crudWithFailover
I have created #40472 to track this work as an enhancement, and I have updated the commits to reflect this. IMO we should just apply this change from 26.3 onwards, as the correct configuration required was documented previously.
I've created a small gist [1] you can run with JBang to try the different configurations. By default, it will create a cluster of 3 nodes with numOwners=1, then remove some nodes and verify whether the entries are still present. You can check the gist for the other CLI options (global state, number of entries, how many nodes to remove, etc).

[1] https://gist.github.com/jabolina/3b1222441f40a3f2fd6ed5fdc1fbd9a3
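The behaviour the gist demonstrates can also be seen with a toy model. This is not Infinispan's actual consistent hash, just a sketch under a simplified assumption: each key is assigned to `numOwners` consecutive nodes, and one node then leaves without persistence.

```java
import java.util.HashSet;
import java.util.Set;

public class OwnerLossDemo {
    static final int NODES = 3;   // cluster size, matching the gist's default
    static final int CRASHED = 1; // the node that leaves without persistence

    // Toy placement: key k lives on numOwners consecutive nodes starting at k % NODES.
    static Set<Integer> ownersOf(int key, int numOwners) {
        Set<Integer> owners = new HashSet<>();
        for (int i = 0; i < numOwners; i++) {
            owners.add((key % NODES + i) % NODES);
        }
        return owners;
    }

    // Count entries whose only owner(s) were the crashed node, i.e. entries lost.
    static long lostEntries(int numKeys, int numOwners) {
        long lost = 0;
        for (int k = 0; k < numKeys; k++) {
            boolean survives = ownersOf(k, numOwners).stream().anyMatch(n -> n != CRASHED);
            if (!survives) lost++;
        }
        return lost;
    }

    public static void main(String[] args) {
        // With numOwners=1 roughly a third of the entries vanish; with 2, none do,
        // because every key then has at least one surviving owner.
        System.out.println("owners=1 lost: " + lostEntries(9000, 1));
        System.out.println("owners=2 lost: " + lostEntries(9000, 2));
    }
}
```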
I've approved this change. As I wasn't involved in the discussions last week, I want to give @pruivo the opportunity to comment on this one as well. I'm all in to merge this for 26.3. We can have a separate discussion once this is merged on whether it should be part of 26.2 as well.
Closes #39429
Closes #40472
Test failures introduced by #39126
Previously `model/infinispan/src/main/java/org/keycloak/connections/infinispan/DefaultInfinispanConnectionProviderFactory.java` ensured that the caches were created with 2 owners if `sessionsOwners` was not explicitly configured. However, since those changes the `test-ispn.xml` configuration was loaded, which utilises `num_owners=1` for the `offlineSessions` cache.

When looking up an offlineSession that is not present in the cache, the `InfinispanUserSessionProvider` calls `getUserSessionEntityFromPersistenceProvider`, which:

1. `importUserSession`
2a. Write to the cache via `session.sessions().importUserSessions`
2b. Retrieve the `UserSessionEntity` by reading it back from the cache and return it

The test has become flaky as 2b is a cache miss if a rebalance occurs between 2a and 2b.

Sample log entries:

The solution is to ensure that `num_owners=2` is always defined for volatile-session tests, as we recommend for users in their configurations.

I have executed the test 100 times consecutively without failure locally; without the fix it would consistently fail before the 20th iteration.
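In Infinispan cache XML, the fix corresponds to a declaration along these lines (a sketch; the standard attribute name is `owners`, and the cache name follows the thread):

```xml
<distributed-cache name="offlineSessions" owners="2">
  <!-- With two owners, a rebalance or the loss of a single node no longer
       removes the only copy of an entry between the write (2a) and read (2b). -->
</distributed-cache>
```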