Fix negative expire values for refresh tokens and expire client session when user session #17525
mposolda
left a comment
@rmartinc Thanks! In general, looks good.
I've added a very minor nitpick inline. I've also added one question/concern inline, but just to gather opinions on it (not really proposing that we refactor this, just thinking...)
@hmlnarik @mkanis I would also like to add the store team to review this one, also due to the concern that this fix affects the old store and I am not sure whether something similar applies to the new store as well (and should be added to the new store in this PR too?)
Nitpick: My personal preference is to name these variables clientSessionTimeToExpire and userSessionTimeToExpire. WDYT? This is just a nitpick and maybe it is just me... On the other hand, this code around timeouts is already not that trivial, and anything that makes it a bit easier to understand is good.
I am not insisting on this if you and others have a different opinion, and it is not a blocker for this PR IMO. Just a thought :-)
Done! I have changed to userSessionTimeToExpire and clientSessionTimeToExpire too.
Just a little comment here to explain the new calculation better. It fixes two different issues:
- The max life was using the timestamp and not the STARTED_AT of the client session, which is plainly wrong. We were calculating the wrong max life (it was more an idle timeout than a max life).
- There is also a second issue (non-functional, just a memory issue): client sessions can remain in memory longer than the user session. That's a waste of memory. Imagine a client session that is created just a few seconds before the user session expires: the user session will be removed, but the (useless) client session remains in memory for the whole max life of the client session. The USER_SESSION_STARTED_AT_NOTE is needed for this second problem.
Regards!
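To make the two fixes in this comment concrete, here is a minimal sketch of the corrected calculation. It is not the code from the PR: the class name, method name, and the EXPIRED marker are hypothetical, and all times are plain seconds. It measures the client session's max life from its start time (not from its last-refresh timestamp) and caps it by the deadline of the owning user session.

```java
final class ClientSessionExpirySketch {
    /** Hypothetical marker returned when the session is already past its deadline. */
    static final long EXPIRED = -1L;

    /**
     * Seconds until the client session should be evicted.
     * The *Started and now arguments are epoch seconds; lifespans are durations in seconds.
     */
    static long clientSessionTimeToExpire(long clientStarted, long clientMaxLifespan,
                                          long userSessionStarted, long userSessionMaxLifespan,
                                          long now) {
        // Fix 1: the max-life deadline counts from the start of the client session.
        long clientDeadline = clientStarted + clientMaxLifespan;
        // Fix 2: the client session never outlives its user session.
        long userDeadline = userSessionStarted + userSessionMaxLifespan;
        long remaining = Math.min(clientDeadline, userDeadline) - now;
        return remaining > 0 ? remaining : EXPIRED;
    }

    public static void main(String[] args) {
        // User session: started at 0, max life 100.
        // Client session: created at 95 (shortly before the user session ends), max life 60.
        System.out.println(clientSessionTimeToExpire(95, 60, 0, 100, 96)); // prints 4
    }
}
```

Without the cap, the client session in the example would stay in memory until second 155 even though its user session is gone at second 100.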
Same as above (but again, a nitpick...)
Done!
Just thinking out loud: if this method also had UserSessionModel (or the entity) as an argument, it would allow us to:
- avoid introducing USER_SESSION_STARTED_AT_NOTE (which is another thing saved in the infinispan cache)
- compute sessionMaxLifespan more properly, with something like int sessionMaxLifespan = userSession.isRememberMe() ? Math.max(realm.getSsoSessionMaxLifespan(), realm.getSsoSessionMaxLifespanRememberMe()) : realm.getSsoSessionMaxLifespan();
On the other hand, it seems that having userSession available here may need a fair amount of refactoring (and I am still not 100% sure it is doable...). Also, the problem with the not-100%-accurate sessionMaxLifespan existed even before this PR... So it's probably not worth the effort, and we keep it as is for now? I was just wondering whether someone feels we should try to go this way, but I am myself not very keen on that... ;-)
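The remember-me expression quoted in this comment can be tried in isolation. The sketch below is only an illustration: the class and method names are hypothetical, the realm getters are replaced by plain int parameters, and the lifespan values are made-up seconds.

```java
final class SessionMaxLifespanSketch {
    /**
     * Same shape as the expression in the comment: a remember-me session uses the
     * larger of the two realm settings, any other session uses the regular one.
     */
    static int sessionMaxLifespan(boolean rememberMe,
                                  int ssoSessionMaxLifespan,
                                  int ssoSessionMaxLifespanRememberMe) {
        return rememberMe
                ? Math.max(ssoSessionMaxLifespan, ssoSessionMaxLifespanRememberMe)
                : ssoSessionMaxLifespan;
    }

    public static void main(String[] args) {
        int regular = 10 * 60 * 60;          // 10 hours, illustrative
        int rememberMe = 180 * 24 * 60 * 60; // 180 days, illustrative
        System.out.println(sessionMaxLifespan(false, regular, rememberMe)); // prints 36000
        System.out.println(sessionMaxLifespan(true, regular, rememberMe));  // prints 15552000
        // Math.max also yields the regular value when the remember-me setting is 0.
        System.out.println(sessionMaxLifespan(true, regular, 0));           // prints 36000
    }
}
```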
Yeah! I also thought about getting the user session somehow. But I saw that it would be a big change (we would probably need the user session id in the authentication session), so I decided to follow the contributor's idea, which seems to be easier.
yeah, agree. Thanks!
Hi, thanks @rmartinc for taking both changes in the fix. However, after I submitted my first PR I realized that to properly calculate the lifespan we need to know whether the user session has 'remember me' on or not, so apart from USER_SESSION_STARTED_AT_NOTE I also added a note for remember me.
I get it @mposolda, it's yet another thing to keep, but in our specific case, with remember me set to 180 days compared to a couple of hours without it, the memory increase from these attributes is nothing compared with the number of sessions that are deleted thanks to the proper calculation.
@rmartinc @pjbober Would it be possible to have "remember me" as another note on AuthenticatedClientSessionModel? Storing userSessionId on AuthenticatedClientSessionModel instead (in place of "Started at" and "remember me") works as well, but if it requires a big refactoring or an additional lookup of userSession from infinispan, then my preference would be to stick with the "Started at" and "remember me" notes.
Force-pushed from 3bc0faf to 8f9d327
hmlnarik
left a comment
Thank you @rmartinc .
Please see the comments inline. Please avoid modification of the entities and introducing new direct references from services module to legacy store modules (infinispan in this case). These changes tightly couple services with legacy store which complicates the aim to decouple the two.
This needs to be in AuthenticatedClientSessionModel, not AuthenticatedClientSessionEntity
OK! It's done now! I have also created the method getUserSessionStarted, more or less the same as getStarted.
This effectively means four very slow calls to update realm parameters upon setting and another four upon resetting in the finally block. Please consider replacing all these calls in all methods affected in this PR with RealmAttributeUpdater, and removing the changes in RealmManager.
For an example usage, see lines 808 to 812 in f5ebe67:
    try (AutoCloseable c = new RealmAttributeUpdater(adminClient.realm("test"))
            .updateWith(r -> {
                r.setSsoSessionMaxLifespan(5);
            })
            .update()) {
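For readers unfamiliar with the pattern in that snippet, here is a self-contained sketch of how an updater of this kind can work. It is not Keycloak's RealmAttributeUpdater: AttributeUpdaterSketch is a hypothetical class, and a plain Map stands in for the realm. The point is that all changes are collected first, written in a single update, and reverted automatically when the try-with-resources block closes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class AttributeUpdaterSketch implements AutoCloseable {
    private final Map<String, Integer> target;   // stand-in for the realm
    private final Map<String, Integer> original; // snapshot used to restore on close
    private final Map<String, Integer> working;  // pending changes

    AttributeUpdaterSketch(Map<String, Integer> target) {
        this.target = target;
        this.original = new HashMap<>(target);
        this.working = new HashMap<>(target);
    }

    /** Collects a change without touching the target yet. */
    AttributeUpdaterSketch updateWith(Consumer<Map<String, Integer>> change) {
        change.accept(working);
        return this;
    }

    /** Applies all collected changes in one write. */
    AttributeUpdaterSketch update() {
        target.clear();
        target.putAll(working);
        return this;
    }

    /** Restores the snapshot taken at construction, again in one write. */
    @Override
    public void close() {
        target.clear();
        target.putAll(original);
    }

    public static void main(String[] args) {
        Map<String, Integer> realm = new HashMap<>();
        realm.put("ssoSessionMaxLifespan", 36000);
        try (AttributeUpdaterSketch u = new AttributeUpdaterSketch(realm)
                .updateWith(r -> r.put("ssoSessionMaxLifespan", 5))
                .update()) {
            System.out.println(realm.get("ssoSessionMaxLifespan")); // prints 5
        }
        System.out.println(realm.get("ssoSessionMaxLifespan")); // prints 36000
    }
}
```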
Yeah! Good point! Done in the added methods!
These methods belong either in AuthenticatedClientSessionModel (not AuthenticatedClientSessionEntity) or in a utility method.
I need to maintain both in the infinispan project. This class is not related at all to AuthenticatedClientSessionModel, so I cannot reuse the methods from there. So creating them here is more or less the same as creating a utility class in the infinispan section.
Let me know if you have a better idea.
martin-kanis
left a comment
@rmartinc Thanks for the PR. I suspect these problems might be present in the new store as well. Currently, the OfflineTokenTest and RefreshTokenTest classes are not run with the new store in GHA. When I tried to enable them in GHA, your tests failed with 500 because the tests rely on legacy store specifics (i.e. testingClient.testing().cache(InfinispanConnectionProvider.OFFLINE_USER_SESSION_CACHE_NAME).contains(sessionId)).
Would it be possible to use just methods available in UserSessionProvider, UserSessionModel and AuthenticatedClientSessionModel?
PS: Some failures with the new storage in existing tests might not be caused by this PR. Michal is working on it #17572
Not completely. The user session and the client session are separated in infinispan. It's possible that the user session disappears (by max-life, for example) but the client session remains. But retrieving a client session without the user session is not possible via the API (because that only makes sense in infinispan). So I cannot assert that the client session is fully removed. At least I have to maintain the last part here. I can change the other checks but not this one (I need access to the real infinispan API for that). I could also skip asserting this part, but it was exactly one of the errors I was trying to fix in the PR.
Force-pushed from 8f9d327 to 94ec7a3
@martin-kanis I have replaced all the uses that can be replaced. And for the one that cannot be substituted I did this trick. Let me know if it works for you. (I really want to test that the session is really gone from the infinispan cache.)
For these cases please use model tests with appropriate
Thank you. I ran this PR again with the new store and there are these failures: OfflineTokenTest.refreshTokenUserClientMaxLifespanSmallerThanSession:1036 expected:<1> but was:<0> It would be good to have the other tests you added running fully with the new store. Either by transferring them to model tests as Hynek suggested, or at least checking if
@martin-kanis and all, I have been doing some tests and created a model test class (this one in a test branch). I can say that there are differences between infinispan and the new map for session lifespans. The hot-rod fails in general (don't know why). So I think that first we should agree how this should work. There are two main differences:
Let's wait for @mposolda too, but I think that we have to decide how to handle those two points and do the same for both impls.
Is it possible to move it to some "util" method instead of having it as a method on AbstractKeycloakTest? Maybe like StorageUtil class can be introduced for this (if there is not something like this already?)
AFAIK some tests still might have issues with "testing client"...
Are you planning to release this fix in 22.0.0?
Force-pushed from 94ec7a3 to 6333873
@mposolda @hmlnarik @martin-kanis Last week I started working on this again. I think that everything is in place and working. The change is quite big now, as I moved the timeout calculations to a shared utility class and the two implementations perform the same timeouts:
In general we can decide something different on any point. I just chose what is more logical IMHO and less risky. The PR has been rebased. I added different tests in different places. And there are some minor test fixes because the calculations are now different for both impls (this affected two previous tests).
Regarding @mposolda's comments, I think that the configuration is quite convoluted in general. But I'm just trying to fix the current issues and execute the same code for all the implementations. After that we can improve it more, but, for the moment, I just want to use the same methods. I tried to make minimal changes and do something that makes sense. I agree that if this gets merged, we should probably modify the documentation and tooltips again. We are changing how this works, not much, but we are changing it.
mposolda
left a comment
I am approving since, from my point of view, the changes look good! I consider that we have smaller follow-up tasks, which we can/should do as follow-up PRs (documentation, tooltips, new admin console polishing, possible TokenManager refactoring to leverage methods from SessionExpiration).
I didn't review the parts related to the new store and model tests much, as I am leaving this to the "store" team.
ghost
left a comment
Unreported flaky test detected, please review
If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.
org.keycloak.testsuite.model.session.SessionTimeoutsTest#testOfflineUserClientIdleTimeoutSmallerThanSessionNoRefresh
Keycloak CI - Store Model Tests
@mposolda @martin-kanis OK, I can confirm that infinispan is returning a session that is expired in the cross-dc configuration. I have added these lines in a TEST branch.
    +++ b/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/InfinispanUserSessionProvider.java
    @@ -402,6 +402,10 @@ public class InfinispanUserSessionProvider implements UserSessionProvider {
             AuthenticatedClientSessionEntity clientSessionEntityFromCache = getClientSessionEntity(UUID.fromString(clientSessionId), offline);
             if (clientSessionEntityFromCache != null) {
    +            if (offline && SessionTimeouts.getOfflineClientSessionMaxIdleMs(userSession.getRealm(), client, clientSessionEntityFromCache) == SessionTimeouts.ENTRY_EXPIRED_FLAG) {
    +                System.err.println("RIIIICKY: returning expired session " + clientSessionEntityFromCache.getId()
    +                        + " timestamp=" + clientSessionEntityFromCache.getTimestamp() + " started=" + clientSessionEntityFromCache.getStarted());
    +            }
                 return wrap(userSession, client, clientSessionEntityFromCache, offline);
             }
It failed in the 4th run:
So, I don't think we can do much here. Except rechecking expiration, always, only if cross-dc,... Note that returning an expired session can cause negative times in tokens. I think that the flaky issue #16511 has the same cause. Correction: This is just idle, not expiration. So maybe it's not so important. I'm going to check if I can improve the test to check N times that it's expired by idle.
@rmartinc The updated version is now green after 10 runs https://github.com/martin-kanis/keycloak/actions/runs/5130604990/jobs/9229608348
😍 Great! But if you see
OK, I have tested different things with the CI (as I cannot reproduce the issue on my laptop) with the profile
So, it's not just idle, it can happen with max lifespan too. It seems to be more probable with idle, but it also happens with max lifespan. So I'm not doing anything more, I will wait for @martin-kanis's comments. As commented, the only thing I see to improve this on the keycloak side is always checking expiration before returning the sessions (or at least in some scenarios like cross-dc). NOTE: On my laptop I have executed
Cheers!
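The option mentioned above, always checking expiration before returning a session, could look roughly like the sketch below. This is an assumption-laden illustration, not the Infinispan provider: ExpiryRecheckSketch, Entry, and the flag value are hypothetical, a HashMap stands in for the cache, and the caller passes the current time in seconds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class ExpiryRecheckSketch {
    /** Hypothetical marker meaning "this entry is past its deadline". */
    static final long ENTRY_EXPIRED_FLAG = -2L;

    static final class Entry {
        final String id;
        final long started;     // when the session was created
        final long lastRefresh; // last time the session was used
        Entry(String id, long started, long lastRefresh) {
            this.id = id;
            this.started = started;
            this.lastRefresh = lastRefresh;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long maxIdle;
    private final long maxLifespan;

    ExpiryRecheckSketch(long maxIdle, long maxLifespan) {
        this.maxIdle = maxIdle;
        this.maxLifespan = maxLifespan;
    }

    void put(Entry entry) {
        cache.put(entry.id, entry);
    }

    /** Remaining seconds, or ENTRY_EXPIRED_FLAG once idle or max-life deadline has passed. */
    long remaining(Entry e, long now) {
        long deadline = Math.min(e.lastRefresh + maxIdle, e.started + maxLifespan);
        long left = deadline - now;
        return left > 0 ? left : ENTRY_EXPIRED_FLAG;
    }

    /** Never hands out an entry the cache has not evicted yet but that is already expired. */
    Optional<Entry> get(String id, long now) {
        Entry e = cache.get(id);
        if (e == null) {
            return Optional.empty();
        }
        if (remaining(e, now) == ENTRY_EXPIRED_FLAG) {
            cache.remove(id); // drop the stale entry instead of returning it
            return Optional.empty();
        }
        return Optional.of(e);
    }
}
```

The cost is one extra computation per lookup; whether that is acceptable everywhere or only in setups such as cross-dc is exactly the open question in the comment.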
    return importUserSession(realm, offline, persistentUserSession);
    UserSessionEntity sessionEntity = importUserSession(realm, offline, persistentUserSession);
    if (sessionEntity == null) {
        persister.removeUserSession(sessionId, offline);
Out of curiosity, why do we need to delete the offline session if importUserSession didn't find it in the DB? Is it because we want to remove the associated offline client sessions?
At this point we know the session is in the offline store but is expired. So it's removed now. If not, it will be removed by some scheduled task later. I think that I needed this for the tests (because the sessions created by the new test remained in the store and other tests related to the persister counted them).
    public int getUserSessionStarted() {
        String started = getNotes().get(AuthenticatedClientSessionModel.USER_SESSION_STARTED_AT_NOTE);
        return started == null ? timestamp : Integer.parseInt(started);
This defaults to timestamp while AuthenticatedClientSessionModel.getUserSessionStarted defaults to 0 if USER_SESSION_STARTED_AT_NOTE is not set. Is it intentional?
Yes, it's correct. Previously getTimestamp was always used. So this maintains the same behavior just in case. If we used 0, all the sessions without the note would be expired. Note that we are doing the same in the map part (here).
But why don't we use timestamp in AuthenticatedClientSessionModel, but 0 instead?
    default int getUserSessionStarted() {
        String started = getNote(USER_SESSION_STARTED_AT_NOTE);
        return started == null ? 0 : Integer.parseInt(started);
    }
and older code
    default int getStarted() {
        String started = getNote(STARTED_AT_NOTE);
        // Fallback to 0 if "started" note is not available. This can happen for the offline sessions migrated from old version where "startedAt" note was not yet available
        return started == null ? 0 : Integer.parseInt(started);
    }
We can do it, but I really don't know if it can be problematic. In AuthenticatedClientSessionEntity I simply created getUserSessionStarted following the same idea that was in getStarted (so I returned 0).
For the PR the classes that are really used are AuthenticatedClientSessionEntity and MapAuthenticatedClientSessionEntity (which don't implement AuthenticatedClientSessionEntity and re-implement the access to notes).
@mposolda Do you recall why you used 0 as the default? To me, this implies that the isIssuedBeforeSessionStart check will always be false, meaning "legacy" sessions will always pass this check. At the same time, this PR adds https://github.com/keycloak/keycloak/pull/17525/files#diff-fd8b1c47da5ec7ee42d5450d3f7b73cdeef9c08af26b825c9245e3e6052c5cdfR1090 and https://github.com/keycloak/keycloak/pull/17525/files#diff-fd8b1c47da5ec7ee42d5450d3f7b73cdeef9c08af26b825c9245e3e6052c5cdfR1130 that use timestamp instead of 0.
Where I am heading with this: isn't it better to use timestamp in the first place, unless Marek remembers the reasons why he used 0?
Yes, @martin-kanis, I can give it a try to see if it brings any issue. We can change both methods to return the timestamp instead of 0 in AuthenticatedClientSessionEntity.
@martin-kanis @rmartinc This is a good catch!
I recall this was added in this commit https://github.com/keycloak/keycloak/pull/8130/files#diff-b4014a5e9c73d54dc98153e6fa64803e21ab0646a75566cf74ca9a02abbb18db . And some of the story behind it is described in this JIRA description https://issues.redhat.com/browse/KEYCLOAK-18368 .
The 0 was just a fallback, which was only used for "offline client sessions" migrated from a previous version (as those offline client sessions didn't have the "startedAt" note on them). In other words, if that PR had used timestamp instead of 0, we wouldn't be able to refresh offline tokens from a previous version, due to this possible scenario:
- Offline token issued on the old server (before KEYCLOAK-18368) at time 1000
- Keycloak server stopped and started with the new Keycloak version (with KEYCLOAK-18368 applied)
- Refresh of the offline token at time 2000 would fail due to this check https://github.com/keycloak/keycloak/blob/21.1.1/services/src/main/java/org/keycloak/protocol/oidc/TokenManager.java#L186-L189
Note that this check is needed exactly because of the scenario described in https://issues.redhat.com/browse/KEYCLOAK-18368. But at the same time, we should make sure that offline tokens from a previous version can still be refreshed. So that's the story of why it was originally added :-)
Now some thoughts for this PR:
IMO the ideal is that this fallback to 0 (or timestamp) is effectively not needed anywhere. It looks like the only case where the USER_SESSION_STARTED_AT_NOTE note won't be present is (again) the offline sessions migrated from a previous version?
AFAIK we don't need to care about the new store, as we don't support migrating offline sessions created with the new store in previous versions. Is that correct @martin-kanis ? So we likely need to care just about the old store and the offline sessions coming from UserSessionPersister.
So as long as all client sessions returned by JpaUserSessionPersisterProvider have this note on them, we should be good. If we always have the corresponding UserSession (which AFAIK we have), we can just use userSession.getStarted() to fill that note before saving those persisted sessions in infinispan?
Also I wonder if AuthenticatedClientSessionModel.getUserSessionStarted() could be updated like this in this PR?
    default int getUserSessionStarted() {
        String started = getNote(USER_SESSION_STARTED_AT_NOTE);
        return started == null ? getUserSession().getStarted() : Integer.parseInt(started);
    }
In other words, fall back to userSession.getStarted() instead of using a hardcoded value of 0 (or timestamp). The ideal would be if AuthenticatedClientSessionEntity also had access to userSession, so it could look into it, but I suppose this would require some bigger refactoring, which we don't want to do in this PR?
In short: in MigrationTest we have a scenario for testing the refresh of an offline token from a previous version. So it seems that this PR should be tested with MigrationTest to make sure it works. AFAIK MigrationTest is not yet in GH actions, so this may need to be done manually.
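The effect of the different fallbacks on the scenario described in this comment can be checked with a small sketch. Everything here is a simplified stand-in: the class name, the note key value, and the isIssuedBeforeSessionStart helper are hypothetical and only mimic the idea of the refresh check, and the times (1000, 1500, 900) are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

final class StartedNoteFallbackSketch {
    /** Hypothetical key; the real constant's value is not shown in this thread. */
    static final String USER_SESSION_STARTED_AT_NOTE = "userSessionStartedAt";

    /** Reads the note, or asks the supplied fallback when the note is missing. */
    static int userSessionStarted(Map<String, String> notes, IntSupplier fallback) {
        String started = notes.get(USER_SESSION_STARTED_AT_NOTE);
        return started == null ? fallback.getAsInt() : Integer.parseInt(started);
    }

    /** Simplified stand-in for the refresh check: reject tokens issued before the session start. */
    static boolean isIssuedBeforeSessionStart(int sessionStarted, int tokenIssuedAt) {
        return sessionStarted > tokenIssuedAt;
    }

    public static void main(String[] args) {
        Map<String, String> migratedNotes = new HashMap<>(); // migrated session: no note
        int tokenIssuedAt = 1000;

        // Fallback 0: the old offline token is never considered "issued before start".
        int withZero = userSessionStarted(migratedNotes, () -> 0);
        System.out.println(isIssuedBeforeSessionStart(withZero, tokenIssuedAt)); // prints false

        // Fallback to a later timestamp (say 1500): the old token would be rejected.
        int withTimestamp = userSessionStarted(migratedNotes, () -> 1500);
        System.out.println(isIssuedBeforeSessionStart(withTimestamp, tokenIssuedAt)); // prints true

        // Fallback to the user session's own start (say 900): accepted, without hardcoding 0.
        int withUserSession = userSessionStarted(migratedNotes, () -> 900);
        System.out.println(isIssuedBeforeSessionStart(withUserSession, tokenIssuedAt)); // prints false
    }
}
```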
I have tested this with the MigrationTest and it works (mariadb only, but I think that's enough). I have also made two small changes:
- The change proposed by @mposolda in AuthenticatedClientSessionModel.getUserSessionStarted.
- And I also modified these lines commented on by @martin-kanis. He is right, and if 0 is used the previous check for isIssuedBeforeSessionStart fails.
For the previous comments, I think that returning 0 in AuthenticatedClientSessionModel interface is necessary. Not changing anything there in this PR for the moment.
WDYT now?
        return started == null ? timestamp : Integer.parseInt(started);
    }

    public int getStarted() {
This defaults to timestamp while AuthenticatedClientSessionModel.getStarted defaults to 0 if STARTED_AT_NOTE is not set. Is it intentional?
Yep, the same.
…implementations Closes keycloak#14854 Closes keycloak#11990
Force-pushed from 510cc40 to 3a4e626
martin-kanis
left a comment
@rmartinc Thank you for the PR and all the changes! LGTM
mposolda
left a comment
@rmartinc @martin-kanis Thanks for the great work and for the review!
Closes #14854
Closes #11990
I have joined both PRs into one. They are related and modify the same code area. Adding tests to check both. I have respected the original authors.