Description
Before reporting an issue
- I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.
Area
infinispan
Describe the bug
We run a single-cluster deployment using a StatefulSet with 5 replicas (previously 3). When the underlying Kubernetes node of one replica becomes NotReady (possibly because of other workloads on that node), the pod gets stuck in "Terminating". The other Keycloak nodes then see timeouts when communicating with the pod on that node, as expected.
After 3 minutes, the failed node leaves the Keycloak cluster and a new cluster view is created. Instead of returning to normal operation with the remaining nodes, the cluster breaks down: communication between the remaining nodes times out, and they ultimately report as not healthy.
Version
26.5.2
Regression
- The issue is a regression
Expected behavior
The cluster resumes normal operation with the remaining nodes after one node leaves.
Actual behavior
Logs after the initial node failure:
2026-02-12 16:43:44.719 error ISPN000476: Timed out waiting for responses for request 3264119 from keycloak-k1-3-40412 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12 16:43:44.720 error ISPN000476: Timed out waiting for responses for request 3264119 from keycloak-k1-3-40412 after 15 seconds Uncaught server error
2026-02-12 16:43:48.890 error ISPN000476: Timed out waiting for responses for request 3264126 from keycloak-k1-3-40412 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12 16:43:50.720 error ISPN000476: Timed out waiting for responses for request 3264133 from keycloak-k1-3-40412 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12 16:43:52.955 error ISPN000427: Timeout after 15 seconds waiting for acks ([keycloak-k1-3-40412]). Id=3006, Topology Id=332 ISPN000136: Error executing command PutKeyValueCommand on Cache 'authenticationSessions', writing keys [PqtGNanhtEt4WwzoPbFafyMs]
2026-02-12 16:43:52.956 trace org.infinispan.commons.TimeoutException: ISPN000427: Timeout after 15 seconds waiting for acks ([keycloak-k1-3-40412]). Id=3006, Topology Id=332 Uncaught server error
...
Logs after new cluster view:
...
2026-02-12T15:46:45.806Z ISPN000094: Received new cluster view for channel ISPN: [keycloak-k1-0-2128|75] (4) [keycloak-k1-0-2128, keycloak-k1-1-24929, keycloak-k1-2-59615, keycloak-k1-4-56737]
2026-02-12T15:46:45.806Z Reloading JGroups Certificate
2026-02-12T15:46:45.823Z ISPN100001: Node keycloak-k1-3-40412 left the cluster
2026-02-12T15:46:45.823Z ISPN100001: Node keycloak-k1-3-40412 left the cluster
2026-02-12T15:47:00.604Z ISPN000476: Timed out waiting for responses for request 3266982 from keycloak-k1-2-59615 after 14.25 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'actionTokens', writing keys []
2026-02-12T15:47:00.604Z ISPN000476: Timed out waiting for responses for request 3266982 from keycloak-k1-2-59615 after 14.25 seconds Uncaught server error
2026-02-12T15:47:00.913Z ISPN000476: Timed out waiting for responses for request 3267026 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.917Z ISPN000476: Timed out waiting for responses for request 3267028 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.927Z ISPN000476: Timed out waiting for responses for request 3267031 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.932Z ISPN000476: Timed out waiting for responses for request 3267032 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.933Z ISPN000476: Timed out waiting for responses for request 3267033 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.951Z ISPN000476: Timed out waiting for responses for request 3267037 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.951Z ISPN000476: Timed out waiting for responses for request 3267038 from keycloak-k1-2-59615 after 14.96 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.980Z ISPN000476: Timed out waiting for responses for request 3267040 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.983Z ISPN000476: Timed out waiting for responses for request 3267041 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.983Z ISPN000476: Timed out waiting for responses for request 3267042 from keycloak-k1-1-24929 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.986Z ISPN000476: Timed out waiting for responses for request 3267043 from keycloak-k1-1-24929 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.986Z ISPN000476: Timed out waiting for responses for request 3267044 from keycloak-k1-1-24929 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.987Z ISPN000476: Timed out waiting for responses for request 3267045 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.990Z ISPN000476: Timed out waiting for responses for request 3267046 from keycloak-k1-1-24929 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.992Z ISPN000476: Timed out waiting for responses for request 3267047 from keycloak-k1-0-2128 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.997Z ISPN000476: Timed out waiting for responses for request 3267048 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.997Z ISPN000476: Timed out waiting for responses for request 3267049 from keycloak-k1-2-59615 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.997Z ISPN000476: Timed out waiting for responses for request 3267050 from keycloak-k1-0-2128 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
2026-02-12T15:47:00.997Z ISPN000476: Timed out waiting for responses for request 3267051 from keycloak-k1-0-2128 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []
...
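To make the pattern in the log excerpt above easier to see (after the view change, the timeouts target the surviving nodes rather than the node that left), the timeouts can be tallied per target node. The following is only a rough sketch for triaging such logs: `count_timeouts_by_node` and `SAMPLE_LINES` are hypothetical names, not part of Keycloak or Infinispan, and the regular expression assumes the ISPN000476 message wording shown in this report.

```python
import re
from collections import Counter

# Assumption: the ISPN000476 message has the wording seen in this report.
TIMEOUT_RE = re.compile(
    r"ISPN000476: Timed out waiting for responses for request (\d+) from (\S+) after"
)


def count_timeouts_by_node(lines):
    """Count distinct timed-out requests per target node.

    The same request can be logged more than once (for example once with
    ISPN000136 and once with "Uncaught server error"), so each
    (request id, node) pair is counted a single time.
    """
    seen = set()
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        key = match.groups()  # (request_id, node)
        if key in seen:
            continue
        seen.add(key)
        counts[key[1]] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE_LINES = [
    "2026-02-12T15:47:00.604Z ISPN000476: Timed out waiting for responses for request 3266982 from keycloak-k1-2-59615 after 14.25 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'actionTokens', writing keys []",
    "2026-02-12T15:47:00.604Z ISPN000476: Timed out waiting for responses for request 3266982 from keycloak-k1-2-59615 after 14.25 seconds Uncaught server error",
    "2026-02-12T15:47:00.983Z ISPN000476: Timed out waiting for responses for request 3267042 from keycloak-k1-1-24929 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []",
    "2026-02-12T15:47:00.992Z ISPN000476: Timed out waiting for responses for request 3267047 from keycloak-k1-0-2128 after 15 seconds ISPN000136: Error executing command GetKeyValueCommand on Cache 'clientSessions', writing keys []",
]

for node, count in sorted(count_timeouts_by_node(SAMPLE_LINES).items()):
    print(node, count)
```

Run against the full log of one pod, a tally like this shows whether the timeouts are limited to the departed node or spread across the remaining members, as in the excerpt above.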
Cluster reporting as not healthy after another 40 seconds:
2026-02-12 16:47:40.987 info SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak cluster health check","status":"DOWN","data":{"Failing since":"2026-02-12 15:47:40,984"}},{"name":"Keycloak database connections async health check","status":"UP"}]}
A "rollout restart" of the StatefulSet fixes the cluster state.
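The health status line above carries a JSON payload that names the failing check. As a small sketch for pulling that out of pod logs, the payload can be parsed as follows. `failing_checks` and `SAMPLE_LINE` are hypothetical names, and the code assumes the JSON runs to the end of the line after the text "Reporting health down status: ", as it does in this report.

```python
import json

# Assumption: the JSON payload follows this marker and runs to end of line.
MARKER = "Reporting health down status: "


def failing_checks(log_line):
    """Return the names of the checks reported as DOWN in one log line."""
    _, found, payload = log_line.partition(MARKER)
    if not found:
        return []  # not a health status line
    status = json.loads(payload)
    return [
        check["name"]
        for check in status.get("checks", [])
        if check.get("status") == "DOWN"
    ]


# The health status line from this report.
SAMPLE_LINE = (
    '2026-02-12 16:47:40.987 info SRHCK01001: Reporting health down status: '
    '{"status":"DOWN","checks":[{"name":"Keycloak cluster health check",'
    '"status":"DOWN","data":{"Failing since":"2026-02-12 15:47:40,984"}},'
    '{"name":"Keycloak database connections async health check","status":"UP"}]}'
)

print(failing_checks(SAMPLE_LINE))
```

For the line in this report, only the cluster health check is DOWN, while the database connection check is UP.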
How to Reproduce?
StatefulSet with multiple replicas (occurred with both 3 and 5).
Environment:
KC_CACHE: ispn
KC_CACHE_STACK: jdbc-ping
This has happened multiple times, but it is not clear how to reproduce it reliably.
Anything else?
No response