Description
Summary
We encountered a data consistency issue after our application threw multiple OutOfMemoryError exceptions caused by an unrelated bug. Notably, these OOMs did not immediately crash the system; instead, the JVM kept running in a degraded state.
As a result, JedisCluster was left with a contaminated input buffer, causing subsequent operations to read stale or incorrect responses from Redis.
Technical Explanation
We traced the core pattern responsible for this issue to redis.clients.jedis.Connection:
    public Object executeCommand(final CommandArguments args) {
        sendCommand(args);
        return getOne();
    }

If an OutOfMemoryError occurs after sending the command but before reading the response, the response from Redis remains in the input buffer. Subsequent commands may then read this leftover data instead of their own response. In our case, we saw multiple OOM exceptions over time, amplifying the buffer contamination effect.
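The off-by-one effect described above can be shown in isolation with a toy model. This is only a sketch: `FakeConnection` and its methods are hypothetical stand-ins that mimic the send-then-read shape of `executeCommand`, not Jedis code, and an `IllegalStateException` is used in place of a real OutOfMemoryError.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a request/response connection; none of this is Jedis code. */
public class DesyncDemo {

    /** Hypothetical connection whose replies wait in an input buffer until read. */
    static class FakeConnection {
        private final Deque<String> inputBuffer = new ArrayDeque<>();

        void sendCommand(String key) {
            // The simulated server answers immediately; the reply sits in the buffer.
            inputBuffer.addLast("data-for-" + key);
        }

        String getOne() {
            // Reads the oldest unread reply, whichever command it belongs to.
            return inputBuffer.pollFirst();
        }

        String get(String key, boolean failBeforeRead) {
            sendCommand(key);
            if (failBeforeRead) {
                // Stand-in for an OutOfMemoryError thrown between send and read.
                throw new IllegalStateException("simulated failure between send and read");
            }
            return getOne();
        }
    }

    /** Fails once between send and read, then reuses the same connection. */
    static String run() {
        FakeConnection connection = new FakeConnection();
        try {
            connection.get("key-1", true);
        } catch (IllegalStateException expected) {
            // The reply for key-1 is still in the buffer at this point.
        }
        // Returns the stale reply for key-1 rather than the reply for key-2.
        return connection.get("key-2", false);
    }
}
```

Calling `DesyncDemo.run()` returns `"data-for-key-1"` even though the last command asked for `key-2`, which matches the symptom reported here.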
    public Object executeCommand(final CommandArguments args) {
        sendCommand(args); // 1. Command sent successfully
        // 2. OutOfMemoryError thrown here!
        return getOne(); // 3. This never executes - response stays in buffer
    }

Example:
    // Operation 1
    jedis.get("key-1");
    // Redis responds: "data-1"
    // OutOfMemoryError (or other exception) thrown before reading "data-1"
    // "data-1" remains in input buffer

    // Operation 2
    jedis.get("key-2");
    // Redis responds: "data-2"
    // getOne() reads leftover "data-1" instead of "data-2"

Questions
- Should JedisCluster proactively purge or reset its input stream after any non-JedisException failure (e.g. an OOM between sending the command and reading the response), or before issuing the next command?
- If not, how can applications prevent or recover from input buffer contamination?
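One possible direction, sketched here purely as an illustration, is to mark a connection as unusable whenever any Throwable (not only a library exception) escapes between send and read, and to replace it instead of reusing it. `GuardedConnection`, its `broken` flag, and `run()` are hypothetical names for a toy model; this is not how Jedis currently behaves and not a proposed patch. Setting a flag allocates nothing, although in a JVM that is genuinely out of memory even this cleanup is not guaranteed to run.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of "poison the connection on any failure"; not Jedis code. */
public class PoisonOnFailureDemo {

    /** Hypothetical connection that refuses reuse after a failed exchange. */
    static class GuardedConnection {
        private final Deque<String> inputBuffer = new ArrayDeque<>();
        private boolean broken = false;

        boolean isBroken() {
            return broken;
        }

        String get(String key, boolean failBeforeRead) {
            if (broken) {
                throw new IllegalStateException("connection is broken; replace it");
            }
            try {
                inputBuffer.addLast("data-for-" + key); // simulated send + server reply
                if (failBeforeRead) {
                    throw new OutOfMemoryError("simulated"); // failure between send and read
                }
                return inputBuffer.pollFirst();         // simulated read
            } catch (Throwable t) {
                // The buffer may hold an unread reply, so the stream can no longer be trusted.
                broken = true;
                throw t;
            }
        }
    }

    /** Fails once, swaps the poisoned connection for a fresh one, then retries. */
    static String run() {
        GuardedConnection connection = new GuardedConnection();
        try {
            connection.get("key-1", true);
        } catch (OutOfMemoryError simulated) {
            // Swallowed only for the demo; real code would usually let this propagate.
        }
        if (connection.isBroken()) {
            connection = new GuardedConnection(); // what a pool would do on return
        }
        return connection.get("key-2", false);
    }
}
```

Here `PoisonOnFailureDemo.run()` returns `"data-for-key-2"`, because the connection holding the unread reply is discarded rather than reused. Discarding the connection avoids having to guess how many stale replies are still pending, which is the weakness of trying to drain the buffer in place.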
Steps to Reproduce
Reproducing this issue in a real-world scenario is challenging due to the unpredictable nature of the OutOfMemoryError. To make investigation easier, I created a small piece of code that reliably showcases the buffer contamination problem:
- Clone LeFilou/jedis-oom.
- Follow the instructions in the repository to run the reproduction scenario.
- Buffer corruption does not occur on every single run, but it happens quite often when the scenario is executed multiple times.
Redis/Jedis Configuration
- Jedis Version: 6.2.0
- Redis: AWS ElastiCache (Redis Cluster),