shaan420 commented Feb 4, 2025

What this PR does / why we need it:
When writing to a ConsumerService, the m3msg producer randomly picks a ConsumerWriter that writes to a single replica, and it blocks until that write either succeeds or returns an error. In some situations, such as a deployment of the ConsumerService, the error path incurs very high latency (25s+). This builds up a large backlog in the m3aggregator message queue and drastically increases the consume latency of messages.

To minimize the impact of these errors, this PR waits a configurable amount of time for a write to return. If the write has not returned by then, the producer opportunistically starts writing the same message to another randomly chosen replica concurrently. If that write succeeds and the message is acked, subsequent writes detect that the stalled ConsumerWriter is still active and skip it, going straight to another ConsumerWriter. Once the connection stabilizes, the m3msg producer goes back to writing to a single replica.

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE
