
[Messenger] error in receiver results in message staying in queue forever #32055


Closed
Tobion opened this issue Jun 14, 2019 · 7 comments

@Tobion
Contributor

Tobion commented Jun 14, 2019

When an error happens in the worker before the message is dispatched on the bus (a deserialization error, for example), the message is neither acked nor nacked, the exception is not caught, and the worker quits.
The result is that the message stays at the front of the queue, so the next worker receives the same message again and likely fails again. This continues forever and no good messages can be consumed anymore.

A similar situation can happen when handling a message takes longer than the RabbitMQ connection heartbeat or timeout. Ref. #31707 and php-enqueue/enqueue-dev#658 (comment)

RabbitMQ puts the messages back into the queue with a Redelivered header. How we solved it in our apps using https://github.com/M6Web/AmqpBundle is to ack the message directly when it has the Redelivered header and then trigger a retry.

I think it's important that Symfony Messenger can handle those cases as well, using its retry logic and failed-message transport.

@Tobion Tobion changed the title [Messenger] error in receiver will make the message stay in the queue forever [Messenger] error in receiver results in message staying in queue forever Jun 14, 2019
@weaverryan
Member

weaverryan commented Jun 17, 2019

@Tobion I'm not sure if this fully explains/covers your situation, but a serializer is supposed to throw a MessageDecodingFailedException if deserialization specifically fails:

throw new MessageDecodingFailedException('Encoded envelope should have at least a "body" and some "headers".');

Then, receivers/transports are supposed to catch this and reject the message -

} catch (MessageDecodingFailedException $exception) {

This is something that didn't happen in 4.2, so it was fixed in 4.3. It does, however, rely on two "shoulds" that are mentioned on two interfaces (the serializer "should" throw that exception and the receiver "should" catch it and reject).

Is this your issue? Do you have a custom serializer or transport that's not following these "shoulds"?

Update: And we decided to "reject" instead of retry, as a deserialization error is not one that seems "temporary".
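
For reference, here is a minimal sketch of the serializer side of these "shoulds" (MyJsonSerializer and MyMessage are made-up names for illustration, not Symfony classes):

```php
<?php

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\MessageDecodingFailedException;
use Symfony\Component\Messenger\Transport\Serialization\SerializerInterface;

// Hypothetical custom serializer: the important part is that *any* decoding
// failure surfaces as MessageDecodingFailedException, so the receiver can
// reject the message instead of the worker crashing and the message staying
// at the front of the queue.
final class MyJsonSerializer implements SerializerInterface
{
    public function decode(array $encodedEnvelope): Envelope
    {
        if (empty($encodedEnvelope['body'])) {
            throw new MessageDecodingFailedException('Encoded envelope should have a "body".');
        }

        try {
            $data = json_decode($encodedEnvelope['body'], true, 512, JSON_THROW_ON_ERROR);
            // MyMessage is a placeholder for your own message class.
            $message = new MyMessage($data['id'] ?? null);
        } catch (\Throwable $e) {
            throw new MessageDecodingFailedException($e->getMessage(), 0, $e);
        }

        return new Envelope($message);
    }

    public function encode(Envelope $envelope): array
    {
        /** @var MyMessage $message */
        $message = $envelope->getMessage();

        return [
            'body' => json_encode(['id' => $message->id], JSON_THROW_ON_ERROR),
            'headers' => [],
        ];
    }
}
```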

@Tobion
Contributor Author

Tobion commented Jun 19, 2019

@weaverryan you are right. Our custom serializer didn't throw a MessageDecodingFailedException. Using that, errors in deserialization are handled correctly. But there is still room for improvement:

  1. Serialization errors could use the failed transport to keep track of them.
  2. We should still handle \AMQPEnvelope::isRedelivery as explained above. There can be a lot of reasons for this to happen, e.g. a lost connection, processing taking too long, or, as in my case, serialization errors thrown without using MessageDecodingFailedException. So this can still result in blocked queues and redelivery loops.

https://www.rabbitmq.com/confirms.html

This means that if all consumers requeue because they cannot process a delivery due to a transient condition, they will create a requeue/redelivery loop. Such loops can be costly in terms of network bandwidth and CPU resources. Consumer implementations can track the number of redeliveries and reject messages for good (discard them) or schedule requeueing after a delay.

So we should also handle retry for those.

  3. Why do serialization errors (any error outside the bus) result in worker termination, but exceptions in handlers don't? I think it's not a problem because you need something like supervisor anyway, but it feels arbitrary.

@weaverryan
Member

OK, let's break this down and see what actionable things we can do.

  1. Serialization errors could use the failed transport to keep track of them

That seems reasonable... we would basically "fail" in the same way as an exception from a handler does. So, by default, it would fail 3 times, then go to the failure queue. This would also solve item (3), I believe: if exceptions from deserialization are handled the same way as exceptions from handlers, then the worker would not exit in either situation.

  2. We should still handle \AMQPEnvelope::isRedelivery as explained above

I'd appreciate a separate issue or PR on this... as I'm still far from a RabbitMQ expert. I don't quite understand the flow/problem:

A) Handler takes longer than the RabbitMQ connection's heartbeat or timeout
B) RabbitMQ puts the message back in the queue with a Redelivered header
C) Our app sees the message with AMQPEnvelope::isRedelivery() and so we ack immediately and trigger a retry?
D) Our app handles the retried message

I'm fuzzy about a few things:

  • In (A), if the handler took really long... does it mean it wasn't handled? Or did our app actually handle it and so we should not handle the redelivered one?
  • In (C), why do we ack and then trigger a retry? By "trigger a retry" do you mean use the same redelivery/retry functionality we currently have?

Cheers!

@Tobion
Contributor Author

Tobion commented Jun 20, 2019

Yes, the flow is described correctly. Trying to answer your questions:

  • (A) That's a tough question. We don't really know if it was handled. It could have been handled and only the ack to RabbitMQ didn't go through. Or it was handled partly, then failed, and then the nack didn't work either. This undecidable problem also exists in the retry: we retry the full message handling, but maybe only part of it failed and other parts/handlers actually succeeded, so some handlers will be executed twice. This is why you usually want to implement message queues with idempotence in mind. This is a general problem and not what I'm trying to solve here.
  • (C) We ack first to break a potential redelivery loop. Say a handler always takes too much time and loses the connection, so the ack doesn't work. Then the message gets redelivered every time and fails to ack every time. By acknowledging first (or using auto-ack), we break the loop. By then retrying (yes, the normal retry functionality we already have), we still make sure the message gets handled at some point, with a max retry counter to break the retry loop as well. See the sketch below.
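
For anyone trying to picture this outside of Messenger, here is a rough sketch of the ack-first-then-retry idea in plain ext-amqp terms. It is not Messenger code; handleMessage(), republishForRetry() and the pre-configured $queue are hypothetical placeholders for your own handling, retry mechanism and connection setup.

```php
<?php

// Hypothetical stand-ins for the application's own logic.
function handleMessage(\AMQPEnvelope $e): void { /* ... your handling ... */ }
function republishForRetry(\AMQPEnvelope $e): void { /* ... republish with a retry counter/delay ... */ }

// $queue is assumed to be an already configured \AMQPQueue.
$queue->consume(function (\AMQPEnvelope $amqpEnvelope, \AMQPQueue $queue): bool {
    if ($amqpEnvelope->isRedelivery()) {
        // Ack immediately so the broker stops redelivering this exact message,
        // which breaks a potential redelivery loop...
        $queue->ack($amqpEnvelope->getDeliveryTag());

        // ...then hand it to the normal retry mechanism (which has its own
        // max-retry counter) so it still gets handled eventually.
        republishForRetry($amqpEnvelope);

        return true; // keep consuming
    }

    handleMessage($amqpEnvelope); // normal path
    $queue->ack($amqpEnvelope->getDeliveryTag());

    return true;
});
```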

@weaverryan
Member

To keep this bumping, I think I see two actionable things:

A) On deserialization errors (MessageDecodingFailedException), send the message to the failure transport so those messages can be dealt with later.

B) Handle \AMQPEnvelope::isRedelivery(), which would be: if \AMQPEnvelope::isRedelivery(), then ack() immediately and then redeliver (using our normal redelivery mechanism).

Correct?

@Tobion
Contributor Author

Tobion commented Sep 9, 2019

We just had a different case where the message gets redelivered again and again, blocking the queue.
We have a message containing a SimpleXMLElement, and when it fails and should be sent to the failed transport, the PhpSerializer errors with Serialization of 'SimpleXMLElement' is not allowed.
The problem is that it happens in a listener (SendFailedMessageToFailureTransportListener) dispatched from

$this->dispatchEvent(new WorkerMessageFailedEvent($envelope, $transportName, $throwable, $shouldRetry));

So an exception there quits the worker and the message is neither acknowledged nor rejected.
It then gets redelivered by RabbitMQ and fails again and again...
Maybe we can put a try...finally around the worker event to make sure the message is rejected at the end.
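
Just to sketch the idea (this is not the actual Worker code, and the exact variables in scope there are an assumption):

```php
// Sketch of the try...finally idea inside the worker's failure handling.
// If a listener on WorkerMessageFailedEvent throws (e.g. the failure-transport
// listener failing to serialize the message), the message must still be
// rejected, otherwise it stays unacked and the broker redelivers it forever.
try {
    $this->dispatchEvent(new WorkerMessageFailedEvent($envelope, $transportName, $throwable, $shouldRetry));
} finally {
    if (!$shouldRetry) {
        $receiver->reject($envelope);
    }
}
```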

Tobion added a commit that referenced this issue Oct 23, 2019
…e is dropped (Tobion)

This PR was merged into the 4.3 branch.

Discussion
----------

Revert "[Messenger] Fix exception message of failed message is dropped

| Q             | A
| ------------- | ---
| Branch?       | 4.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       |
| License       | MIT
| Doc PR        |

This reverts #33600 because it makes the message grow on each retry until AMQP cannot handle it anymore. On each retry, the full exception trace is added to the message, so in our case, on the 5th retry, the message was too big for the AMQP library to encode. The AMQP extension then throws the exception

> Library error: table too large for buffer

(ref. alanxz/rabbitmq-c#224 and php-amqp/php-amqp#131) when trying to publish the message.

To solve this, I suggest reverting #33600 (this PR) and merging #32341 instead, which does not re-add the exception on each failure.

Btw, the above problem causes other problematic side effects in Symfony Messenger. As publishing the new retry message fails with an exception, the old (currently processed) message also does not get removed (acknowledged) from the delay queue. So RabbitMQ redelivers the message and the same thing happens forever. This can block the consumers and take a huge toll on your service. That's just another case of #32055 (comment). I'll try to fix this in another PR.

Commits
-------

3dbe924 Revert "[Messenger] Fix exception message of failed message is dropped on retry"
@Tobion
Contributor Author

Tobion commented Oct 24, 2019

I fixed problem B) in #34107

Feature A) (sending deserialization errors to the failure transport) is a nice-to-have but also not straightforward, because sending to the failure transport relies on \Symfony\Component\Messenger\Event\WorkerMessageFailedEvent, which requires an envelope. But when deserialization fails, you obviously have no envelope to use. So let's keep that separate; it doesn't have much priority for me.

Tobion added a commit that referenced this issue Oct 25, 2019
…queues (Tobion)

This PR was merged into the 4.3 branch.

Discussion
----------

[Messenger] prevent infinite redelivery loops and blocked queues

| Q             | A
| ------------- | ---
| Branch?       | 4.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #32055
| License       | MIT
| Doc PR        |

This PR solves a very common pitfall of AMQP redeliveries. It is explained, for example, in https://blog.forma-pro.com/rabbitmq-redelivery-pitfalls-440e0347f4e0
Newer RabbitMQ versions provide a solution for this themselves, but only for quorum queues and not classic ones; see rabbitmq/rabbitmq-server#1889

This PR adds a middleware that throws a RejectRedeliveredMessageException when it detects a message that has been redelivered by AMQP.

The middleware runs before the HandleMessageMiddleware and prevents redelivered messages from being handled directly. The thrown exception is caught by the worker and will trigger the retry logic according to the retry strategy.

AMQP redelivers messages when they do not get acknowledged or rejected. This can happen when the connection times out or an exception is thrown before acknowledging or rejecting. When such errors happen again while handling the redelivered message, the message would get redelivered again and again. The purpose of this middleware is to prevent infinite redelivery loops and to unblock the queue by republishing the redelivered messages as retries with a retry limit and potential delay.
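
Roughly, the idea behind the middleware looks like this (simplified sketch, not the merged code; reading the redelivery flag via AmqpReceivedStamp is an assumption about the 4.3 AMQP transport):

```php
<?php

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\RejectRedeliveredMessageException;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;
use Symfony\Component\Messenger\Transport\AmqpExt\AmqpReceivedStamp;

// Simplified sketch: if AMQP marked the incoming message as redelivered,
// stop before it reaches the handlers. The worker catches the exception,
// rejects the message and re-dispatches it through the normal retry
// strategy (with its retry limit and delay), which breaks the loop.
final class RejectRedeliveredMessageMiddleware implements MiddlewareInterface
{
    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        $stamp = $envelope->last(AmqpReceivedStamp::class);

        if ($stamp instanceof AmqpReceivedStamp && $stamp->getAmqpEnvelope()->isRedelivery()) {
            throw new RejectRedeliveredMessageException('Redelivered message from AMQP detected; rejecting it and retrying via the retry strategy.');
        }

        return $stack->next()->handle($envelope, $stack);
    }
}
```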

Commits
-------

d211904 [Messenger] prevent infinite redelivery loops and blocked queues
@Tobion Tobion closed this as completed Oct 25, 2019