
[Messenger] error in receiver results in message staying in queue forever #32055


Closed
Tobion opened this issue Jun 14, 2019 · 7 comments

@Tobion
Contributor

Tobion commented Jun 14, 2019

When an error happens in the worker before the message is dispatched on the bus (a deserialization error, for example), the message is neither acked nor nacked, the exception is not caught, and the worker quits.
The result is that the message stays at the front of the queue, so the next worker receives the same message again and likely fails again. This continues forever and no good messages can be consumed anymore.

A similar situation can happen when handling a message takes longer than the RabbitMQ connection heartbeat or timeout. Ref. #31707 and php-enqueue/enqueue-dev#658 (comment)

RabbitMQ puts the messages back into the queue with a Redelivered header. How we solved it in our apps using https://github.com/M6Web/AmqpBundle is to ack the message directly when it has the Redelivered header and then trigger a retry.

I think it's important that Symfony Messenger can handle those cases as well, using its retry logic and failed-message transport.

@Tobion Tobion changed the title [Messenger] error in receiver will make the message stay in the queue forever [Messenger] error in receiver results in message staying in queue forever Jun 14, 2019
@weaverryan
Member

weaverryan commented Jun 17, 2019

@Tobion I'm not sure if this fully explains/covers your situation, but a serializer is supposed to throw a MessageDecodingFailedException if deserialization specifically fails:

throw new MessageDecodingFailedException('Encoded envelope should have at least a "body" and some "headers".');

Then, receivers/transports are supposed to catch this and reject the message -

} catch (MessageDecodingFailedException $exception) {

This is something that didn't happen in 4.2, so it was fixed in 4.3. It does, however, rely on two "shoulds" that are mentioned on two interfaces (the serializer "should" throw that exception and the receiver "should" catch it and reject).

Is this your issue? Do you have a custom serializer or transport that's not following these "shoulds"?

Update: And we decided to "reject" instead of retry, as a deserialization error is not one that seems "temporary".
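
For reference, here is a minimal sketch of the serializer side of these "shoulds" (MyJsonSerializer and MyMessage are made-up names for illustration, not Symfony classes):

```php
<?php

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\MessageDecodingFailedException;
use Symfony\Component\Messenger\Transport\Serialization\SerializerInterface;

// Hypothetical custom serializer: the important part is that *any* decoding
// failure surfaces as MessageDecodingFailedException, so the receiver can
// reject the message instead of the worker crashing and the message staying
// at the front of the queue.
final class MyJsonSerializer implements SerializerInterface
{
    public function decode(array $encodedEnvelope): Envelope
    {
        if (empty($encodedEnvelope['body'])) {
            throw new MessageDecodingFailedException('Encoded envelope should have a "body".');
        }

        try {
            $data = json_decode($encodedEnvelope['body'], true, 512, JSON_THROW_ON_ERROR);
            // MyMessage is a placeholder for your own message class.
            $message = new MyMessage($data['id'] ?? null);
        } catch (\Throwable $e) {
            throw new MessageDecodingFailedException($e->getMessage(), 0, $e);
        }

        return new Envelope($message);
    }

    public function encode(Envelope $envelope): array
    {
        /** @var MyMessage $message */
        $message = $envelope->getMessage();

        return [
            'body' => json_encode(['id' => $message->id], JSON_THROW_ON_ERROR),
            'headers' => [],
        ];
    }
}
```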

@Tobion
Contributor Author

Tobion commented Jun 19, 2019

@weaverryan you are right. Our custom serializer didn't throw a MessageDecodingFailedException. Using that, errors in deserialization are handled correctly. But there is still room for improvement:

  1. Serialization errors could use the failed transport to keep track of them.
  2. We should still handle \AMQPEnvelope::isRedelivery as explained above. There can be a lot of reasons for this to happen, e.g. a lost connection, processing taking too long, or, as in my case, serialization errors thrown without using MessageDecodingFailedException. So this can still result in blocked queues and redelivery loops.

https://www.rabbitmq.com/confirms.html

This means that if all consumers requeue because they cannot process a delivery due to a transient condition, they will create a requeue/redelivery loop. Such loops can be costly in terms of network bandwidth and CPU resources. Consumer implementations can track the number of redeliveries and reject messages for good (discard them) or schedule requeueing after a delay.

So we should also handle retry for those.

  3. Why do serialization errors (any error outside the bus) result in worker termination, but exceptions in handlers don't? I think it's not a problem because you need something like supervisor anyway, but it feels arbitrary.

@weaverryan
Member

OK, let's break this down and see what actionable things we can do.

  1. Serialization errors could use the failed transport to keep track of them

That seems reasonable... we would basically "fail" in the same way as an exception from a handler does. So, by default, it would fail 3 times, then go to the failure queue. This would also solve item (3), I believe: if exceptions from deserialization are handled the same way as exceptions from handlers, then the worker would not exit in either situation.

  2. We should still handle \AMQPEnvelope::isRedelivery as explained above

I'd appreciate a separate issue or PR on this... as I'm still far from a RabbitMQ expert. I don't quite understand the flow/problem:

A) Handler takes longer than the RabbitMQ connection's heartbeat or timeout
B) RabbitMQ puts the message back in the queue with a Redelivered header
C) Our app sees the message with AMQPEnvelope::isRedelivery() and so we ack immediately and trigger a retry?
D) Our app handles the retried message

I'm fuzzy about a few things:

  • In (A), if the handler took really long... does it mean it wasn't handled? Or did our app actually handle it and so we should not handle the redelivered one?
  • In (C), why do we ack and then trigger a retry? By "trigger a retry" do you mean use the same redelivery/retry functionality we currently have?

Cheers!

@Tobion
Contributor Author

Tobion commented Jun 20, 2019

Yes, the flow is described correctly. Trying to answer your questions:

  • (A) That's a tough question. We don't really know if it was handled. It could have been handled and only the ack to RabbitMQ didn't go through. Or it was handled partly, then failed, and then the nack didn't work either. This undecidable problem also exists in the retry: we retry the full message handling, but maybe only part of it failed and other parts/handlers actually succeeded, so some handlers will be executed twice. This is why you usually want to implement message queues with idempotence in mind. This is a general problem and not what I'm trying to solve here.
  • (C) We ack first to break a potential redelivery loop. Say a handler always takes too much time and loses the connection, so the ack doesn't work. Then the message gets redelivered every time and fails to ack every time. By acknowledging first (or using auto-ack), we break the loop. By then retrying (yes, the normal retry functionality we already have), we still make sure the message gets handled at some point, with a max retry counter to break the retry loop as well. See the sketch below.
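
For anyone trying to picture this outside of Messenger, here is a rough sketch of the ack-first-then-retry idea in plain ext-amqp terms. It is not Messenger code; handleMessage(), republishForRetry() and the pre-configured $queue are hypothetical placeholders for your own handling, retry mechanism and connection setup.

```php
<?php

// Hypothetical stand-ins for the application's own logic.
function handleMessage(\AMQPEnvelope $e): void { /* ... your handling ... */ }
function republishForRetry(\AMQPEnvelope $e): void { /* ... republish with a retry counter/delay ... */ }

// $queue is assumed to be an already configured \AMQPQueue.
$queue->consume(function (\AMQPEnvelope $amqpEnvelope, \AMQPQueue $queue): bool {
    if ($amqpEnvelope->isRedelivery()) {
        // Ack immediately so the broker stops redelivering this exact message,
        // which breaks a potential redelivery loop...
        $queue->ack($amqpEnvelope->getDeliveryTag());

        // ...then hand it to the normal retry mechanism (which has its own
        // max-retry counter) so it still gets handled eventually.
        republishForRetry($amqpEnvelope);

        return true; // keep consuming
    }

    handleMessage($amqpEnvelope); // normal path
    $queue->ack($amqpEnvelope->getDeliveryTag());

    return true;
});
```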

@weaverryan
Member

To keep this bumping, I think I see two actionable things:

A) On deserialization errors (MessageDecodingFailedException), send the message to the failure transport so those messages can be dealt with later.

B) Handle \AMQPEnvelope::isRedelivery(), which would be: if \AMQPEnvelope::isRedelivery(), then ack() immediately and then redeliver (using our normal redelivery mechanism).

Correct?

@Tobion
Contributor Author

Tobion commented Sep 9, 2019

We just had a different case where the message gets redelivered again and again, blocking the queue.
We have a message containing a SimpleXMLElement, and when it fails and should be sent to the failed transport, the PhpSerializer errors with Serialization of 'SimpleXMLElement' is not allowed.
The problem is that it happens in a listener (SendFailedMessageToFailureTransportListener) dispatched from

$this->dispatchEvent(new WorkerMessageFailedEvent($envelope, $transportName, $throwable, $shouldRetry));

So an exception there quits the worker and the message is neither acknowledged nor rejected.
It then gets redelivered by RabbitMQ and fails again and again...
Maybe we can put a try...finally around the worker event to make sure the message is rejected at the end.
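
Just to sketch the idea (this is not the actual Worker code, and the exact variables in scope there are an assumption):

```php
// Sketch of the try...finally idea inside the worker's failure handling.
// If a listener on WorkerMessageFailedEvent throws (e.g. the failure-transport
// listener failing to serialize the message), the message must still be
// rejected, otherwise it stays unacked and the broker redelivers it forever.
try {
    $this->dispatchEvent(new WorkerMessageFailedEvent($envelope, $transportName, $throwable, $shouldRetry));
} finally {
    if (!$shouldRetry) {
        $receiver->reject($envelope);
    }
}
```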

Tobion added a commit that referenced this issue Oct 23, 2019
…e is dropped (Tobion)

This PR was merged into the 4.3 branch.

Discussion
----------

Revert "[Messenger] Fix exception message of failed message is dropped

| Q             | A
| ------------- | ---
| Branch?       | 4.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       |
| License       | MIT
| Doc PR        |

This reverts #33600 because it makes the message grow on each retry until AMQP cannot handle it anymore. On each retry, the full exception trace is added to the message, so in our case, on the 5th retry, the message was too big for the AMQP library to encode. The AMQP extension then throws the exception

> Library error: table too large for buffer

(ref. alanxz/rabbitmq-c#224 and php-amqp/php-amqp#131) when trying to publish the message.

To solve this, I suggest reverting #33600 (this PR) and merging #32341 instead, which does not re-add the exception on each failure.

Btw, the above problem causes other problematic side effects in Symfony Messenger. As publishing the new retry message fails with an exception, the old (currently processed) message also does not get removed (acknowledged) from the delay queue. So RabbitMQ redelivers the message and the same thing happens forever. This can block the consumers and take a huge toll on your service. That's just another case of #32055 (comment). I'll try to fix this in another PR.

Commits
-------

3dbe924 Revert "[Messenger] Fix exception message of failed message is dropped on retry"
@Tobion
Contributor Author

Tobion commented Oct 24, 2019

I fixed problem B) in #34107

Feature A) (sending deserialization errors to the failure transport) is a nice-to-have but also not straightforward, because sending to the failure transport relies on \Symfony\Component\Messenger\Event\WorkerMessageFailedEvent, which requires an envelope. But when deserialization fails, you obviously have no envelope to use. So let's keep that separate; it doesn't have much priority for me.

Tobion added a commit that referenced this issue Oct 25, 2019
…queues (Tobion)

This PR was merged into the 4.3 branch.

Discussion
----------

[Messenger] prevent infinite redelivery loops and blocked queues

| Q             | A
| ------------- | ---
| Branch?       | 4.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #32055
| License       | MIT
| Doc PR        |

This PR solves a very common pitfall of AMQP redeliveries. It is explained, for example, in https://blog.forma-pro.com/rabbitmq-redelivery-pitfalls-440e0347f4e0
Newer RabbitMQ versions provide a solution for this themselves, but only for quorum queues and not classic ones; see rabbitmq/rabbitmq-server#1889

This PR adds a middleware that throws a RejectRedeliveredMessageException when it detects a message that has been redelivered by AMQP.

The middleware runs before the HandleMessageMiddleware and prevents redelivered messages from being handled directly. The thrown exception is caught by the worker and will trigger the retry logic according to the retry strategy.

AMQP redelivers messages when they do not get acknowledged or rejected. This can happen when the connection times out or an exception is thrown before acknowledging or rejecting. When such errors happen again while handling the redelivered message, the message would get redelivered again and again. The purpose of this middleware is to prevent infinite redelivery loops and to unblock the queue by republishing the redelivered messages as retries with a retry limit and potential delay.
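
Roughly, the idea behind the middleware looks like this (simplified sketch, not the merged code; reading the redelivery flag via AmqpReceivedStamp is an assumption about the 4.3 AMQP transport):

```php
<?php

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\RejectRedeliveredMessageException;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;
use Symfony\Component\Messenger\Transport\AmqpExt\AmqpReceivedStamp;

// Simplified sketch: if AMQP marked the incoming message as redelivered,
// stop before it reaches the handlers. The worker catches the exception,
// rejects the message and re-dispatches it through the normal retry
// strategy (with its retry limit and delay), which breaks the loop.
final class RejectRedeliveredMessageMiddleware implements MiddlewareInterface
{
    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        $stamp = $envelope->last(AmqpReceivedStamp::class);

        if ($stamp instanceof AmqpReceivedStamp && $stamp->getAmqpEnvelope()->isRedelivery()) {
            throw new RejectRedeliveredMessageException('Redelivered message from AMQP detected; rejecting it and retrying via the retry strategy.');
        }

        return $stack->next()->handle($envelope, $stack);
    }
}
```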

Commits
-------

d211904 [Messenger] prevent infinite redelivery loops and blocked queues
@Tobion Tobion closed this as completed Oct 25, 2019