[Messenger] Could not acknowledge redis message with BatchHandlerInterface #44400

zip-fa opened this issue Dec 1, 2021 · 24 comments

zip-fa commented Dec 1, 2021

Symfony version(s) affected

5.4.0, 6.0.0

How to reproduce

I have a Messenger handler that makes HTTP requests (with only one consumer running):

messenger.yaml:

transports:
  test:
    dsn: '%env(MESSENGER_TRANSPORT_DSN)%/test'
    options:
      delete_after_ack: true
      consumer: '%env(MESSENGER_CONSUMER_NAME)%'
    retry_strategy:
      max_retries: 0

routing:
  'TestMessage': test

TestMessageHandler.php:

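<?php

use Symfony\Component\Messenger\Handler\Acknowledger;
use Symfony\Component\Messenger\Handler\BatchHandlerInterface;
use Symfony\Component\Messenger\Handler\BatchHandlerTrait;
use Symfony\Component\Messenger\Handler\MessageHandlerInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;

class TestMessageHandler implements MessageHandlerInterface, BatchHandlerInterface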
{
    use BatchHandlerTrait;

    public function __construct(
        private HttpClientInterface $client
    ) { }

    public function __invoke(TestMessage $message, Acknowledger &$ack = null)
    {        
        return $this->handle($message, $ack);
    }
    
    private function shouldFlush(): bool
    {
        return 5 <= \count($this->jobs);
    }

    private function process(array $jobs): void
    {
        $responses = [];
        
        foreach ($jobs as [$job, $ack]) {          
            try {
                [$headers, $content] = prepareRequest();
                
                $responses[] = $this->client->request('POST', $job->getEndpoint(), [
                    'headers' => $headers,
                    'body' => $content
                ]);

                $ack->ack($job);
            } catch (\Exception $e) {
                $ack->nack($e);
            }
        }
        
        if(0 === count($responses)) {
          return;
        }

        foreach ($this->client->stream($responses) as $response => $chunk) {
            if ($chunk->isFirst()) {
                var_dump($response->getStatusCode());
            } else if ($chunk->isLast()) {
            }
        }
    }
}

cli:

MESSENGER_CONSUMER_NAME=test php bin/console messenger:consume test

The code works fine, but when I hit Ctrl+C before the script has finished and then start it again, it throws an error:

In Connection.php line 441:

  Could not acknowledge redis message "1638374074049-0".

I noticed that after the consumer crashes or I hit Ctrl+C, the same message is passed to __invoke several times. After removing BatchHandlerInterface, it works as expected without any issue.

Possible Solution 1 (a bad one)

  1. Add a UUID to every message
  2. Destroy the Acknowledger and create a replacement for it (HandleMessageMiddleware.php:88)

Test.php:

class Test
{
  public function isAcknowledged()
  {
    return false;
  }
}

TestMessageHandler.php:

public function __invoke(SendPushMessage $message, Acknowledger &$ack = null)
{
    $uid = $message->getUuid();
    
    if(isset($this->jobs[$uid])) {
      $this->flush(true);
      
      try {
        $ack = null;
        unset($ack);
      } catch(\Exception) { }
      
      $ack = new Test();

      return 0;
    }
    
    return $this->handle($message, $ack);
}

private function handle(object $message, ?Acknowledger $ack)
{
    $uid = $message->getUuid();

    if (null === $ack) {
        $ack = new Acknowledger(get_debug_type($this));
        $this->jobs[$uid] = [$message, $ack];
        $this->flush(true);

        return $ack->getResult();
    }

    $this->jobs[$uid] = [$message, $ack];
    if (!$this->shouldFlush()) {
        return \count($this->jobs);
    }

    $this->flush(true);

    return 0;
}

Possible Solution 2

Comment out __destruct in Symfony\Component\Messenger\Handler\Acknowledger.php

Handler.php:

public function __invoke(SendPushMessage $message, Acknowledger $ack = null)
{
    $uid = $message->getUuid();
    
    if(isset($this->jobs[$uid])) {
      $this->flush(true);
    
      return 0;
    }
    
    return $this->handle($message, $ack);
}

private function handle(object $message, ?Acknowledger $ack)
{
    $uid = $message->getUuid();

    if (null === $ack) {
        $ack = new Acknowledger(get_debug_type($this));
        $this->jobs[$uid] = [$message, $ack];
        $this->flush(true);

        return $ack->getResult();
    }

    $this->jobs[$uid] = [$message, $ack];
    if (!$this->shouldFlush()) {
        return \count($this->jobs);
    }

    $this->flush(true);

    return 0;
}

Additional Context

No response

@zip-fa zip-fa added the Bug label Dec 1, 2021
@zip-fa zip-fa changed the title from "Could not delete message from the redis stream with BatchHandlerInterface" to "Could not acknowledge redis message with BatchHandlerInterface" Dec 1, 2021
@zip-fa zip-fa changed the title from "Could not acknowledge redis message with BatchHandlerInterface" to "[Messenger] Could not acknowledge redis message with BatchHandlerInterface" Dec 1, 2021
@rvanlaak
Contributor

rvanlaak commented Mar 15, 2022

Same here. Could this be related to what the documentation says about MESSENGER_CONSUMER_NAME when using Redis as the transport in combination with Supervisor?

In our infra we have Supervisor run 4 instances of the bin/console messenger:consume command.

@zip-fa
Author

zip-fa commented Mar 18, 2022

Any news @xabbuh?

@rvanlaak
Contributor

@zip-fa, since I configured the Supervisor application so that each process registers with Redis on a unique DSN, this issue seems to be resolved for me.

After translating this Russian blog post by Алексей Альшенецкий, I arrived at the following configuration:

# messenger.yaml
parameters:
    env(CONSUMER_ID): '0'

framework:
    messenger:
        transports:
            async: "redis://%env(REDIS_HOST)%:%env(REDIS_PORT)%/messages/symfony/consumer-%env(CONSUMER_ID)%?auto_setup=true"

# supervisor.conf
[program:messenger-consume]
command=php /var/www/html/bin/console messenger:consume async
numprocs=10
autostart=true
autorestart=true
environment = CONSUMER_ID=%(process_num)d

Relevant details

  • Symfony Messenger works with Redis through a dedicated data type introduced in Redis 5.0: streams.
  • All consumers that run in parallel must have a unique name so that they do not conflict while reading messages.
  • In the example above, symfony is the name of the consumer group. The auto_setup option allows it to be created automatically.
  • Within this group, the consumer name is generated dynamically from the CONSUMER_ID environment variable, so that each consumer gets a unique name (consumer-0, consumer-1, consumer-2, etc.).
  • The default value for CONSUMER_ID declared under parameters keeps the application from throwing an error while parsing messenger.yaml when the same application image is used outside of the messenger:consume process.
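
To make the DSN layout described in these bullets concrete, here is a minimal PHP sketch that assembles the same per-process DSN as the YAML above. The function name buildConsumerDsn and its parameters are hypothetical and exist only for illustration; the redis://host:port/stream/group/consumer layout and the auto_setup option are taken from the configuration above.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony): builds a Redis Messenger DSN
 * of the form redis://host:port/stream/group/consumer-N?auto_setup=true.
 */
function buildConsumerDsn(string $host, int $port, string $stream, string $group, int $consumerId): string
{
    return sprintf(
        'redis://%s:%d/%s/%s/consumer-%d?auto_setup=true',
        $host,
        $port,
        rawurlencode($stream),
        rawurlencode($group),
        $consumerId
    );
}

// Each worker process gets its own consumer name within the same group.
foreach ([0, 1, 2] as $consumerId) {
    echo buildConsumerDsn('localhost', 6379, 'messages', 'symfony', $consumerId), PHP_EOL;
}
```

The stream and group stay the same for every process; only the last path segment changes, which is what keeps parallel consumers from sharing a name.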

@dkarlovi
Contributor

We're also seeing this issue with only two workers on Azure managed Redis.

This is the current config:

framework:
    messenger:
        buses:
            messenger.bus.pimcore-core:
                middleware:
                    - doctrine_ping_connection
                    - doctrine_close_connection
                    - doctrine_transaction
        transports:
            pimcore_core:
                dsn: '%env(file:MESSENGER_DSN_FILE)%'
                options:
                    delete_after_ack: true
                    stream: '%env(MESSENGER_GROUP)%'
                    consumer: '%env(HOSTNAME)%'
            pimcore_maintenance:
                dsn: '%env(file:MESSENGER_DSN_FILE)%'
                options:
                    delete_after_ack: true
                    stream: '%env(MESSENGER_GROUP)%'
                    consumer: '%env(HOSTNAME)%'
            pimcore_image_optimize:
                dsn: '%env(file:MESSENGER_DSN_FILE)%'
                options:
                    delete_after_ack: true
                    stream: '%env(MESSENGER_GROUP)%'
                    consumer: '%env(HOSTNAME)%'

  1. The DSN looks like rediss://password@host?dbindex=0; it doesn't define a stream, group or consumer
  2. MESSENGER_GROUP is the current Kubernetes namespace (we share the same Redis server for multiple instances of the app, it's a review app deployment sort of thing), so preview-branch, preview-other-branch, etc.
  3. HOSTNAME is the container hostname; we're using the K8s StatefulSet object, so the hostnames are fixed and look like worker-0, worker-1, etc.

It mostly works, but then something happens and the messages consistently cannot be acknowledged, breaking workers for a while.

@nicolas-grekas
Member

A small reproducing app might help here.

@dkarlovi
Contributor

@nicolas-grekas I expect this to be very hard to reproduce since it might be context specific (in our case, Azure managed Redis). If I were able to reproduce it, I'd be able to fix it. :)

I could try to fully debug it if you point me to how this is supposed to work. In which scenarios would Redis be unable to acknowledge the message? If it was already acknowledged, for example?

@ohaag

ohaag commented Sep 29, 2022

I have the same issue with a Redis host on K8s and locally.
In my case, the handler pulls a .html.tar.gz file from AWS S3 and extracts it locally.
It can happen within fewer than 500 messages (--limit=500 and batchsize=50).

As you said, after the failure, whatever the batch size is, that same message is passed to __invoke every time.
When that happens, temporarily setting batch_size to 1 seems to work around it and unblock the queue.

If nobody else does it sooner, I'll try to find some time to create a small app to reproduce it.

FYI: I also asked about it on Slack here (exact same issue).
A simple debug after an error like this:

public function __invoke(PushMessage $message, Acknowledger $ack = null)
{
    dump('__invoke'.$message->key);
    return $this->handle($message, $ack);
}

private function process(array $jobs): void
{
    foreach ($jobs as [$message, $ack]) {
        dump('process'.$message->key);
    }
    die;
}

All dumps display the exact same $message->key, which is supposed to be unique.

I'm not 100% sure, but even with a dummy consumer like this, without the "Could not acknowledge" error, you would get "The acknowledger was not called", which may reproduce the strange behaviour (the same message multiple times) during the next try. (You may not even need the die; just don't call the ack.)

@pan85

pan85 commented Sep 30, 2022

In my experience, the problem is that the message has already been consumed. So you can try something like this:
create a service parameter with a custom function that returns a random (unique) value.
service.yaml:
parameters:
    random_consumer: '%env(random_consumer:HOSTNAME)%'
Then in messenger.yaml do something like:
transports:
    # https://symfony.com/doc/current/messenger.html#transport-configuration
    async:
        dsn: '%env(REDIS_CONNECTION_STRING)%/messages'
        options:
            delete_after_ack: true
            auth: '%env(urldecode:MESSENGER_REDIS_PASSWD)%'
            consumer: '%random_consumer%'
            auto_setup: true

@zip-fa
Author

zip-fa commented Sep 30, 2022

In my experience, the problem is that the message has already been consumed. So you can try something like this:
create a service parameter with a custom function that returns a random (unique) value.
parameters:
    random_consumer: '%env(random_consumer:HOSTNAME)%'

I have the same config - read my first message

@AdamKatzDev
Contributor

My colleague and I have managed to fix the problem described here. It is actually not a single bug but a whole pile of them:

  • The Messenger worker does not handle SIGINT properly, which leaves the messages in the batch unprocessed. This can be fixed by implementing a pcntl_signal handler. SIGTERM is handled properly in StopWorkerOnSigtermSignalListener.
  • Even if SIGINT is processed correctly and the worker is stopped gracefully, there is a bug that prevents messages from being flushed. When the worker flushes a handler but shouldFlush is false, then on shutdown the worker fails to flush the remaining messages because of the faulty guard clause at the beginning of the flush method:
    private function flush(bool $force): bool
    {
        $unacks = $this->unacks;
        if (!$unacks->count()) {
            return false;
        }
    This can be fixed by implementing an event listener that force-flushes all batch handlers on worker stop.
  • In 'claim' mode, the Redis transport falls into a loop that returns the same message again and again. @zip-fa's solution can be used to fix that. Instead of commenting out the destructor in the Acknowledger class, we wrote a Redis connection decorator that just silences the Acknowledger errors. Setting claim_interval higher than the worker time_limit also helps to reduce such occurrences (but does not eliminate them entirely; failed messages still have to be processed somehow).
  • There is another nasty issue that causes workers to stop working: "[Messenger][Redis] Worker stops handling messages on first empty message" (#48166). Do not use stream_max_entries; it will cause problems until the bug is fixed.
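
The first bullet mentions fixing SIGINT handling with a pcntl_signal handler. A minimal sketch of that idea follows, assuming the pcntl and posix extensions are available. StoppableLoop is a hypothetical stand-in for the worker, not Symfony's Worker class, and the sketch says nothing about how the real worker should flush its batch afterwards.

```php
<?php

// Hypothetical stand-in for a worker loop; not Symfony's Worker class.
final class StoppableLoop
{
    private bool $shouldStop = false;

    public function stop(): void
    {
        $this->shouldStop = true;
    }

    public function shouldStop(): bool
    {
        return $this->shouldStop;
    }
}

$loop = new StoppableLoop();

// Deliver signals asynchronously, without needing declare(ticks=1).
pcntl_async_signals(true);

// Treat SIGINT (Ctrl+C) as a request for a graceful stop instead of
// letting the process die in the middle of a batch.
pcntl_signal(SIGINT, static function () use ($loop): void {
    $loop->stop();
});

// Simulate Ctrl+C by sending SIGINT to the current process.
posix_kill(posix_getpid(), SIGINT);

// Make sure any pending signal handler has run before checking the flag.
pcntl_signal_dispatch();

echo $loop->shouldStop() ? 'stop requested' : 'still running', PHP_EOL;
```

A real loop would check shouldStop() between messages and flush whatever is still buffered before exiting.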

@lermontex

@AdamKatzDev, maybe you can share a solution (Redis connection decorator)?

@AdamKatzDev
Contributor

@lermontex, sure.

To anyone who tries to use this crutch: it is only part of the full solution. You also need to add a UUID to your messages to remove duplicates. The decorator makes it possible to ack and delete messages that no longer exist in the Redis stream (and you will have to do this, because the Redis transport will try to feed your worker duplicate messages, i.e. messages that the worker has collected but not yet processed). Also take a look at my previous comment to check for other issues that might affect you.

src/Messenger/Bridge/Redis/RedisTransportFactoryDecorator.php

<?php

declare(strict_types=1);

namespace App\Messenger\Bridge\Redis;

use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Bridge\Redis\Transport\Connection;
use Symfony\Component\Messenger\Bridge\Redis\Transport\RedisTransport;
use Symfony\Component\Messenger\Transport\Serialization\SerializerInterface;
use Symfony\Component\Messenger\Transport\TransportFactoryInterface;
use Symfony\Component\Messenger\Transport\TransportInterface;

class RedisTransportFactoryDecorator implements TransportFactoryInterface
{
    private TransportFactoryInterface $decorated;
    private LoggerInterface $logger;

    public function __construct(TransportFactoryInterface $decorated, LoggerInterface $logger)
    {
        $this->decorated = $decorated;
        $this->logger    = $logger;
    }

    public function createTransport(string $dsn, array $options, SerializerInterface $serializer): TransportInterface
    {
        unset($options['transport_name']);

        return new RedisTransportDecorator(
            new RedisTransport(Connection::fromDsn($dsn, $options), $serializer),
            $this->logger
        );
    }

    public function supports(string $dsn, array $options): bool
    {
        return $this->decorated->supports($dsn, $options);
    }
}

src/Messenger/Bridge/Redis/RedisTransportDecorator.php

<?php

declare(strict_types=1);

namespace App\Messenger\Bridge\Redis;

use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\TransportException;
use Symfony\Component\Messenger\Transport\TransportInterface;

class RedisTransportDecorator implements TransportInterface
{
    private TransportInterface $decorated;
    private LoggerInterface $logger;

    public function __construct(TransportInterface $decorated, LoggerInterface $logger)
    {
        $this->decorated  = $decorated;
        $this->logger     = $logger;
    }

    public function get(): iterable
    {
        return $this->decorated->get();
    }

    public function ack(Envelope $envelope): void
    {
        try {
            $this->decorated->ack($envelope);
        } catch (TransportException $exception) {
            if (strpos($exception->getMessage(), 'Could not acknowledge') === false) {
                throw $exception;
            } else {
                $this->logger->error($exception->getMessage(), [
                    'exception' => $exception
                ]);
            }
        }
    }

    public function reject(Envelope $envelope): void
    {
        try {
            $this->decorated->reject($envelope);
        } catch (TransportException $exception) {
            if (strpos($exception->getMessage(), 'Could not delete message') === false) {
                throw $exception;
            } else {
                $this->logger->error($exception->getMessage(), [
                    'exception' => $exception
                ]);
            }
        }
    }

    public function send(Envelope $envelope): Envelope
    {
        return $this->decorated->send($envelope);
    }
}

config/services.yaml

    App\Messenger\Bridge\Redis\RedisTransportFactoryDecorator:
        decorates: 'messenger.transport.redis.factory'

@lermontex

lermontex commented Jan 14, 2023

@AdamKatzDev, Thanks!

I tried it, but unfortunately, it didn't work for me.

you need to add UUID to your messages to remove duplicates.

Tell me, please, what do you mean? Is this mandatory if I don't send identical messages to the queue?

@AdamKatzDev
Contributor

AdamKatzDev commented Jan 16, 2023

@lermontex
You have to add a UUID to every message to be able to use the Redis transport with batch handlers; take a look at @zip-fa's solution for the implementation details.
Basically, the Redis transport was not made with batch handlers in mind, which is why these ugly crutches are needed to make it work properly. At some point the transport tries to feed your handlers the same message that was already handled but not yet acked.
When that happens you can do one of the following:

  • Flush the current batch as soon as your handler starts receiving duplicates (you need a UUID, or some other means, to tell that a message is a duplicate).
  • Ack the duplicate messages until the batch is ready to be processed. This, however, causes the message to be deleted from the stream before the task is done. Again, you need a UUID or some other way to recognize that you are getting the same messages.
  • Just collect duplicates until the batch is full, then dedupe the messages before processing them (and you might want to have UUIDs for that).

After that you need to ack every duplicate message, or you'll get the 'The acknowledger was not called by the "%s" batch handler.' error. But if you try to do this, your worker will crash with a "Could not acknowledge"/"Could not delete" error because the message was already deleted by another Acknowledger. To ignore these errors you need the Redis transport decorator that I've provided.
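
The third option (collect, then dedupe before processing) can be sketched in plain PHP as follows. This is an illustration under assumptions, not code from the thread: dedupeJobs and DemoMessage are hypothetical names, getUuid() is the accessor used in @zip-fa's examples, and stdClass objects stand in for real Acknowledger instances.

```php
<?php

/**
 * Hypothetical helper: splits a batch of [message, ack] pairs into unique
 * jobs (keyed by UUID) and redelivered duplicates.
 *
 * @param list<array{0: object, 1: object}> $jobs
 * @return array{0: array<string, array{0: object, 1: object}>, 1: list<array{0: object, 1: object}>}
 */
function dedupeJobs(array $jobs): array
{
    $unique = [];
    $duplicates = [];

    foreach ($jobs as [$message, $ack]) {
        $uuid = $message->getUuid();

        if (isset($unique[$uuid])) {
            // Same UUID seen before: keep the pair so its ack can still be called.
            $duplicates[] = [$message, $ack];
            continue;
        }

        $unique[$uuid] = [$message, $ack];
    }

    return [$unique, $duplicates];
}

// Demo message type, for illustration only.
final class DemoMessage
{
    public function __construct(private string $uuid)
    {
    }

    public function getUuid(): string
    {
        return $this->uuid;
    }
}

$jobs = [
    [new DemoMessage('a'), new stdClass()],
    [new DemoMessage('b'), new stdClass()],
    [new DemoMessage('a'), new stdClass()], // redelivered duplicate
];

[$unique, $duplicates] = dedupeJobs($jobs);

printf("%d unique, %d duplicate\n", count($unique), count($duplicates));
```

Only the unique jobs would be processed; the duplicates are returned separately because, as described above, each of them still has to be acked.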

@lermontex

@AdamKatzDev, thanks for your detailed answer!

@AdamKatzDev
Contributor

AdamKatzDev commented Jan 20, 2023

While working on PR #49028, I realized that I can't fully fix that issue for batch handlers without fixing this one first.
One possible solution is to store the IDs of the messages that the worker currently holds inside the Connection class.

The algorithm is as follows:

  1. When a message is fetched, its ID is stored inside Connection.
  2. When Connection tries to fetch a pending message:
    2.1. Ask XPENDING to return count(IDs) + constant entries. It makes sense to hard-cap the number of messages at some arbitrary value, such as 100, to decrease the load on Redis.
    2.2. If there is a suitable message (forgotten AND not in the IDs store), claim it if needed and fetch it with XRANGE.
    2.3. If not, call XREADGROUP >.
  3. When a message is acked, it is removed from the IDs store in Connection.

If (big if) it is guaranteed that a message is always acked or nacked, or that the worker dies otherwise, then this should work. But introducing state into Connection will complicate things a lot and could cause memory leaks or some weird behavior if badly implemented.
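
The bookkeeping part of this algorithm can be sketched without touching Redis. In the sketch below, InFlightIds is a hypothetical class, the $pending array stands in for an XPENDING result (message ID mapped to idle time in milliseconds), and "forgotten" is interpreted as "idle for at least a given threshold", which is an assumption.

```php
<?php

/**
 * Hypothetical in-memory store for the IDs a worker currently holds.
 * It does not talk to Redis.
 */
final class InFlightIds
{
    /** @var array<string, true> */
    private array $ids = [];

    // Step 1: remember the ID of every fetched message.
    public function add(string $id): void
    {
        $this->ids[$id] = true;
    }

    // Step 3: forget the ID once the message is acked.
    public function remove(string $id): void
    {
        unset($this->ids[$id]);
    }

    public function count(): int
    {
        return count($this->ids);
    }

    /**
     * Step 2.2: return the first pending ID that has been idle long enough
     * and is not already held by this worker, or null (step 2.3: the caller
     * would then fall back to XREADGROUP >).
     *
     * @param array<string, int> $pending message ID => idle time in ms
     */
    public function firstClaimable(array $pending, int $minIdleMs): ?string
    {
        foreach ($pending as $id => $idleMs) {
            if ($idleMs >= $minIdleMs && !isset($this->ids[$id])) {
                return (string) $id;
            }
        }

        return null;
    }
}

$store = new InFlightIds();
$store->add('1638374074049-0'); // already sitting in the current batch

$pending = [
    '1638374074049-0' => 120000, // idle, but held by this worker: skip
    '1638374074050-0' => 500,    // not idle long enough: skip
    '1638374074051-0' => 90000,  // idle and not held: claimable
];

$next = $store->firstClaimable($pending, 60000);
echo $next ?? 'nothing to claim', PHP_EOL;
```

The point of the store is the second condition in firstClaimable: without it, the message already buffered in the batch would be handed to the handler again.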

@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@AdamKatzDev
Contributor

So, we've built our own transport instead. We've made some opinionated changes there to make it work as we needed, so it is not a perfect replacement for the Symfony implementation.

@carsonbot carsonbot removed the Stalled label Aug 7, 2023
@dkarlovi
Contributor

dkarlovi commented Dec 29, 2023

To follow up here, this comment #49028 (comment) says

Redis 6.2.0 introduced IDLE parameter to XPENDING and added XAUTOCLAIM command, it looks like these can help to improve the algorithm significantly, fix the issue that I've tried to tackle here and make the transport usable for batch handlers by also partially fixing #44400.

@AdamKatzDev can you outline the rough idea(s) you had with these new features helping? How would you use the new commands and could this be upstreamed into Symfony? Thanks.

Edit: looking at your list of issues here: #44400 (comment) it seems some of them are fixed, some are probably still pending. Could we review it and create the missing issues which describe what we're trying to achieve and what steps need to be taken, how and why? IMO Redis is a popular choice for the Messenger and it makes sense we improve it if possible.

@AdamKatzDev
Contributor

AdamKatzDev commented Jan 3, 2024

@dkarlovi

The Messenger worker does not handle SIGINT properly, which leaves the messages in the batch unprocessed. This can be fixed by implementing a pcntl_signal handler. SIGTERM is handled properly in StopWorkerOnSigtermSignalListener.

Fixed in #49539

Even if SIGINT is processed correctly and the worker is stopped gracefully, there is a bug that prevents messages from being flushed. When the worker flushes a handler but shouldFlush is false, then on shutdown the worker fails to flush the remaining messages because of the faulty guard clause at the beginning of the flush method.

Not fixed (#49026), but there is a crutch: #46869 (reply in thread)

In 'claim' mode, the Redis transport falls into a loop that returns the same message again and again. @zip-fa's solution can be used to fix that. Instead of commenting out the destructor in the Acknowledger class, we wrote a Redis connection decorator that just silences the Acknowledger errors. Setting claim_interval higher than the worker time_limit also helps to reduce such occurrences (but does not eliminate them entirely; failed messages still have to be processed somehow).

We currently use XAUTOCLAIM to claim messages instead of the XPENDING and XCLAIM combo. That simplifies the code a lot.
Fetching pending messages is implemented using a WeakMap: when the WeakMap is empty, i.e. the worker currently holds no messages, we just iterate over all of the consumer's pending messages using XREADGROUP starting from 0. That approach resolves #49023 and the main issue above, which does not have a separate issue currently.
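
The WeakMap idea can be illustrated in isolation. The thread does not say what the map is keyed on, so the sketch below assumes it is keyed by the message object held by the worker; the variable names and the stream ID are illustrative only. It requires PHP 8.0 or later.

```php
<?php

// Assumption: in-flight messages are tracked by object identity.
$inFlight = new WeakMap();

$message = new stdClass();
$inFlight[$message] = '1638374074049-0'; // Redis stream ID (example value)

// While the worker still references the message, the map is not empty,
// so pending messages would not be re-read from the start.
$heldWhileReferenced = count($inFlight);

// Once the last reference is dropped, the entry disappears on its own;
// no explicit "remove on ack" bookkeeping is needed.
unset($message);
$heldAfterRelease = count($inFlight);

printf("held before: %d, held after: %d\n", $heldWhileReferenced, $heldAfterRelease);
```

Compared with a plain array of IDs, this avoids the memory-leak risk raised earlier in the thread, because an entry cannot outlive the object it describes.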

@AdamKatzDev
Contributor

I could provide the transport implementation in a draft. But I am afraid I won't have time to make the complete PR myself.

@dkarlovi
Contributor

dkarlovi commented Jan 3, 2024

Creating a draft would be a great start IMO because it gives people something actionable to look at and do. 👍

AdamKatzDev added a commit to AdamKatzDev/symfony that referenced this issue Jan 3, 2024
@AdamKatzDev
Contributor

There is also another issue that can be confusing for someone who switches from AWS SQS, for example; it is described in #51604 and in #44400 (comment).
Without unique consumer names, running multiple workers triggers the "Could not acknowledge redis message" error even without batch handlers. It would be nice if consumer names could be generated automatically.
Also note that, ideally, consumer groups should be garbage collected to prevent other issues: https://stackoverflow.com/a/70335802.

@7underlines

As far as I know this also affects Symfony 7.
