@JanPaulBeumer JanPaulBeumer commented Apr 13, 2025

| Q             | A
| ------------- | ---
| Branch?       | 7.3
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | no
| License       | MIT

I experienced a lot of lock wait time and some deadlocks on my database when multiple workers handled the same queue, so I fixed it in my application with a decorator. I want to contribute this back to Symfony.

Introduce an algorithm in Connection for MySQL platforms to minimize exclusive locking.

[TODO list]

  • update pr description
  • let tests pass
  • discuss

introduce a algorithm in `Connection` for mysql platforms to minimize exclusive locking
@carsonbot

Hey!

To help keep things organized, we don't allow "Draft" pull requests. Could you please click the "ready for review" button or close this PR and open a new one when you are done?

Note that a pull request does not have to be "perfect" or "ready for merge" when you first open it. We just want it to be ready for a first review.

Cheers!

Carsonbot

$this->deleteDeliveredMessageForMySQLPlatform();
}
try {
$this->driverConnection->delete($this->configuration['table_name'], ['delivered_at' => '9999-12-31 23:59:59']);
Author

This took an exclusive lock on more than a single row or record, which is problematic because it blocks all other processes.


private function getMessageForMySQLPlatform(): ?array
{
$possibleIdsToClaim = $this->createAvailableMessagesQueryBuilder()
Author

Fetch only the IDs until a message is claimed, to avoid loading all the payloads unnecessarily (they can be huge).

return null;
}

$messageData = $this->createQueryBuilder()
Author

Load the data only for the claimed message.


$claimed = $this->driverConnection->createQueryBuilder()
->update($this->configuration['table_name'])
->set('delivered_at', ':now')
Author

This takes an exclusive lock at the row/record level, so there are no race conditions and no two workers handle the same message.

Either the update succeeds, which means delivered_at had not yet been set for this message ID, or no row matches and we move on to the next candidate ID.
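
The claim step described above can be sketched roughly as follows. This is not the code from this PR: the helper names `tryClaim` and `claimFirstAvailable` are hypothetical, an in-memory SQLite database (via the `pdo_sqlite` extension) stands in for MySQL, and the availability check is simplified to `delivered_at IS NULL`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: claim one message by setting delivered_at,
 * but only if no other worker has claimed it yet.
 */
function tryClaim(PDO $pdo, int $id, string $now): bool
{
    $stmt = $pdo->prepare(
        'UPDATE messenger_messages SET delivered_at = :now '
        .'WHERE id = :id AND delivered_at IS NULL'
    );
    $stmt->execute(['now' => $now, 'id' => $id]);

    // Exactly one affected row means this worker won the claim.
    return 1 === $stmt->rowCount();
}

/**
 * Hypothetical helper: walk the candidate IDs and return the first one claimed,
 * or null when every candidate was already taken.
 */
function claimFirstAvailable(PDO $pdo, array $candidateIds, string $now): ?int
{
    foreach ($candidateIds as $id) {
        if (tryClaim($pdo, $id, $now)) {
            return $id;
        }
    }

    return null;
}

// Assumption: a two-column table is enough to illustrate the claim logic.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE messenger_messages (id INTEGER PRIMARY KEY, delivered_at TEXT NULL)');
$pdo->exec('INSERT INTO messenger_messages (id, delivered_at) VALUES (1, NULL), (2, NULL)');

// Three "workers" see the same candidate list one after another.
$first = claimFirstAvailable($pdo, [1, 2], '2025-04-13 08:00:00');
$second = claimFirstAvailable($pdo, [1, 2], '2025-04-13 08:00:01');
$third = claimFirstAvailable($pdo, [1, 2], '2025-04-13 08:00:02');
```

Because the `WHERE` clause re-checks `delivered_at`, a worker holding a stale candidate list simply gets zero affected rows and moves on to the next ID.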

$ids = $this->selectMessageIdsToDelete();
$this->driverConnection->createQueryBuilder()
->delete($this->configuration['table_name'])
->where('id IN (:ids)')
Author

Row/record-level exclusive lock, so it does not interfere with the selecting part.

Member

Would it work to replace this with a subquery instead of doing a round trip to get the IDs?

Author

I'd rather not use subqueries; they are very often a cause of performance problems.

@JanPaulBeumer JanPaulBeumer marked this pull request as ready for review April 13, 2025 07:54
@carsonbot carsonbot added this to the 7.3 milestone Apr 13, 2025
@carsonbot carsonbot changed the title [WIP] [Messenger][Doctrine][Transport] reduce lock time [Doctrine][Messenger] [WIP] [Transport] reduce lock time Apr 13, 2025
@JanPaulBeumer JanPaulBeumer changed the title [Doctrine][Messenger] [WIP] [Transport] reduce lock time [WIP] [Doctrine][Messenger][Transport] reduce lock time Apr 13, 2025
@JanPaulBeumer
Author

Maybe someone with enough expertise in other platforms such as Oracle or Postgres can decide whether it should be adapted there as well 🤷‍♂️

@JanPaulBeumer JanPaulBeumer changed the title [WIP] [Doctrine][Messenger][Transport] reduce lock time [Doctrine][Messenger][Transport] reduce lock time Apr 13, 2025
@JanPaulBeumer
Author

What do you think? Shall we work towards merging this? Then the tests must be fixed. Should this first be controlled with a flag, as an opt-in, so that it cannot break anything?

@carsonbot carsonbot changed the title [Doctrine][Messenger][Transport] reduce lock time [Doctrine][Messenger] [Transport] reduce lock time Apr 14, 2025
}
}

if (!isset($claimedId)) {
Member

The variable should be declared beforehand.

Suggested change
- if (!isset($claimedId)) {
+ if (null === $claimedId) {

Types::STRING,
Types::STRING,
])
->setMaxResults(5_000)
Member

What if there are more?

Author

No problem. Each time a message gets claimed, up to 5k are deleted.

$connection->get();
}

public static function providePlatformSql(): iterable
Member

It would be nice to reduce the diff on this file. Is that doable?

Author

To be honest, no. I changed the algorithm and the queries. I tried, but this was the best I could do for the unit test.

@carsonbot carsonbot changed the title [Doctrine][Messenger] [Transport] reduce lock time [Messenger] [Transport] reduce lock time Apr 16, 2025
@nicolas-grekas nicolas-grekas changed the title [Messenger] [Transport] reduce lock time [Messenger] Reduce lock time when using MySQL for transport Apr 16, 2025
@symfony symfony deleted a comment from carsonbot Apr 16, 2025
}
}

if (!null === $claimedId) {
Contributor

Suggested change
- if (!null === $claimedId) {
+ if (null === $claimedId) {

Typo, I guess.


$claimedId = null;
foreach ($possibleIdsToClaim as $id) {
if (null === $claimedId = $this->claimMessage($id)) {
Contributor

Suggested change
- if (null === $claimedId = $this->claimMessage($id)) {
+ if (null !== $claimedId = $this->claimMessage($id)) {

@fabpot fabpot modified the milestones: 7.3, 7.4 May 26, 2025
@JanPaulBeumer
Author

Hi, sadly I am closing this PR. As you might have seen, I cannot find the spare time to work on this; some private duties take all my time.

@psihius

psihius commented Oct 5, 2025

@JanPaulBeumer I had the same deadlocking issues, but I was able to drill down to their root cause and the correct way of solving them, and I verified my solution by running it in production on a fairly high-traffic queue (over 3 million per day): #61963

fabpot added a commit that referenced this pull request Dec 14, 2025
…at causes deadlocks (psihius)

This PR was merged into the 6.4 branch.

Discussion
----------

[Doctrine][Messenger] Remove old MySQL special handling that causes deadlocks

| Q             | A
| ------------- | ---
| Branch?       | 6.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Issues        | [#47633](#47366), [#47366](#47366), [#57906](#57906) (and many others since closed), abandoned PR #60207 and so on
| License       | MIT

We run over 3 million queue items a day and ran into major issues with the current implementation deadlocking regularly. No amount of adjusting the purge threads and other settings fixed the root cause: the messenger_messages table does not have a proper covering index for the SELECT FOR UPDATE query.
Because the MySQL implementation has been special-cased to batch deletes by `delivered_at` having a special value, at least in MySQL 8.0.* and up (we ran 8.0.42 and are now running 8.4.6) this results in row range locks that basically lock the whole table, because the delivered_at index has extremely low cardinality, so all rows where delivered_at is NULL get locked.
Then UPDATE queries try to update delivered_at while the delete runs with a delivered_at condition, resulting in an eventual deadlock.

At our scale this led to deadlocks completely overwhelming the server within an hour and hard-locking it to the point where we had to `kill -9 <mysql pid>`; even very aggressive deadlock timeouts did not help.
Our database machine has plenty of resources and free RAM, so it never was a CPU, RAM or I/O issue: the server barely uses over 15% of the CPU, and the InnoDB buffer is only 40% full, so everything fits into memory. I/O never rose above 3%, mostly sitting below 1% (we have InnoDB I/O capacity set at 6000 baseline and 12000 peak, which is only a fraction of what the storage layer is capable of).

Adding a covering index on `delivered_at, id` does help to alleviate the onset of the issue, but it still resulted in hard deadlocks; it just took about 14-16 hours under our workloads.
I was unable to find the original reasons why delete batching was added, but I suspect it comes from some MySQL 4/5 era shenanigans that are outdated and no longer true.

So this PR is what I deployed 6 days ago to our production environment, and it has been running trouble-free since then without a single deadlock recorded against the messenger component table. Collecting statistics also shows that this is the correct way to solve this; here are performance schema queries that show before and after:
I removed all batched handling and let MySQL run the same way all other databases do, which works like a charm if we also add a proper index on `queue_name + available_at + delivered_at + id`. This allows MySQL to lock only the specifically required row by its primary id, removing all lock contention issues (the id field in the index is needed; that is what gives the index the cardinality to do the job right).
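
As a rough illustration of that index, the sketch below builds the corresponding MySQL `CREATE INDEX` statement. The column order is taken from the description above; the function name `buildClaimIndexSql` and the index name `idx_messenger_claim` are hypothetical and not taken from the merged commit.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the MySQL DDL for the composite index
 * queue_name + available_at + delivered_at + id.
 */
function buildClaimIndexSql(string $table, string $indexName = 'idx_messenger_claim'): string
{
    // Column order as described in the PR text; id comes last.
    $columns = ['queue_name', 'available_at', 'delivered_at', 'id'];
    $quoted = array_map(static fn (string $column): string => '`'.$column.'`', $columns);

    return sprintf('CREATE INDEX `%s` ON `%s` (%s)', $indexName, $table, implode(', ', $quoted));
}

$sql = buildClaimIndexSql('messenger_messages');
echo $sql, PHP_EOL;
```

In a real application the index would more likely be declared through the transport's schema setup or a migration than through a hand-built string.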

Before (note the avg_lock_ms column; it is bad):
```
mysql> SELECT DIGEST_TEXT,
    ->        COUNT_STAR,
    ->        ROUND(SUM_TIMER_WAIT/1e12,3)  AS total_sec,
    ->        ROUND(SUM_LOCK_TIME/1e12,3)   AS lock_sec,
    ->        ROUND((SUM_LOCK_TIME/1e12)/NULLIF(COUNT_STAR,0)*1000,3) AS avg_lock_ms
    -> FROM performance_schema.events_statements_summary_by_digest
    -> WHERE DIGEST_TEXT LIKE '%MESSENGER_MESSAGES%'
    -> ORDER BY SUM_TIMER_WAIT DESC
    -> LIMIT 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+
| DIGEST_TEXT                                                                                                                                                                                                                                 | COUNT_STAR | total_sec  | lock_sec   | avg_lock_ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+
| DELETE FROM `messenger_messages` WHERE `delivered_at` = ?                                                                                                                                                                                   |    2699821 | 126790.694 | 120946.017 |      44.798 |
| UPDATE `messenger_messages` SET `delivered_at` = ? WHERE `id` = ?                                                                                                                                                                           |    3098328 |  43760.777 |  25541.015 |       8.243 |
| SELECT `m` . * FROM `messenger_messages` `m` WHERE ( `m` . `queue_name` = ? ) AND ( `m` . `delivered_at` IS NULL OR `m` . `delivered_at` < ? ) AND ( `m` . `available_at` <= ? ) ORDER BY `available_at` ASC LIMIT ? FOR UPDATE SKIP LOCKED |    2696084 |   4204.948 |      2.202 |       0.001 |
| INSERT INTO `messenger_messages` ( `body` , `headers` , `queue_name` , `created_at` , `available_at` ) VALUES (...)                                                                                                                         |    1552710 |   2470.059 |   1069.126 |       0.689 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+
```

After:
```
mysql> SELECT DIGEST_TEXT,
    ->        COUNT_STAR,
    ->        ROUND(SUM_TIMER_WAIT/1e12,3)  AS total_sec,
    ->        ROUND(SUM_LOCK_TIME/1e12,3)   AS lock_sec,
    ->        ROUND((SUM_LOCK_TIME/1e12)/NULLIF(COUNT_STAR,0)*1000,3) AS avg_lock_ms
    -> FROM performance_schema.events_statements_summary_by_digest
    -> WHERE DIGEST_TEXT LIKE '%MESSENGER_MESSAGES%'
    -> ORDER BY SUM_TIMER_WAIT DESC
    -> LIMIT 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+
| DIGEST_TEXT                                                                                                                                                                                                                                 | COUNT_STAR | total_sec | lock_sec | avg_lock_ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+
| SELECT `m` . * FROM `messenger_messages` `m` WHERE ( `m` . `queue_name` = ? ) AND ( `m` . `delivered_at` IS NULL OR `m` . `delivered_at` < ? ) AND ( `m` . `available_at` <= ? ) ORDER BY `available_at` ASC LIMIT ? FOR UPDATE SKIP LOCKED |   19002450 | 29151.318 |   22.938 |       0.001 |
| DELETE FROM `messenger_messages` WHERE `id` = ?                                                                                                                                                                                             |   12677551 | 12511.529 |   66.584 |       0.005 |
| INSERT INTO `messenger_messages` ( `body` , `headers` , `queue_name` , `created_at` , `available_at` ) VALUES (...)                                                                                                                         |   12786292 |  2260.588 |   18.044 |       0.001 |
| UPDATE `messenger_messages` SET `delivered_at` = ? WHERE `id` = ?                                                                                                                                                                           |   12865570 |  1689.881 |    7.368 |       0.001 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+
```

I imagine that the same covering index for the select query should have similar results for other databases, as this comes down to the basics of indexing columns for database performance, but obviously some help with validating would be appreciated.

I also believe this should be backported all the way down to the 6.4 branch, as this is an issue I have seen a lot of people run into, with the common advice being "just use RabbitMQ instead" while the root cause is not investigated properly. I had the environment and the authority to dig into the root cause, and this is the result of that investigation.

Commits
-------

81b9d93 [Messenger][Doctrine] Remove batched message delete for MySQL and add a covering index for a select query