Classic queue message store GC cannot keep pace under high throughput after 0278980ba0 #16141

@lukebakken

Summary

Commit 0278980 (PR #13959, "CQ shared store: Delete from index on remove or roll over") introduced a regression in the classic queue message store GC that causes unbounded disk growth under sustained publish load when a slow-consumer queue shares the same vhost as high-throughput queues.

The regression is present in the current main branch. Reverting 0278980 restores stable disk behavior.

Root Cause

PR #13959 replaced the index cleanup that delete_file performed via scan_and_vacuum_message_file with an eager index cleanup mechanism. As a side effect, messages removed from non-current files now produce not_found index lookups during scan_and_vacuum_message_file, where they previously produced previously_valid ones. This was noted in the PR review by @gomoripeti:

if I see it correctly compaction might become slower with this change (as during scan_file_for_valid_messages before this change there were ref-count=0 index entries which resulted in a previously_valid status while now not_found entries result in invalid status, and causing a scan_next_byte scanning mode)

Under high throughput with many queues, the GC compaction rate drops far enough that it cannot keep pace with the publish rate. Files accumulate faster than they are reclaimed, and disk usage can grow without bound. The GC stall also causes consumer latency spikes and broker unresponsiveness on established TCP connections.
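The scanning cost described in the review comment can be illustrated with a toy model. This is a sketch under stated assumptions, not RabbitMQ's code: the record layout (8-byte size, 16-byte id, body, one marker byte), the `scan` function, and the dict-based index are all hypothetical. It only shows why a scanner that can trust a header (index entry present, even with ref-count 0) skips a whole record in one step, while a scanner that gets not_found has to advance byte by byte.

```python
import hashlib
import struct

# Toy record header (an assumption, not the real on-disk format):
# 8-byte big-endian body size followed by a 16-byte message id.
HEADER = struct.Struct(">Q16s")


def build_file(body_sizes):
    """Return (file_bytes, ids) for a toy store file with zero-filled bodies."""
    out, ids = bytearray(), []
    for i, size in enumerate(body_sizes):
        msg_id = hashlib.md5(str(i).encode()).digest()  # deterministic 16-byte id
        ids.append(msg_id)
        out += HEADER.pack(size, msg_id) + b"\x00" * size + b"\xff"
    return bytes(out), ids


def scan(data, index):
    """Count scanner steps over `data`; `index` maps msg_id -> ref count."""
    offset = steps = 0
    while offset + HEADER.size <= len(data):
        steps += 1
        size, msg_id = HEADER.unpack_from(data, offset)
        if msg_id in index:
            # Entry found (valid, or previously valid with ref count 0):
            # trust the header and jump over the whole record.
            offset += HEADER.size + size + 1
        else:
            # not_found: the header cannot be trusted, so try the next byte.
            offset += 1
    return steps


if __name__ == "__main__":
    data, ids = build_file([1000, 1000, 1000])
    lazy_index = {msg_id: 0 for msg_id in ids}  # removed, entries retained
    eager_index = {}                            # removed, entries deleted
    print("steps with retained entries:", scan(data, lazy_index))
    print("steps with deleted entries: ", scan(data, eager_index))
```

In this model the step count with deleted entries grows with the file size in bytes rather than with the number of records, which is the direction of the slowdown the review comment anticipates. The real scanner's cost per step and its file format are not modelled here.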

Reproduction

Three concurrent workloads on a single RabbitMQ node (m7g.large, 196 GB EBS):

  • main-workload: 100 classic queues, 100 producers + 100 consumers, 120 KB messages, 5 msg/s per producer (500 msg/s aggregate), consumers acking immediately
  • slow-ack-publisher: 1 producer, 3 msg/s to slow-ack-queue, 120 KB messages
  • slow-ack-consumer: consumer on slow-ack-queue holding acks for 1-30 minutes (up to 1000 messages in flight simultaneously)

All queues are in the same vhost, with a policy that sets queue-version: 2.
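For a sense of scale, the byte rates implied by the workload description can be worked out directly. This is plain arithmetic on the figures above; the only assumption is that "KB" means 1000 bytes, which the description does not specify.

```python
KB = 1000  # assumption: decimal kilobytes (the issue does not say KB vs KiB)
MSG_BYTES = 120 * KB


def ingest_bytes_per_s(msgs_per_s):
    """Payload bytes published per second at the given message rate."""
    return msgs_per_s * MSG_BYTES


main_rate = ingest_bytes_per_s(100 * 5)  # 100 producers at 5 msg/s each
slow_rate = ingest_bytes_per_s(3)        # slow-ack-publisher
in_flight = 1000 * MSG_BYTES             # up to 1000 unacked messages
slow_share = slow_rate / (main_rate + slow_rate)

if __name__ == "__main__":
    print(f"main-workload ingest: {main_rate / 1e6:.1f} MB/s")
    print(f"slow-ack ingest:      {slow_rate / 1e6:.2f} MB/s")
    print(f"slow-ack in flight:   {in_flight / 1e6:.0f} MB")
    print(f"slow-ack share:       {slow_share:.2%}")
```

The slow-ack queue accounts for well under 1% of the published bytes, so the amount of data it holds unacked is small compared with what passes through the shared store.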

Reproduction scripts: https://github.com/lukebakken/rmq-gc-lag

Observed Behavior

With 0278980 present (main):

Free disk space declined by ~3.1 GB in 6 minutes during the baseline phase (200 msg/s), briefly recovered when the spike phase began (500 msg/s), then resumed declining. Over ~100 minutes of monitoring, free disk space fell from 185.4 GB to ~169 GB, a loss of ~16 GB. Ready messages grew from 0 to 3500-4200 as the broker fell behind on delivery, and unacked messages accumulated to 3500+ across multiple reconnect cycles.
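The reported figures translate into the following rough rates. This is a back-of-the-envelope sketch: it reads the disk numbers as free space, assumes decimal gigabytes, uses the approximate endpoints quoted above, and extrapolates linearly, which the observed recover-and-resume pattern suggests is only a coarse approximation.

```python
GB = 1e9  # assumption: decimal gigabytes


def decline_mb_per_s(lost_gb, minutes):
    """Average rate at which free space shrank, in MB/s."""
    return lost_gb * GB / (minutes * 60) / 1e6


baseline = decline_mb_per_s(3.1, 6)             # baseline phase
overall = decline_mb_per_s(185.4 - 169.0, 100)  # whole monitoring window

# Linear extrapolation (an assumption): hours until the remaining
# ~169 GB would be consumed at the overall average rate.
hours_left = 169.0 * GB / (overall * 1e6) / 3600

if __name__ == "__main__":
    print(f"baseline decline: {baseline:.1f} MB/s")
    print(f"overall decline:  {overall:.1f} MB/s")
    print(f"time to exhaust:  {hours_left:.1f} h")
```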

The GC stall was also visible to clients. The broker stopped sending data on an established TCP connection long enough to trigger a client-side socket read timeout:

[AMQP Connection 10.0.1.90:5672] ERROR - An unexpected connection driver error occurred
java.net.SocketTimeoutException: Read timed out

Consumer latency at time of socket read timeout:

min/median/75th/95th/99th/max consumer latency:
64,886 / 1,511,958 / 6,254,378 / 46,742,920 / 54,532,926 / 568,205,100 µs

(median 1.5s, 95th 46s, 99th 54s, max 568s)

[Image: Grafana dashboard showing continuous disk decline on unpatched main]

With 0278980 reverted (branch lukebakken/cq-gc):

Free disk space stayed within a 0.5 GB oscillation band (184.96-185.47 GB) throughout three consecutive 20-minute monitoring windows (60 minutes total) under the same workload at 500 msg/s with ~1000 unacked messages. Ready messages held at 0 throughout. No latency spikes and no broker unresponsiveness were observed.

[Image: Grafana dashboard showing stable disk on patched lukebakken/cq-gc branch]

With v3.13.7 (pre-regression, pre-refactor):

Disk usage was stable throughout a 23-minute run under the same workload. Note: v3.13.7 predates the major rabbit_msg_store refactor that introduced the shared store architecture, so this data point establishes a pre-regression baseline but is not directly comparable to main.

[Image: Grafana dashboard showing stable disk on v3.13.7]

Fix

Revert 0278980. Three independent improvements from that commit can be retained safely:

  • compact_file/2 early-exit guard (file already deleted)
  • prioritise_cast/3 in rabbit_msg_store_gc (delete requests before compaction)
  • index_update_fields assertion relaxed (true= to _=)

See branch lukebakken/cq-gc on https://github.com/lukebakken/rmq-rabbitmq-server for the revert with retained improvements.
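The effect of the retained prioritise_cast/3 change (delete requests handled before compaction) can be pictured as a priority mailbox. The sketch below is an analogy in Python, not the Erlang implementation: the `PriorityMailbox` class, the request kinds, and the priority values are hypothetical, and the issue does not state the actual priorities used.

```python
import heapq
import itertools


class PriorityMailbox:
    """Toy mailbox that hands out delete requests before compaction requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # preserves arrival order within a priority

    def cast(self, kind, file_id):
        # Hypothetical priorities: lower value is served first.
        priority = 0 if kind == "delete" else 1
        heapq.heappush(self._heap, (priority, next(self._seq), kind, file_id))

    def drain(self):
        """Yield queued requests in the order the GC process would handle them."""
        while self._heap:
            _, _, kind, file_id = heapq.heappop(self._heap)
            yield kind, file_id


if __name__ == "__main__":
    mailbox = PriorityMailbox()
    for kind, file_id in [("compact", 1), ("delete", 2), ("compact", 3), ("delete", 4)]:
        mailbox.cast(kind, file_id)
    for request in mailbox.drain():
        print(request)
```

Serving deletes first means a file that is already fully unreferenced is removed without waiting behind queued compaction work, which is consistent with the compact_file/2 early-exit guard for files that have already been deleted.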

Workaround

Move queues with long consumer timeouts to a dedicated vhost. This gives them a separate message store instance whose unacked messages do not pin files in the shared store. Confirmed effective: disk stable throughout a 40-minute run with the same workload after vhost isolation.
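One way to set up the dedicated vhost is through the management HTTP API. The sketch below only builds the PUT requests and does not send them; the base URL, vhost name, user, and policy name are hypothetical placeholders, and authentication is left out. A queue cannot be moved between vhosts in place, so the queue is declared again in the new vhost and the publisher and consumer have to connect to that vhost.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def api_put(base_url, path_parts, body):
    """Build (but do not send) a management API PUT request."""
    path = "/".join(quote(part, safe="") for part in path_parts)
    return Request(
        f"{base_url}/api/{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def isolation_requests(base_url, vhost, user, queue):
    """Requests that create a vhost, grant access, and declare the slow queue."""
    full_access = {"configure": ".*", "write": ".*", "read": ".*"}
    policy = {
        "pattern": ".*",
        "definition": {"queue-version": 2},  # same policy as the reproduction
        "apply-to": "queues",
    }
    return [
        api_put(base_url, ["vhosts", vhost], {}),
        api_put(base_url, ["permissions", vhost, user], full_access),
        api_put(base_url, ["policies", vhost, "cq-v2"], policy),  # hypothetical name
        api_put(base_url, ["queues", vhost, queue], {"durable": True}),
    ]


if __name__ == "__main__":
    # Hypothetical values; adjust to the actual deployment.
    for req in isolation_requests(
        "http://localhost:15672", "slow-ack", "guest", "slow-ack-queue"
    ):
        print(req.get_method(), req.full_url)
```

Sending the requests would additionally need credentials (for example an Authorization header) and `urllib.request.urlopen`; those are omitted so the sketch has no side effects.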
