
BUG: Requests stuck in backend queue for timeout duration when using SPOE filter with on-backend-http-request event (3.2.x regression) #3290

@tlovrencic-constructor

Description


HAProxy Version

  • Affected: 3.2.4
  • Not affected: 3.1.3

Summary

When using the SPOE filter with the on-backend-http-request event, some requests get stuck in the backend queue for exactly the timeout queue duration (e.g. 20 seconds in the test, 60 seconds in production), even though the backend servers are completely idle. This is a regression in 3.2.x.

Test Results Comparison

HAProxy 3.1.3 - NO BUG

Test: 20 rounds × 200 requests = 4000 total requests
Stuck requests: 0
Max Tw (queue wait): 245ms

HAProxy 3.2.4 - BUG PRESENT

Test: 20 rounds × 200 requests = 4000 total requests
Stuck requests: 26
Max Tw (queue wait): 20021ms (exactly timeout queue!)

Log Evidence

Stuck Request Logs (HAProxy 3.2.4)

140.82.114.3 [25/Feb/2026:13:16:06.163] http_in backend/backend_2 2/20007/0/51/20060 200 459 1/1/0/0/0 0/6 spoe_err=- spoe_proc=2ms spoe_total=2ms
140.82.114.3 [25/Feb/2026:13:16:26.639] http_in backend/backend_2 5/20006/2/69/20082 200 460 2/2/0/0/0 0/1 spoe_err=- spoe_proc=5ms spoe_total=5ms
140.82.114.3 [25/Feb/2026:13:16:47.500] http_in backend/backend_1 5/20007/0/56/20068 200 460 4/4/3/3/0 0/1 spoe_err=- spoe_proc=5ms spoe_total=5ms
140.82.114.3 [25/Feb/2026:13:16:47.492] http_in backend/backend_1 3/20016/1/56/20076 200 460 4/4/2/2/0 0/2 spoe_err=- spoe_proc=3ms spoe_total=3ms
140.82.114.3 [25/Feb/2026:13:17:08.025] http_in backend/backend_1 4/20013/1/53/20071 200 459 2/2/1/0/0 0/2 spoe_err=- spoe_proc=4ms spoe_total=4ms
140.82.114.3 [25/Feb/2026:13:17:28.529] http_in backend/backend_2 6/20001/1/54/20062 200 461 2/2/1/1/0 0/1 spoe_err=- spoe_proc=6ms spoe_total=6ms

Log Format Explanation

Tq/Tw/Tc/Tr/Ta = 2/20007/0/51/20060
│   │     │  │   │
│   │     │  │   └─ Ta=20060ms total time
│   │     │  └───── Tr=51ms backend response (FAST - server was idle!)
│   │     └──────── Tc=0ms connection time (instant)
│   └────────────── Tw=20007ms QUEUE WAIT TIME (BUG - 20 second timeout!)
└────────────────── Tq=2ms time before SPOE

Queue field: 0/6 means 6 requests stuck in backend queue
Connections: 1/1/0/0/0 means only 1 active connection (servers were idle!)
spoe_err=- means NO SPOE error (SPOE completed successfully)
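To scan a log capture for the worst queue wait, a small awk sketch can pull the Tw value out of the timer field (a sketch assuming the `Tq/Tw/Tc/Tr/Ta` timers are the first five-number slash-separated field on each line; the sample lines below are taken from the logs above):

```shell
# report the maximum Tw (queue wait) seen across the sample log lines
awk '{
  for (i = 1; i <= NF; i++) {
    # the timers are the first field of the form N/N/N/N/N on the line
    if ($i ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/) {
      split($i, t, "/")            # t[1]=Tq t[2]=Tw t[3]=Tc t[4]=Tr t[5]=Ta
      if (t[2] + 0 > max) max = t[2] + 0
      break
    }
  }
} END { print "max Tw: " max "ms" }' <<'EOF'
140.82.114.3 [25/Feb/2026:13:16:06.163] http_in backend/backend_2 2/20007/0/51/20060 200 459 1/1/0/0/0 0/6 spoe_err=- spoe_proc=2ms spoe_total=2ms
140.82.114.3 [25/Feb/2026:13:13:30.242] http_in backend/backend_1 11/181/0/52/244 200 460 40/40/35/5/0 0/50 spoe_err=1 spoe_proc=11ms spoe_total=11ms
EOF
# prints: max Tw: 20007ms
```

Replacing the heredoc with a log file path gives the Max Tw numbers reported in the comparison above.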

Key Observations From Logs

| Field | Stuck Request Value | What It Proves |
|-------|---------------------|----------------|
| Tw | 20001-20021ms | Request waited EXACTLY the timeout queue duration |
| Tr | 51-69ms | Backend responded instantly once served (it was idle!) |
| spoe_err | - (no error) | SPOE completed successfully; not an SPOE timeout |
| spoe_proc | 1-6ms | SPOE processing was fast |
| Connections | 1/1/0/0/0 | Only 1 active connection when finally served |
| Queue | 0/1 to 0/6 | Requests were stuck in the backend queue |

Normal Request Logs (HAProxy 3.1.3 - for comparison)

140.82.114.3 [25/Feb/2026:13:13:30.242] http_in backend/backend_1 11/181/0/52/244 200 460 40/40/35/5/0 0/50 spoe_err=1 spoe_proc=11ms spoe_total=11ms
140.82.114.3 [25/Feb/2026:13:13:30.334] http_in backend/backend_1 11/222/0/52/285 200 461 3/3/2/1/0 0/60 spoe_err=1 spoe_proc=11ms spoe_total=11ms
  • Max Tw = 245ms (normal queue time during burst)
  • No requests stuck for timeout duration

Reproduction

Configuration Files

haproxy.cfg:

global
  daemon
  maxconn 10000

defaults
  mode http
  timeout connect 5s
  timeout client 91s
  timeout server 91s

frontend http_in
  bind :8080
  default_backend backend

backend backend
  balance leastconn
  timeout queue 20s

  filter spoe engine rate-limiting config /usr/local/etc/haproxy/spoe.conf

  default-server maxconn 7
  server backend_1 127.0.0.1:7070 send-proxy
  server backend_2 127.0.0.1:7070 send-proxy

spoe.conf:

[rate-limiting]
spoe-agent rate-limiting-agent
    messages rate-limiting-message
    timeout processing 10ms
    use-backend spoe-backend

spoe-message rate-limiting-message
    args path=path
    event on-backend-http-request

SPOA Setup

Any SPOA that takes a variable amount of time (1-20ms) to process requests will do. The key is that some requests complete SPOE processing faster than others.

Test Script

#!/bin/bash
# Fire 20 bursts of 200 concurrent requests each, then count log lines whose
# Tw field landed in the 20000-20099ms range (i.e. stuck for ~timeout queue).
for round in $(seq 1 20); do
    echo -n "Round $round: "
    for j in $(seq 1 200); do
        curl -s -o /dev/null "http://localhost:8080/?key=test-$round-$j" \
            -H "X-Forwarded-For: 10.0.$((round % 255)).$((j % 255))" &
    done
    wait
    sleep 0.2
    # Note: this count is cumulative across rounds, and requests currently
    # stuck in the queue have not been logged yet.
    stuck=$(docker logs rate-lim-hap 2>&1 | grep -cE "/200[0-9]{2}/")
    echo "$stuck stuck"
done
# Wait past timeout queue (20s) so every stuck request has been logged.
sleep 25
echo "=== Stuck requests ==="
docker logs rate-lim-hap 2>&1 | grep -E "/200[0-9]{2}/"

Suspected Cause (source: Claude)

The bug appears to be related to the ready_srv mechanism introduced in commit cda7275ef5 ("MEDIUM: queue: Handle the race condition between queue and dequeue differently").

Race Condition

  1. Burst of requests arrives
  2. Server slots fill, requests enter SPOE processing then queue
  3. As servers complete, they check queue and dequeue requests
  4. Server finishes, finds queue "empty" (late requests still in SPOE)
  5. Server marks itself as ready_srv
  6. Late request finishes SPOE, enters queue
  7. ready_srv already claimed or cleared by another request
  8. New requests go DIRECT to idle servers (bypass queue check)
  9. Stuck request waits for timeout queue (20s/60s)

The ready_srv optimization assumes requests enter queue synchronously, but SPOE's on-backend-http-request event creates asynchronous delays.
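Under this hypothesis, the tell-tale log signature is a Tw at the queue timeout combined with a clean SPOE status. A grep sketch for that combination (the `/200[0-9]{2}/` pattern matches Tw values of 20000-20099ms; the heredoc holds sample lines from the logs above, and a real run would read the log file instead):

```shell
# count lines that waited ~timeout queue (Tw = 20000-20099ms) yet show no
# SPOE error -- the suspected ready_srv race signature
grep -E '/200[0-9]{2}/' <<'EOF' | grep -c 'spoe_err=-'
140.82.114.3 [25/Feb/2026:13:16:06.163] http_in backend/backend_2 2/20007/0/51/20060 200 459 1/1/0/0/0 0/6 spoe_err=- spoe_proc=2ms spoe_total=2ms
140.82.114.3 [25/Feb/2026:13:13:30.242] http_in backend/backend_1 11/181/0/52/244 200 460 40/40/35/5/0 0/50 spoe_err=1 spoe_proc=11ms spoe_total=11ms
EOF
# prints: 1
```

If the SPOE agent were the culprit, these lines would carry an spoe_err value instead of `-`; every stuck request in the 3.2.4 test matches this filter.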

Environment

  • OS: Alpine Linux (Docker), production on AWS ECS
  • Use case: SPOE filter for rate limiting
  • Load pattern: Batch jobs sending 20-200 concurrent requests

Additional Files

Full Docker reproduction setup available at: [can provide GitHub repo if needed]
