Description
HAProxy Version
- Affected: 3.2.4
- Not affected: 3.1.3
Summary
When using an SPOE filter with the on-backend-http-request event, some requests get stuck in the backend queue for exactly the timeout queue duration (20 seconds in testing, 60 seconds in production), even though the backend servers are completely idle. This is a regression in 3.2.x.
Test Results Comparison
HAProxy 3.1.3 - NO BUG
Test: 20 rounds × 200 requests = 4000 total requests
Stuck requests: 0
Max Tw (queue wait): 245ms
HAProxy 3.2.4 - BUG PRESENT
Test: 20 rounds × 200 requests = 4000 total requests
Stuck requests: 26
Max Tw (queue wait): 20021ms (exactly timeout queue!)
Log Evidence
Stuck Request Logs (HAProxy 3.2.4)
140.82.114.3 [25/Feb/2026:13:16:06.163] http_in backend/backend_2 2/20007/0/51/20060 200 459 1/1/0/0/0 0/6 spoe_err=- spoe_proc=2ms spoe_total=2ms
140.82.114.3 [25/Feb/2026:13:16:26.639] http_in backend/backend_2 5/20006/2/69/20082 200 460 2/2/0/0/0 0/1 spoe_err=- spoe_proc=5ms spoe_total=5ms
140.82.114.3 [25/Feb/2026:13:16:47.500] http_in backend/backend_1 5/20007/0/56/20068 200 460 4/4/3/3/0 0/1 spoe_err=- spoe_proc=5ms spoe_total=5ms
140.82.114.3 [25/Feb/2026:13:16:47.492] http_in backend/backend_1 3/20016/1/56/20076 200 460 4/4/2/2/0 0/2 spoe_err=- spoe_proc=3ms spoe_total=3ms
140.82.114.3 [25/Feb/2026:13:17:08.025] http_in backend/backend_1 4/20013/1/53/20071 200 459 2/2/1/0/0 0/2 spoe_err=- spoe_proc=4ms spoe_total=4ms
140.82.114.3 [25/Feb/2026:13:17:28.529] http_in backend/backend_2 6/20001/1/54/20062 200 461 2/2/1/1/0 0/1 spoe_err=- spoe_proc=6ms spoe_total=6ms
Log Format Explanation
Tq/Tw/Tc/Tr/Ta = 2/20007/0/51/20060
- Tq = 2ms: time before SPOE processing began
- Tw = 20007ms: QUEUE WAIT TIME (the bug: exactly the 20s timeout queue)
- Tc = 0ms: connection time (instant)
- Tr = 51ms: backend response time (fast, the server was idle)
- Ta = 20060ms: total time
- Queue field 0/6: zero requests in the server queue, six in the backend queue
- Connections 1/1/0/0/0: a single active connection; beconn and srv_conn are 0, so the servers were idle
- spoe_err=-: no SPOE error (SPOE completed successfully)
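To sift stuck requests out of a log file mechanically, the Tq/Tw/Tc/Tr/Ta field can be split apart. A minimal sketch (the regex assumes the layout of the log lines above, not a general HAProxy log parser):

```python
import re

# Match the five-timer field followed by the 3-digit status code, as in
# "... 2/20007/0/51/20060 200 ...". Tw is the second timer.
TIMERS = re.compile(r" (\d+)/(\d+)/(\d+)/(\d+)/(\d+) \d{3} ")

def queue_wait_ms(line):
    """Return Tw (backend queue wait in ms) from a log line, or None."""
    m = TIMERS.search(line)
    return int(m.group(2)) if m else None

line = ("140.82.114.3 [25/Feb/2026:13:16:06.163] http_in backend/backend_2 "
        "2/20007/0/51/20060 200 459 1/1/0/0/0 0/6 spoe_err=- "
        "spoe_proc=2ms spoe_total=2ms")

tw = queue_wait_ms(line)
print(tw, tw >= 20000)  # 20007 True -> waited the full 20s timeout queue
```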
Key Observations From Logs
| Field | Stuck Request Value | What It Proves |
|---|---|---|
| Tw | 20001-20021ms | Request waited EXACTLY timeout queue duration |
| Tr | 51-69ms | Backend responded instantly once served (was idle!) |
| spoe_err | - (no error) | SPOE completed successfully, not a timeout |
| spoe_proc | 1-6ms | SPOE was fast |
| Connections | 1/1/0/0/0 | Only 1 connection when finally served |
| Queue | 0/1 to 0/6 | Requests were stuck in backend queue |
Normal Request Logs (HAProxy 3.1.3 - for comparison)
140.82.114.3 [25/Feb/2026:13:13:30.242] http_in backend/backend_1 11/181/0/52/244 200 460 40/40/35/5/0 0/50 spoe_err=1 spoe_proc=11ms spoe_total=11ms
140.82.114.3 [25/Feb/2026:13:13:30.334] http_in backend/backend_1 11/222/0/52/285 200 461 3/3/2/1/0 0/60 spoe_err=1 spoe_proc=11ms spoe_total=11ms
- Max Tw = 245ms (normal queue time during burst)
- No requests stuck for timeout duration
Reproduction
Configuration Files
haproxy.cfg:
global
    daemon
    maxconn 10000

defaults
    mode http
    timeout connect 5s
    timeout client 91s
    timeout server 91s

frontend http_in
    bind :8080
    default_backend backend

backend backend
    balance leastconn
    timeout queue 20s
    filter spoe engine rate-limiting config /usr/local/etc/haproxy/spoe.conf
    default-server maxconn 7
    server backend_1 127.0.0.1:7070 send-proxy
    server backend_2 127.0.0.1:7070 send-proxy
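For context on why the bursts queue at all: with default-server maxconn 7 across two servers, at most 14 requests are in flight at once, so a 200-request round forces the remainder through the backend queue. A quick check of the arithmetic (nothing HAProxy-specific):

```python
servers, maxconn_per_server, burst = 2, 7, 200
slots = servers * maxconn_per_server          # concurrent capacity
peak_queued = burst - slots                   # requests waiting at peak
print(slots, peak_queued)  # 14 186
```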
spoe.conf:
[rate-limiting]
spoe-agent rate-limiting-agent
messages rate-limiting-message
timeout processing 10ms
use-backend spoe-backend
spoe-message rate-limiting-message
args path=path
event on-backend-http-request
SPOA Setup
Any SPOA that takes a variable amount of time (1-20ms) to process messages will do. The key is that some requests complete SPOE processing faster than others.
Test Script
#!/bin/bash
for round in $(seq 1 20); do
    echo -n "Round $round: "
    for j in $(seq 1 200); do
        curl -s -o /dev/null "http://localhost:8080/?key=test-$round-$j" \
            -H "X-Forwarded-For: 10.0.$((round % 255)).$((j % 255))" &
    done
    wait
    sleep 0.2
    # Count log lines whose Tw is 20000-20099ms (cumulative across rounds)
    stuck=$(docker logs rate-lim-hap 2>&1 | grep -cE "/200[0-9]{2}/")
    echo "$stuck stuck"
done

sleep 25
echo "=== Stuck requests ==="
docker logs rate-lim-hap 2>&1 | grep -E "/200[0-9]{2}/"
Suspected Cause (source: Claude)
The bug appears related to the ready_srv mechanism introduced in commit cda7275ef5 ("MEDIUM: queue: Handle the race condition between queue and dequeue differently").
Race Condition
- A burst of requests arrives
- Server slots fill; requests enter SPOE processing, then the queue
- As servers complete, they check the queue and dequeue requests
- A server finishes and finds the queue "empty" (late requests are still in SPOE)
- The server marks itself as ready_srv
- A late request finishes SPOE and enters the queue
- ready_srv has already been claimed or cleared by another request
- New requests go DIRECT to idle servers, bypassing the queue check
- The stuck request waits for the full timeout queue (20s/60s)
The ready_srv optimization assumes requests enter the queue synchronously, but SPOE's on-backend-http-request event creates an asynchronous delay between accepting the request and enqueueing it.
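The ordering above can be replayed as a toy model. This is emphatically NOT HAProxy's code, just an illustration of how a request that is still inside SPOE when the server checks the queue can be stranded:

```python
# Toy model of the suspected ready_srv race (illustration only).
queue = []            # backend queue
ready_srv = False     # server flag: "queue was empty, take the next request directly"
server_busy = True    # request A holds the only server slot
b_in_spoe = True      # request B is still inside SPOE processing

# A completes. The server checks the queue; B has not been enqueued yet,
# so the queue looks empty and the server marks itself ready.
server_busy = False
if not queue:
    ready_srv = True

# B finally leaves SPOE and joins the queue...
queue.append("B")
b_in_spoe = False

# ...but a brand-new request C arrives first, consumes ready_srv, and goes
# straight to the server, bypassing any queue check.
if ready_srv:
    ready_srv = False
    server_busy = True  # C is served

# Nothing re-checks the queue on B's behalf: B sits there until
# `timeout queue` expires.
stuck = list(queue)
print(stuck)  # ['B']
```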
Environment
- OS: Alpine Linux (Docker), production on AWS ECS
- Use case: SPOE filter for rate limiting
- Load pattern: Batch jobs sending 20-200 concurrent requests
Additional Files
Full Docker reproduction setup available at: [can provide GitHub repo if needed]