Description
### Detailed Description of the Problem
`cli_io_handler_clear_map()` in `src/map.c` calls `trim_all_pools()` synchronously after `pat_ref_purge_range()` completes. This triggers `thread_isolate()` → `malloc_trim(0)` → `madvise()`, which blocks all threads for an unbounded duration proportional to RSS size. Under high connection load (DDoS mitigation in our case), this stall exceeds the watchdog kill threshold and the process is terminated.

The issue is a design inconsistency: `pat_ref_purge_range()` is carefully batched (100 entries per call with a yield), but the subsequent `trim_all_pools()` call blocks indefinitely with no yield mechanism, negating the batching design.
Call chain:

```
commit map @<ver> <map>
  → cli_parse_commit_map()
    → cli_io_handler_clear_map()       # src/map.c:1030
      → pat_ref_purge_range(..., 100)  # batched, yields correctly
      → trim_all_pools()               # src/map.c:1045 ← synchronous, no yield
        → thread_isolate()             # src/pool.c:151 ← blocks ALL threads
        → malloc_trim(0)               # src/pool.c:153
          → madvise()                  # ★ kernel blocks on large RSS
```
The relevant code in `src/map.c:1030-1047`:

```c
static int cli_io_handler_clear_map(struct appctx *appctx)
{
	struct show_map_ctx *ctx = appctx->svcctx;
	int finished;

	HA_RWLOCK_WRLOCK(PATREF_LOCK, &ctx->ref->lock);
	finished = pat_ref_purge_range(ctx->ref, ctx->curr_gen, ctx->prev_gen, 100);
	HA_RWLOCK_WRUNLOCK(PATREF_LOCK, &ctx->ref->lock);

	if (!finished) {
		applet_have_more_data(appctx);
		return 0; /* yield - come back later */
	}

	trim_all_pools(); /* ← blocks indefinitely with thread isolation */
	return 1;
}
```

And `trim_all_pools()` in `src/pool.c:146-157`:
```c
void trim_all_pools(void)
{
	int isolated = thread_isolated();

	if (!isolated)
		thread_isolate(); /* caller is not isolated → acquires isolation here */

	malloc_trim(0); /* glibc → madvise() → blocks on large memory */

	if (!isolated)
		thread_release();
}
```

Since `cli_io_handler_clear_map()` is not in an isolated context when it calls `trim_all_pools()`, the function acquires thread isolation internally, blocking all other threads while `madvise()` walks potentially gigabytes of page tables.
### Expected Behavior

`commit map` via the CLI should complete without stalling all threads, even under high memory usage. The map purge operation already yields correctly; the trim operation should not undo that by blocking indefinitely.
### Steps to Reproduce the Behavior

1. Configure HAProxy with a `map_reg` (regex-based) map, e.g. WAF rules, with many entries.
2. Run HAProxy under sustained high connection load (thousands of concurrent connections, multiple GB of RSS).
3. Update the map at runtime using the standard CLI atomic map update sequence:

   ```
   # Step 1: Prepare a new map version
   prepare map /etc/haproxy/maps/reg_list.map
   # Returns: New map version is @<ver>

   # Step 2: Add entries to the new version
   add map @<ver> /etc/haproxy/maps/reg_list.map <key> <value>
   # (repeated for each entry)

   # Step 3: Commit the new version ← triggers the bug
   commit map @<ver> /etc/haproxy/maps/reg_list.map
   ```

4. The `commit map` command invokes `cli_io_handler_clear_map()`, which purges old-generation entries via `pat_ref_purge_range()` in batches of 100 (works fine).
5. After the purge completes, `trim_all_pools()` is called synchronously.
6. The thread acquires isolation → `malloc_trim(0)` → `madvise()` blocks for seconds under high RSS.
7. The watchdog detects the stuck thread → emits a warning → kills the process with `SIGABRT`.
### Do you have any idea what may have caused this?

The root cause is `trim_all_pools()` being called from a CLI I/O handler context (`cli_io_handler_clear_map()`). This is problematic because:
1. **Thread isolation in the hot path:** `trim_all_pools()` calls `thread_isolate()` since the CLI handler is not already isolated. This blocks every other thread from making progress.
2. **Unbounded `madvise()` duration:** `malloc_trim(0)` calls `madvise(MADV_DONTNEED)` in glibc, which walks page tables. On systems with multiple GB of RSS (common under DDoS), this takes seconds, not milliseconds.
3. **Design inconsistency:** `pat_ref_purge_range()` carefully limits work to 100 entries and yields, but `trim_all_pools()` at the end has no such safeguard.
The only two call sites for `trim_all_pools()` in the codebase are:

| Call site | Context | Risk |
|---|---|---|
| `map.c:cli_io_handler_clear_map()` | CLI I/O handler (during traffic) | High — acquires isolation in hot path |
| `pool.c:pool_gc()` | Periodic GC task (already isolated at entry) | Low — runs during idle time, already isolated |
### Do you have an idea how to solve the issue?

Remove the `trim_all_pools()` call from `cli_io_handler_clear_map()`:
```diff
 static int cli_io_handler_clear_map(struct appctx *appctx)
 {
 	struct show_map_ctx *ctx = appctx->svcctx;
 	int finished;
 
 	HA_RWLOCK_WRLOCK(PATREF_LOCK, &ctx->ref->lock);
 	finished = pat_ref_purge_range(ctx->ref, ctx->curr_gen, ctx->prev_gen, 100);
 	HA_RWLOCK_WRUNLOCK(PATREF_LOCK, &ctx->ref->lock);
 
 	if (!finished) {
 		applet_have_more_data(appctx);
 		return 0;
 	}
 
-	trim_all_pools();
 	return 1;
 }
```

Why this is safe:
- **No memory leak:** Freed map entries are returned to HAProxy's pool allocator free lists and reused for subsequent allocations.
- **OS memory reclaim still works:** `pool_gc()` (`src/pool.c:893`) periodically calls `trim_all_pools()` during idle time, returning memory to the OS through the normal GC path.
- **More targeted than `no-memory-trimming`:** The global `no-memory-trimming` option disables `malloc_trim()` for all callers, including `pool_gc()`. Removing the call from `map.c` only eliminates the problematic hot-path invocation while preserving normal memory management.
### What is your configuration?

```
# Relevant global settings
global
    nbthread 38
    # no-memory-trimming   ← not set at time of incident, now applied as workaround

# WAF regex map used in frontend (simplified)
frontend ft_https
    bind :443 ssl ...
    # regex map for WAF pattern matching, updated at runtime via CLI
    acl waf_match req.hdr(host),map_reg(/etc/haproxy/maps/reg_list.map) -m found
    http-request deny if waf_match
```
The map is updated at runtime using the atomic map update sequence via the stats socket:

```
# Automated update sequence (via unix socket):
prepare map /etc/haproxy/maps/reg_list.map
add map @<ver> /etc/haproxy/maps/reg_list.map <pattern> <value>
# ... (repeated for all entries)
commit map @<ver> /etc/haproxy/maps/reg_list.map   ← triggers the bug
```

### Output of `haproxy -vv`
HAProxy version 3.2.11-awslc_v3 2026/01/29 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2030.
Known bugs: http://www.haproxy.org/bugs/bugs-3.2.11.html
Running on: Linux 5.14.0-570.42.2.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 26 16:42:42 KST 2025 x86_64
Build options :
TARGET = linux-glibc
CC = cc
CFLAGS = -O2 -g -Og -march=native -fwrapv -fvect-cost-model=very-cheap -DTLS_TICKETS_NO=4
OPTIONS = USE_OPENSSL_AWSLC=1 USE_LUA=1 USE_ZLIB=1 USE_QUIC=1 USE_STATIC_PCRE2=1 USE_PCRE2=1 USE_PCRE2_JIT=1
DEBUG =
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT -SLZ +SSL -STATIC_PCRE +STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB +ACME
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, default=8).
Built with SSL library version : OpenSSL 1.1.1 (compatible; AWS-LC 1.66.0)
Running on SSL library version : AWS-LC 1.66.0
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library FIPS mode : no
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with Naver SSL Client Hello request capture. version: RB-1.1.2:65646
Built with Naver nvextauth (NID cookie decrypt) module. version: 1.0.0
Built with zlib version : 1.2.12
Running on zlib version : 1.2.12
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.43 2024-02-16
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.5.0 20240719 (Red Hat 11.5.0-5)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
quic : mode=HTTP side=FE mux=QUIC flags=HTX|NO_UPG|FRAMED
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
spop : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
### Last Outputs and Backtraces
Feb 26 18:34:28 server haproxy[81506]: WARNING! thread 38 has stopped processing traffic for 149 milliseconds
Feb 26 18:34:28 server haproxy[81506]: with 21 streams currently blocked, prevented from making any progress.
Feb 26 18:34:28 server haproxy[81506]: While this may occasionally happen with inefficient configurations
Feb 26 18:34:28 server haproxy[81506]: involving excess of regular expressions, map_reg, or heavy Lua processing,
Feb 26 18:34:28 server haproxy[81506]: this must remain exceptional because the system's stability is now at risk.
Feb 26 18:34:28 server haproxy[81506]: Timers in logs may be reported incorrectly, spurious timeouts may happen,
Feb 26 18:34:28 server haproxy[81506]: some incoming connections may silently be dropped, health checks may
Feb 26 18:34:28 server haproxy[81506]: randomly fail, and accesses to the CLI may block the whole process. The
Feb 26 18:34:28 server haproxy[81506]: blocking delay before emitting this warning may be adjusted via the global
Feb 26 18:34:28 server haproxy[81506]: 'warn-blocked-traffic-after' directive. Please check the trace below for
Feb 26 18:34:28 server haproxy[81506]: any clues about configuration elements that need to be corrected:
Feb 26 18:34:28 server haproxy[81506]: * Thread 38: id=0x7f0b7c7f8700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=1
Feb 26 18:34:28 server haproxy[81506]: 1/38 loops=569272150 ctxsw=2669841903 stuck=0 prof=0 harmless=0 isolated=1
Feb 26 18:34:28 server haproxy[81506]: cpu_ns: poll=43646202684161 now=43646351977145 diff=149292984
Feb 26 18:34:28 server haproxy[81506]: curr_task=0x7f0c29f590b0 (task) calls=2 last=0
Feb 26 18:34:28 server haproxy[81506]: fct=0x695c28(task_process_applet) ctx=0x7f0b14fbf8a0
Feb 26 18:34:28 server haproxy[81506]: lock_hist: S:IDLE_CONNS U:IDLE_CONNS R:TASK_WQ U:TASK_WQ W:PATREF W:PATEXP U:PATEXP U:PATREF
Feb 26 18:34:28 server haproxy[81506]: call trace(18):
Feb 26 18:34:28 server haproxy[81506]: | 0x67240a: ha_stuck_warning+0x12b/0x1de > ha_thread_dump_one
Feb 26 18:34:28 server haproxy[81506]: | 0x777e54: wdt_handler+0x198/0x1b4 > ha_stuck_warning
Feb 26 18:34:28 server haproxy[81506]: | 0x7f0c2ef79cf0: libpthread:+0x12cf0
Feb 26 18:34:28 server haproxy[81506]: | 0x7f0c2ebdaa4b: libc:madvise+0xb/0x25
Feb 26 18:34:28 server haproxy[81506]: | 0x7f0c2ec3ec86: libc:malloc_trim+0x146/0x2d0 > libc:madvise
Feb 26 18:34:28 server haproxy[81506]: | 0x6ed5cf: malloc_trim+0xfa/0x108
Feb 26 18:34:28 server haproxy[81506]: | 0x6ed604: trim_all_pools+0x27/0x3f > malloc_trim
Feb 26 18:34:28 server haproxy[81506]: | 0x7267f2: process_runnable_tasks+0x250b > trim_all_pools
Feb 26 18:34:28 server haproxy[81506]: | 0x630e9c: cli_io_handler+0x50b/0x924
Feb 26 18:34:28 server haproxy[81506]: | 0x696ae0: task_process_applet+0xeb8/0x1405
Feb 26 18:34:28 server haproxy[81506]: ### Note: one thread was found stuck under malloc_trim(), which can run for a
Feb 26 18:34:28 server haproxy[81506]: very long time on large memory systems. You may want to disable this
Feb 26 18:34:28 server haproxy[81506]: memory reclaiming feature by setting 'no-memory-trimming' in the
Feb 26 18:34:28 server haproxy[81506]: 'global' section of your configuration to avoid this in the future.
### Additional Information

Current workaround: We have applied `no-memory-trimming` in the global section, which prevents the issue. However, this disables all memory trimming globally (including the periodic `pool_gc()` path), which may cause RSS to remain elevated long-term.

Impact: In our environment, this caused a full process restart during active DDoS mitigation, dropping all existing connections. We use the atomic map update sequence (`prepare map` → `add map` → `commit map`) via the stats socket to update WAF regex maps at runtime. The combination of high RSS (from DDoS connection volume) and `commit map` on a regex map creates a reliable reproduction path.