
BUG: trim_all_pools() in cli_io_handler_clear_map() causes watchdog kill under high load #3292

@yokim-git

Description


Detailed Description of the Problem

cli_io_handler_clear_map() in src/map.c calls trim_all_pools() synchronously after pat_ref_purge_range() completes. This triggers the chain thread_isolate() → malloc_trim(0) → madvise(), which blocks all threads for a duration that grows with RSS size and has no upper bound. Under high connection load (DDoS mitigation in our case), this stall exceeds the watchdog kill threshold and the process is terminated.

The issue is a design inconsistency: pat_ref_purge_range() is carefully batched (100 entries per call with yield), but the subsequent trim_all_pools() call blocks indefinitely with no yield mechanism, negating the batching design.

Call chain:

commit map @<ver> <map>
  → cli_parse_commit_map()
  → cli_io_handler_clear_map()        # src/map.c:1030
    → pat_ref_purge_range(..., 100)    # batched, yields correctly
    → trim_all_pools()                 # src/map.c:1045 ← synchronous, no yield
      → thread_isolate()               # src/pool.c:151 ← blocks ALL threads
      → malloc_trim(0)                 # src/pool.c:153
        → madvise()                    # ★ kernel blocks on large RSS

The relevant code in src/map.c:1030-1047:

static int cli_io_handler_clear_map(struct appctx *appctx)
{
    struct show_map_ctx *ctx = appctx->svcctx;
    int finished;

    HA_RWLOCK_WRLOCK(PATREF_LOCK, &ctx->ref->lock);
    finished = pat_ref_purge_range(ctx->ref, ctx->curr_gen, ctx->prev_gen, 100);
    HA_RWLOCK_WRUNLOCK(PATREF_LOCK, &ctx->ref->lock);

    if (!finished) {
        applet_have_more_data(appctx);
        return 0;       /* yield - come back later */
    }

    trim_all_pools();   /* ← blocks indefinitely with thread isolation */
    return 1;
}

And trim_all_pools() in src/pool.c:146-157:

void trim_all_pools(void)
{
    int isolated = thread_isolated();

    if (!isolated)
        thread_isolate();   /* caller is not isolated → acquires isolation here */

    malloc_trim(0);         /* glibc → madvise() → blocks on large memory */

    if (!isolated)
        thread_release();
}

Since cli_io_handler_clear_map() is not in isolated context when it calls trim_all_pools(), the function acquires thread isolation internally, blocking all other threads while madvise() walks potentially gigabytes of page tables.
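To illustrate why one isolating thread freezes the whole process, here is a deliberately simplified rendezvous sketch. This is not HAProxy's actual src/thread.c implementation; the names (enter_safe_point, isolate, release) and the spin-wait are invented for illustration only:

```c
#include <stdatomic.h>

/* Hypothetical thread-isolation rendezvous (simplified illustration,
 * not HAProxy's real code): the isolating thread waits until every
 * other thread has parked at a safe point, and parked threads spin
 * until isolation is released. */
int nb_threads = 1;                 /* total worker threads */
static atomic_int want_isolation;   /* set by the isolating thread  */
static atomic_int parked_threads;   /* threads waiting at the safe point */

void enter_safe_point(void)         /* called by regular threads */
{
    if (atomic_load(&want_isolation)) {
        atomic_fetch_add(&parked_threads, 1);
        while (atomic_load(&want_isolation))
            ;                        /* spin: no progress while isolated */
        atomic_fetch_sub(&parked_threads, 1);
    }
}

void isolate(void)                  /* called by the isolating thread */
{
    atomic_store(&want_isolation, 1);
    while (atomic_load(&parked_threads) < nb_threads - 1)
        ;                            /* wait for everyone else to park */
}

void release(void)
{
    atomic_store(&want_isolation, 0);
}
```

In this model, everything the isolating thread does between isolate() and release() — here, malloc_trim(0) — happens while every other thread is parked and can process nothing, which is exactly the window the watchdog measures.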

Expected Behavior

commit map via CLI should complete without stalling all threads, even under high memory usage. The map purge operation already yields correctly; the trim operation should not undo that by blocking indefinitely.

Steps to Reproduce the Behavior

  1. Configure HAProxy with a map_reg (regex-based map, e.g. WAF rules) with many entries
  2. Run HAProxy under sustained high connection load (thousands of concurrent connections, multiple GB RSS)
  3. Update the map at runtime using the standard CLI atomic map update sequence:
    # Step 1: Prepare a new map version
    prepare map /etc/haproxy/maps/reg_list.map
    # Returns: New map version is @<ver>
    
    # Step 2: Add entries to the new version
    add map @<ver> /etc/haproxy/maps/reg_list.map <key> <value>
    # (repeated for each entry)
    
    # Step 3: Commit the new version ← triggers the bug
    commit map @<ver> /etc/haproxy/maps/reg_list.map
    
  4. The commit map command invokes cli_io_handler_clear_map(), which purges old generation entries via pat_ref_purge_range() in batches of 100 (works fine)
  5. After purge completes, trim_all_pools() is called synchronously
  6. Thread acquires isolation → malloc_trim(0) → madvise() blocks for seconds under high RSS
  7. Watchdog detects stuck thread → emits warning → kills process with SIGABRT

Do you have any idea what may have caused this?

The root cause is trim_all_pools() being called from a CLI I/O handler context (cli_io_handler_clear_map()). This is problematic because:

  1. Thread isolation in hot path: trim_all_pools() calls thread_isolate() since the CLI handler is not already isolated. This blocks every other thread from making progress.

  2. Unbounded madvise() duration: malloc_trim(0) calls madvise(MADV_DONTNEED) in glibc, which walks page tables. On systems with multiple GB of RSS (common under DDoS), this takes seconds, not milliseconds.

  3. Design inconsistency: pat_ref_purge_range() carefully limits work to 100 entries and yields, but trim_all_pools() at the end has no such safeguard.

The only two call sites for trim_all_pools() in the codebase are:

Call site                         Context                                       Risk
map.c:cli_io_handler_clear_map()  CLI I/O handler (during traffic)              High — acquires isolation in hot path
pool.c:pool_gc()                  Periodic GC task (already isolated at entry)  Low — runs during idle time, already isolated

Do you have an idea how to solve the issue?

Remove trim_all_pools() from cli_io_handler_clear_map():

 static int cli_io_handler_clear_map(struct appctx *appctx)
 {
     struct show_map_ctx *ctx = appctx->svcctx;
     int finished;

     HA_RWLOCK_WRLOCK(PATREF_LOCK, &ctx->ref->lock);
     finished = pat_ref_purge_range(ctx->ref, ctx->curr_gen, ctx->prev_gen, 100);
     HA_RWLOCK_WRUNLOCK(PATREF_LOCK, &ctx->ref->lock);

     if (!finished) {
         applet_have_more_data(appctx);
         return 0;
     }

-    trim_all_pools();
     return 1;
 }

Why this is safe:

  • No memory leak: Freed map entries are returned to HAProxy's pool allocator free lists and reused for subsequent allocations.
  • OS memory reclaim still works: pool_gc() (src/pool.c:893) periodically calls trim_all_pools() during idle time, returning memory to the OS through the normal GC path.
  • More targeted than no-memory-trimming: The global no-memory-trimming option disables malloc_trim() for all callers including pool_gc(). Removing the call from map.c only eliminates the problematic hot-path invocation while preserving normal memory management.
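The "no memory leak" point relies on how pool allocators cache freed objects. A minimal free-list sketch (an invented illustration, not HAProxy's actual src/pool.c code) shows why dropping the trim call cannot leak anything:

```c
#include <stdlib.h>

/* Hypothetical pool free list (simplified, not HAProxy's real pools):
 * freed objects are pushed onto a per-pool list and handed back to the
 * next allocation instead of being returned to the OS. */
struct pool_item { struct pool_item *next; };

struct pool {
    struct pool_item *free_list;   /* cached, immediately reusable objects */
    size_t obj_size;               /* must be >= sizeof(struct pool_item) */
};

static void *pool_alloc(struct pool *p)
{
    if (p->free_list) {                    /* reuse a freed object */
        void *obj = p->free_list;
        p->free_list = p->free_list->next;
        return obj;
    }
    return malloc(p->obj_size);            /* grow only when list is empty */
}

static void pool_free(struct pool *p, void *obj)
{
    struct pool_item *item = obj;          /* keep it cached, don't free() */
    item->next = p->free_list;
    p->free_list = item;
}
```

Freeing and then allocating again hands back the same object; nothing is lost to the process. The memory simply is not returned to the OS until a trim runs, which pool_gc() still does periodically during idle time.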

What is your configuration?

# Relevant global settings
global
    nbthread 38
    # no-memory-trimming   ← not set at time of incident, now applied as workaround

# WAF regex map used in frontend (simplified)
frontend ft_https
    bind :443 ssl ...
    # regex map for WAF pattern matching, updated at runtime via CLI
    acl waf_match req.hdr(host),map_reg(/etc/haproxy/maps/reg_list.map) -m found
    http-request deny if waf_match


The map is updated at runtime using the atomic map update sequence via the stats socket:


# Automated update sequence (via unix socket):
prepare map /etc/haproxy/maps/reg_list.map
add map @<ver> /etc/haproxy/maps/reg_list.map <pattern> <value>
# ... (repeated for all entries)
commit map @<ver> /etc/haproxy/maps/reg_list.map   ← triggers the bug

Output of haproxy -vv

HAProxy version 3.2.11-awslc_v3 2026/01/29 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2030.
Known bugs: http://www.haproxy.org/bugs/bugs-3.2.11.html
Running on: Linux 5.14.0-570.42.2.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 26 16:42:42 KST 2025 x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -Og -march=native -fwrapv -fvect-cost-model=very-cheap -DTLS_TICKETS_NO=4
  OPTIONS = USE_OPENSSL_AWSLC=1 USE_LUA=1 USE_ZLIB=1 USE_QUIC=1 USE_STATIC_PCRE2=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT -SLZ +SSL -STATIC_PCRE +STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB +ACME

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, default=8).
Built with SSL library version : OpenSSL 1.1.1 (compatible; AWS-LC 1.66.0)
Running on SSL library version : AWS-LC 1.66.0
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library FIPS mode : no
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with Naver SSL Client Hello request capture. version: RB-1.1.2:65646
Built with Naver nvextauth (NID cookie decrypt) module. version: 1.0.0
Built with zlib version : 1.2.12
Running on zlib version : 1.2.12
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.43 2024-02-16
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.5.0 20240719 (Red Hat 11.5.0-5)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
       spop : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

Feb 26 18:34:28 server haproxy[81506]: WARNING! thread 38 has stopped processing traffic for 149 milliseconds
Feb 26 18:34:28 server haproxy[81506]:     with 21 streams currently blocked, prevented from making any progress.
Feb 26 18:34:28 server haproxy[81506]:     While this may occasionally happen with inefficient configurations
Feb 26 18:34:28 server haproxy[81506]:     involving excess of regular expressions, map_reg, or heavy Lua processing,
Feb 26 18:34:28 server haproxy[81506]:     this must remain exceptional because the system's stability is now at risk.
Feb 26 18:34:28 server haproxy[81506]:     Timers in logs may be reported incorrectly, spurious timeouts may happen,
Feb 26 18:34:28 server haproxy[81506]:     some incoming connections may silently be dropped, health checks may
Feb 26 18:34:28 server haproxy[81506]:     randomly fail, and accesses to the CLI may block the whole process. The
Feb 26 18:34:28 server haproxy[81506]:     blocking delay before emitting this warning may be adjusted via the global
Feb 26 18:34:28 server haproxy[81506]:     'warn-blocked-traffic-after' directive. Please check the trace below for
Feb 26 18:34:28 server haproxy[81506]:     any clues about configuration elements that need to be corrected:
Feb 26 18:34:28 server haproxy[81506]: * Thread 38: id=0x7f0b7c7f8700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=1
Feb 26 18:34:28 server haproxy[81506]:       1/38   loops=569272150 ctxsw=2669841903 stuck=0 prof=0 harmless=0 isolated=1
Feb 26 18:34:28 server haproxy[81506]:              cpu_ns: poll=43646202684161 now=43646351977145 diff=149292984
Feb 26 18:34:28 server haproxy[81506]:              curr_task=0x7f0c29f590b0 (task) calls=2 last=0
Feb 26 18:34:28 server haproxy[81506]:                fct=0x695c28(task_process_applet) ctx=0x7f0b14fbf8a0
Feb 26 18:34:28 server haproxy[81506]:              lock_hist: S:IDLE_CONNS U:IDLE_CONNS R:TASK_WQ U:TASK_WQ W:PATREF W:PATEXP U:PATEXP U:PATREF
Feb 26 18:34:28 server haproxy[81506]:              call trace(18):
Feb 26 18:34:28 server haproxy[81506]:              |       0x67240a: ha_stuck_warning+0x12b/0x1de > ha_thread_dump_one
Feb 26 18:34:28 server haproxy[81506]:              |       0x777e54: wdt_handler+0x198/0x1b4 > ha_stuck_warning
Feb 26 18:34:28 server haproxy[81506]:              | 0x7f0c2ef79cf0: libpthread:+0x12cf0
Feb 26 18:34:28 server haproxy[81506]:              | 0x7f0c2ebdaa4b: libc:madvise+0xb/0x25
Feb 26 18:34:28 server haproxy[81506]:              | 0x7f0c2ec3ec86: libc:malloc_trim+0x146/0x2d0 > libc:madvise
Feb 26 18:34:28 server haproxy[81506]:              |       0x6ed5cf: malloc_trim+0xfa/0x108
Feb 26 18:34:28 server haproxy[81506]:              |       0x6ed604: trim_all_pools+0x27/0x3f > malloc_trim
Feb 26 18:34:28 server haproxy[81506]:              |       0x7267f2: process_runnable_tasks+0x250b > trim_all_pools
Feb 26 18:34:28 server haproxy[81506]:              |       0x630e9c: cli_io_handler+0x50b/0x924
Feb 26 18:34:28 server haproxy[81506]:              |       0x696ae0: task_process_applet+0xeb8/0x1405
Feb 26 18:34:28 server haproxy[81506]: ### Note: one thread was found stuck under malloc_trim(), which can run for a
Feb 26 18:34:28 server haproxy[81506]:           very long time on large memory systems. You may want to disable this
Feb 26 18:34:28 server haproxy[81506]:           memory reclaiming feature by setting 'no-memory-trimming' in the
Feb 26 18:34:28 server haproxy[81506]:           'global' section of your configuration to avoid this in the future.

Additional Information

Current workaround: We have applied no-memory-trimming in the global section, which prevents the issue. However, this disables all memory trimming globally (including the periodic pool_gc() path), which may cause RSS to remain elevated long-term.

Impact: In our environment, this caused a full process restart during active DDoS mitigation, dropping all existing connections. We use the atomic map update sequence (prepare map → add map → commit map) via the stats socket to update WAF regex maps at runtime. The combination of high RSS (from DDoS connection volume) and commit map on a regex map creates a reliable reproduction path.

Metadata

Labels: status: needs-triage (this issue needs to be triaged), type: bug (this issue describes a bug)
