
Conversation

@dwisiswant0 (Member) commented Nov 14, 2025

This PR fixes #699: a memory leak in ResponseChain that causes ~2.9 GB of memory retention during burst workloads, leading to OOM crashes in production Nuclei scans. The optimization reduces memory usage by 64-99% while improving performance by 38-57%.

Proposed Changes

fix(httputil): optimize `ResponseChain` memory usage

to prevent OOM.

`ResponseChain` currently suffers from unbounded
memory growth under high-concurrency workloads,
particularly when processing large responses
or compression bombs. This manifests as OOM kills
during nuclei scans with many concurrent requests.

The root cause is threefold:
`(*bytes.Buffer).ReadFrom()` over-allocates by
doubling capacity when size is unknown, the buffer
pool accumulates large buffers w/o bounds, and
each `ResponseChain` pre-allocates a
`fullResponse` buffer, even when unused.

Introduce `limitedBuffer` wrapper to constrain
buffer growth. This wrapper uses 32KB chunks and
caps total capacity at
`*ResponseChain.maxBodyRead`, preventing
the 2x over-allocation behavior of
`(*bytes.Buffer).ReadFrom()`. Reading now grows
incrementally rather than speculatively.
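To make the growth strategy concrete, here is a minimal sketch of a capped, chunk-wise `ReadFrom` (illustrative only; the actual `limitedBuffer` in this PR may differ, and the `chunkSize`/`maxSize` names are assumed from the description above):

```go
package httputil

import "io"

const chunkSize = 32 * 1024 // 32KB growth step, per the description above

// limitedBuffer grows its backing slice one chunk at a time and never
// lets total capacity exceed maxSize, avoiding the speculative 2x
// growth of (*bytes.Buffer).ReadFrom when the input size is unknown.
type limitedBuffer struct {
	data    []byte
	maxSize int // hard cap, e.g. the ResponseChain's maxBodyRead
}

// ReadFrom reads until EOF or until maxSize bytes are buffered.
func (lb *limitedBuffer) ReadFrom(r io.Reader) (int64, error) {
	var read int64
	for len(lb.data) < lb.maxSize {
		step := chunkSize
		if remaining := lb.maxSize - len(lb.data); step > remaining {
			step = remaining
		}
		// Grow by exactly one chunk only when spare capacity runs out.
		if cap(lb.data)-len(lb.data) < step {
			grown := make([]byte, len(lb.data), len(lb.data)+step)
			copy(grown, lb.data)
			lb.data = grown
		}
		start := len(lb.data)
		n, err := r.Read(lb.data[start : start+step])
		lb.data = lb.data[:start+n]
		read += int64(n)
		if err == io.EOF {
			return read, nil
		}
		if err != nil {
			return read, err
		}
	}
	return read, nil
}

// Bytes exposes the buffered data for read-only use.
func (lb *limitedBuffer) Bytes() []byte { return lb.data }
```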

Implement semaphore-gated pooling for large
buffers. Buffers under 512KB are pooled freely as
most HTML responses fall in this range. Buffers at
or above 512KB are limited to 20 pooled instances
via semaphore. When the limit is reached, excess
large buffers are discarded and reclaimed by GC.
This prevents pool pollution from transient large
responses while still enabling reuse during burst
periods.
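As an illustration of the gating scheme (a sketch under the thresholds stated above; `bufferPool`, `largeBufferSlots`, `getBuffer`, and `putBuffer` are illustrative names, not necessarily the PR's identifiers):

```go
package httputil

import (
	"bytes"
	"sync"
)

const largeBufferThreshold = 512 * 1024 // 512KB split between small and large buffers

var (
	bufferPool = sync.Pool{
		New: func() any { return new(bytes.Buffer) },
	}
	// Counting semaphore: at most 20 large buffers may sit in the pool.
	largeBufferSlots = make(chan struct{}, 20)
)

// putBuffer pools small buffers freely; large buffers are pooled only
// while a semaphore slot is free, otherwise they are dropped for GC.
func putBuffer(buf *bytes.Buffer) {
	buf.Reset()
	if buf.Cap() < largeBufferThreshold {
		bufferPool.Put(buf)
		return
	}
	select {
	case largeBufferSlots <- struct{}{}:
		bufferPool.Put(buf)
	default:
		// Limit reached: discard instead of polluting the pool.
	}
}

// getBuffer hands out a buffer and frees a slot if it was large.
// (Simplification: sync.Pool may also drop items during GC without
// going through this path, which a real implementation must account for.)
func getBuffer() *bytes.Buffer {
	buf := bufferPool.Get().(*bytes.Buffer)
	if buf.Cap() >= largeBufferThreshold {
		select {
		case <-largeBufferSlots:
		default:
		}
	}
	return buf
}
```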

Remove the pre-allocated `fullResponse` buffer
from `ResponseChain` struct. Generate it lazily
only when `FullResponse()` is called. This reduces
per-instance memory footprint by one-third and
eliminates waste when callers only need headers or
body separately.
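A minimal sketch of the lazy construction, assuming `headers` and `body` buffer fields on `ResponseChain` and the pooled-buffer helpers sketched above (field and helper names are assumptions, not the PR's exact code):

```go
// FullResponse assembles headers and body only when actually requested,
// instead of keeping a third pre-allocated buffer on every instance.
// The returned buffer is owned (and must be released) by the caller.
func (rc *ResponseChain) FullResponse() *bytes.Buffer {
	full := getBuffer()                         // pooled buffer, see sketch above
	full.Grow(rc.headers.Len() + rc.body.Len()) // single exact-size allocation
	full.Write(rc.headers.Bytes())
	full.Write(rc.body.Bytes())
	return full
}
```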

Add runtime configuration via `SetBufferSize()`
and `SetMaxLargeBuffers()` to allow tuning based
on deployment characteristics. Increase default
max body size from 4MB to 8MB to accommodate
modern web apps. Also remove dependency on
docker/go-units.

Provide typed accessor methods `HeadersBytes()`,
`HeadersString()`, `BodyBytes()`, `BodyString()`,
and `FullResponseString()` for safe read-only
access. These prevent callers from inadvertently
retaining pooled buffers beyond their lifecycle.

The `FullResponse()` method now returns a buffer
that must be explicitly managed by the caller.
This is a breaking change but necessary to support
lazy generation semantics.

Testing with nuclei workloads shows stable memory
usage under sustained load where previously OOM
would occur within minutes.

API Changes

```go
// Zero-copy string accessors
rc.HeadersString() string  // vs rc.Headers().String()
rc.BodyString() string     // vs rc.Body().String()
rc.FullResponseString() string

// Safe byte slice accessors
rc.HeadersBytes() []byte
rc.BodyBytes() []byte
rc.FullResponseBytes() []byte
```

Deprecated (but maintained for compatibility):

```go
rc.Headers() *bytes.Buffer       // Now returns pooled buffer
rc.Body() *bytes.Buffer          // Now returns pooled buffer
rc.FullResponse() *bytes.Buffer  // Now creates on-demand

// Old variables
DefaultBytesBufferAlloc
MaxBodyRead

// New constants
DefaultBufferSize
DefaultMaxBodySize
```

New config funcs:

```go
SetBufferSize(size int64)       // Configure buffer pool size
SetMaxLargeBuffers(max int)     // Configure large buffer limit
```
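A hedged usage example of the tuning knobs (the import alias and the numbers are illustrative; only the two setter signatures above are taken from this PR):

```go
package main

import (
	httputil "github.com/projectdiscovery/utils/http"
)

func main() {
	// Example tuning for a memory-constrained deployment.
	httputil.SetBufferSize(64 * 1024) // smaller pooled-buffer size
	httputil.SetMaxLargeBuffers(8)    // pool fewer large buffers
}
```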

No breaking changes to existing API signatures; all existing APIs are maintained for backward compatibility. The one behavioral change is that `FullResponse()` now returns a lazily built, caller-managed buffer, as noted above.

Migration Guide

No migration is needed; existing code works without changes. However, switching to the new safe accessors is highly recommended for better performance:

```go
// Before
body := rc.Body().String()      // 1 allocation

// After
body := rc.BodyString()          // 0 allocations (zero-copy)
```

Proof

patch:

```console
$ go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/
=== RUN   TestResponseChain_BurstWorkload
    respChain_test.go:596: Before =  Alloc: 8 MB, TotalAlloc: 9 MB, Sys: 21 MB, NumGC: 2
    respChain_test.go:636: After  =  Alloc: 1 MB, TotalAlloc: 7855 MB, Sys: 2137 MB, NumGC: 18
    respChain_test.go:638: Memory delta - Alloc: -7 MB, TotalAlloc: +7846 MB, Sys: +2116 MB
    respChain_test.go:653: Memory growth: -7 MB
--- PASS: TestResponseChain_BurstWorkload (2.24s)
=== RUN   TestResponseChain_SustainedConcurrency
    respChain_test.go:679: Before =  Alloc: 1 MB, Sys: 2137 MB
    respChain_test.go:775: After  =  Alloc: 1 MB, Sys: 2137 MB
    respChain_test.go:777: Peak during load     - Alloc: 1456 MB, Sys: 2137 MB
    respChain_test.go:779: Total requests: 1101 (550 req/s), Memory delta: +0 MB
--- PASS: TestResponseChain_SustainedConcurrency (2.11s)
=== RUN   TestResponseChain_MemoryPressure
    respChain_test.go:811: Before =  Alloc: 9 MB, Sys: 2137 MB, MaxLargeBuffers: 10
    respChain_test.go:844: After  =  Alloc: 153 MB, Sys: 2137 MB
    respChain_test.go:846: Handled 30 requests (3x buffer limit) = Memory delta: +144 MB
--- PASS: TestResponseChain_MemoryPressure (0.14s)
=== RUN   TestResponseChain_MixedWorkload
    respChain_test.go:891: Before =  Alloc: 1 MB
    respChain_test.go:953: After  =  Alloc: 45 MB
    respChain_test.go:954: Processed 600 requests with mixed sizes/compression = Memory delta: +44 MB
--- PASS: TestResponseChain_MixedWorkload (0.18s)
=== RUN   TestResponseChain_RapidCreateDestroy
    respChain_test.go:972: Before =  Alloc: 1 MB, NumGC: 63
    respChain_test.go:994: After  =  Alloc: 1 MB, NumGC: 73
    respChain_test.go:996: Processed 1000 iterations (10000 KB total) = Memory delta: +0 MB, GC cycles: 10
--- PASS: TestResponseChain_RapidCreateDestroy (0.03s)
=== RUN   TestResponseChain_ConcurrentReads
    respChain_test.go:1024: Before =  Alloc: 1 MB
    respChain_test.go:1057: After  =  Alloc: 1 MB
    respChain_test.go:1058: 100 concurrent readers = Memory delta: +0 MB (should be ~0 for read-only ops)
--- PASS: TestResponseChain_ConcurrentReads (0.00s)
=== RUN   TestResponseChain_BurstWithPoolExhaustion
    respChain_test.go:1080: Before =  Alloc: 1 MB, PoolSize: 10
    respChain_test.go:1125: After  =  Alloc: 3 MB, PoolSize: 10
    respChain_test.go:1127: Handled 50 requests with pool size 10 = Memory delta: +2 MB
--- PASS: TestResponseChain_BurstWithPoolExhaustion (0.01s)
PASS
ok  	github.com/projectdiscovery/utils/http	4.962s
```

main:

0001-test-httputil-patch-for-PoC.patch

```console
$ git checkout main
$ git cherry-pick 8de86ac
$ git apply 0001-test-httputil-patch-for-PoC.patch
$ go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/ 2>/dev/null
=== RUN   TestResponseChain_BurstWorkload
    respChain_test.go:32: Before =  Alloc: 8 MB, TotalAlloc: 9 MB, Sys: 16 MB, NumGC: 2
    respChain_test.go:72: After  =  Alloc: 2896 MB, TotalAlloc: 9889 MB, Sys: 5874 MB, NumGC: 13
    respChain_test.go:74: Memory delta - Alloc: +2888 MB, TotalAlloc: +9880 MB, Sys: +5857 MB
    respChain_test.go:89: Memory growth: 2888 MB
--- PASS: TestResponseChain_BurstWorkload (5.22s)
=== RUN   TestResponseChain_SustainedConcurrency
    respChain_test.go:115: Before =  Alloc: 1 MB, Sys: 5874 MB
    respChain_test.go:211: After  =  Alloc: 4 MB, Sys: 5874 MB
    respChain_test.go:213: Peak during load     - Alloc: 1229 MB, Sys: 5874 MB
    respChain_test.go:215: Total requests: 0 (0 req/s), Memory delta: +3 MB
--- PASS: TestResponseChain_SustainedConcurrency (2.01s)
=== RUN   TestResponseChain_MemoryPressure
    respChain_test.go:240: Before =  Alloc: 8 MB, Sys: 5874 MB, MaxLargeBuffers: 10
    respChain_test.go:273: After  =  Alloc: 407 MB, Sys: 5874 MB
    respChain_test.go:275: Handled 30 requests (3x buffer limit) = Memory delta: +398 MB
--- PASS: TestResponseChain_MemoryPressure (0.20s)
=== RUN   TestResponseChain_MixedWorkload
    respChain_test.go:320: Before =  Alloc: 16 MB
    respChain_test.go:382: After  =  Alloc: 68 MB
    respChain_test.go:383: Processed 600 requests with mixed sizes/compression = Memory delta: +51 MB
--- PASS: TestResponseChain_MixedWorkload (0.18s)
=== RUN   TestResponseChain_RapidCreateDestroy
    respChain_test.go:401: Before =  Alloc: 1 MB, NumGC: 44
    respChain_test.go:423: After  =  Alloc: 1 MB, NumGC: 45
    respChain_test.go:425: Processed 1000 iterations (10000 KB total) = Memory delta: +0 MB, GC cycles: 1
--- PASS: TestResponseChain_RapidCreateDestroy (0.01s)
=== RUN   TestResponseChain_ConcurrentReads
    respChain_test.go:453: Before =  Alloc: 1 MB
    respChain_test.go:486: After  =  Alloc: 1 MB
    respChain_test.go:487: 100 concurrent readers = Memory delta: +0 MB (should be ~0 for read-only ops)
--- PASS: TestResponseChain_ConcurrentReads (0.00s)
=== RUN   TestResponseChain_BurstWithPoolExhaustion
    respChain_test.go:509: Before =  Alloc: 1 MB, PoolSize: 10
    respChain_test.go:554: After  =  Alloc: 3 MB, PoolSize: 10
    respChain_test.go:556: Handled 50 requests with pool size 10 = Memory delta: +2 MB
--- PASS: TestResponseChain_BurstWithPoolExhaustion (0.01s)
PASS
ok  	github.com/projectdiscovery/utils/http	7.984s
```

comparison:

| Test | main | patch | Improvement |
| --- | --- | --- | --- |
| BurstWorkload (500 req × 8MB) | 2,896 MB | 1 MB | -99.97% |
| MemoryPressure (30 req) | 407 MB | 153 MB | -62% |
| MixedWorkload (600 req) | 68 MB | 45 MB | -34% |
| SustainedConcurrency | 4 MB | 1 MB | -75% |
| System Memory Peak | 5,874 MB | 2,137 MB | -64% |

- On BurstWorkload, the patch prevents the ~2.8 GB live-heap/Sys spike present on main. The patch still allocates heavily (TotalAlloc remains large) but does not retain those allocations; memory is reclaimed promptly.
- The patch reduces retained memory under MemoryPressure by roughly 250 MB (407 MB → 153 MB).

@dwisiswant0 dwisiswant0 marked this pull request as draft November 14, 2025 13:32
@dwisiswant0 (Member, Author) commented

Wait.

@dwisiswant0 dwisiswant0 force-pushed the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch from 606f237 to 147a6f2 Compare November 14, 2025 20:42
@dwisiswant0 dwisiswant0 marked this pull request as ready for review November 14, 2025 20:54
@dwisiswant0 (Member, Author) commented

Rebase after #701.

to prevent OOM.

`ResponseChain` currently suffers from unbounded
memory growth under high-concurrency workloads,
particularly when processing large responses
or compression bombs. This manifests as OOM kills
during nuclei scans with many concurrent requests.

The root cause is threefold:
`(*bytes.Buffer).ReadFrom()` over-allocates by
doubling capacity when size is unknown, the buffer
pool accumulates large buffers w/o bounds, and
each `ResponseChain` pre-allocates a
`fullResponse` buffer, even when unused.

Introduce `limitedBuffer` wrapper to constrain
buffer growth. This wrapper uses 32KB chunks and
caps total capacity at `maxBodyRead`, preventing
the 2x over-allocation behavior of
`(*bytes.Buffer).ReadFrom()`. Reading now grows
incrementally rather than speculatively.

Implement semaphore-gated pooling for large
buffers. Buffers under 512KB are pooled freely as
most HTML responses fall in this range. Buffers at
or above 512KB are limited to 20 pooled instances
via semaphore. When the limit is reached, excess
large buffers are discarded and reclaimed by GC.
This prevents pool pollution from transient large
responses while still enabling reuse during burst
periods.

Remove the pre-allocated `fullResponse` buffer
from `ResponseChain` struct. Generate it lazily
only when `FullResponse()` is called. This reduces
per-instance memory footprint by one-third and
eliminates waste when callers only need headers or
body separately.

Add runtime configuration via `SetMaxBodySize()`,
`SetBufferSize()`, and `SetMaxLargeBuffers()` to
allow tuning based on deployment characteristics.
Increase default max body size from 4MB to 8MB to
accommodate modern web apps. Also remove
dependency on docker/go-units.

Provide typed accessor methods `HeadersBytes()`,
`HeadersString()`, `BodyBytes()`, `BodyString()`,
and `FullResponseString()` for safe read-only
access. These prevent callers from inadvertently
retaining pooled buffers beyond their lifecycle.

The `FullResponse()` method now returns a buffer
that must be explicitly managed by the caller.
This is a breaking change but necessary to support
lazy generation semantics.

Testing with nuclei workloads shows stable memory
usage under sustained load where previously OOM
would occur within minutes.

```bash
go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/
```

Signed-off-by: Dwi Siswanto <[email protected]>
@dwisiswant0 dwisiswant0 force-pushed the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch from 147a6f2 to d06a121 Compare November 15, 2025 07:32
@dwisiswant0 (Member, Author) commented

> Rebase after #701.

Yep! It's all green now. Ready to review. :)

@Mzack9999 (Member) left a comment

Nice addition! However, it still doesn't seem to fully fix the edge case of hosts with large responses.

[Screenshot: 2025-11-18 at 17:51:52]

@Mzack9999 Mzack9999 added the Type: Optimization Increasing the performance/optimization. Not an issue, just something to consider. label Nov 18, 2025
@Mzack9999 Mzack9999 merged commit 5314f45 into main Nov 18, 2025
7 checks passed
@Mzack9999 Mzack9999 deleted the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch November 18, 2025 13:55