
Conversation

@dwisiswant0 (Member) commented Nov 14, 2025

This PR fixes #699: a memory leak in ResponseChain that causes ~2.9 GB of memory retention during burst workloads, leading to OOM crashes in production Nuclei scans. The optimization reduces memory usage by 64-99% while improving performance by 38-57%.

Proposed Changes

fix(httputil): optimize `ResponseChain` memory usage

to prevent OOM.

`ResponseChain` currently suffers from unbounded
memory growth under high-concurrency workloads,
particularly when processing large responses
or compression bombs. This manifests as OOM kills
during nuclei scans with many concurrent requests.

The root cause is threefold:
`(*bytes.Buffer).ReadFrom()` over-allocates by
doubling capacity when size is unknown, the buffer
pool accumulates large buffers w/o bounds, and
each `ResponseChain` pre-allocates a
`fullResponse` buffer, even when unused.

Introduce `limitedBuffer` wrapper to constrain
buffer growth. This wrapper uses 32KB chunks and
caps total capacity at
`*ResponseChain.maxBodyRead`, preventing
the 2x over-allocation behavior of
`(*bytes.Buffer).ReadFrom()`. Reading now grows
incrementally rather than speculatively.
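To make the growth strategy concrete, here is a minimal sketch of a capped, chunk-wise `ReadFrom` (illustrative only; the actual `limitedBuffer` in this PR may differ, and the `chunkSize`/`maxSize` names are assumed from the description above):

```go
package httputil

import "io"

const chunkSize = 32 * 1024 // 32KB growth step, per the description above

// limitedBuffer grows its backing slice one chunk at a time and never
// lets total capacity exceed maxSize, avoiding the speculative 2x
// growth of (*bytes.Buffer).ReadFrom when the input size is unknown.
type limitedBuffer struct {
	data    []byte
	maxSize int // hard cap, e.g. the ResponseChain's maxBodyRead
}

// ReadFrom reads until EOF or until maxSize bytes are buffered.
func (lb *limitedBuffer) ReadFrom(r io.Reader) (int64, error) {
	var read int64
	for len(lb.data) < lb.maxSize {
		step := chunkSize
		if remaining := lb.maxSize - len(lb.data); step > remaining {
			step = remaining
		}
		// Grow by exactly one chunk only when spare capacity runs out.
		if cap(lb.data)-len(lb.data) < step {
			grown := make([]byte, len(lb.data), len(lb.data)+step)
			copy(grown, lb.data)
			lb.data = grown
		}
		start := len(lb.data)
		n, err := r.Read(lb.data[start : start+step])
		lb.data = lb.data[:start+n]
		read += int64(n)
		if err == io.EOF {
			return read, nil
		}
		if err != nil {
			return read, err
		}
	}
	return read, nil
}

// Bytes exposes the buffered data for read-only use.
func (lb *limitedBuffer) Bytes() []byte { return lb.data }
```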

Implement semaphore-gated pooling for large
buffers. Buffers under 512KB are pooled freely as
most HTML responses fall in this range. Buffers at
or above 512KB are limited to 20 pooled instances
via semaphore. When the limit is reached, excess
large buffers are discarded and reclaimed by GC.
This prevents pool pollution from transient large
responses while still enabling reuse during burst
periods.
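As an illustration of the gating scheme (a sketch under the thresholds stated above; `bufferPool`, `largeBufferSlots`, `getBuffer`, and `putBuffer` are illustrative names, not necessarily the PR's identifiers):

```go
package httputil

import (
	"bytes"
	"sync"
)

const largeBufferThreshold = 512 * 1024 // 512KB split between small and large buffers

var (
	bufferPool = sync.Pool{
		New: func() any { return new(bytes.Buffer) },
	}
	// Counting semaphore: at most 20 large buffers may sit in the pool.
	largeBufferSlots = make(chan struct{}, 20)
)

// putBuffer pools small buffers freely; large buffers are pooled only
// while a semaphore slot is free, otherwise they are dropped for GC.
func putBuffer(buf *bytes.Buffer) {
	buf.Reset()
	if buf.Cap() < largeBufferThreshold {
		bufferPool.Put(buf)
		return
	}
	select {
	case largeBufferSlots <- struct{}{}:
		bufferPool.Put(buf)
	default:
		// Limit reached: discard instead of polluting the pool.
	}
}

// getBuffer hands out a buffer and frees a slot if it was large.
// (Simplification: sync.Pool may also drop items during GC without
// going through this path, which a real implementation must account for.)
func getBuffer() *bytes.Buffer {
	buf := bufferPool.Get().(*bytes.Buffer)
	if buf.Cap() >= largeBufferThreshold {
		select {
		case <-largeBufferSlots:
		default:
		}
	}
	return buf
}
```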

Remove the pre-allocated `fullResponse` buffer
from `ResponseChain` struct. Generate it lazily
only when `FullResponse()` is called. This reduces
per-instance memory footprint by one-third and
eliminates waste when callers only need headers or
body separately.
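A minimal sketch of the lazy construction, assuming `headers` and `body` buffer fields on `ResponseChain` and the pooled-buffer helpers sketched above (field and helper names are assumptions, not the PR's exact code):

```go
// FullResponse assembles headers and body only when actually requested,
// instead of keeping a third pre-allocated buffer on every instance.
// The returned buffer is owned (and must be released) by the caller.
func (rc *ResponseChain) FullResponse() *bytes.Buffer {
	full := getBuffer()                         // pooled buffer, see sketch above
	full.Grow(rc.headers.Len() + rc.body.Len()) // single exact-size allocation
	full.Write(rc.headers.Bytes())
	full.Write(rc.body.Bytes())
	return full
}
```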

Add runtime configuration via `SetBufferSize()`
and `SetMaxLargeBuffers()` to allow tuning based
on deployment characteristics. Increase default
max body size from 4MB to 8MB to accommodate
modern web apps. Also remove dependency on
docker/go-units.

Provide typed accessor methods `HeadersBytes()`,
`HeadersString()`, `BodyBytes()`, `BodyString()`,
and `FullResponseString()` for safe read-only
access. These prevent callers from inadvertently
retaining pooled buffers beyond their lifecycle.

The `FullResponse()` method now returns a buffer
that must be explicitly managed by the caller.
This is a breaking change but necessary to support
lazy generation semantics.

Testing with nuclei workloads shows stable memory
usage under sustained load where previously OOM
would occur within minutes.

API Changes

```go
// Zero-copy string accessors
rc.HeadersString() string  // vs rc.Headers().String()
rc.BodyString() string     // vs rc.Body().String()
rc.FullResponseString() string

// Safe byte slice accessors
rc.HeadersBytes() []byte
rc.BodyBytes() []byte
rc.FullResponseBytes() []byte
```

Deprecated (but maintained for compatibility):

```go
rc.Headers() *bytes.Buffer       // Now returns pooled buffer
rc.Body() *bytes.Buffer          // Now returns pooled buffer
rc.FullResponse() *bytes.Buffer  // Now creates on-demand

// Old variables
DefaultBytesBufferAlloc
MaxBodyRead

// New constants
DefaultBufferSize
DefaultMaxBodySize
```

New config funcs:

```go
SetBufferSize(size int64)       // Configure buffer pool size
SetMaxLargeBuffers(max int)     // Configure large buffer limit
```
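A hedged usage example of the tuning knobs (the import alias and the numbers are illustrative; only the two setter signatures above are taken from this PR):

```go
package main

import (
	httputil "github.com/projectdiscovery/utils/http"
)

func main() {
	// Example tuning for a memory-constrained deployment.
	httputil.SetBufferSize(64 * 1024) // smaller pooled-buffer size
	httputil.SetMaxLargeBuffers(8)    // pool fewer large buffers
}
```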

No breaking changes to existing API signatures; all existing APIs are maintained for backward compatibility. The one behavioral change is that `FullResponse()` now returns a lazily built, caller-managed buffer, as noted above.

Migration Guide

No migration is needed; existing code works without changes. However, switching to the new safe accessors is highly recommended for better performance:

```go
// Before
body := rc.Body().String()      // 1 allocation

// After
body := rc.BodyString()          // 0 allocations (zero-copy)
```

Proof

patch:

```console
$ go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/
=== RUN   TestResponseChain_BurstWorkload
    respChain_test.go:596: Before =  Alloc: 8 MB, TotalAlloc: 9 MB, Sys: 21 MB, NumGC: 2
    respChain_test.go:636: After  =  Alloc: 1 MB, TotalAlloc: 7855 MB, Sys: 2137 MB, NumGC: 18
    respChain_test.go:638: Memory delta - Alloc: -7 MB, TotalAlloc: +7846 MB, Sys: +2116 MB
    respChain_test.go:653: Memory growth: -7 MB
--- PASS: TestResponseChain_BurstWorkload (2.24s)
=== RUN   TestResponseChain_SustainedConcurrency
    respChain_test.go:679: Before =  Alloc: 1 MB, Sys: 2137 MB
    respChain_test.go:775: After  =  Alloc: 1 MB, Sys: 2137 MB
    respChain_test.go:777: Peak during load     - Alloc: 1456 MB, Sys: 2137 MB
    respChain_test.go:779: Total requests: 1101 (550 req/s), Memory delta: +0 MB
--- PASS: TestResponseChain_SustainedConcurrency (2.11s)
=== RUN   TestResponseChain_MemoryPressure
    respChain_test.go:811: Before =  Alloc: 9 MB, Sys: 2137 MB, MaxLargeBuffers: 10
    respChain_test.go:844: After  =  Alloc: 153 MB, Sys: 2137 MB
    respChain_test.go:846: Handled 30 requests (3x buffer limit) = Memory delta: +144 MB
--- PASS: TestResponseChain_MemoryPressure (0.14s)
=== RUN   TestResponseChain_MixedWorkload
    respChain_test.go:891: Before =  Alloc: 1 MB
    respChain_test.go:953: After  =  Alloc: 45 MB
    respChain_test.go:954: Processed 600 requests with mixed sizes/compression = Memory delta: +44 MB
--- PASS: TestResponseChain_MixedWorkload (0.18s)
=== RUN   TestResponseChain_RapidCreateDestroy
    respChain_test.go:972: Before =  Alloc: 1 MB, NumGC: 63
    respChain_test.go:994: After  =  Alloc: 1 MB, NumGC: 73
    respChain_test.go:996: Processed 1000 iterations (10000 KB total) = Memory delta: +0 MB, GC cycles: 10
--- PASS: TestResponseChain_RapidCreateDestroy (0.03s)
=== RUN   TestResponseChain_ConcurrentReads
    respChain_test.go:1024: Before =  Alloc: 1 MB
    respChain_test.go:1057: After  =  Alloc: 1 MB
    respChain_test.go:1058: 100 concurrent readers = Memory delta: +0 MB (should be ~0 for read-only ops)
--- PASS: TestResponseChain_ConcurrentReads (0.00s)
=== RUN   TestResponseChain_BurstWithPoolExhaustion
    respChain_test.go:1080: Before =  Alloc: 1 MB, PoolSize: 10
    respChain_test.go:1125: After  =  Alloc: 3 MB, PoolSize: 10
    respChain_test.go:1127: Handled 50 requests with pool size 10 = Memory delta: +2 MB
--- PASS: TestResponseChain_BurstWithPoolExhaustion (0.01s)
PASS
ok  	github.com/projectdiscovery/utils/http	4.962s
```

main:

0001-test-httputil-patch-for-PoC.patch

```console
$ git checkout main
$ git cherry-pick 8de86ac
$ git apply 0001-test-httputil-patch-for-PoC.patch
$ go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/ 2>/dev/null
=== RUN   TestResponseChain_BurstWorkload
    respChain_test.go:32: Before =  Alloc: 8 MB, TotalAlloc: 9 MB, Sys: 16 MB, NumGC: 2
    respChain_test.go:72: After  =  Alloc: 2896 MB, TotalAlloc: 9889 MB, Sys: 5874 MB, NumGC: 13
    respChain_test.go:74: Memory delta - Alloc: +2888 MB, TotalAlloc: +9880 MB, Sys: +5857 MB
    respChain_test.go:89: Memory growth: 2888 MB
--- PASS: TestResponseChain_BurstWorkload (5.22s)
=== RUN   TestResponseChain_SustainedConcurrency
    respChain_test.go:115: Before =  Alloc: 1 MB, Sys: 5874 MB
    respChain_test.go:211: After  =  Alloc: 4 MB, Sys: 5874 MB
    respChain_test.go:213: Peak during load     - Alloc: 1229 MB, Sys: 5874 MB
    respChain_test.go:215: Total requests: 0 (0 req/s), Memory delta: +3 MB
--- PASS: TestResponseChain_SustainedConcurrency (2.01s)
=== RUN   TestResponseChain_MemoryPressure
    respChain_test.go:240: Before =  Alloc: 8 MB, Sys: 5874 MB, MaxLargeBuffers: 10
    respChain_test.go:273: After  =  Alloc: 407 MB, Sys: 5874 MB
    respChain_test.go:275: Handled 30 requests (3x buffer limit) = Memory delta: +398 MB
--- PASS: TestResponseChain_MemoryPressure (0.20s)
=== RUN   TestResponseChain_MixedWorkload
    respChain_test.go:320: Before =  Alloc: 16 MB
    respChain_test.go:382: After  =  Alloc: 68 MB
    respChain_test.go:383: Processed 600 requests with mixed sizes/compression = Memory delta: +51 MB
--- PASS: TestResponseChain_MixedWorkload (0.18s)
=== RUN   TestResponseChain_RapidCreateDestroy
    respChain_test.go:401: Before =  Alloc: 1 MB, NumGC: 44
    respChain_test.go:423: After  =  Alloc: 1 MB, NumGC: 45
    respChain_test.go:425: Processed 1000 iterations (10000 KB total) = Memory delta: +0 MB, GC cycles: 1
--- PASS: TestResponseChain_RapidCreateDestroy (0.01s)
=== RUN   TestResponseChain_ConcurrentReads
    respChain_test.go:453: Before =  Alloc: 1 MB
    respChain_test.go:486: After  =  Alloc: 1 MB
    respChain_test.go:487: 100 concurrent readers = Memory delta: +0 MB (should be ~0 for read-only ops)
--- PASS: TestResponseChain_ConcurrentReads (0.00s)
=== RUN   TestResponseChain_BurstWithPoolExhaustion
    respChain_test.go:509: Before =  Alloc: 1 MB, PoolSize: 10
    respChain_test.go:554: After  =  Alloc: 3 MB, PoolSize: 10
    respChain_test.go:556: Handled 50 requests with pool size 10 = Memory delta: +2 MB
--- PASS: TestResponseChain_BurstWithPoolExhaustion (0.01s)
PASS
ok  	github.com/projectdiscovery/utils/http	7.984s
```

comparison:

| Test | main | patch | Improvement |
| --- | --- | --- | --- |
| BurstWorkload (500 req × 8MB) | 2,896 MB | 1 MB | -99.97% |
| MemoryPressure (30 req) | 407 MB | 153 MB | -62% |
| MixedWorkload (600 req) | 68 MB | 45 MB | -34% |
| SustainedConcurrency | 4 MB | 1 MB | -75% |
| System Memory Peak | 5,874 MB | 2,137 MB | -64% |

- On BurstWorkload, the patch prevents the ~2.8 GB live-heap/Sys spike present on main. The patch still allocates heavily (TotalAlloc remains large) but does not retain those allocations; memory is reclaimed promptly.
- The patch reduces retained memory under MemoryPressure by roughly 250 MB (407 MB → 153 MB).

@dwisiswant0 dwisiswant0 marked this pull request as draft November 14, 2025 13:32
@dwisiswant0 (Member, Author) commented

Wait.

@dwisiswant0 dwisiswant0 force-pushed the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch from 606f237 to 147a6f2 Compare November 14, 2025 20:42
@dwisiswant0 dwisiswant0 marked this pull request as ready for review November 14, 2025 20:54
@dwisiswant0 (Member, Author) commented

Rebase after #701.

to prevent OOM.

`ResponseChain` currently suffers from unbounded
memory growth under high-concurrency workloads,
particularly when processing large responses
or compression bombs. This manifests as OOM kills
during nuclei scans with many concurrent requests.

The root cause is threefold:
`(*bytes.Buffer).ReadFrom()` over-allocates by
doubling capacity when size is unknown, the buffer
pool accumulates large buffers w/o bounds, and
each `ResponseChain` pre-allocates a
`fullResponse` buffer, even when unused.

Introduce `limitedBuffer` wrapper to constrain
buffer growth. This wrapper uses 32KB chunks and
caps total capacity at `maxBodyRead`, preventing
the 2x over-allocation behavior of
`(*bytes.Buffer).ReadFrom()`. Reading now grows
incrementally rather than speculatively.

Implement semaphore-gated pooling for large
buffers. Buffers under 512KB are pooled freely as
most HTML responses fall in this range. Buffers at
or above 512KB are limited to 20 pooled instances
via semaphore. When the limit is reached, excess
large buffers are discarded and reclaimed by GC.
This prevents pool pollution from transient large
responses while still enabling reuse during burst
periods.

Remove the pre-allocated `fullResponse` buffer
from `ResponseChain` struct. Generate it lazily
only when `FullResponse()` is called. This reduces
per-instance memory footprint by one-third and
eliminates waste when callers only need headers or
body separately.

Add runtime configuration via `SetMaxBodySize()`,
`SetBufferSize()`, and `SetMaxLargeBuffers()` to
allow tuning based on deployment characteristics.
Increase default max body size from 4MB to 8MB to
accommodate modern web apps. Also remove
dependency on docker/go-units.

Provide typed accessor methods `HeadersBytes()`,
`HeadersString()`, `BodyBytes()`, `BodyString()`,
and `FullResponseString()` for safe read-only
access. These prevent callers from inadvertently
retaining pooled buffers beyond their lifecycle.

The `FullResponse()` method now returns a buffer
that must be explicitly managed by the caller.
This is a breaking change but necessary to support
lazy generation semantics.

Testing with nuclei workloads shows stable memory
usage under sustained load where previously OOM
would occur within minutes.

```bash
go test -v -run "(BurstW(ithPoolExhaustion|orkload)|SustainedConcurrency|RapidCreateDestroy|ConcurrentReads|M(emoryPressure|ixedWorkload))$" -count 1 ./http/
```

Signed-off-by: Dwi Siswanto <[email protected]>
@dwisiswant0 dwisiswant0 force-pushed the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch from 147a6f2 to d06a121 Compare November 15, 2025 07:32
@dwisiswant0 (Member, Author) commented

> Rebase after #701.

Yep! It's all green now. Ready to review. :)

@Mzack9999 (Member) left a comment

Nice addition! However, it still doesn't seem to fully fix the edge case of hosts with large responses.

[Screenshot: 2025-11-18 at 17:51:52]

@Mzack9999 Mzack9999 added the Type: Optimization Increasing the performance/optimization. Not an issue, just something to consider. label Nov 18, 2025
@Mzack9999 Mzack9999 merged commit 5314f45 into main Nov 18, 2025
7 checks passed
@Mzack9999 Mzack9999 deleted the dwisiswant0/fix/httputil/optimize-ResponseChain-memory-usage-to-prevent-OOM branch November 18, 2025 13:55