Superlinear parsing complexity in specific Markdown constructs (via dos-fuzzer) #1076

@come-bruneteau

While auditing a project depending on pulldown-cmark, we ran the dos-fuzzer included in this repository against a generated corpus. We identified multiple input patterns that trigger superlinear (quadratic or worse) CPU growth, reproducible across independent runs.
All data below is unmodified fuzzer output. A payload under 5 KB is sufficient to produce a significant CPU spike on any service parsing untrusted Markdown.


Methodology

We ran dos-fuzzer over 600,000+ patterns (~9,000 patterns/sec). We report only the six highest-scoring results here.


Findings

Every high-scoring pattern shares a common structural property involving a specific Markdown construct combined with line boundary characters. This holds consistently across both runs. Specific patterns and timing data are being withheld until a fix is available.


Root cause hypothesis

We have not audited the source in depth, and you are better placed to confirm the cause, but the empirical pattern is unambiguous: every high-scoring case involves [^ at or near a line boundary within an unclosed structure. Our hypothesis is that the parser retries the footnote reference match from each newline position, producing O(n) retries that each cost O(n), for O(n^2) work overall. Unclosed HTML tags in the surrounding context may compound this by preventing early termination.
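
To make the hypothesized cost concrete, here is a toy model in Rust. It is not pulldown-cmark's code and does not call its API; `naive_scan_steps` and `unclosed_openers` are hypothetical names invented for this sketch. It only counts how many bytes a scanner examines if it searches forward for a closing ] from every [^ opener and starts from scratch each time.

```rust
/// Toy model only (NOT pulldown-cmark's implementation): counts the bytes
/// examined by a scanner that, at every `[^` opener, searches forward for a
/// closing `]` and starts over from scratch at the next opener.
fn naive_scan_steps(input: &[u8]) -> u64 {
    let mut steps = 0u64;
    let mut i = 0;
    while i + 1 < input.len() {
        if input[i] == b'[' && input[i + 1] == b'^' {
            // Search for the closing bracket. On failure we have walked to
            // the end of the input and kept nothing reusable.
            let mut j = i + 2;
            while j < input.len() {
                steps += 1;
                if input[j] == b']' {
                    break;
                }
                j += 1;
            }
        }
        i += 1;
    }
    steps
}

/// Hypothetical worst case for the toy scanner: `n` unclosed openers,
/// each on its own line.
fn unclosed_openers(n: usize) -> Vec<u8> {
    "[^\n".repeat(n).into_bytes()
}
```

With this model, `naive_scan_steps(&unclosed_openers(100))` is 14950 and `naive_scan_steps(&unclosed_openers(200))` is 59900: doubling the input roughly quadruples the work, which is the quadratic signature described above. Whether the real parser behaves this way is exactly what we are asking you to confirm.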


Suggested mitigations

  • Bail out of footnote reference parsing after a maximum number of backtrack steps
  • Or memoize failed match positions to avoid retrying known-failing positions
  • Short-term: cap nesting/repetition depth in the footnote reference parser
  • Consider adding dos-fuzzer to CI to catch regressions
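
As an illustration of the memoization option, here is a minimal Rust sketch on a toy scanner. It is an assumption-laden model, not a patch: `memoized_scan_steps` is a hypothetical name, and the scanner is a stand-in for whatever the footnote reference code actually does. The idea is that a failed search for ] starting at position p proves there is no ] at or after p, so every later search starting at or after p can be rejected without rescanning.

```rust
/// Toy model only (NOT pulldown-cmark's implementation): same forward search
/// for `]` after each `[^` opener, but failed searches are remembered.
fn memoized_scan_steps(input: &[u8]) -> u64 {
    let mut steps = 0u64;
    // Start position of a search that is known to have failed, if any.
    let mut no_close_from: Option<usize> = None;
    let mut i = 0;
    while i + 1 < input.len() {
        if input[i] == b'[' && input[i + 1] == b'^' {
            let start = i + 2;
            // A failed search from p proves there is no `]` at or after p,
            // so any search starting at or after p must fail as well.
            let known_failure = matches!(no_close_from, Some(p) if start >= p);
            if !known_failure {
                let mut j = start;
                let mut found = false;
                while j < input.len() {
                    steps += 1;
                    if input[j] == b']' {
                        found = true;
                        break;
                    }
                    j += 1;
                }
                if !found {
                    no_close_from = Some(start);
                }
            }
        }
        i += 1;
    }
    steps
}
```

On an input of n unclosed [^ openers, one per line, this toy scanner examines 3n - 2 bytes in total (298 for n = 100, 598 for n = 200), because only the first opener triggers a full scan. The sketch only caches failures; repeated successful searches would need separate handling, and the real parser may need a different cache key than a single position.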

Reproduction

Corpus files are available privately on request to maintainers.


Impact

Any application using pulldown-cmark to parse untrusted Markdown is potentially affected (wikis, comment systems, documentation platforms, CI log renderers). A single request under 5 KB is sufficient to spike a CPU core.


Notes

We are happy to test proposed patches against our fuzzer corpus. Let us know if you'd prefer to continue in a private channel.
