
feat: add OSS-Fuzz harnesses under onnx/fuzz/ #8052

Open
andife wants to merge 10 commits into onnx:main from andife:feat/oss-fuzz-harnesses

@andife (Member) commented Jun 3, 2026

What and why

ONNX is a widely used open standard for ML models. Parsing untrusted model files
(protobuf bytes or the text format) and running shape inference or version conversion
on them are natural attack surfaces. OSS-Fuzz runs continuous fuzz testing against
open-source projects to find crashes, hangs, and sanitizer violations before they reach
production.

This PR adds the upstream fuzz harnesses to the ONNX repo so they are:

  • version-controlled alongside the code they test
  • easy to update when APIs change
  • visible to contributors who want to understand or extend fuzzing coverage

The companion OSS-Fuzz infrastructure PR (google/oss-fuzz#15382)
will be updated to copy these files from $SRC/onnx/fuzz/ rather than bundling
them in the oss-fuzz repo itself.

Harnesses added (onnx/fuzz/)

| File | Entry point | Input path |
|---|---|---|
| fuzz_checker.py | checker.check_model | Raw bytes → protobuf parser |
| fuzz_model_loader.py | load_model_from_string + check_model | Raw bytes → protobuf parser |
| fuzz_parser.py | parser.parse_model | UTF-8 text (ONNX text format) |
| fuzz_shape_inference.py | shape_inference.infer_shapes | Raw bytes, or a structured model with subgraphs (If/Loop/Scan), selected by a toggle byte |
| fuzz_version_converter.py | version_converter.convert_version | Raw bytes → protobuf parser |
| make_seed_corpus.py | (seed generator) | Produces seed zips consumed by OSS-Fuzz |
| README.md | (documentation) | Usage, design rationale, how to add a harness |

Design decisions worth reviewing

The except Exception: return pattern is intentional in all harnesses. Expected errors
(protobuf DecodeError, ValidationError, InferenceError, ...) must be swallowed
so libFuzzer keeps running; real bugs still surface as native crashes or sanitizer
reports, which no Python exception handler can intercept.
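A minimal sketch of that pattern, assuming a harness shaped like fuzz_model_loader.py. The helper names make_test_one_input and _check_model_bytes are hypothetical, not taken from the PR; atheris.Setup, atheris.Fuzz, onnx.load_model_from_string, and onnx.checker.check_model are the only real library calls. Exceptions derived from Exception end the iteration quietly, while BaseException subclasses such as SystemExit and native faults still propagate.

```python
"""Sketch of the broad-except harness pattern (hypothetical helper names)."""
from __future__ import annotations

import sys
from typing import Callable


def make_test_one_input(target: Callable[[bytes], object]) -> Callable[[bytes], None]:
    """Wrap `target` so expected Python-level errors end the iteration quietly.

    Native crashes (e.g. SIGSEGV in the C++ extension) and sanitizer reports
    never reach this handler, so libFuzzer still records them as findings.
    """

    def TestOneInput(data: bytes) -> None:  # noqa: N802 - atheris naming convention
        try:
            target(data)
        except Exception:  # noqa: BLE001 - DecodeError, ValidationError, ...
            return

    return TestOneInput


def _check_model_bytes(data: bytes) -> None:
    """Real target: parse raw bytes as a ModelProto, then validate it."""
    import onnx  # imported lazily so this sketch loads without onnx installed

    model = onnx.load_model_from_string(data)
    onnx.checker.check_model(model)


def main() -> None:
    import atheris  # pip install atheris

    atheris.Setup(sys.argv, make_test_one_input(_check_model_bytes))
    atheris.Fuzz()
```

Calling main() from an if __name__ == "__main__": guard gives the standalone invocation used in the test plan. A production harness would usually import onnx inside a with atheris.instrument_imports(): block so Python-level coverage feedback reaches libFuzzer.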

TestOneInput naming follows the atheris/OSS-Fuzz convention for the fuzz entry
point. Ruff N802 is suppressed for onnx/fuzz/** in pyproject.toml.

fuzz_shape_inference.py toggle byte: a single trailing byte selects
strict_mode, check_type, and whether to use the raw-bytes path or a structured
model builder that exercises the recursive subgraph visitor (If/Loop/Scan). This
lets one harness cover both paths instead of maintaining two separate harnesses.
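A sketch of how a trailing toggle byte can be split off and decoded. The bit assignment here (bit 0 strict_mode, bit 1 check_type, bit 2 structured path) and the names Toggles and split_toggle_byte are assumptions for illustration; the PR does not spell out which bits the harness actually uses.

```python
"""Decode a trailing toggle byte (hypothetical bit layout)."""
from __future__ import annotations

from typing import NamedTuple


class Toggles(NamedTuple):
    payload: bytes      # everything except the final byte
    strict_mode: bool   # forwarded to infer_shapes(strict_mode=...)
    check_type: bool    # forwarded to infer_shapes(check_type=...)
    structured: bool    # True: build an If/Loop/Scan model; False: parse raw bytes


def split_toggle_byte(data: bytes) -> Toggles:
    """Use the last input byte as flags; the rest is the fuzz payload."""
    if not data:
        return Toggles(b"", strict_mode=False, check_type=False, structured=False)
    flags = data[-1]  # indexing bytes yields an int in [0, 255]
    return Toggles(
        payload=data[:-1],
        strict_mode=bool(flags & 0b001),
        check_type=bool(flags & 0b010),
        structured=bool(flags & 0b100),
    )
```

The two boolean flags map directly onto the real keyword arguments of onnx.shape_inference.infer_shapes(model, check_type=..., strict_mode=...), and structured decides whether payload is parsed as a ModelProto or fed to the model builder.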

sys.setrecursionlimit(1000) in fuzz_shape_inference.py guards a known
unbounded-recursion DoS in shape inference with deeply nested subgraphs, keeping
the fuzzer alive to find other bugs. A code comment notes it should be removed once
the upstream fix lands.

Changes to pyproject.toml

Adds a per-file-ignores block for onnx/fuzz/** suppressing rules that conflict
with the required atheris patterns: N802 (naming), BLE001 (broad except),
PLR2004 (magic numbers), S112/PERF203 (try-except-continue in loop).

Test plan

  • CI passes (lint, mypy, reuse)
  • Harnesses run standalone: python onnx/fuzz/fuzz_checker.py -runs=1000
  • Seed corpus generates cleanly: python onnx/fuzz/make_seed_corpus.py /tmp/vc.zip /tmp/p.zip
  • OSS-Fuzz build reproduces with updated build.sh pointing to $SRC/onnx/fuzz/

Adds five atheris-based Python fuzz targets and a seed-corpus generator
so the fuzzing harnesses live in the upstream repo alongside the code
they test, as requested in google/oss-fuzz#15382.

- fuzz_checker.py           -- checker.check_model (raw bytes)
- fuzz_model_loader.py      -- load_model_from_string + check_model
- fuzz_parser.py            -- parser.parse_model (text format)
- fuzz_shape_inference.py   -- infer_shapes, raw and structured paths
- fuzz_version_converter.py -- version_converter.convert_version
- make_seed_corpus.py       -- generates seed zips for parser and version_converter

Also adds per-file-ignores in pyproject.toml for the fuzz directory to
suppress ruff rules that conflict with the required atheris API
(TestOneInput naming, intentional broad exception catches, etc.).

Signed-off-by: Andreas Fehlner <[email protected]>
@andife andife requested a review from a team as a code owner June 3, 2026 04:35
@github-project-automation github-project-automation Bot moved this to In progress in PR Tracker Jun 3, 2026
codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.20%. Comparing base (dce5876) to head (b9a793a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8052      +/-   ##
==========================================
- Coverage   56.25%   56.20%   -0.05%     
==========================================
  Files         525      525              
  Lines       34347    34347              
  Branches     2979     2979              
==========================================
- Hits        19321    19304      -17     
- Misses      14189    14202      +13     
- Partials      837      841       +4     


Comment thread onnx/fuzz/make_seed_corpus.py Fixed
Comment thread onnx/fuzz/make_seed_corpus.py Fixed
Copilot AI (Contributor) left a comment

Pull request overview

Adds upstream OSS-Fuzz Python harnesses under onnx/fuzz/ (Atheris-based) to fuzz key ONNX entry points (checker, loader, parser, shape inference, version converter), plus a seed-corpus generator, and updates Ruff configuration to accommodate fuzz-harness patterns.

Changes:

  • Added five Atheris fuzz targets for ONNX model parsing/checking/inference/version conversion.
  • Added make_seed_corpus.py to generate zipped seed corpora for the parser and version-converter fuzzers.
  • Updated pyproject.toml Ruff per-file-ignores for onnx/fuzz/** to allow required fuzz-harness conventions (e.g., TestOneInput, broad exception handling).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file

| File | Description |
|---|---|
| pyproject.toml | Adds Ruff per-file ignores for the new fuzz harness directory. |
| onnx/fuzz/make_seed_corpus.py | Adds a seed-corpus zip generator for OSS-Fuzz fuzzers. |
| onnx/fuzz/fuzz_checker.py | Adds a fuzz target for checker.check_model on raw bytes. |
| onnx/fuzz/fuzz_model_loader.py | Adds a fuzz target for load_model_from_string + check_model. |
| onnx/fuzz/fuzz_parser.py | Adds a fuzz target for parser.parse_model (text format). |
| onnx/fuzz/fuzz_shape_inference.py | Adds a fuzz target for shape_inference.infer_shapes via raw and structured model paths. |
| onnx/fuzz/fuzz_version_converter.py | Adds a fuzz target for version_converter.convert_version across candidate target opsets. |

Comment thread onnx/fuzz/make_seed_corpus.py
andife added 2 commits June 3, 2026 07:27
- Use Mapping[str, bytes | str] (covariant) instead of dict to fix
  mypy arg-type errors when passing dict[str, bytes] or dict[str, str]
- Add argv count check with usage message to main() for clearer error
  when the script is invoked with wrong arguments

Signed-off-by: Andreas Fehlner <[email protected]>
Exit code 2 is the conventional Unix code for incorrect usage
(consistent with argparse and POSIX convention).

Signed-off-by: Andreas Fehlner <[email protected]>
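A sketch of the two make_seed_corpus.py changes described in these commits. write_seed_zip, USAGE, and the placeholder seed contents are hypothetical, not copied from the PR. Mapping is covariant in its value type, so both dict[str, bytes] and dict[str, str] are accepted where an invariant dict[str, bytes | str] parameter would make mypy reject them; zipfile.ZipFile.writestr accepts either str (stored as UTF-8) or bytes.

```python
"""Seed-zip writer with an argv check (hypothetical names and seeds)."""
from __future__ import annotations

import sys
import zipfile
from collections.abc import Mapping, Sequence

USAGE = "usage: make_seed_corpus.py VERSION_CONVERTER_ZIP PARSER_ZIP"


def write_seed_zip(path: str, seeds: Mapping[str, bytes | str]) -> None:
    """Write each seed as one zip member; str values are stored as UTF-8."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in seeds.items():
            zf.writestr(name, content)


def main(argv: Sequence[str]) -> int:
    if len(argv) != 3:
        print(USAGE, file=sys.stderr)
        return 2  # conventional "incorrect usage" status; argparse also exits with 2
    # Placeholder seeds for illustration; the real script builds its own models.
    vc_seeds: dict[str, bytes] = {"ir_version_8.onnx": b"\x08\x08"}  # protobuf field 1 = 8
    parser_seeds: dict[str, str] = {
        "relu.txt": (
            '<ir_version: 8, opset_import: ["" : 18]>\n'
            "agraph (float[N] X) => (float[N] Y) {\n"
            "  Y = Relu(X)\n"
            "}\n"
        ),
    }
    write_seed_zip(argv[1], vc_seeds)      # dict[str, bytes] accepted via Mapping
    write_seed_zip(argv[2], parser_seeds)  # dict[str, str] accepted via Mapping
    return 0
```

Wired up as raise SystemExit(main(sys.argv)), this reproduces the test-plan invocation python onnx/fuzz/make_seed_corpus.py /tmp/vc.zip /tmp/p.zip and exits with status 2 when the arguments are missing.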
andife added 3 commits June 3, 2026 17:47
Explains what OSS-Fuzz is, how each harness works, how to run them
locally, why broad exception catches and TestOneInput naming are
intentional, the toggle-byte design in fuzz_shape_inference, and
how to add a new harness.

Signed-off-by: Andreas Fehlner <[email protected]>
xadupre added a commit to xadupre/onnx-light that referenced this pull request Jun 4, 2026
* Initial plan

* Port OSS-Fuzz harnesses from onnx PR #8052 to onnx_light/fuzz/

Co-authored-by: xadupre <[email protected]>

* Add fuzz_optim_shape_inference for onnx_light.onnx_optim

Co-authored-by: xadupre <[email protected]>

* Add scheduled fuzz CI workflow

Co-authored-by: xadupre <[email protected]>

* Ignore missing atheris import in pyrefly

Co-authored-by: xadupre <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: xadupre <[email protected]>
@github-project-automation github-project-automation Bot moved this from In progress to Reviewer approved in PR Tracker Jun 5, 2026
@cyyever (Contributor) left a comment
Three blockers that affect whether the harnesses actually reach real logic:

1. sys.setrecursionlimit(1000) in fuzz_shape_inference.py is a no-op. CPython's default is already 1000. It also can't guard the "known DoS": infer_shapes wraps the C++ C.infer_shapes (onnx/shape_inference.py:58), so deep-subgraph recursion overflows the C stack (SIGSEGV), never raising Python RecursionError — the except RecursionError will never fire. Drop the line + handler, or set a real Python limit and reword the comment to say it only bounds the builder.

2. fuzz_parser.py seeds don't apply. The harness reads input via FuzzedDataProvider(data).ConsumeUnicode(...), but the seeds in make_seed_corpus.py are raw UTF-8 text. ConsumeUnicode is not a UTF-8 passthrough, so the seeds won't round-trip into valid parse_model inputs. Either decode directly (data.decode("utf-8", "surrogatepass")) or pre-encode the seeds for FuzzedDataProvider.

3. No seeds for fuzz_checker / fuzz_shape_inference. make_seed_corpus.py only emits version_converter and parser zips, so these two have to randomly hit a parseable ModelProto before reaching any logic — most iterations die at parse. Add valid serialized model seeds (e.g. from onnx/backend/test/data/).

Minor: opset = fdp.ConsumeIntInRange(7, 20) misses opset 21–27 (current is 27).
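A sketch of the two input-handling fixes the review asks for; text_from_bytes, opset_from_byte, MIN_OPSET, and MAX_OPSET are hypothetical names. The first helper decodes the input directly as UTF-8 so raw-text seeds reach parse_model unchanged; "surrogatepass" also admits encoded lone surrogates, while bytes that are not valid UTF-8 still raise UnicodeDecodeError and are skipped. The second is a pure-Python stand-in that covers the same inclusive range as fdp.ConsumeIntInRange(7, 27), though it does not reproduce atheris's exact byte-to-integer mapping.

```python
"""Input-decoding helpers for the parser and shape-inference harnesses (hypothetical)."""
from __future__ import annotations

MIN_OPSET = 7
MAX_OPSET = 27  # "current is 27" per the review


def text_from_bytes(data: bytes) -> str | None:
    """Decode fuzz input as UTF-8 so raw-text seeds round-trip unchanged."""
    try:
        return data.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        return None  # not valid UTF-8: an expected error, skip this input


def opset_from_byte(b: int, lo: int = MIN_OPSET, hi: int = MAX_OPSET) -> int:
    """Map a byte value onto the inclusive range [lo, hi]."""
    return lo + b % (hi - lo + 1)
```

Hard-coding the upper bound is how the original range fell behind; onnx.defs.onnx_opset_version() returns the latest opset known to the installed onnx and could serve as hi instead.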


Reviewed with Claude Code.

- fuzz_parser.py: replace FuzzedDataProvider.ConsumeUnicode with
  data.decode("utf-8", "surrogatepass") so UTF-8 seed files round-trip
  correctly into parse_model instead of being mangled by ConsumeUnicode's
  own byte-to-string encoding scheme, which is not a UTF-8 passthrough

- fuzz_shape_inference.py: remove sys.setrecursionlimit(1000) (CPython
  default, no-op) and the except RecursionError handler (dead code:
  infer_shapes delegates to C++ via nanobind, so deep subgraph recursion
  causes a C-stack overflow, not RecursionError); fix opset range from
  (7, 20) to (7, 27) to cover all released opsets

- make_seed_corpus.py: add a checker_seeds zip (third output argument)
  with six valid serialized ModelProtos so fuzz_checker reaches real
  validation logic instead of dying at protobuf parse on most iterations

Signed-off-by: Andreas Fehlner <[email protected]>
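The review also suggested seeding from onnx/backend/test/data/, which holds serialized model.onnx files for the backend tests (e.g. node/test_abs/model.onnx). A sketch of collecting those into a seed zip, assuming that directory layout; collect_model_seeds and its limit parameter are hypothetical, and the fix above builds six ModelProtos in code instead.

```python
"""Collect existing model.onnx files into an OSS-Fuzz seed zip (hypothetical helper)."""
from __future__ import annotations

import zipfile
from pathlib import Path


def collect_model_seeds(root: str | Path, out_zip: str | Path, limit: int = 200) -> int:
    """Zip up to `limit` model.onnx files found under `root`; return the count.

    Each member is named after its directory path relative to `root`
    (e.g. node/test_abs -> node_test_abs.onnx), so names stay unique.
    """
    root_path = Path(root)
    models = sorted(root_path.rglob("model.onnx"))[:limit]  # sorted: reproducible corpus
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for model_path in models:
            parts = model_path.parent.relative_to(root_path).parts
            name = "_".join(parts) or "model"  # model.onnx directly under root
            zf.writestr(f"{name}.onnx", model_path.read_bytes())
    return len(models)
```

In the OSS-Fuzz build, the resulting zip would be copied to $OUT/fuzz_checker_seed_corpus.zip, the <target>_seed_corpus.zip naming OSS-Fuzz uses to find a target's seeds.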
@andife andife enabled auto-merge (squash) June 5, 2026 06:51

Projects

Status: Reviewer approved


4 participants