andife · 2026-06-03T04:35:47Z

What and why

ONNX is a widely-used open standard for ML models. Parsing untrusted model files
(protobuf bytes, text format) and running inference on them are natural attack surfaces.
OSS-Fuzz runs continuous fuzz testing against
open-source projects to find crashes, hangs, and sanitizer violations before they reach
production.

This PR adds the upstream fuzz harnesses to the ONNX repo so they are:

- version-controlled alongside the code they test
- easy to update when APIs change
- visible to contributors who want to understand or extend fuzzing coverage

The companion OSS-Fuzz infrastructure PR (google/oss-fuzz#15382)
will be updated to copy these files from $SRC/onnx/fuzz/ rather than bundling
them in the oss-fuzz repo itself.

Harnesses added (`onnx/fuzz/`)

| File | Entry point | Input path |
| --- | --- | --- |
| `fuzz_checker.py` | `checker.check_model` | Raw bytes → protobuf parser |
| `fuzz_model_loader.py` | `load_model_from_string` + `check_model` | Raw bytes → protobuf parser |
| `fuzz_parser.py` | `parser.parse_model` | UTF-8 text (ONNX text format) |
| `fuzz_shape_inference.py` | `shape_inference.infer_shapes` | Raw bytes and structured model with subgraphs (If/Loop/Scan), selected by a toggle byte |
| `fuzz_version_converter.py` | `version_converter.convert_version` | Raw bytes → protobuf parser |
| `make_seed_corpus.py` | (seed generator) | Produces seed zips consumed by OSS-Fuzz |
| `README.md` | n/a | Usage, design rationale, how to add a harness |

Design decisions worth reviewing

`except Exception: return`: intentional in all harnesses. Expected errors
(protobuf `DecodeError`, `ValidationError`, `InferenceError`, ...) must be swallowed
so libFuzzer can keep running. Real bugs surface as crashes or sanitizer reports.
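
The swallow-and-return pattern can be sketched without atheris installed. The helper `make_harness` and both stand-in targets are hypothetical names for illustration, not code from this PR; a real harness would pass `TestOneInput` to `atheris.Setup(sys.argv, TestOneInput)` and then call `atheris.Fuzz()`.

```python
from typing import Callable


def make_harness(target: Callable[[bytes], object]) -> Callable[[bytes], None]:
    """Wrap `target` in the swallow-and-return pattern (hypothetical helper)."""

    def TestOneInput(data: bytes) -> None:  # noqa: N802 (name required by atheris)
        try:
            target(data)
        except Exception:  # noqa: BLE001 (expected errors are not findings)
            return

    return TestOneInput


def rejecting_target(data: bytes) -> None:
    # Stand-in for an expected failure such as a protobuf DecodeError.
    raise ValueError("malformed input")


def aborting_target(data: bytes) -> None:
    # Stand-in for a failure that is not an Exception subclass.
    raise SystemExit(1)
```

Note that `except Exception` does not catch `BaseException` subclasses such as `SystemExit` or `KeyboardInterrupt`, and native crashes never pass through Python's exception machinery at all, which is why they still reach the fuzzer.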

`TestOneInput` naming: required by the atheris API. Ruff N802 is suppressed
for `onnx/fuzz/**` in `pyproject.toml`.

`fuzz_shape_inference.py` toggle byte: a single trailing byte selects
`strict_mode`, `check_type`, and whether to use the raw-bytes path or a structured
model builder that exercises the recursive subgraph visitor (If/Loop/Scan). This
lets one harness cover both paths without forking.
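
A minimal sketch of how a trailing toggle byte could be split off and decoded. The bit assignments and the names `Toggle` and `split_toggle` are assumptions for illustration; the PR text does not state which bit maps to which option.

```python
from typing import NamedTuple, Tuple


class Toggle(NamedTuple):
    strict_mode: bool
    check_type: bool
    structured: bool  # True selects the structured model builder path


def split_toggle(data: bytes) -> Tuple[bytes, Toggle]:
    """Split fuzz input into (payload, options) using the last byte."""
    if not data:
        return b"", Toggle(False, False, False)
    payload, toggle = data[:-1], data[-1]
    return payload, Toggle(
        strict_mode=bool(toggle & 0x01),  # assumed: bit 0
        check_type=bool(toggle & 0x02),   # assumed: bit 1
        structured=bool(toggle & 0x04),   # assumed: bit 2
    )
```

Taking the options from the end of the input leaves the leading bytes untouched, so a serialized model used as a seed still starts with its own first byte.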

`sys.setrecursionlimit(1000)` in `fuzz_shape_inference.py`: guards a known
unbounded-recursion DoS in shape inference with deeply-nested subgraphs, keeping
the fuzzer alive to find other bugs. A code comment notes that it should be removed
once the upstream fix lands.
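
For context, a small sketch of what Python's recursion limit bounds: recursion through Python frames. The functions `depth_probe` and `bounded_call` are hypothetical; whether the limit also covers recursion inside native code is the question taken up later in review.

```python
import sys
from typing import Optional


def depth_probe(n: int) -> int:
    """Recurse n levels deep using only Python frames."""
    return 0 if n == 0 else 1 + depth_probe(n - 1)


def bounded_call(n: int) -> Optional[int]:
    """Return the depth reached, or None if the interpreter limit was hit."""
    try:
        return depth_probe(n)
    except RecursionError:
        return None


# Requesting more frames than the interpreter allows is reported as None.
too_deep = sys.getrecursionlimit() + 100
```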

Changes to `pyproject.toml`

Adds a per-file-ignores block for onnx/fuzz/** suppressing rules that conflict
with the required atheris patterns: N802 (naming), BLE001 (broad except),
PLR2004 (magic numbers), S112/PERF203 (try-except-continue in loop).

Test plan

- CI passes (lint, mypy, reuse)
- Harnesses run standalone: `python onnx/fuzz/fuzz_checker.py -runs=1000`
- Seed corpus generates cleanly: `python onnx/fuzz/make_seed_corpus.py /tmp/vc.zip /tmp/p.zip`
- OSS-Fuzz build reproduces with updated `build.sh` pointing to `$SRC/onnx/fuzz/`
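
As a rough illustration of the seed-corpus step, the sketch below packs named seeds into an in-memory zip. The function name `build_seed_zip`, the entry naming scheme, and the sample seeds are assumptions, not the contents of `make_seed_corpus.py`.

```python
import hashlib
import io
import zipfile
from typing import Mapping, Union


def build_seed_zip(seeds: Mapping[str, Union[bytes, str]]) -> bytes:
    """Return zip bytes holding one entry per seed (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for label, seed in sorted(seeds.items()):
            # Text seeds are stored as UTF-8; byte seeds are stored unchanged.
            raw = seed.encode("utf-8") if isinstance(seed, str) else seed
            digest = hashlib.sha256(raw).hexdigest()[:12]
            archive.writestr(f"{label}_{digest}", raw)
    return buffer.getvalue()


# Placeholder seeds: one text seed and one raw-bytes seed.
example_zip = build_seed_zip({"text": "hello", "raw": b"\x08\x07"})
```

OSS-Fuzz picks up a corpus archive named `<fuzz_target>_seed_corpus.zip` placed next to the built target, so a generator like this would typically write one archive per harness.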

Adds five atheris-based Python fuzz targets and a seed-corpus generator so the fuzzing harnesses live in the upstream repo alongside the code they test, as requested in google/oss-fuzz#15382.

- fuzz_checker.py -- checker.check_model (raw bytes)
- fuzz_model_loader.py -- load_model_from_string + check_model
- fuzz_parser.py -- parser.parse_model (text format)
- fuzz_shape_inference.py -- infer_shapes, raw and structured paths
- fuzz_version_converter.py -- version_converter.convert_version
- make_seed_corpus.py -- generates seed zips for parser and version_converter

Also adds per-file-ignores in pyproject.toml for the fuzz directory to suppress ruff rules that conflict with the required atheris API (TestOneInput naming, intentional broad exception catches, etc.).

Signed-off-by: Andreas Fehlner <[email protected]>

codecov · 2026-06-03T04:36:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.20%. Comparing base (dce5876) to head (b9a793a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8052      +/-   ##
==========================================
- Coverage   56.25%   56.20%   -0.05%     
==========================================
  Files         525      525              
  Lines       34347    34347              
  Branches     2979     2979              
==========================================
- Hits        19321    19304      -17     
- Misses      14189    14202      +13     
- Partials      837      841       +4


Copilot

Pull request overview

Adds upstream OSS-Fuzz Python harnesses under onnx/fuzz/ (Atheris-based) to fuzz key ONNX entry points (checker, loader, parser, shape inference, version converter), plus a seed-corpus generator, and updates Ruff configuration to accommodate fuzz-harness patterns.

Changes:

Added five Atheris fuzz targets for ONNX model parsing/checking/inference/version conversion.
Added make_seed_corpus.py to generate zipped seed corpora for the parser and version-converter fuzzers.
Updated pyproject.toml Ruff per-file-ignores for onnx/fuzz/** to allow required fuzz-harness conventions (e.g., TestOneInput, broad exception handling).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| pyproject.toml | Adds Ruff per-file ignores for the new fuzz harness directory. |
| onnx/fuzz/make_seed_corpus.py | Adds a seed-corpus zip generator for OSS-Fuzz fuzzers. |
| onnx/fuzz/fuzz_checker.py | Adds a fuzz target for `checker.check_model` on raw bytes. |
| onnx/fuzz/fuzz_model_loader.py | Adds a fuzz target for `load_model_from_string` + `check_model`. |
| onnx/fuzz/fuzz_parser.py | Adds a fuzz target for `parser.parse_model` (text format). |
| onnx/fuzz/fuzz_shape_inference.py | Adds a fuzz target for `shape_inference.infer_shapes` via raw and structured model paths. |
| onnx/fuzz/fuzz_version_converter.py | Adds a fuzz target for `version_converter.convert_version` across candidate target opsets. |

- Use Mapping[str, bytes | str] (covariant) instead of dict to fix mypy arg-type errors when passing dict[str, bytes] or dict[str, str]
- Add argv count check with usage message to main() for clearer error when the script is invoked with wrong arguments

Signed-off-by: Andreas Fehlner <[email protected]>

Exit code 2 is the conventional Unix code for incorrect usage (consistent with argparse and POSIX convention). Signed-off-by: Andreas Fehlner <[email protected]>
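
The argument check and exit code described in these commits might look roughly like the following. The two-output signature mirrors the command in the test plan; the usage string and the empty body are assumptions, not the script's actual text.

```python
import sys
from typing import List

USAGE_ERROR = 2  # conventional exit status for incorrect command-line usage


def main(argv: List[str]) -> int:
    """Validate the argument count before doing any work."""
    if len(argv) != 3:
        prog = argv[0] if argv else "make_seed_corpus.py"
        # Hypothetical usage text; the real message may differ.
        print(f"usage: {prog} <version_converter.zip> <parser.zip>", file=sys.stderr)
        return USAGE_ERROR
    # Seed generation would happen here; omitted in this sketch.
    return 0
```

A script would typically end with `sys.exit(main(sys.argv))` so the return value becomes the process exit status.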

Explains what OSS-Fuzz is, how each harness works, how to run them locally, why broad exception catches and TestOneInput naming are intentional, the toggle-byte design in fuzz_shape_inference, and how to add a new harness. Signed-off-by: Andreas Fehlner <[email protected]>

* Initial plan
* Port OSS-Fuzz harnesses from onnx PR #8052 to onnx_light/fuzz/
  Co-authored-by: xadupre <[email protected]>
* Add fuzz_optim_shape_inference for onnx_light.onnx_optim
  Co-authored-by: xadupre <[email protected]>
* Add scheduled fuzz CI workflow
  Co-authored-by: xadupre <[email protected]>
* Ignore missing atheris import in pyrefly
  Co-authored-by: xadupre <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: xadupre <[email protected]>

Signed-off-by: Andreas Fehlner <[email protected]>

cyyever

Three blockers that affect whether the harnesses actually reach real logic:

1. sys.setrecursionlimit(1000) in fuzz_shape_inference.py is a no-op. CPython's default is already 1000. It also can't guard the "known DoS": infer_shapes wraps the C++ C.infer_shapes (onnx/shape_inference.py:58), so deep-subgraph recursion overflows the C stack (SIGSEGV), never raising Python RecursionError — the except RecursionError will never fire. Drop the line + handler, or set a real Python limit and reword the comment to say it only bounds the builder.

2. fuzz_parser.py seeds don't apply. The harness reads input via FuzzedDataProvider(data).ConsumeUnicode(...), but the seeds in make_seed_corpus.py are raw UTF-8 text. ConsumeUnicode is not a UTF-8 passthrough, so the seeds won't round-trip into valid parse_model inputs. Either decode directly (data.decode("utf-8", "surrogatepass")) or pre-encode the seeds for FuzzedDataProvider.

3. No seeds for fuzz_checker / fuzz_shape_inference. make_seed_corpus.py only emits version_converter and parser zips, so these two have to randomly hit a parseable ModelProto before reaching any logic — most iterations die at parse. Add valid serialized model seeds (e.g. from onnx/backend/test/data/).

Minor: opset = fdp.ConsumeIntInRange(7, 20) misses opset 21–27 (current is 27).

Reviewed with Claude Code.
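
To illustrate the direct-decode option from point 2, here is a small sketch showing that plain UTF-8 seed text survives `bytes.decode` unchanged. The helper name `decode_fuzz_text` and the sample seed string are hypothetical, and the seed is a placeholder that has not been validated as ONNX text format.

```python
from typing import Optional


def decode_fuzz_text(data: bytes) -> Optional[str]:
    """Decode fuzz bytes as UTF-8 text, or return None if they are not UTF-8."""
    try:
        # "surrogatepass" additionally accepts encoded lone surrogates.
        return data.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        return None


# Placeholder seed text, written to disk as plain UTF-8 by a seed generator.
seed_text = "agraph (float[N] x) => (float[N] y) { y = Relu(x) }"
round_tripped = decode_fuzz_text(seed_text.encode("utf-8"))
```

Even with `surrogatepass`, bytes that are not well-formed UTF-8 still raise `UnicodeDecodeError`, so a harness using this approach needs to treat that error as an expected rejection.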

- fuzz_parser.py: replace FuzzedDataProvider.ConsumeUnicode with data.decode("utf-8", "surrogatepass") so UTF-8 seed files round-trip correctly into parse_model instead of being mangled by the bytemark encoding ConsumeUnicode expects
- fuzz_shape_inference.py: remove sys.setrecursionlimit(1000) (CPython default, no-op) and the except RecursionError handler (dead code: infer_shapes delegates to C++ via nanobind, so deep subgraph recursion causes a C-stack overflow, not RecursionError); fix opset range from (7, 20) to (7, 27) to cover all released opsets
- make_seed_corpus.py: add a checker_seeds zip (third output argument) with six valid serialized ModelProtos so fuzz_checker reaches real validation logic instead of dying at protobuf parse on most iterations

Signed-off-by: Andreas Fehlner <[email protected]>

andife requested a review from a team as a code owner June 3, 2026 04:35

github-project-automation Bot added this to PR Tracker Jun 3, 2026

github-project-automation Bot moved this to In progress in PR Tracker Jun 3, 2026

Merge branch 'main' into feat/oss-fuzz-harnesses

151f458

andife added the topic: security label Jun 3, 2026

andife requested a review from Copilot June 3, 2026 04:44

Copilot started reviewing on behalf of andife June 3, 2026 04:45

github-advanced-security AI found potential problems Jun 3, 2026

Comment thread onnx/fuzz/make_seed_corpus.py Fixed

Comment thread onnx/fuzz/make_seed_corpus.py Fixed

Copilot AI reviewed Jun 3, 2026

Comment thread onnx/fuzz/make_seed_corpus.py

andife added 2 commits June 3, 2026 07:27

fix: use exit code 2 for usage error in make_seed_corpus.py

4480de5

andife mentioned this pull request Jun 3, 2026

Track oss-fuzz integration for continuous fuzzing of ONNX #4902

Open

andife added 3 commits June 3, 2026 17:47

Merge branch 'main' into feat/oss-fuzz-harnesses

84d48ac

Merge branch 'main' into feat/oss-fuzz-harnesses

507be17

Copilot AI mentioned this pull request Jun 4, 2026

Port OSS-Fuzz harnesses from onnx/onnx#8052 to onnx_light/fuzz/ xadupre/onnx-light#1680

Merged

andife added 2 commits June 5, 2026 06:15

Merge branch 'main' into feat/oss-fuzz-harnesses

758fe9b

Update README.md

6cb2e96

Signed-off-by: Andreas Fehlner <[email protected]>

cyyever approved these changes Jun 5, 2026

github-project-automation Bot moved this from In progress to Reviewer approved in PR Tracker Jun 5, 2026

cyyever reviewed Jun 5, 2026

andife enabled auto-merge (squash) June 5, 2026 06:51

feat: add OSS-Fuzz harnesses under onnx/fuzz/ #8052

andife wants to merge 10 commits into onnx:main from andife:feat/oss-fuzz-harnesses


4 participants
