feat: add OSS-Fuzz harnesses under onnx/fuzz/ #8052
Adds five atheris-based Python fuzz targets and a seed-corpus generator so the fuzzing harnesses live in the upstream repo alongside the code they test, as requested in google/oss-fuzz#15382.
- fuzz_checker.py -- checker.check_model (raw bytes)
- fuzz_model_loader.py -- load_model_from_string + check_model
- fuzz_parser.py -- parser.parse_model (text format)
- fuzz_shape_inference.py -- infer_shapes, raw and structured paths
- fuzz_version_converter.py -- version_converter.convert_version
- make_seed_corpus.py -- generates seed zips for parser and version_converter
Also adds per-file-ignores in pyproject.toml for the fuzz directory to suppress ruff rules that conflict with the required atheris API (TestOneInput naming, intentional broad exception catches, etc.).
Signed-off-by: Andreas Fehlner <[email protected]>
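The harness shape described in this commit (a TestOneInput callback with an intentionally broad exception catch) can be sketched roughly as below. This is an illustration, not the code in the PR: make_test_one_input and main are hypothetical helpers, and the target is passed in as a parameter so the sketch runs without onnx installed. Only atheris.Setup and atheris.Fuzz are real atheris calls, and they are imported lazily inside main so the rest of the block works without atheris.

```python
from __future__ import annotations

from typing import Callable


def make_test_one_input(target: Callable[[bytes], object]) -> Callable[[bytes], None]:
    """Wrap a target so Python-level exceptions end the iteration quietly."""

    def TestOneInput(data: bytes) -> None:  # name follows the atheris convention
        try:
            target(data)
        except Exception:
            # Rejecting malformed input is expected behaviour, not a finding.
            return

    return TestOneInput


def main(test_one_input: Callable[[bytes], None]) -> None:
    """Hand the callback to atheris (requires the atheris package)."""
    import sys

    import atheris

    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```

In a real harness the target would be an ONNX entry point such as checker.check_model, and main would be called from the script's entry point. Exceptions that derive only from BaseException (for example KeyboardInterrupt) still propagate through the wrapper.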
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff
| Metric | main | #8052 | +/- |
|---|---|---|---|
| Coverage | 56.25% | 56.20% | -0.05% |
| Files | 525 | 525 | |
| Lines | 34347 | 34347 | |
| Branches | 2979 | 2979 | |
| Hits | 19321 | 19304 | -17 |
| Misses | 14189 | 14202 | +13 |
| Partials | 837 | 841 | +4 |
Pull request overview
Adds upstream OSS-Fuzz Python harnesses under onnx/fuzz/ (Atheris-based) to fuzz key ONNX entry points (checker, loader, parser, shape inference, version converter), plus a seed-corpus generator, and updates Ruff configuration to accommodate fuzz-harness patterns.
Changes:
- Added five Atheris fuzz targets for ONNX model parsing/checking/inference/version conversion.
- Added make_seed_corpus.py to generate zipped seed corpora for the parser and version-converter fuzzers.
- Updated pyproject.toml Ruff per-file-ignores for onnx/fuzz/** to allow required fuzz-harness conventions (e.g., TestOneInput, broad exception handling).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pyproject.toml | Adds Ruff per-file ignores for the new fuzz harness directory. |
| onnx/fuzz/make_seed_corpus.py | Adds a seed-corpus zip generator for OSS-Fuzz fuzzers. |
| onnx/fuzz/fuzz_checker.py | Adds a fuzz target for checker.check_model on raw bytes. |
| onnx/fuzz/fuzz_model_loader.py | Adds a fuzz target for load_model_from_string + check_model. |
| onnx/fuzz/fuzz_parser.py | Adds a fuzz target for parser.parse_model (text format). |
| onnx/fuzz/fuzz_shape_inference.py | Adds a fuzz target for shape_inference.infer_shapes via raw and structured model paths. |
| onnx/fuzz/fuzz_version_converter.py | Adds a fuzz target for version_converter.convert_version across candidate target opsets. |
- Use Mapping[str, bytes | str] (covariant) instead of dict to fix mypy arg-type errors when passing dict[str, bytes] or dict[str, str]
- Add argv count check with usage message to main() for clearer error when the script is invoked with wrong arguments
Signed-off-by: Andreas Fehlner <[email protected]>
Exit code 2 is the conventional Unix code for incorrect usage (consistent with argparse and POSIX convention). Signed-off-by: Andreas Fehlner <[email protected]>
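The two commits above (a Mapping-typed seed writer, and an argument-count check that exits with status 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of make_seed_corpus.py: write_seed_zip is a hypothetical name, the two-output-path command line is assumed from the test plan, and the seed payloads are placeholders rather than valid ONNX models.

```python
from __future__ import annotations

import sys
import zipfile
from collections.abc import Mapping


def write_seed_zip(path: str, seeds: Mapping[str, bytes | str]) -> None:
    """Store each seed as one archive member; str payloads are written as UTF-8."""
    with zipfile.ZipFile(path, "w") as archive:
        for name, payload in seeds.items():
            archive.writestr(name, payload)


def main(argv: list[str]) -> int:
    # Assumed CLI: two output zip paths. The real script's arguments may differ.
    if len(argv) != 3:
        print(f"usage: {argv[0]} <version_converter.zip> <parser.zip>", file=sys.stderr)
        return 2  # conventional exit status for incorrect usage
    # Placeholder payloads; real seeds would be valid models.
    binary_seeds: dict[str, bytes] = {"seed_0": b"placeholder binary seed"}
    text_seeds: dict[str, str] = {"seed_0": "placeholder text seed"}
    write_seed_zip(argv[1], binary_seeds)
    write_seed_zip(argv[2], text_seeds)
    return 0
```

Because Mapping is covariant in its value type, both dict[str, bytes] and dict[str, str] are accepted where Mapping[str, bytes | str] is expected, whereas a dict[str, bytes | str] parameter would reject them under mypy. A script would typically finish with sys.exit(main(sys.argv)).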
Explains what OSS-Fuzz is, how each harness works, how to run them locally, why broad exception catches and TestOneInput naming are intentional, the toggle-byte design in fuzz_shape_inference, and how to add a new harness. Signed-off-by: Andreas Fehlner <[email protected]>
* Initial plan
* Port OSS-Fuzz harnesses from onnx PR #8052 to onnx_light/fuzz/
  Co-authored-by: xadupre <[email protected]>
* Add fuzz_optim_shape_inference for onnx_light.onnx_optim
  Co-authored-by: xadupre <[email protected]>
* Add scheduled fuzz CI workflow
  Co-authored-by: xadupre <[email protected]>
* Ignore missing atheris import in pyrefly
  Co-authored-by: xadupre <[email protected]>
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: xadupre <[email protected]>
Signed-off-by: Andreas Fehlner <[email protected]>
Three blockers that affect whether the harnesses actually reach real logic:
1. sys.setrecursionlimit(1000) in fuzz_shape_inference.py is a no-op. CPython's default is already 1000. It also can't guard the "known DoS": infer_shapes wraps the C++ C.infer_shapes (onnx/shape_inference.py:58), so deep-subgraph recursion overflows the C stack (SIGSEGV), never raising Python RecursionError — the except RecursionError will never fire. Drop the line + handler, or set a real Python limit and reword the comment to say it only bounds the builder.
2. fuzz_parser.py seeds don't apply. The harness reads input via FuzzedDataProvider(data).ConsumeUnicode(...), but the seeds in make_seed_corpus.py are raw UTF-8 text. ConsumeUnicode is not a UTF-8 passthrough, so the seeds won't round-trip into valid parse_model inputs. Either decode directly (data.decode("utf-8", "surrogatepass")) or pre-encode the seeds for FuzzedDataProvider.
3. No seeds for fuzz_checker / fuzz_shape_inference. make_seed_corpus.py only emits version_converter and parser zips, so these two have to randomly hit a parseable ModelProto before reaching any logic — most iterations die at parse. Add valid serialized model seeds (e.g. from onnx/backend/test/data/).
Minor: opset = fdp.ConsumeIntInRange(7, 20) misses opset 21–27 (current is 27).
Reviewed with Claude Code.
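Blocker 1 rests on the point that the interpreter's recursion limit only bounds Python-level calls. The sketch below illustrates that half of the argument with a hypothetical helper, python_depth_before_error, which recurses in pure Python under a temporary limit and reports how deep it got. It does not try to reproduce the native stack overflow, since that would crash the process.

```python
import sys


def python_depth_before_error(limit: int) -> int:
    """Recurse in pure Python under `limit`; return the depth reached before RecursionError."""
    depth = 0

    def descend() -> None:
        nonlocal depth
        depth += 1
        descend()

    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        descend()
    except RecursionError:
        # Raised by the interpreter for Python frames only.
        pass
    finally:
        sys.setrecursionlimit(previous)
    return depth
```

The returned depth is always below the limit because frames already on the stack count toward it. Recursion that happens inside a native extension call is not counted against this limit, which is why a handler for RecursionError cannot catch a C-stack overflow.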
- fuzz_parser.py: replace FuzzedDataProvider.ConsumeUnicode with
data.decode("utf-8", "surrogatepass") so UTF-8 seed files round-trip
correctly into parse_model instead of being mangled by the bytemark
encoding ConsumeUnicode expects
- fuzz_shape_inference.py: remove sys.setrecursionlimit(1000) (CPython
default, no-op) and the except RecursionError handler (dead code:
infer_shapes delegates to C++ via nanobind, so deep subgraph recursion
causes a C-stack overflow, not RecursionError); fix opset range from
(7, 20) to (7, 27) to cover all released opsets
- make_seed_corpus.py: add a checker_seeds zip (third output argument)
with six valid serialized ModelProtos so fuzz_checker reaches real
validation logic instead of dying at protobuf parse on most iterations
Signed-off-by: Andreas Fehlner <[email protected]>
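The fuzz_parser.py change in this commit can be sketched as follows. decode_fuzz_input is a hypothetical helper name, and returning None for undecodable bytes is an assumption about how a harness might skip such inputs. The point it shows is that "surrogatepass" lets encoded lone surrogates through but still raises on bytes that are not UTF-8 at all.

```python
from __future__ import annotations


def decode_fuzz_input(data: bytes) -> str | None:
    """Decode fuzzer bytes as UTF-8, allowing lone surrogates; None if undecodable."""
    try:
        return data.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        # Not UTF-8 even with surrogates allowed (for example a stray 0xFF byte).
        return None
```

With this approach a UTF-8 seed file reaches the parser unchanged, so corpus entries written as plain text are usable as-is.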
What and why
ONNX is a widely-used open standard for ML models. Parsing untrusted model files
(protobuf bytes, text format) and running inference on them are natural attack surfaces.
OSS-Fuzz runs continuous fuzz testing against
open-source projects to find crashes, hangs, and sanitizer violations before they reach
production.
This PR adds the upstream fuzz harnesses to the ONNX repo so they are:
The companion OSS-Fuzz infrastructure PR (google/oss-fuzz#15382)
will be updated to copy these files from
$SRC/onnx/fuzz/ rather than bundling them in the oss-fuzz repo itself.
Harnesses added (onnx/fuzz/)
| File | Target |
|---|---|
| fuzz_checker.py | checker.check_model |
| fuzz_model_loader.py | load_model_from_string + check_model |
| fuzz_parser.py | parser.parse_model |
| fuzz_shape_inference.py | shape_inference.infer_shapes |
| fuzz_version_converter.py | version_converter.convert_version |
| make_seed_corpus.py | |
| README.md | |
Design decisions worth reviewing
- except Exception: return: intentional in all harnesses. Expected errors (protobuf DecodeError, ValidationError, InferenceError, ...) must be swallowed so libFuzzer can keep running. Real bugs surface as crashes or sanitizer reports.
- TestOneInput naming: required by the atheris API. Ruff N802 is suppressed for onnx/fuzz/** in pyproject.toml.
- fuzz_shape_inference.py toggle byte: a single trailing byte selects strict_mode, check_type, and whether to use the raw-bytes path or a structured model builder that exercises the recursive subgraph visitor (If/Loop/Scan). This lets one harness cover both paths without forking.
- sys.setrecursionlimit(1000) in fuzz_shape_inference.py: guards a known unbounded-recursion DoS in shape inference with deeply-nested subgraphs, keeping the fuzzer alive to find other bugs. Comment notes it should be removed once the upstream fix lands.
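The toggle-byte design described above can be sketched as follows. The names (Toggles, split_toggle) and the bit assignments are assumptions made for illustration; the description only says that one trailing byte selects strict_mode, check_type, and the raw-bytes versus structured path.

```python
from __future__ import annotations

from typing import NamedTuple


class Toggles(NamedTuple):
    strict_mode: bool
    check_type: bool
    structured: bool  # True: build a structured model; False: use the raw bytes


def split_toggle(data: bytes) -> tuple[bytes, Toggles] | None:
    """Peel the last byte off the input and read three flags from its low bits."""
    if not data:
        return None  # nothing to split; a harness would simply return
    payload, flags = data[:-1], data[-1]
    # Hypothetical bit layout: bit 0 strict_mode, bit 1 check_type, bit 2 path choice.
    toggles = Toggles(bool(flags & 1), bool(flags & 2), bool(flags & 4))
    return payload, toggles
```

Taking the byte from the end rather than the front leaves the start of the input untouched, so a seed that is a valid serialized model stays valid after an extra byte is appended.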
Changes to pyproject.toml
Adds a per-file-ignores block for onnx/fuzz/** suppressing rules that conflict with the required atheris patterns: N802 (naming), BLE001 (broad except), PLR2004 (magic numbers), S112/PERF203 (try-except-continue in loop).
Test plan
- python onnx/fuzz/fuzz_checker.py -runs=1000
- python onnx/fuzz/make_seed_corpus.py /tmp/vc.zip /tmp/p.zip
- build.sh pointing to $SRC/onnx/fuzz/