Visual ONNX model inspection, pruning, patching, quantization, and optimization toolkit. 15+ CLI commands for production model surgery.
- Model inspection — Detailed model info, graph visualization, node-level statistics
- Pruning — Remove nodes, inputs, outputs, and unused branches
- Strip — Remove training metadata, doc strings, and non-essential data
- Validation — Check model integrity, shape consistency, and runtime errors
- Analysis — FLOP counting, parameter counting, tensor shape analysis
- Diff — Compare two models and show structural changes
- Extract — Extract subgraphs by node name or pattern
- Simplify — Fold constants, fuse operations, remove identity nodes
- Rename — Batch rename nodes, inputs, and outputs
- Report — Generate comprehensive HTML analysis reports
- Quantization — FP16 half-precision and INT8 dynamic/static quantization
- Diff Report — Interactive HTML diff visualization between model versions
- JSON export — Full model metadata as JSON
pip install onnx-model-surgery
# Show model info
oms info model.onnx
# Print model graph
oms graph model.onnx
# Model statistics
oms stats model.onnx
# Validate model
oms validate model.onnx
# Count FLOPs
oms flops model.onnx
# Prune unused nodes
oms prune model.onnx --output pruned.onnx
# Diff two models
oms diff model-v1.onnx model-v2.onnx
# Strip training metadata
oms strip model.onnx --output stripped.onnx
# Extract subgraph
oms extract model.onnx --nodes "Conv_3,Relu_3,Conv_4" --output subgraph.onnx
# Rename nodes
oms rename model.onnx --map "Conv_3:conv3,Relu_3:relu3" --output renamed.onnx
# Generate report
oms report model.onnx --output report.html
# JSON export
oms json model.onnx --output model.json
# --- v0.3.0+ ---
# FP16 quantization (pure ONNX, no extra deps)
oms quantize model.onnx --mode fp16 --output model_fp16.onnx
# INT8 dynamic quantization (requires onnxruntime)
oms quantize model.onnx --mode int8 --output model_int8.onnx
# INT8 static quantization (requires calibration data)
oms quantize model.onnx --mode int8-static --calibrate inputs.npz --output model_int8.onnx
# Interactive HTML diff report
oms diff-report model-v1.onnx model-v2.onnx --output diff.html

Representative results on a ResNet-50 baseline (ImageNet, batch size 1, Intel Xeon CPU):
| Mode | Model Size | Size Reduction | Inference Latency | Top-1 Accuracy Drop |
|---|---|---|---|---|
| FP32 (baseline) | 97.8 MB | — | 28.4 ms | — |
| FP16 | 48.9 MB | -50% | 19.1 ms (-33%) | < 0.1% |
| INT8 Dynamic | 24.5 MB | -75% | 14.2 ms (-50%) | ~0.5% |
| INT8 Static | 24.5 MB | -75% | 11.8 ms (-58%) | ~0.3% |
Results are representative estimates for illustrative purposes. Actual gains vary by model architecture, hardware, and workload.
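The percentage columns follow directly from the absolute figures in the table. As a quick sanity check, this short Python sketch recomputes them from the table's own values. The variable and function names are illustrative and not part of the toolkit.

```python
# Recompute the relative columns of the table from its absolute values.
# The numbers are copied from the table above; nothing here is measured.
BASELINE_MB, BASELINE_MS = 97.8, 28.4

rows = {
    "FP16": (48.9, 19.1),
    "INT8 Dynamic": (24.5, 14.2),
    "INT8 Static": (24.5, 11.8),
}


def percent_change(new: float, base: float) -> int:
    """Relative change versus the baseline, rounded to a whole percent."""
    return round((new - base) / base * 100)


changes = {
    mode: (percent_change(mb, BASELINE_MB), percent_change(ms, BASELINE_MS))
    for mode, (mb, ms) in rows.items()
}

for mode, (size_pct, latency_pct) in changes.items():
    print(f"{mode}: size {size_pct}%, latency {latency_pct}%")
```

The same helper can be pointed at your own measurements to report results in the table's format.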
| Command | Description |
|---|---|
| oms info [model] | Display model metadata |
| oms graph [model] | Visual graph representation |
| oms stats [model] | Node-level statistics |
| oms prune [model] | Remove unused nodes |
| oms strip [model] | Remove training metadata |
| oms validate [model] | Model integrity check |
| oms json [model] | Export as JSON |
| oms flops [model] | FLOPs estimation |
| oms diff [a] [b] | Compare two models |
| oms extract [model] | Subgraph extraction |
| oms simplify [model] | Graph simplification |
| oms report [model] | HTML analysis report |
| oms rename [model] | Batch node renaming |
| oms quantize [model] | FP16 / INT8 quantization |
| oms diff-report [a] [b] | Interactive HTML diff report |
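This README does not state how `oms flops` counts operations, so the following is only a sketch of one common convention, not the toolkit's implementation. It assumes a convolution costs one multiply-accumulate (MAC) per kernel weight per output element and counts each MAC as two FLOPs. The `conv2d_flops` helper and its parameters are hypothetical names chosen for illustration.

```python
def conv2d_flops(c_in, c_out, kernel_h, kernel_w, out_h, out_w, groups=1):
    """Estimate FLOPs of a 2-D convolution, ignoring bias (hypothetical helper)."""
    macs_per_output = (c_in // groups) * kernel_h * kernel_w
    total_macs = macs_per_output * c_out * out_h * out_w
    return 2 * total_macs  # one multiply plus one add per MAC


# First layer of ResNet-50: 7x7 kernel, 3 -> 64 channels, 112x112 output map.
stem_flops = conv2d_flops(
    c_in=3, c_out=64, kernel_h=7, kernel_w=7, out_h=112, out_w=112
)
print(f"{stem_flops:,} FLOPs")  # 236,027,904 FLOPs
```

Tools differ on whether they report MACs or FLOPs, so check which convention is used before comparing numbers from different sources.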
FP16 conversion stores weights in half precision, which reduces model size by about 50% with minimal accuracy loss. It works on any ONNX model without external dependencies.
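The roughly 50% figure follows from storage width: a float32 value takes 4 bytes and a float16 value takes 2. The sketch below illustrates this with Python's standard `struct` module rather than the toolkit itself, and the sample weights are made up.

```python
import struct

# Hypothetical weights, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 1e-3, 250.0]

fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)  # 4 bytes per value
fp16_bytes = struct.pack(f"<{len(weights)}e", *weights)  # 2 bytes per value

# Decode the half-precision bytes to see the rounding error introduced.
restored = struct.unpack(f"<{len(weights)}e", fp16_bytes)
max_rel_error = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))

print(len(fp32_bytes), len(fp16_bytes))  # 20 10
print(f"max relative error: {max_rel_error:.1e}")
```

Float16 cannot represent magnitudes above about 65504, which is one reason to re-validate accuracy after conversion.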
# Basic FP16 conversion
oms quantize model.onnx --mode fp16 --output model_fp16.onnx
# Using the Python API
import onnx
from onnx_surgery.core import load_model
from onnx_surgery.tools.quantize import convert_to_fp16

model = load_model("model.onnx")
fp16_model = convert_to_fp16(model)
onnx.save(fp16_model, "model_fp16.onnx")

INT8 dynamic quantization converts weights to INT8 and computes activation ranges at runtime, so no calibration data is needed.
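As a rough picture of what 8-bit weight quantization means, the sketch below maps a list of float weights to signed 8-bit integers with a single symmetric scale, then maps them back. This is a generic textbook scheme assumed here for illustration. It is not taken from the toolkit or from onnxruntime, and the function names are hypothetical.

```python
def quantize_symmetric(values):
    """Map floats to the int8 range [-127, 127] using one scale per tensor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.423, -1.27, 0.051, 0.9, -0.337]  # made-up sample weights
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
worst = max(abs(a - w) for a, w in zip(approx, weights))

print(q)  # [42, -127, 5, 90, -34]
print(f"scale={scale:.4f}, worst absolute error={worst:.4f}")
```

Each weight then occupies 1 byte instead of 4, which is consistent with the roughly 75% size reduction quoted for INT8 above. The rounding error per weight is bounded by half the scale.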
# Requires onnxruntime
pip install "onnx-model-surgery[quant]"
oms quantize model.onnx --mode int8 --output model_int8.onnx

INT8 static quantization quantizes both weights and activations using representative calibration data. Highest compression ratio.
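Static quantization fixes activation ranges ahead of time instead of measuring them per input. One simple way to do that is to track the smallest and largest activation seen across calibration batches, then derive a scale and zero point for unsigned 8-bit storage. The sketch below shows this min/max approach as an assumption only: the toolkit's actual calibration method and the expected layout of the `.npz` file are not documented in this README, and all names are hypothetical.

```python
def calibrate_min_max(batches):
    """Track the overall activation range across calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    # Widen the range to include zero so that 0.0 maps to an exact integer.
    return min(lo, 0.0), max(hi, 0.0)


def affine_params(lo, hi, qmin=0, qmax=255):
    """Derive scale and zero point for asymmetric uint8 quantization."""
    span = hi - lo
    scale = span / (qmax - qmin) if span else 1.0
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


# Made-up activations from three calibration batches.
batches = [[-0.5, 0.2, 1.1], [0.0, 2.0, 0.7], [-1.0, 0.3, 1.5]]
lo, hi = calibrate_min_max(batches)
scale, zero_point = affine_params(lo, hi)


def quantize(x):
    """Quantize one activation with the precomputed (static) parameters."""
    return max(0, min(255, round(x / scale) + zero_point))


print(lo, hi, zero_point)  # -1.0 2.0 85
print(quantize(lo), quantize(0.0), quantize(hi))  # 0 85 255
```

Because the range is frozen after calibration, the calibration inputs should resemble real workloads. Activations outside the observed range are clipped.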
oms quantize model.onnx --mode int8-static \
    --calibrate calibration_data.npz \
    --output model_quant.onnx

Compare two ONNX models visually:
# Generate an interactive HTML diff report
oms diff-report model_v1.onnx model_v2.onnx --output diff.html
# Using the Python API
from pathlib import Path

from onnx_surgery.core import load_model
from onnx_surgery.tools.diff_report import generate_diff_html

a = load_model("v1.onnx")
b = load_model("v2.onnx")
html = generate_diff_html(a, b, title="v1 vs v2")
Path("diff.html").write_text(html, encoding="utf-8")

The report shows:
- Summary cards with node/input/output/parameter deltas
- Colour-coded added / removed / changed node tables
- Operator distribution bar charts
- Input / output shape changes
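To illustrate the kind of bookkeeping behind the added / removed / changed tables, here is a minimal sketch that compares two models represented as plain `{node_name: op_type}` dictionaries. The representation and the `diff_nodes` helper are assumptions made for this example, and `generate_diff_html` may work differently.

```python
from collections import Counter


def diff_nodes(old, new):
    """Classify nodes as added, removed, or changed by name and op type."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed


# Hypothetical node tables for two versions of a model.
v1 = {"Conv_3": "Conv", "Relu_3": "Relu", "Conv_4": "Conv", "Add_1": "Add"}
v2 = {"Conv_3": "Conv", "Relu_3": "LeakyRelu", "Conv_4": "Conv", "Gemm_1": "Gemm"}

added, removed, changed = diff_nodes(v1, v2)

# Change in how often each operator appears (the data behind a bar chart).
op_delta = Counter(v2.values())
op_delta.subtract(Counter(v1.values()))

print(added, removed, changed)  # ['Gemm_1'] ['Add_1'] ['Relu_3']
print({op: d for op, d in op_delta.items() if d})
```

Matching by node name is the simplest choice, but it reports a renamed node as one removal plus one addition.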
from onnx_surgery.core import load_model, model_summary, SurgeryGraph, ascii_graph
from onnx_surgery.tools.prune import prune_nodes, strip_initializers
from onnx_surgery.tools.export import validate, optimize
from onnx_surgery.tools.inspect import inspect, to_json
from onnx_surgery.tools.flops import estimate_flops
from onnx_surgery.tools.diff import diff
from onnx_surgery.tools.extract import extract_subgraph
from onnx_surgery.tools.simplify import simplify
from onnx_surgery.tools.report import generate_html_report
from onnx_surgery.tools.quantize import convert_to_fp16, quantize_int8, quantize_model_file
from onnx_surgery.tools.diff_report import generate_diff_html
model = load_model("model.onnx")
print(inspect(model, detailed=True))

onnx-model-surgery/
├── onnx_surgery/
│ ├── cli/
│ │ ├── main.py # CLI entry point (15 commands)
│ │ └── __init__.py
│ ├── core/
│ │ ├── graph.py # SurgeryGraph — traversable ONNX graph
│ │ ├── model_loader.py # load_model, model_summary, list_* utils
│ │ ├── visualization.py # ASCII graph, Graphviz DOT, op_stats
│ │ └── __init__.py
│ ├── tools/
│ │ ├── prune.py # Node/initializer pruning
│ │ ├── patch.py # Tensor renaming, attribute patching
│ │ ├── inspect.py # Pretty-print and JSON export
│ │ ├── export.py # Save, validate, optimize
│ │ ├── flops.py # FLOPs & parameter estimation
│ │ ├── diff.py # Structural model comparison
│ │ ├── extract.py # Subgraph extraction
│ │ ├── simplify.py # Constant folding, BN fusion
│ │ ├── report.py # HTML analysis report generator
│ │ ├── quantize.py # INT8/FP16 quantization
│ │ ├── diff_report.py # Interactive HTML diff report
│ │ └── __init__.py
│ └── __init__.py
├── tests/
│ └── test_core.py # 12+ tests, CI-passing
├── pyproject.toml
├── CHANGELOG.md
├── LICENSE
└── README.md
MIT — see LICENSE.