Describe the bug
After upgrading to the 0.40 tag, a model conversion workflow that previously worked began failing. This is due to the additional input validation checks added to ReferenceRunner:
self.input_shapes = {
When a model input has a shape like ['batch', 1, 1, 1], input_shape in ReferenceRunner becomes [0, 1, 1, 1]. Only the dim_value field is checked, but that field does not exist for a dynamic input (which uses dim_param instead), so every dynamic dimension resolves to 0. This shape is later compared exactly against the incoming calibration data, and the input validation fails.
As a side note, the calibration_data argument of convert_to_mixed_precision is annotated as just str | None:
calibration_data: str | None = None,
ReferenceRunner does support passing an OrderedDict:
elif isinstance(inputs, (dict, OrderedDict)):
As a user, it would be nicer to pass a dict in memory rather than writing a file just to hand over its path; updating the calibration_data annotation to match what ReferenceRunner accepts would be welcome.
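For illustration, a hedged sketch of what a widened entry point could look like (`load_calibration_data` and the `str | dict` union are hypothetical, not the existing ModelOpt API):

```python
from __future__ import annotations

import numpy as np


def load_calibration_data(
    calibration_data: str | dict[str, np.ndarray] | None,
) -> dict[str, np.ndarray] | None:
    """Hypothetical helper: accept either an .npz path or an in-memory
    dict of input name -> array, as ReferenceRunner already can."""
    if calibration_data is None or isinstance(calibration_data, dict):
        return calibration_data  # already usable; covers OrderedDict too
    return dict(np.load(calibration_data))  # otherwise treat as a file path
```

With something like this, a caller could pass `calibration_data={"input": input_arr}` directly instead of first writing an .npz file.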
Steps/Code to reproduce bug
```python
import numpy as np
import onnx

from modelopt.onnx.autocast.convert import convert_to_mixed_precision

input = onnx.helper.make_tensor_value_info(
    "input", onnx.TensorProto.FLOAT, ["batch", 1]
)
output = onnx.helper.make_tensor_value_info(
    "output", onnx.TensorProto.FLOAT, ["batch", 1]
)
node = onnx.helper.make_node(
    "Relu",
    ["input"],
    ["output"],
)
graph = onnx.helper.make_graph(
    nodes=[node],
    name="foo",
    inputs=[input],
    outputs=[output],
    initializer=[],
)
model = onnx.helper.make_model(graph)
onnx.save(model, "foo.onnx")

input_arr = np.ones((1, 1), dtype=np.float32)
np.savez("calibration_data.npz", input=input_arr)

convert_to_mixed_precision("foo.onnx", calibration_data="calibration_data.npz")
```
Expected behavior
The above reproducer is expected to pass; currently it raises a ValueError on the dimension check:
```
ValueError: Input shape from 'input' does not match provided input shape: [0, 1] vs [1, 1]. Please make sure that your calibration data matches the ONNX input shapes
```
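One possible fix on the validation side (a sketch only, with a hypothetical `shapes_compatible` name) is to treat dynamic dimensions as wildcards when comparing against the calibration data:

```python
def shapes_compatible(model_shape, data_shape) -> bool:
    """Hypothetical check: a dynamic dim (a str such as 'batch', or None)
    matches any concrete size in the calibration data; static dims must
    match exactly."""
    if len(model_shape) != len(data_shape):
        return False
    return all(
        isinstance(m, str) or m is None or m == d
        for m, d in zip(model_shape, data_shape)
    )


print(shapes_compatible(["batch", 1], [1, 1]))  # True
print(shapes_compatible([2, 1], [1, 1]))        # False
```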
Who can help?
- ?
System information
- Container used (if applicable): n/a
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): SUSE Linux Enterprise Server 15 SP6
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA RTX A4000
- GPU memory size: 15.0 GB
- Number of GPUs: 1
- Library versions (if applicable):
- Python: 3.12.11
- ModelOpt version or commit hash: 0.40.0
- CUDA: 13.0
- PyTorch: n/a
- Transformers: n/a
- TensorRT-LLM: ?
- ONNXRuntime: 1.23.2
- TensorRT: 10.13