Fail to convert ONNX models with dynamic inputs #713

@jricker2

Description


Describe the bug

After upgrading to the 0.40 tag, a model conversion workflow that previously worked began failing. This is due to the additional input validation checks added to ReferenceRunner.

When a model input shape is something like ['batch', 1, 1, 1], input_shape in ReferenceRunner becomes [0, 1, 1, 1]. Only the dim_value field is checked, and that field does not exist for a dynamic input (it carries dim_param instead), so every dynamic dimension resolves to 0. That shape is then compared exactly against the incoming calibration data, and input validation fails.

As a side note, the calibration_data argument of convert_to_mixed_precision is typed as just str | None:

calibration_data: str | None = None,

while ReferenceRunner itself supports being passed an OrderedDict:

elif isinstance(inputs, (dict, OrderedDict)):

As a user it would be nicer to pass a dict in memory rather than writing a file just to hand over its path, so updating the calibration_data argument to match what ReferenceRunner can already consume would be welcome.
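The detour the current API forces can be sketched as follows — the name -> array dict is what ReferenceRunner's dict branch presumably consumes, but today the same data has to round-trip through an .npz archive (BytesIO stands in for the file on disk):

```python
import io

import numpy as np

# The in-memory form a user would like to pass directly (hypothetical API).
calib = {"input": np.ones((1, 1), dtype=np.float32)}

# The detour required today: serialize to .npz, then hand over a path.
buf = io.BytesIO()
np.savez(buf, **calib)
buf.seek(0)
loaded = dict(np.load(buf))

assert np.array_equal(loaded["input"], calib["input"])
```

Accepting str | dict[str, np.ndarray] | None on calibration_data would let callers skip the serialization step entirely.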

Steps/Code to reproduce bug

import numpy as np
import onnx

from modelopt.onnx.autocast.convert import convert_to_mixed_precision

# Minimal model with a dynamic "batch" dimension on its input and output.
input = onnx.helper.make_tensor_value_info(
    "input", onnx.TensorProto.FLOAT, ["batch", 1]
)
output = onnx.helper.make_tensor_value_info(
    "output", onnx.TensorProto.FLOAT, ["batch", 1]
)
node = onnx.helper.make_node("Relu", ["input"], ["output"])
graph = onnx.helper.make_graph(
    nodes=[node],
    name="foo",
    inputs=[input],
    outputs=[output],
    initializer=[],
)
model = onnx.helper.make_model(graph)
onnx.save(model, "foo.onnx")

# Calibration data with a concrete batch size of 1.
input_arr = np.ones((1, 1), dtype=np.float32)
np.savez("calibration_data.npz", input=input_arr)

convert_to_mixed_precision("foo.onnx", calibration_data="calibration_data.npz")

Expected behavior

The reproducer above is expected to pass; instead it currently raises a ValueError on the dimension check:

ValueError: Input shape from 'input' does not match provided input shape: [0, 1] vs [1, 1]. Please make sure that your calibration data matches the ONNX input shapes

Who can help?

  • ?

System information

  • Container used (if applicable): n/a
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): SUSE Linux Enterprise Server 15 SP6
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA RTX A4000
  • GPU memory size: 15.0 GB
  • Number of GPUs: 1
  • Library versions (if applicable):
    • Python: 3.12.11
    • ModelOpt version or commit hash: 0.40.0
    • CUDA: 13.0
    • PyTorch: n/a
    • Transformers: n/a
    • TensorRT-LLM: ?
    • ONNXRuntime: 1.23.2
    • TensorRT: 10.13

Labels: bug (Something isn't working)