
Conversation

@yuanyao-nv
Contributor

Description

There is currently a bug in the convert.cc::convert_version() function for models with external data. That function calls ir_pb_converter.cc::ImportModelProto(), which calls graphProtoToGraph(), which in turn calls tensorProtoToTensor(), and tensorProtoToTensor() does not handle external data. While TensorProto supports external data, the struct in tensor.h::Tensor does not. Because the version converter operates on a Graph rather than a GraphProto, the conversion to the Tensor struct destroys the external data references. This PR adds the missing logic for external data.

The change can be verified on a simple model with external data:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

dtype = np.float32
X = gs.Variable(name="X", dtype=dtype, shape=(1, 1))

# Large constant so that onnx.save spills it to external data.
Y = gs.Constant(name="Y", values=np.zeros((20000, 30000), dtype=dtype))
Add_out = gs.Variable(name="Add_out", dtype=dtype)

node_add = gs.Node(op="Add", inputs=[X, Y], outputs=[Add_out])

graph = gs.Graph(nodes=[node_add], inputs=[X, Y], outputs=[Add_out], opset=20)

model = gs.export_onnx(graph)
onnx.save(model, "test.onnx", save_as_external_data=True)
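
A hedged sketch of the verification step (assuming the test.onnx produced above): running the C++ version converter through its Python binding shows whether the initializer keeps its external data reference.

import onnx
from onnx import version_converter

# Load without pulling the weights into memory, so the external data
# references must survive the conversion on their own.
model = onnx.load("test.onnx", load_external_data=False)
converted = version_converter.convert_version(model, 21)

# Before this fix, the conversion destroyed the external data references,
# so "Y" no longer pointed at the weights file; with the fix, it should
# still report an EXTERNAL data location.
for init in converted.graph.initializer:
    print(init.name, init.data_location == onnx.TensorProto.EXTERNAL)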

Motivation and Context

#6529

@codecov
codecov bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.31%. Comparing base (95ecc67) to head (5aa6ba1).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6847      +/-   ##
==========================================
+ Coverage   56.28%   56.31%   +0.03%     
==========================================
  Files         509      509              
  Lines       32580    32606      +26     
  Branches     3099     3099              
==========================================
+ Hits        18337    18363      +26     
  Misses      13385    13385              
  Partials      858      858              


@justinchuby
Member

Could you add a test in https://github.com/onnx/onnx/blob/main/onnx/test/version_converter_test.py?

@yuanyao-nv
Contributor Author

I'm not sure of a good way to create external data with the ONNX API. I tried running the following, but no external data is created. If I increase the shape to (20000, 30000), it takes a long time and runs out of memory. Could be a bug somewhere. Any thoughts?

import onnx
import numpy as np

random_shape = (200,300)
random_data = np.random.rand(*random_shape).astype(np.float32)

# Create two initializers: one for the data tensor, the other just a scalar 1
initializer_tensor = onnx.helper.make_tensor(
    name="initializer_tensor",
    data_type=onnx.TensorProto.FLOAT,
    dims=list(random_shape),
    vals=random_data.flatten(),
)
initializer_scalar = onnx.helper.make_tensor(
    name="initializer_scalar",
    data_type=onnx.TensorProto.FLOAT,
    dims=[],
    vals=[1.0],
)

# Define a graph with simple addition
add_node = onnx.helper.make_node(
    "Add",
    inputs=["initializer_tensor", "initializer_scalar"],
    outputs=["sum_output"]
)

graph_def = onnx.helper.make_graph(
    name="SimpleAddition",
    nodes=[add_node],
    inputs=[],
    outputs=[
        onnx.helper.make_tensor_value_info("sum_output", onnx.TensorProto.FLOAT, list(random_shape))
    ],
    initializer=[initializer_tensor, initializer_scalar]
)

# Save model to file
model_filename = "test_simple_add.onnx"
opset_imports=[onnx.helper.make_opsetid("", 21)]
model_def = onnx.helper.make_model(graph_def, opset_imports=opset_imports)
model_def.ir_version = 10
onnx.save_model(
    model_def,
    model_filename,
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="data",
    size_threshold=0,
    convert_attribute=False,
)

The example in the description uses onnx-graphsurgeon to create the model. But we probably don't want to introduce another dependency.

@justinchuby
Member

I think you need to set raw=True in make_tensor, according to the check tensor.HasField("raw_data") in the external-data saving path.

Admittedly there is more work to be done in the helpers.
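
A minimal sketch of that suggestion (only the tensor construction changes; the rest of the snippet above stays the same): pass raw bytes with raw=True so the values land in raw_data, which is what the external-data path checks for.

import numpy as np
import onnx

random_data = np.random.rand(200, 300).astype(np.float32)
initializer_tensor = onnx.helper.make_tensor(
    name="initializer_tensor",
    data_type=onnx.TensorProto.FLOAT,
    dims=list(random_data.shape),
    vals=random_data.tobytes(),  # raw bytes instead of a Python list
    raw=True,                    # stored in raw_data, so it can be externalized
)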

Signed-off-by: Yuan Yao <[email protected]>
@yuanyao-nv force-pushed the dev-version-converter branch from dad3e41 to b56ad09 on April 1, 2025
@justinchuby
Member

Thank you!

@github-project-automation bot moved this from In progress to Reviewer approved in PR Tracker on Apr 1, 2025
@justinchuby enabled auto-merge on April 1, 2025
@justinchuby added the module: ir (c++) label on Apr 1, 2025
@justinchuby added this pull request to the merge queue on Apr 1, 2025
Merged via the queue into onnx:main with commit 6ddee87 on Apr 1, 2025
48 of 61 checks passed
@github-project-automation bot moved this from Reviewer approved to Done in PR Tracker on Apr 1, 2025
@jywu-msft

@justinchuby @andife @ramkrishna2910
This isn't included in rel-1.18.0? It enables quantization of large models.

@justinchuby
Member

justinchuby commented Apr 30, 2025

@jywu-msft Looks like it was merged after the branch cut. To use the version converter for large models, consider:

import onnxscript.version_converter
from onnxscript import ir

# Or use `model = ir.from_proto(model_proto)` if the model is from a proto already
model = ir.load("model.onnx")
onnxscript.version_converter.convert_version(model, target_version=21, fallback=True)
model.ir_version = 10  # Enable support for INT4
ir.save(model, "converted.onnx")
# or model_proto = ir.to_proto(model)

If desired, we can update the API to accept a protobuf as well.

The same logic is implemented in Olive: https://github.com/microsoft/Olive/blob/422353ecb6ca724495e7a6f366fb53a8b3146818/olive/passes/onnx/conversion.py#L662
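
For reference, a hedged sketch of that protobuf round-trip, using the ir.from_proto/ir.to_proto calls mentioned in the snippet above:

import onnx
import onnxscript.version_converter
from onnxscript import ir

model_proto = onnx.load("model.onnx")  # an existing ModelProto
model = ir.from_proto(model_proto)     # wrap the proto in the IR
onnxscript.version_converter.convert_version(model, target_version=21, fallback=True)
converted_proto = ir.to_proto(model)   # back to a ModelProto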

@bas-aarts

Thanks @justinchuby.
I cannot get the onnxscript converter to work correctly, though; it generates incorrect models for trivial examples. I guess I'll wait until this fix lands in a release.

@justinchuby
Member

@bas-aarts Could you share a concrete example? If anything doesn't work, it is a bug we need to fix.


Labels

module: ir (c++) · topic: bug fix

Projects

Status: Done

Development

Successfully merging this pull request may close these issues: #6529

4 participants