
Conversation

@yuanyao-nv
Contributor

Description

There is currently a bug in the convert.cc::convert_version() function for models with external data. That function calls ir_pb_converter.cc::ImportModelProto(), which calls graphProtoToGraph(), which in turn calls tensorProtoToTensor(), and tensorProtoToTensor() does not handle external data. While TensorProto supports external data, the struct in tensor.h::Tensor does not. Because the version converter operates on a Graph rather than a GraphProto, the conversion to the Tensor struct destroys the external data references. This PR adds the missing logic for external data.

The change can be verified on a simple model with external data:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

dtype = np.float32
X = gs.Variable(name="X", dtype=dtype, shape=(1, 1))

# Large constant so that onnx.save spills it to external data.
Y = gs.Constant(name="Y", values=np.zeros((20000, 30000), dtype=dtype))
Add_out = gs.Variable(name="Add_out", dtype=dtype)

node_add = gs.Node(op="Add", inputs=[X, Y], outputs=[Add_out])

graph = gs.Graph(nodes=[node_add], inputs=[X, Y], outputs=[Add_out], opset=20)

model = gs.export_onnx(graph)
onnx.save(model, "test.onnx", save_as_external_data=True)
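
A hedged sketch of the verification step (assuming the test.onnx produced above): running the C++ version converter through its Python binding shows whether the initializer keeps its external data reference.

import onnx
from onnx import version_converter

# Load without pulling the weights into memory, so the external data
# references must survive the conversion on their own.
model = onnx.load("test.onnx", load_external_data=False)
converted = version_converter.convert_version(model, 21)

# Before this fix, the conversion destroyed the external data references,
# so "Y" no longer pointed at the weights file; with the fix, it should
# still report an EXTERNAL data location.
for init in converted.graph.initializer:
    print(init.name, init.data_location == onnx.TensorProto.EXTERNAL)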

Motivation and Context

#6529

@codecov
codecov bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.31%. Comparing base (95ecc67) to head (5aa6ba1).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6847      +/-   ##
==========================================
+ Coverage   56.28%   56.31%   +0.03%     
==========================================
  Files         509      509              
  Lines       32580    32606      +26     
  Branches     3099     3099              
==========================================
+ Hits        18337    18363      +26     
  Misses      13385    13385              
  Partials      858      858              


@justinchuby
Member

Could you add a test in https://github.com/onnx/onnx/blob/main/onnx/test/version_converter_test.py?

@yuanyao-nv
Contributor Author

I'm not sure of a good way to create external data with the ONNX API. I tried running the following, but no external data is created. If I increase the shape to (20000, 30000), it takes a long time and runs out of memory. Could be a bug somewhere. Any thoughts?

import onnx
import numpy as np

random_shape = (200,300)
random_data = np.random.rand(*random_shape).astype(np.float32)

# Create two initializers: one for the data tensor, the other just a scalar 1
initializer_tensor = onnx.helper.make_tensor(
    name="initializer_tensor",
    data_type=onnx.TensorProto.FLOAT,
    dims=list(random_shape),
    vals=random_data.flatten(),
)
initializer_scalar = onnx.helper.make_tensor(
    name="initializer_scalar",
    data_type=onnx.TensorProto.FLOAT,
    dims=[],
    vals=[1.0],
)

# Define a graph with simple addition
add_node = onnx.helper.make_node(
    "Add",
    inputs=["initializer_tensor", "initializer_scalar"],
    outputs=["sum_output"]
)

graph_def = onnx.helper.make_graph(
    name="SimpleAddition",
    nodes=[add_node],
    inputs=[],
    outputs=[
        onnx.helper.make_tensor_value_info("sum_output", onnx.TensorProto.FLOAT, list(random_shape))
    ],
    initializer=[initializer_tensor, initializer_scalar]
)

# Save model to file
model_filename = "test_simple_add.onnx"
opset_imports=[onnx.helper.make_opsetid("", 21)]
model_def = onnx.helper.make_model(graph_def, opset_imports=opset_imports)
model_def.ir_version = 10
onnx.save_model(
    model_def,
    model_filename,
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="data",
    size_threshold=0,
    convert_attribute=False,
)

The example in the description uses onnx-graphsurgeon to create the model. But we probably don't want to introduce another dependency.

@justinchuby
Member

I think you need to set raw=True in make_tensor, according to the check tensor.HasField("raw_data") in the external-data saving path.

Admittedly there is more work to be done in the helpers.
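
A minimal sketch of that suggestion (only the tensor construction changes; the rest of the snippet above stays the same): pass raw bytes with raw=True so the values land in raw_data, which is what the external-data path checks for.

import numpy as np
import onnx

random_data = np.random.rand(200, 300).astype(np.float32)
initializer_tensor = onnx.helper.make_tensor(
    name="initializer_tensor",
    data_type=onnx.TensorProto.FLOAT,
    dims=list(random_data.shape),
    vals=random_data.tobytes(),  # raw bytes instead of a Python list
    raw=True,                    # stored in raw_data, so it can be externalized
)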

Signed-off-by: Yuan Yao <[email protected]>
@yuanyao-nv force-pushed the dev-version-converter branch from dad3e41 to b56ad09 on April 1, 2025
@justinchuby
Member

Thank you!

@github-project-automation bot moved this from In progress to Reviewer approved in PR Tracker on Apr 1, 2025
@justinchuby enabled auto-merge on April 1, 2025
@justinchuby added the module: ir (c++) label on Apr 1, 2025
@justinchuby added this pull request to the merge queue on Apr 1, 2025
Merged via the queue into onnx:main with commit 6ddee87 on Apr 1, 2025
48 of 61 checks passed
@github-project-automation bot moved this from Reviewer approved to Done in PR Tracker on Apr 1, 2025
@jywu-msft

@justinchuby @andife @ramkrishna2910
This isn't included in rel-1.18.0? It enables quantization of large models.

@justinchuby
Member

justinchuby commented Apr 30, 2025

@jywu-msft Looks like it was merged after the branch cut. To use the version converter for large models, consider:

import onnxscript.version_converter
from onnxscript import ir

# Or use `model = ir.from_proto(model_proto)` if the model is from a proto already
model = ir.load("model.onnx")
onnxscript.version_converter.convert_version(model, target_version=21, fallback=True)
model.ir_version = 10  # Enable support for INT4
ir.save(model, "converted.onnx")
# or model_proto = ir.to_proto(model)

If desired, we can update the API to accept a protobuf as well.

The same logic is implemented in Olive: https://github.com/microsoft/Olive/blob/422353ecb6ca724495e7a6f366fb53a8b3146818/olive/passes/onnx/conversion.py#L662
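
For reference, a hedged sketch of that protobuf round-trip, using the ir.from_proto/ir.to_proto calls mentioned in the snippet above:

import onnx
import onnxscript.version_converter
from onnxscript import ir

model_proto = onnx.load("model.onnx")  # an existing ModelProto
model = ir.from_proto(model_proto)     # wrap the proto in the IR
onnxscript.version_converter.convert_version(model, target_version=21, fallback=True)
converted_proto = ir.to_proto(model)   # back to a ModelProto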

@bas-aarts

Thanks @justinchuby.
I cannot get the onnxscript converter to work correctly, though; it generates incorrect models for trivial examples. I guess I'll wait until this fix lands in a release.

@justinchuby
Member

@bas-aarts Could you share a concrete example? If anything doesn't work, it is a bug we need to fix.


Labels

module: ir (c++) · topic: bug fix

Projects

Status: Done

Development

Successfully merging this pull request may close these issues: #6529

4 participants