Add runtime asserts to AOTI #152125

Status: Open. yushangdi wants to merge 1 commit into pytorch:main from yushangdi:export-D73596786.
Conversation

@yushangdi (Contributor) commented Apr 24, 2025:

Summary:
Solves #151925

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph.

Example:

    def forward(self):
        arg0_1: "f32[s35]";

        arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

         #
        mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
        eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
        clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
        add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
        return (add_6,)

Generated cpp code:

    auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
    auto arg0_1 = std::move(inputs[0]);
    auto arg0_1_size = arg0_1.sizes();
    int64_t s35 = arg0_1_size[0];
    inputs.clear();
    auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
    if (!((s35 % 100L) == 0L)) { throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35)); }
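
For context, a minimal end-to-end sketch of reproducing this example (the module mirrors the pt.py snippet in the graph comments; `torch.export.export` plus `torch._inductor.aot_compile` is one common AOTI entry point, though the exact API surface is an assumption and may vary by version):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # Mirrors pt.py:11-12 from the graph comments above.
        y = x.reshape(100, -1).clone()  # needs x.numel() % 100 == 0
        y = y + 1
        return y

x = torch.randn(300)  # 300 % 100 == 0, so tracing succeeds
dim = torch.export.Dim("s")  # dynamic first dim; appears as an s35-style symint
ep = torch.export.export(M(), (x,), dynamic_shapes=({0: dim},))

# AOTInductor compile. With this change, the Eq(Mod(s35, 100), 0) guard from
# the _assert_scalar node above is emitted into the generated C++.
so_path = torch._inductor.aot_compile(ep.module(), (x,))
```

Running the compiled model with a first dimension that is not a multiple of 100 should then raise the `std::runtime_error` shown above, rather than the check being dropped.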

Test Plan:

buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint

Differential Revision: D73596786

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

@pytorch-bot bot commented Apr 24, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152125

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 189c7c0 with merge base d042ec8:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73596786

@henrylhtsang (Contributor) commented Apr 24, 2025:

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@yushangdi (Contributor, Author) commented Apr 24, 2025:

> I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?

If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, no scalar asserts will be generated (it defaults to 1).

I'm not sure what a safer rollout plan would look like, but I can implement one if you have suggestions!
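
For reference, a minimal sketch of toggling this (the `scalar_asserts` attribute is my reading of the inductor config backing the env var; treat the exact name as an assumption):

```python
import os

# Option 1: environment variable, set before compilation starts.
os.environ["TORCHINDUCTOR_SCALAR_ASSERTS"] = "0"

# Option 2: the corresponding inductor config flag (assumed attribute name).
import torch._inductor.config as inductor_config
inductor_config.scalar_asserts = False
```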

@henrylhtsang (Contributor) commented:

> @henrylhtsang Is the worry that having more asserts will slow down the performance of some models?

I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately.

We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll it out in fbcode, and remove the config and the guard in a month.

cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid

@yushangdi (Contributor, Author) commented:

> We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll it out in fbcode, and remove the config and the guard in a month.

Sure, sounds good to me! Being more cautious is never wrong. Would it be better to use a justknob for the fbcode guard?

@jingsh (Member) commented Apr 24, 2025:

> Would it be better to use a justknob for the fbcode guard?

Guard would be great!

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on Apr 24, 2025. Its message mirrors the PR description above, with one addition: "It's guarded behind the `pytorch/export:aoti_full_runtime_assert` justknob." The test plan also gains:

```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
```

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on Apr 24, 2025; its message is the same as above, with the justknob referenced as `aoti_full_runtime_assert`.

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

@henrylhtsang (Contributor) commented:

> Would it be better to use a justknob for the fbcode guard?

Just a flag should be enough in my opinion, but either is fine.

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on Apr 24, 2025; its message is the same as above.

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

@yushangdi (Contributor, Author) commented:

@henrylhtsang I added both an env var, `TORCHINDUCTOR_SCALAR_ASSERTS_FULL`, and a justknob now. You'll need to see the internal diff for the justknob and the flag. The env var overrides the justknob.
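
A minimal sketch of the override semantics described here, assuming the knob is read through PyTorch's internal `justknobs_check` helper (the function name, knob string, and wiring are illustrative, not the actual implementation):

```python
import os

from torch._utils_internal import justknobs_check

def full_scalar_asserts_enabled() -> bool:
    # The env var, when set at all, wins over the justknob.
    env = os.environ.get("TORCHINDUCTOR_SCALAR_ASSERTS_FULL")
    if env is not None:
        return env == "1"
    # Otherwise fall back to the internal knob named in the commit message.
    return justknobs_check("pytorch/export:aoti_full_runtime_assert")
```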

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on Apr 24, 2025; its message is the same as above.

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on Apr 24, 2025; its message is the same as above, now prefixed with "Pull Request resolved: pytorch#152125".

@henrylhtsang (Contributor) left a comment.

@pytorch-bot bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label on Apr 24, 2025.

@yushangdi (Contributor, Author) commented Apr 25, 2025:

There are some CI errors because the extra asserts incremented the buf name index, so the string match failed. Will fix.

@yushangdi (Contributor, Author) commented:

Also, I think we might need #151919 to fix some of the CI errors related to unbacked symints in the input shape. Will try again after #151919 lands.

@yushangdi force-pushed the export-D73596786 branch from 7213af2 to 5731db2 on May 5, 2025 23:48.

pytorch-bot bot pushed a commit that referenced this pull request on May 5, 2025; its message is the same as above, now with "Reviewed By: henrylhtsang" and two additional test commands:

```
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_i64_input_codegen_cuda

buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_unbacked_equals_input_size
```

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

yushangdi added a commit to yushangdi/pytorch that referenced this pull request on May 6, 2025; its message is the same as above.

@yushangdi force-pushed the export-D73596786 branch from 5731db2 to b93b781 on May 6, 2025 15:54.

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786

The PR's commit (189c7c0) carries the same message as the PR description above, guarded behind the `aoti_full_runtime_assert` justknob; its test plan matches the one above, with `TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1` now also applied to the test_unbacked_equals_input_size run. Reviewed By: henrylhtsang. Differential Revision: D73596786.

@yushangdi force-pushed the export-D73596786 branch from b93b781 to 189c7c0 on May 7, 2025 18:06.

@facebook-github-bot (Contributor): This pull request was exported from Phabricator. Differential Revision: D73596786
