
Add runtime asserts to AOTI #152125


Open

yushangdi wants to merge 1 commit into main

Conversation

yushangdi
Contributor

@yushangdi yushangdi commented Apr 24, 2025

Summary:
Solves #151925

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph.

Also factored out the runtime assertion logic into a separate function.

    We need to generate runtime asserts directly in Inductor instead
    of just re-using the asserts from the input graph, because we reuse the
    same ShapeEnv as before. In particular, on subsequent graph passes,
    we would immediately turn all of these assertions into no-ops,
    because when we evaluated their expressions, we would see that,
    since we had a deferred runtime assert in the ShapeEnv, we already
    know "oh, of course this expression is True".
    One example is below:
        class Model(torch.nn.Module):
            def forward(self, a, b, c):
                nz = torch.nonzero(a)
                ones = a.new_ones([nz.size(0), b.size(0)])
                torch._check(ones.size(0) >= 1)
                equals = torch.add(ones, c)
                return equals
        torch._dynamo.mark_dynamic(c, 0)
    When we re-use the ShapeEnv in Inductor lowering, the check that `a` and
    `nonzero` have the same shape would be evaluated to True after we resolve
    unbacked bindings using the ShapeEnv.
    See test_unbacked_equals_input_size_runtime_assertion in test_aot_inductor.
    
    
    In addition to the Inductor generated runtime asserts, we also
    need the runtime asserts from the input graph, because some derived
    runtime asserts are not generated in Inductor. One example is
    below:
        class Model(torch.nn.Module):
            def forward(self, x):
                y = x.reshape(100, -1).clone()
                y = y + 1
                return y
        
        dynamic_shapes = {
            "x": {0: torch.export.Dim.DYNAMIC},
        }
        x.shape[0] needs to be a multiple of 100.
    See test_aoti_runtime_asserts_backed_symint in test_aot_inductor.
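
A minimal end-to-end sketch of the second example above (the export call matches the snippet; the AOTI packaging entry points `aoti_compile_and_package` / `aoti_load_package` are assumed to be available in the build being used, and the symbol name in the error message may differ):

```
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        y = x.reshape(100, -1).clone()
        y = y + 1
        return y

# reshape(100, -1) only works when dim 0 is a multiple of 100, so export
# records the deferred runtime assert Eq(Mod(s, 100), 0) on the dynamic dim.
dynamic_shapes = {"x": {0: torch.export.Dim.DYNAMIC}}
ep = torch.export.export(Model(), (torch.randn(300),), dynamic_shapes=dynamic_shapes)

# With this PR, the AOTI-compiled artifact checks the assert at runtime
# instead of only checking asserts on unbacked symints.
pkg_path = torch._inductor.aoti_compile_and_package(ep)  # assumed API
runner = torch._inductor.aoti_load_package(pkg_path)     # assumed API
runner(torch.randn(400))  # OK: 400 % 100 == 0
runner(torch.randn(128))  # raises: Expected Eq(Mod(s35, 100), 0) to be True
```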

Example:

    def forward(self):
        arg0_1: "f32[s35]";

        arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

         #
        mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
        eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
        clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
        add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
        return (add_6,)

Generated cpp code:

    auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
    auto arg0_1 = std::move(inputs[0]);
    auto arg0_1_size = arg0_1.sizes();
    int64_t s35 = arg0_1_size[0];
    inputs.clear();
    auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
    if (!((s35 % 100L) == 0L)) { throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35)); }

Test Plan:

buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint

Differential Revision: D73596786

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov


pytorch-bot bot commented Apr 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152125

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 189c7c0 with merge base d042ec8:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73596786

@henrylhtsang
Contributor

henrylhtsang commented Apr 24, 2025

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@yushangdi
Contributor Author

yushangdi commented Apr 24, 2025

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?

If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1).

Not sure what a safer rollout plan would be, but I can implement one if you have any suggestions!

@henrylhtsang
Contributor

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?

If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1).

Not sure what a safer rollout plan would be, but I can implement one if you have any suggestions!

I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately.

We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll out in fbcode, and remove the config and the guard in a month.

cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid

@yushangdi
Contributor Author

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?
If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1).
Not sure what a safer rollout plan would be, but I can implement one if you have any suggestions!

I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately.

We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll out in fbcode, and remove the config and the guard in a month.

cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid

Sure, sounds good to me! Being more cautious is never wrong. Would it be better to use the justknob for the fbcode guard?

@jingsh
Member

jingsh commented Apr 24, 2025

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?
If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1).
Not sure what a safer rollout plan would be, but I can implement one if you have any suggestions!

I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately.
We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll out in fbcode, and remove the config and the guard in a month.
cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid

Sure, sounds good to me! Being more cautious is never wrong. Would it be better to use the justknob for the fbcode guard?

A guard would be great!

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Apr 24, 2025
Summary: Solves pytorch#151925. Generate runtime asserts for all `_assert_scalar` calls in the input graph, guarded behind the `pytorch/export:aoti_full_runtime_assert` justknob.

Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
```

Differential Revision: D73596786
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73596786

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Apr 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73596786

@henrylhtsang
Contributor

I can't tell if I am being too careful. Do we want to roll this out to everything at once?

@henrylhtsang Is the worry that having more asserts will slow down the performance of some models?
If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1).
Not sure what a safer rollout plan would be, but I can implement one if you have any suggestions!

I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately.
We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll out in fbcode, and remove the config and the guard in a month.
cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid

Sure, sounds good to me! Being more cautious is never wrong. Would it be better to use the justknob for the fbcode guard?

Just a flag should be enough in my opinion, but either is fine.

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Apr 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73596786

@yushangdi
Contributor Author

@henrylhtsang I added both an env var `TORCHINDUCTOR_SCALAR_ASSERTS_FULL` and a justknob now. You'll need to see the internal diff for the justknob and the flag. The env var overrides the justknob.
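
Roughly, that gating could look like the sketch below (the helper name and knob string come from this PR; resolving the knob through `torch._utils_internal.justknobs_check` is an assumption about the internal wiring):

```
import os

from torch._utils_internal import justknobs_check  # returns True by default in OSS builds

def full_aoti_runtime_assert() -> bool:
    # The env var, when set, takes precedence over the justknob.
    env = os.environ.get("TORCHINDUCTOR_SCALAR_ASSERTS_FULL")
    if env is not None:
        return env == "1"
    return justknobs_check("pytorch/export:aoti_full_runtime_assert")
```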

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Apr 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73596786

yushangdi added a commit to yushangdi/pytorch that referenced this pull request Apr 24, 2025
@henrylhtsang henrylhtsang left a comment

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Apr 24, 2025
@yushangdi yushangdi requested a review from henrylhtsang May 7, 2025 23:17
@yushangdi
Contributor Author

@henrylhtsang Finally, the PR is ready to land! It has changed quite a bit from the original PR, mainly because I did a little code refactoring. Can I get a review again? (I'll need a final review after it lands anyway.) Thanks!

@yushangdi yushangdi requested a review from jingsh May 7, 2025 23:19

# Emit code for runtime asserts that can be inserted at this point.
for i0 in new_unbacked_defs:
Contributor Author


Refactored into the function create_deferred_runtime_asserts.

@henrylhtsang
Contributor

Thanks a lot for the great work!

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

@facebook-github-bot
Contributor

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request May 8, 2025
This reverts commit 834bc5e.

Reverted #152125 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally.
@pytorchmergebot
Collaborator

@yushangdi your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels May 8, 2025
@@ -159,6 +159,10 @@ def export_training_ir_rollout_check() -> bool:
    return True


def full_aoti_runtime_assert() -> bool:
Contributor

@laithsakka laithsakka May 8, 2025


Is this meant to be config controlled? It's always true?

):
    node_args, _ = self.fetch_args_kwargs_from_env(n)
    # some assert may have been captured by unbacked symint assertion
    if node_args[0] != True:  # noqa: E712
Contributor

@laithsakka laithsakka May 8, 2025


Hmm, I do not understand why we skip it if it's True?
I mean, all of the runtime assertions should be true, no? Except when reasoning is not smart enough to figure it out.
Can you share an example of such a thing where we ignore those?

if (
    full_aoti_runtime_assert()
    and n.target == torch.ops.aten._assert_scalar.default
    and self.aot_mode
Contributor


Thinking out loud: there must be a reason why, for unbacked symints, we did not just depend on torch.ops.aten._assert_scalar.default to generate the runtime assert.
The same reason, which I don't know what it is, might apply here?
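
For context, the two diff fragments quoted in this review fit together roughly as follows (reassembled from the snippets above; the enclosing Inductor lowering code and the body that actually emits the assert are elided):

```
if (
    full_aoti_runtime_assert()
    and n.target == torch.ops.aten._assert_scalar.default
    and self.aot_mode
):
    node_args, _ = self.fetch_args_kwargs_from_env(n)
    # some assert may have been captured by unbacked symint assertion
    if node_args[0] != True:  # noqa: E712
        ...  # emit a runtime assert for this _assert_scalar expression
```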

yushangdi added a commit to yushangdi/pytorch that referenced this pull request May 8, 2025
Summary: Solves pytorch#151925. A reland of pytorch#152125, with a try-except added around the justknob internally and more documentation.

Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_i64_input_codegen_cuda
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_unbacked_equals_input_size
```

Differential Revision: D74361799
yushangdi added a commit to yushangdi/pytorch that referenced this pull request May 9, 2025
yushangdi added a commit to yushangdi/pytorch that referenced this pull request May 9, 2025
yushangdi added a commit to yushangdi/pytorch that referenced this pull request May 9, 2025
yushangdi added a commit to yushangdi/pytorch that referenced this pull request May 9, 2025
pytorchmergebot pushed a commit that referenced this pull request May 9, 2025
Summary: Solves #151925. A reland of #152125 (landed as #153182), with a try-except added around the justknob internally and more documentation.

Differential Revision: D74361799

Pull Request resolved: #153182
Approved by: https://github.com/henrylhtsang