Add runtime asserts to AOTI #152125
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152125
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 189c7c0 with merge base d042ec8. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D73596786
I can't tell if I am being too careful. Do we want to roll this out to everything at once?
@henrylhtsang Is the worry that having more asserts will slow down the performance of some models? If you set TORCHINDUCTOR_SCALAR_ASSERTS=0, then no scalar asserts will be generated (it defaults to 1). I'm not sure what a safer rollout plan looks like, but I can implement one if you have any suggestions!
I wasn't too worried about perf, I guess. I was just worried this would cause some actual AOTI runs to fail immediately. We can guard it with a config under aot_inductor? We can set it on by default for OSS, then roll it out in fbcode, and remove the config and the guard in 1 month. cc @chenyang78 and @desertfire for thoughts, maybe I am just too paranoid.
Sure, sounds good to me! Being more cautious is never wrong. Would it be better to use the justknob for the fbcode guard?
A guard would be great!
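For reference, a minimal sketch of what such a guard could look like as an environment-variable-driven flag, in the spirit of the existing TORCHINDUCTOR_SCALAR_ASSERTS switch. The flag name follows the `TORCHINDUCTOR_SCALAR_ASSERTS_FULL` variable that shows up in the reland's test plan below; the default and config location are assumptions for illustration, not the final implementation.

```python
# Sketch only: an Inductor-style config flag, overridable via an environment
# variable, that decides whether to emit runtime asserts for every
# _assert_scalar call (new behavior) or only for unbacked symints (old behavior).
import os

# Assumed name/default, mirroring how TORCHINDUCTOR_SCALAR_ASSERTS gates scalar asserts.
scalar_asserts_full: bool = (
    os.environ.get("TORCHINDUCTOR_SCALAR_ASSERTS_FULL", "1") == "1"
)


def should_emit_full_runtime_asserts() -> bool:
    # Hypothetical helper: OSS honors the env var; fbcode could additionally
    # consult a JustKnobs switch for a staged rollout.
    return scalar_asserts_full
```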
Force-pushed from 6c12ee2 to 3d7613e.
Force-pushed from 3d7613e to 08886d3.
Just a flag should be enough in my opinion, but either is fine.
Force-pushed from 08886d3 to 4515b05.
@henrylhtsang I added both an env var
Force-pushed from 4515b05 to 03cf605.
Force-pushed from 03cf605 to 7213af2.
Summary:
Pull Request resolved: pytorch#152125
Solves pytorch#151925

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph. It's guarded behind the `aoti_full_runtime_assert` justknob.

Example:

```
def forward(self):
    arg0_1: "f32[s35]"; arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
    sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

    #
    mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
    eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
    _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
    view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
    clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
    add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
    return (add_6,)
```

Generated cpp code:

```
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
auto arg0_1 = std::move(inputs[0]);
auto arg0_1_size = arg0_1.sizes();
int64_t s35 = arg0_1_size[0];
inputs.clear();
auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());

if (!((s35 % 100L) == 0L)) {
    throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35));
}
```

Test Plan:

```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
```

Differential Revision: D73596786
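To make the example above concrete, here is a minimal end-to-end sketch of exercising the generated assert from Python. It assumes the `torch.export.export`, `torch._inductor.aoti_compile_and_package`, and `torch._inductor.aoti_load_package` entry points; exact APIs vary across PyTorch versions.

```python
# Sketch under the stated assumptions: export the reshape model with a dynamic
# batch dimension, AOT-compile it, and call it with shapes that do and do not
# satisfy Eq(Mod(s35, 100), 0).
import torch
import torch._inductor


class M(torch.nn.Module):
    def forward(self, x):
        y = x.reshape(100, -1).clone()  # requires x.shape[0] % 100 == 0
        return y + 1


x = torch.randn(300)
ep = torch.export.export(
    M(), (x,), dynamic_shapes={"x": {0: torch.export.Dim.DYNAMIC}}
)
pkg = torch._inductor.aoti_compile_and_package(ep)  # assumed entry point
compiled = torch._inductor.aoti_load_package(pkg)   # assumed entry point

out = compiled(torch.randn(500))  # multiple of 100: runs fine
compiled(torch.randn(501))        # should raise the generated std::runtime_error
```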
@henrylhtsang Finally the PR is ready to land! It's changed quite a bit from the original PR, mainly because I did a little code refactoring. Can I get a review again? (I'll need a final review after it lands anyway.) Thanks!
# Emit code for runtime asserts that can be inserted at this point.
for i0 in new_unbacked_defs:
Refactored into the function `create_deferred_runtime_asserts`.
Thanks a lot for the great work!
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"
This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 834bc5e. Reverted #152125 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#152125 (comment)))
@yushangdi your PR has been successfully reverted.
@@ -159,6 +159,10 @@ def export_training_ir_rollout_check() -> bool:
    return True


def full_aoti_runtime_assert() -> bool:
Is this meant to be config-controlled? It's always true?
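For context, the reland described further down guards this behind a justknob with a try/except fallback. A minimal sketch of that shape follows, assuming the internal `justknobs_check` helper and the knob name from the earlier summary; the exact wiring is an assumption, not the landed implementation.

```python
# Sketch only: consult a JustKnobs switch in fbcode; if anything goes wrong
# (e.g. an OSS build without JustKnobs), fall back to enabling the behavior.
def full_aoti_runtime_assert() -> bool:
    try:
        from torch._utils_internal import justknobs_check  # assumed helper
        return justknobs_check("pytorch/export:aoti_full_runtime_assert")
    except Exception:
        return True
```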
):
node_args, _ = self.fetch_args_kwargs_from_env(n)
# some assert may have been captured by unbacked symint assertion
if node_args[0] != True:  # noqa: E712
Mmm, I do not understand why we skip it if it's True?
I mean, all of the runtime assertions should be true, no? Except when the reasoning is not smart enough to figure it out.
Can you share an example of such a case where we ignore those?
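For illustration only (not an answer from the thread): a tiny sympy sketch of how an assert's expression can already be the literal `True` by the time lowering sees it, once the ShapeEnv has resolved it, which is the case the `node_args[0] != True` check skips.

```python
# Sketch: if the ShapeEnv already knows s35 is a multiple of 100 (say, from a
# deferred runtime assert recorded earlier), the assert expression simplifies
# to the constant True during lowering, so there is nothing left to codegen.
import sympy

s35 = sympy.Symbol("s35", integer=True, positive=True)
expr = sympy.Eq(sympy.Mod(s35, 100), 0)       # the _assert_scalar expression
resolved = expr.subs(sympy.Mod(s35, 100), 0)  # stand-in for ShapeEnv resolution
print(resolved)                               # True -> the assert is skipped
```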
if (
    full_aoti_runtime_assert()
    and n.target == torch.ops.aten._assert_scalar.default
    and self.aot_mode
Thinking out loud: there must be a reason why, for unbacked symints, we did not just depend on torch.ops.aten._assert_scalar.default to generate the runtime assert.
The same reason, whatever it is, might apply here?
Summary: Solves #151925. A reland of #152125; added a try-except around the justknob internally, and added more documentation.

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph. The runtime-assertion logic is also factored out into a separate function.

We need to generate runtime asserts directly in Inductor instead of just re-using the asserts from the input graph, because we reuse the same ShapeEnv as before. In particular, on subsequent graph passes we would immediately turn all of these assertions into no-ops: when we evaluated their expressions, we would see that, because we had a deferred runtime assert in the ShapeEnv, we already know "oh, of course this expression is True". One example is below:

```
class Model(torch.nn.Module):
    def forward(self, a, b, c):
        nz = torch.nonzero(a)
        ones = a.new_ones([nz.size(0), b.size(0)])
        torch._check(ones.size(0) >= 1)
        equals = torch.add(ones, c)
        return equals

torch._dynamo.mark_dynamic(c, 0)
```

When we re-use the ShapeEnv in Inductor lowering, the check that `a` and `nonzero` have the same shape would be evaluated to True after we resolve unbacked bindings using the ShapeEnv. See `test_unbacked_equals_input_size_runtime_assertion` in test_aot_inductor.

In addition to the Inductor-generated runtime asserts, we also need the runtime asserts from the input graph, because some derived runtime asserts are not generated in Inductor. One example is below:

```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.reshape(100, -1).clone()
        y = y + 1
        return y

dynamic_shapes = {
    "x": {0: torch.export.Dim.DYNAMIC},
}

# x.shape[0] needs to be a multiple of 100.
```

See `test_aoti_runtime_asserts_backed_symint` in test_aot_inductor.

Example:

```
def forward(self):
    arg0_1: "f32[s35]"; arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
    sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

    #
    mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
    eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
    _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
    view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
    clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

    # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
    add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
    return (add_6,)
```

Generated cpp code:

```
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
auto arg0_1 = std::move(inputs[0]);
auto arg0_1_size = arg0_1.sizes();
int64_t s35 = arg0_1_size[0];
inputs.clear();
auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());

if (!((s35 % 100L) == 0L)) {
    throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35));
}
```

Test Plan:

```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_i64_input_codegen_cuda
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_unbacked_equals_input_size
```

Differential Revision: D74361799
Pull Request resolved: #153182
Approved by: https://github.com/henrylhtsang
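As a companion to the first example above, a minimal sketch (under the same assumptions about the torch.export APIs as the earlier sketch) of inspecting the exported graph for the `_assert_scalar` calls that this change now turns into AOTI runtime checks:

```python
# Sketch: export the nonzero/new_ones model with a dynamic dim on c and look at
# the graph; the torch.ops.aten._assert_scalar.default nodes are what AOTI now
# codegens C++ checks for.
import torch
from torch.export import Dim, export


class Model(torch.nn.Module):
    def forward(self, a, b, c):
        nz = torch.nonzero(a)                       # unbacked size u0 = nz.size(0)
        ones = a.new_ones([nz.size(0), b.size(0)])  # shape (u0, b.size(0))
        torch._check(ones.size(0) >= 1)             # recorded as a runtime assert
        return torch.add(ones, c)                   # broadcasting ties u0 to c.size(0)


a, b, c = torch.ones(4), torch.ones(3), torch.ones(4, 3)
ep = export(
    Model(), (a, b, c),
    dynamic_shapes={"a": None, "b": None, "c": {0: Dim.DYNAMIC}},
)
print(ep.graph_module.code)  # typically contains _assert_scalar.default(...) calls
```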
Summary:
Solves #151925
Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph. The runtime-assertion logic is also factored out into a separate function (see the example, generated cpp code, and test plan in the summary above).
Differential Revision: D73596786
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov