Tags: collabora/pytorch
Pass triton kernel info to record function (pytorch#123871)

Summary: This diff passes triton kernel information, such as the kernel's Python file, kernel type, grid, and stream, to record_function. With this information, Execution Trace can capture the triton kernel and replay it in PARAM.

Test Plan: unit test
buck2 test mode/opt caffe2/test:profiler -- test_record_function_fast

Reviewed By: sraikund16

Differential Revision: D56021651
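For context, a minimal sketch of the user-visible mechanism this builds on: kernel metadata can be surfaced into a profiler trace through a `record_function` annotation. The kernel name, file, grid, and stream values below are illustrative, and `launch_kernel` is a hypothetical placeholder for an Inductor-generated triton launcher; the diff itself threads this metadata through the profiler's fast path rather than a string label.

```python
# Sketch only: show kernel info (file, grid, stream) in a profiler trace
# via a record_function annotation. launch_kernel is a hypothetical stub.
from torch.profiler import profile, record_function

def launch_kernel(grid, stream):
    pass  # placeholder for the actual triton kernel launch

with profile() as prof:
    # Encode the kernel info in the annotation so it appears in the trace
    # and can be recovered by Execution Trace / replay tooling.
    with record_function("triton_poi_fused_0: file=out.py, grid=(64,), stream=0"):
        launch_kernel(grid=(64,), stream=0)

print(prof.key_averages().table(sort_by="cpu_time_total"))
```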
Update on "Enable dynamo test_state_dict_deterministic" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Update on "[compiled autograd][dynamo] Codegen aliases to keep grad m…
…utated tensors alive"
The current codegen is problematic if `__compiled_fn_0` clears the inputs list, since we still need it for the grad assignments afterwards:
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    graph_out_0 = __compiled_fn_0(inputs)  # clears inputs
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs[4].grad = graph_out_0[1]  # inputs is empty, raises IndexError
    inputs[7].grad = graph_out_0[2]
    inputs[8].grad = graph_out_0[3]
    inputs[9].grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```
With this fix, we take aliases before the call to keep the tensors alive:
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    inputs_ref_1 = inputs[9]
    inputs_ref_2 = inputs[4]
    inputs_ref_3 = inputs[8]
    inputs_ref_4 = inputs[7]
    graph_out_0 = __compiled_fn_0(inputs)
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs_ref_2.grad = graph_out_0[1]
    inputs_ref_4.grad = graph_out_0[2]
    inputs_ref_3.grad = graph_out_0[3]
    inputs_ref_1.grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```
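The fix relies on plain Python reference semantics: a name bound before the call keeps the object alive even after the compiled function clears `inputs`. A tiny standalone sketch, unrelated to the actual codegen and using a hypothetical `compiled_fn`, demonstrates this:

```python
# Toy stand-in for __compiled_fn_0: it consumes (clears) its input list.
def compiled_fn(inputs):
    outs = [x * 2 for x in inputs]
    inputs.clear()
    return outs

inputs = [1.0, 2.0, 3.0]
alias = inputs[1]  # alias taken before the call survives the clear
outs = compiled_fn(inputs)

# inputs[1] would now raise IndexError, but the alias is still valid.
assert len(inputs) == 0
assert alias == 2.0 and outs[1] == 4.0
```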
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang
[ghstack-poisoned]
Update on "[compiled autograd][dynamo] Codegen aliases to keep grad m…
…utated tensors alive"
The current codegen is problematic if __compiled_fn_0 clears the inputs list, since we need it for assignment afterwards
```python
def forward(inputs):
__compiled_fn_0 = ... # The actual function needs to be provided
graph_out_0 = __compiled_fn_0(inputs) # clears inputs
temp_list = []
temp_list.append(graph_out_0[0])
inputs[4].grad = graph_out_0[1] # inputs is empty, index error
inputs[7].grad = graph_out_0[2]
inputs[8].grad = graph_out_0[3]
inputs[9].grad = graph_out_0[3]
del graph_out_0
return temp_list
```
With this fix, we use aliases to keep the tensors alive
```python
def forward(inputs):
__compiled_fn_0 = ... # The actual function needs to be provided
inputs_ref_1 = inputs[9]
inputs_ref_2 = inputs[4]
inputs_ref_3 = inputs[8]
inputs_ref_4 = inputs[7]
graph_out_0 = __compiled_fn_0(inputs)
temp_list = []
temp_list.append(graph_out_0[0])
inputs_ref_2.grad = graph_out_0[1]
inputs_ref_4.grad = graph_out_0[2]
inputs_ref_3.grad = graph_out_0[3]
inputs_ref_1.grad = graph_out_0[3]
del graph_out_0
return temp_list
```
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang
[ghstack-poisoned]
Replaces all invocations of _linux-build.yml with _linux-build-label.yml
Update on "[WIP][Inductor Intel GPU backend Upstream] Reuse inductor … …test for Intel GPU (PART 1)" backend. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Enable UFMT on test/test_functionalization.py
Normalize remote/local cache names (pytorch#123914)

Summary: Pull Request resolved: pytorch#123914

Test Plan: ci

Differential Revision: D56027380