Tags: expedock/pytorch
Update on "Implement numpy(force=True)" cc mruberry rgommers voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng [ghstack-poisoned]
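The documented semantics of `Tensor.numpy(force=True)` are equivalent to `t.detach().cpu().resolve_conj().resolve_neg().numpy()`. As a minimal sketch of that decision logic without a torch dependency, the `FakeTensor` class below is a hypothetical stand-in, not the real torch internals:

```python
# Hypothetical sketch of the numpy(force=True) decision logic.
# FakeTensor is an illustrative stand-in for torch.Tensor.
class FakeTensor:
    def __init__(self, data, device="cpu", requires_grad=False):
        self.data = list(data)
        self.device = device
        self.requires_grad = requires_grad

    def detach(self):
        # Drop the autograd requirement, keep data and device.
        return FakeTensor(self.data, device=self.device, requires_grad=False)

    def cpu(self):
        # Copy to CPU, keep the grad requirement.
        return FakeTensor(self.data, device="cpu", requires_grad=self.requires_grad)

    def numpy(self, force=False):
        if not force:
            # The strict path refuses tensors that require grad or live off-CPU.
            if self.requires_grad:
                raise RuntimeError("Can't call numpy() on a tensor that requires grad")
            if self.device != "cpu":
                raise TypeError("can't convert a non-CPU tensor to a numpy array")
            return list(self.data)
        # force=True: detach from autograd and copy to CPU first, then convert.
        return list(self.detach().cpu().data)
```

The point of `force=True` is that a conversion which would otherwise raise (grad-requiring or device tensor) silently goes through the detach-and-copy path instead.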
Update on "inductor: only do the conv+bn folding for the freezing path" Re-enable PR: pytorch#109270 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
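The conv+bn folding this PR restricts to the freezing path absorbs batch-norm's affine transform into the convolution's weight and bias, which is only sound when the BN statistics are frozen. A hedged numpy sketch of the algebra (a 1x1 single-channel "conv" for brevity; `fold_conv_bn` is an illustrative name, not the Inductor API):

```python
import numpy as np

# Fold batch norm into the preceding conv:
#   bn(conv(x)) = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
#               = (w * scale) * x + (beta + (b - mean) * scale)
# where scale = gamma / sqrt(var + eps).
def fold_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    scale = gamma / np.sqrt(var + eps)
    return w * scale, beta + (b - mean) * scale

w, b = np.array([2.0]), np.array([1.0])
gamma, beta = np.array([3.0]), np.array([0.5])
mean, var = np.array([1.0]), np.array([4.0])

x = np.array([5.0])
conv = w * x + b                                        # stand-in for the conv
bn = gamma * (conv - mean) / np.sqrt(var + 1e-5) + beta # reference conv+bn
wf, bf = fold_conv_bn(w, b, gamma, beta, mean, var)
folded = wf * x + bf                                    # single folded conv
```

Outside the freezing path the BN statistics can still update during training, so baking them into the weights would change numerics; that is the motivation for gating the folding.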
Update on "[Inductor] Break the loop fusion when node2 depends on node1 mutations"

**Summary**
Fix the issue: pytorch#108963. After this PR, loop fusion should break when node2 depends on node1's buffer mutation. Take the UT as an example:

- Before this PR, the generated code is:

```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp4 = out_ptr0[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        auto tmp5 = tmp4 / tmp2;
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
        out_ptr1[static_cast<long>(0L)] = tmp5;
    }
}
''')
```

- After this PR, the generated code is:

```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
    }
    {
        auto tmp0 = out_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<double>(2.0);
        auto tmp2 = tmp0 / tmp1;
        out_ptr1[static_cast<long>(0L)] = tmp2;
    }
}
''')
```

**Test Plan**
```
python -u -m pytest -s -v test_torchinductor.py -k test_mutations_loop_fusion
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
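The fusion-breaking rule described above can be sketched as a plain dependency check: if the second node reads a buffer that the first node mutates, the two bodies must not share one loop, because the read has to observe the completed mutation (the `atomic_add` in the generated kernel). This is an illustrative sketch, not Inductor's real scheduler API:

```python
# Each "node" is modeled as a dict of buffer-name sets; not the real
# torch._inductor scheduler node type.
def can_fuse(node1, node2):
    """Reject loop fusion when node2 reads a buffer node1 mutates."""
    if node1["mutations"] & node2["reads"]:
        return False  # node2 depends on node1's mutation: break the fusion
    return True

# Mirrors the UT: an index_add that mutates out_ptr0, then a div reading it.
index_add = {"reads": {"in_ptr1", "in_ptr2"}, "writes": set(), "mutations": {"out_ptr0"}}
div = {"reads": {"out_ptr0"}, "writes": {"out_ptr1"}, "mutations": set()}
```

Under this check the div is emitted in its own loop after the index_add completes, matching the "after this PR" codegen.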
Update on "Add 3d Attn Pattern to match HF Whisper" Adds a 3d pattern that improves perf of HF Whisper from 1.3 -> 4.1. We could be matching more generally on 3d, but i'll leave that for another pr. Thanks to drisspg for helping me write the pattern. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
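For context, a hedged numpy sketch of the 3d (batch, seq, head_dim) attention computation such a pattern targets; shapes and the function name are illustrative, not the matched aten graph:

```python
import numpy as np

# Scaled dot-product attention over 3d inputs (B, S, D).
def attention_3d(q, k, v):
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.transpose(0, 2, 1) * scale          # (B, S, S)
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)         # row-wise softmax
    return probs @ v                                    # (B, S, D)

q = k = v = np.ones((2, 4, 8))
out = attention_3d(q, k, v)
```

Fusing this matmul-softmax-matmul chain into one SDPA kernel is what the pattern match enables; matching only the 3d layout (rather than the generic case) is the narrower scope the message describes.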
Update on "Use pretty print for checking no duplicated pattern" The pretty print is faster and more concise because it memoizes objects. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
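The idea of a memoizing pretty-printer can be sketched as follows; this is an illustrative toy, not the real `torch._inductor` pattern-matcher code, and the names (`pretty_print`, `is_duplicate`) are hypothetical:

```python
# Pretty-print nested tuple "patterns", memoizing shared sub-objects so
# repeated structure prints once as a short name. The resulting string
# doubles as a duplicate-detection key.
def pretty_print(obj, memo=None):
    if memo is None:
        memo = {}
    key = id(obj)
    if key in memo:
        return memo[key]          # shared sub-pattern: emit its short name
    if isinstance(obj, tuple):
        memo[key] = f"v{len(memo)}"
        return "(" + ", ".join(pretty_print(x, memo) for x in obj) + ")"
    return repr(obj)

seen = {}
def is_duplicate(pattern):
    s = pretty_print(pattern)
    if s in seen:
        return True
    seen[s] = pattern
    return False
```

Because shared sub-objects collapse to a short memoized name, the printed form stays concise even for patterns with heavy internal sharing, which is what makes the comparison fast.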
[Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (pytorch#107832) Summary: Fixes pytorch#104391 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov Reviewed By: eellison Differential Revision: D49433268 Pulled By: yanboliang
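One standard way to treat equivalent invocations as identical is to bind a call's args/kwargs against the target's signature and apply defaults, so positional, keyword, and default-explicit forms canonicalize to the same key. A hedged sketch using stdlib `inspect` (the `canonical_call` helper and the toy `addmm` are illustrative, not the actual Inductor implementation):

```python
import inspect

# Normalize a call against fn's signature: positional vs keyword args and
# explicitly-passed defaults all map to the same canonical tuple.
def canonical_call(fn, args, kwargs):
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    return (fn.__name__, tuple(sorted(bound.arguments.items())))

# Toy stand-in with the shape of an op that takes keyword-only scalars.
def addmm(input, mat1, mat2, *, beta=1, alpha=1):
    pass
```

With this normalization, `addmm("i", "a", "b")` and `addmm("i", mat1="a", mat2="b", beta=1)` produce identical keys even though the call sites look different, which is the kind of equivalence a pattern matcher needs to see through.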
Update on "Reland "Make adding buffers more like adding parameters (pytorch#104069)" (take 2)" Merged in forward fix from pytorch#106783, not sure whether we are relanding in this state but opening for CI cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305 [ghstack-poisoned]
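The idea behind "buffers like parameters" is that assigning a `Buffer` wrapper to a module attribute should register it automatically, the way `nn.Parameter` assignment already does via `__setattr__` interception. A minimal sketch with illustrative stand-in classes (`Buffer`, `MiniModule` are not torch's real implementation):

```python
# Wrapper marking a tensor-like value as a (possibly non-persistent) buffer.
class Buffer:
    def __init__(self, data, persistent=True):
        self.data = data
        self.persistent = persistent

class MiniModule:
    def __init__(self):
        object.__setattr__(self, "_buffers", {})

    def __setattr__(self, name, value):
        if isinstance(value, Buffer):
            self._buffers[name] = value   # auto-register, like parameters
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        buffers = object.__getattribute__(self, "_buffers")
        if name in buffers:
            return buffers[name]
        raise AttributeError(name)

m = MiniModule()
m.running_mean = Buffer([0.0, 0.0])   # registered without register_buffer()
```

The design point is symmetry: today `self.weight = nn.Parameter(...)` registers a parameter but buffers need an explicit `register_buffer` call; the wrapper removes that asymmetry.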
Use new self-hosted runner group
Add gfx90a target to PYTORCH_ROCM_ARCH
Add gfx90a target to PYTORCH_ROCM_ARCH typo
Update build.sh
Removing patch since already merged via pytorch#106879
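`PYTORCH_ROCM_ARCH` is the build-time environment variable listing ROCm GPU targets; adding `gfx90a` enables the MI200-series architecture. The exact list below is illustrative, not the CI's actual configuration:

```shell
# Semicolon-separated list of AMD GPU targets for the PyTorch ROCm build;
# gfx90a covers the MI200 series (the target the commits above add).
export PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a"
echo "$PYTORCH_ROCM_ARCH"
```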