
Tags: expedock/pytorch

ciflow/trunk/109664

update vision commit hash

ciflow/trunk/109636

Update on "Implement numpy(force=True)"

cc mruberry rgommers voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

[ghstack-poisoned]

ciflow/trunk/109626

add sharded tensor test with empty shard

ciflow/trunk/109587

Update on "inductor: only do the conv+bn folding for the freezing path"


Re-enable PR: pytorch#109270
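
For context, a minimal sketch of the conv+bn folding being restricted to the freezing path: with frozen weights, batch-norm's affine transform can be folded into the convolution's weight and bias ahead of time (names here are illustrative, not Inductor's internals).

```
import torch

def fold_conv_bn(conv_w, conv_b, bn_mean, bn_var, bn_w, bn_b, eps=1e-5):
    # y = bn_w * (conv(x) - mean) / sqrt(var + eps) + bn_b
    # folds into a per-output-channel scale on a 4d conv weight and its bias
    scale = bn_w / torch.sqrt(bn_var + eps)
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)
    folded_b = (conv_b - bn_mean) * scale + bn_b
    return folded_w, folded_b
```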

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]

ciflow/trunk/109172

Update on "[Inductor] Break the loop fusion when node2 depends on nod…

…e1 mutations"


**Summary**
Fixes pytorch#108963. After this PR, loop fusion breaks when node2 depends on node1's buffer mutation. Take the unit test as an example (a minimal Python sketch of the pattern follows the test plan below):

- Before this PR, the generated code is:
```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp4 = out_ptr0[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        auto tmp5 = tmp4 / tmp2;
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
        out_ptr1[static_cast<long>(0L)] = tmp5;
    }
}
''')
```

- After this PR, the generated code is:
```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
    }
    {
        auto tmp0 = out_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<double>(2.0);
        auto tmp2 = tmp0 / tmp1;
        out_ptr1[static_cast<long>(0L)] = tmp2;
    }
}
''')
```


**Test Plan**
```
python -u -m pytest -s -v test_torchinductor.py -k test_mutations_loop_fusion
```
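
For reference, a minimal Python sketch (hypothetical, mirroring what `test_mutations_loop_fusion` exercises) of the pattern: node2 (the division) must observe node1's mutation of the buffer, so the two loops cannot be fused.

```
import torch

def fn(a, index, b):
    out = a.clone()
    out.index_add_(0, index, b * 2.0)  # node1: mutates `out`
    return out / 2.0                   # node2: reads the mutated `out`

compiled = torch.compile(fn)
```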

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]

ciflow/trunk/109156

Update on "Add 3d Attn Pattern to match HF Whisper"


Adds a 3d pattern that improves HF Whisper's speedup from 1.3x to 4.1x. We could match more generally on 3d, but I'll leave that for another PR.

Thanks to drisspg for helping me write the pattern.
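
For context, a minimal sketch (hypothetical shapes) of the 3d attention computation such a pattern targets; the matcher can rewrite this into a single fused `scaled_dot_product_attention` call:

```
import torch

def attn_3d(q, k, v, scale):
    # (batch * heads, seq, head_dim) tensors, i.e. 3d rather than 4d
    scores = torch.bmm(q, k.transpose(-2, -1)) * scale
    return torch.bmm(scores.softmax(dim=-1), v)

q = k = v = torch.randn(8, 128, 64)
out = attn_3d(q, k, v, 64 ** -0.5)
```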

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]

ciflow/trunk/109066

Update on "Use pretty print for checking no duplicated pattern"


Pretty printing is faster and more concise because it memoizes objects.
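
A minimal sketch of the idea: identical patterns render to identical strings, so duplicate detection is a dictionary lookup over the printed form (`repr` stands in here for Inductor's memoizing pretty-printer).

```
def check_no_duplicates(named_patterns):
    seen = {}
    for name, pattern in named_patterns:
        key = repr(pattern)  # stand-in for the memoized pretty print
        if key in seen:
            raise AssertionError(f"{name} duplicates {seen[key]}")
        seen[key] = name

check_no_duplicates([("p1", ("add", "x", "y")), ("p2", ("mul", "x", "y"))])
```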

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]

ciflow/trunk/107832

[Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (pytorch#107832)

Summary:
Fixes pytorch#104391
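
For context, a minimal sketch (hypothetical helper, not the matcher's actual API) of treating equivalent invocations as one pattern: binding arguments to the signature normalizes positional vs. keyword calls, so `f(1, y=2)` and `f(x=1, y=2)` produce the same key.

```
import inspect

def normalized_call_key(fn, args, kwargs):
    # bind + apply_defaults canonicalizes how the call was spelled
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    return (fn.__qualname__, tuple(sorted(bound.arguments.items())))

def f(x, y=0):
    return x + y

assert normalized_call_key(f, (1,), {"y": 2}) == \
       normalized_call_key(f, (), {"x": 1, "y": 2})
```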


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov


Reviewed By: eellison

Differential Revision: D49433268

Pulled By: yanboliang

ciflow/trunk/106981

Update on "Reland "Make adding buffers more like adding parameters (p…

…ytorch#104069)" (take pytorch#2)"

Merged in the forward fix from pytorch#106783; not sure whether we are relanding in this state, but opening for CI.
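
For context, a minimal sketch of the asymmetry the original PR addresses: parameters can be assigned directly, while buffers require `register_buffer` (the PR adds a buffer wrapper enabling direct assignment; the exact name is assumed here).

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3))       # direct assignment
        self.register_buffer("running", torch.zeros(3))  # buffers today
        # with the PR: self.running = nn.Buffer(torch.zeros(3))  # assumed name
```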



cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305

[ghstack-poisoned]

ciflow/trunk/105980

Use new self-hosted runner group

Add gfx90a target to PYTORCH_ROCM_ARCH

Add gfx90a target to PYTORCH_ROCM_ARCH

typo

Update build.sh

Removing patch since it was already merged via pytorch#106879