add channels last support for ChannelShuffle #50247
Conversation
[ghstack-poisoned]
💊 CI failures summary (as of commit 0c7bd64, from the Dr. CI page): 💚 Looks good so far! There are no failures yet.
This patch adds native support for the channels last memory format to ChannelShuffle.

From a performance perspective, ChannelShuffle favors NCHW over NHWC, since on NHWC it would end up doing a matrix transpose. I added fast paths with vectorized transpose for groups = 2, 4, 8, 16, ... so that NHWC has similar perf to NCHW.

Performance benchmarking: I ran the PyTorch operator benchmark for ChannelShuffle on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, single core. Unit: us. The benchmark naming is self-explanatory.

short tag:

```
python -m pt.channel_shuffle_test --omp_num_threads 1 --mkl_num_threads 1
```

long tag:

```
python -m pt.channel_shuffle_test --omp_num_threads 1 --mkl_num_threads 1 --tag_filter long
```
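For context, the channel shuffle operation itself can be sketched in a few lines of PyTorch. This is a reference sketch of the op's semantics (reshape, transpose the group dimensions, flatten back), not the vectorized C++ kernel this patch adds:

```python
import torch

def channel_shuffle_reference(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Reshape (N, C, H, W) -> (N, groups, C // groups, H, W),
    # swap the two channel sub-dimensions, then flatten back to (N, C, H, W).
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

x = torch.arange(8.0).view(1, 4, 1, 2)
out = channel_shuffle_reference(x, groups=2)
# With C=4, groups=2, channel order 0,1,2,3 becomes 0,2,1,3.
assert torch.equal(out, torch.channel_shuffle(x, 2))
```

On an NHWC (channels last) layout the channels are innermost in memory, which is why this reshuffle turns into a transpose of contiguous data and motivates the vectorized fast paths.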
Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]
This stack has been rebased, please help review :)
⚛️ CI Flow Status

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Code looks good; please include tests.
Please rebase the stack, the landing tool requires it.
Updated the original test case in test_nn.py: `python test_nn.py TestNN.test_channel_shuffle`. The test case validates all 3 memory formats.
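A minimal sketch of the kind of memory-format check such a test performs (a hedged illustration using the public `torch.channel_shuffle` API, not the actual test_nn.py code):

```python
import torch

x = torch.randn(2, 8, 4, 4)
ref = torch.channel_shuffle(x, 2)

# The same op on a channels last input should produce identical values
# and preserve the channels last layout on output.
xcl = x.contiguous(memory_format=torch.channels_last)
out = torch.channel_shuffle(xcl, 2)
assert torch.equal(out, ref)
assert out.is_contiguous(memory_format=torch.channels_last)
```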
The stack is newly rebased!
Sorry, can you please rebase again? I'm getting ghstack errors.
This stack is newly rebased, please review! @VitalyFedyunin Also #58348 has been updated: the backward OPs are removed to align with e6c3aa3.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@mingfeima why does this PR introduce `native_channel_shuffle`?
```
@@ -3394,6 +3394,11 @@
  CPU: channel_shuffle
  QuantizedCPU: channel_shuffle_quantized_cpu

- func: native_channel_shuffle(Tensor self, int groups) -> Tensor
```
Hey @VitalyFedyunin, it looks like this op was added without any python docs. For the upcoming release, can you either mark the op as private or add docs for it, and cherry-pick into the release branch?
One reason is that `channel_shuffle` could possibly run into XNNPACK (decided at compile time):

```cpp
#if defined(C10_MOBILE) && defined(USE_XNNPACK)
  if (self.is_contiguous(MemoryFormat::ChannelsLast) &&
      xnnpack::use_channel_shuffle(self, groups)) {
    return xnnpack::channel_shuffle(self, groups);
  }
#endif
```

If I merged `native_channel_shuffle` into `channel_shuffle`, I could not handle this case. Another tricky thing is that the native_ implementation only applies to CPU; CUDA falls into a different dispatch key.

Feel free to make a change if better options can be made :)
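From the Python side, the split described above means both entry points should agree on non-mobile CPU builds. A hedged sketch (assuming `native_channel_shuffle` is exposed at the top level of `torch`, as the docs discussion above implies):

```python
import torch

x = torch.randn(1, 6, 2, 2).contiguous(memory_format=torch.channels_last)
a = torch.channel_shuffle(x, 3)         # public op; may route to XNNPACK on mobile builds
b = torch.native_channel_shuffle(x, 3)  # the native kernel entry point
assert torch.equal(a, b)
```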
@mingfeima could you please submit a PR with python docstrings for this function.
Stack from ghstack:
Differential Revision: D26007052