add channels last support for ChannelShuffle #50247

mingfeima · 2021-01-08T02:27:58Z

Differential Revision: D26007052

facebook-github-bot · 2021-01-08T02:28:09Z

💊 CI failures summary and remediations

As of commit 0c7bd64 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

ghstack-source-id: b5d29fa Pull Request resolved: #50247

mingfeima · 2021-01-08T02:59:43Z

This patch adds native support for the channels last memory format to nn.ChannelShuffle.
Currently nn.ChannelShuffle cannot propagate the input memory format: a channels last input is treated as a non-contiguous NCHW tensor.

From a performance perspective, ChannelShuffle favors NCHW over NHWC, because on NHWC it amounts to a matrix transpose. I added fast paths with a vectorized transpose for groups = 2, 4, 8, 16, ... so that NHWC performs similarly to NCHW.
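
To illustrate why NHWC turns the shuffle into a transpose, here is a minimal sketch in plain Python. It is not the vectorized kernel from this patch, and the name `shuffle_channel_vector` is hypothetical. In NHWC the C channels of one pixel are adjacent in memory; viewing them as a [groups, C // groups] matrix, the shuffle writes out its transpose.

```python
def shuffle_channel_vector(pixel, groups):
    """Shuffle one pixel's channel vector (the innermost NHWC dimension).

    Viewing the C channels as a [groups, C // groups] matrix, the shuffle
    is that matrix transposed and flattened back to length C.
    """
    channels = len(pixel)
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = channels // groups
    # Output channel k * groups + g takes input channel g * per_group + k.
    return [pixel[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]


pixel = list(range(6))  # channels 0..5 of a single pixel
shuffled = shuffle_channel_vector(pixel, groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

In NCHW each channel is a whole H x W plane, so the same permutation moves large contiguous blocks; in NHWC it permutes individual elements within every pixel, which is why a dedicated transpose path matters there.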

Performance Benchmarking

I ran the PyTorch operator benchmark for ChannelShuffle on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, using a single core.

Unit: microseconds (us)
jemalloc is used, since the benchmark has quite a large memory footprint.

The benchmark names are self-explanatory.

short tag:

python -m pt.channel_shuffle_test --omp_num_threads 1 --mkl_num_threads 1

Name (tag: short)	before (us)	after (us)	speedup
channel_shuffle_batch_size2_channels_per_group16_height16_width16_groups2_channel_lastTrue	15.597	5.965	2.61
channel_shuffle_batch_size2_channels_per_group16_height16_width16_groups2_channel_lastFalse	8.208	4.433	1.85
channel_shuffle_batch_size2_channels_per_group32_height32_width32_groups2_channel_lastTrue	102.936	31.916	3.23
channel_shuffle_batch_size2_channels_per_group32_height32_width32_groups2_channel_lastFalse	37.120	28.515	1.30
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups4_channel_lastTrue	427.871	160.150	2.67
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups4_channel_lastFalse	168.129	159.821	1.05
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups4_channel_lastTrue	13326.980	1902.588	7.00
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups4_channel_lastFalse	1977.063	1943.890	1.02
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups8_channel_lastTrue	67471.954	11689.359	5.77
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups8_channel_lastFalse	9851.476	9995.010	0.99
channel_shuffle_batch_size16_channels_per_group64_height64_width64_groups16_channel_lastTrue	299455.756	52310.426	5.72
channel_shuffle_batch_size16_channels_per_group64_height64_width64_groups16_channel_lastFalse	42148.276	42693.034	0.99
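
For reference, the speedup column appears to be the ratio of the before latency to the after latency. A small sketch that recomputes it for two rows copied from the table above (the helper name `speedup` is only for illustration):

```python
# (before_us, after_us) pairs copied from two rows of the short-tag table.
rows = {
    "batch_size2_channels_per_group16_height16_width16_groups2_channel_lastTrue": (15.597, 5.965),
    "batch_size2_channels_per_group16_height16_width16_groups2_channel_lastFalse": (8.208, 4.433),
}


def speedup(before_us, after_us):
    """Ratio of old to new latency; values above 1 mean the patch is faster."""
    return before_us / after_us


for name, (before_us, after_us) in rows.items():
    print(f"{name}: {speedup(before_us, after_us):.2f}x")
```

Read this way, a value slightly below 1 (such as the 0.99 entries) means the run after the patch was marginally slower.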

long tag:

python -m pt.channel_shuffle_test --omp_num_threads 1 --mkl_num_threads 1 --tag_filter long

Name (tag: long)	before (us)	after (us)	speedup
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups4_channel_lastTrue	456.409	162.546	2.81
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups4_channel_lastFalse	172.105	162.034	1.06
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups8_channel_lastTrue	1433.409	328.539	4.36
channel_shuffle_batch_size4_channels_per_group32_height32_width32_groups8_channel_lastFalse	326.948	316.455	1.03
channel_shuffle_batch_size4_channels_per_group32_height32_width64_groups4_channel_lastTrue	1423.315	317.068	4.49
channel_shuffle_batch_size4_channels_per_group32_height32_width64_groups4_channel_lastFalse	325.531	316.596	1.03
channel_shuffle_batch_size4_channels_per_group32_height32_width64_groups8_channel_lastTrue	6571.123	678.229	9.69
channel_shuffle_batch_size4_channels_per_group32_height32_width64_groups8_channel_lastFalse	878.245	646.271	1.36
channel_shuffle_batch_size4_channels_per_group32_height64_width32_groups4_channel_lastTrue	1423.888	316.824	4.49
channel_shuffle_batch_size4_channels_per_group32_height64_width32_groups4_channel_lastFalse	325.770	316.317	1.03
channel_shuffle_batch_size4_channels_per_group32_height64_width32_groups8_channel_lastTrue	6569.904	673.562	9.75
channel_shuffle_batch_size4_channels_per_group32_height64_width32_groups8_channel_lastFalse	879.080	645.087	1.36
channel_shuffle_batch_size4_channels_per_group32_height64_width64_groups4_channel_lastTrue	6501.307	673.263	9.66
channel_shuffle_batch_size4_channels_per_group32_height64_width64_groups4_channel_lastFalse	862.646	645.630	1.34
channel_shuffle_batch_size4_channels_per_group32_height64_width64_groups8_channel_lastTrue	13488.342	2107.488	6.40
channel_shuffle_batch_size4_channels_per_group32_height64_width64_groups8_channel_lastFalse	2019.342	1911.166	1.06
channel_shuffle_batch_size4_channels_per_group64_height32_width32_groups4_channel_lastTrue	1309.758	317.439	4.13
channel_shuffle_batch_size4_channels_per_group64_height32_width32_groups4_channel_lastFalse	328.172	316.336	1.04
channel_shuffle_batch_size4_channels_per_group64_height32_width32_groups8_channel_lastTrue	6578.720	660.705	9.96
channel_shuffle_batch_size4_channels_per_group64_height32_width32_groups8_channel_lastFalse	902.144	645.365	1.40
channel_shuffle_batch_size4_channels_per_group64_height32_width64_groups4_channel_lastTrue	6504.782	675.302	9.63
channel_shuffle_batch_size4_channels_per_group64_height32_width64_groups4_channel_lastFalse	875.443	645.004	1.36
channel_shuffle_batch_size4_channels_per_group64_height32_width64_groups8_channel_lastTrue	13474.683	2154.331	6.25
channel_shuffle_batch_size4_channels_per_group64_height32_width64_groups8_channel_lastFalse	2066.495	1905.396	1.08
channel_shuffle_batch_size4_channels_per_group64_height64_width32_groups4_channel_lastTrue	6479.559	736.724	8.80
channel_shuffle_batch_size4_channels_per_group64_height64_width32_groups4_channel_lastFalse	878.409	649.855	1.35
channel_shuffle_batch_size4_channels_per_group64_height64_width32_groups8_channel_lastTrue	13459.314	2053.354	6.55
channel_shuffle_batch_size4_channels_per_group64_height64_width32_groups8_channel_lastFalse	2060.625	1906.264	1.08
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups4_channel_lastTrue	13346.251	1993.075	6.70
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups4_channel_lastFalse	2055.785	1935.230	1.06
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups8_channel_lastTrue	30438.491	5696.887	5.34
channel_shuffle_batch_size4_channels_per_group64_height64_width64_groups8_channel_lastFalse	4797.431	4646.504	1.03
channel_shuffle_batch_size8_channels_per_group32_height32_width32_groups4_channel_lastTrue	850.684	317.521	2.68
channel_shuffle_batch_size8_channels_per_group32_height32_width32_groups4_channel_lastFalse	328.374	316.374	1.04
channel_shuffle_batch_size8_channels_per_group32_height32_width32_groups8_channel_lastTrue	3125.623	682.702	4.58
channel_shuffle_batch_size8_channels_per_group32_height32_width32_groups8_channel_lastFalse	903.669	646.157	1.40
channel_shuffle_batch_size8_channels_per_group32_height32_width64_groups4_channel_lastTrue	2950.647	672.118	4.39
channel_shuffle_batch_size8_channels_per_group32_height32_width64_groups4_channel_lastFalse	879.654	645.399	1.36
channel_shuffle_batch_size8_channels_per_group32_height32_width64_groups8_channel_lastTrue	13647.319	2020.811	6.75
channel_shuffle_batch_size8_channels_per_group32_height32_width64_groups8_channel_lastFalse	2090.614	1932.686	1.08
channel_shuffle_batch_size8_channels_per_group32_height64_width32_groups4_channel_lastTrue	2875.812	739.154	3.89
channel_shuffle_batch_size8_channels_per_group32_height64_width32_groups4_channel_lastFalse	878.228	648.036	1.36
channel_shuffle_batch_size8_channels_per_group32_height64_width32_groups8_channel_lastTrue	13564.547	2026.004	6.70
channel_shuffle_batch_size8_channels_per_group32_height64_width32_groups8_channel_lastFalse	2093.294	1947.590	1.07
channel_shuffle_batch_size8_channels_per_group32_height64_width64_groups4_channel_lastTrue	13354.635	2034.973	6.56
channel_shuffle_batch_size8_channels_per_group32_height64_width64_groups4_channel_lastFalse	2043.774	1933.158	1.06
channel_shuffle_batch_size8_channels_per_group32_height64_width64_groups8_channel_lastTrue	30528.643	5488.128	5.56
channel_shuffle_batch_size8_channels_per_group32_height64_width64_groups8_channel_lastFalse	4801.484	4640.786	1.03
channel_shuffle_batch_size8_channels_per_group64_height32_width32_groups4_channel_lastTrue	2727.968	644.440	4.23
channel_shuffle_batch_size8_channels_per_group64_height32_width32_groups4_channel_lastFalse	894.660	645.652	1.39
channel_shuffle_batch_size8_channels_per_group64_height32_width32_groups8_channel_lastTrue	13535.079	2131.735	6.35
channel_shuffle_batch_size8_channels_per_group64_height32_width32_groups8_channel_lastFalse	2123.859	1937.743	1.10
channel_shuffle_batch_size8_channels_per_group64_height32_width64_groups4_channel_lastTrue	13382.192	1997.564	6.70
channel_shuffle_batch_size8_channels_per_group64_height32_width64_groups4_channel_lastFalse	2086.732	1920.518	1.09
channel_shuffle_batch_size8_channels_per_group64_height32_width64_groups8_channel_lastTrue	30911.592	5818.629	5.31
channel_shuffle_batch_size8_channels_per_group64_height32_width64_groups8_channel_lastFalse	4819.277	4721.406	1.02
channel_shuffle_batch_size8_channels_per_group64_height64_width32_groups4_channel_lastTrue	13413.047	1930.201	6.95
channel_shuffle_batch_size8_channels_per_group64_height64_width32_groups4_channel_lastFalse	2083.692	1940.887	1.07
channel_shuffle_batch_size8_channels_per_group64_height64_width32_groups8_channel_lastTrue	30626.032	5529.039	5.54
channel_shuffle_batch_size8_channels_per_group64_height64_width32_groups8_channel_lastFalse	4829.248	4657.726	1.04
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups4_channel_lastTrue	30409.803	4680.921	6.50
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups4_channel_lastFalse	4965.459	4666.142	1.06
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups8_channel_lastTrue	65786.889	11573.771	5.68
channel_shuffle_batch_size8_channels_per_group64_height64_width64_groups8_channel_lastFalse	10538.863	10004.425	1.05

mingfeima · 2021-12-17T05:14:41Z

This stack has been rebased, please help review :)

pytorch-probot · 2021-12-21T06:26:06Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/5e3e455094062b404c4867fde37ea083ce3dd205/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`, `ciflow/xla`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.6-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.6-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.6-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

VitalyFedyunin · 2021-12-29T19:55:31Z

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

VitalyFedyunin

Code looks good, please include tests

VitalyFedyunin · 2022-01-05T16:38:09Z

Please rebase stack, landing tool requires it.

mingfeima · 2022-01-07T04:47:40Z

Updated the original test case in test_nn.py:

python test_nn.py TestNN.test_channel_shuffle

The test case now validates all 3 memory formats: torch.contiguous_format, torch.channels_last, torch.channels_last_3d
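
As a rough illustration of what checking several memory formats means, the sketch below shuffles the same logical tensor stored in NCHW order and in NHWC (channels last) order, using plain Python lists rather than PyTorch, and confirms both layouts give the same result. All helper names are hypothetical and are not taken from test_nn.py.

```python
def shuffled_channel(c_out, groups, channels):
    """Input channel that lands at output channel c_out after a channel shuffle."""
    cpg = channels // groups        # channels per group
    k, g = divmod(c_out, groups)    # c_out = k * groups + g
    return g * cpg + k

def shuffle_nchw(buf, n, c, h, w, groups):
    """buf is a flat list in NCHW order: whole H*W planes are moved."""
    plane = h * w
    out = [None] * len(buf)
    for b in range(n):
        for c_out in range(c):
            c_in = shuffled_channel(c_out, groups, c)
            src = (b * c + c_in) * plane
            dst = (b * c + c_out) * plane
            out[dst:dst + plane] = buf[src:src + plane]
    return out

def shuffle_nhwc(buf, n, c, h, w, groups):
    """buf is a flat list in NHWC order: each pixel's C values are permuted."""
    out = [None] * len(buf)
    for pixel in range(n * h * w):
        base = pixel * c
        for c_out in range(c):
            out[base + c_out] = buf[base + shuffled_channel(c_out, groups, c)]
    return out

def nchw_to_nhwc(buf, n, c, h, w):
    """Reorder a flat NCHW buffer into NHWC order (same logical values)."""
    return [buf[((b * c + ch) * h + y) * w + x]
            for b in range(n) for y in range(h) for x in range(w) for ch in range(c)]

n, c, h, w, groups = 2, 6, 2, 3, 3
x_nchw = list(range(n * c * h * w))
x_nhwc = nchw_to_nhwc(x_nchw, n, c, h, w)

# Shuffle in each layout, then compare in a common (NHWC) order.
y_from_nchw = nchw_to_nhwc(shuffle_nchw(x_nchw, n, c, h, w, groups), n, c, h, w)
y_from_nhwc = shuffle_nhwc(x_nhwc, n, c, h, w, groups)
assert y_from_nchw == y_from_nhwc
```

In the NHWC loop every pixel performs a small (groups x channels_per_group) transpose, which is why the channels last path benefits from a vectorized transpose.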

mingfeima · 2022-01-07T04:50:31Z

The stack is newly rebased!

VitalyFedyunin · 2022-01-10T22:44:11Z

Sorry, can you please rebase again? I'm getting a ghstack error: "Internal error: couldn't understand base diff".

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

mingfeima · 2022-01-13T02:12:32Z

This stack is newly rebased, please review! @VitalyFedyunin

Also, #58348 has been updated: the backward ops have been removed to align with e6c3aa3.

VitalyFedyunin · 2022-01-13T20:35:08Z

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

kimishpatel · 2022-02-07T18:39:28Z

@mingfeima why does this PR introduce native_channel_shuffle in native_functions.yaml when it doesn't really need to provide an aten op by that name, which then becomes usable from the torch namespace?

bdhirsh · 2022-02-08T20:33:58Z

aten/src/ATen/native/native_functions.yaml

@@ -3394,6 +3394,11 @@
    CPU: channel_shuffle
    QuantizedCPU: channel_shuffle_quantized_cpu

+- func: native_channel_shuffle(Tensor self, int groups) -> Tensor


Hey @VitalyFedyunin, it looks like this op was added without any Python docs. For the upcoming release, can you either mark the op as private or add docs for it, and cherry-pick the fix into the release branch?

mingfeima · 2022-02-09T01:06:30Z

> @mingfeima why does this PR introduce native_channel_shuffle in native_functions.yaml when it doesn't really need to provide an aten op by that name, which then becomes usable from the torch namespace?

It's because ChannelShuffle has a math implementation, while the native_ prefix indicates a direct kernel-level implementation. Other ops with a math implementation, such as LayerNorm and GroupNorm, have a native_ dispatch as well.

One more reason is that channel_shuffle may take the XNNPACK path (decided at compile time):

#if defined(C10_MOBILE) && defined(USE_XNNPACK)
  // Mobile builds with XNNPACK: hand channels last inputs to XNNPACK when it supports them.
  if (self.is_contiguous(MemoryFormat::ChannelsLast) &&
      xnnpack::use_channel_shuffle(self, groups)) {
    return xnnpack::channel_shuffle(self, groups);
  }
#endif

If I merged native_channel_shuffle into channel_shuffle, I could not handle this case.

Another tricky thing is that the native_ implementation only applies to CPU; CUDA falls through to the CompositeImplicitAutograd dispatch key. It would be cleaner if a CUDA kernel were added as well, but we (Intel folks) usually only touch the CPU part.

Feel free to make a change if there is a better option :)
Actually, we care more about performance, and about making sure this doesn't break MemoryFormat propagation (so that it doesn't introduce a reorder in the following ops).
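
To make the distinction above concrete, here is a minimal Python sketch of the two implementation styles on a single pixel's channel vector, plus the routing as this comment describes it. The function names are hypothetical stand-ins, not ATen functions, and the routing is a simplification (the real XNNPACK check also asks xnnpack::use_channel_shuffle).

```python
def math_channel_shuffle(channels, groups):
    """Composite ("math") style: view as (groups, cpg), transpose, flatten."""
    cpg = len(channels) // groups  # channels per group
    grouped = [channels[g * cpg:(g + 1) * cpg] for g in range(groups)]
    return [value for column in zip(*grouped) for value in column]

def kernel_channel_shuffle(channels, groups):
    """Direct kernel style: write each element to its final slot in one pass."""
    cpg = len(channels) // groups
    out = [None] * len(channels)
    for g in range(groups):
        for k in range(cpg):
            out[k * groups + g] = channels[g * cpg + k]
    return out

def pick_path(device, mobile_xnnpack_build=False, channels_last=False):
    """Hypothetical routing, assumed from the description in this comment."""
    if mobile_xnnpack_build and channels_last:
        return "xnnpack"          # chosen at compile time on mobile builds
    if device == "cpu":
        return "native kernel"    # CPU has a dedicated kernel
    return "composite math"       # e.g. CUDA uses the composite fallback

x = list(range(8))
# Both styles must agree on the result.
assert math_channel_shuffle(x, 2) == kernel_channel_shuffle(x, 2)
```

Both functions compute the same permutation; the difference is that the composite style is built from generic reshape and transpose steps, while the kernel style can be specialized per memory format.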

VitalyFedyunin · 2022-02-09T19:21:14Z

@mingfeima could you please submit a PR with Python docstrings for this function?

add channels last support for ChannelShuffle

38637f1

[ghstack-poisoned]

This was referenced Jan 8, 2021

add channels last for MaxPool2d #48917

Closed

add channels last support for AvgPool2d on CPU #48918

Closed

facebook-github-bot added the cla signed label Jan 8, 2021

mingfeima mentioned this pull request Jan 8, 2021

optimize channels last for BatchNorm2d on CPU #48919

Closed

This was referenced Jan 8, 2021

add channels last for AdapativeMaxPool2d #48920

Closed

add channels last support for thnn_conv2d (non-dilated) #49582

Closed

add channels last for GroupNorm #49821

Closed

add channel last support for MaxUnpool2d #49984

Closed

pytorchbot added the open source label Jan 8, 2021

mingfeima added a commit that referenced this pull request Jan 8, 2021

add channels last support for ChannelShuffle

d2691e0

ghstack-source-id: b5d29fa Pull Request resolved: #50247

mingfeima mentioned this pull request Jan 15, 2021

add channels last support for PixelShuffle and PixelUnshuffle #50573

Closed

mingfeima added 5 commits January 18, 2021 10:01

Update on "add channels last support for ChannelShuffle"

b4ab2ac

[ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

5d6afd4

[ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

b4a7348

[ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

ba1bbc6

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

cff616f

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

mingfeima mentioned this pull request Jan 27, 2021

add channels last support for ConvTranspose2d #51185

Closed

mingfeima added 10 commits February 5, 2021 11:02

Update on "add channels last support for ChannelShuffle"

026088c

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

33c6a9c

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

5febf38

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

a278b84

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

f6873f9

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

53473c2

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

29dce2a

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

af05708

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

fafc7a3

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

61660ff

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

mingfeima mentioned this pull request Dec 20, 2021

add channels last support for thnn_conv2d (non-dilated) #68101

Closed

Update on "add channels last support for ChannelShuffle"

5e3e455

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

e2fd29d

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

VitalyFedyunin reviewed Dec 29, 2021


VitalyFedyunin approved these changes Jan 3, 2022


Update on "add channels last support for ChannelShuffle"

17d65df

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

mingfeima mentioned this pull request Jan 5, 2022

add channels last support for slow_conv_dilated2d #70665

Closed

mingfeima mentioned this pull request Jan 6, 2022

add channels last support for slow_conv_transpose2d #70897

Closed

Update on "add channels last support for ChannelShuffle"

fa0682a

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

Update on "add channels last support for ChannelShuffle"

0c7bd64

Differential Revision: [D26007052](https://our.internmc.facebook.com/intern/diff/D26007052) [ghstack-poisoned]

facebook-github-bot closed this in 054b90f Jan 14, 2022

facebook-github-bot deleted the gh/mingfeima/10/head branch January 18, 2022 15:15

bdhirsh reviewed Feb 8, 2022
