Automatic Layout Management #20718
Hey @mk-61, Thanks for submitting the PR
CI supported jobs: [miscellaneous, windows-gpu, windows-cpu, sanity, unix-gpu, centos-gpu, website, clang, centos-cpu, unix-cpu, edge] Note:

@mxnet-bot run ci [macosx-x86_64, centos-gpu, unix-gpu, website, windows-gpu]

Jenkins CI successfully triggered : [centos-gpu, unix-gpu, windows-gpu, website]
5a54053 to 0ddf20a

@mxnet-bot run ci [unix-cpu]

Jenkins CI successfully triggered : [unix-cpu]

@mxnet-bot run ci [sanity]

Jenkins CI successfully triggered : [sanity]

@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu, unix-gpu, windows-cpu, windows-gpu]

Jenkins CI successfully triggered : [centos-cpu, windows-cpu, unix-cpu, windows-gpu, centos-gpu, unix-gpu]

@mxnet-bot run ci [clang, edge, miscellaneous, website]

Jenkins CI successfully triggered : [edge, clang, miscellaneous, website]

@mxnet-bot run ci [centos-cpu]

Jenkins CI successfully triggered : [centos-cpu]

@mxnet-bot run ci [unix-cpu]

Jenkins CI successfully triggered : [unix-cpu]

@mxnet-bot run ci [unix-cpu]

Jenkins CI successfully triggered : [unix-cpu]

ptrendx left a comment
LGTM, thanks for addressing the comments :-).
Automatic Layout Management improves performance when used together with AMP by converting parts of the computational graph to NHWC layouts.
Description
Target parts of the graph (those containing convolution / deconvolution ops) are automatically converted to NHWC layout by surrounding them with transposes.
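To illustrate the idea of surrounding target ops with transposes, here is a minimal pure-Python sketch on a linear chain of op names. It is not the code from this PR: `TARGET_OPS`, `permute` and `wrap_targets` are hypothetical names, and treating only convolution / deconvolution as targets is an assumption taken from the sentence above. The axis permutations themselves are the standard ones for 4-D tensors.

```python
# Standard axis permutations between the two 4-D layouts.
NCHW_TO_NHWC = (0, 2, 3, 1)
NHWC_TO_NCHW = (0, 3, 1, 2)

# Assumption: only these ops are converted (hypothetical set for this sketch).
TARGET_OPS = {"Convolution", "Deconvolution"}


def permute(shape, axes):
    """Return the shape a transpose with `axes` would produce."""
    return tuple(shape[a] for a in axes)


def wrap_targets(ops):
    """Surround each run of target ops in a linear chain with transposes."""
    out = []
    in_nhwc = False
    for op in ops:
        is_target = op in TARGET_OPS
        if is_target and not in_nhwc:
            # Entering a target region: switch the data to NHWC.
            out.append(("transpose", NCHW_TO_NHWC))
            in_nhwc = True
        elif not is_target and in_nhwc:
            # Leaving a target region: switch back to NCHW.
            out.append(("transpose", NHWC_TO_NCHW))
            in_nhwc = False
        out.append((op, None))
    if in_nhwc:
        # The chain ended inside a target region; restore NCHW at the end.
        out.append(("transpose", NHWC_TO_NCHW))
    return out
```

Two consecutive convolutions share one pair of transposes in this sketch, which is why converting whole graph parts rather than single ops keeps the transpose overhead low.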
This functionality was originally implemented by Dawid Tracz in the NVIDIA container. I later changed the algorithm to a single pass of DFSVisit.
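A rough sketch of what a single-pass, post-order DFS layout conversion can look like, in plain Python rather than the C++ graph pass in this PR. Everything here is hypothetical and for illustration only: the `Node` class, `dfs_visit`, `convert_layout`, and the rule that only convolution / deconvolution want NHWC. Weights and graph outputs are ignored to keep it short.

```python
NCHW_TO_NHWC = (0, 2, 3, 1)
NHWC_TO_NCHW = (0, 3, 1, 2)
TARGET_OPS = {"Convolution", "Deconvolution"}  # assumed target ops


class Node:
    """Toy graph node: an op name, its input nodes, optional transpose axes."""

    def __init__(self, op, inputs=(), axes=None):
        self.op = op
        self.inputs = list(inputs)
        self.axes = axes


def dfs_visit(heads, fvisit):
    """Post-order DFS: every input is visited before the node itself."""
    seen = set()

    def go(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            go(inp)
        fvisit(node)

    for head in heads:
        go(head)


def convert_layout(head):
    """Insert transposes on edges where producer and consumer layouts differ."""
    layout = {}  # id(node) -> layout of that node's output

    def visit(node):
        want = "NHWC" if node.op in TARGET_OPS else "NCHW"
        for i, inp in enumerate(node.inputs):
            # Inputs were already visited, so their layouts are known.
            if layout[id(inp)] != want:
                axes = NCHW_TO_NHWC if want == "NHWC" else NHWC_TO_NCHW
                t = Node("transpose", [inp], axes)
                layout[id(t)] = want
                node.inputs[i] = t
        layout[id(node)] = want

    dfs_visit([head], visit)
    return head
```

Because inputs are always visited first, each node can decide on its transposes with only local information, so one traversal is enough in this simplified setting.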
Checklist
Essentials
Changes
Comments
PR #20635 is also required to get the expected performance improvements.