CUDA backend for the DNN module #14827
Merged. 129 commits, all authored by YashasSamaga.

Commits
64716ab  stub cuda4dnn design
20f4f2b  minor fixes for tests and doxygen
d8f49fd  add csl public api directory to module headers
b9edc00  add low-level CSL components
2f9afc8  add high-level CSL components
adad256  integrate csl::Tensor into backbone code
8635b5e  switch to CPU iff unsupported; otherwise, fail on error
6615b7c  add fully connected layer
cd0234f  add softmax layer
e3e5cc4  add activation layers
eb69bf7  support arbitrary rank TensorDescriptor
f24ad2c  pass input wrappers to `initCUDA()`
a5ae407  add 1d/2d/3d-convolution
bb984df  add pooling layer
16db28b  reorganize and refactor code
883968e  fixes for gcc, clang and doxygen; remove cxx14/17 code
99fe393  add blank_layer
35a1d8f  add LRN layer
84067f0  add rounding modes for pooling layer
e203703  split tensor.hpp into tensor.hpp and tensor_ops.hpp
4c8d23b  add concat layer
2ab9bdd  add scale layer
b12e4fc  add batch normalization layer
4ae2d35  split math.cu into activations.cu and math.hpp
cf34c65  add eltwise layer
ed87d45  add flatten layer
9261242  add tensor transform api
7db9e6e  add asymmetric padding support for convolution layer
205c191  fix rebase issues
e04e463  add reshape layer
f120bd0  add permute layer
bf114d7  add padding support for concat layer
0ab06a9  refactor and reorganize code
5d2d336  add normalize layer
1619f0b  optimize bias addition in scale layer
ed16c7e  add prior box layer
76eaf7b  fix and optimize normalize layer
ebf5cfb  add asymmetric padding support for pooling layer
6fc4ce0  add event API
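The event API belongs to the PR's CUDA Support Library (CSL). As a rough illustration of what such a wrapper involves, here is a minimal RAII sketch over `cudaEvent_t`; the class and member names are hypothetical, not the PR's actual CSL types:

```cpp
#include <cuda_runtime.h>

// Minimal RAII wrapper over cudaEvent_t, sketching the idea behind a
// CSL-style event API (names here are hypothetical, not the PR's).
class Event {
public:
    Event()  { cudaEventCreate(&event_); }
    ~Event() { cudaEventDestroy(event_); }
    Event(const Event&) = delete;
    Event& operator=(const Event&) = delete;

    // Record the event on a stream; block the host until it fires.
    void record(cudaStream_t stream) { cudaEventRecord(event_, stream); }
    void synchronize() const { cudaEventSynchronize(event_); }

private:
    cudaEvent_t event_;
};
```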
699867e  improve pooling performance for some padding scenarios
4791852  avoid over-allocation of compute resources to kernels
170fc3e  improve prior box performance
8f664f6  enable layer fusion
00557bd  add const layer
0f21706  add resize layer
c850cb5  add slice layer
085e632  add padding layer
1dfc409  add deconvolution layer
39cc3a7  fix channelwise ReLU initialization
fd1acaf  add vector traits
ad0e4c6  add vectorized versions of relu, clipped_relu, power
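The vectorization work in this stretch of commits loads and stores several elements per thread through CUDA's built-in vector types. A minimal sketch of the idea for ReLU (illustrative only, not the PR's kernel; it assumes 16-byte-aligned buffers and an element count divisible by four, which is what the alignment checks elsewhere in the series guard):

```cpp
// Each thread handles four floats at once via float4, cutting the
// number of memory transactions on aligned data. `n4` is the element
// count divided by four; `slope` gives leaky-ReLU behavior (0 = ReLU).
__global__ void relu_vec4(float4* output, const float4* input,
                          unsigned int n4, float slope)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n4)
        return;

    float4 v = input[i];
    v.x = v.x >= 0.f ? v.x : slope * v.x;
    v.y = v.y >= 0.f ? v.y : slope * v.y;
    v.z = v.z >= 0.f ? v.z : slope * v.z;
    v.w = v.w >= 0.f ? v.w : slope * v.w;
    output[i] = v;
}
```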
9414e0b  add vectorized concat kernels
1357a9f  improve concat_with_offsets performance
c8eee86  vectorize scale and bias kernels
3e78b21  add support for multi-billion element tensors
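Supporting tensors beyond 2^31 elements is mostly a matter of index arithmetic: a 32-bit `int` overflows once a tensor crosses roughly 2.1 billion elements. A sketch of the usual remedy, a grid-stride loop with 64-bit indices (illustrative, not the PR's code):

```cpp
#include <cstddef>

// 64-bit grid-stride loop: `i` and `stride` must be size_t, since an
// int index would overflow for element counts beyond INT_MAX.
__global__ void fill(float* data, std::size_t n, float value)
{
    std::size_t stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
    for (std::size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = value;
}
```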
ca95f5c  vectorize prior box kernels
b678bff  fix address alignment check
986b466  improve bias addition performance of conv/deconv/fc layers
b0799b1  restructure code for supporting multiple targets
ab0b196  add DNN_TARGET_CUDA_FP64
6df05bf  add DNN_TARGET_FP16
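The targets are renamed later in the series (see the "drop DNN_TARGET_CUDA_FP64" and "rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA" commits below); as merged, selection goes through the usual Net configuration calls:

```cpp
#include <opencv2/dnn.hpp>

int main()
{
    // "model.onnx" is a placeholder for any format readNet accepts.
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");

    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);

    // Full-precision target (DNN_TARGET_CUDA as merged), or the
    // half-precision target this commit introduces (requires a GPU
    // with usable FP16 arithmetic):
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

    // net.forward(...) now runs on the CUDA backend.
    return 0;
}
```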
3957460  improve vectorization
052b8e7  add region layer
1abec63  improve tensor API, add dynamic ranks
977cac2  fix parametric relu activation
35a8e89  add squeeze/unsqueeze tensor API
b30ab12  add reorg layer
bbfb5c3  optimize permute and enable 2d permute
b6da715  enable 1d and 2d slice
9d52163  add split layer
00f55dc  add shuffle channel layer
800d2a9  allow tensors of different ranks in reshape primitive
badd916  patch SliceOp to allow Crop Layer
00a4242  allow extra shape inputs in reshape layer
085fd05  use `std::move_backward` instead of `std::move` for insert in resizab…
93ca2bc  improve workspace management
399c83c  add spatial LRN
3ff54f1  add nms (cpu) to region layer
052e25f  add max pooling with argmax (and a fix to limits.hpp)
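Max unpooling (the next commit) needs to know where each maximum came from, so the pooling kernel also records the index of the winning input element. A simplified single-channel 2D sketch of the idea; the PR's kernel is generic over rank and data type, and the running maximum must start at the type's lowest value, the kind of constant a device-side limits.hpp supplies:

```cpp
#include <cfloat>

// One thread per output element: along with each pooled maximum,
// store the flattened input offset of the winning element so a later
// max-unpooling pass can scatter values back to the right location.
__global__ void max_pool_argmax(float* output, int* indices, const float* input,
                                int in_h, int in_w, int out_h, int out_w,
                                int window, int stride, int pad)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= out_h * out_w)
        return;

    int oy = idx / out_w, ox = idx % out_w;
    float best = -FLT_MAX;  // running maximum starts at the lowest float
    int best_at = -1;

    for (int ky = 0; ky < window; ky++)
        for (int kx = 0; kx < window; kx++)
        {
            int iy = oy * stride - pad + ky;
            int ix = ox * stride - pad + kx;
            if (iy < 0 || iy >= in_h || ix < 0 || ix >= in_w)
                continue;  // padded region does not participate
            float v = input[iy * in_w + ix];
            if (v > best) { best = v; best_at = iy * in_w + ix; }
        }

    output[idx] = best;
    indices[idx] = best_at;
}
```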
2803b91  add max unpooling layer
db733fc  refactoring, fixes and many optimizations
7ee7025  drop DNN_TARGET_CUDA_FP64
f8249ee  rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA
f501821  update supportBackend to be more rigorous
1c52f4e  remove stray include from preventing non-cuda build
757245b  include op_cuda.hpp outside condition #if
ba49b27  fix gcc errors
52d7740  increase max. tensor rank limit to six
bde84d9  add Interp layer
b02811e  drop custom layers; use BackendNode
1ee54e8  vectorize activation kernels
3f3d0af  fixes for gcc
14a79c5  remove wrong assertion
37c7026  fix broken assertion in unpooling primitive
afa7c2b  fix build errors in non-CUDA build
c44aefd  completely remove workspace from public API
db3f4f7  fix permute layer
9725413  enable accuracy and perf. tests for DNN_TARGET_CUDA
47bbd14  add asynchronous forward
780eeaf  vectorize eltwise ops
0ffc1fa  vectorize fill kernel
f93435f  fixes for gcc
6a23810  remove CSL headers from public API
d66f72b  remove csl header source group from cmake
4ed600c  update min. cudnn version in cmake
91da82f  add numerically stable FP32 log1pexp
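log1pexp(x) = log(1 + e^x) shows up in activations such as softplus and mish. Evaluating it naively overflows expf once x passes roughly 88 and loses precision elsewhere; the standard remedy is a piecewise evaluation. A sketch with illustrative FP32 cutoffs (the PR's exact thresholds may differ):

```cpp
#include <cmath>

// Piecewise-stable log(1 + exp(x)) for FP32. The branch thresholds are
// illustrative; the technique is standard (cf. Maechler's log1pexp note).
__device__ float log1pexp(float x)
{
    if (x <= -20.0f)
        return expf(x);          // log1p(e) == e to FP32 precision for tiny e
    if (x <= 9.0f)
        return log1pf(expf(x));  // direct evaluation is safe here
    if (x <= 15.0f)
        return x + expf(-x);     // log(1+e^x) = x + log1p(e^-x) == x + e^-x
    return x;                    // e^-x underflows; the answer is x itself
}
```

The half-precision variant added later in the series ("add numerically stable half precision log1pexp") can follow the same shape, typically by widening to float for the intermediate math.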
027c0d6  refactor code
ec5342d  add FP16 specialization to cudnn based tensor addition
66de1f2  vectorize scale1 and bias1 + minor refactoring
bd8a84e  fix doxygen build
94f7bad  fix invalid alignment assertion
c681da6  clear backend wrappers before allocateLayers
cdb53f6  ignore memory lock failures
8732d4f  do not allocate internal blobs
ca81308  integrate NVTX
c22d1e8  add numerically stable half precision log1pexp
0d780d7  fix indentation, following coding style, improve docs
9578fc2  remove accidental modification of IE code
8b8f780  Revert "add asynchronous forward"
9c75b0b  [cmake] throw error for unsupported CC versions
4752d7b  fix rebase issues
71829dc  add more docs, refactor code, fix bugs
7cf6874  minor refactoring and fixes
7fc76a4  resolve warnings/errors from clang
2818f1c  remove haveCUDA() checks from supportBackend()
a97c6c5  remove NVTX integration
886b01c  changes based on review comments
4536219  avoid exception when no CUDA device is present
5eb7fa5  add color code for CUDA in Net::dump
Changes shown are from 1 commit: 6df05bf6145092c3bf846e4d54f3abcac7b3ef75 (add DNN_TARGET_FP16).