Thanks to visit codestin.com
Credit goes to github.com

Skip to content

syedshazli/cuda-convolution-from-first-principles

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

113 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Convolution in CUDA From First Principles

Implementing a convolution layer in CUDA, based on Pytorch nn.conv2D.

Performance

On a A30 GPU, a unoptimized 4000 by 4000 convolution kernel runs in about 0.5 seconds. On a Intel Xeon Gold CPU, a 4000 by 4000 convolution runs in about 1.5 seconds.

Repository Structure

Code for convolution in cuda is under src, in which you'll find CPU implementations of convolution and a GPU implementation. Performance and correctness testing is under the 'Tests' directory, while scripts for running the tests is under the 'Scripts' directory.

About

Implementing a convolution layer in CUDA

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors