Seryilmaz/fused dropout softmax #985

Merged: ptrblck merged 7 commits into NVIDIA:master from seryilmaz:seryilmaz/fused_dropout_softmax on Dec 4, 2020

Conversation

@seryilmaz (Contributor) commented Nov 4, 2020

Fuses dropout and softmax in both fprop and bprop. For now this is done only for the additive-mask case, which BERT uses. The following changes are made:

  1. To avoid writing softmax outputs in fprop, the dgrad kernel recomputes the softmax results.
  2. Dropout uses the PyTorch Philox random number generator instead of cuRAND, since it is faster.
  3. Kernels are vectorized when the sequence length is large enough and is a multiple of 4.
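
The recompute idea in item 1 can be illustrated with a minimal single-row sketch in plain Python. This is not the CUDA code from this PR: the function names `fused_forward` and `fused_backward` are hypothetical, and Python's `random.Random` is only a stand-in for the Philox generator. It assumes the usual inverted-dropout scaling by `1 / (1 - p)`. The point it shows is that the forward pass keeps only the inputs and the dropout keep-mask, and the backward pass rebuilds the softmax row before applying the softmax gradient.

```python
import math
import random


def _masked_softmax(scores, additive_mask):
    """Numerically stable softmax of (scores + additive_mask) for one row."""
    shifted = [s + m for s, m in zip(scores, additive_mask)]
    row_max = max(shifted)
    exps = [math.exp(v - row_max) for v in shifted]
    denom = sum(exps)
    return [e / denom for e in exps]


def fused_forward(scores, additive_mask, p, rng):
    """Additive-masked softmax followed by dropout (hypothetical sketch).

    Returns the output row and the keep-mask. The softmax probabilities
    are intentionally not returned, so nothing forces them to be stored.
    """
    probs = _masked_softmax(scores, additive_mask)
    # Assumption: rng stands in for the Philox generator used on the GPU.
    keep = [rng.random() >= p for _ in scores]
    scale = 1.0 / (1.0 - p)
    out = [y * scale if k else 0.0 for y, k in zip(probs, keep)]
    return out, keep


def fused_backward(grad_out, scores, additive_mask, keep, p):
    """Gradient w.r.t. scores, recomputing softmax instead of loading it."""
    probs = _masked_softmax(scores, additive_mask)  # recomputed, not saved
    scale = 1.0 / (1.0 - p)
    # Dropout backward: pass gradient only through kept elements.
    dy = [g * scale if k else 0.0 for g, k in zip(grad_out, keep)]
    # Softmax backward: dx_i = y_i * (dy_i - sum_j dy_j * y_j).
    dot = sum(d * y for d, y in zip(dy, probs))
    return [y * (d - dot) for y, d in zip(probs, dy)]
```

The trade-off in this sketch is one extra softmax evaluation in the backward pass in exchange for not storing a softmax-sized tensor between the two passes.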

set random offset

fix typos in softmax.h

write softmax results, needed for dgrad

fix typo

typos

typo

typo

set types

remove some includes

remove nullptr, use reinterpret cast

data type instead of pointer for data_ptr template argument

fix pointer increments in vector copies

changes for recomputing softmax in dgrad

typo

fixes for recomputing softmax in dgrad

change backward function in fused dropout module

typo

change function name

typo

typo

typo

typo

don't return pad_mask in fprop function

add stream

fix softmax dgrad summation

save more memory by removing softmax output

remove softmax results from cpp file

typo

typo

debugging print

typo

remove debugging stuff

some vectorization optimizations

vectorize both fprop and bprop

typo

try no vectorization

remove float4 for dropout

Revert "remove float4 for dropout"

This reverts commit 59894b4.

Revert "try no vectorization"

This reverts commit b2ef02f.

cleanup

typo

typo

typo

typo

use null tensor for backward

typo

add specialization for vectorization

typo

typo

don't use hadd2 for additive mask

print args

typo

pull rand generation early, pipeline mask stores

use half2 for loads

remove prints

philox from pytorch upstream

actually add philox file

cleanup

cleanup

@kevinstephano (Contributor) left a comment

This looks okay to me.

@ptrblck ptrblck merged commit 3fe10b5 into NVIDIA:master Dec 4, 2020

3 participants