
seryilmaz · 2020-11-04T03:00:33Z

Fuses dropout and softmax in both fprop and bprop. For now this is done only in the additive-mask case, which BERT uses. The following changes are made:

To avoid writing softmax outputs in fprop, recompute the softmax results in the dgrad kernel.
Use the PyTorch Philox random number generator for dropout instead of cuRAND, since it is faster.
Kernels are vectorized when the sequence length is large enough and a multiple of 4.
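
The recompute-in-dgrad idea can be illustrated with a minimal pure-Python sketch for a single attention row. This is not the CUDA kernel from this PR: the function names, the use of Python's `random.Random` in place of Philox, and the `-10000.0` mask value are assumptions made for illustration only. The point it shows is that fprop keeps just the dropout keep-mask, and bprop rebuilds the softmax probabilities from the original inputs.

```python
import math
import random


def masked_softmax(scores, additive_mask):
    """Numerically stable softmax over one row of (scores + additive_mask)."""
    shifted = [s + m for s, m in zip(scores, additive_mask)]
    row_max = max(shifted)
    exps = [math.exp(x - row_max) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def fused_forward(scores, additive_mask, p_drop, rng):
    """Softmax followed by inverted dropout (requires p_drop < 1).

    Returns the output and the keep-mask only; the softmax
    probabilities are deliberately not saved for backward.
    """
    probs = masked_softmax(scores, additive_mask)
    keep = [rng.random() >= p_drop for _ in probs]
    scale = 1.0 / (1.0 - p_drop)
    out = [p * scale if k else 0.0 for p, k in zip(probs, keep)]
    return out, keep


def fused_backward(grad_out, scores, additive_mask, keep, p_drop):
    """Gradient w.r.t. scores, recomputing softmax instead of loading it."""
    probs = masked_softmax(scores, additive_mask)  # recomputed, not stored
    scale = 1.0 / (1.0 - p_drop)
    # Backward of dropout: same mask and scale as in forward.
    grad_probs = [g * scale if k else 0.0 for g, k in zip(grad_out, keep)]
    # Backward of softmax: dx_i = p_i * (dp_i - sum_j dp_j * p_j).
    dot = sum(g * p for g, p in zip(grad_probs, probs))
    return [p * (g - dot) for p, g in zip(probs, grad_probs)]


if __name__ == "__main__":
    rng = random.Random(0)  # stand-in for a seeded counter-based generator
    scores = [0.5, -1.0, 2.0, 0.0]
    mask = [0.0, 0.0, 0.0, -10000.0]  # last position is padding (assumed value)
    out, keep = fused_forward(scores, mask, 0.1, rng)
    grad = fused_backward([1.0, 1.0, 1.0, 1.0], scores, mask, keep, 0.1)
    print(out, keep, grad)
```

The trade-off in this sketch is one extra softmax evaluation in backward in exchange for not holding a probabilities tensor between the two passes; only the inputs and the keep-mask are needed.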

set random offset
fix typos in softmax.h
write softmax results, needed for dgrad
fix typo
typos
typo
typo
set types
remove some includes
remove nullpyt, use reinterpret cast
data type instead of pointer for data_ptr template argument
fix pointer increments in vector copies
changes for recomputing softmax in dgrad
typo
fixes for recomputing softmax in dgrad
change backward function in fused dropout module
typo
change function name
typo
typo
typo
typo
don't return pad_mask in fprop function
add stream
fix softmax dgrad summation
save more memory by removing softmax output
remove softmax results from cpp file
typo
typo
debugging print
typo
remove debugging stuff
some vectorization optimizations
vectorize both fprop and bprop
typo
try no vectorization
remove float4 for dropout
Revert "remove float4 for dropout" This reverts commit 59894b4.
Revert "try no vectorization" This reverts commit b2ef02f.
cleanup
typo
typo
typo
typo
use null tensor for backward
typo
add specialization for vectorization
typo
typo
don't use hadd2 for additive mask
print args
typo
pull rand generation early, pipeline mask stores
use half2 for loads
remove prints
philox from pytorch upstream
actually ad philox file
cleanup
cleanup

kevinstephano

This looks okay to me.

seryilmaz added 6 commits October 7, 2020 16:43

fuse dropout into softmax in fprop for additive mask case

71e88a2

typo

eb143bc

different function for vector4 case

77822ec

vectorize seq 128 too

036f43c

range bounds for element within batch

16eb9e9

kevinstephano approved these changes Dec 2, 2020


add tests

8a0209c

ptrblck merged commit 3fe10b5 into NVIDIA:master Dec 4, 2020


Seryilmaz/fused dropout softmax#985

ptrblck merged 7 commits into NVIDIA:master from seryilmaz:seryilmaz/fused_dropout_softmax
