Commit aee5aff

Merge branch 'master' into deyuf/fused_optimizer_v2
2 parents 007c594 + 880ab92 commit aee5aff

71 files changed

Lines changed: 6335 additions & 32 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -94,6 +94,11 @@ A Python-only build omits:
 - Fused kernels that improve the performance of `apex.parallel.DistributedDataParallel` and `apex.amp`.
 `DistributedDataParallel`, `amp`, and `SyncBatchNorm` will still be usable, but they may be slower.
 
+To enable PyProf support, you need to install the packages required by PyProf. To do so, add the "--pyprof" option at installation time:
+```
+$ pip install -v --no-cache-dir --global-option="--pyprof" --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+```
+
 ### Windows support
 Windows support is experimental, and Linux is recommended. `pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .` may work if you were able to build Pytorch from source
 on your system. `pip install -v --no-cache-dir .` (without CUDA/C++ extensions) is more likely to work. If you installed Pytorch in a Conda environment, make sure to install Apex in that same environment.

apex/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,6 @@
 # May help avoid undefined symbol errors https://pytorch.org/cppdocs/notes/faq.html#undefined-symbol-errors-from-pytorch-aten
 import torch
+import warnings
 
 from . import parallel
 from . import amp
@@ -14,3 +15,4 @@
 # load time) the error message is timely and visible.
 from . import optimizers
 from . import normalization
+from . import pyprof

apex/pyprof/FAQs.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+1. How do I intercept the Adam optimizer in APEX?
+
+```python
+from apex import pyprof
+import fused_adam_cuda
+pyprof.nvtx.wrap(fused_adam_cuda, 'adam')
+```
+
+2. If you are using JIT and/or AMP, the correct initialization sequence is:
+    1. Let any JIT compilation finish.
+    2. Initialize pyprof with `pyprof.nvtx.init()`.
+    3. Initialize AMP.
+
+3. How do I profile with `torch.distributed.launch`?
+
+```sh
+nvprof -f -o net%p.sql \
+    --profile-from-start off \
+    --profile-child-processes \
+    python -m torch.distributed.launch net.py
+```
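
Note on the FAQ above: the call `pyprof.nvtx.wrap(fused_adam_cuda, 'adam')` can be pictured as monkey-patching, where the named attribute of a module is replaced by a wrapper that opens a marker range, calls the original function, and closes the range. The sketch below illustrates that general pattern only and is not PyProf's implementation. `wrap_with_markers`, the `events` list, and the stand-in module `fake_ext` are hypothetical names introduced for illustration, and a plain list stands in for NVTX range push/pop calls so the example runs without CUDA.

```python
import functools
import types

events = []  # stand-in for NVTX range push/pop; hypothetical, not a PyProf API


def wrap_with_markers(module, name):
    """Replace module.<name> with a wrapper that records a range around each call."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        events.append(("push", name))      # where an NVTX range would open
        try:
            return original(*args, **kwargs)
        finally:
            events.append(("pop", name))   # where the range would close

    setattr(module, name, wrapper)


# A stand-in module exposing an `adam` function (hypothetical; no CUDA required).
fake_ext = types.ModuleType("fake_ext")
fake_ext.adam = lambda lr, grad: -lr * grad

wrap_with_markers(fake_ext, "adam")
result = fake_ext.adam(0.5, 2.0)
```

Because the wrapper is installed on the module attribute, any caller that looks the function up through the module after wrapping goes through the markers. That is presumably why the example scripts in this commit call `wrap` before constructing the optimizer or layer.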

apex/pyprof/README.md

Lines changed: 252 additions & 0 deletions
Large diffs are not rendered by default.

apex/pyprof/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+import warnings
+
+from . import nvtx

apex/pyprof/examples/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+__pycache__
+*.sql
+*.dict
+*.csv

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+This directory has examples of how to use `pyprof` with APEX extensions e.g. `fused_adam_cuda` and `fused_layer_norm_cuda`.

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+import torch
+import fused_adam_cuda
+from apex.optimizers import FusedAdam, FP16_Optimizer
+from apex import pyprof
+
+pyprof.nvtx.init()
+pyprof.nvtx.wrap(fused_adam_cuda, 'adam')
+
+model = torch.nn.Linear(10, 20).cuda().half()
+criterion = torch.nn.CrossEntropyLoss().cuda()
+optimizer = FusedAdam(model.parameters())
+optimizer = FP16_Optimizer(optimizer)
+
+x = torch.ones(32, 10).cuda().half()
+target = torch.empty(32, dtype=torch.long).random_(20).cuda()
+y = model(x)
+loss = criterion(y, target)
+optimizer.zero_grad()
+loss.backward()
+optimizer.step()

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+import torch
+import fused_layer_norm_cuda
+from apex.normalization import FusedLayerNorm
+from apex import pyprof
+
+pyprof.nvtx.init()
+pyprof.nvtx.wrap(fused_layer_norm_cuda, 'forward')
+pyprof.nvtx.wrap(fused_layer_norm_cuda, 'backward')
+pyprof.nvtx.wrap(fused_layer_norm_cuda, 'forward_affine')
+pyprof.nvtx.wrap(fused_layer_norm_cuda, 'backward_affine')
+
+input = torch.randn(20, 5, 10, 10).cuda()
+
+# With Learnable Parameters
+m = FusedLayerNorm(input.size()[1:]).cuda()
+output = m(input)
+
+# Without Learnable Parameters
+m = FusedLayerNorm(input.size()[1:], elementwise_affine=False).cuda()
+output = m(input)
+
+# Normalize over last two dimensions
+m = FusedLayerNorm([10, 10]).cuda()
+output = m(input)
+
+# Normalize over last dimension of size 10
+m = FusedLayerNorm(10).cuda()
+output = m(input)

apex/pyprof/examples/apex/test.sh

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+#!/bin/bash
+
+set -e
+
+SCRIPT=`realpath $0`
+SCRIPTPATH=`dirname $SCRIPT`
+PYPROF="$SCRIPTPATH/../.."
+
+parse="python $PYPROF/parse/parse.py"
+prof="python $PYPROF/prof/prof.py"
+
+for f in *.py
+do
+    base=`basename $f .py`
+    sql=$base.sql
+    dict=$base.dict
+
+    #NVprof
+    echo "nvprof -fo $sql python $f"
+    nvprof -fo $sql python $f
+
+    #Parse
+    echo $parse $sql
+    $parse $sql > $dict
+
+    #Prof
+    echo $prof $dict
+    $prof -w 130 $dict
+    \rm $sql $dict
+done
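
Note on `test.sh` above: for each example script the loop runs three stages: profile with `nvprof`, parse the resulting SQL file into a dict file, then run `prof.py` on that dict file. As a rough sketch, the same command lines can be assembled in Python. `pipeline_commands` is a hypothetical helper and `example.py` a placeholder script name. The function only builds argument lists that mirror the shell script and does not execute anything.

```python
from pathlib import Path


def pipeline_commands(script, pyprof_root):
    """Build the nvprof, parse, and prof command lines for one example script."""
    base = Path(script).stem              # "example" for "example.py"
    sql = f"{base}.sql"
    dict_file = f"{base}.dict"
    root = Path(pyprof_root)
    return [
        # Stage 1: profile the script, overwriting any existing output file.
        ["nvprof", "-fo", sql, "python", script],
        # Stage 2: parse the SQL file; test.sh redirects stdout to the dict file.
        ["python", str(root / "parse" / "parse.py"), sql],
        # Stage 3: run prof.py on the dict file with the `-w 130` option from test.sh.
        ["python", str(root / "prof" / "prof.py"), "-w", "130", dict_file],
    ]


commands = pipeline_commands("example.py", "apex/pyprof")
```

To actually run the stages, each list could be passed to `subprocess.run(..., check=True)`, with the second stage's standard output written to the dict file, which matches the `set -e` and `>` redirection behaviour of the shell script.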
