Commit 7995de1
DP-independent checkpoint format for distributed Adam optimizer (#1704)
* Add new checkpoint format for distributed Adam optimizer
The v2 format no longer requires the parallel configuration and optimizer options to match between saving and loading a checkpoint.
Signed-off-by: Tim Moon <[email protected]>
* Overlap NCCL gather and CPU memcpy in distopt checkpointing
Signed-off-by: Tim Moon <[email protected]>
* Black formatting
Signed-off-by: Tim Moon <[email protected]>
---------
Signed-off-by: Tim Moon <[email protected]>
1 parent 38a1269 commit 7995de1
2 files changed
Lines changed: 697 additions & 74 deletions