
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.


siboehm/ShallowSpeed


ShallowSpeed

[stability: work-in-progress]

A tiny proof-of-concept implementation of distributed training for sequential deep learning models, built using plain NumPy & mpi4py.

Currently implements:

  • Sequential models / deep MLPs, trained using SGD.
  • Data-parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel.
  • Pipeline-parallel training:
    • Naive schedule without interleaved stages.
    • GPipe schedule with interleaved FWDs & interleaved BWDs.
    • (soon) PipeDream-Flush schedule with additional interleaving between FWD & BWD.
  • Any combination of DP & PP algorithms.
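The data-parallel scheme above can be sketched without MPI: each worker computes a gradient on its own data shard, the gradients are averaged across workers (the job `Allreduce` does in the real mpi4py implementation), and every replica applies the identical update so the model copies stay in sync. A minimal NumPy sketch, using a hypothetical linear-regression loss; the helper names (`simulated_allreduce`, `data_parallel_step`) are illustrative and not part of this repository:

```python
import numpy as np

def simulated_allreduce(grads_per_worker):
    """Average one parameter's gradients across workers, standing in
    for an MPI Allreduce with op=SUM followed by / world_size."""
    return sum(grads_per_worker) / len(grads_per_worker)

def data_parallel_step(weights, worker_batches, lr=0.1):
    """One data-parallel SGD step: each worker computes a local
    gradient on its shard, gradients are averaged, and every replica
    applies the same update (toy linear-regression loss for brevity)."""
    local_grads = []
    for X, y in worker_batches:
        # Gradient of the mean of 0.5 * (X @ w - y)^2 w.r.t. w, on this shard.
        local_grads.append(X.T @ (X @ weights - y) / len(y))
    avg_grad = simulated_allreduce(local_grads)
    return weights - lr * avg_grad
```

With equal-sized shards, the average of the per-worker mean gradients equals the full-batch mean gradient, which is why data-parallel training reproduces sequential training exactly (up to floating-point ordering).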

Setup

conda env create
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"
python download_dataset.py
pytest

Usage

# Sequential training
python train.py
# Data parallel distributed training
mpirun -n 4 python train.py --dp 4
# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive
# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe
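To make the schedule names above concrete, here is a hedged sketch of the op ordering a GPipe-style schedule produces: all microbatch forward passes are pipelined across the stages first, then all backward passes run in reverse stage order. The function name `gpipe_schedule` is illustrative, not an API of this repo, and the sketch ignores activation-memory bookkeeping:

```python
def gpipe_schedule(n_stages, n_microbatches):
    """Return, per time step, the list of (stage, microbatch, phase)
    ops a GPipe-style schedule runs: forwards fill the pipeline, then
    backwards drain it in reverse stage order (simplified sketch)."""
    timeline = []
    # Forward phase: microbatch m reaches stage s at time step s + m.
    for t in range(n_stages + n_microbatches - 1):
        timeline.append([(s, t - s, "fwd") for s in range(n_stages)
                         if 0 <= t - s < n_microbatches])
    # Backward phase: starts after all forwards finish; microbatch m
    # reaches stage s at time step (n_stages - 1 - s) + m.
    for t in range(n_stages + n_microbatches - 1):
        timeline.append([(s, t - (n_stages - 1 - s), "bwd")
                         for s in range(n_stages)
                         if 0 <= t - (n_stages - 1 - s) < n_microbatches])
    return timeline
```

Reading the timeline for 2 stages and 3 microbatches shows the characteristic pipeline "bubble": stage 1 idles at step 0 while stage 0 runs the first forward, and stage 0 idles at the start of the backward phase.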

Internals
