Source code of the Middleware'24 paper: Cannikin: Optimal Adaptive Distributed DNN Training over Heterogeneous Clusters
Docker image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
NumPy version: 1.22.4
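As a quick sanity check that your container matches these pins, a minimal sketch (the assertions below are illustrative and not part of the repo):

```python
import numpy
import torch

# Verify the environment matches the versions pinned above.
assert torch.__version__.startswith("2.1.0"), torch.__version__
assert numpy.__version__ == "1.22.4", numpy.__version__
assert torch.cuda.is_available()  # the image is a CUDA 12.1 devel build
```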
Cannikin introduces the HeteroDataLoader for adaptive batch size training over heterogeneous clusters. For the other APIs, refer to the AdaptDL Documentation.
BEFORE:

```python
torch.distributed.init_process_group("nccl")
model = torch.nn.parallel.DistributedDataParallel(model)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=128)

for epoch in range(100):
    ...
```

AFTER:
```python
adaptdl.torch.init_process_group("nccl")
model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)
dataloader = adaptdl.torch.HeteroDataLoader(dataset, batch_size=128)

for epoch in adaptdl.torch.remaining_epochs_until(100):
    ...
```

Cannikin is built on top of AdaptDL's adaptive training library. It can be used for:
- Adapting the batch size and learning rate for a single training job (Standalone Training), as sketched below.
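To make the API change above concrete, here is a minimal standalone training sketch. It assumes AdaptDL's documented APIs (`init_process_group`, `AdaptiveDataParallel`, `remaining_epochs_until`) and that `HeteroDataLoader` accepts the same arguments as AdaptDL's `AdaptiveDataLoader`; the model, dataset, and SGD hyperparameters are placeholders, not part of this repo.

```python
import torch
import torch.nn.functional as F
import adaptdl.torch

def train(model, dataset):
    # Join (or rejoin, after a cluster rescale) the collective training job.
    adaptdl.torch.init_process_group("nccl")

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # AdaptiveDataParallel wraps DistributedDataParallel and tracks gradient
    # statistics so the batch size and learning rate can be adapted at runtime.
    model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)

    # batch_size=128 is the initial global batch size, as in the AFTER snippet;
    # HeteroDataLoader is assumed to take the same arguments as AdaptiveDataLoader.
    dataloader = adaptdl.torch.HeteroDataLoader(dataset, batch_size=128)

    for epoch in adaptdl.torch.remaining_epochs_until(100):
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            loss.backward()
            optimizer.step()
```

`remaining_epochs_until(100)` is what makes the loop restart-safe: when the job is checkpointed and rescaled, training resumes from the last completed epoch instead of epoch 0, which a plain `range(100)` cannot do.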