Releases: lRomul/argus
Maintenance release, new guides and updated docs
Fix
- Fix `AverageMeter` for n > 1 cases.
Breaking Changes
- Delete the batch object after the iteration completes.
- Don't store the data loader in the engine state.
New Features
- Return metrics from the `fit` method the same way as from `validate` (sketch after this list).
- Use the constructor from `BuildModel` so the user can pass `build_order`.
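A minimal sketch of the change, assuming `fit` now returns the same metrics dictionary that `validate` returns; `model`, `train_loader`, and `val_loader` are placeholders:

```python
# Assumption: fit() now returns a metrics dict, just like validate()
val_metrics = model.validate(val_loader)
fit_metrics = model.fit(train_loader, val_loader=val_loader, num_epochs=10)
```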
Docs
New guides on:
- Custom metrics.
- Partial weights loading and manipulation.
- Model export.
- Custom callbacks.
- LR schedulers.
Other improvements:
- Add new competition solutions to the examples.
- Improve docstrings in many places.
Chore
- Use `pyproject.toml`.
- Update GitHub Actions versions.
- Update dependencies.
- Use ruff linter.
Full Changelog: v1.0.0...v1.1.0
Argus 1.0.0
New Features
- Add `mode` argument to `argus.Model.train` (like in torch).
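A hedged usage sketch, assuming the new `mode` argument behaves like `torch.nn.Module.train(mode=True)`; `model` is a placeholder `argus.Model` instance:

```python
# Assumption: mode toggles training/eval like torch.nn.Module.train
model.train(mode=True)   # put the nn_module in training mode
model.train(mode=False)  # equivalent to model.eval()
```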
Docs
- Add guides that provide an in-depth overview of how the framework works (link).
- Fix minor typos in docstrings.
Examples
- New example with sequential LR scheduler (link).
- Transition from `torch.distributed.launch` to `torchrun` in the cifar_advanced example.
Chore
- Add `__all__` for all modules.
- Update CUDA to 11.3.1.
- Update PyTorch to 1.10.0.
Logo, pydata sphinx theme, custom state loading, share train and val states
New Features
- Share train and val states between phases with the `phase_states` attribute of state.

```python
import argus

@argus.callbacks.on_epoch_complete
def some_validation_callback(state: argus.engine.State):
    train_step_output = state.phase_states['train'].step_output
    ...
```
- Option to use a custom state load function for `argus.load_model`.

```python
import pathlib
import torch
from argus import load_model

def state_load_from_dir(dir_path):
    file_path = pathlib.Path(dir_path) / 'some_model_name.pth'
    return torch.load(file_path)

model = load_model(path_to_dir_with_model, state_load_func=state_load_from_dir)
```
Docs
- Argus logo!
- Migrate to pydata-sphinx-theme.
Fix
- Fix sdist package installation by adding `MANIFEST.in` with `requirements.txt`.
Examples
- Use `torch.cuda.amp` instead of Apex in the advanced CIFAR example.
- Add an example solution for the RANZCR CLiP - Catheter and Line Position Challenge.
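For reference, the native `torch.cuda.amp` pattern the example moved to looks roughly like this; the toy model, optimizer, loss, and data are placeholders and not part of the argus example:

```python
import torch
import torch.nn as nn

# Toy setup for illustration only
model = nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    batch = torch.randn(8, 10, device='cuda')
    target = torch.randint(0, 2, (8,), device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass in mixed precision
        loss = loss_fn(model(batch), target)
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # unscale gradients, then optimizer step
    scaler.update()
```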
Chore
- `setup.cfg` with pytest and flake8 settings.
- CI checks code style with flake8.
- Run tests on macOS and Windows.
- Update Dockerfile and tests to PyTorch 1.8.0.
- Update Dockerfile to CUDA 11.1.
Save optimizer state, improve docs and typing
New Features
- Add saving of optimizer state for `argus.Model` and checkpoint callbacks.

```python
from argus.callbacks import Checkpoint

model.save('models/model.pth', optimizer_state=True)
checkpoint = Checkpoint(dir_path='models/', optimizer_state=True)
```
- Add `get_device` method to `argus.Model`.
- Add typing and fix most `mypy` errors.
Fix
- Remove `torch.optim._multi_tensor` optimizers from defaults (torch >= 1.7.0).
Docs
- Section `argus.engine`.
- Section `argus.metrics`.
- Section `argus.utils` with deep conversions.
- Add docs for decorator callbacks.
- Add docs for `argus.Model` methods: `__init__`, `set_device`, `get_device`, `get_nn_module`.
- Update examples section.
- Proofread and improve docs. Many small docstring fixes.
Internal changes
- Use abstract container classes from `collections.abc`.
- Now `Engine` and `State` only work with `argus.Model` methods as a `step_method`. The phase name is taken from the method name.
- Simplify default logging.
Breaking Changes
- Change optimizer state handling in `argus.load_model`. Now `change_state_dict_func` takes two arguments, `nn_state_dict` and `optimizer_state_dict` (example; see the sketch after this list).
- Remove `handler_kwargs_dict` from the attach method of `argus.callbacks.Callback`.
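A minimal sketch of the new signature; the layer names, the returned tuple, and `model_path` are illustrative assumptions, not taken from the argus docs:

```python
from argus import load_model

def change_state_dict_func(nn_state_dict, optimizer_state_dict):
    # Hypothetical example: drop the final classifier weights before loading
    nn_state_dict.pop('fc.weight', None)
    nn_state_dict.pop('fc.bias', None)
    return nn_state_dict, optimizer_state_dict

model = load_model(model_path, change_state_dict_func=change_state_dict_func)
```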
Tests, replace params while model loading, custom events
New Features
- Tests, 100% coverage (codecov).
- Mechanism of `params` replacement while model loading (example).

```python
from argus import load_model

# Change optimizer params
model = load_model(model_path, optimizer=('AdamW', {'lr': 0.001}))
# Load model without optimizer and loss
model = load_model(model_path, optimizer=None, loss=None)
```
- Custom events for callbacks (example).
```python
import argus
from argus.engine import EventEnum


class CustomEvents(EventEnum):
    BACKWARD_START = 'backward_start'
    BACKWARD_COMPLETE = 'backward_complete'


@argus.callbacks.on_event(CustomEvents.BACKWARD_START)
def before_backward(state):
    ...


class CustomEventModel(argus.Model):
    ...

    def train_step(self, batch, state):
        ...
        state.engine.raise_event(CustomEvents.BACKWARD_START)
        loss.backward()
        state.engine.raise_event(CustomEvents.BACKWARD_COMPLETE)
        ...
```
- Typing.
- Raise exceptions instead of asserts.
- Set up a unique logger for each instance of `argus.Model`.
- Check that `params` is picklable at model construction.
- `create_dir` parameter for `argus.callbacks.logging.LoggingToCSV` (sketch after this list).
- Use an instance of `argus.utils.Identity` as the default for `prediction_transform` instead of `lambda x: x`.
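A hedged usage sketch, assuming `create_dir` makes the callback create missing parent directories for the log file; the path, loaders, and `model` are placeholders:

```python
from argus.callbacks import LoggingToCSV

# Assumption: create_dir=True creates 'logs/experiment_1/' if it does not exist
csv_logger = LoggingToCSV('logs/experiment_1/log.csv', create_dir=True)
model.fit(train_loader, val_loader=val_loader, num_epochs=10, callbacks=[csv_logger])
```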
Fix
- Correctly save checkpoints with the `save_after_exception` argument for `argus.callbacks.checkpoints`.
Breaking Changes
- Change the default `append` argument value to `False` for `argus.callbacks.logging.LoggingToFile` (see the sketch after this list).
- Rename attribute `_scheduler` of `argus.callbacks.lr_schedulers.LRScheduler` to `scheduler`.
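To keep the previous behavior after this change, pass `append` explicitly; the file name is a placeholder:

```python
from argus.callbacks import LoggingToFile

# append=True restores the old default of appending to an existing log file
file_logger = LoggingToFile('train.log', append=True)
```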
Custom build methods, more examples
Features
- New mechanics for building model attributes. It allows customizing the creation of model parts. Example here (a sketch also follows this list).
- CIFAR example with Distributed Data Parallel, mixed precision, and gradient accumulation (cifar_advanced.py).
- Add `save_model` method to `argus.callbacks.checkpoints`. It allows customizing checkpoint saving.
- Add logging of time and LR to `argus.callbacks.logging.LoggingToCSV`.
- `argus.utils.deep_chunk`, similar to the scatter function in PyTorch DataParallel.
- Dockerfile and Makefile for development.
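A sketch of customizing one model part by overriding a build method; the hook name `build_optimizer`, its signature, and the placeholder network are assumptions for illustration, not the documented argus API:

```python
import torch
import torch.nn as nn
import argus


class MyModel(argus.Model):
    nn_module = nn.Linear  # placeholder nn.Module class, configured via params

    # Assumed hook: override a build_* method to control how a part is created
    def build_optimizer(self, optimizer_params):
        # For example, build the optimizer by hand instead of from params
        return torch.optim.SGD(self.nn_module.parameters(), lr=0.01, momentum=0.9)
```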
Breaking Changes
- Use the `argus.utils.deep_to` function instead of the method `argus.Model.prepare_batch`. `argus.Model.prepare_batch` is removed, so if you use a custom `val_step` or `train_step` you should replace

```python
input, target = self.prepare_batch(batch, self.device)
```

with

```python
input, target = deep_to(batch, self.device, non_blocking=True)
```
- Rename `max_epochs` to `num_epochs` in the `argus.Model.fit` method.

```python
model.fit(train_loader, val_loader=val_loader, num_epochs=1000)
```
- Remove `copy_last` parameter from `argus.callbacks.checkpoints`.
- Remove `period` parameter from `argus.callbacks.checkpoints.MonitorCheckpoint`.
Documentation, LR scheduler step on iteration, new LR schedulers
New Features
- Documentation https://pytorch-argus.readthedocs.io
- Add step on iteration option for LR schedulers.
```python
from argus.callbacks import CosineAnnealingLR

CosineAnnealingLR(10000, step_on_iteration=True)
```
- New LR schedulers (usage sketch after this list):
  - `argus.callbacks.lr_schedulers.MultiplicativeLR`: multiply learning rate by the factor given in the specified function.
  - `argus.callbacks.lr_schedulers.OneCycleLR`: One Cycle learning rate policy.
- Make the LR scheduler step on epoch complete instead of epoch start.
- Compute metric score under `torch.no_grad`.
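A hedged usage sketch for the new schedulers, assuming the argus callbacks mirror the arguments of the corresponding `torch.optim.lr_scheduler` classes; the values are placeholders:

```python
from argus.callbacks.lr_schedulers import MultiplicativeLR, OneCycleLR

# Assumption: arguments match torch MultiplicativeLR / OneCycleLR (minus the optimizer)
mult_lr = MultiplicativeLR(lr_lambda=lambda epoch: 0.95)
# step_on_iteration added in this release; assumed to apply to all LR scheduler callbacks
one_cycle_lr = OneCycleLR(max_lr=0.01, total_steps=10000, step_on_iteration=True)
```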
Fix
- Fix LR logging with several parameter groups in the optimizer.
- Fix key error in the metric redefinition warning.
Breaking Changes
- PyTorch requirement: `torch>=1.1.0`.
New LR schedulers, csv logger, state in step functions
New Features
- `CyclicLR` and `CosineAnnealingWarmRestarts` LR schedulers.
  - `argus.callbacks.lr_schedulers.CyclicLR`: support for Cyclical Learning Rate and Momentum.
  - `argus.callbacks.lr_schedulers.CosineAnnealingWarmRestarts`: Stochastic Gradient Descent with Warm Restarts.
- `argus.callbacks.logging.LoggingToCSV`: add CSV logger callback.

```python
from argus.callbacks import LoggingToCSV

LoggingToCSV('path/to/log.csv', separator=',', append=False)
```
- Add `train` and `eval` mode methods to `argus.Model`.
  - `model.train()` sets the `nn_module` in training mode.
  - `model.eval()` sets the `nn_module` in evaluation mode.
- Set `step_output` of `State` to `None` after each iteration to save GPU memory.
Breaking Changes
- Pass state to train and val step functions.

Before:

```python
def train_step(self, batch):
    ...
```

Now:

```python
def train_step(self, batch, state: State):
    print(state.epoch)
    ...
```
- Scheduler step on epoch start; train epochs go from 0 to `max_epochs - 1`. The scheduler callback uses the epoch param of a scheduler step function, so it now works like in 20124.
- Remove deprecated `to_device` and `detach_tensors` utils functions.
Data parallel
Data parallel for multi-GPU training.

Select a GPU with device indexing:

```python
from argus import load_model

model = load_model(model_path, device="cuda:1")
model.set_device("cuda:0")
```
For multi-GPU you can use a list of devices:

```python
params = {
    ...,
    'device': ['cuda:0', 'cuda:1']
}
model = CnnFinetune(params)

model = load_model(model_path, device=["cuda:1", "cuda:0"])
model.set_device(["cuda:0", "cuda:1"])
```
Batch tensors will be scattered on dim 0. The first device in the list is the location of the output.
By default, device "cuda" means single-GPU training on `torch.cuda.current_device`.