This repository is meant to port most of the original KFAC repository to PyTorch.
KFAC is an approximation of the natural gradient, i.e., an approximate second-order method that allows for larger step sizes at the cost of additional compute per step, which should enable faster convergence.
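For intuition, here is a rough sketch of the update KFAC approximates (standard KFAC notation; conventions such as the exact `vec` ordering are glossed over):

$$
\theta \leftarrow \theta - \eta\, F^{-1} \nabla_\theta \mathcal{L}, \qquad F_\ell \approx A_{\ell-1} \otimes G_\ell,
$$

where $A_{\ell-1} = \mathbb{E}[a_{\ell-1} a_{\ell-1}^\top]$ is the covariance of the layer inputs and $G_\ell = \mathbb{E}[g_\ell g_\ell^\top]$ is the covariance of the back-propagated output gradients. Since $(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}$, each layer only needs two comparatively small matrix inverses instead of an inverse of the full Fisher matrix.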
To install, clone the repository and run the setup script:

```bash
git clone https://github.com/n-gao/pytorch-kfac.git
cd pytorch-kfac
python setup.py install
```

An example notebook is provided in `examples`.
To use the optimizer, you have to enable tracking of the forward and backward passes. This prevents the optimizer from tracking other passes, e.g., during validation.
```python
kfac = torch_kfac.KFAC(model, learning_rate=1e-3, damping=1e-3)
...
model.zero_grad()
with kfac.track_forward():
    # forward pass
    loss = ...
with kfac.track_backward():
    # backward pass
    loss.backward()
kfac.step()
# Do some validation
```

(A fuller, self-contained training sketch is given after the feature list below.)

This implementation offers the following features:
- Regular and Adam momentum
- Adaptive damping
- Weight decay
- Norm constraint
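As referenced above, here is a fuller, self-contained training sketch. The toy model, data, loss function, and loop length are made-up assumptions purely for illustration; the `torch_kfac.KFAC` constructor arguments, the tracking context managers, `kfac.step()`, and `model.zero_grad()` mirror the snippet above.

```python
import torch
import torch_kfac

# Toy regression model and data, purely for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
x = torch.randn(256, 10)
y = torch.randn(256, 1)

# Constructor arguments as in the snippet above.
kfac = torch_kfac.KFAC(model, learning_rate=1e-3, damping=1e-3)
loss_fn = torch.nn.MSELoss(reduction='mean')  # KFAC currently assumes mean reduction

for step in range(100):
    model.zero_grad()            # optimizer.zero_grad is not available yet
    with kfac.track_forward():   # record activations for the Kronecker factors
        loss = loss_fn(model(x), y)
    with kfac.track_backward():  # record output gradients for the Kronecker factors
        loss.backward()
    kfac.step()                  # apply the preconditioned update
```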
The following layers are currently supported:
- `nn.Linear`
- `nn.Conv1d`
- `nn.Conv2d`
- `nn.Conv3d`
Furthermore, the code is designed to be easily extensible: custom layers can be added by implementing the `torch_kfac.layers.Layer` interface.
Unsupported layers fall back to classical gradient descent.
Please note:
- Due to the different structure of the KFAC optimizer, implementing the `torch.optim.optimizer.Optimizer` interface would only be possible through hacks, but you are welcome to add this. This implies that `torch.optim.lr_scheduler` schedulers are not available for KFAC. However, the learning rate can still be edited manually by changing the `.learning_rate` property (see the sketch after this list).
- Currently the optimizer assumes that the output is reduced by `mean`; other reductions are not supported yet. So if you use, e.g., `torch.nn.CrossEntropyLoss`, make sure to set `reduction='mean'`.
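Since `torch.optim.lr_scheduler` cannot be used, a schedule can be emulated by assigning to the property directly. A minimal sketch, where the decay factor, `num_epochs`, and the `train_one_epoch` helper are arbitrary illustrative assumptions:

```python
initial_lr = 1e-3
for epoch in range(num_epochs):
    # Exponential decay, applied by hand because lr_scheduler does not work with KFAC.
    kfac.learning_rate = initial_lr * (0.95 ** epoch)
    train_one_epoch(model, kfac)  # hypothetical helper wrapping the training loop shown above
```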
TODO:
- Add more examples
- Add support for shared layers (e.g., RNNs)
- Documentation
- Add support for distributed training. (This is partially possible by synchronizing the covariance matrices and gradients manually; a partial sketch of the gradient synchronization follows this list.)
- `optimizer.zero_grad` is not available yet; you can use `model.zero_grad` instead.
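Regarding the distributed-training point above: the gradient part can already be synchronized by hand with standard `torch.distributed` collectives, while synchronizing the Kronecker covariance factors additionally requires access to the optimizer's internal state and is not sketched here. A rough example of the gradient synchronization only, assuming an already initialized process group:

```python
import torch.distributed as dist

def all_reduce_gradients(model):
    """Average gradients across workers, to be called before kfac.step()."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```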
This work is based on: