Changelog for DilatedAttention with ParallelWrapper:
1. Added ParallelWrapper Class
   - Introduced a `ParallelWrapper` class to simplify the usage of data parallelism (a minimal sketch follows this item).
   - The `ParallelWrapper` class:
     - Takes a neural network model as input.
     - Allows the user to specify a device ("cuda" or "cpu").
     - Contains a `use_data_parallel` flag to enable or disable data parallelism.
     - Checks if multiple GPUs are available and applies `nn.DataParallel` to the model accordingly.
     - Redirects attribute accesses to the internal model for seamless usage.
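A minimal sketch of what such a wrapper might look like, assuming PyTorch; the exact implementation in the repository may differ:

```python
import torch
import torch.nn as nn

class ParallelWrapper:
    """Sketch of a wrapper that optionally applies nn.DataParallel.

    Moves the wrapped model to `device` and, when `use_data_parallel`
    is True and more than one GPU is visible, wraps it in nn.DataParallel.
    """

    def __init__(self, model, device="cuda", use_data_parallel=True):
        self.device = device
        self.use_data_parallel = use_data_parallel
        self.model = model.to(device)
        # Apply nn.DataParallel only when the flag is set and >1 GPU is visible.
        if use_data_parallel and torch.cuda.device_count() > 1:
            self.model = nn.DataParallel(self.model)

    def __call__(self, *args, **kwargs):
        # Forward calls go straight to the (possibly parallelized) model.
        return self.model(*args, **kwargs)

    def __getattr__(self, name):
        # Redirect attribute access to the internal model so the wrapper
        # behaves as a drop-in replacement for the original module.
        return getattr(self.model, name)
```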
2. Modified Usage of DilatedAttention Model
   - Wrapped the `DilatedAttention` model using the `ParallelWrapper` class (see the sketch after this item).
   - Enabled the model to run on multiple GPUs when available.
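As a sketch of the wrapping step, using the `ParallelWrapper` defined above; the `DilatedAttention` constructor arguments here are hypothetical placeholders:

```python
# Hypothetical constructor arguments; the real DilatedAttention
# signature may differ.
attention = DilatedAttention(dim=512, heads=8)

# Wrap it; nn.DataParallel is applied automatically when >1 GPU is visible.
model = ParallelWrapper(attention, device="cuda", use_data_parallel=True)
```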
3. Device Assignment
   - Explicitly defined a device and used it to specify where the `DilatedAttention` model should be loaded.
   - The device defaults to GPU (`cuda:0`) if CUDA is available; otherwise, it falls back to CPU (see the snippet below).
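The default device selection described above corresponds to the standard PyTorch idiom:

```python
import torch

# Prefer the first GPU when CUDA is available, else fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```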
4. Example Usage
   - Provided an example of how to initialize and use the `ParallelWrapper` with the `DilatedAttention` model, as shown below.
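A hedged end-to-end version of that example, assuming the `ParallelWrapper` sketch above; the `DilatedAttention` constructor arguments and input shape are illustrative placeholders:

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical constructor arguments and input shape, for illustration only.
attention = DilatedAttention(dim=512, heads=8)
model = ParallelWrapper(attention, device=device, use_data_parallel=True)

x = torch.randn(2, 1024, 512, device=device)  # (batch, seq_len, dim)
output = model(x)
print(output.shape)
```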
Summary:
The key addition is the `ParallelWrapper` class, which makes data parallelism with the provided `DilatedAttention` model easy to enable and configure. It lets the model scale across multiple GPUs without significant changes to the existing workflow, and the user can turn data parallelism on or off with a single flag (`use_data_parallel`).