This repository contains the source code for the paper Should We Be Pre-Training? Exploring End-Task Aware Training In Lieu of Continued Pre-training, by Lucio M. Dery, Paul Michel, Ameet Talwalkar and Graham Neubig (ICLR 2022).
- Paper: https://openreview.net/forum?id=2bO2x8NAIMB
- BibTeX:
@inproceedings{dery2022should,
  title={Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative},
  author={Lucio M. Dery and Paul Michel and Ameet Talwalkar and Graham Neubig},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=2bO2x8NAIMB}
}
This repo builds off the Don't Stop Pre-training paper repo here. Please follow their installation instructions, repeated here for convenience:
conda env create -f environment.yml
conda activate domains

Our experiments were run on A6000 and A100 GPUs, which have more than 40GB of GPU memory. To ensure that batches fit into memory, consider modifying the following variables:
--classf_iter_batchsz # Batch size for the primary task. Effective batch size is (classf_iter_batchsz * gradient_accumulation_steps)
--per_gpu_train_batch_size # Batch size for the auxiliary tasks. Effective batch size is (per_gpu_train_batch_size * gradient_accumulation_steps)
--gradient_accumulation_steps # Number of steps to accumulate gradients over
A sketch of how these flags combine into effective batch sizes is given below.
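The following is a minimal, hypothetical sketch rather than the repo's exact script contents: the shell variable names and values are placeholders, and only the three flag names above come from this repo.

# Hypothetical batch-size settings -- tune these to your GPU memory.
CLASSF_BSZ=16   # per-step batch size for the primary (end-task) loss
AUX_BSZ=8       # per-step batch size for the auxiliary (MLM) losses
GRAD_ACCUM=2    # number of steps to accumulate gradients over
# Effective primary-task batch size   = CLASSF_BSZ * GRAD_ACCUM = 16 * 2 = 32
# Effective auxiliary-task batch size = AUX_BSZ    * GRAD_ACCUM =  8 * 2 = 16
EXTRA_FLAGS="--classf_iter_batchsz ${CLASSF_BSZ} --per_gpu_train_batch_size ${AUX_BSZ} --gradient_accumulation_steps ${GRAD_ACCUM}"
# Append ${EXTRA_FLAGS} to the training command inside run_mt_multiple.sh / run_meta_multiple.sh.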
We used the TAPT baseline from the Don't Stop Pre-training paper. To reproduce this baseline, please follow the instructions in their repo here to download and run their pre-trained models.

To run our end-task aware training methods, use run_mt_multiple.sh for the multi-task variant (MT-TARTAN) and run_meta_multiple.sh for the meta-learned variant (META-TARTAN); an example invocation is sketched after the two commands below.
./run_mt_multiple.sh {task} {output_dir} {gpuid} {startseed} {endseed}
./run_meta_multiple.sh {task} {output_dir} {gpuid} {startseed} {endseed}
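For example, a hypothetical invocation (the task name, output directory, GPU id, and seed range below are placeholders):

# Hypothetical example: run META-TARTAN on a task named citation_intent with seeds 0 through 2 on GPU 0.
./run_meta_multiple.sh citation_intent outputs/citation_intent_meta 0 0 2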
Modify the following lines in the *.sh files to allow MLM auxiliary tasks based on multiple datasets (a hypothetical example follows):
--train_data_file [file1 file2 file3]
--aux-task-names [MLM1 MLM2 MLM3]
Note that the data used for the DAPT auxiliary tasks can be found in datasets/{task}/domain.NxTAPT.txt.
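As an illustration only, the modified lines might look like the following; the file paths and auxiliary-task names are hypothetical, so check the scripts for the exact argument syntax they expect.

# Hypothetical edit inside run_mt_multiple.sh / run_meta_multiple.sh:
# two MLM auxiliary tasks, one over task-specific data and one over domain data.
--train_data_file datasets/citation_intent/train.txt datasets/citation_intent/domain.NxTAPT.txt
--aux-task-names MLM-TAPT MLM-DAPT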
