- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models [ICLR 2020] Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang.
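- Regularizes the finetuning of a large-scale pretrained language model to reduce overfitting.
- Original (pretrained) weights + dropout: parameters are randomly replaced with their pretrained values rather than zeroed.
- Proves a lower bound for the random mixture function.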
- Do We Need Zero Training Loss After Achieving Zero Training Error? [ICML 2020] Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama.
- Holds the training loss at a constant level once the model starts to overfit, which gives the optimizer more momentum to leave a local minimum.
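
The two ideas noted above can be sketched in a few lines of plain Python. This is a rough illustration and not the authors' code: the function names `mixout` and `flooded_loss`, the list-of-floats representation of the weights, and the `rng` argument are all assumptions made here. The rescaling in `mixout` (chosen so that the expected output equals the current weight, as in inverted dropout) and the form `|loss - b| + b` for the constant loss level `b` reflect my reading of the two papers and should be checked against them.

```python
import random


def mixout(weights, pretrained, p, rng=None):
    """Hypothetical sketch: mix current weights with pretrained ones.

    Each weight is replaced by its pretrained value with probability p
    (dropout would replace it by 0 instead).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    mixed = []
    for w, u in zip(weights, pretrained):
        # Pick the pretrained value u with probability p, else keep w.
        chosen = u if rng.random() < p else w
        # Assumed rescaling: makes the expected output equal to w.
        mixed.append((chosen - p * u) / (1.0 - p))
    return mixed


def flooded_loss(loss, flood_level):
    """Hypothetical sketch: keep the training loss near flood_level.

    Above flood_level the value equals the plain loss. Below it, the
    value grows as the loss shrinks, so minimizing it pushes the loss
    back up toward flood_level.
    """
    return abs(loss - flood_level) + flood_level


if __name__ == "__main__":
    rng = random.Random(0)
    print(mixout([1.0, -2.0, 3.0], [0.1, 0.2, 0.3], p=0.5, rng=rng))
    print(flooded_loss(0.5, 0.1), flooded_loss(0.02, 0.1))
```

With the pretrained values set to zero, `mixout` reduces to ordinary inverted dropout, which is one way to read the "Origin + Dropout" note. In a tensor framework the same two expressions would be applied elementwise to parameter tensors and to the mini-batch loss respectively.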