Neural Machine Translation

Non-Autoregressive Neural Machine Translation [ICLR 2018] Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, Richard Socher.
Predicts source-token fertilities as an auxiliary task. Fertilities alone are not robust when a source sentence has several valid translations (the multi-label, or multimodality, problem), so training adds knowledge distillation (KD) and reinforcement learning (RL) in two stages.

Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation [-] Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith.
Reduces the number of decoder layers to improve inference efficiency.

Improving Transformer Models by Reordering their Sublayers [ACL 2020] Ofir Press, Noah A. Smith, Omer Levy.
Changes the order of the feed-forward (FFN) and multi-head attention (MHA) sublayers.
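
To make the FFN/MHA reordering idea concrete, here is a minimal sketch that writes a sublayer stack as a string, with `s` for a self-attention (MHA) sublayer and `f` for a feed-forward (FFN) sublayer. The function names and the `k` parameter are illustrative assumptions, not the authors' code. The "sandwich" arrangement shown is one reordering of this kind: it moves `k` attention sublayers to the bottom and `k` feed-forward sublayers to the top while keeping the total count unchanged.

```python
def interleaved(n: int) -> str:
    """Baseline ordering: n attention/feed-forward pairs, e.g. 'sfsfsf'."""
    return "sf" * n


def sandwich(n: int, k: int) -> str:
    """Reordered stack (hypothetical helper): k attention sublayers first,
    then n - k interleaved pairs, then k feed-forward sublayers last."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return "s" * k + "sf" * (n - k) + "f" * k


def count_sublayers(pattern: str) -> dict:
    """Count each sublayer type, to check that reordering adds no sublayers."""
    return {"s": pattern.count("s"), "f": pattern.count("f")}


if __name__ == "__main__":
    base = interleaved(6)
    reordered = sandwich(6, 2)
    print(base, count_sublayers(base))
    print(reordered, count_sublayers(reordered))
```

Both patterns contain the same number of `s` and `f` sublayers, so the reordered stack has the same sublayer budget as the baseline; only the order changes.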