PyTorch implementations of seq2seq models for the Neural Machine Translation (NMT) task:
- seq2seq (RNN)
- seq2seq with attention (RNN + attention)
- ConvS2S
- Transformer
- DynamicConv (+ LightConv)
For a version of the code that does not depend on torchtext, please refer to the `no-torchtext` tag.
In this version, dataset.py, lang.py, and data_prepare.py structure the low-level text data so that it is easier to consume in the training code.
Supported datasets include the PyTorch-tutorial ENG-to-FRA translation dataset and the torchtext NMT datasets:
- `org`: ENG-to-FRA translation data from the PyTorch tutorial. To use it, first download the dataset from https://download.pytorch.org/tutorial/data.zip.
- `multi30k`
- `iwslt`
- `wmt14`
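As a rough illustration of what the `lang.py` layer provides, here is a minimal vocabulary-class sketch; the class and method names (`Lang`, `add_sentence`, `build`, `encode`) are assumptions for illustration, not the repo's actual API:

```python
from collections import Counter

# Hypothetical sketch of a lang.py-style vocabulary; names are assumptions.
class Lang:
    def __init__(self, name, min_freq=2):
        self.name = name
        self.min_freq = min_freq
        self.counter = Counter()
        self.word2index = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
        self.index2word = {i: w for w, i in self.word2index.items()}

    def add_sentence(self, sentence):
        self.counter.update(sentence.split())

    def build(self):
        # Keep only tokens at or above the frequency threshold (min_freq).
        for word, freq in self.counter.items():
            if freq >= self.min_freq and word not in self.word2index:
                idx = len(self.word2index)
                self.word2index[word] = idx
                self.index2word[idx] = word

    def encode(self, sentence):
        unk = self.word2index["<unk>"]
        return [self.word2index.get(w, unk) for w in sentence.split()]
```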
Requirements:
- python3
- pyyaml
- pytorch >= 1.10
- tensorboard >= 1.14
- torchtext
- spacy
python -m spacy download en
python -m spacy download de
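With the spaCy 2.x shortcut models installed by the commands above, tokenizers for both languages can be loaded as sketched below (on spaCy 3.x the model names would be `en_core_web_sm` / `de_core_news_sm`; this is an illustration, not the repo's exact preprocessing code):

```python
import spacy

# Assumes the spaCy 2.x "en"/"de" shortcuts installed by the commands above.
spacy_en = spacy.load("en")
spacy_de = spacy.load("de")

def tokenize_en(text):
    return [tok.text for tok in spacy_en.tokenizer(text)]

def tokenize_de(text):
    return [tok.text for tok in spacy_de.tokenizer(text)]

print(tokenize_de("Zwei Hunde spielen im Schnee."))
```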
Hparams:
- Task & data: ENG-to-FRA translation task, with max_len=14 and min_freq=2.
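Since pyyaml is a dependency, these settings are presumably read from a YAML config. A minimal loading sketch, where only `max_len` and `min_freq` come from the values above and everything else (key names, inline string) is assumed:

```python
import yaml

# Hypothetical config; only max_len and min_freq are taken from the
# hyperparameters above, the remaining keys are illustrative.
config_text = """
task: eng-fra
max_len: 14
min_freq: 2
"""

hparams = yaml.safe_load(config_text)
assert hparams["max_len"] == 14 and hparams["min_freq"] == 2
```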
Models:
| Model | Loss (sum) | PPL | BLEU* | Note |
|---|---|---|---|---|
| Seq2Seq | 15.11 | 6.320 | | |
| Seq2Seq + KV attn | 13.57 | 5.244 | 64.10 | |
| Seq2Seq + Additive attn | 13.28 | 5.054 | 64.48 | |
| Seq2Seq + Multiplicative attn | 14.01 | 5.526 | | |
| ConvS2S | 13.06 | 4.931 | 61.62 | |
| ConvS2S + out-caching | 12.44 | 4.572 | 60.90 | |
| Transformer-init | 12.73 | 4.675 | 66.38 | |
| LightConv | 12.29 | 4.493 | | K=[3,3,5,5,7,7] |
| DynamicConv | 11.81 | 4.237 | 68.35 | K=[3,3,5,5,7,7] |
- [!] BLEU was recorded in a different run, so the reported PPL and BLEU values may not match.
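The additive and multiplicative attention variants in the table differ mainly in how the score between the decoder query and the encoder states is computed. A minimal sketch of the two score functions (module and tensor names are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Bahdanau-style additive attention score.
class AdditiveScore(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, query, keys):  # query: (B, D), keys: (B, T, D)
        q = self.w_q(query).unsqueeze(1)  # (B, 1, D)
        return self.v(torch.tanh(q + self.w_k(keys))).squeeze(-1)  # (B, T)

# Luong-style multiplicative ("general") attention score.
class MultiplicativeScore(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)

    def forward(self, query, keys):
        return torch.einsum("bd,btd->bt", self.w(query), keys)  # (B, T)
```

Either score is then softmax-normalized over `T` to produce the attention weights.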
- About the Transformer:
  - Post-norm (normalizing after the residual connection) did not work well; pre-norm should be used instead.
  - LR warmup and Xavier initialization are important for performance.
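To make these two points concrete, here is a minimal sketch of the pre-norm vs. post-norm residual arrangement, Xavier initialization, and an inverse-sqrt warmup schedule; the function names and the `d_model`/`warmup` values are illustrative, not the repo's code:

```python
import torch.nn as nn

# Post-norm (original Transformer): LayerNorm after the residual sum.
def post_norm_step(x, sublayer, norm):
    return norm(x + sublayer(x))

# Pre-norm: LayerNorm on the sublayer input; the variant that works here.
def pre_norm_step(x, sublayer, norm):
    return x + sublayer(norm(x))

# Xavier init for every matrix-shaped parameter.
def init_transformer(model):
    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)

# Inverse-sqrt LR warmup (the "Noam" schedule); values are illustrative.
def noam_lr(step, d_model=512, warmup=4000):
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```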
- Beam search (a minimal sketch follows this list)
- Word tokenization
- BPE
- WordPiece model
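A minimal beam-search sketch over a generic step function; the interface (`step(prefix)` returning a `{token: log_prob}` dict) is an assumption for illustration, not the repo's decoder API:

```python
# Hypothetical beam search; step(prefix) -> {token: log_prob} is assumed.
def beam_search(step, sos, eos, beam_size=4, max_len=14):
    beams = [([sos], 0.0)]  # (token list, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step(prefix).items():
                candidates.append((prefix + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:  # every surviving hypothesis has emitted <eos>
            break
    finished.extend(beams)
    # Return the best hypothesis under simple length normalization.
    return max(finished, key=lambda c: c[1] / len(c[0]))[0]
```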