This is the repository for our ACL 2024 Findings paper "Rethinking Efficient Multilingual Text Summarization Meta-Evaluation".
MRoSE is created by translating the CNN/DailyMail test set portion of the RoSE dataset into 30 languages using the M2M-100 and GPT-3.5 models.
The dataset encompasses a diverse range of languages, as listed below:
| Language Code | Language Name |
|---|---|
| ar | Arabic |
| cs | Czech |
| de | German |
| es | Spanish |
| fi | Finnish |
| fr | French |
| he | Hebrew |
| hi | Hindi |
| id | Indonesian |
| ja | Japanese |
| ko | Korean |
| nl | Dutch |
| pt | Portuguese |
| ru | Russian |
| sv | Swedish |
| ta | Tamil |
| tr | Turkish |
| uk | Ukrainian |
| vi | Vietnamese |
| zh | Chinese |
| bg | Bulgarian |
| bn | Bengali |
| el | Greek |
| fa | Persian |
| ga | Irish |
| hu | Hungarian |
| it | Italian |
| lt | Lithuanian |
| pl | Polish |
| yi | Yiddish |
We utilized the following models for translating our dataset:
| Model Name | Description |
|---|---|
| GPT-3.5 Turbo | A large language model by OpenAI |
| M2M-100 | A multilingual sequence-to-sequence model primarily intended for translation tasks |
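As a rough illustration of how an M2M-100 translation call looks with Hugging Face Transformers (the checkpoint `facebook/m2m100_418M`, the language pair, and the example sentence are assumptions for this sketch, not necessarily the exact setup used to build MRoSE):

```python
# Sketch: translating an English sentence into German with M2M-100.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")  # assumed checkpoint size
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"  # source language: English
encoded = tokenizer("The council approved the new budget on Tuesday.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("de"),  # target language: German
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```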
The translated datasets can be found in the `./rose_translation/` directory.
Our analysis employs a range of metrics to assess text summarization performance comprehensively:
| Metric Type | Metric Name | Description and Reference |
|---|---|---|
| Text Matching Metrics | ROUGE-1/2/L | / |
| | BLEU | / |
| | METEOR | / |
| Neural Metrics | BERTScore | Uses the bert-base-multilingual-cased model |
| | BARTScore | Uses the mbart-large-50 (MBART) model |
| | MoverScore | Uses the bert-base-multilingual-cased model |
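For instance, the multilingual BERTScore variant listed above can be computed with the `bert-score` package roughly as follows; the candidate and reference strings are placeholders, and this sketch is not necessarily identical to the scripts in this repository:

```python
# Sketch: multilingual BERTScore with the bert-base-multilingual-cased backbone.
from bert_score import score

candidates = ["Der Rat hat den neuen Haushalt am Dienstag gebilligt."]        # system summary (placeholder)
references = ["Der Stadtrat verabschiedete den neuen Haushalt am Dienstag."]  # reference summary (placeholder)

P, R, F1 = score(candidates, references, model_type="bert-base-multilingual-cased")
print(F1.mean().item())
```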
The `/metrics/` directory contains all computed evaluation metrics, organized into two subdirectories: `/translated_metrics/` and `/back_translated_metrics/`. The structure of each file is shown below:
```
[
    {
        "example_id": str,
        "count_id": 0,
        "metric_scores": {
            "bertscore_f1": {"brio-ext": float, "bart": float, ...},
            "bertscore_p": {"brio-ext": float, "bart": float, ...},
            ...
        }
    },
    {
        "example_id": str,
        "count_id": 1,
        "metric_scores": {
            "bertscore_f1": {"brio-ext": float, "bart": float, ...},
            "bertscore_p": {"brio-ext": float, "bart": float, ...},
            ...
        }
    },
    ...
]
```
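A minimal sketch for reading one of these metric files and averaging a system's scores over all examples (the filename and system name below are placeholders; point them at an actual file and system under `/metrics/`):

```python
import json

# Placeholder path: any file under metrics/translated_metrics/ or metrics/back_translated_metrics/
with open("metrics/translated_metrics/de.json") as f:
    records = json.load(f)

system = "bart"  # placeholder: one of the summarization systems scored in the file
scores = [record["metric_scores"]["bertscore_f1"][system] for record in records]
print(f"{system}: mean BERTScore F1 = {sum(scores) / len(scores):.4f}")
```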
The following versions are recommended:
- Python 3.9
- PyTorch 2.0.1
- NumPy 1.24.3
- SciPy 1.10.1
- pandas 1.5.3
- scikit-learn 1.2.2
- In `./scripts/metric_scores/compute_metrics.py`, set `back_src_path` and `back_output_path` to compute back-translated metric scores, and set `src_path` and `output_path` to compute translated metric scores (see the configuration sketch after this list). Then run `python ./scripts/metric_scores/compute_metrics.py`.
- Configure the `./scripts/meta_evaluation/meta_evaluation.py` script with your root directory paths for `RESULTS`, `BASE_DIR`, and `HUMAN_SCORE` (see the sketch below). Then run `python ./scripts/meta_evaluation/meta_evaluation.py`.
Results will be saved to the path you specify.
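As a hypothetical illustration of the configuration referenced above, the variables might be set as follows; every path and value here is a placeholder, not the exact setting used in our experiments:

```python
# In ./scripts/metric_scores/compute_metrics.py (all paths are placeholders):
src_path = "rose_translation/de.json"                          # translated summaries to score
output_path = "metrics/translated_metrics/de.json"             # output for translated metric scores
back_src_path = "rose_back_translation/de.json"                # back-translated summaries to score
back_output_path = "metrics/back_translated_metrics/de.json"   # output for back-translated metric scores

# In ./scripts/meta_evaluation/meta_evaluation.py (all paths are placeholders):
RESULTS = "results/"            # where meta-evaluation results are written
BASE_DIR = "metrics/"           # root directory of the computed metric scores
HUMAN_SCORE = "human_scores/"   # location of the human annotation scores
```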