
IteraTeR: Understanding Iterative Revision from Human-Written Text

This repository provides datasets and code for preprocessing, training, and testing models for Iterative Text Revision. It is the official Hugging Face implementation of the following paper:

Understanding Iterative Revision from Human-Written Text
Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez and Dongyeop Kang
ACL 2022

It is mainly based on transformers.

Installation

The following command installs all necessary packages:

pip install -r requirements.txt

The project was tested using Python 3.7.

HuggingFace Integration

We uploaded both our datasets and model checkpoints to the Hugging Face Hub. You can load our data directly with datasets and our models with transformers.

# load our dataset
from datasets import load_dataset
dataset = load_dataset("wanyu/IteraTeR_human_sent")

# load our model
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")

You can substitute other dataset and model identifiers in the snippet above to load the other splits and checkpoints we released.

We also provide demo code showing how to use them for iterative text revision.
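
For quick reference, here is a minimal sketch of a single revision step. It assumes the model expects an edit-intention tag such as <fluency> prepended to the source text (as described in the model card); the input sentence is illustrative only.

# A minimal sketch of one revision step with the PEGASUS revision generator.
# Assumption: the model expects an edit-intention tag (e.g. <fluency>) prepended
# to the source text; the sentence below is illustrative only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")

before_text = "<fluency> I likes coffee because it give me energy in the morning."
inputs = tokenizer(before_text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=8, max_length=128)
after_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(after_text)

Feeding the revised output back in with a new intent tag repeats this step, which gives the iterative revision loop studied in the paper.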

Datasets

You can load our dataset using Hugging Face's datasets, or download the raw data in datasets/.
We split the IteraTeR dataset as follows:

Dataset          Document-level           Sentence-level
                 Train    Dev    Test     Train     Dev      Test
IteraTeR-FULL    29848    856    927      157579    19705    19703
IteraTeR-HUMAN   481      27     51       3254      400      364

All data and a detailed description of the data structure can be found under datasets/.
Code for collecting the revision history data can be found under code/crawler/.
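
For reference, a minimal sketch of inspecting one sentence-level example; the exact field names are defined by the dataset itself, and datasets/ is the authoritative description of the schema.

# A minimal sketch: load the sentence-level human-annotated data and print one example.
from datasets import load_dataset

dataset = load_dataset("wanyu/IteraTeR_human_sent")
print(dataset)              # shows the available splits and their sizes
example = dataset["train"][0]
print(example.keys())       # available fields for each revision pair
print(example)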

Models

Intent classification model

Model checkpoints

Model     Dataset          Edit-Intention    Precision   Recall   F1
RoBERTa   IteraTeR-HUMAN   Clarity           0.75        0.63     0.69
RoBERTa   IteraTeR-HUMAN   Fluency           0.74        0.86     0.80
RoBERTa   IteraTeR-HUMAN   Coherence         0.29        0.36     0.32
RoBERTa   IteraTeR-HUMAN   Style             1.00        0.07     0.13
RoBERTa   IteraTeR-HUMAN   Meaning-changed   0.44        0.69     0.53

Model training and inference

The code and instructions for the training and inference of the intent classifier model can be found under code/model/intent_classification/.
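
For reference, a minimal sketch of classifying the edit intention of a (before, after) sentence pair. The checkpoint name and the two-segment input format below are assumptions; see code/model/intent_classification/ for the exact preprocessing and checkpoints.

# A minimal sketch of edit-intention classification for a (before, after) pair.
# Assumptions: the checkpoint id below and the two-segment input format;
# see code/model/intent_classification/ for the exact preprocessing.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "wanyu/IteraTeR-ROBERTA-Intention-Classifier"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

before_sent = "I likes coffee."
after_sent = "I like coffee."
inputs = tokenizer(before_sent, after_sent, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred_id = int(logits.argmax(dim=-1))
print(model.config.id2label.get(pred_id, pred_id))  # predicted edit intention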

Generation models

Model checkpoints

Model     Dataset         SARI    BLEU    ROUGE-L   Avg.
BART      IteraTeR-FULL   37.28   77.50   86.14     66.97
PEGASUS   IteraTeR-FULL   37.11   77.60   86.84     67.18

Model training and inference

The code and instructions for the training and inference of the Pegasus and BART models can be found under code/model/generation/.

Human-in-the-loop Iterative Text Revision

This repository also contains the code and data of the following paper:

Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
Wanyu Du*, Zae Myung Kim*, Vipul Raheja, Dhruv Kumar and Dongyeop Kang (* equal contribution)
First Workshop on Intelligent and Interactive Writing Assistants (ACL 2022)

The IteraTeR_v2 dataset is larger than IteraTeR, with around 24K more unique documents and 170K more edits. It is split as follows:

              Train     Dev      Test
IteraTeR_v2   292929    34029    39511
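
For reference, a minimal sketch of loading IteraTeR_v2 from the Hugging Face Hub. The dataset identifier below is an assumption; the raw files under datasets/ are the authoritative copy if it differs.

# A minimal sketch of loading IteraTeR_v2 from the Hugging Face Hub.
# Assumption: the dataset id "wanyu/IteraTeR_v2".
from datasets import load_dataset

iterater_v2 = load_dataset("wanyu/IteraTeR_v2")
print({split: len(iterater_v2[split]) for split in iterater_v2})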

Human-model interaction data in R3: we also release the human-model interaction data collected with the R3 system in dataset/R3_eval_data.zip.

Citation

If you find this work useful for your research, please cite our papers:

Understanding Iterative Revision from Human-Written Text

@inproceedings{du2022iterater,
    title = "Understanding Iterative Revision from Human-Written Text",
    author = "Du, Wanyu and Raheja, Vipul and Kumar, Dhruv and Kim, Zae Myung and Lopez, Melissa and Kang, Dongyeop",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision

@inproceedings{du2022r3,
    title = "Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision",
    author = "*Du, Wanyu and *Kim, Zae Myung and Raheja, Vipul and Kumar, Dhruv and Kang, Dongyeop",
    booktitle = "Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}
