This repository provides the datasets and code for preprocessing, training, and testing models for Iterative Text Revision, along with the official Hugging Face implementation of the following paper:
Understanding Iterative Revision from Human-Written Text
Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez and Dongyeop Kang
ACL 2022
It is mainly based on `transformers`.
The following command installs all necessary packages:
```bash
pip install -r requirements.txt
```
The project was tested using Python 3.7.
We uploaded both our datasets and model checkpoints to the Hugging Face Hub. You can load our data directly with `datasets` and our models with `transformers`:
```python
# load our dataset
from datasets import load_dataset
dataset = load_dataset("wanyu/IteraTeR_human_sent")

# load our model
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
```
You can change the following data and model specifications:
- "wanyu/IteraTeR_human_sent": sentence-level IteraTeR-HUMAN dataset;
- "wanyu/IteraTeR_human_doc": document-level IteraTeR-HUMAN dataset;
- "wanyu/IteraTeR_full_sent": sentence-level IteraTeR-FULL dataset;
- "wanyu/IteraTeR_full_doc": document-level IteraTeR-FULL dataset;
- "wanyu/IteraTeR-PEGASUS-Revision-Generator": PEGASUS model fine-tuned on the sentence-level IteraTeR-FULL dataset (see the usage example above);
- "wanyu/IteraTeR-BART-Revision-Generator": BART model fine-tuned on the sentence-level IteraTeR-FULL dataset.
We also provide demo code showing how to use them for iterative text revision.
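Since the checkpoints are standard seq2seq models, one revision step is a tokenize → generate → decode round trip, and iterative revision simply feeds each output back in. The sketch below is a minimal illustration, not the official demo: the `<intent>`-tag input format, the `revise`/`demo` helper names, and the generation settings (`num_beams`, `max_length`) are assumptions — check the model card for the exact input format the checkpoints expect.

```python
def format_input(intent: str, sentence: str) -> str:
    """Hypothetical input format: prepend an edit-intention tag,
    e.g. "<fluency> I likes coffee." -- verify against the model card."""
    return f"<{intent}> {sentence}"

def revise(model, tokenizer, sentence: str, intent: str = "fluency") -> str:
    """One revision step with a seq2seq checkpoint."""
    inputs = tokenizer(format_input(intent, sentence), return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def demo(draft: str, max_rounds: int = 3) -> str:
    """Iterative revision: feed each output back in until it stops changing.
    Downloads the checkpoint on first call."""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    name = "wanyu/IteraTeR-PEGASUS-Revision-Generator"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    for _ in range(max_rounds):
        revised = revise(model, tokenizer, draft)
        if revised == draft:  # converged: the model has no further edits
            break
        draft = revised
    return draft
```

Calling `demo("I likes coffee very much.")` would run up to three revision rounds, stopping early once the model returns its input unchanged.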
You can load our dataset using Hugging Face's `datasets` library, and you can also download the raw data under datasets/.
We split the IteraTeR dataset as follows:
Dataset | Train (doc-level) | Dev (doc-level) | Test (doc-level) | Train (sent-level) | Dev (sent-level) | Test (sent-level) |
---|---|---|---|---|---|---|
IteraTeR-FULL | 29848 | 856 | 927 | 157579 | 19705 | 19703 |
IteraTeR-HUMAN | 481 | 27 | 51 | 3254 | 400 | 364 |
All data and detailed description for the data structure can be found under datasets/.
Code for collecting the revision history data can be found under code/crawler/.
Model | Dataset | Edit-Intention | Precision | Recall | F1 |
---|---|---|---|---|---|
RoBERTa | IteraTeR-HUMAN | Clarity | 0.75 | 0.63 | 0.69 |
RoBERTa | IteraTeR-HUMAN | Fluency | 0.74 | 0.86 | 0.80 |
RoBERTa | IteraTeR-HUMAN | Coherence | 0.29 | 0.36 | 0.32 |
RoBERTa | IteraTeR-HUMAN | Style | 1.00 | 0.07 | 0.13 |
RoBERTa | IteraTeR-HUMAN | Meaning-changed | 0.44 | 0.69 | 0.53 |
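As a sanity check on the table above, the reported F1 scores are consistent (up to rounding of the reported precision and recall) with the usual harmonic mean F1 = 2PR / (P + R):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per edit intention, from the table above
rows = {
    "Clarity":         (0.75, 0.63, 0.69),
    "Fluency":         (0.74, 0.86, 0.80),
    "Coherence":       (0.29, 0.36, 0.32),
    "Style":           (1.00, 0.07, 0.13),
    "Meaning-changed": (0.44, 0.69, 0.53),
}

for intent, (p, r, reported) in rows.items():
    # Small deviations are expected because P and R are themselves rounded.
    assert abs(f1(p, r) - reported) < 0.01, intent
```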
The code and instructions for the training and inference of the intent classifier model can be found under code/model/intent_classification/.
Model | Dataset | SARI | BLEU | ROUGE-L | Avg. |
---|---|---|---|---|---|
BART | IteraTeR-FULL | 37.28 | 77.50 | 86.14 | 66.97 |
PEGASUS | IteraTeR-FULL | 37.11 | 77.60 | 86.84 | 67.18 |
The code and instructions for the training and inference of the PEGASUS and BART models can be found under code/model/generation/.
This repository also contains the code and data of the following paper:
Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
Wanyu Du*, Zae Myung Kim*, Vipul Raheja, Dhruv Kumar, and Dongyeop Kang (*equal contribution)
First Workshop on Intelligent and Interactive Writing Assistants (ACL 2022)
The IteraTeR_v2 dataset is larger than IteraTeR, with around 24K more unique documents and 170K more edits. It is split as follows:
Dataset | Train | Dev | Test |
---|---|---|---|
IteraTeR_v2 | 292929 | 34029 | 39511 |
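A quick arithmetic check on the stated sizes, assuming each sentence-level pair counts as one edit: the IteraTeR_v2 splits total about 366K edits, roughly 170K more than the ~197K sentence-level pairs in IteraTeR-FULL, matching the claim above.

```python
# Split sizes from the two tables above
iterater_v2 = 292929 + 34029 + 39511          # train + dev + test
iterater_full_sent = 157579 + 19705 + 19703   # sentence-level IteraTeR-FULL

assert iterater_v2 == 366469
assert iterater_full_sent == 196987
# The difference is ~169.5K, i.e. "around 170K more edits"
assert abs((iterater_v2 - iterater_full_sent) - 170_000) < 1_000
```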
Human-model interaction data in R3: we also provide the human-model interaction data collected with R3 in dataset/R3_eval_data.zip.
If you find this work useful for your research, please cite our papers:
@inproceedings{du2022iterater,
title = "Understanding Iterative Revision from Human-Written Text",
author = "Du, Wanyu and Raheja, Vipul and Kumar, Dhruv and Kim, Zae Myung and Lopez, Melissa and Kang, Dongyeop",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
year = "2022",
publisher = "Association for Computational Linguistics",
}
@inproceedings{du2022r3,
title = "Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision",
author = "*Du, Wanyu and *Kim, Zae Myung and Raheja, Vipul and Kumar, Dhruv and Kang, Dongyeop",
booktitle = "Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants",
year = "2022",
publisher = "Association for Computational Linguistics",
}