Detecting Text Overlap with Work in arXiv
=========================================

Submissions are sometimes marked with an "arXiv admin note" indicating
text overlap with other arXiv articles. Determination of significant
text overlaps is based on a statistical analysis of the existing arXiv
corpus, with overlaps classified according to whether the overlapping
articles have coauthors in common and whether one cites the other.
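
As a rough illustration only, the two classification axes described above (coauthors in common, and whether one article cites the other) can be sketched in a few lines of Python. The `Article` record, its field names, and the label strings are hypothetical and are not taken from arXiv's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; the field names are illustrative, not arXiv's schema.
    arxiv_id: str
    authors: set[str]
    cited_ids: set[str] = field(default_factory=set)


def classify_overlap(new: Article, earlier: Article) -> tuple[str, str]:
    """Label a pair of overlapping articles along the two axes named above."""
    # Axis 1: do the two articles have at least one coauthor in common?
    authorship = "shared coauthor" if new.authors & earlier.authors else "other authors"
    # Axis 2: does either article cite the other?
    cites = earlier.arxiv_id in new.cited_ids or new.arxiv_id in earlier.cited_ids
    citation = "cited" if cites else "uncited"
    return authorship, citation
```

For example, `classify_overlap(Article("y", {"A", "B"}, {"x"}), Article("x", {"B", "C"}))` yields `("shared coauthor", "cited")` under these assumed labels.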

The indication of text overlap has been added to arXiv as a service for
readers, who frequently find it useful to know when an article draws
heavily from another or supersedes an earlier work. A text overlap note
may therefore help identify related content. It can also be of
assistance to authors who may not be aware that importing large sections
of text either from their earlier articles or from articles by others is
not common practice. Lastly, it may serve as a quality flag: there is a
statistically significant correlation between the amount of reused
content in an article and a lower number of citations received years
later. For a scientific analysis of text reuse within arXiv, see
Citron and Ginsparg, "Patterns of Text Reuse in a Scientific Corpus," PNAS
2014, DOI:
[10.1073/pnas.1415135111](http://doi.org/10.1073/pnas.1415135111)
([arXiv:1412.2716](http://arxiv.org/abs/1412.2716)).

An arXiv admin note indicating text overlap does not suggest misconduct
on the part of the author or that an article does not contain original
work. In particular, these notes are not an attempt to detect or
indicate
"[plagiarism](http://digitalliteracy.cornell.edu/integrity/dpl3320.html),"
which is the unattributed use of the words or ideas of others. arXiv
admin notes indicating text overlap are simply factual statements about
the textual overlap of materials within arXiv. Note that arXiv may
reject or withdraw papers that contain the unattributed use of another
author's work.

The threshold for the addition of a text overlap admin note is set quite
high so that many articles with smaller amounts of detected overlap are
not noted. To be flagged as having text overlap with an article
"by other authors," a submission must share multiple consecutive sentences
with the earlier work. Overlap between articles with at least one
coauthor in common is subject to an even higher threshold, and additional
exceptions apply to such articles. In
addition, certain classes of articles naturally encompass other source
articles, either in part or in full, such as review articles, theses,
conference proceedings, and book contributions. Articles that authors mark
in the ["Comments" field](/help/prep#comments)
as belonging to one of these classes (i.e., identified as review articles, theses,
conference proceedings, book contributions, etc.) are not noted as having text
overlap with their source articles, although they may still be marked as
having overlap with other documents.
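
The threshold logic in this section can be illustrated with a minimal Python sketch. It is not arXiv's detection method, which is described above only as a statistical analysis of the corpus. The sentence splitter is deliberately naive, and the two numeric thresholds are placeholders chosen for illustration, since this page says only "multiple" consecutive sentences and an "even higher" threshold for shared coauthors. All function and constant names are hypothetical.

```python
import re

# Placeholder values, assumed for illustration; the real thresholds are not published here.
MIN_RUN_OTHER_AUTHORS = 3
MIN_RUN_SHARED_COAUTHOR = 6


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' plus whitespace, then normalize."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def longest_shared_run(new_text: str, earlier_text: str) -> int:
    """Length of the longest run of consecutive sentences found in both texts."""
    a, b = split_sentences(new_text), split_sentences(earlier_text)
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at the previous sentence of a and b[j-1]
    for sent_a in a:
        curr = [0] * (len(b) + 1)
        for j, sent_b in enumerate(b, start=1):
            if sent_a == sent_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def should_note(run_length: int, shared_coauthor: bool, is_declared_source: bool) -> bool:
    """Decide whether a note would be added under the assumed thresholds."""
    # Reviews, theses, proceedings, etc. are not noted against their declared sources.
    if is_declared_source:
        return False
    threshold = MIN_RUN_SHARED_COAUTHOR if shared_coauthor else MIN_RUN_OTHER_AUTHORS
    return run_length >= threshold
```

With these placeholder values, a run of three shared sentences would trigger a note for an article by other authors but not for one sharing a coauthor with the earlier work.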

## User-supplied overlap notes

Submitters can preempt the addition of a text overlap admin note by
marking any known overlaps in advance in the
["Comments" field](/help/prep#comments),
provided that the earlier article containing the text has a coauthor in
common with the submission. For example: "this article draws heavily from
arXiv:x, arXiv:y," or "this article supersedes arXiv:z."

## Appeals

A submitter who believes that an admin note indicating text overlap has
been incorrectly applied to their article should [contact](contact)
arXiv moderation with a detailed explanation or justification.
