BROWN CORPUS
INTRODUCTION
The Brown Corpus (full name: Brown University
Standard Corpus of Present-Day American English) was
the first large computerized text corpus of American
English. The original corpus was published in
1963–1964 by W. Nelson Francis and Henry Kučera
(Department of Linguistics, Brown University,
Providence, Rhode Island, USA). Its purpose was to
investigate the linguistic features of American
English. It helped establish the methodology for
compiling representative corpora that could be used to
study differences between languages and variation
over time.
Origin
1960s Corpus Projects:
University of Edinburgh (Scotland): Developed a 300,000-word spoken corpus of
British English. Challenges included slow transcription and lack of computers.
Brown University (USA): Henry Kučera and W. Nelson Francis created the
one-million-word Brown Corpus.
Impact of the Brown Corpus:
It gained popularity as a resource for linguistic research.
It sparked recognition of the value of analyzing large, structured
text datasets to identify language patterns.
CHARACTERISTICS
Origin of the text: the author had to be a native speaker of American English.
Time period: all the texts selected for the corpus were first published in 1961.
Balanced composition and representation of different genres.
Accessibility to computer processing: special markings convey graphic
features of the text.
Size
The Brown Corpus was the first large
computerized corpus of American
English. It consists of about 1
million words (500 samples of 2,000+
words each) of running text of edited
English prose printed in the United
States during 1961. It was
revised and amplified in 1979.
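The arithmetic behind these figures can be checked with a short sketch. It assumes, as the "2000+" suggests, that each sample runs to the end of the sentence in which its 2,000th word falls. The function name `cut_sample` and the toy sentences are hypothetical and are not part of any corpus tooling.

```python
def cut_sample(sentences, min_words=2000):
    """Return whole sentences until at least `min_words` words are collected.

    Assumption: a sample is never cut mid-sentence, which is why each
    sample holds slightly more than 2,000 words ("2000+").
    """
    sample, count = [], 0
    for sentence in sentences:
        sample.append(sentence)
        count += len(sentence.split())
        if count >= min_words:
            break
    return sample, count


# Nominal corpus size: 500 samples of about 2,000 words each.
NUM_SAMPLES = 500
WORDS_PER_SAMPLE = 2000
nominal_size = NUM_SAMPLES * WORDS_PER_SAMPLE

# Toy text: 300 sentences of 7 words each (hypothetical data).
toy_sentences = ["the quick brown fox jumps over dogs"] * 300
sample, sample_words = cut_sample(toy_sentences)
print(nominal_size, len(sample), sample_words)
```

Because every sample slightly overshoots 2,000 words under this assumption, the real total would land a little above the nominal one million.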
Balance
Balanced corpora are
collections of texts sampled
from various genres to
represent average language
use. A famous example is the
Brown Corpus, which includes
texts from 15 major genres such
as newspapers, religious and
professional literature, popular
science, fiction, business prose,
scientific writing, and more.
Representativeness
Balanced corpora are collections of
texts from various genres to reflect
average language use. A well-known
example is the Brown Corpus, which
features texts from 15 key genres,
such as newspapers, literature, and
business prose.
TYPE OF BROWN CORPUS
The Brown Corpus is a generalised corpus. It includes a wide range of
text types and genres, making it representative of general American
English usage in the 1960s.
The Brown Corpus is a static corpus. It was compiled at a specific
point in time (1961) and does not get updated with new texts.
It is a monolingual corpus. The texts are all in American English.
All the texts in the Brown Corpus are written samples.
The Brown Corpus is a synchronic corpus. It captures the state of
American English at a specific time (the year 1961).
The Brown Corpus is a native corpus. It consists of texts written by
native speakers.
APPLICATIONS
The Brown Corpus can be used in a variety of ways. Examples:
Linguistic research: frequency analysis, lexical studies of vocabulary
usage, and the examination of syntactic structures and grammatical patterns.
Educational purposes: creating teaching materials for linguistics and
language courses, and use in student projects.
Natural Language Processing: training language models.
Lexicography: searching for authentic examples of word usage and
identifying new words and meanings.
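As a minimal sketch of the frequency analysis mentioned above, the snippet below counts word forms in a short text using only the Python standard library. The sample sentences and the helper name `word_frequencies` are hypothetical, and a real study would read the corpus files instead. The simple regular-expression tokenizer is also an assumption, not the tokenization used in the Brown Corpus.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word forms in `text`.

    Tokenization is a simplifying assumption: a token is a run of
    letters, optionally containing internal apostrophes.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)


# Hypothetical sample text standing in for a corpus file.
sample_text = (
    "The jury said the election was conducted properly. "
    "The jury praised the city, and the city thanked the jury."
)

freqs = word_frequencies(sample_text)

# Relative frequency per 1,000 tokens makes counts comparable
# across samples of different sizes.
total = sum(freqs.values())
for word, count in freqs.most_common(3):
    print(word, count, round(count / total * 1000, 1))
```

Normalizing by the total token count is a common design choice in corpus work, since raw counts from texts of different lengths cannot be compared directly.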
Conclusion
The Brown Corpus, created in the early 1960s by
Henry Kučera and W. Nelson Francis at Brown
University, was the first large computerized corpus
of American English. Comprising 1 million words
from various genres published in 1961, it was
designed with careful attention to balance and
representativeness. By including texts from a wide
range of categories, the corpus set a standard for
linguistic research and enabled meaningful
analysis of language patterns. Its methodology for
compiling a diverse and representative sample of
texts has significantly influenced the field of
corpus linguistics.
REFERENCES
Carole Tiberius, Ondřej Matuška, Iztok Kosem and Vojtěch Kovář (2022).
Introduction to Corpus-Based Lexicographic Practice. Version 1.0.0.
DARIAH-Campus. [Training module].
https://campus.dariah.eu/id/50Ga8xo5uDgdrz2eIIifW
Practical Corpus Linguistics: An Introduction to Corpus-Based Language
Analysis / edited by Martin Weisser. USA: Wiley-Blackwell, 2016. 312 p.
“ENGLISH CORPORA MAKING: HISTORICAL OVERVIEW”. EPRA International Journal
of Multidisciplinary Research (IJMR), Peer Reviewed Journal. Volume 9,
Issue 4, April 2023. Journal DOI: 10.36713/epra2013. SJIF Impact Factor
2023: 8.224. ISI Value: 1.188.
Henri Kauhanen (2011-03-20). The Standard Corpus of Present-Day Edited
American English (the Brown Corpus). Research Unit for Variation, Contacts
and Change in English.
https://varieng.helsinki.fi/CoRD/corpora/BROWN/tags.html
Reinhard Rapp, Aix-Marseille Université, Laboratoire d'Informatique
Fondamentale, 163 Avenue de Luminy, 13288 Marseille, France. “Using
https://www.sketchengine.eu/brown-corpus/
Thank you for your time