Discourse Analysis
1.1 Defining text and discourse. What is Text Linguistics? What is Discourse
Analysis?
To define and describe the scope of study of Text Linguistics and Discourse Analysis and to establish
the differences between them both is not an easy task. Suffice it to say that the terms text and discourse
are used in a variety of ways by different linguists and researchers: there is a considerable number of
theoretical approaches to both Text Linguistics and Discourse Analysis and many of them belong to very
different research traditions, even when they share similar basic tenets.
In everyday popular use it might be said that the term text is restricted to written language, while
discourse is restricted to spoken language. However, modern Linguistics has introduced a concept of text
that includes every type of utterance; therefore a text may be a magazine article, a television interview, a
conversation or a cooking recipe, just to give a few examples.
Crystal (1997) defines Text Linguistics as “the formal account of the linguistic principles governing
the structure of texts”. De Beaugrande and Dressler (1981) present a broader view; they define text as a
communicative event that must satisfy the following seven criteria:
1) Cohesion, which has to do with the relationship between text and syntax. Phenomena such as
conjunction, ellipsis, anaphora, cataphora or recurrence are basic for cohesion.
2) Coherence, which has to do with the meaning of the text. Here we may refer to elements of
knowledge or to cognitive structures that do not have a linguistic realization but are implied by
the language used, and thus influence the reception of the message by the interlocutor.
3) Intentionality, which relates to the attitude and purpose of the speaker or writer.
4) Acceptability, which concerns the preparation of the hearer or reader to assess the relevance or
usefulness of a given text.
5) Informativity, which refers to the quantity and quality of new or expected information.
6) Situationality, which points to the fact that the situation in which the text is produced plays a
crucial role in the production and reception of the message.
7) Intertextuality, which refers to two main facts: a) a text is always related to some preceding or
simultaneous discourse; b) texts are always linked and grouped in particular text varieties or
genres (e.g.: narrative, argumentative, descriptive, etc.) by formal criteria.
In spite of the considerable overlap between Text Linguistics and Discourse Analysis (both of them
are concerned with the notion of cohesion, for instance) the above criteria may help us make a distinction
between them.
Tischer et al. (2000) explain that the first two criteria (cohesion and coherence) may be defined as
text-internal, whereas the remaining criteria are text-external. Those approaches oriented towards ‘pure’
Text Linguistics give more importance to text-internal criteria, while the tradition in Discourse Analysis
has always been to give more importance to the external factors, for they are believed to play an essential
role in communication.
Some authors, such as Halliday, believe that text is everything that is meaningful in a particular
situation: “By text, then, we understand a continuous process of semantic choice” (1978:137). In the
“purely” text-linguistic approaches, such as the cognitive theories of text, texts are viewed as “more or
less explicit epi-phenomena of cognitive processes” (Tischer et al., 2000: 29), and the context plays a
subordinate role.
It could be said that the text-internal elements constitute the text, while the text-external ones
constitute the context. Schiffrin points out that all approaches within Discourse Analysis view text and
context as the two kinds of information that contribute to the communicative content of an utterance, and
she defines these terms as follows:
I will use the term “text” to differentiate linguistic material (e.g. what is said, assuming a verbal channel)
from the environment in which “sayings” (or other linguistic productions) occur (context). In terms of
utterances, then, “text” is the linguistic content: the stable semantic meanings of words, expressions, and
sentences, but not the inferences available to hearers depending upon the contexts in which words,
expressions, and sentences are used. […] Context is thus a world filled with people producing utterances:
people who have social, cultural, and personal identities, knowledge, beliefs, goals and wants, and who
interact with one another in various socially and culturally defined situations. (1994: 363)
Thus, according to Schiffrin, Discourse Analysis involves the study of both text and context. One might
conclude, then, that Text Linguistics only studies the text, while Discourse Analysis is more complete
because it studies both text and context. However, as has been shown, there are definitions of text (like de
Beaugrande’s) that are very broad and include both elements, and that is why it would be very risky to
talk about clear-cut differences between the two disciplines. De Beaugrande’s (2002) definition of Text
Linguistics (hereinafter TL) as “the study of real language in use” does not differ from many of the
definitions of Discourse Analysis (hereinafter DA) presented by Schiffrin within its functional approach,
some of which are the following:
The study of discourse is the study of any aspect of language use (Fasold, 1990: 65).
The analysis of discourse is, necessarily, the analysis of language in use. As such, it cannot be restricted to
the description of linguistic forms independent of the purposes or functions which these forms are designed
to serve in human affairs (Brown & Yule, 1983: 1).
Discourse… refers to language in use, as a process which is socially situated (Candlin, 1997: ix).
Thus, we see that the terms text and discourse are sometimes used to mean the same and therefore one
might conclude that TL and DA are the same, too. It can be said, nevertheless, that the tendency in TL
has been to present a more formal and experimental approach, while DA tends more towards a functional
approach. Formalists are apt to see language as a mental phenomenon, while functionalists see it as a
predominantly social one. As has been shown, authors like Schiffrin integrate both the formal and the
functional approaches within DA, and consequently, DA is viewed as an all-embracing term which would
include TL studies as one approach among others.
Slembrouck points out the ambiguity of the term discourse analysis and provides another broad
definition:
The term discourse analysis is very ambiguous. I will use it in this book to refer mainly to the linguistic
analysis of naturally occurring connected speech or written discourse. Roughly speaking, it refers to
attempts to study the organisation of language above the sentence or above the clause, and therefore to study
larger linguistic units, such as conversational exchanges or written texts. It follows that discourse analysis is
also concerned with language use in social contexts, and in particular with interaction or dialogue between
speakers. (2005:1)
Another important characteristic of discourse studies is that they are essentially multidisciplinary, and
therefore it can be said that they cross the Linguistics border into different and varied domains, as van
Dijk notes in the following passage:
…discourse analysis for me is essentially multidisciplinary, and involves linguistics, poetics, semiotics,
psychology, sociology, anthropology, history, and communication research. What I find crucial though is
that precisely because of its multi-faceted nature, this multidisciplinary research should be integrated. We
should devise theories that are complex and account both for the textual, the cognitive, the social, the
political and the historical dimension of discourse. (2002: 10)
Thus, when analyzing discourse, researchers are not only concerned with “purely” linguistic facts;
they pay equal or more attention to language use in relation to social, political and cultural aspects. For
this reason, discourse is not only within the interests of linguists; it is a field that is also studied by
communication scientists, literary critics, philosophers, sociologists, anthropologists, social psychologists,
political scientists, and many others. As Barbara Johnstone puts it:
… I see discourse analysis as a research method that can be (and is being) used by scholars with a variety of
academic and non-academic affiliations, coming from a variety of disciplines, to answer a variety of
questions. (2002: xi)
As noted above, not all researchers use and believe in the same definition of text and discourse. In
this book, we are going to adopt the general definition of DA as the study of language in use, and we
shall follow Schiffrin in including both text and context as parts of discourse, in which case we will
consider the term text in its narrow sense, not in the broad sense that could place it on a par with the term
discourse.
• Cognitive Linguistics
• Sociolinguistics
• Pragmatics
• Text Linguistics
• Discourse Analysis
All these new disciplines are interrelated, and sometimes it is very difficult to distinguish one from the
other, due to the fact that all of them have common denominators. Bernárdez (1999: 342) explains the
basic tenets of these disciplines, which are summarized here as follows:
a) Language only exists in use and communication. It always fulfils certain functions in human
interaction.
b) Language use is necessarily social.
c) Language is not autonomous. It shares some characteristics with other social and cognitive
phenomena.
d) The description of language must account for the real facts of language. It should not postulate
hidden entities only motivated by the needs of the formal system utilized.
e) Linguistic structures should be closely linked to the conditions of language use.
f) Language is natural and necessarily vague and inaccurate; therefore any prediction can only be
probabilistic.
When performing DA, then, researchers may also engage themselves in Functional Grammar,
Sociolinguistics, Pragmatics or Cognitivism, because all these fields are interrelated and have common
tenets. As regards TL and DA, we may speak of a progressive “integration” of both disciplines, for, if we
observe the evolution of language research through time, it will be noticed that many scholars have
moved from TL into DA as part of the natural flow of their beliefs and ideas, as is the case with van Dijk,
who, in his biographical article of 2002, explains how his research evolved from Text Grammar to
Critical Discourse Analysis 2. This author points out that the main aim of his studies in the 1970s was to
give an explicit description of the grammatical structure of texts, and the most obvious way of doing so
was by accounting for the relationship among sentences. A very important concept for Text Grammar at
that time was the introduction of the notion of macrostructure (van Dijk, 1980). Another fundamental
notion was that of coherence and the idea that texts are organized at more global descriptive levels than
that of the sentence. Later on, and under the influence of the cognitive theories, the notion of strategic
understanding was developed, which attempted to account for what the users of a language really do
when they understand a given text. Van Dijk also notes how several other new concepts were introduced
in TL studies, such as socio-cultural knowledge and mental models (Johnson-Laird, 1983), as well as all
the ideas and concepts coming from the field of Pragmatics. In his particular case, he took interest in the
study of power and ideology, which places him within the DA stream-of-thought known as Critical
Discourse Analysis 3.
Thus, after the early and uniform stage of “Text Grammar”, TL went through a series of more open
and diversified stages. The “textuality” stage emphasized the global aspects of texts and saw the text as a
functional unit, larger than the sentence. This stage led into the “textualization” or “discourse processing”
stage, where analysts “set about developing process models of the activities of discourse participants in
interactive settings and in ‘real time’” (de Beaugrande, 1997: 61-62).
The current aim now in DA is to describe language where it was originally found, i.e. in the context of
human interaction. In this respect, it is important to point out that this interaction often involves other
media besides language. Examples of these other semiotic systems may be gesture, dance, song,
photography or clothing, and it is also the discourse analyst’s job to explain the connection between these
systems and language. In order to achieve these aims, different researchers have taken different
approaches. We now turn to them.
1) Anything beyond the sentence
2) Language use
3) A broader range of social practice that includes non-linguistic and non-specific instances of
language. (2001: 1)
Authors such as Leech (1983) and Schiffrin (1994) distinguish between two main approaches: 1) the
formal approach, where discourse is defined as a unit of language beyond the sentence, and 2) the
functional approach, which defines discourse as language use. Z. Harris (1951, 1952) was the first
linguist to use the term discourse analysis and he was a formalist: he viewed discourse as the next level in
a hierarchy of morphemes, clauses and sentences. This view has been criticized in light of the results shown
by researchers like Chafe (1980, 1987, 1992), who rightly argued that the units used by people in their
speech cannot always be categorized as sentences. People generally produce units that have a semantic
and an intonational closure, but not necessarily a syntactic one.
2 Another example can be found in de Beaugrande (1997: 68) when he comments on how his concepts of text and
discourse evolved over a series of studies and expanded beyond the linguistic focus he first encountered.
3 This approach is presented and studied in Chapter 10.
Functionalists give much importance to the purposes and functions of language, sometimes to the
extreme of defending the notion that language and society are part of each other and cannot be thought of
as independent (Fairclough, 1989; Foucault, 1980). Functional analyses include all uses of language
because they focus on the way in which people use language to achieve certain communicative goals.
Discourse is not regarded as one more of the levels in a hierarchy; it is an all-embracing concept which
includes not only the propositional content, but also the social, cultural and contextual contents.
As explained above, Schiffrin (1994) proposes a more balanced approach to discourse, in which both the
formal and the functional paradigms are integrated. She views discourse as “utterances”, i.e. “units of
linguistic production (whether spoken or written) which are inherently contextualized” (1994: 41). From
this perspective, the aims for DA are not only sequential or syntactic, but also semantic and pragmatic.
Within the category of discourse we may include not only the “purely” linguistic content, but also
sign language, dramatization, or the so-called ‘bodily hexis’ (Bourdieu, 1990), i.e. the speaker’s disposition
or the way s/he stands, talks, walks or laughs, which has to do with a given political mythology. It can
thus be concluded that discourse is multi-modal because it uses more than one semiotic system and
performs several functions at the same time.
Wetherell et al. (2001) present four possible approaches to DA, which are summarized as follows:
1. The model that views language as a system and therefore it is important for the
researchers to find patterns.
2. The model that is based on the activity of language use, more than on language in
itself. Language is viewed as a process and not as a product; thus researchers focus
on interaction.
3. The model that searches for language patterns associated with a given topic or
activity (e.g. legal discourse, psychotherapeutic discourse, etc.).
4. The model that looks for patterns within broader contexts, such as “society” or
“culture”. Here, language is viewed as part of major processes and activities, and as
such the interest goes beyond language (e.g. the study of racism or sexism through
the analysis of discourse).
In spite of these categorizations, it would not be unreasonable to say that there are as many approaches
to discourse as there are researchers devoted to the field, for each of them proposes new forms of analysis
or new concepts that somehow transform or broaden previous modes of analysis. However, it would also
be true to say that all streams of research within the field are related to one another, and sometimes it is
difficult to distinguish among them. Precisely with the aim of systematizing the study of discourse and
distinguishing among different ways of solving problems within the discipline, different traditions or
schools have been identified. It would be impossible to embrace them all in only one work, and for that
reason, in this book we are only going to concentrate on the main ideas and practices within some of the
best-known schools, which are the following:
1. Pragmatics (Chapter 3)
2. Interactional Sociolinguistics (Chapter 4)
3. Conversation Analysis (Chapter 5)
4. The Ethnography of Communication (Chapter 6)
5. Variation Analysis and Narrative Analysis (Chapter 7)
6. Functional Sentence Perspective (Chapter 8)
7. Post-structuralist Theory and Social Theory (Chapter 9)
8. Critical Discourse Analysis and Positive Discourse Analysis (Chapter 10)
9. Mediated Discourse Analysis (Chapter 11)
A common characteristic of all these schools of thought is that they do not focus on language as an
abstract system. Instead, they all tend to be interested in what happens when people use language, based
on what they have said, heard or seen before, as well as in how they do things with language, such as
express feelings, entertain others, exchange information, and so on. This is the main reason why the
discipline has been called “Discourse Analysis” rather than “language analysis”.
These are just a few examples reflecting the concerns of discourse analysts, but they are sufficient to
demonstrate that researchers in DA are certainly concerned with the study of language in use. As
students/readers progress through the different chapters of this book, they will encounter several other
examples of possible DA areas of interest.
It is worth noting that, as Johnstone (2002) remarks, the discipline is called discourse analysis (and
not, for instance, “discourseology”) because it “typically focuses on the analytical process in a relatively
explicit way” (2002: 3). This analysis may be realized by dividing long stretches of discourse into parts
or units of different sorts, depending on the initial research question, and it can also involve looking at the
phenomenon under study in a variety of ways, by performing, for instance, a given set of tests.
Thus, discourse analysts have helped (and are helping) to shed light on how speakers/writers organize
their discourse in order to indicate their semantic intentions, as well as on how hearers/readers interpret
what they hear, read or see. They have also contributed to answering important research questions which
have led, for instance, to the identification of the cognitive abilities involved in the use of symbols or
semiotic systems, to the study of variation and change, or to the description of some aspects of the
process of language acquisition.
In order to carry out their analyses, discourse analysts need to work with texts. Texts constitute the
corpus of any given study, which may consist of the transcripts of a recorded conversation, a written
document or a computerized corpus of a given language, to name a few possibilities. The use of corpora
has become a very widespread practice among discourse researchers, and for that reason it is necessary
for any discourse analyst to acquire some basic knowledge of how to handle the data and how to work
with corpora. Chapter 2 is devoted to this enterprise.
1. The terms text and discourse have been –and still are– used ambiguously, and they are defined in
different ways by different researchers. In this book we are going to use the term text to refer to the
‘purely’ linguistic material, and we are going to consider discourse in a broader sense, defining it as
language in use, composed of text and context.
2. Text Linguistics and Discourse Analysis share some basic tenets and, while some authors make a
distinction between them, others use both terms to mean the same. However, it may be said that
“purely” Text Linguistic studies are more concerned with the text-internal factors (i.e. cohesion
and coherence), while Discourse Analysis focuses its attention more on the text-external factors,
without disregarding the text-internal ones. The history of these disciplines shows that research has
evolved, in many cases, from the narrower scope of Text Grammar (and later, Text Linguistics) into
the broader discipline of Discourse Analysis, and therefore both disciplines have merged. For this
reason and for clarifying and practical purposes, we shall consider DA as a macro-discipline that
includes several sub-approaches, among which the ‘purely’ text-linguistic ones can also be found.
3. In this book we are going to touch on the main theoretical and practical tenets of the following
traditions identified within discourse studies: Pragmatics, Conversation Analysis, Interactional
Sociolinguistics, Ethnography of Communication, Variation Analysis and Narrative Analysis,
Functional Sentence Perspective, Post-structural and Social Theory, Critical Discourse
Analysis/Positive Discourse Analysis and Mediated Discourse Analysis.
4. In order to learn about a given discipline, it is useful to look at what practitioners do. Discourse
analysts explore the language of face-to-face conversations, telephone conversations, e-mail
messages, etc., and they may study power relations, the structure of turn-taking, politeness strategies,
the linguistic manifestation of racism or sexism, and many, many other aspects of language in use.
The sky is the limit.
5. Discourse analysts are interested in the actual patterns of use in naturally-occurring texts. These
natural texts, once transcribed and annotated, are known as the corpus, which constitutes the basis for
analysis. Thus, discourse analysts necessarily take a corpus-based approach to their research.
Choose the answer that best suits the information given in Chapter 1.
1. Modern Linguistics has introduced a concept of text that…
a) is very restrictive.
b) includes all types of utterances.
c) includes only written discourse.
4. The tradition in Discourse Analysis has always been to…
a) give more importance to the text-external criteria of intentionality, acceptability, informativity,
situationality and intertextuality.
b) give more importance to the text than to the context.
c) consider context as playing a subsidiary role.
7. The tendency in Text Linguistics has been to…
a) present a more formal approach than that of Discourse Analysis.
b) present a more functional approach than that of Discourse Analysis.
c) be less formal than any other approach.
10. Discourse studies are…
a) restricted to the field of Linguistics.
b) devoted mainly to social phenomena.
c) essentially multidisciplinary.
13. The current and main aim in Discourse Analysis is to…
a) study the formal aspects of texts.
b) discover the functions of language.
c) describe language in the context of human interaction.
16. Discourse is multi-modal because it…
a) embodies one semiotic system.
b) includes laughter in its study.
c) uses more than one semiotic system.
19. In general, we may say that discourse analysts are…
a) only interested in different types of conversations.
b) not interested in the written language.
c) mainly concerned with the study of language in use.
A) READING: After reading the contents of this chapter, choose ONE of the following chapters
from books on Discourse Analysis and read it:
Thus, when it comes to data collection, our goals will guide us in the
selection process and they are likely to lead us to choose different
procedures, such as recording and transcribing spoken discourse, keying
texts in, scanning, using texts which are stored in machine-readable form,
downloading material from the internet, etc.
1 Taylor explains that, in the most idealized form, naturally occurring talk “would
probably refer to informal conversation which would have occurred even if it was
not being observed or recorded, and which was unaffected by the presence of the
observer and/or recording equipment” (2001: 27).
rather than with data collected by means of research interviews 2. Some
analysts include information about the text, such as genre, date and
place of publication, etc. Others include information about the
pronunciation and intonation patterns, or about the speakers (sex, age,
occupation, social class, etc.). They can also assign labelled brackets to
each constituent of a sentence (parsing) or signal some features of spoken
language such as laughter, interruptions or hesitations. In general, and as
Johnstone (2005:20) notes, it is crucial to be able to uncover “the many
ways in which texts are shaped by contexts and the many ways in which
texts shape contexts”.
For the purpose of illustration, we will now examine the attempts made
by a few authors to annotate their data.
collection, by means of which the researcher initiates talk ‘about’ something and
conducts an interview for the specific purpose of the research. The interviewer
usually works with a prepared questionnaire or list of topics.
how to label them, whereas they will disagree on less clear cases. Hence
the ideal procedure would be for the analyst to start from a consensual set
of categories and only use his/her own for the cases in which there is no
agreement whatsoever. In order to do this, it is first necessary to examine
previous systems of annotation designed by other researchers.
[Table: sample speaker details; surviving cell values: Artist, Female, 55, 42]
Some important information about the speakers, necessary for analyzing their
discourse.
• systematically discriminable
• exhaustive
• systematically contrastive
• systematic
• predictable.
2.2.1.1. Notation used in the London Lund Corpus (Svartvik & Quirk,
1980)
Transcription conventions:
A) PROSODY
#       End of Tone Group
^Yes    Beginning of Tone Group
Tones
Pitch
Stress
Pauses
B) SPEAKERS
A Speaker identity
(A) Speaker continues where s/he left off
A, B A and B
VAR Various speakers
? Speaker identity unknown
a (lower-case letter) Non-surreptitious speaker
The following data have been taken from D. Schiffrin and R. Lakoff’s
Data Packet for their “Discourse” class at Berkeley and Georgetown
Universities (Spring 1998). As will be noticed, this notation has its
peculiarities and is different from that used in the London Lund Corpus
above. For example, it uses square brackets ([]) to signal speech
overlap, and a dot (.) to represent a falling intonation followed by a pause.
Debby: D Zelda: Z
When speech from B occurs during what can be heard as a brief silence
from A, then B’s speech is under A’s silence:
The examples in 2.2.1.1. and 2.2.1.2. show only two possible ways of
annotating corpora. Other authors have chosen different symbols or have
taken into account some other, additional, variables. For instance,
Jefferson (1979) marks the gaze of the speaker with a line above the
utterance and the gaze of the addressee with a line below it. The line
indicates that the interlocutor marked is gazing toward the other, while the
lack of a line indicates the absence of gaze. Commas are used to indicate
the dropping of gaze. Besides, some movements like head nodding are
marked when they occur:
Ann: ____________________________________
Karen has this new hou:se. en it’s got all this
Jefferson also marks applause by using strings of X’s with lower- (for
quiet applause) and uppercase (for loud applause) letters. In the following
example, the amplitude of the applause increases at the end:
Audience: xxxxxxxxxXXXXXXXXXXXXXXXXX
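A notation of this kind is regular enough to be read mechanically. The following is only a minimal sketch in Python, assuming that an applause string contains nothing but lower-case x (quiet) and upper-case X (loud); the function name applause_segments is a hypothetical helper, not part of Jefferson's system or of any existing tool.

```python
from itertools import groupby


def applause_segments(notation):
    """Split an applause string into (loudness, length) runs.

    Assumption: the string contains only 'x' (quiet) and 'X' (loud).
    """
    labels = {"x": "quiet", "X": "loud"}
    segments = []
    # groupby yields one group per run of identical consecutive characters.
    for char, run in groupby(notation):
        if char not in labels:
            raise ValueError(f"unexpected symbol: {char!r}")
        segments.append((labels[char], len(list(run))))
    return segments


if __name__ == "__main__":
    # Short hypothetical string written in the same notation.
    print(applause_segments("xxxXXXXX"))
```

A string that moves from lower-case to upper-case letters comes back as a quiet run followed by a loud run, which mirrors the increase in amplitude described above.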
STEVE: I think it’s basically done damage to children. That what good
it’s done is outweighed by the damage
Corpora are excellent tools for discourse analysts, for they facilitate the
investigation of language in use. Studies of language use require empirical
analyses of large databases of authentic texts, a requirement that it has
become possible to meet thanks to corpus linguistics. Using
corpora allows researchers to analyze patterns of use, i.e. how some
linguistic features are used in association with other linguistic and non-
linguistic features. Linguistic and non-linguistic association patterns
interact; they are not independent (Biber et al., 1998). For instance, if we
consider the lexical associations for thin, skinny and slim, we can also
consider their distribution across different registers. Thus, corpus-based
studies aim at characterizing registers, dialects, etc. in terms of their
linguistic association patterns.
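The idea of looking at the distribution of near-synonyms across registers can be sketched in a few lines of Python. This is an illustration only: the three "registers" and their sentences below are invented for the example and are not drawn from any real corpus, and the names TOY_CORPUS, TARGETS and register_distribution are hypothetical.

```python
from collections import Counter

# Invented toy data: register label -> short made-up texts (NOT corpus data).
TOY_CORPUS = {
    "conversation": [
        "She looks really skinny these days",
        "He is thin but healthy",
    ],
    "fiction": [
        "A thin line of smoke rose from the chimney",
        "Her slim fingers rested on the thin pages",
    ],
    "news": [
        "The party won by a slim majority",
        "Analysts see a slim chance of recovery",
    ],
}

TARGETS = ("thin", "skinny", "slim")


def register_distribution(corpus, targets):
    """Count how often each target word occurs in each register."""
    table = {}
    for register, texts in corpus.items():
        counts = Counter()
        for text in texts:
            # Naive tokenization: lower-case and split on whitespace.
            for token in text.lower().split():
                if token in targets:
                    counts[token] += 1
        table[register] = {word: counts[word] for word in targets}
    return table


if __name__ == "__main__":
    for register, counts in register_distribution(TOY_CORPUS, TARGETS).items():
        print(register, counts)
```

With a real corpus the raw counts would normally be normalized (for instance, per million words) before registers of different sizes are compared; the sketch leaves that step out.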
Although some scholars (especially generative grammarians) have
pointed to the limitations of corpus-based analysis (e.g. that it is limited to
samples of performance only, or that no corpus can contain information
about all areas of language), it cannot be denied that the use of corpora has
proved to present considerable advantages when analyzing discourse: it
has allowed researchers to deal with larger and more varied texts, bringing
about a reliability of analysis never reached before; it has enabled them to
make more objective and accurate descriptions of usage than would be
possible through mere introspection. It also allows them, for instance, to
come to reliable conclusions based on frequency of use of a given
linguistic feature or pattern, to make comparative analyses about usage in
different varieties, or to arrive at a total account of the linguistic features in
any of the texts contained in the corpus. And, most important of all, a
well-constructed general corpus can be an inexhaustible source of
hypotheses about the way language works.
All the above advantages have been mainly made feasible thanks to the
construction, in modern times, of computerized corpora, which permit the
storage and analysis of a much greater number of natural language texts
than would be possible if we had to store and analyze them by hand.
However, the first large corpus of English-language data was entirely
transcribed by hand and stored on index cards which were processed
manually. This corpus was originally known as the Survey of English
Usage, a project which started in the 1960s and which consisted of a
million words comprising 200 texts of spoken and written material of
5,000 words each. The survey has since been computerized, and its spoken
component is currently known as the London-Lund Corpus 4.
The first computerized corpus in the history of linguistics was the Brown
University Corpus of American English. It was created in the 1960s by
Henry Kucera and W. Nelson Francis, and it aimed to represent a wide
range of genres of published written text in American English produced
during a single year. The Lancaster-Oslo/Bergen (LOB) Corpus of British
English was compiled in the 1970s to match the Brown corpus using
British English texts.
Ever since the 1980s, increasingly large corpora have been compiled
(especially of English) and are used in different fields, such as in the
development of natural language processing software and in applications,
including lexicography, machine translation, speech recognition, etc.
Three examples of modern corpora are The British National Corpus
(BNC), The International Corpus of English (ICE) and The Bank of
English. Some online corpora can be found, such as the Experimental BNC
Website (which offers a BNC online service allowing everyone with
access to the internet to register for an account on the BNC server) or the
Shakespeare Online Corpus. In addition, researchers can now benefit from
concordance programs, i.e. programs which turn the electronic texts into
databases which can be searched. Some examples of these programs are
WordCruncher (which you get, for example, when you buy the
ICAME corpora of modern and medieval English), TACT (a well-known
freeware program), SARA (specifically made for searches of the BNC) and
WordSmith Tools (a program widely used nowadays by linguists,
lexicographers and discourse analysts, which offers several possibilities,
such as querying, searching for word combinations within a specified range
of words, looking up substrings or parts of words, or accessing collocates
and frequency lists).
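To give an idea of what a concordance program does with an electronic text, the sketch below builds key-word-in-context (KWIC) lines, counts collocates within a given span of words, and looks up a substring. It is only a toy approximation written for this illustration: the sample text, the tokenizer and the function names are assumptions, and it does not reproduce how WordSmith Tools, SARA or any other real program works internally.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, span=3):
    """Return KWIC lines as (left context, node word, right context) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, span=3):
    """Count the words occurring within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


def substring_matches(tokens, part):
    """Return the distinct word forms containing `part`, sorted alphabetically."""
    return sorted({tok for tok in tokens if part in tok})


# Invented sample text (assumption: not taken from any real corpus).
text = "The corpus is large. A large corpus helps. The corpus grows."
tokens = tokenize(text)

for left, node, right in concordance(tokens, "corpus", span=2):
    print(f"{left:>12} [{node}] {right}")

print(collocates(tokens, "corpus", span=2).most_common(3))
print(substring_matches(tokens, "larg"))
```

A frequency list for the whole text is simply `Counter(tokens).most_common()`; real concordancers add indexing, sorting of the context columns and statistical measures of collocation on top of this basic idea.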
4 See 2.2.1.1.
samples of both). Reich (1998) offers the following taxonomy, which
classifies corpora according to medium, national varieties, historical
variation, geographical/dialectal variation, age, genre, open-endedness
and availability:
• Medium: spoken corpora (e.g. London-Lund corpus) vs. written
corpora (e.g. Lancaster-Oslo/Bergen corpus (LOB)) vs. mixed
corpora (e.g. British National Corpus (BNC) or Bank of English)
• National varieties: British corpora (e.g. Lancaster-Oslo/Bergen
corpus) vs. American corpora (e.g. Brown corpus) vs. an
international corpus of English.
• Historical variation: diachronic corpora (Helsinki corpus, cf.
the ICAME home page) vs. synchronic corpora (Brown, LOB,
BNC) vs. corpora which cover only one stage of language
history (corpus of Old or Middle English, Shakespeare corpora)
• Geographical variation/dialectal variation: corpus of dialect
samples (e.g. Scots) vs. mixed corpora (The BNC spoken
component includes samples of speakers from all over Britain)
• Age: corpora of adult English vs. corpora of child English
(English components of CHILDES)
• Genre: corpora of literary texts vs. corpora of technical English
vs. corpora of non-fiction (e.g. news texts) vs. mixed corpora
covering all genres
• Open-endedness: closed, unalterable corpora (e.g. LOB,
Brown) vs. monitor corpora (Bank of English)
• Availability: commercial vs. non-commercial research corpora,
online corpora vs. corpora on ftp servers vs. corpora available
on floppy disks or CD-ROMs
This taxonomy takes into account most of the types of corpora which are
currently available, but it is not entirely comprehensive. Other variables
might be considered depending on the research aims, which might bring
about new types.
Choose the answer that best suits the information given in Chapter 2.
1. The type of discourse the analyst is going to study depends
mainly on…
a) the research question.
b) what the researcher likes to do.
c) how the data are collected.
3. Downloading material from the internet…
a) may be a procedure for data collection.
b) is a method of Discourse Analysis.
c) is the best method for data collection.
4. A transcript is…
a) a process of data collection.
b) a document that reflects the spoken discourse to be analyzed.
c) a complex type of discourse research.
5. Transcriptions …
a) are always completely neutral and objective.
b) try to show the different variables that intervene in the discourse
studied.
c) always include contextual factors.
6. Each analyst…
a) uses various notation systems.
b) uses the notation system that best suits his/her objectives.
c) includes tone groups in the notation used.
7. Transcription conventions…
a) should always be explained and made clear to the reader.
b) should always be used in the same way by all researchers.
c) should be different for each study.
10. Corpus-based analysis …
a) normally uses both quantitative and qualitative techniques of
analysis.
b) does not normally have an empirical nature.
c) is always theoretical in nature.
13. Concordance computer programs…
a) turn electronic texts into talk.
b) transform the texts into databases that can be searched.
c) are not used much by linguists nowadays.
14. The BNC is a corpus…
a) that has been classified in terms of a national variety.
b) of spoken British English.
c) showing mainly historical variation.