Lexis, Collocations and
Grammar
By
Mohammed El Amine
What are the basic units of
language (grammar)?
The traditional answer is that words (or
morphemes) are the basic units. (This is what
we teach in Intro Linguistics.)
Understanding the question. “Basic” has
several meanings. When we join pieces of
language to make (or understand) an
utterance, what are those pieces?
= building blocks (in use)
= units of storage (mental representation)
Pre-Chomskyan approaches
Interest in the sequence of words
Information theory -- using transition
probabilities to predict “the next word”
E.g. Why don’t you come and __
E.g. Why __
What follows why ?
why I 2
why he 3
why interest 2
why investors 2
why it 6
why prices 2
why shouldn't 3
why the 17
why they 7
why we 13
What follows why?
(next word, count, %)
why I            2    3.5
why he           3    5
why interest     2    3.5
why investors    2    3.5
why it           6   10
why prices       2    3.5
why shouldn't    3    5
why the         17   29
why they         7   12
why we          13   23
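The shares in the last column can be recomputed from the counts. The following is a minimal Python sketch, assuming that the ten continuations listed are the only ones considered, so each share is relative to these 57 tokens rather than to every occurrence of why in the corpus. The function name is illustrative, and the slide's own figures appear to be rounded, so they differ slightly from the computed values.

```python
from collections import Counter

# Counts of the word following "why", copied from the slide above.
FOLLOWERS = Counter({
    "I": 2, "he": 3, "interest": 2, "investors": 2, "it": 6,
    "prices": 2, "shouldn't": 3, "the": 17, "they": 7, "we": 13,
})

def transition_probabilities(counts):
    """Estimate P(next word | 'why') as count / total of the listed counts."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

probs = transition_probabilities(FOLLOWERS)
# The most likely continuation is the "prediction" for the next word.
best = max(probs, key=probs.get)
print(best, round(100 * probs[best], 1))
```

A predictor of this kind only looks one word back, which is exactly the local dependency that the next slides contrast with long distance dependencies.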
Chomsky changed the focus
Chomsky said that local dependencies
between words were not so interesting
Long distance dependencies --
agreement -- wh- words had to be
explained
Creativity of language had to be
explained
Generative view
Basic units of syntax are words
Rules (or constraints or principles) control the
combination of words to create grammatical
sentences
This approach explains the creative aspect of
language -- the creation of novel sentences --
and the ability of a finite grammar to produce
an infinite number of sentences
Generative assumption (Pinker
1999)
A grammar assembles words into phrases
according to the words’ part-of-speech
categories, such as noun and verb. … By
specifying a string of kinds of words rather
than a string of actual words, the rules allow
us to assemble new sentences on the fly and
not regurgitate preassembled cliches—and
that allows us to convey unprecedented
combinations of ideas.
Pinker’s view
Syntax involves combining lexical and
syntactic categories: N, NP, V, etc.
No links between words themselves
Mental representation of
language
Basic units -- words
Operations on those words -- syntax
Knowledge of language consists of “rules”
(grammar/syntax) and words (vocabulary)
An elegant system that accounts for several
aspects of human language
Well-known that there are idioms, set
expressions etc. that are outside this system
Basic phrase structure
(simplified)
Series of rules
S --> NP VP
NP --> Det N/Pronoun/Proper Noun
VP --> V
VP --> V NP
VP --> V NP PP
Etc.
Plus subcategorisation -- hit ___ NP
Plus selection restrictions
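To see how a finite rule set of this kind assembles sentences from categories rather than from stored word strings, the rules can be run as a small generator. This is a sketch only: the rule PP --> P NP and the whole lexicon are assumptions added for illustration and are not on the slide, and the sketch ignores subcategorisation and selection restrictions.

```python
from itertools import product

# Phrase structure rules from the slide. PP --> P NP and the lexicon
# below are assumptions added for illustration; they are not in the original.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pronoun"], ["ProperNoun"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "Det": ["the"],
    "N": ["ball"],
    "Pronoun": ["she"],
    "ProperNoun": ["Kim"],
    "V": ["hit"],
    "P": ["with"],
}

def expand(symbol):
    """Return every word sequence the grammar derives from `symbol`."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for right_hand_side in RULES[symbol]:
        # Combine every expansion of each daughter, left to right.
        for parts in product(*(expand(s) for s in right_hand_side)):
            results.append([w for part in parts for w in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))
print("Kim hit the ball" in sentences)
```

Because subcategorisation is left out, the sketch overgenerates: it also produces "Kim hit", which the frame hit ___ NP would rule out.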
Problems for the traditional
view
This view works well if you use intuition data
that distinguish the set of sentences that are
grammatical from the set of sentences that
are ungrammatical
However, corpus data (usage data) presents
problems because language does not seem to
be based on combinations of words, but
rather on many fixed/semi-fixed expressions.
Corpus data
evidence of extensive multi-word units
various names: collocations, fixed
expressions, chunks, pre-fabricated
units (prefabs), lexical bundles
let us consider simple versions of fixed
expressions:
Corpus data: collocates of
high
106 21.8557% high school
33 6.8041% high standards
27 5.5670% high level
23 4.7423% high stakes
20 4.1237% high priority
17 3.5052% high end
14 2.8866% high on
12 2.4742% high and
9 1.8557% high schools
9 1.8557% high levels
8 1.6495% high degree
7 1.4433% high expectations
6 1.2371% high quality
6 1.2371% high importance
5 1.0309% high probability
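A table like this can be produced by counting the word that immediately follows the node word. The sketch below assumes a whitespace-tokenised text and uses a hypothetical toy sentence; the slide's figures come from a real corpus and are not reproduced by it. The function name is illustrative.

```python
from collections import Counter

def right_collocates(tokens, node):
    """Count the words that immediately follow `node` in a token list."""
    counts = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node
    )
    total = sum(counts.values())
    # Return (count, percentage, collocate) rows, most frequent first.
    return [(n, 100 * n / total, w) for w, n in counts.most_common()]

# Hypothetical toy text, used only to show the shape of the output.
text = "high school sets high standards and high school has high priority"
rows = right_collocates(text.split(), "high")
for count, percent, word in rows:
    print(f"{count} {percent:.4f}% high {word}")
```

Each percentage is the collocate's share of all occurrences of the node word that have a following word. On that reading, the slide's figures are consistent with roughly 485 occurrences of high (106/485 is about 21.86%).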
Corpus data
thing: sort of thing, kind of thing, the thing to
do, the thing is
change: change in attitude, change of attitude,
change of heart, change in policy, change over
time, time for a change, pace of change, rate of
change, subject to change
Note: focus is on spoken language -- primary
object of study in linguistics. Variety of
expression is, of course, greater in written
language
Claim
an expression like change of heart is a
unit
it is used as a single phrase, not
constructed from the component words
must be learned as a phrase
has its own idiomatic meaning
frequent co-occurrence of component
words leads to unit status
Claim
much of the language we speak is
made up of units larger than the word
the words making up the phrases are
accessible, but in normal use, the
phrase or chunk is the basic unit
How to adapt to chunks
Generative approach
Pre-constructed tree fragments
Richer lexical structure
make … decision
Mini-summary
Basic units of language are chunks and
words
Chunks are stored in the mental lexicon
and are used in language production
and comprehension
Words and chunks are concatenated to
produce utterances
Chunk assumption (Bolinger
1961)
Is grammar something where speakers
produce (i.e. originate) constructions, or
where they reach for them from a pre-
established inventory, when the
occasion presents itself?
Refinements
In the expression change of heart, we have
both the chunk change of heart and the
words change, heart, of
A lot of redundancy in storage of language
Also change of heart can be analysed by
speakers as Noun Prep Noun
Ability to analyse chunks into components
and to form more abstract categories
Pattern Grammar --
Hunston and Francis
patterns v n n, v adj, v amount, n of n
for each pattern different meaning
groups are identified.
E.g., there are three structures for v
amount and for each one there are
several meaning groups -- handout
Pattern grammar
Sentence or utterance consists of
adjacent or overlapping patterns
“I wanted to ensure that you could
send me a university award form”
V … to-inf V … that V … n ……..n
Schema approach
Related to schema theory in reading
Schemas have a form and a meaning
and they are linked to form a network.
[change of heart] -- meaning
Abstract schema [N of N] -- abstract
meaning
Internal structure analysed to get
“complete change of heart”
Problem
Most chunks are variable
Language consists of semi-fixed
expressions
Does this mean that we have to take
words as the basic building block?
An alternative -- take chunks as the
basic building block and allow
modification via blending
Blending
Blending of lexical chunks provides the
creative, constructional aspect of
language (within a grammar that
consists of a massive inventory of
chunks)
Blending
Blending is a general cognitive process
involving the merger of formal and
conceptual structures to produce new
structures (Turner and Fauconnier 1995,
Fauconnier and Turner 1996, and
Turner 1996).
What is blending?
a series of words --- a b x y z ---
traditional view is that the basic units
are [a][b][x][y][z]
or [a b y] which is merged with [x _ z]
or [a b x w z] with y replacing w
Latter two are instances of blending
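One way to read the bracket notation above is as two operations on stored chunks: merging a chunk with a frame that has an open slot, and replacing one element of a stored chunk. The sketch below is an interpretation of the slide, not something it spells out. The function names and the use of "_" for the open slot are assumptions.

```python
def merge(chunk, frame, shared):
    """Blend `chunk` with `frame`: the element `shared` fills the open
    slot '_' of `frame`, and the filled frame takes its place in `chunk`."""
    filled = [shared if item == "_" else item for item in frame]
    result = []
    for item in chunk:
        result.extend(filled if item == shared else [item])
    return result

def replace_element(chunk, old, new):
    """Blend by substituting `new` for the element `old` of a stored chunk."""
    return [new if item == old else item for item in chunk]

# [a b y] merged with [x _ z]
print(" ".join(merge(["a", "b", "y"], ["x", "_", "z"], "y")))
# [a b x w z] with y replacing w
print(" ".join(replace_element(["a", "b", "x", "w", "z"], "w", "y")))
```

Both calls yield the same surface string a b x y z, which illustrates why blending in syntax is hard to detect from the output alone.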
Blends
Blending in words is usually visible:
automobilia (memorabilia and automobile)
digerati (literati and digital)
smog
Blending in syntax is more difficult to detect
Claim: creativity is mainly due to blends of
chunks rather than to a rule system
Evidence for blending
Very difficult
No evidence of mental representation or
the inputs to language production
(lexical blends are obvious -- smog)
Look for evidence
historical change
idiom evidence
unusual syntax
Historical change
(1) a. Both sides claimed the
victory. (1722)
b. These instances of kindness
claim my most grateful
acknowledgments. (1775)
Historical change
(2) a. Because a Council of the other Side
asserted it was coming down. (1712)
b. As they confidently assert that the
first inhabitants of their Island were fairies, so
do they maintain that these little people have
still their residence among them. (1726-31)
Historical change
(3) a. He claimed that his word should be
law. (1850)
b. Watt claimed that Hornblower …
was an infringer upon his patents. (1878)
c. I claim that we are before them in the
matter of uncapping machines [for
honeycombs]. (1886)
d. He was afraid to bet and crawfished
out of the issue by claiming that he didn't
drink. (1888)
Idioms
(4) S [make hay while the sun shines]
make hay while the sun shines
take advantage of favourable conditions
make hay chunk combined with description
of current situation
Simple combinations
(5) a. “We have got to make hay while
the sun shines,” he said.
b. Long-shot Oscar nominees often
try to make hay while the sun shines,
lining up as many projects as possible
between the announcement of their
nomination and probable
disappointment on Oscar night.
Variations
(6) Could it be that Raymond Blanc, with
his recently acquired three Michelin stars, is
making hay while the sun shines?
(7) VP [make [hay] [while [the sun
[shines]]]]
(8) the big food groups are opening selected
stores 24 hours a day in a bid to make hay
while consumer confidence continues to
improve.
(9) a. Opponents of the Tory Right have made
hay since the Conservative leadership election last
month.
b. During his life, they made hay. No piece
detailing the marriage of the millionaire chairman
of Hanson and the former model Miss Tucker was
complete without carping references to the four
decades separating them.
c. … This field is wide open to Labour. If
Tony Blair cannot make hay in such political
sunshine, how will he fare when winter comes?
make hay
(10) … one of the areas where the group has
made hay in recent years.
(11) However, they survived to make hay
against far less experienced operators.
Butcher hit 20 fours and a six in his …
(12) The GOP intends to make hay with
whatever sunshine the committee provides.
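The range of variation in examples (5) to (12) can be sorted mechanically by checking what follows the core chunk make hay. The sketch below uses shortened versions of sentences quoted on the slides; the three category labels and the function name are assumptions made for illustration, not categories from the source.

```python
import re

# Example sentences quoted from the slides above (shortened).
EXAMPLES = [
    "We have got to make hay while the sun shines",
    "is making hay while the sun shines?",
    "in a bid to make hay while consumer confidence continues to improve.",
    "Opponents of the Tory Right have made hay since the leadership election.",
    "The GOP intends to make hay with whatever sunshine the committee provides.",
]

# Any inflected form of "make" followed by "hay"; group 1 is the remainder.
MAKE_HAY = re.compile(r"\b(?:make|makes|made|making) hay\b(.*)", re.IGNORECASE)

def classify(sentence):
    """Label a sentence by how much of the full idiom it preserves."""
    match = MAKE_HAY.search(sentence)
    if match is None:
        return "no chunk"
    rest = match.group(1).strip().lower()
    if rest.startswith("while the sun shines"):
        return "full idiom"
    if rest.startswith("while"):
        return "modified while-clause"
    return "make hay only"

for sentence in EXAMPLES:
    print(classify(sentence), "|", sentence)
```

A string match of this kind finds the reduced chunk, but it will not catch looser echoes of the idiom, such as the "political sunshine" in (9c) or "whatever sunshine" in (12).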
Odd syntax -- double copula
(13) a. So the thing is, is that's the kind of level
of comparison that you get, like it or like it not,
at the fourth grade level.
b. My point is, is that their objection is a
red herring.
(14) Tuggy (1996: 715) suggests a blend of
structures such as a and b
a. So the thing is, it's not a diagnostic.
b. Yes, the important thing is that the long
informational would be way too much for them.
Odd syntax II
(15) So thanks very much to my
Steering Committee.
Blending evidence
(Hunston and Francis 1999)
[die of N], 3519 [expire of N], 6
[warn of N], 330 [foretell of N], 5
18 instances of [attempt + ing]
Paul did not attempt qualifying for
Wimbledon;
8 instances of [confess +ing]
…any officer who confesses being corrupt;
Blends (Reported in Moon
1998)
(16) a. I stuck my neck out on a limb
b. In one ear and gone tomorrow
(Peters 1983:106)
(17) a. It’s no sweat off our backs
b. something along those veins
c. How would you like to eat
humble crow? (Tannen 1989:41)
Schema-based grammar
How are the basic units of grammar
combined?
Answer: use of a discourse framework,
concatenation, and blending
Composition
formulation of general discourse using
large chunks as markers/signposts
content expressed by schema
combination and modification
a lexicon-grammar consisting of a large
number of prefabricated chunks can be
used creatively if blending or
modification of the large units occurs
Composition
Discourse units
It’s one thing to ----
on the one hand …
at the end of the day
in other words,
as far as X is concerned
the fact/truth is that …
I have to say …
Sentence units – blended/modified schemata
Composition: discourse
planning
[Figure legend: “sentence” content; discourse anchor
(e.g., it’s one thing to X, it’s another to Y)]
Composition: Sentence
production
Planning -- schema, blended/modified
[Figure legend: schema (chunk); patch; discourse anchor]
Overview
Corpus analyses show that languages
contain very large numbers of chunks or
collocations
Suggestion -- chunks are stored in the
mental grammar
To communicate about a new situation,
chunks are modified via a process of
blending
Thank you!