Andrew Carnie-Syntax
Citation
Huang, C-T. James, and Ian Roberts. "Principles and Parameters of Universal Grammar." In The
Oxford Handbook of Universal Grammar, 306-354. Oxford, UK: Oxford University Press, 2016.
Published Version
doi:10.1093/oxfordhb/9780199573776.013.14
Permanent link
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37367394
Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available
under the terms and conditions applicable to Other Posted Material, as set forth at
http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
OUP UNCORRECTED PROOF – FIRSTPROOFS, Fri Aug 12 2016, NEWGEN
Chapter 14
Principles and Parameters of Universal Grammar
14.1 Introduction
The Principles and Parameters Theory (P&P), which took shape in the early 1980s,
marked an important step forward in the history of generative grammatical studies.1 It
offered a plausible framework in which to capture both the similarities and differences
among languages within a rigorous formal theory. It led to the discovery of important pat-
terns of variation across languages. Most important of all, it offered an explanatory model
for the empirical analyses which opened a way to meet the challenge of ‘Plato’s Problem’
posed by children’s effortless—yet completely successful—acquisition of their grammars
under the conditions of the poverty of the stimulus (see chapters 5, 10, 11, and 12).
Specifically, the P&P model led linguists to expand their scope of inquiry and enabled
them to look at an unprecedented number of languages from the perspective of the formal
theory of syntax, not only in familiar traditional domains of investigation; it also opened
up some new frontiers, at the same time raising new questions about the nature of lan-
guage which could not even have been formulated earlier. Another consequence was that
it became possible to discover properties of one language (say English) by studying aspects
of a distinct, genetically unrelated language (say, Chinese or Gungbe), and vice versa.
Most of the original proposals for parameters in the early days of P&P were of the
form that we would now, with the benefit of hindsight, think of as macroparameters.
They have the characteristic property of capturing the fact that parametric varia-
tions occur in clusters. As the theory developed, it became clear that such a model is
1 This work was partly supported by the ERC Advanced Grant 269752 Rethinking Comparative
Syntax (ReCoS), Principal Investigator: I. Roberts.
If everyone is omitted in (1a), the pronoun them cannot correspond to the clowns, while if
everyone is included, this is possible. If we simply change them to the reflexive pronoun
themselves, as in (1b), exactly the reverse results. In (1b), if everyone is included, the pro-
noun themselves must correspond to it. If everyone is left out, themselves must corre-
spond to the clowns. The point here is not how these facts are to be analyzed, but rather
the precision and the subtlety of the grammatical knowledge at the native speaker’s dis-
posal. It is legitimate to ask where such knowledge comes from.
Another striking case involves the interpretation of missing material, as in (2):
Here there is a notional gap following will, which we interpret as go to the party; this is
the phenomenon known as VP-ellipsis. In (3), we have another example of VP-ellipsis:
(3) John said he would come to the party, and Bill said he would—too.
Here there is a further complication, as the pronoun he can, out of context, correspond
to either John or Bill (or an unspecified third party). Now consider (4):
Here the gap is interpreted as loves his mother. What is interesting is that the missing
pronoun (the occurrence of his that isn’t there following does) has exactly the three-
way ambiguity of he in (3): it may correspond to John, to Bill or to a third party. Example
(4) shows we have the capacity to apprehend the ambiguity of a pronoun which we can-
not hear. Again, a legitimate and, it seems, profound question is where this knowledge
comes from.
The cases just discussed are examples of native grammatical knowledge. The basic
point in each case is that native speakers of a language constantly hear and produce
novel sentences in that language, and yet are able to distinguish well-formed sentences
from ill-formed ones and make subtle interpretative distinctions of the kind illustrated
in (4). The existence of this kind of knowledge is readily demonstrated and not in doubt.
But it raises the question of its origin: where does this knowledge come from? How does it develop
in the growing person? This is Plato’s problem, as Chomsky called it, otherwise known
as the logical problem of language acquisition (Hornstein and Lightfoot 1981). It is seen
as a logical problem because there appears to be a profound mismatch between the rich-
ness and intricacy of adult linguistic competence, illustrated by the examples given in
(1), and the rather short time taken by language acquisition coupled with small chil-
dren’s seemingly limited cognitive capacities.
This latter point brings us to the argument from the poverty of the stimulus. Here we
briefly summarize this argument (for a more detailed presentation, see chapter 10, as well
as Smith [1999:40–41], Jackendoff [2003:82–87], and, in particular, Guasti [2002:5–18]).
As its name implies, the poverty-of-the-stimulus argument is based on the observation
that there is a significant gap between what seems to be the experience facilitating first-
language acquisition (the input or ‘stimulus’) and the nature of the linguistic knowledge
which results from first-language acquisition, i.e., one’s knowledge of one’s native lan-
guage. The following quotation summarizes the essence of the argument:
The astronomical variety of sentences any natural language user can produce and
understand has an important implication for language acquisition … A child is
exposed to only a small proportion of the possible sentences in its language, thus
limiting its database for constructing a more general version of that language in its
own mind/brain. This point has logical implications for any system that attempts to
acquire a natural language on the basis of limited data. It is immediately obvious that
given a finite array of data, there are infinitely many theories consistent with it but
inconsistent with one another. In the present case, there are in principle infinitely
many target systems … consistent with the data of experience, and unless the search
space and acquisition mechanisms are constrained, selection among them is impos-
sible…. No known ‘general learning mechanism’ can acquire a natural language
solely on the basis of positive or negative evidence, and the prospects for finding
any such domain-independent device seem rather dim. The difficulty of this prob-
lem leads to the hypothesis that whatever system is responsible must be biased or
constrained in certain ways. Such constraints have historically been termed ‘innate
dispositions,’ with those underlying language referred to as ‘universal grammar’
(Hauser, Chomsky, and Fitch 2002:1576–1577).
Hence we are led to a biological model of grammar. The argument from the poverty of
the stimulus leads us to the view that there are innate constraints on the possible form
a grammar of a human language can take; the theory of these constraints is Universal
Grammar (UG). But of course it is clear that experience plays a role; no one is suggesting
that English or Chinese are innate. So UG provides some kind of bias, limit, or schema
for possible grammars, and exposure to people speaking provides the experience caus-
ing this latent capacity to be realized as competence in a given actual human language.
Adult competence, as illustrated for English speakers by the data such as that in (1–4), is
the result of nature (UG) and nurture (exposure to people speaking).
The P&P model is a specific instantiation of this general approach to Plato’s problem.
The view of first-language acquisition is that the child, armed with innate constraints on
possible grammars furnished by UG, is exposed to Primary Linguistic Data (PLD, i.e.,
people speaking) and develops its particular grammar, which will be recognized in a
given cultural context (e.g., in London, Boston, or Beijing) as the grammar of a particu-
lar language, English or Chinese. But it should be immediately apparent that London
and Boston, or Beijing and Taipei, are not linguistically identical. Concepts such as
‘English’ and ‘Chinese’ are highly culture-bound and essentially prescientific. An indi-
vidual’s mature competence, the end product of the process of first-language acquisi-
tion just sketched, is not really ‘English’ or ‘Chinese,’ but rather an individual, internal
grammar, technically an I-grammar. We use the terms ‘English’ or ‘Chinese’ to designate
different variants of I-grammar, but these terms are really only approximations (as are
more narrowly defined terms such as ‘Standard Southern British English’ or ‘Standard
Northern Mandarin Chinese,’ neither of which exactly corresponds to the I-grammar of Smith,
Roberts, Li, or Huang).
What makes the different I-grammars of, to revert to prescientific terms for con-
venience, English or Chinese? This is where the notion of parameters of UG comes in.
Since UG is an innate capacity, it must be invariant across the species: Smith, Roberts,
Li, and Huang (as well as Saito, Rizzi, and Sportiche) are all the same in this regard. But
these individuals were exposed to different forms of speech when they were small and
hence reached the different final states of adult competence that we designate as English,
Chinese, Japanese, Italian, or French. These cognitive states are all instantiations of UG,
but they differ in parameter-settings, abstract patterns of variation in the restricted set
of grammars allowed by UG. So, on the Principles and Parameters conception of UG,
differing PLD was sufficient to cause Roberts to set his UG parameters one way, so as
to become an English speaker, while Huang set his another way and became a Chinese
speaker, Saito another way, Rizzi still another, and so forth. On this view language acqui-
sition is seen as the process of fixing the parameter values left open by UG, on the basis
of experience determined by the PLD.
The P&P model is a very powerful model of both linguistic diversity and language
universals. More specifically, it provides a solution to Plato’s problem, the logical prob-
lem of language acquisition, in that the otherwise formidable task of language acquisi-
tion is reduced to a matter of parameter-setting. Moreover, it makes predictions about
language typology: parameters make predictions about (possible) language types, as
we will see in more detail in section 14.3 (see also chapter 15). Furthermore, it sets the
agenda for research on language change, in that syntactic change can be seen as parameter
change (see chapter 18). Finally, it draws research on different languages together
as part of a general enterprise of discovering the precise nature of UG: we can discover
properties of the English grammatical system (a particular set of parameter values) by
investigating Chinese (or any other language), without knowing a word of English at all
(and vice versa, of course). Let us now begin to look at the progress that has been made
in this endeavor in more detail.
In this section, we will briefly review some notable examples of parameters that were put
forward in the first phase of research in the P&P model in the 1980s, using the general
framework of Government–Binding (GB) theory.
This parameter regulates one of the most pervasive and well-studied instances of cross-
linguistic variation: the variation in the linear order of heads and complements. Stated
as (5), it predicts that all languages will be either rigidly head-initial (like English, the
Bantu languages, the Romance languages, and the Celtic languages, among many oth-
ers) or rigidly head-final (like Japanese, Korean, the Turkic languages, and the Dravidian
languages). Of course many languages, including notably Chinese, show mixed, or dis-
harmonic word order, suggesting that (5) needs to be relativized to categories, a matter
we return to in section 14.4.1.
The simplest statement of this parameter is along the lines of (5), which assumes, as
was standard in GB theory, that linear precedence and hierarchical relations (defined
in terms of X′-theory) are entirely separate. In fact, X′-theory was held to be invari-
ant, a matter of UG principles (or deriving from UG principles), while linear order was
subject to parametric variation. Since Kayne (1994), other approaches to linearization
have been put forward (starting with Kayne’s Linear Correspondence Axiom), some
of which, like Kayne’s, connect precedence and hierarchy directly. The head param-
eter must then be reformulated accordingly. In Kayne’s (1994) approach, for example,
complement–head order cannot be directly generated, but must be derived by leftward
movement (in the simplest case, of complements). The parameter in (5) must therefore
be restated so as to regulate this leftward movement. Takano (1996), Fukui and Takano
(1998), and Haider (2012:5), on the other hand, propose that complement–head order
is the more basic option, with surface head–complement order being derived by head
movement. In that case, (5) may be connected to head movement (and the availability of
landing sites for such movement, according to Haider).
Since Rizzi (1986), it has been widely assumed that the null subject parameter involves
the ability of Infl (or T, or AgrS) to license a null pronoun, pro, and so can be stated as
in (6):
Perlmutter (1971) observed that languages that allow null subjects also allow wh-
movement of the subject from a finite embedded clause across a complementizer (this
observation has since become known as ‘Perlmutter’s generalization’). Rizzi (1982)
linked this to the possibility of so-called ‘free inversion,’ leading to the following para-
metric cluster:
Rizzi showed that Italian has all of these properties while English lacks all of them.
As with the head parameter, though, this cluster has empirical problems; see Gilligan
(1987), Newmeyer (2005), and section 14.4.1.
Huang argued that the understood object in each case is first topicalized before it
drops. This conception is supported by parallel facts in German. Thus, a similar ques-
tion about Lisi can be answered by either of (9) (see Ross 1982 and Huang 1984 for more
examples):
Note that the missing pronoun ihn ‘him’ (referring to Lisi) is licensed by virtue of
being in the first (hence topic) position, as witnessed by the ill-formedness of *ich
hab’ [e] schon gesehen where the topic position is filled by ich. The missing argument
is thus not licensed by any formal feature of T (as it is in the case of null subjects). The
Null Subject Parameter and the Null Topic Parameter thus jointly distinguish four
language types.
(See Raposo 1985 on European Portuguese, and Sigurðsson 2011b on Swedish and
Icelandic).
(11c) shows the standard, neutral SOV order of Japanese (cf. John-ga Bill-o butta ‘John hit
Bill’ [Baker 2001]), while (11a) illustrates that in English the object wh-constituent what
is obligatorily fronted to the SpecCP position and (11b) illustrates wh-in-situ in SVO
Chinese. To be more precise, English requires that exactly one wh-phrase be fronted
in wh-questions. In multiple wh-questions, all wh-phrases except one stay in situ (and
there are intricate constraints on which ones can or must be moved, as well as how they
are interpreted in relation to one another). Some languages require all wh-expressions
to move in multiple questions. This is typical of the Slavonic languages, as (12), from
Bulgarian (Rudin 1988), shows:
There appears to be a further dimension of variation here (but see Bošković 2002 for a
different view).
(13) a. In configurational languages, the projection principle holds of the pair (LS, PS).
b. In nonconfigurational languages, the projection principle holds of LS alone.
Here ‘LS’ refers to Lexical Structure, a level of representation at which the lexical require-
ments of predicates are represented, and ‘PS’ refers to standard phrase structure. The
projection principle requires lexical selection (c-selection and/or s-selection) proper-
ties of predicates to be structurally represented. Hence in nonconfigurational languages,
according to this approach, phrase structure does not have to directly instantiate argu-
ment structure, with the consequence that arguments can be freely omitted, there are
no structural asymmetries among arguments and no syntactic operations ‘converting’
one grammatical function into another. Hale (1983) argued for a number of other
consequences of this parameter, focusing in particular on Warlpiri.
Baker’s parameter gives an elegant account of the major typological differences between
languages of the Mohawk type (known as head-marking nonconfigurational languages)
and those of the English/Chinese type.
Languages with this value for the nominal-mapping parameter include Chinese and
Japanese. Nominals appear as bare arguments in these languages as a direct conse-
quence of being of type <e>; hence count nouns can function directly as arguments with
no article or quantifier (giving the equivalent of I saw cat, meaning ‘I saw a/the cat(s)’).
Chierchia argues that all nouns have fundamentally mass denotations, so unadorned
nouns will have property (ii); more generally, there is no mass–count distinction in
these languages. Further, since mass nouns cannot pluralize, there is no plural marking
and, finally, special devices have to be used in order to individuate noun denotations for
counting; this is what underlies the classifier system.
In a [–arg, +pred] language, on the other hand, all nominals are predicates (type
<e,t>). It follows that bare nouns can never be arguments. This, as is well known, is the
situation in French and, with certain complications, the other Romance languages (see
Longobardi 1994). Such languages can have plural marking and lack classifiers.
Finally, [+arg, +pred] languages allow mass nouns and plurals as bare arguments, but
not singular count nouns, have plural marking, and lack classifiers. (Singular bare count
nouns can function as predicates as in We elected John president). This is the English,
and, more broadly, Germanic, setting for this parameter.
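The three settings just described can be laid out as a small lookup table. The following is only an illustrative sketch: the class and field names are hypothetical, the label of the first setting is not visible in this passage and is assumed here to be [+arg, -pred], and "can have plural marking" is simplified to a Boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalMapping:
    """One setting of the nominal-mapping parameter with its predicted properties."""
    arg: bool              # can nouns be of argument type <e>?
    pred: bool             # can nouns be of predicate type <e,t>?
    bare_arguments: str    # which nouns may appear as bare arguments
    plural_marking: bool   # is plural marking available?
    classifiers: bool      # is a classifier system used for counting?
    examples: tuple        # languages the passage names for this setting


# Illustrative encoding only; the values paraphrase the passage above.
SETTINGS = (
    NominalMapping(True, False, "all nouns", False, True, ("Chinese", "Japanese")),
    NominalMapping(False, True, "none", True, False, ("French",)),
    NominalMapping(True, True, "mass nouns and plurals", True, False, ("English",)),
)


def label(setting: NominalMapping) -> str:
    """Render a setting in bracket notation, e.g. [+arg, -pred]."""
    sign = lambda value: "+" if value else "-"
    return f"[{sign(setting.arg)}arg, {sign(setting.pred)}pred]"


def setting_for(language: str) -> NominalMapping:
    """Look up the setting under which the passage lists a language."""
    for setting in SETTINGS:
        if language in setting.examples:
            return setting
    raise KeyError(f"{language!r} is not classified in this sketch")


for s in SETTINGS:
    print(label(s), s.examples, s.bare_arguments, s.plural_marking, s.classifiers)
```

The fourth logical combination, [-arg, -pred], is left out because the passage does not discuss it.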
Method, has been developed in detail by Giuseppe Longobardi and his associates; see
in particular Gianollo, Guardiano, and Longobardi (2008), Longobardi (2003, 2005),
Colonna et al. (2010) and chapter 16, especially Figure 16.1.
14.4 Macroparameters, Microparameters, and Parametric Clusters
Polysynthesis Parameter in (14). Similarly, the DP/NP parameter, more recently proposed
by Bošković (2008), predicts that left branch extraction (as in Whose did you read book?),
adjunct extraction from NP, scrambling, adnominal double genitives, superlatives with a
‘more than half ’ reading, and other properties cluster together. This, as was first pointed
out in Chomsky (1981a), gives macroparameters their potential explanatory value: an
acquirer need only observe one of the clustering properties which express the param-
eter to get all the others ‘for free,’ as an automatic consequence of their UG-mandated
clustering. For example, merely recognizing that finite clauses allow definite pronominal
subjects not to appear overtly automatically guarantees that the much more recondite
property of long wh-extraction of subjects over complementizers is thereby acquired. In
this way, the principles and parameters approach brought biolinguistics and language
typology together; this point emerges particularly clearly when groups of parameters are
presented together as in Table 14.1 and, much more strikingly, Figure 16.1.
However, since the mid-1980s it has gradually emerged that there are problems with
the conception of macroparameters. These are of both a theoretical and an empirical
nature. On the empirical side, it has emerged that many of the typological predictions
made by macroparameters are not borne out. This is particularly clear in the case of the
Head Parameter which, formulated as in (5), predicts that all languages will be either
rigidly, harmonically head-initial or rigidly, harmonically head-final. It is, of course,
well known that this is not true: German, Mandarin, and Latin are all clear examples
of very well-studied languages which show disharmonic orders (and see Cinque 2013
for the suggestion that fully harmonic systems may be very rare). However, at the same
time it is not true that just anything goes: on the one hand, languages tend toward cross-
categorial harmony (as first shown in detail in Hawkins 1983; see also Dryer 1992 and
chapter 15); second, there appear to be general constraints on possible combinations
of head-initial and head-final structures (see for example Biberauer, Holmberg, and
Roberts 2014). Concerning the predictions made by the putative cluster associated with
the classical null subject parameter, see the extensive critique in Newmeyer (2005), and
the response in Roberts and Holmberg (2010). Similar comments could be made about
the other parameters listed in section 14.3.
From a theoretical point of view, there are two basic problems with macroparameters.
First, they put an extra burden on linguistic theory, in that they have to be stated some-
where in the model. The original conception of parameters as variable properties asso-
ciated with invariant UG principles dealt elegantly with this question, but most of the
parameters listed in section 14.3 do not seem to be straightforwardly formulable in this
way. Second, it is not clear why just these parameters are what they are; there is, in other
words, a certain arbitrariness in where variation may or may not occur which is not
explained by any aspect of the theory.
In short, macroparameters, while having great potential merit from the perspective of
explanatory adequacy, have often fallen short in descriptive terms by making excessively
strong empirical predictions. Moreover, there has been no natural intensional charac-
terization of the notion of what a possible macroparameter can be, rendering their theo-
retical status somewhat questionable.
14.4.2 Microparameters
Here the key theoretical proposal is the lexical parameterization hypothesis (Borer 1984;
Chomsky 1995b). This can be thought of, following Baker (2008a:3, 2008b:155–156), as
the ‘Borer–Chomsky conjecture,’ or BCC:
Here are some concrete, rather plausible, examples instantiating the schema in (17):
(18) a. T is [±φ];
b. N is ±Num;
c. T is ±EPP.
(18a) captures the difference between a language in which verbs inflect for person and
number, such as English (in a limited way) and most other European languages, on the
one hand, and languages like Chinese and Japanese, on the other, in which they do not.
This may have many consequences for the syntactic properties of verbs and subjects (cf.
the discussion of Japanese in Fukui 1986 mentioned in section 14.3.8). (18b) captures the
difference between a language in which number does not have to be marked on (count)
nouns, such as Mandarin Chinese, and one in which it does, as in English; this differ-
ence may be connected to the nominal mapping parameter (see section 14.3.7). (18c)
determines the position of the overt subject; in conjunction with V-to-T movement, a
negative value of this parameter gives VSO word order, providing a minimal difference
between, for example, Welsh and French (see McCloskey 1996; Roberts 2005).
This simplicity of formulation of microparameters, along with the general conception
of the BCC, should be compared to the theoretical objections to macroparameters dis-
cussed at the end of the previous section. It seems clear that microparameters represent a
theoretically preferable approach to the macroparametric one illustrated in section 14.3.
Fourth, the microparametric view allows us to put an upper bound on the set of gram-
mars. Suppose we have two potential parameter values per formal feature (i.e., each fea-
ture offers a binary parametric choice as stated in (17)), then we can define the quantity
n as follows:
It then follows that the cardinality of the set of parameter values |P| is 2n and the
cardinality of the set of grammatical systems |G| is $2^{n}$. So, if |F| = 30, then |P| = 60 and
|G| = $2^{30}$, or 1,073,741,824. Or if, following Kayne (2005a:14), |F| = 100, then |G| =
1,267,650,600,228,229,401,496,703,205,376. Kayne states that ‘[t]here is no problem here
(except, perhaps, for those who think that linguists must study every possible language)’
(2005a:14). However, one consequence is clear: the learning device must be able to
search this huge space very efficiently, otherwise selection among such a large range of
options would be impossible for acquirers (see chapter 11, section 5, for the problems
that this kind of space poses for ‘search-based’ parameter-setting).
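The size of this space can be checked with a few lines of arithmetic. The sketch below assumes, as the worked figures suggest, that n is simply the number of formal features |F| and that each feature contributes one independent binary choice; the function name is hypothetical.

```python
def parameter_space(num_features: int) -> tuple[int, int]:
    """Return (|P|, |G|) for num_features independent binary parameters.

    Assumption: n = |F|, so there are 2n parameter values in total
    and 2**n distinct combinations of values, i.e. grammatical systems.
    """
    num_values = 2 * num_features      # |P| = 2n
    num_grammars = 2 ** num_features   # |G| = 2^n
    return num_values, num_grammars


for f in (30, 100):
    p, g = parameter_space(f)
    print(f"|F| = {f}: |P| = {p}, |G| = {g:,}")
```

The count of parameter values grows linearly with the number of features, while the count of grammars doubles with each added feature, which is why the search problem becomes pressing so quickly.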
It may be, though, that the observation of this extremely large space brings to light a fatal
weakness of the microparametric approach. To see this, consider a thought experiment (var-
iants of this have been presented in Roberts 2001 and Roberts 2014). Suppose that at present
approximately 5,000 languages are spoken and that this figure has been constant throughout
human history (back to the emergence of the language faculty in modern Homo sapiens; see the
brief discussion of the evolution of language in chapter 1). Suppose further that every lan-
guage changes in at least one parameter value with every generation. Then, if we have a new
generation every 25 years, we have 20,000 languages per century. Finally, suppose that
modern humans with modern UG have existed for 100,000 years, i.e., 1,000 centuries. It then
follows that 20,000,000 languages have been spoken in the whole of human history, i.e., $2 \times 10^{7}$.
This number is roughly 23 orders of magnitude smaller than the number of possible grammatical
systems arising from the postulation of 100 independent binary parameters.
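The arithmetic of the thought experiment can be reproduced directly. The sketch below uses only the figures stipulated above (5,000 languages, one generation every 25 years, 100,000 years, 100 binary parameters); the variable names are hypothetical, and the orders-of-magnitude gap is computed as a difference of base-10 logarithms.

```python
import math

# Figures stipulated in the thought experiment.
LANGUAGES_AT_ANY_TIME = 5_000
YEARS_PER_GENERATION = 25
YEARS_OF_MODERN_UG = 100_000
NUM_BINARY_PARAMETERS = 100

# Each language counts as a new language once per generation.
generations_per_century = 100 // YEARS_PER_GENERATION
languages_per_century = LANGUAGES_AT_ANY_TIME * generations_per_century
centuries = YEARS_OF_MODERN_UG // 100
languages_ever_spoken = languages_per_century * centuries

# Grammatical systems made available by independent binary parameters.
possible_grammars = 2 ** NUM_BINARY_PARAMETERS

# Gap between the two quantities, in orders of magnitude.
gap = math.log10(possible_grammars) - math.log10(languages_ever_spoken)

print(f"languages per century:  {languages_per_century:,}")
print(f"languages ever spoken:  {languages_ever_spoken:,}")
print(f"possible grammars:      {possible_grammars:,}")
print(f"gap (orders of magnitude): {gap:.1f}")
```

Changing any one stipulation, for instance doubling the time depth, shifts the result by well under one order of magnitude, so the conclusion does not hinge on the exact figures chosen.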
While there are many problems with the detailed assumptions just presented (several
of them related to the Uniformitarian Principle, the idea that linguistic prehistory must
have been essentially similar to recorded linguistic history; see Roberts [forthcoming]
for discussion and a more refined statement of the argument), the conclusion is that,
if the parameter space is as large as Kayne suggests, there simply has not been enough
time since the emergence of the species (and therefore, we assume, of UG) for anything
other than a tiny fraction of the total range of possibilities offered by UG to be realized.
This implies that we could never know whether a language of the past corresponded to
the UG of the present or not, since the overwhelming likelihood is that these languages
could be typologically different from any language that existed before or since, perhaps
radically so. More generally, even with a UG containing just 100 independent param-
eters we should expect that languages appear to ‘differ from each other without limit
and in unpredictable ways’ in the famous words of Joos (1957:96). But of course, we can
observe language types, and note diachronic drift from one type to another.
We conclude that, despite the clear merits of the microparametric approach, it appears
that a way must be found to lower the upper bound on the number of parameters, on a
principled basis.
The exploratory program for linguistic theory known as the Minimalist Program (MP
henceforth) has as its principal goal to go ‘beyond explanatory adequacy,’ that is, beyond
explaining the ‘poverty of stimulus’ problem (see in particular Chomsky 2004a and
chapters 5, 6, and 10). This goal has both theoretical and empirical aspects. On the theo-
retical side, the goal is to fulfill the Galilean ideal of maximally simple explanation (see also
Chen-Ning Yang 1982). On the empirical side, the goal is to explain the ‘brevity of evolution’
problem. Estimates regarding the date of the origin of language vary widely, with anything
between 200,000 and 50,000 years ago being proposed (Tallerman and Gibson 2012:239–
245, and chapter 1). It is not necessary to take a precise view on the date of the origin of lan-
guage here, because anywhere within this range is a very short period for the development
of such a seemingly complex cognitive capacity. It seems that there has been little time for
the processes of random mutation and natural selection to operate so as to give rise to this
capacity, unless we view the origin of the language faculty as due to a relatively small set of
mutations which spread through a small, genetically homogeneous population in a very
short time (in evolutionary terms). Hence, from the biological or neurological perspec-
tive, the core properties of the language faculty must be rather few. Combining this with
the Galilean desideratum just mentioned, we then expect UG, at least the domain-specific
aspects of cognition which are essential to language, to be few and simple.
In trying to approach these goals, then, there has been an endeavor to reduce the ‘size,’
complexity, and the overall contents of UG; see Mobbs (2015) for an excellent discussion
and overview. One important conceptual shift in this direction was Chomsky’s (2005)
articulation of the three factors of language design. These are as follows:
From this perspective, many things which were previously attributed directly to UG as
principles of grammar can be ascribed to the third factor (see in particular chapter 6 for
discussion). Regarding the question of parametric variation, since there are few or no
UG principles to be parametrized along the earlier, GB-style lines, all parameters must
be stated as microparameters, and indeed in general the BCC has been the dominant
view of where parameters fit into a minimalist approach (see in particular Baker 2008b).
More generally, the nature of the rather speculative and, at least in principle, restric-
tive and programmatic proposals of the MP has meant in practice that there are numer-
ous empirical problems that have been known since the GB era or before which have
been largely left untouched. For example, many of the results of the intensive technical
work on phenomena associated with the Empty Category Principle in the GB era,
particularly those developing the proposals in Chomsky (1986b), have not been carried
forward, in part because some of the mechanisms and notions introduced earlier have
been made unavailable (notably the various concepts of government, proper govern-
ment, head/lexical government, and antecedent government; see Huang 1982, Lasnik
and Saito 1984, 1992; Cinque 1991; and references given there).
To a degree, the GB notion of parameter, as summarized and illustrated in section
14.3, has suffered a similar fate. Traditional macroparameters cannot be stated within
Minimalist vocabulary, and so all parametric variation must be seen as microparametric
variation, stated as variations in the nature of formal features of individual functional
categories. So the ‘traditional’ macroparameters are completely excluded as such. This,
combined with the empirical problems associated with clusters discussed in section
14.4.1, has led many to conclude that the entire P&P enterprise should have been aban-
doned (see especially Boeckx 2014), although no clear alternative proposals for how to
deal with synchronic and diachronic linguistic diversity have emerged.
So the question that arises is whether macroparameters really exist, and if so, how
they can be accommodated in a minimalist UG. Furthermore, as our brief discussion
of microparameters at the end of the previous section shows, given the large number
of microparameters based on individual formal features, a question we have to ask is
whether Plato’s Problem arises again. How can the acquirer search a space containing
1,267,651 trillion trillion possible grammars in the few years of first-language acquisition
(see chapter 11 on the question of searching the grammatical space, and chapter 12 on the
time-course of first-language acquisition)? Do we not risk sacrificing the earlier notion
of explanatory adequacy in our attempt to go beyond it?
Perhaps surprisingly, these questions have not been at the forefront of theoretical
discussion in the context of the MP. Nonetheless, some interesting views have been
articulated recently. Here we will briefly discuss those of Kayne (2005a, 2013, i.a.), Baker
(2008b), Gianollo, Guardiano and Longobardi (2008), Holmberg (2010b), Roberts and
Holmberg (2010) and Biberauer and Roberts (2012, 2015a,b, forthcoming).
Kayne (2005a, 2013, i.a.) emphasizes the fact that there is no doubt as to the existence of
microparameters. The particular value of this approach lies in the idea that, in looking very
carefully at very closely related languages or dialects (e.g., the Italo-Romance varieties), we
detect many useful generalizations that would not have been visible on a macroparametric
It might also be that all ‘large’ language differences, e.g. polysynthetic vs. non- (cf.
Baker (1996)) or analytic vs. non- (cf. Huang 2010 [=2013]), are understandable as
particular arrays built up of small differences of the sort that might distinguish one
language from another very similar one, in other words that all parameters are micro-
parameters [emphasis added].
This last idea was developed by Roberts (2012); see also the discussion of Biberauer and
Roberts (2015a,b, forthcoming) later in this section.
Baker (2008a,b) argues for the need for macroparameters in addition to microparameters.
He argues that certain macroparameters go a long way towards reducing the range
of actually occurring variation:
The strict microparametric view predicts that there will be many more languages that
look like roughly equal mixtures of two properties than there are pure languages,
whereas the macroparametric-plus-microparametric approach predicts that there
will be more languages that look like pure or almost pure instances of the extreme
types, and fewer that are roughly equal mixtures (Baker 2008b:361).
On the other hand, the macroparametric view predicts, falsely, a rigid division of all
languages into clear types (head-initial vs. head-final, etc.). Regarding this possibility,
Baker comments (2008b:359) that ‘[w]e now know beyond any reasonable doubt that
this is not the true situation.’
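The contrast between the two predictions can be made concrete with a small simulation. The following Python sketch is purely illustrative and is not drawn from Baker's work: the function names, the number of microparameters per language (20), and the strength of the macro-level bias (0.9) are all assumptions chosen for exposition. It compares how many simulated languages come out as roughly equal mixtures of two types when each microparameter is set independently, and when a macro-level choice pulls the microparameters towards one type.

```python
import random

def simulate(n_languages, n_micro, macro_bias, seed=0):
    """Return, for each simulated language, the share of microparameters set to type A.

    macro_bias = 0.5 models the strict microparametric view (every setting is an
    independent coin flip); macro_bias > 0.5 models a hypothetical macro-level
    choice of type that pulls each microparameter towards that type.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_languages):
        macro_is_a = rng.random() < 0.5              # assumed macro-level choice
        p_a = macro_bias if macro_is_a else 1 - macro_bias
        hits = sum(rng.random() < p_a for _ in range(n_micro))
        shares.append(hits / n_micro)
    return shares

def mixed_fraction(shares, low=0.4, high=0.6):
    """Fraction of languages that are roughly equal mixtures of the two types."""
    return sum(low <= s <= high for s in shares) / len(shares)

micro_only = simulate(2000, 20, macro_bias=0.5)
macro_plus_micro = simulate(2000, 20, macro_bias=0.9)
print(mixed_fraction(micro_only), mixed_fraction(macro_plus_micro))
```

Under these assumptions the micro-only population is dominated by mixed languages, whereas the macro-plus-micro population concentrates near the two pure types with only occasional mixtures, which is the qualitative pattern at issue in the passage quoted above.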
Baker further observes that, combining macroparameters and microparameters,
we expect to find a bimodal distribution: languages should tend to cluster around one
type or another, with a certain amount of noise and a few outliers from either one of the
principal patterns. And, as he points out, this often appears to be the case, for example
regarding the correlation originally proposed by Greenberg (1963/2007) between verb–
object order and preposition–object order. The figures from the most recent version of
The World Atlas of Language Structures (WALS) are as follows (these figures leave aside a
range of minority patterns such as ‘inpositions,’ languages lacking adpositions, and the
cases Dryer classifies as ‘no dominant order’ in either category):
It is very clear that here we see the kind of bimodal distribution predicted by a combination
of macro- and microparameters. Baker therefore concludes that the theory of
comparative syntax needs some notion of macroparameter alongside microparam-
eters. He also makes the important point that many macroparameters could prob-
ably never have been discovered simply by comparing dialects of Indo-European
languages.
Gianollo, Guardiano, and Longobardi (2008) propose a distinction between param-
eters themselves, construed along the lines of the BCC, and hence microparameters,
and parameter schemata (see also chapter 16, section 9). On this view, UG makes avail-
able a small set of parameter schemata, which, in conjunction with the PLD, create the
parameters that determine the non-universal aspects of the grammatical system. They
suggest the following schemata, where in each case F is a formal feature of a functional
head, lexically encoded as such in line with the BCC:
(In fact, Kayne argued for this view in the 1980s; see Uriagereka 1998:539.) Roberts
and Holmberg (2010:53) combine these last two ideas and suggest that the existence of
parameter variation, and in fact the parameters themselves, are emergent properties,
resulting from the three factors of language design given in (20). They propose that, for-
mally, parameters involve generalized quantification over formal features, as in (23):
(24) reads ‘For some feature D, D is a sublabel of finite T’ (where ‘sublabel’ is understood
as in Chomsky 1995b:268).
On this view, UG does not even provide the parameter schemata. As Roberts and
Holmberg put it:
The role of the second and third factors is developed and clarified in Roberts (2012)
and, in particular, in Biberauer and Roberts (2012, 2015a,b, forthcoming), summarizing
and developing earlier work (see the references given). The third factor principles are
seen as principles manifesting optimal use of cognitive resources, i.e., general computa-
tional conservativity. In particular, the following two acquisition strategies are proposed:
(25) (i) Feature Economy (FE) (see Roberts and Roussou 2003:201):
Postulate as few formal features as possible.
(ii) Input Generalization (IG) (see Roberts 2007:275):
Maximize available features.
The effect of parametric variation arises from this interaction of PLD and FE/IG with
the underspecification of the formal features of functional heads in UG. In further work,
Biberauer (2011) in fact suggests that the formal features themselves may represent
emergent properties, with UG contributing merely the general notion of ‘(un)interpret-
able formal feature’ rather than an inventory of features to be selected from; see also
Biberauer and Roberts (2015a). This clearly represents a further step towards general
minimalist desiderata of overall simplicity, as well as arguably going beyond explana-
tory adequacy.
This emergentist approach has two interesting consequences. One is that it leads to
the postulation of a learning path along the following lines: acquirers will always by
default postulate that no heads bear a given feature F; this maximally satisfies FE and
IG. Once F is detected in the PLD, IG requires that that feature is generalized to all rel-
evant heads (of course this violates FE, but PLD will defeat the third-factor strategies).
As a third step, if a head which does not bear F is detected, the learner retreats from the
maximal generalization and postulates that some heads bear F. This creates a distinction
between the set of heads bearing F and its complement set, and the procedure is iter-
ated for the subset (this procedure is very similar to Dresher’s [2009, 2013] Successive
Division Algorithm, as well as learning procedures observed in other domains, as
Biberauer and Roberts [2014] show in detail).
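The learning path just described can be sketched procedurally. The Python fragment below is a minimal illustration under stated assumptions, not the authors' formalization: the function name `learn_feature`, the representation of the PLD as a sequence of (head, bears-F) observations, and the head labels are all hypothetical. It also simplifies the final step, removing heads one at a time rather than iterating the whole procedure over the subset, and it keeps no memory of counterevidence seen before F is first detected.

```python
def learn_feature(relevant_heads, pld):
    """Sketch of the NO > ALL > SOME path for a single formal feature F.

    relevant_heads: set of head labels that could in principle bear F.
    pld: iterable of (head, bears_f) observations standing in for the PLD.
    Returns (stage, set of heads currently assumed to bear F).
    """
    stage, bearers = "NO", set()      # default: no head bears F
    for head, bears_f in pld:
        if stage == "NO" and bears_f:
            # F detected: Input Generalization extends it to all relevant heads
            stage, bearers = "ALL", set(relevant_heads)
        elif stage != "NO" and not bears_f and head in bearers:
            # A head lacking F is detected: retreat from the maximal generalization
            stage = "SOME"
            bearers.discard(head)
    return stage, bearers

heads = {"v", "n", "a", "p"}
print(learn_feature(heads, []))
print(learn_feature(heads, [("v", True)]))
print(learn_feature(heads, [("v", True), ("p", False)]))
```

With no evidence the learner stays at NO; a single positive datum moves it to ALL; subsequent counterevidence for a particular head moves it to SOME, leaving F on the remaining heads.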
Related to the NO>ALL>SOME procedure is a finer-grained distinction among
classes of parameters (originating in Biberauer and Roberts 2012), as follows:
Biberauer and Roberts (2015b) illustrate and support these distinctions in relation to parametric
changes in the history of English.
It is clear that the kinds of parameters defined in (26) fall into a hierarchy.
This idea originates in Roberts and Holmberg (2010) and has been developed in Roberts
(2012), Biberauer and Roberts (2012, 2014, 2015a,b, forthcoming) and numerous
references given there (notably, but not only, Biberauer, Holmberg, Roberts and
Sheehan [2014] and Sheehan [2014, to appear]; see also the references at
http://recos-dtal.mml.cam.ac.uk/papers). One advantage of parameter hierarchies is that
they reduce the space of possible grammars created by parameters by making certain
parameter values interdependent; see Biberauer, Holmberg, Roberts, and Sheehan
(2014) for more discussion. We will return to some further implications of parameter
hierarchies in section 14.9.
So we see that the change in theoretical perspective brought about by the MP does
not, in itself, invalidate the aims, methods, or the results achieved in the GB era, nor is it
inconsistent with P&P theory, once parameters are seen as points of underspecification
in UG, with other aspects of parametrization resulting from the interaction of UG so
conceived with the second and third factors.
In what follows, we give a case study of parametric variation both within varieties of
Chinese (synchronically and diachronically), and between (mostly Mandarin) Chinese
and English. This case study is intended to provide empirical support for the following
claims and proposals:
In the next three sections, we will develop and support each of Points A-C in turn.
(i) Chinese has light-verb constructions where English has (typically denominal)
unergative intransitives:
(27) a. Chinese: da yu ‘do fish’, da dianhua ‘do phone’, da penti ‘do sneeze’ …
b. English: to fish, to phone, to sneeze …
(iii) Chinese typically has compound and phrasal accomplishment verbs, where
English has simple verbs:
(29) a. Chinese: da-po ‘hit-broken,’ nong-po ‘make broken,’ ti-po ‘kick-broken,’ etc.
b. English: break, etc.
(vii) Chinese has wh-in-situ (instead of overt wh-movement), cf. (11a,b), repeated here:
(11) a. What did John eat t_what?
b. Hufei chi-le sheme (ne)
Hufei eat-asp what Qwh
‘What did Hufei eat?’
(34) Reciprocals:
a. They each criticized the other(s).
b. They criticized each other.
c. Tamen ge piping-le duifang.
they each criticize-PERF other
d. *Tamen piping-le bici.
they criticize-PERF each-other
On the other hand, in Chinese adverbs equivalent to English fast can only modify the
verb, not the derived noun (see Lin and Liu 2005):
(36) a. Zhangsan shi yi-ge (da zi) da-de hen kuai de daziyuan.
Zhangsan be one-CL (type) type very fast DE typist
‘Zhangsan is a typist who types very fast.’
b. *Zhangsan shi yi-ge hen kuai de daziyuan.
Zhangsan be one-CL very fast DE typist.
Regarding adjectival modification, in English (37) is ambiguous (see Cinque 2010 for
extensive discussion):
This example is ambiguous between the reading ‘Jennifer is beautiful and a singer,’ and
‘Jennifer sings beautifully.’ In Chinese, on the other hand, these two readings must be
expressed by quite different structures, in the one case with hen piaoliang (‘very beautiful’)
modifying ‘singer,’ in the other case with it modifying ‘sing’:
(x) Chinese has no equivalents of English articles (although it has the equivalents of
numeral one and demonstrative this, that).
(xi) Chinese lacks ‘coercion’ in the sense of Pustejovsky (1995). In English, a sentence
like (39a) can be understood, depending on the context and what we know about John,
as any of (39b–d):
On the other hand, in Chinese the equivalent of (39a) is ungrammatical; the implicit
subordinate verb must be overtly expressed (see Lin and Liu 2005):
This phenomenon is commonly attributed to the ‘strong’ nominal nature of the relative-
clause CP and TP (see Ochi 2001 and references there, among many others). In Chinese
the subject cannot bear genitive case:
(xv) Chinese shows a series of syntax–semantics mismatches (see Huang 1997 et seq.).
One famous case is when a pseudo-noun incorporation construction is separated by a
low adverbial after the verb is raised:
(45) ta chi-le yi-ge zhongtou (de) fan, hai mei chi-bao.
he eat-PERF one-CL hour (*’s) rice, still not finish
‘He ate for a whole hour, and is still not done.’
(Literally: He ate a whole hour’s rice, and is still not done with eating.)
(xvi) Chinese has analytic passivization, with the so-called ‘bei passive’ being
somewhat akin to the English get-passive. Instead of employing passive morphology
that intransitivizes an active transitive verb, Chinese forms a passive by superimposing
a semi-lexical verb bei (whose meaning approximates ‘undergo’) on the main predicate
without passivizing the latter:
(46) Zhangsan bei [Lisi qipian-le liang ci]
Zhangsan bei Lisi deceived two time
‘Zhangsan got twice deceived by Lisi.’
The important thing to observe here is the clustering of these sixteen properties in
Chinese to the exclusion of them in English. (Other properties could be added to
this list, including those related to argument structure, as argued in Huang 2006 for
Mandarin resultatives, and in Lin [2001 et seq.] on noncanonical subjects and objects;
see also Barrie and Li 2015 for related discussion.) Some of these properties have pre-
viously been attributed to macroparameters (e.g., the Wh-Movement Parameter and
Nominal Mapping Parameter mentioned in section 14.3), but the degree of clustering
shown here had not been observed prior to Huang (2005, 2015) and indicates a macro-
parameter of high analyticity; following Huang (2005, 2015) this macroparameter can
be opposed to Baker’s Polysynthesis Parameter (in fact, in terms of the Biberauer and
Roberts-style NO>ALL>SOME learning path/parameter hierarchy, they can be seen
as representing the two extreme NO vs. ALL options for some UG-underspecified
property; we develop this idea below in section 14.11). So this is a clear case of macro-
parametric clustering.
(i) OC lacks light verbs, but instead has denominalized unergative intransitives: yu
‘to fish’ (instead of da yu);
(ii) OC lacks pseudo-incorporation: fan ‘have rice’ (instead of chi fan ‘eat rice’);
(iii) OC has simplex accomplishments: po ‘break’ (instead of da-po ‘make break’);
(iv) OC does not have overt classifiers for count nouns: san ren ‘three persons,’ er yang
‘two sheep’ (see Peyraube 1996 among others);
(v) OC does not have overt localizers, as illustrated in the famous line from the
Confucian Analects (see Peyraube 2003, Huang 2009 for other examples):
(47) 八侑舞於庭,是可忍也,孰不可忍也?(論語:八侑)
bayou wu yu ting, shi ke ren ye, shu bu ke ren ye?
8x8 dance at hall this can tolerate Prt, what not can tolerate Prt
(Analects: Bayou)
‘To hold the 8x8 court dance in his own court, if this can be tolerated, what else
cannot be tolerated?’
Note yu ting ‘in the court,’ instead of yu ting-zhong ‘at court’s inside.’
(viii) OC relatives involve operator movement of a relative pronoun, the particle suo:
(50) 魚, 我所欲也;熊掌,亦我所欲也。(孟子:告子)
yu, wo suo yu ye; xiongzhang, yi wo suo yu ye.
fish, I which want Prt; bear-paw, also I which want Prt
(Mencius: Gaozi)
‘Fish is what I want; bear paws are also what I would like to have.’
(xiii) OC allows extensive use of gerundive constructions with genitive subjects, again
revealing the nominal nature of the embedded CP:
(55) 寡人之有五子,猶心之有四支。(晏子:內篇諫上)
guaren zhi you wu zi, you xin zhi you si zhi.
Self Gen have five son, like heart Gen have four support
‘My having five sons is like the heart’s having four supports.’
We observe the same clustering of properties in OC that distinguish this system en bloc
from MnC. In fact, OC seems to pattern consistently like English regarding these prop-
erties, and against MnC. Again, this clustering is macroparametric.
We conclude, on the basis of the evidence presented in this and the preceding section,
that macroparametric variation exists. Therefore our theory of variation must capture
these kinds of clusterings of properties.
(56) yi ben shu ‘one classifier book’ → ben shu ‘classifier book’
The dialects of Chinese vary as to the syntactic positions which allow for this kind of
deletion. In Mandarin, it is allowed in object position but not subject position:
This looks rather similar to the distribution of bare nominals in European languages: Italian
allows them in object but not subject position (Longobardi 1994): *Latte è buono/Qui si beve
latte (‘Milk is good/Here one drinks milk’); Germanic allows them in both positions: Milk
is good/I drink milk; French doesn’t allow them in either position: *Lait est bon/*Je bois lait
(equivalent to the English examples just given). There may thus be a parallel between the
incidence of bare nominals in European languages and the incidence of classifier stranding
in Chinese varieties. Clearly this observation merits further explanation.
Second, dialects differ in the extent to which they make use of postverbal suffixes.
Mandarin has some aspectual suffixes (e.g., the progressive zhe, the perfective le, and
the experiential guo). Cantonese has a considerably more elaborate system, employing
additional postverbal suffixes like saai, dak, and ngaang for expressions of exhaustivity,
exclusivity, and obligation (see Tang 2006:14–15):
Some of the suffixes may stack, indicating the considerable height of the verb, for exam-
ple with the exhaustive on the experiential:
On the other hand, TSM is much more restricted. While the experiential kuei may argu-
ably be a suffix in TSM as it is in Mandarin, the cognates of Mandarin progressive zhe
and perfective le are not. Instead, the progressive and the perfective are rendered with
preverbal auxiliaries, an analytic strategy:
Third, the dialects vary regarding their verb–object order preferences (see Liu 2002;
Tang 2006). Mandarin allows both OV and VO orders, while Cantonese is strongly VO
and TSM strongly OV. The following patterns of preference are typical:
(63) a. Cantonese: ngo tai-zo (bun) syu. ??ngo (bun) syu tai-zo.
I read-Perf CL book I CL book read-Perf
‘I have read the book.’ ‘??I the book have read.’
b. Mandarin: wo kan-le shu le. wo shu kan-le.
I read-Perf book SFP I book read-Perf-SFP
c. TSM: ??gua khoann-kuei tshe a. gua tshe khoann-kuei a.
I read-Exp book SFP I book read-Exp SFP
Fourth, there is variation regarding the position of the motion verb qu ‘go’ (see Lamarre
2008). Corresponding to the English sentence ‘Zhangsan went to Beijing,’ Mandarin
allows both the ‘analytic’ strategy (64a) and the ‘synthetic’ strategy (64b):
Cantonese allows only the synthetic strategy, whereas Pre-Modern Chinese (as illus-
trated in textbooks used during Ming–Qing dynasties) allows only the analytic strategy.
Assuming that (64b) is derived by V-movement to a null light verb position other-
wise occupied by dao in (64a), this pattern shows that V–v movement is obligatory in
Cantonese, optional in Mandarin, but did not take place in Pre-Modern Chinese.
We conclude then that there is clear empirical evidence from varieties of Chinese that,
alongside macroparameters of the kind illustrated in the previous section, microparam-
eters also exist, with varying (but lesser) degrees of clustering. We will see more exam-
ples of microparameters in section 14.10.5.
The idea that macroparameters are not primitive aspects of UG, but rather derive from
more primitive elements, was first suggested in Kayne (2005a:10). It is also mentioned
by Baker (2008b:354n2). However, it has been developed in various ways in recent
work, starting from Roberts and Holmberg (2010) and Roberts (2012), by Biberauer and
Roberts (2012, 2014, 2015a,b, forthcoming), Biberauer, Holmberg, Roberts, and Sheehan
(2014), Sheehan (2014, to appear); see again the references at http://recos-dtal.mml.cam.
ac.uk/papers.
On this view, macroparameters are seen as aggregates of microparameters with cor-
relating values: a macroparametric effect arises when a group of microparameters act
together (clearly, meso-parameters, as in (26), can be defined in a parallel fashion).
Hence macroparameters are in a sense epiphenomenal; each microparameter that
makes up a macroparameter falls under the BCC, limiting variation to formal features
of functional heads.
The microparameters act in concert for reasons of markedness, related to the gen-
eral conservatism of the learner, and therefore arguably to the third factor (see
chapter 6). The two principal markedness constraints are Feature Economy and Input
Generalization, as given in (25), repeated here:
(25) (i) Feature Economy (FE) (see Roberts and Roussou 2003:201):
Postulate as few formal features as possible.
(ii) Input Generalization (IG) (see Roberts 2007:275):
Maximize available features.
Together these constitute a minimax search and optimization strategy: assume as little
as possible and use it as much as possible. As Biberauer and Roberts (2014) show, there
are analogs to this strategy in phonology (Dresher 2009, 2013) and in other cognitive
domains (see in particular Jaspers 2012). Note also that IG generalizes the known to the
unknown, and so can be seen as a form of bootstrapping. The interaction of FE and IG
gives rise to the NO>ALL>SOME learning path described in section 14.5. We can now
present that idea in a more precise fashion as follows (see also Biberauer, Holmberg,
Roberts, and Sheehan 2014:111):
Here h designates functional heads, and F is the predicate ‘feature-of,’ so F(h) means
‘formal feature of a head h.’ As we have said, the procedure in (65) says that acquirers
first postulate NO heads bearing feature F. This maximally satisfies FE and IG. Then,
once F is detected in the PLD, that feature is generalized to ALL relevant heads, satisfy-
ing IG but not FE. This step, in other words the operation of the third-factor strategy
IG, gives rise to clustering effects, i.e., aggregates of microparameters acting in concert
as macroparameters. The existence of macroparameters and clustering, and therefore
many large-scale typological generalizations such as the tendency towards harmonic
word order, or high analyticity as in MnC, follows from the interaction of the three fac-
tors in language design in a way which is entirely compatible with both the letter and the
spirit of minimalism. This establishes Point B in section 14.5.
The idea of a hierarchy of parameters was first put forward in Baker (2001:170). Baker
suggested a single hierarchy, and, while his specific proposal had some empirical prob-
lems, the proposal had two principal merits, both of which are intrinsic to the concept of
a hierarchy. First, it forces us to think about the relations among parameter settings, both
conceptually in terms of how they interact in relation to the architecture of the grammar
(do we want to connect parameters of stress to parameters of word order, for example?
See chapter 12 for relevant discussion in relation to first language acquisition), how they
interact logically (it is impossible to have inflected infinitives in a system which lacks
infinitives, for example), and empirically on the basis of typological observations (e.g.,
to account for the lack of SVO ergative languages, as observed by Mahajan 1994, among
others). Second, parameter hierarchies can restrict the space of possible grammars, and
hence reduce the predicted amount of typological variation and simplify the task for a
search-based learner (see chapter 11). Given a hierarchical approach, the cardinality of
G, the set of grammars, is equivalent to the cardinality of P, the set of parameters, plus 1,
to the power of the number of hierarchies. So if, for example, there are just 5 hierarchies
with 20 parameters each, then |G| is 21^5, or 4,084,101, for 5 × 20 = 100 possible choice
points. Compared to 2^100, this is a very small number, entailing the concomitant simplification
of the task of a search-based learner (see again chapter 11, section 6).
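The arithmetic behind this comparison is easy to check. The short Python sketch below uses hypothetical function names and assumes, as one way of reading the "plus 1", that each hierarchy is a chain of dependent binary choices, so that a hierarchy of k parameters yields k + 1 possible outcomes rather than 2^k.

```python
def grammars_with_hierarchies(n_hierarchies, params_per_hierarchy):
    # Assumption: each hierarchy of k dependent choices has k + 1 outcomes
    return (params_per_hierarchy + 1) ** n_hierarchies

def grammars_independent(n_parameters):
    # Fully independent binary parameters: every combination is a grammar
    return 2 ** n_parameters

hier = grammars_with_hierarchies(5, 20)
flat = grammars_independent(100)
print(f"{hier:,}")   # 4,084,101
print(f"{flat:,}")   # 1,267,650,600,228,229,401,496,703,205,376
```

For the same 100 choice points, the hierarchical space is smaller than the unstructured one by a factor of more than 10^23.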
Roberts and Roussou (2003:210–213) suggested organizing the following set of
options relating to a given formal feature F on the basis of their proposal that grammati-
calization is a diachronic operation affecting functional categories:
(66) F? (formal feature?)
No yes
No yes
No Yes
(head-initial) (head-final)
No Yes No Yes
Synthesis Polysynthesis
Notice how this hierarchy derives the four traditionally recognized morphological types
(Sapir 1921). It also connects analyticity and head-initiality on the one hand, and aggluti-
nation and head-finality on the other (see also Julien 2002 on the latter).
Gianollo, Guardiano, and Longobardi (2008, see chapter 16) developed the Roberts
and Roussou approach, and, as we have seen, introduced the very important idea that
the parameters are not primitives of UG, but created by the hierarchies (‘schemata’ in
their terminology). Roberts and Holmberg (2010) proposed two distinct hierarchies
for word order and null argument phenomena, and Roberts (2012) and Biberauer,
Holmberg, Roberts, and Sheehan (2014) proposed three more, dealing with word structure
Sheehan (2014, to appear) shows that a hierarchy of this kind applies to F an inherent
Case feature of v (for ergativity), F a feature of Appl (causatives/ditransitives) and F a
feature of Voice (passives; see Sheehan and Roberts 2015). Other hierarchies have been
proposed for Person, Tense, and Negation (on the latter, see Biberauer 2011).
These hierarchies are empirically successful in capturing wide typological variation
of both the macro- and microparametric kind (for example, Sheehan and Roberts’
passive hierarchy covers Yoruba, Thai, Yidiɲ, Turkish, Dutch, German, Latin, Danish,
Norwegian, Hebrew, Spanish, French, English, Swedish, Jamaican Creole, and Sami).
As already mentioned, this hierarchical organization of the elements of parametrization
reduces the potential number of options that a child has, thereby easing the learning
procedure. This goes some way towards addressing Plato’s Problem.
It is important to emphasize that the macroparameters, and the parameter hier-
archies, are not primitives: they are created by the interaction of FE and IG. UG’s role
is reducible to a bare minimum: it simply leaves certain options open. In this way, we
approach the minimalist desideratum of moving beyond explanatory adequacy (see
chapters 5 and 6). Note also that if Biberauer’s (2011, 2015) proposal that the formal fea-
tures themselves are emergent properties resulting from the interaction of the three fac-
tors is adopted, then a still further step is taken in this direction.
We now illustrate these ideas concretely, taking the variation discussed in section 14.6
in Modern Chinese, Old Chinese, and Modern Chinese dialects as case studies.
(2005) critique. Second, such a view makes use of concepts unavailable in the theoretical
vocabulary of a minimalist grammar: what are the features [±analytic], [±synthetic]?
While it may have been possible to countenance such features in GB, it is against the
spirit, and arguably the letter, of minimalist theorizing.
[vP DP_EA [v’ [v DO] [NP telephone / fish / peel]]]
[vP DP_EA [v’ [v CAUSE] [VP … break]]]
And for the transitive version of denominal verbs, we have a further CAUSE head above
vP as in (70), for the transitive feed, i.e., ‘EA causes IA to do food/eat’:
(70) [vP DP_EA [v’ [v CAUSE] [VP DP_IA [V’ [V DO] [NP food]]]]]
English v may be in the form of a phonetically null light verb DO or CAUSE, which
are assumed to have the following properties: they both have formal features which
need to Agree, do not contain EPP, and do trigger head movement (these properties
may all be connected in terms of the general approach to head movement developed
in Roberts 2010d). Head movement equates to synthesis, and English abounds in
simplex denominal verbs like telephone, fish, peel and simplex causatives like break
or feed.
In Modern Chinese, on the other hand, v is occupied by an overt light verb such as da
for an unergative or a ‘cognate verb’ for pseudo-incorporation:
(71) [vP DP_EA [v’ v NP]], with v and NP instantiated as:
v = da, NP = dianhua ‘telephone’
v = da, NP = yu ‘fish’
v = bo ‘peel’, NP = pi ‘skin’
v = nian ‘read’, NP = shu ‘book’
For causatives, either an inchoative verb combines with a light/cognate verb to form a
compound (rather than moving into a null v forming a simplex causative):
(72) [vP DP_EA [v’ [v da/nong ‘do/make’] [VP … po ‘break’]]]
Or we have a periphrastic causative, with heavy verbs like shi ‘cause,’ rang ‘let,’ and
so forth.
(73) [vP DP_EA [v’ [v rang ‘let’] [VP DP_IA [V’ [V chi ‘eat’] [NP fan ‘rice’]]]]]
Unlike English, Chinese does not have the phonetically null CAUSE and DO. Instead,
it resorts to lexical (light or heavy) verbs which do not trigger head movement (though
they may trigger compounding), leading to high analyticity. Instead of simplex
denominalized action verbs or simplex causatives, Chinese resorts to more complex
expressions, and abounds in light verb constructions, pseudo-incorporation, resul-
tative compounds or phrases, and periphrastic causatives. The high analyticity of
Chinese derives from the absence of incorporation into the abstract DO and CAUSE.
These labels are really shorthand for certain event- and θ-role-related features of v,
whose exact nature need not detain us here; these features are lexically instantiated in
Chinese by verbs such as da and rang which, as lexical roots in this language, repel head
movement.
Let us now look at how IG can give rise to macroparametric clustering. By IG, if v
can attract a head, then, all other things being equal, n, a, and p also have that property
(this represents the unmarked option as it conforms to IG). Chinese has lexical classi-
fiers, nominal localizers, an adjectival degree marker, and (discontinuous) prepositions,
while English generally has such categories in null or affixal form. So high analyticity
generalizes across all the principal lexical categories in Chinese.
Looking at the specific cases, Chinese count nouns are formed by an overt ‘light noun’
(i.e., a classifier):
By IG, the light noun does not trigger head movement, so ben shu is the Chinese
‘count noun,’ i.e., an analytic ‘count noun phrase.’ On the other hand, English count
nouns are formed by incorporating the noun root into an empty CL-head (see
Borer 2005):
By IG, CL has a formal feature that Agrees, has no EPP and triggers head movement, so
the count noun is synthetic.
As we saw in section 14.6, Chinese forms locational NPs with overt localizers (see also
Biggs 2014):
The word nali means ‘place.’ Here too there is no head movement and so the locative
expression is analytic in the sense we have defined. English forms such NPs by incorpo-
rating silent PLACE (see Kayne 2005b):
Chinese adjectives have lexical hen (‘very’), which marks absolute degree: hen hao
(‘very good’). Kennedy (2005, 2007) proposes treating a gradable adjective as being
headed by a Deg0 in the form of covert pos, e.g., [DegP pos [AP happy]], which we may
think of as HEN, the covert counterpart of hen. English adjectives incorporate into null
HEN and are synthetic, but Chinese adjectives do not incorporate but remain analytic.
The Deg0 head hen or HEN turns a state adjective into a degree word, which is then able
to combine with comparatives and superlatives, much as a classifier turns a mass or kind
into a count noun so it can be combined with a number word. (See Dong 2005 and Liu
2010 for relevant discussions.)
Chinese complex PPs take a ‘discontinuous’ form:
Again, this is an analytic construction. English complex PPs are formed by incorpora-
tion, as can be fairly transparently seen in some cases, e.g., beside:
(80) Zhangsan [ASP PERF] [VP zuotian qu-le Kaohsiung] (PERF Agrees with le)
Zhangsan yesterday go-le Kaohsiung
‘Zhangsan went to Kaohsiung yesterday.’
English T and Asp heads are similar to Mandarin in this respect. These clausal heads
are functional, they enter into Agree with the inflected verb and they do not attract the
inflected verbs:
(81) John [T TNS] [VP often kisses Mary in the kitchen] (TNS Agrees with kisses)
In Romance languages, as has been well known since Pollock (1989), T and Asp attract
lexical verbs (see Schifano 2015 for an extensive analysis of verb movement across a
range of Romance languages, which effectively supports this conclusion, with some
important provisos). Thus, while English is synthetic in the v-domain, it is not syn-
thetic in the T-domain: only some, but not all, Fs trigger head movement (in this respect
English may be more marked than either Romance or Mandarin; Biberauer, Holmberg,
Roberts, and Sheehan [2014:126] arrive at the same conclusion comparing English to
other languages). Chinese is more consistently analytic than English is synthetic; hence
it is less marked in this regard than English.
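The comparison of markedness can be illustrated schematically. In the Python sketch below, the True/False values simply restate what the preceding discussion says about which heads trigger head movement in each language; the head labels, the dictionary layout, and the idea of measuring markedness as the number of heads departing from the majority value are illustrative assumptions, not part of the analysis itself.

```python
# Does the head trigger head movement? Values restate the discussion above;
# the labels ("Loc" for the localizer/PLACE head, etc.) are assumed shorthand.
TRIGGERS_HEAD_MOVEMENT = {
    "Modern Chinese": {"v": False, "CL": False, "Loc": False, "Deg": False,
                       "P": False, "T": False, "Asp": False},
    "English": {"v": True, "CL": True, "Loc": True, "Deg": True,
                "P": True, "T": False, "Asp": False},
}

def departures_from_ig(settings):
    """Count heads deviating from the majority value (0 = fully uniform, as IG prefers)."""
    positives = sum(settings.values())
    return min(positives, len(settings) - positives)

for language, settings in TRIGGERS_HEAD_MOVEMENT.items():
    print(language, departures_from_ig(settings))
```

On this crude measure Modern Chinese shows no departures from a uniform setting, while English shows two (T and Asp), in line with the observation that English is the more marked system in this respect.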
14.10.4 Old Chinese
Let us turn now to Old Chinese, looking first at the lexical domain. In this domain,
Old Chinese is similar to English (as we observed in section 14.6.2). Like English, Old
Chinese possessed null DO and null CAUSE as higher lexical heads (both reconstructed
as *s- by Tsu-Lin Mei [1989, 2012] and references given there) which trigger head
movement (see also Feng [2005, 2015] for extensive other examples of head movement in
OC). This gives rise to the following properties:
(83) a. No overt classifiers for count nouns (no need for ‘light noun’);
b. No need for overt localizers (no need for ‘light noun’).
Turning now to the clausal functional heads, Old Chinese TP differs from Modern
Chinese in the nature of at least one clausal functional head (probably more than one)
in the TP region, immediately below the subject. Let us call this FP (possibly standing
for focus phrase). F has an unvalued feature that requires it to Agree with an appropriate
element and an EPP feature requiring XP movement. This gives rise to the following XP
movements in OC:
(84) a. Wh-movement;
b. suo-movement for relatives;
c. focus-movement (of only-phrases);
d. postverbal adjuncts.
Furthermore, it is possible that F also triggered head movement, giving rise to canoni-
cal gapping (Wu 2002, He 2010), assuming, following Johnson (1994) and Tang (2001),
that gapping is across-the-board V-movement from a coordinated v/VP. The MnC–OC
contrast follows from the general lack of v-movement beyond vP in MnC, and the avail-
ability of such movement (e.g., into FP) in OC.
(i) Classifier stranding. As mentioned more generally in section 14.7, while Mandarin
allows deletion of an unstressed yi ‘one’ in certain positions thereby stranding a
classifier, TSM does not allow classifier stranding. Compare the following, repeated
from (57a) and (59a):
(85) Mandarin: wo yao mai (yi) ge roubaozi lai chi.
I want buy (one) CL meat-bun to eat
‘I want to buy a meat bun to eat.’
TSM: gua be boe *(tsit) liap bapao-a lai tsia.
I want buy *(one) CL meat-bun to eat
(ii) Aspectual suffix vs. auxiliary. While the perfective aspect in Mandarin employs the
suffix le, TSM resorts to a lexical auxiliary u ‘have.’ Compare:
(86) Mandarin: ni chi-bao-le ma?
you eat-full-Perf Q
‘Have you finished eating?’
TSM: li u tsia-pa bou?
you have eat-full Q
That is, in Mandarin Asp holds an Agree relation with the verb, whereas in TSM a lexical
auxiliary does away with the Agree relation. The use of u ‘have’ as an auxiliary is in fact
generalized to all other categories, expressing existence of the main predicate’s denota-
tion. Thus, as an auxiliary of a telic vP, it expresses perfectivity (as in (86)). It may also be
used with an atelic VP, or with an AP, PP, or AspP predicate, expressing existence of the
relevant eventuality:
(iii) Aspectual suffix vs. resultative verb. While Mandarin perfective le is a suffix
denoting a viewpoint aspect, the corresponding item in TSM liau is still a resultative
verb meaning ‘finished.’
(88) Mandarin: ta chi-le fan le.
he eat-Perf rice Prt
‘He has eaten / He ate.’
TSM: i chia-liau peng a.
he eat-finished rice SFP
‘He finished the rice.’
(iv) Null vs. lexical light verb. In Mandarin, there is an interesting ‘possessive agent’
construction, illustrated here:
(89) a. ni tan nide gangqin, ta kan tade xiaoshuo.
you play your piano, he read his novels
‘You did your playing piano; he did his reading novels.’
b. ta ku tade, ni shui nide.
he cry his, you sleep your
‘He did his crying; you did your sleeping.’
In (89a), the possessives nide ‘your’ or tade ‘his/her’ do not denote the possessor of the
NP they modify (a piano or a novel). And in (89b), the possessives are presented with-
out a possessee head noun. In each case, the genitive pronoun is understood as the agent
of an event, represented as a gerundive phrase in the translation. Huang (1997) argued
that these sentences involve a null light verb DO taking a gerundive phrase as its com-
plement. The surface form is obtained when the verb moves out of the gerund into the
position of DO.
(90) a. ni DO nide [GerundP [VP tan gangqin]]
you DO your play piano
(In the TSM example, placing the object tshe after the verb would render it non-
referential, meaning ‘he didn’t find any book.’)
(vi) Objects of verb-resultative constructions. In Mandarin they may appear after the
main verb, but in TSM they are strongly preferred in preverbal position with ka:
(93) Mandarin: wo ma-de ta ku-le qi-lai.
I scold-to he cry-Perf begin
‘I scolded him to tears.’
TSM: gua ka yi me-ka khau a.
I ka he scold-to cry Prt
‘I scolded him to tears.’
(*?gua me-ka yi khau.)
(viii) Outer objects and applicative arguments. In Mandarin the verb may raise above
an outer or applicative object, but in TSM it must be licensed by the applicative head ka
preverbally:
(95) Mandarin: wo da-le Zhangsan yi-ge erguang.
I hit-PERF Zhangsan one-CL slap
‘I slapped Zhangsan once.’
TSM: gua ka Abing sian tsit-e tshui-phuei.
I KA Abing slap one slap
‘I slapped Abing once.’
(*gua sian Abing tsit-e tshui-phuei.)
I slap Abing one slap
(ix) Noncanonical double-object construction. Both Mandarin and TSM have double-
object constructions in the form of V-DP1-DP2. In Mandarin, DP1 can denote a
recipient (the canonical DOC) or an affectee (the ‘noncanonical DOC,’ after Tsai 2007).
TSM, however, has only the canonical DOC. Thus, (96) in Mandarin has both the ‘lend’
and ‘borrow’ reading, but (97) in TSM has only the ‘lend’ reading:
For the ‘borrow’ meaning, the affectee (or source) DP1 must be introduced by the appli-
cative ka head:
The contrast shows that the main verb may raise to a null applicative head position in
Mandarin, but not in TSM.
(x) ka vs. ba. The above observations also lead us to the fact that, although the
Mandarin ba (as used in the well-known ba-construction) is often equated with, and
usually translates into TSM ka, the latter has a much wider semantic ‘bandwidth’ than
the former. Generally the Mandarin ba-construction is used only with a preverbal
low-level object (Theme or Patient), but the TSM ka-construction occurs with other,
‘non-core’ arguments, including affectees of varying heights—low and mid applicatives
as illustrated above, and high applicatives—adversatives or (often sarcastically)
benefactives, as illustrated here:
(99) i tshittsapetsa to ka gua tsao-teng-khi.
he 7-early-8-early already KA me go-back
‘He quit and went home on me at such an early time!’
(100) li to-ai ka gua kha kuai-le o.
you should KA me more obedient SFP
‘You should be more obedient for my sake, okay?’
We see then that while certain higher functional heads in the vP domain may be null in
Mandarin, they seem to be consistently lexical in TSM.
Arguably, in all these cases of differences between Mandarin and TSM, we see some
small-scale clustering. In fact, we may be dealing here with one or two mesoparame-
ters as defined in (26). Again, we see the pervasive effects of IG. If we take each differ-
ence as indicative of one microparameter, then we have observed ten microparameters.
Logically there could be 2^10 = 1,024 independent TSM dialects that differ from each
other by at least one parameter value. But it is unlikely that these parametric values are
equally distributed. Rather, the likely norm is that they cluster together with respect to
certain values. Hence here we have a mesoparameter, expressing special cases of TSM as
consistently more analytic than Mandarin, i.e., a range of heads in TSM lacks the formal
features giving rise to Agree or head movement in the corresponding cases in Mandarin.
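The arithmetic behind the figure of 1,024 can be checked with a short sketch. It assumes, as the text implies, that each of the ten microparameters is binary. The names N_MICROPARAMETERS, settings, and clustered are illustrative only, and treating a mesoparameter as complete covariation of all ten values is a deliberate simplification rather than a claim about the actual distribution of dialects.

```python
from itertools import product

# Assumption: ten binary microparameters, one per contrast (i)-(x).
N_MICROPARAMETERS = 10

# Every logically possible combination of values (0 or 1 for each parameter).
settings = list(product((0, 1), repeat=N_MICROPARAMETERS))
print(len(settings))  # 2**10 combinations

# Simplifying assumption: under a single mesoparameter all ten values
# covary, so only the uniformly 0 and uniformly 1 settings are expected.
clustered = [s for s in settings if len(set(s)) == 1]
print(len(clustered))
```

On this idealization the space of expected grammars shrinks from 1,024 to two, which is the sense in which clustering makes the logically possible variation far larger than the variation one is likely to observe.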
Finally, not all speakers agree on the observations made in the preceding discus-
sion, thus reflecting dialectal and idiolectal differences. This is not surprising, as micro-
variations typically arise among individual speakers. Here we may also find cases of
nanovariation.