Animal Pattern-Learning Experiments: Some Mathematical Background

The document discusses animal learning experiments that aim to distinguish the ability to learn different classes of patterns in strings, such as finite-state vs. context-free patterns. It argues that the experiments may be using the wrong distinction, as there are subclasses of finite-state patterns that have been overlooked. The paper aims to provide an introduction to some of these subclasses, such as strictly local and star-free patterns, in order to further understanding of animal learning abilities.

Animal Pattern-Learning Experiments: Some Mathematical Background∗

Geoffrey K. Pullum and James Rogers


Radcliffe Institute for Advanced Study, Harvard University
Putnam House, 10 Garden Street, Cambridge MA 02138

1 Introduction

A number of recent studies on the learning of patterns in symbol sequences by animals such as cotton-top tamarins (Hauser, Chomsky & Fitch 2002:1578; Fitch & Hauser 2004; Hauser 2005; O’Donnell, Hauser & Fitch 2005), songbirds (Gentner 2005), and undergraduates (Fitch & Hauser 2004; Perruchet & Rey, in press) have shown interest in the contrast between two classes of sets of strings: the finite-state and the context-free. The claim of Fitch & Hauser (2004) is that tamarins can learn characteristic finite-state string patterns but not those that are context-free, while humans can learn both. Perruchet & Rey dispute the latter claim about humans; Gentner suggests that starlings show signs of being able to learn distinctively context-free patterns of recursion (self-embedding); and so on.

Showing that a mechanism can learn some stringset in a certain class does not, of course, show that it can learn every stringset in that class (unless that stringset is complete for that class in the technical sense of computational complexity theory). But it is possible to provide evidence that the learnability boundary for some learning mechanism (the line between the stringsets it can and cannot learn) may fall between two hierarchically related stringset classes. One could do this by utilizing a pair of stringsets: one in the stronger (larger) class which cannot be learned (showing that not all stringsets in the stronger class are learnable) and one in the weaker (smaller) class which can be learned (showing that at least one stringset in the weaker class is learnable).

But one would want to find a pair of stringsets that were relatively close to each other with respect to the hierarchy. One would not, for instance, choose a finite stringset and a strictly context-sensitive stringset if the aim was to find evidence that some mechanism could learn context-free but not context-sensitive stringsets.

In at least some of these studies, the boundary between finite-state and context-free stringsets is explored using invented stringsets that are not good exemplars of the former class. This appears to be a result of the mistaken assumption that the finite-state stringsets are at the lowest level of the relevant hierarchy, hence the smallest class of stringsets that needs to be considered. But the finite-state stringsets are by no means the smallest relevant stringset class. There are infinite hierarchies of interesting classes with general mathematical characterizations and operational characteristics that are potentially interesting from a cognitive perspective, each containing infinitely many infinite stringsets, and each a proper subset of the finite-state class. Some of these classes are learnable in the specific formal sense defined by Gold (1967).

Moreover, in some of the cited studies the experiments are apparently designed to distinguish the ability to learn strictly local relationships from the ability to learn unbounded or ‘long-distance’ dependencies. But this distinction does not correspond to the distinction between the finite-state and context-free stringsets. First, strict locality characterizes only a very weak subclass of the finite-state stringsets. Second, finite-state stringsets can exhibit clearly unbounded dependencies.

That these points are not better known may be due to the fact that so few mathematical studies were done on proper subclasses of the finite-state stringsets in the early years of formal language theory. The situation changed radically during the 1960s, though (as noted by McNaughton & Papert 1971:xiii). The stringset classes that are proper subclasses of the finite-state are now fairly well understood. An understanding of their properties would appear to be highly relevant to the work on animal learning of regularities, and thus perhaps to broader studies of animal precursors of the human ability to recognize grammaticality over indefinitely large sets of expressions.

Our goal in this paper is to provide an introduction to some interesting proper subclasses of the finite-state class, with particular attention to their possible relevance to the problem of characterizing the capabilities of language-learning mechanisms. We survey a sequence of these classes, strictly increasing in language-theoretic complexity, discuss the characteristics of the classes both in terms of their formal properties and in terms of the kinds of cognitive capabilities they correspond to, and suggest contrasting patterns which could serve to distinguish the adjacent classes in language learning experiments.

∗ This is the draft of January 25, 2006. It is being circulated to a few interested scholars in relevant fields, but has not at present been submitted for publication. Please contact the authors ([email protected] and [email protected]) before citing or quoting this paper. Comments will be welcomed. Our work has been partially supported by the Radcliffe Institute for Advanced Study at Harvard University. We are grateful to Marc Hauser, Gary Marcus, Andrew Nevins, and Tim O’Donnell for relevant conversations and correspondence. Special thanks to Jean Davidson, Gerald Gazdar, Andràs Kornai, Barbara Scholz, Stuart Shieber, Dmitri Tymoczko, and Tom Wasow, who read earlier drafts, made substantive contributions, spotted errors, and supplied critical comments. They deserve credit for many improvements but bear no responsibility for any shortcomings.

We also note that the sense in which ‘recursion’ may be thought of as distinguishing finite-state from context-free stringsets is much more subtle than seems to be normally assumed.

2 Preliminary overview

We begin with a brief and somewhat psychology-oriented overview, in the hope that the reader will find it useful to have a non-mathematical, intuitive introduction to the classes of stringsets we are concerned with, stating up front what we take to be the key features that might have psychological relevance. The main classes of stringsets we deal with are these (we proceed from weaker to stronger classes):

– The strictly local (SL) stringsets are described solely in terms of adjacency relations: which fixed substrings are allowed to be immediately next to which other fixed substrings. (For each number k, there is a class of strictly local stringsets in which the maximum length of the mentioned substrings is k symbols.) In terms of psychology, these stringsets have a clear connection with the capacity to associate immediately adjacent stimuli: not necessarily to recall whether a certain stimulus was present at all, but to recognize a pair of stimuli when they are presented together in sequence.

– The locally testable (LT) stringsets include all the strictly local ones together with those that can be defined out of them through the set-theoretic operations of union (the union of two sets contains every string that is either in the first or in the second), intersection (the intersection of two sets contains every string that is both in the first and in the second), and complement (the complement of a set over some vocabulary contains every string over the same vocabulary that is not in that set). In terms of psychology, this turns out to correspond to the capacity to verify that a certain stimulus contains some perceptible element at some point. (This is not nearly so easy to see, but it will become clearer as we proceed.)

– The star-free (SF) stringsets include all the locally testable stringsets together with all those that can be defined from them by means of four operations: union, intersection, complement, and concatenation. (The operation of concatenation on two stringsets L_A and L_B forms the set L_A L_B of all strings that have a first part from L_A and a second part from L_B.) The corresponding psychological ability is the capacity to recognize that a certain stimulus occurred at least n times for some fixed number n, counting only up to some constant limit and thereafter losing track.

– The finite-state (FS) stringsets include all the star-free stringsets plus those that can be defined out of them through the operation of repeating substrings arbitrarily many times (the operation that forms from a set A the set of all strings consisting of zero or more concatenated elements of A is referred to here as asteration). The psychological ability that is added when we move from the star-free to the finite-state is essentially the ability to do modular arithmetic: to recognize that a certain stimulus occurred exactly n times in excess of some multiple of a modulus m (as in the way clocks count hours); that is, to be able to count up to a set threshold and then re-set the counter and start again.

– The context-free (CF) stringsets are a well-known proper superset of the finite-state stringsets that allow for what can be computationally modeled by operations on an unbounded-depth pushdown stack memory. (Unlike finite-state automata, pushdown automata do not have a fixed finite bound on the amount of working memory needed for computations.) Context-free stringsets have sufficient expressive power to cover such operations as integer addition in unary notation (for instance, the set of all strings like ‘1+1=11’, ‘1+11=111’, ‘11+1=111’, ‘11+11=1111’, and all others where the third block of 1 symbols is as long as the first two joined together, can be described with a context-free grammar).

Fleshing out these intuitive characterizations so that their potential cognitive implications can be seen more clearly is the purpose of the following sections, in which we offer a catalogue of formal language theory results pertaining to the stringset classes that fall below the finite-state in the hierarchy. We attempt to minimize notational technicalities, and we keep the overall structure modular enough that the reader will be able to skip technical details of one subsection without rendering the next unintelligible. But we do make use of a number of standard notations from elementary logic and formal language theory. We write:

– ‘iff’ for ‘if and only if’;
– {w | φ(w)} for the set of all strings w such that the condition φ(w) holds;
– #_a(w) for the number of a's in the string w;
– ε for the unique ‘empty’ string of zero length;
– ab for the string consisting of a followed by b;
– a^n for a string of n successive a's;
– (ab)^n for n successive ab sequences;
– {a, b}* for the asteration of {a, b}, the set of all strings consisting of those symbols (any length, ≥ 0), and
– {a, b}+ for the positive asteration of {a, b} (all strings with length ≥ 1).

For the asteration of a singleton set we often omit set brackets; so we may write a* for {a}* or (ab)+ for {ab}+.

Other notations will be explained as we come to use them.

Notice that although we maintain a terminological distinction between stringsets, classes of stringsets, and hierarchies of classes, this is purely for expository purposes. Mathematically, all of our classes and hierarchies are just sets (hierarchies being ordered by proper inclusion).
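As a rough illustration of the three counting abilities contrasted in the overview above (threshold counting, modular counting, and unbounded counting), here is a minimal Python sketch. The function names and the choice of test strings are assumptions made for illustration only; nothing here is taken from the experiments discussed.

```python
def at_least(n: int, stimulus: str, w: str) -> bool:
    """Threshold counting: occurrences are tracked only up to n, then the
    counter saturates (it "loses track" of anything beyond the limit)."""
    seen = 0
    for symbol in w:
        if symbol == stimulus and seen < n:
            seen += 1
    return seen == n


def count_mod(m: int, r: int, stimulus: str, w: str) -> bool:
    """Modular counting: the counter is reset to 0 each time it reaches m.
    Returns True when the final counter value equals the remainder r."""
    counter = 0
    for symbol in w:
        if symbol == stimulus:
            counter = (counter + 1) % m
    return counter == r


def unary_sum(w: str) -> bool:
    """Unbounded counting: is w of the form 1..1+1..1=1..1 where the third
    block is as long as the first two blocks joined together?"""
    left, eq, total = w.partition("=")
    first, plus, second = left.partition("+")
    blocks = (first, second, total)
    if not eq or not plus or any(set(b) != {"1"} for b in blocks):
        return False
    return len(first) + len(second) == len(total)


print(at_least(2, "a", "abba"), count_mod(2, 0, "b", "abba"), unary_sum("11+1=111"))
```

The first two functions need only a counter with a fixed finite range, whereas `unary_sum` compares block lengths that can grow without bound, which is the kind of memory demand the overview associates with the context-free class.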

3 Strictly Local

We begin with a highly limited class of stringsets, one that really is limited to purely local dependencies in strings (as the finite-state stringsets are not): the strictly local (SL) stringsets. These are defined by reference to dependencies that are purely local in the sense that the only thing mentioned in describing them is the composition of their strings in terms of fixed-length substrings. They provide a formal reconstruction of a notion that will be familiar to anyone who knows the history of 20th-century psychology: the strict associationism that takes learning to be simply a matter of becoming sensitized to which sensorily perceptible phenomena occur in temporal proximity to which others, a process of coming to associate two stimuli if they regularly occur adjacently. An animal that could learn to recognize patterns in symbol strings only if they were definable in SL terms could indeed be said to be limited to strictly local dependencies, in a sense that can be made quite precise (we return to the implications of this point in section 8).

Hauser, Chomsky & Fitch (2002:1577) may be alluding to the SL class when they say:

    At the lowest level of the hierarchy are rule systems that are limited to local dependencies, a subcategory of so-called “finite-state grammars.” Despite their attractive simplicity, such rule systems are inadequate to capture any human language.

The SL stringsets are, indeed, a very limited proper subset of the finite-state stringsets.

3.1 Definition

A strictly local stringset over a vocabulary Σ is a subset of Σ* that is fully describable by means of a finite list Γ of strings not longer than some upper length limit k. In order to belong to the stringset that Γ describes (the stringset that we denote by L(Γ)), a string in Σ* must consist wholly of substrings in Γ. The sequences appearing in Γ (and the substrings against which they are matched) are often called factors (since formal language theorists write a · b or ab to denote a followed by b, and a^3 for three a's concatenated together, the same notation used in algebra for multiplication). By definition, every k-length substring of every string in L(Γ) is a member of Γ.

Permitting reference to beginnings and ends of strings in Γ can be accomplished in various ways, but here we will add two pseudo-symbols (intuitively, end-markers) to the symbol set: ⋊ for the beginning and ⋉ for the end. Hence Γ will actually be a set of strings over Σ ∪ {⋊, ⋉}, with ⋊ occurring only initially and ⋉ occurring only finally. A string ⋊w means that a string in the stringset being defined may begin with w; and z⋉ means that a string may end with z. There is no other determinant of grammaticality for a stringset that is strictly k-local other than that every substring of length k must be on the list of factors.

We can distinguish more finely among the stringsets in SL. There are actually infinitely many distinct stringset classes making up SL, because for each limit k on factor length there is a different class of stringsets, namely the class of strictly k-local stringsets for that choice of k, denoted by SL_k. A description of an SL_k stringset is simply a finite set of factors each no more than k symbols in length. The class SL is the infinite union of all the SL_k classes.¹

The strings in the description for an SL_k stringset divide into four types:

(1) σ_1 ··· σ_k (a string of k symbols)
    ⋊ σ_1 ··· σ_{k−1} (the left end-marker plus a string of k − 1 symbols)
    σ_1 ··· σ_{k−1} ⋉ (a string of k − 1 symbols plus the right end-marker)
    ⋊ σ_1 ··· σ_i ⋉, with i ≤ k − 2 (a string of not more than k − 2 symbols flanked by the two end-markers)

Letting F_k(w) denote the set of k-factors occurring in the string w, the stringset licensed by an SL_k description Γ is:

(2) {w | F_k(⋊w⋉) ⊆ Γ}

Setting k = 1 gives a degenerate case where we cannot define a grammaticality distinction at all. If both end-markers are included in Γ, all strings over the symbols mentioned in Γ are defined as grammatical (so over that vocabulary everything is grammatical); if either end-marker is missing we get the empty stringset ∅ (i.e., nothing is grammatical); and there are no other possibilities.

The smallest non-degenerate case is where k = 2, giving the strictly 2-local stringsets, SL_2. These could be called the bigram stringsets, since their descriptions are simply sets of bigrams, that is, two-symbol factors that are stipulated to be permissible substrings of strings.

3.2 Abstract characterization

The following theorem provides a characterization of the class SL in terms of a property of strings:

(3) Definition: Suffix Substitution Closure
    A stringset L is strictly local (SL) iff there exists some k ≥ 1 such that for all x ∈ Σ^{k−1} and v, w, y, z ∈ Σ*, the following holds:

    w · x · y ∈ L and v · x · z ∈ L implies w · x · z ∈ L.

That is, the SL stringsets are exactly those for which, beyond a certain factor length, what can precede a factor is entirely independent of what can follow it.

In other words, in SL stringsets there is a length beyond which the description is unable to impose conditions on both what precedes a factor x (of the necessary length, k − 1 symbols long for a stringset in SL_k) and what follows it; so it allows anything that can precede x to co-occur with anything that can follow it.

¹ Strictly speaking we didn’t define infinite unions. But we don’t really need to appeal to them, because we could just say SL = {L | ∃k ≥ 2 [L ∈ SL_k]}.
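The licensing condition in (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the ASCII characters `<` and `>` stand in for the two end-marker pseudo-symbols, the names `k_factors` and `licensed` are hypothetical, and the sample description Γ is the set of four bigrams that Section 5.1 uses for strings whose only b is the final symbol.

```python
LEFT, RIGHT = "<", ">"  # assumed ASCII stand-ins for the two end-markers


def k_factors(w: str, k: int) -> set[str]:
    """Return the k-factors of w with end-markers attached.

    If the marked string is shorter than k, the whole marked string is its
    only factor (the fourth type listed in (1))."""
    marked = LEFT + w + RIGHT
    if len(marked) < k:
        return {marked}
    return {marked[i:i + k] for i in range(len(marked) - k + 1)}


def licensed(w: str, gamma: set[str], k: int) -> bool:
    """Condition (2): w is in the stringset iff all its k-factors are in gamma."""
    return k_factors(w, k) <= gamma


# A strictly 2-local (bigram) description: begin with a, a may be followed
# by a or b, and b must be the last symbol.
gamma = {"<a", "aa", "ab", "b>"}

for s in ["ab", "aaab", "aba", "b", ""]:
    print(repr(s), licensed(s, gamma, 2))
```

Note that membership is decided by a single pass over fixed-width windows, with no memory of anything outside the current window; this is the sense in which the description can only see adjacency.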

Thus all you have to do to prove a stringset is not SL_k is to find two strings wxy and vxz (x being k − 1 symbols long) that are in the set, where wxz does not belong. The following facts suffice to show that the set E of strings of words that are grammatical in English cannot be SL_2:

(4) a. I absented myself.
    b. You absented yourself.
    c. *I absented yourself.

Let w = I, x = absented, y = myself, v = You, z = yourself. Since (4a) and (4b) show that wxy and vxz are in E, wxz should be too if the set is SL_2; but (4c) indicates that it is not. Therefore E is not SL_2.

3.3 Canonical example of non-membership

The hallmark configuration that guarantees a stringset will not be SL is for there to be some required factor that must occur in strings arbitrarily far from the ends. For example, the set denoted a*(ba*)+ is a very simple finite-state stringset containing all and only those strings over {a, b} that contain at least one b, but it has no SL_k description for any k. The apparently simple notion ‘contains a b’ is not expressible.

Given that every English transitive clause contains a verb, and given that both subject and object noun phrases may be arbitrarily long, and the verb must lie between them, we now know that the set of English clauses, conceived as a stringset, is not SL_k for any k. To see this, take a transitive verb V and a noun phrase N, where N is at least k − 1 words long and can function either as the subject of or as the direct object of V. Since N can start a clause with V as its verb, and it can end such a clause, an SL_k description will wrongly permit a transitive clause to consist of just N on its own. (In this case, w = ε, x = N, y = V · N, v = N · V and z = ε.) But a noun phrase on its own is not a transitive clause. Hence the set of transitive clauses in English is not SL.

3.4 Hierarchies

There are infinitely many strictly k-local classes of stringsets: the bigram stringsets (SL_2), the trigram stringsets (SL_3), and so on. Each is properly included in the next one up: SL_2 ⊊ SL_3 ⊊ ··· SL_k ⊊ SL_{k+1} ⊊ ···. The class SL is the infinite union of all of them: SL = SL_2 ∪ SL_3 ∪ SL_4 ···.

The relation between the SL stringsets and the finite ones is slightly subtle. The class Fin of finite stringsets is a proper subset of SL, but not of SL_k for any k. Nonetheless, for any given finite stringset L you can find some k such that L is strictly k-local. That is, both the following are true:

(5) a. There is no length limit k such that Fin is a subset of SL_k.
    b. For every stringset L in Fin there is a length limit k such that L is in SL_k.

A corollary of this is that for every k we can find finite sets of strings with no strictly k-local description. For example, {abc, cba} is a two-member finite set that has no bigram description and thus is not in SL_2.

4 Locally Testable

We turn now to a class of strictly more complex stringsets, the locally testable or LT stringsets. They are straightforwardly definable from the SL class.

4.1 Definition

The locally testable (LT) class of stringsets is the closure of the strictly local class under the boolean operations. That is, every SL stringset is in the class LT, and so is every stringset that can be formed by taking the union of two stringsets in LT, the intersection of two stringsets in LT, or the complement of a stringset in LT. We can use this as our basic definition:

(6) Definition: Locally Testable
    The class of locally testable (LT) stringsets is the smallest class that (i) contains every SL stringset, and (ii) is closed under the operations of union of two sets in the class, intersection of two sets in the class, and complement of a set in the class.

Again we have an infinite hierarchy of stringset classes: for each k, the class LT_k is the closure of SL_k under the boolean operations. The class LT is the union of all the LT_k (that is, LT = ⋃_k LT_k).

4.2 Example

An example of an LT stringset that is not SL is a+(ba+)+, a minor variation on the example of Section 3.3. This is the set of strings in {a, b}+ in which at least one b occurs, and b always occurs alone, preceded and followed by at least one a. This set is LT (in fact LT_2), because it is the intersection of three LT_2 sets: (i) the set of strings in {a, b}+ which both start and end with a (an SL_2 set); (ii) the set in which bb does not occur (an SL_2 description can simply permit all pairs of a's and b's except for bb); and (iii) the set in which ab does occur (LT_2 in virtue of being the complement of the SL_2 set in which ab does not occur). But it is not SL_k for any k, since a · a^{k−1} · ba^k and a^k b · a^{k−1} · a are both in the set but a · a^{k−1} · a is not.

4.3 Abstract characterization

The following statement provides an abstract characterization of the LT class.

(7) Definition: Local Test Invariance
    A stringset L ⊆ Σ* is LT iff there exists some k > 0 such that for all strings w, v ∈ Σ*

    (F_k(⋊w⋉) = F_k(⋊v⋉)) ⇒ (w ∈ L ⇔ v ∈ L).

This says that a stringset is LT iff there is a number k such that membership of a string in the set is determined solely by

which k-factors occur in the string. Any two strings that share the same set of k-factors will either both be members of the set or both be non-members. Notice that every LT stringset must be LT_k for some finite number k, and the definition asserts the existence of that k.

4.4 Canonical example of non-membership

It is impossible to ensure in an LT stringset that some factor occurs at least n times for some n > 1, while allowing each instance to occur arbitrarily far apart and arbitrarily far from the ends of the string. So this stringset (the set of all strings over {a, b} in which a occurs at least twice) is not LT:

(8) {w ∈ {a, b}* | #_a(w) ≥ 2}

Likewise, the variants with #_a(w) = 2 or #_a(w) < 2 are not LT.

It is also impossible to require two factors to occur in some particular order if the two may occur arbitrarily far apart and arbitrarily far from the ends of the string. So this stringset is not LT:

(9) {u ab v ba w | u, v, w ∈ {a, b, c}*}

4.5 Hierarchies

Again we have an infinite hierarchy of stringset classes making up the entire class LT:

LT_2 ⊊ LT_3 ⊊ ··· LT_k ⊊ LT_{k+1} ⊊ ···

And again, the relationship to the finite stringsets is not simple: it is true that L ∈ Fin ⇒ L ∈ LT_k for some k, so Fin ⊊ LT. However, there is no k such that Fin ⊆ LT_k. Thus for every integer k ≥ 2, there are finite stringsets that are not LT_k.

The relations between the SL and LT hierarchies are as follows. For every k, SL_k is a proper subset of LT_k; but for no k is SL_{k+1} a subset of LT_k.² On the other hand, there is no k such that LT_k is a subset of SL_{k+1}. In fact it is not even the case that LT_2 ⊆ SL, so for no k is LT_k a subclass of SL.³

5 Star-Free

We now define an important proper superset of all of the classes mentioned so far: the star-free (SF) stringsets.⁴

We first define the class SF inductively:

(10) Definition: Star-Free Stringsets (SF). The class of star-free stringsets over a vocabulary Σ is the smallest class of stringsets that (i) includes the empty set ∅, the singleton set {ε} containing the empty string, and for each symbol a in Σ the singleton set {a}, and (ii) is closed under boolean operations and concatenation.

(The reader already familiar with finite-state stringsets will notice how similar this is to the definition of that class. However, it lacks the mention of closure under the ‘*’ operation (hence the name ‘star-free’) and it includes closure under complement in the definition.)

There is a straightforward relation between SF and LT. Closing LT under concatenation and the boolean operations yields a class of stringsets known under the name k-locally testable with order (LTO_k), and the infinite union of these for all k is the class LTO. So we have this definition:

(11) Definition: Locally Testable with Order (LTO). The class of LTO stringsets is the smallest class that contains all the LT stringsets and all those that can be formed from LTO stringsets by means of the union, intersection, complement, and concatenation operations.

McNaughton & Papert (1971) prove that this gives us exactly the same class as the previous definition, i.e., that LTO = SF.

5.1 Example

An example of an SF stringset that is not LT is a+ba+. This is the set of strings in {a, b}+ in which exactly one b occurs. It is SF since it is the concatenation of the set of strings in {a, b}+ in which the only b is the final symbol (an SL_2 set licensed by the pairs {⋊a, aa, ab, b⋉}) and the set of strings in which only a occurs (similarly an SL_2 set). It is not LT_k for any k since the strings a^k b a^k and a^k b a^k b a^k share the same set of k-factors but the first is a member of the stringset while the second is not.

By similar analyses, each of the canonical non-LT stringsets given in Section 4.4 can be shown to be SF.

² To see this, note that for any k there is an SL_{k+1} description for {a, b}* − ({a, b}*{a}^{k+1}{a, b}*), the set of all strings over {a, b} with no string of k + 1 adjacent occurrences of a; but this stringset will not be in LT_k, because the k-length factors of ⋊a^k b a^k⋉ are identical with the k-length factors of ⋊a^{k+1} b a^{k+1}⋉.

³ This can be seen from the fact that L_ab = {a, b}*{ab}{a, b}*, the set of all strings over {a, b} in which ab occurs, is LT_2 (it’s the complement of the set of strings in which ab does not occur). But it cannot be SL, for this reason: for any choice of k, the string b^k ab b^k cannot belong to an SL_k set unless b^k also belongs, so for all k we have a counterexample to the claim that L_ab is SL_k; and that means L_ab is not SL at all.

⁴ At least two other infinite hierarchies of stringsets are left undiscussed here, though it would be appropriate to include them in a fuller survey of this material. One has been called the ‘dot-depth hierarchy’. We start with a base class of stringsets like SL or the even more primitive ‘definite event’ stringsets (in which grammaticality is determined entirely by the content of the last k symbols for some fixed k), and we form the class of stringsets obtainable by concatenating any two of them, and then we close under boolean operations. This gives us dot-depth 1. Then we repeat with the dot-depth 1 stringsets as the base, to obtain the dot-depth 2 class. An infinite strict hierarchy of stringset classes is obtained. Another way to go is to generalize the LT capability for recognizing that a substring x occurs at least once, and replace 1 by a constant k. That is, we allow for recognizing of a string w that a substring x occurs at least j times, for j ≤ k. This yields a class of stringsets that are called ‘locally threshold testable’ in Straubing (1994:47) and ‘generalized locally testable’ in Thomas (1982:372). For each k ≥ 1 there is a class of k-locally threshold testable (or generalized k-locally testable) stringsets. Again the hierarchy is strict, and the infinite union of its members for all k makes up the locally threshold testable (or generalized locally testable) class that we could call LTT. An interesting model-theoretic characterization is given by Straubing (1994:46–50) and Thomas (1982:372–373): LTT is exactly characterized by first-order logic, interpreted on string models, if there is only one relation symbol and it is interpreted as ‘is immediately followed by’. All these classes are proper subclasses of the better-known star-free class to which we turn in this section.
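The k-factor comparisons appealed to in Section 5.1 and in footnote 2 can be checked mechanically for small k. The following Python sketch is an illustration only: `<` and `>` are assumed ASCII stand-ins for the end-markers, the helper name `k_factors` is hypothetical, and checking a handful of values of k is of course not a proof for all k.

```python
def k_factors(w: str, k: int) -> set[str]:
    """Set of k-factors of w with end-markers attached ('<' and '>' are
    assumed stand-ins for the paper's end-marker pseudo-symbols)."""
    marked = "<" + w + ">"
    if len(marked) < k:
        return {marked}
    return {marked[i:i + k] for i in range(len(marked) - k + 1)}


for k in range(2, 7):
    a_k = "a" * k

    # Section 5.1: one b versus two b's. The factor sets coincide, so no
    # k-local test can separate a member of a+ba+ from a non-member.
    one_b = a_k + "b" + a_k
    two_b = a_k + "b" + a_k + "b" + a_k
    assert k_factors(one_b, k) == k_factors(two_b, k)

    # Footnote 2: runs of k a's versus runs of k+1 a's. The factor sets
    # coincide, although only the second string contains k+1 adjacent a's.
    short_runs = a_k + "b" + a_k
    long_runs = "a" * (k + 1) + "b" + "a" * (k + 1)
    assert k_factors(short_runs, k) == k_factors(long_runs, k)
    assert "a" * (k + 1) not in short_runs and "a" * (k + 1) in long_runs

print("all factor-set comparisons agree for k = 2..6")
```

Because Local Test Invariance in (7) makes membership depend only on the set returned by `k_factors`, each pair above witnesses that the stringset in question cannot be LT_k for the tested value of k.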

5.2 Abstract characterization

It is possible to characterize the same class in a different and more abstract way, in terms of a direct condition on what strings they contain and exclude.

(12) Definition: Non-Counting Stringsets. A stringset L over a vocabulary Σ is non-counting⁵ iff there exists some n > 0 such that for all strings u, v, w ∈ Σ* and for all i ≥ 1, the presence of u v^n w in L implies that u v^{n+i} w is also in L.

In other words, in a stringset that is non-counting there is a threshold length beyond which repeated substrings cannot be counted: wherever there are n consecutive occurrences of any substring v, it would also be grammatical to have n + 1 consecutive occurrences of v.

Schützenberger (1965) proves that the non-counting stringsets are exactly the star-free stringsets. So that yields a third distinct characterization of the SF class.

5.3 Canonical example of non-membership

A representative example of a kind of stringset that is not in SF is that of the stringsets in which some factor must occur an even number of times. Thus the following stringset is not SF:

(13) {w ∈ {a, b}* | #_b(w) is even}

Notice that there is no number so large that beyond that number of b tokens divisibility by 2 ceases to matter. However many occurrences of b may occur in a string in this set, the same string with one extra b will not be in the set. To define the set you have to be able to count modulo 2. But in SF stringsets you can’t count modulo n for any n > 1.

5.4 Model-theoretic characterization

One other distinction of the star-free stringsets is particularly important in our view, though we touch on it only very lightly here. There is a particularly simple way of characterizing these stringsets using the model-theoretic techniques used to provide semantics for logical languages.

Consider the string abab. It corresponds to a string structure with D = {n_1, n_2, n_3, n_4} and R = {≺, P_a, P_b}, where ≺ (binary) is immediate precedence (we assume n_1 ≺ n_2, n_2 ≺ n_3, and n_3 ≺ n_4), P_a picks out the a's (so it has the extension {n_1, n_3}), and P_b picks out the b's (its extension is {n_2, n_4}). Various first-order formulae will be true in this structure; for example, the closed formula ∀x∃y[P_a(x) ⇒ (x ≺ y ∧ P_b(y))] says that every a immediately precedes an occurrence of b, and the structure just described satisfies this condition.

For any first-order sentence, there will be some set of finite string structures that satisfy it. The following remarkable result about first-order logic on string models was proved in McNaughton & Papert (1971) (for more modern approaches to proving the result, see Straubing 1994, Ebbinghaus & Flum 1999, and Libkin 2004):

(14) A stringset is star-free iff it corresponds to the set of all the finite string structures satisfying some closed formula of first-order logic.

Thus the star-free stringsets can be characterized exactly by first-order logic descriptions interpreted on string structures. And we can put that another way: having the ability to learn arbitrary star-free stringsets from presented examples would be tantamount to being able to learn any arbitrary property of strings that is definable in the first-order predicate calculus. This would be a truly remarkable ability to attribute to any animal.

Yet so far we are only talking about the star-free stringsets. The finite-state stringsets are a superset with a still richer structure. We now turn to them.

6 Finite-state

The asteration operation permits the definition of a proper superset of the star-free stringsets, and of all the other classes just discussed, the finite-state or regular stringsets, first defined by Kleene (1956) under the name ‘regular events’ (he was thinking of events in neural nets). These have a large array of distinct characterizations that have been of enormous importance in theoretical computer science and continue to be important in computational linguistics. They are much better known than the former stringset classes, so we will just very
briefly summarize.
Strings can be equated with relational structures of the
sort familiar in model theory. A relational structure is a set
of elements D known as the domain together with a set R of 6.1 Definition
relations on D. Thus a graph is a relational structure: D is the The class under discussion can be characterized directly by
set of nodes, and there is one relation in R, the relation that means of an inductive definition, using singleton stringsets as
holds between two nodes if they are immediately connected by the basis:
a edge in the graph. A string of symbols can be seen as a finite
graph in which the edge relation strictly orders the domain (15) Definition: Regular stringsets
and a set of unary relations (i.e., properties, one for each The class of regular stringsets over a vocabulary Σ is the
distinct symbol) partition it. Call these string structures. smallest class of stringsets that (i) includes the empty
set, the singleton set containing the empty string, and
5 The term used in McNaughton & Papert (1971) is ‘counter-free’. We
for each symbol in Σ the singleton set containing only
avoid this term simply because its initials are unfortunately the initials
of the phrase ‘context-free’, which we use frequently below. The term
that symbol, and (ii) is closed under union, concatena-
‘aperiodic’ may also be encountered in the literature. tion, and asteration.
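The inductive definition in (15) can be tried out on finite approximations. The following Python sketch is an illustration under stated assumptions, not part of the original presentation: stringsets are modelled as Python sets of strings truncated at a length bound `max_len` (a device introduced here only so that asteration terminates), and the names `concat`, `asteration`, and `BOUND` are hypothetical.

```python
from typing import Set


def concat(left: Set[str], right: Set[str], max_len: int) -> Set[str]:
    """Concatenation of two stringsets, keeping only strings of length <= max_len."""
    return {u + v for u in left for v in right if len(u) + len(v) <= max_len}


def asteration(base: Set[str], max_len: int) -> Set[str]:
    """Asteration (Kleene star) of a stringset, truncated at max_len."""
    result = {""}      # zero repetitions yield the empty string
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more member of `base`; keep only new ones.
        frontier = concat(frontier, base, max_len) - result
        result |= frontier
    return result


# Basis of (15): singleton stringsets for the symbols a and b.
A, B = {"a"}, {"b"}
BOUND = 6  # assumed length bound, used only for this illustration

# (aa)*: the asteration of the concatenation of two copies of {a}.
even_a = asteration(concat(A, A, BOUND), BOUND)

# a*b*: the concatenation of two asterations.
a_star_b_star = concat(asteration(A, BOUND), asteration(B, BOUND), BOUND)

# {a, b}*: the asteration of the union of {a} and {b}.
all_strings = asteration(A | B, BOUND)

print(sorted(even_a, key=len))
print(len(a_star_b_star), len(all_strings))
```

Within the bound, `even_a` contains exactly the even-length strings of a's, and `a_star_b_star` contains no string in which a b precedes an a.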

6.2 Automaton characterization

Now consider the (apparently) quite distinct notion of a finite-state automaton: the
most powerful possible computing device that is finite in every respect including the
total amount of working memory used in computations, and thus in a sense a model for any
physical computing device we can build in a finite universe.

(16) Definition: Finite-State Automaton.
     A finite-state automaton (FSA) is a system of five components:
     — a finite set Q of states,
     — a finite set Σ of symbols,
     — a function T from Q × Σ to Q,
     — a distinguished start state q_0 ∈ Q, and
     — a distinguished set of final states F ⊆ Q.

The intuition here is that the automaton corresponds to a machine that starts in state
q_0 reading a string of symbols from Σ on a tape or other ordered digital medium,
changing state as dictated by the function T (if the value of T(q_i, σ_k) is q_j, then
reading σ_k while in state q_i makes the machine switch to state q_j). A sequence of
states that the machine goes through while reading a string is called a run, and if it
ends with a state that belongs to F, it is a successful run. The stringset recognized by
an FSA A = ⟨Q, Σ, T, q_0, F⟩ is the set of all and only those strings σ_1 ... σ_n over Σ
on which A has a successful run.

We call a stringset finite-state (FS) iff there is an FSA which recognizes it. Kleene
(1956) connected the notion of recognizability by FSA to the class of regular stringsets
defined above:

(17) Theorem (Kleene)
     The finite-state stringsets (those recognizable by finite-state automata) are
     exactly the regular stringsets.

6.3 Example

An example of a finite-state stringset that is not SF is (aa)*. This is the set of all
even-length strings containing just a. That is, it is simply the asteration of the result
of concatenating two copies of the singleton stringset {a}. To see that it is not SF,
note that, for any n, either the string a a^n a or the string a a^n aa is in the
stringset but, in both cases, the result of iterating the a one additional time is not.
Similarly, example (13) of Section 5.3 is regular.

6.4 Abstract characterization

There is a more abstract algebraic characterization of the same class. We start with the
notion of Nerode Equivalence, which is an equivalence relation involving sharing of
continuation strings:

(18) Definition: Nerode Equivalence
     A string w is Nerode equivalent to u with respect to a stringset L ⊆ Σ*, and we
     write w ≡_L u, iff for all strings v ∈ Σ*, wv ∈ L ⇔ uv ∈ L.

Intuitively, two strings are Nerode equivalent with respect to L iff they have exactly
the same set of possible continuations in L: any continuation to one of them that would
make a string in L would also work for the other, and any continuation string that would
make one of them ungrammatical in L would do the same to the other. We now define the
Nerode equivalence class of a string w with respect to a stringset L as the set of all
strings u that are Nerode equivalent to it (with respect to L):

(19) Definition: Nerode Equivalence Class
     The Nerode equivalence class of w with respect to L is
     [w]_L =def {u ∈ Σ* | w ≡_L u}.

So the Nerode equivalence class with respect to L for a string contains all and only the
strings that are Nerode equivalent to it with respect to L.

Now we can state an additional characterization of the FS stringsets in terms of how many
Nerode equivalence classes there are (this characterization derives from the results
reported in Myhill 1957 and Nerode 1958):

(20) Myhill–Nerode Theorem
     A stringset L ⊆ Σ* is finite-state iff ≡_L partitions Σ* into finitely many Nerode
     equivalence classes.

This says that a stringset over Σ is FS iff its Nerode equivalence relation breaks up the
entire universe of strings over Σ into just a finite number of Nerode equivalence
classes. That will mean that you can make a finite list of the kinds of continuation
string and say for each whether they will lead to grammaticality or ungrammaticality in
L. For example, let Σ = {a, b} and L = a*b*. Then there are just three Nerode equivalence
classes: (i) strings in a* (these can be continued with any string in a*b* to yield a
string in L); (ii) strings in a*b+ (these can be continued with any string in b* to yield
a string in L); and (iii) all other strings (any continuations of these always lead to
strings not in L). Since the number of Nerode classes is finite, L is FS.

6.5 Grammar characterization

Another characterization of the FS stringsets can be given in terms of a rewriting system
or generative grammar. A rewriting system consists of a finite set N of nonterminals
(categories), a finite set Σ of terminals (symbols or words that appear in strings), a
designated start symbol S that belongs to N, and a finite set of rules. The regular
grammars are those rewriting systems in which the rules are all of the following two
forms (where X, Y ∈ N, σ ∈ Σ, and ‘→’ means ‘can be rewritten as’):

(21) a. X → σY     b. X → σ
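Rules of the two forms in (21) can be applied mechanically to test whether a string is derivable. The Python sketch below is illustrative only: the rule encoding, the function name `derives`, and the example grammar (for strings of zero or more a's followed by one or more b's) are assumptions made for this illustration and do not come from the text.

```python
from typing import List, Optional, Set, Tuple

# A rule X -> sigma Y is encoded as (X, sigma, Y); a rule X -> sigma as (X, sigma, None).
Rule = Tuple[str, str, Optional[str]]


def derives(rules: List[Rule], start: str, string: str) -> bool:
    """Return True iff `string` can be derived from `start` using `rules`."""
    pending: Set[str] = {start}  # nonterminals that may be awaiting rewriting
    complete = False             # True if some derivation can end exactly here
    for position, symbol in enumerate(string):
        is_last = position == len(string) - 1
        next_pending: Set[str] = set()
        complete = False
        for lhs, sigma, rhs in rules:
            if lhs not in pending or sigma != symbol:
                continue
            if rhs is None:
                # X -> sigma ends the derivation, so it only helps on the last symbol.
                complete = complete or is_last
            else:
                next_pending.add(rhs)
        pending = next_pending
    return complete


# Hypothetical example grammar (not from the text).
EXAMPLE_RULES: List[Rule] = [
    ("S", "a", "S"),   # S -> aS
    ("S", "b", "B"),   # S -> bB
    ("S", "b", None),  # S -> b
    ("B", "b", "B"),   # B -> bB
    ("B", "b", None),  # B -> b
]

for candidate in ["b", "aabbb", "", "aba", "aa"]:
    print(repr(candidate), derives(EXAMPLE_RULES, "S", candidate))
```

Because every rule introduces exactly one terminal, the check needs only a single left-to-right pass and a bounded amount of bookkeeping (the set `pending`), whatever the length of the string.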

When a grammar has rules only of the forms shown in (21), the set of all and only those
sequences of symbols in Σ that can be obtained by starting with S and rewriting according
to rules chosen at random will always be FS, and every FS stringset can be described by
some grammar of this form. (In fact such a grammar can be constructed directly from an
FSA in such a way that it will generate the stringset that the FSA recognizes. It is not
too hard to see the similarity between rules and automaton instructions: a rule ‘X → σY’
means that reading σ while in state X makes the machine switch to state Y.)

Grammars of this sort can also be called finite-state grammars, and for convenience when
quoting Hauser et al. below, we’ll follow them in using this term, and also the
abbreviation FSG. The following are two examples of rewriting rule sets for FSGs:

(22) a. S → aA, A → aS, A → a
     b. S → aA, S → bS, S → b, A → aS, A → bA, A → a

The grammar with the rules in (22a) generates all and only the even-length strings of
repetitions of a. The rules in (22b) generate all and only the strings over {a, b} with
an even number of a occurrences. Those stringsets are thus shown to be FS (see
footnote 6).

Footnote 6: We are now in a position to provide something that we did not provide in
section (3) because we had not introduced grammars: we can give a simple grammar
characterization for the SL_2 class. A regular grammar generates an SL_2 stringset iff
the relation ‘is followed by in some rule’ on the symbol set is a function, that is, if
for any given terminal there is at most one nonterminal that can follow it in the right
hand side of a rule.

6.6 Canonical example of non-membership

The classic case of a configuration that identifies a stringset as not being FS is that
of a dependency between a certain number n of symbol occurrences that need to be matched
by exactly n occurrences of another symbol elsewhere in the string. The simplest such
case is {a^n b^n | n ≥ 0}, for which the syntactic requirement is simply that there be n
occurrences of a followed by exactly the same number of b's. The Nerode equivalence
relation for this stringset yields infinitely many equivalence classes (in fact each
string a^i is in an equivalence class of its own: the continuation b^i yields a string in
the set after a^i but not after a^j for any j ≠ i), so by the Myhill–Nerode theorem it
cannot be FS. Other examples of non-FS stringsets include the set of palindromes over a
given vocabulary or alphabet (the strings that read the same forwards as they do
backwards), and the set of non-palindromes.

6.7 Model-theoretic characterization

As with the star-free stringsets, the FS stringsets have a logical characterization that
we will state without detailed discussion. It makes reference to weak monadic
second-order logic (wMSO), which is like first-order predicate calculus except that there
are additional variables for quantifying over finite sets of elements of the domain.
Büchi (1960) proved the following theorem:

(23) A stringset is finite-state iff it corresponds to the set of all the finite string
     structures satisfying some closed formula of weak monadic second-order logic.

Again, then, what this means is that being able to learn arbitrary FS stringsets from
presented examples would be tantamount to being able to induce any arbitrary property of
strings that can be defined in weak monadic second-order logic. This seems to us
extraordinarily unlikely for any species of animal.

7 Identification in the limit

There is one obvious, very specific sense in which it is certainly impossible to have a
general capacity to learn FS stringsets: the class FS is not ‘identifiable in the limit
from text’ in the technical sense of Gold (1967).

Identifying a single stringset in this context is trivial: to identify a stringset is
simply to output a description for it. So every r.e. stringset can be identified
instantly by an algorithm: if G is a grammar that generates L, an algorithm which simply
outputs G is an algorithm that identifies L. Matters only become non-trivial when we talk
about classes of stringsets, and the question posed is whether some procedure can detect
for any stringset in the class which stringset it is, and identify it by (for example)
naming a generative grammar for it.

A class C of stringsets is identifiable in the limit from text iff there exists a single
mechanical procedure meeting this condition: when presented with an infinite sequence of
strings from an arbitrary target stringset L in C (that is, a sequence that contains each
string in L at some point, possibly with repetitions), it produces a sequence of
‘guesses’ at the correct description for L (one after each presented string), guaranteed
to converge after a finite amount of time on a correct guess, a fully accurate
description of L, and never diverge from it after that, no matter what further strings
are presented.

Gold’s proof that the FS class is not identifiable depends on an elementary fact about
the class: it is superfinite, which means it contains every finite stringset over Σ plus
some infinite ones as well. His theorem to the effect that no superfinite class is
identifiable in the limit from text implies that the SF, LT, and SL classes aren’t
identifiable either. Notice, though, that his proof does not go through for the SL_k or
LT_k class given some fixed k, because each such class excludes some finite stringsets.

In fact it is easy to see that for any k, there is an algorithm that can be guaranteed to
eventually identify from text the correct description for any SL_k stringset. The
procedure is just to keep recording k-length factors and adding them to the current guess
at what the right description is. For example, if the SL_3 identification algorithm were
presented with the string abba, it would add to its currently guessed description the
factors ⋊ab, abb, bba, and ba⋉, and guess that the result is correct. The set of factors
will simply grow until all the right factors are in the description, and if the algorithm is

presented with a text in Gold’s special sense then eventually the set of factors will be
the correct one, and from then on it will not undergo any further changes (because it
will already contain every 3-length factor of every string in the set). For more detail
on this see García & Vidal (1990).

The LT_k class turns out also to be learnable from positive data for any fixed k,
although the matter is more complex; see García and Ruiz (2004).

No animal is going to learn patterns by identifying stringsets in the manner suggested by
Gold’s paper, which involves a brute force procedure of working methodically through an
enumeration of an entire (typically infinite) class of grammars, ruling them out one
after another until the right one is found. His method was never intended to emulate a
learning process; it is a tool for showing that in principle successful learning
processes for the class do exist (or that they don’t). But for any experimenter on
pattern learning by animals it is worth keeping in mind what we know about this area so
far. For every fixed k, the entire class of stringsets with SL_k descriptions is
identifiable, so we know that in that sense we are in the realm of stringsets for which
inductive learning is possible in principle. The same is true for LT_k. But for the
classes SL, LT, SF, FS, and classes higher in the hierarchy, learning in the sense of
identification in the limit from text (roughly, effective induction on the basis of a
finite sequence of exemplars) is impossible in principle.

8 Iteration, recursion, and infinitude

There is a misconception found in the literature to the effect that the finite-state
stringsets allow for only local relationships and cannot accommodate long-distance
dependencies between symbols. In Fitch & Hauser (2004) we read:

    The weakest class in [the Chomsky] hierarchy are finite state grammars (FSGs), which
    can be fully specified by transition probabilities between a finite number of
    “states” (e.g., corresponding to words or calls). . . . In addition to concatenating
    items like an FSG, a PSG can embed strings within other strings, thus creating
    complex hierarchical structures (“phrase structures”), and long-distance
    dependencies.

Set aside the fact that standard FSGs do not depend on or keep track of probabilities;
that is just a matter of replacing probabilities (real values between 0 and 1) with
possibilities (0 or 1). However, it is not true that the states in an FSA or the
nonterminals in an FSG correspond to words or symbols in a string. An automaton may be
able to move to any of a number of different states on reading a certain symbol, and may
have many different symbols that it can read while in a given state (see footnote 7). It
is also not really true that FSGs just “concatenate items”; crucially, they can iterate
on sequences. In consequence, complex strings can be embedded within other strings, and
complex unbounded dependencies may hold.

Footnote 7: If an FSG meets the condition mentioned in footnote 6, however, then in a
sense the states do correspond to terminals: for a given terminal there is only one state
that can be next. This is another indication that although Fitch & Hauser talk about
FSGs, they may tacitly have SL stringsets in mind.

For example, Pullum & Gazdar (1982) point out that a finite-state grammar can easily
describe an infinite subset of English in which fronted wh-phrases agree with verbs
arbitrarily far away from them, as seen in the contrast between such pairs as these:

(24) Which girls do they think that you think that he thinks (. . . ) were responsible?
     Which girl do they think that you think that he thinks (. . . ) was responsible?

The finite-state stringsets are in fact a very rich and diverse class. Imagine setting an
animal (or indeed, a human being) the task of learning the pattern shared by the
following strings:

(25) lo me la
     lu me lu
     lo ki ki la
     lu me ki ki lu
     lo me ki me ki la
     lu me me ki ki lu
     lo me ki me me ki me la
     lu me me ki me ki me lu
     lo ki me me ki me ki me ki me la
     lu me ki ki me me ki me ki me lu
     lo me ki me me me ki me ki me ki me la
     lu me me ki me me me ki me me ki me ki me lu
     ···

Here is the intended solution: these strings are all of the form lo ··· la or lu ··· lu
where ‘···’ is a sequence of arbitrary length (n.b., an unbounded dependency) composed of
ki and/or me in which the count of (not necessarily adjacent) tokens of ki is exactly
divisible by 2. It seems highly unlikely that any animal would be able to learn this
pattern. But the infinite stringset it characterizes is finite-state: if we let

     R = {me}* · ({ki} · {me}* · {ki} · {me}*)*

then the stringset in question is the union of {lo} · R · {la} and {lu} · R · {lu}, which
is easily seen to meet the definition of regular stringsets in (15).

The most important contrast studied in recent animal experimentation on stringset
learning, particularly with respect to cotton-top tamarins, has been the contrast between
stringsets isomorphic to (ab)+ (containing strings like abababab) and stringsets
isomorphic to {a^n b^n | n ≥ 1} (containing strings like aaaabbbb). The former has been
taken to be typical of the FS stringsets, and the latter of those that are CF. But as we
have seen, the finite-state stringsets are nowhere near the bottom of the hierarchy of
stringset classes. And (ab)+ is in all of the classes we have discussed, all the way down
to SL_2. (A bigram set that defines it is {⋊a, ab, ba, b⋉}.)
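The k-factor bookkeeping behind the SL_k identification procedure of Section 7, and behind the bigram description of (ab)+, can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' implementation: the function names are hypothetical, ⋊ and ⋉ are taken to be the left and right edge markers, and a string too short to have any k-length factor is assumed to contribute its whole edge-marked form.

```python
from typing import Iterable, Set

LEFT, RIGHT = "⋊", "⋉"  # assumed edge markers for the start and end of a string


def k_factors(string: str, k: int) -> Set[str]:
    """All k-length factors of the edge-marked string."""
    padded = LEFT + string + RIGHT
    if len(padded) < k:
        return {padded}  # assumption: a too-short string counts as its own factor
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def learn(text: Iterable[str], k: int) -> Set[str]:
    """Accumulate the k-factors of every presented string (the current guess)."""
    guess: Set[str] = set()
    for string in text:
        guess |= k_factors(string, k)
    return guess


def accepts(string: str, factors: Set[str], k: int) -> bool:
    """A string is accepted iff every one of its k-factors is in the description."""
    return k_factors(string, k) <= factors


print(sorted(k_factors("abba", 3)))
bigrams = learn(["ab", "abab"], 2)
print(sorted(bigrams))
print(accepts("abababab", bigrams, 2), accepts("aabb", bigrams, 2))
```

In this sketch, two samples from (ab)+ already suffice for the learner to reach a four-member bigram description, after which further strings from (ab)+ add nothing new.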

So it can hardly be thought a representative member of the class of finite-state
stringsets. There is no reason to think, simply on the basis that an animal has been
shown to be capable of recognizing a pattern like (ab)+, that it is capable of learning
anything beyond strictly 2-local sets, a very limited class of stringsets indeed. Far
from being an established lower bound, the finite-state stringsets might be far and away
beyond the animal’s learning capacity.

Similarly, the CF stringsets, as a class, may be far and away beyond the learning
capacity of human beings. The class of CF stringsets includes such stringsets over
{a, b, c} as the following (and of course infinitely many others, some much more
complicated):

(26) {w_1 ··· w_k | ∀i ≤ k ∃n ≥ 0 [(w_i = a^n b^n) ∨ (w_i = b^n c^n)]}
     (all and only those strings made up of consecutive substrings each composed either
     of zero or more a's followed by an equal number of b's or of zero or more b's
     followed by an equal number of c's)

(27) {c w_1 c ··· c w_k c | ∀i ≤ k [#_a(w_i) = #_b(w_i)]}
     (all and only those strings composed of an arbitrary number of c-separated
     substrings over {a, b} each having a number of a occurrences equal to the number of
     b occurrences)

(28) {xcy | x ∈ {a, b}* ∧ y ∈ {a, b}* ∧ x ≠ y}
     (all and only those strings in which nonidentical initial and final substrings over
     {a, b} are separated by a single c)

Whether humans could be said to be capable of learning patterns of arbitrary CF types
such as these has not been investigated.

Fitch & Hauser (2004) do report success in teaching college students to recognize the
pattern {a^n b^n | n ≥ 1}. Perruchet & Rey (in press) are skeptical, and claim to have
shown that humans do not in fact learn the pattern involved (as opposed to spotting which
of a very small finite set of strings are the ones the investigator is concerned with).
This seems plausible to us, in light of the well-known fact that humans find processing
such patterns almost impossible for n ≥ 3. Consider this string, of a^n b^n form:

(29) People people people left left left.

Few English speakers see this as grammatical before the trick has been explained to them.
The trick is to see that left can mean either “departed” or “abandoned”, and people
people left can mean people whom people abandoned. The sentence in (29) is standardly
regarded as grammatical, but mainly for theoretical reasons. For example, it corresponds
to the perfectly acceptable passive People [who were] left by people [who were] left by
people left, with the fully understandable meaning “People who were abandoned by people
who were abandoned by people departed.” The standard view is that expressions like (29)
exemplify a structural regularity that the grammar of English has to permit but that the
parsing capabilities of human beings cannot cope with. (A very interesting non-standard
view is provided by Kornai 1985, who regards it as completely obvious that strings of the
form People^n left^n where n ≥ 4 are not grammatical, and provides arguments that the
stringset of English is SF but not LT.)

The stringsets on which Fitch and Hauser (2004) actually tested both their animals and
their human subjects were finite (and very small) sets: sets with the homomorphic images
{abab, ababab} and {aabb, aaabbb}. There is no language-theoretic complexity difference
here: an SL_7 description suffices in each case. It is only the infinite extensions to
(ab)+ and {a^n b^n | n ≥ 1}, respectively, that separate them in complexity terms; and
they turn out to be separated very widely indeed.

Hauser, Chomsky & Fitch (2002) appear to hold that the generalization to infinite
extensions is the fundamental hallmark of the unique capacity for language
differentiating humans from all other animals. The key notion is “recursion”. They give
nothing detailed about what this means, but on p. 1571 they say this about the human
faculty of language in the narrow sense (“FLN”):

    All approaches agree that a core property of FLN is recursion, attributed to narrow
    syntax in the conception just outlined. FLN takes a finite set of elements and yields
    a potentially infinite array of discrete expressions.

This suggests that recursion is nothing more than whatever permits the generation of
infinite stringsets on finite vocabularies. They go on to say:

    Natural languages go beyond purely local structure by including a capacity for
    recursive embedding of phrases within phrases, which can lead to statistical
    regularities that are separated by an arbitrary number of words or phrases. Such
    long-distance, hierarchical relationships are found in all natural languages for
    which, at a minimum, a “phrase-structure grammar” is necessary. It is a foundational
    observation of modern generative linguistics that, to capture a natural language, a
    grammar must include such capabilities . . .

This suggests that recursion is whatever distinguishes the non-FS CF stringsets from
the FS.

What seems to us to be the standard interpretation of the term ‘recursion’ in formal
language theory is that it refers to the rewriting of a string φ_1 A φ_2 so that at a
later (not necessarily immediately following) stage it looks like φ_1 ψ_1 A ψ_2 φ_2,
where ψ_1 and ψ_2 are not both empty and A is a nonterminal that yields terminals in at
least some derivations of this sort and so do at least some of the nonterminals in either
ψ_1 or ψ_2. In other words, an A constituent is allowed to be properly contained in
another, longer, A constituent.

The property of exhibiting recursion in this sense distinguishes grammars that generate
infinite stringsets from those that generate only finite stringsets. As such, it cannot
be characteristic of the distinction between mechanisms that can learn only FS stringsets
and those that can learn CF stringsets. Both of the regular grammars in (22) are recursive

in this sense. Indeed, the same is true for every grammar for a non-finite SL stringset.

The notion of recursion that actually distinguishes regular from CF stringsets is the
property called ‘self-embedding’ in Chomsky (1959). Self-embedding involves derivations
that go from φ_1 A φ_2 (not necessarily directly) to φ_1 ψ_1 A ψ_2 φ_2, where neither ψ_1
nor ψ_2 is empty and A yields terminals in at least some derivations of this sort and so
do at least some of the nonterminals in both ψ_1 and ψ_2. Self-embedding could be said to
set up potentially unbounded dependencies: a grammar with the rules S → aSb and S → ab
produces derivations like

     S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaabbbb

in which for each j ≥ 1 it could be said that the presence of a b at a position j symbols
to the right of the middle of a string in the set depends on there being an a at a
position j symbols to the left of the middle, with no upper bound on j, and thus no upper
bound on the distance between the two.

But the presence of self-embedding in a grammar is not in itself a sufficient condition
to yield a non-finite-state stringset. Consider a CF grammar containing (at least) these
rules:

(30) a. S → ABC
     b. A → a
     c. B → bBb
     d. B → bB
     e. B → b
     f. C → c

Such a grammar has the self-embedding property. It not only embeds phrases within
phrases, it embeds phrases labeled B within larger phrases labeled B, with material both
to the left and to the right (notice rule c), and non-trivially so, since all
nonterminals yield terminal strings. Certainly this grammar can be said to allow for
configurations that are “separated by an arbitrary number of words or phrases”: every
well-formed string must begin with an a that is followed by a final c, with indefinitely
many b's between (an unbounded dependency). Yet (as the reader can determine by a little
experimentation with bigrams) the stringset generated by this grammar (the concatenation
of {a}, {b}+, and {c}) is not just finite-state but in fact SL_2. We do not offer this as
an important or surprising fact; we are merely underlining the point that notions like
‘recursion’, ‘embedding’, and ‘unbounded dependency’ need to be much more carefully
defined than they have been in some of the recent literature.

The condition on CF grammars that guarantees that a stringset will not be finite-state is
that if all the CF grammars for some stringset have the self-embedding property, then the
stringset is not FS. This turns out to be a particularly difficult condition to work
with. It is true that often one can prove of a stringset either that it is FS (and hence
does not require self-embedding) or that it is CF but not FS (and hence does require it);
but there is no algorithm for determining whether arbitrary CF grammars generate FS
stringsets or not. That is, whether a given CF stringset has a non-self-embedding grammar
is not algorithmically decidable.

Merely pointing out that recursion “yields a potentially infinite array” or allows
configurations to be “separated by an arbitrary number of words or phrases” is not enough
to delineate a difference between animal pattern-learning capacities and human linguistic
abilities. Many of the distinctions that these experiments appear to be designed to
illuminate turn out, in fact, to be characteristic of distinctions between classes at the
very bottom end of this range. We hope that this brief survey may inspire some research
exploring some of the territory between SL and the finite-state stringsets.

9 Some criterial contrasts

We close with a short list of crucial cases of stringsets that distinguish between
classes mentioned above, together with an indication of the cognitive ability to which
they intuitively correspond.

Strictly Local vs. Locally Testable

— While (ab)+ is strictly local, a+(ba+)+ is locally testable but not strictly local.
— Criterial test pair: an SL_k description cannot distinguish between all cases of
  a^k b a^k and a^(2k+1), but an LT_k description can. That is, acceptance cannot require
  the presence of a b unless it occurs within the first or last k stimuli.
— Psychological correlate: ability to recognize that every a is immediately followed by
  a b versus ability to detect that at least one b was present somewhere.

Locally Testable vs. Star-Free

— While a+(ba+)+ is locally testable, a+ba+ is star-free but not locally testable.
— Criterial test pair: an LT_k description cannot distinguish between all cases of
  a^k b a^(2k+1) and a^k b a^k b a^k (it cannot guarantee that there will be only one b),
  but an SF description can.
— Psychological correlate: ability to recognize whether a b is present versus ability to
  recognize whether b occurred just once, or at most k times for some k > 1.

Star-Free vs. Finite-State

— While the set {a, b}* of all strings consisting of a and b is star-free (despite the
  star we use for convenience in denoting it here!), the set

     {w | w ∈ {a, b}* ∧ #_b(w) ≡ 0 mod 2}

  (which contains all and only the strings consisting of a and b that have an even number
  of b occurrences) is finite-state but not star-free.
— Criterial test pair: a star-free description cannot distinguish between all cases of
  (a^k b)^(2j+1) a^k and (a^k b)^(2j) a^k; a finite-state description (e.g., a
  finite-state automaton) can.

— Psychological correlate: ability to recognize whether b oc- Hauser, Marc D. 2005. The evolution of the language faculty:
curred at least k times versus ability to recognize whether semantics, syntax, and interfaces. Presented at the Soci-
b occurred a number of times that is divisible by some ety for Language Development’s Annual Society Sympo-
number n (i.e., to count up to some threshold n and then sium on Prerequisites to Language in Animal Cognition,
reset the counter). Boston University, 3 November 2005.

Finite-State vs. Context-Free
— While the set $\{a^i b^j \mid i + j \equiv 0 \pmod{2}\}$ of all strings
consisting of an even total of a and b occurrences is finite-state,
the set $\{a^i b^j \mid i = j\}$ of all strings consisting of an
equal number of a and b occurrences is CF but not FS.
— Criterial test pair: an FS description cannot distinguish
between all cases of $a^{i-k} b^{i+k}$ and $a^i b^i$ (for all i and for
all k); but a CF description can.
— Psychological correlate: ability to do modulo arithmetic
versus ability to match brackets, or to do arbitrary integer
addition in unary.
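
As a rough illustration of this contrast, the sketch below pairs a recognizer that keeps only a bounded amount of information (whether a b has been seen, and the parity of the length) with one that keeps an unbounded counter. The function names (`in_even_total`, `in_equal_counts`, `fs_cf_pair`) are hypothetical, chosen here for illustration, and the code assumes strings over the symbols a and b only.

```python
def in_even_total(w: str) -> bool:
    """Membership in {a^i b^j | i + j even}, using a bounded number of states."""
    seen_b = False
    parity = 0
    for symbol in w:
        if symbol == "a":
            if seen_b:          # an a after a b breaks the a^i b^j shape
                return False
        elif symbol == "b":
            seen_b = True
        else:
            return False
        parity ^= 1             # modulo-2 arithmetic on the length
    return parity == 0


def in_equal_counts(w: str) -> bool:
    """Membership in {a^i b^j | i = j}, using an unbounded counter."""
    count = 0
    seen_b = False
    for symbol in w:
        if symbol == "a":
            if seen_b:
                return False
            count += 1          # unary addition: one more a to be matched
        elif symbol == "b":
            seen_b = True
            count -= 1          # each b is matched against one earlier a
            if count < 0:
                return False
        else:
            return False
    return count == 0


def fs_cf_pair(i: int, k: int) -> tuple[str, str]:
    """Build a^(i-k) b^(i+k) and a^i b^i (hypothetical helper, 0 <= k <= i)."""
    return "a" * (i - k) + "b" * (i + k), "a" * i + "b" * i


skewed, balanced = fs_cf_pair(5, 2)
print(in_even_total(skewed), in_even_total(balanced))
print(in_equal_counts(skewed), in_equal_counts(balanced))
```

Both members of the pair have the same even length, so the parity-based recognizer accepts both; only the recognizer with the unbounded counter tells them apart.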

Each of these contrasts indicates a point at which some
relevant distinction between cognitive capacities might be
located, and thus suggests the topic for a potentially rewarding
experiment.

