Language Acquisition
Fall 2010/Winter 2011
Morphology & Syntax
Afra Alishahi, Heiner Drenhaus
Computational Linguistics and Phonetics
Saarland University
Rules that Govern Form
• Moving from fixed forms (e.g. ‘apple’) to productively formed words and sentences
play → plays, played, playing
I, you, admire → “I admire you”
• Morphology and syntax
• In all languages, the formation of words and sentences follows
highly regular patterns
• How are the regularities and exceptions represented?
• The study and analysis of language production in children
reveals common and persistent patterns
2
U-shaped Learning Curves
• Observed U-shaped learning curves in children
• Imitation: an early phase of conservative language use
• Generalization: general regularities are applied to new forms
• Overgeneralization: occasional misapplication of general patterns
• Recovery: over time, overgeneralization errors cease to happen
• Lack of Negative Evidence
• Children do not receive reliable corrective feedback from parents
to help them overcome their mistakes (Marcus, 1993)
3
Case Study: Learning English Past Tense
• The problem of English past tense formation:
• Regular formation: stem + ‘ed’
• Irregulars do show some patterns
• No-change: hit → hit
• Vowel-change: ring → rang, sing → sang
• Over-regularizations are common: goed
• These errors often occur after the child has already produced the
correct irregular form: went
• What causes the U-shaped learning curve?
4
A Symbolic Account of English Past Tense
• Dual-Route Account: two qualitatively different mechanisms
[Diagram: the input stem feeds both a list of exceptions (associative memory) and a regular route (rule-based); a match in the exception list blocks the regular route; either route outputs the past tense]
• Prediction:
• Errors result from transition from rote learning to rule-governed
• Recovery occurs after sufficient exposure to irregulars
5
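A minimal sketch of the dual-route idea in Python, assuming a toy exception dictionary (the entries and the bare ‘stem + ed’ rule are illustrative only):

EXCEPTIONS = {"go": "went", "hit": "hit", "sing": "sang"}  # rote list (associative memory)

def past_tense(stem):
    # A hit in the exception list blocks the regular route.
    if stem in EXCEPTIONS:
        return EXCEPTIONS[stem]
    # Regular route: stem + 'ed' (ignoring spelling/phonological adjustments).
    return stem + "ed"

print(past_tense("walk"), past_tense("go"))  # -> walked went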
A Connectionist Account of Learning
English Past Tense
• A connectionist model (Plunkett & Marchman, 1993)
[Network: input units (phonological features of the stem) → hidden units → output units (phonological features of the past tense)]
• Properties:
• Early in training, the model shows tendency to overgeneralize; by
the end of training, it exhibits near perfect performance
• U-shaped performance is achieved with a single learning
mechanism, but depends on a sudden change in the size of the training set
6
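A minimal sketch of this kind of feedforward network, using numpy; the layer sizes, the random toy data, and the plain backpropagation loop are illustrative assumptions, not the setup used by Plunkett & Marchman (1993):

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for phonological feature vectors.
N_IN, N_HID, N_OUT = 18, 20, 18   # stem features -> past-tense features

W1 = rng.normal(0, 0.5, (N_HID, N_IN))
W2 = rng.normal(0, 0.5, (N_OUT, N_HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(stem):
    # Map phonological features of the stem to features of the past tense.
    hidden = sigmoid(W1 @ stem)
    return hidden, sigmoid(W2 @ hidden)

def train_step(stem, target, lr=0.5):
    # One backpropagation step on a single stem/past-tense pair.
    global W1, W2
    hidden, out = forward(stem)
    err_out = (out - target) * out * (1 - out)          # delta at the output layer
    err_hid = (W2.T @ err_out) * hidden * (1 - hidden)  # delta at the hidden layer
    W2 -= lr * np.outer(err_out, hidden)
    W1 -= lr * np.outer(err_hid, stem)

# Toy training data: random binary vectors stand in for real stems and past tenses.
data = [(rng.integers(0, 2, N_IN).astype(float),
         rng.integers(0, 2, N_OUT).astype(float)) for _ in range(20)]
for epoch in range(200):
    for stem, past in data:
        train_step(stem, past)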
A Hybrid, Analogy-based Account
• Taatgen & Anderson (2002): a rational model of learning the past tense, based on the ACT-R architecture
• Declarative memory chunks represent past tenses, both as a goal and as examples
• A goal to determine the past tense of walk (left), and the same chunk once the goal is accomplished and stored in declarative memory (right):
PAST-TENSE-GOAL23            PAST-TENSE-GOAL23
ISA    PAST                  ISA    PAST
OF     WALK                  OF     WALK
STEM   NIL                   STEM   WALK
SUFFIX NIL                   SUFFIX ED
• The goal is of type PAST, has the value WALK in its OF slot (WALK is itself a declarative chunk), and has its STEM and SUFFIX slots set to NIL; producing a past tense means filling these two empty slots
• The model starts out with the retrieve, analogy and zero-rule strategies; none is very good initially: analogy involves more than one reasoning step and only succeeds if a suitable example is retrieved, retrieval needs stored examples, and the zero rule yields a past tense identical to the stem
• There is no production rule for the regular rule yet; ACT-R will learn it later as a specialization of the analogy strategy (similar to MacWhinney (1978), who also suggested that the regular rule is formed on the basis of analogy)
7
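One way to mirror these chunks in code is a small dataclass; the class and the helper method are illustrative scaffolding, not part of ACT-R, but the fields follow the slots above:

from dataclasses import dataclass
from typing import Optional

@dataclass
class PastTenseChunk:
    # A declarative chunk of type PAST, with the slots OF, STEM and SUFFIX.
    of: str
    stem: Optional[str] = None    # NIL while the goal is unsolved
    suffix: Optional[str] = None  # NIL while the goal is unsolved

    def is_complete(self) -> bool:
        return self.stem is not None and self.suffix is not None

goal = PastTenseChunk(of="WALK")                            # STEM NIL, SUFFIX NIL
done = PastTenseChunk(of="WALK", stem="WALK", suffix="ED")  # stored in declarative memory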
A Hybrid, Analogy-based Account
• The analogy strategy is implemented by two production
rules, based on simple pattern matching:
RULE ANALOGY-FILL-SLOT
IF   the goal has an empty suffix slot
AND  there is an example in which the suffix has a value
THEN set the suffix of the goal to the suffix value of the example

RULE ANALOGY-COPY-A-SLOT
IF   the goal has an empty stem slot and the of slot has a certain value
AND  in the example the values of the of and stem slots are equal
THEN set the stem to the value of the of slot
8
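A hedged sketch of how these two productions could be simulated over the chunk sketch from the previous slide; the function names and the way an example is chosen are assumptions for illustration, not the ACT-R implementation:

import random

def analogy_fill_slot(goal, examples):
    # IF the goal's suffix slot is empty AND an example has a suffix,
    # THEN copy that suffix into the goal.
    if goal.suffix is None:
        with_suffix = [ex for ex in examples if ex.suffix is not None]
        if with_suffix:
            goal.suffix = random.choice(with_suffix).suffix

def analogy_copy_a_slot(goal, examples):
    # IF the goal's stem slot is empty and its OF slot has a value
    # AND in the example the OF and STEM slots are equal,
    # THEN copy the goal's OF value into its STEM slot.
    if goal.stem is None and goal.of is not None:
        if any(ex.of == ex.stem for ex in examples):
            goal.stem = goal.of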
ACT-R Equations
ACT-R is a so-called hybrid architecture, in the sense that it has both symbolic and sub-symbolic aspects. Table 1 provides a formal specification of some critical aspects of these components.

Table 1
ACT-R equations (simplified versions of the original Anderson and Lebiere (1998) equations)

Activation
A = B + context + noise
The activation of a chunk has three parts: base-level activation, spreading activation from the current context, and noise. Since spreading activation is a constant factor in the models discussed, activation is treated as if it were just base-level activation.

Base-level activation
B(t) = log Σ_{j=1}^{n} (t − t_j)^(−d)
n is the number of times a chunk has been retrieved from memory, and t_j represents the time at which each of these retrievals took place. So, the longer ago a retrieval was, the less it contributes to the activation. d is a fixed ACT-R parameter that represents the decay of base-level activation in declarative memory.

Retrieval time
Time = F · e^(−f·A)
Activation determines the time required to retrieve a chunk. A is the activation of the chunk that has to be retrieved, and F and f are fixed ACT-R parameters. Retrieval will only succeed as long as the activation is larger than the retrieval threshold τ, which is also a fixed parameter.

Expected outcome
Expected outcome = P·G − C + noise
The expected outcome of a production rule is based on three quantities: the estimated probability of success of the production rule (P), the estimated cost of the production rule (C), and the value of the goal (G).
9
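The table translates directly into a few lines of Python; the parameter values below are placeholders, not the ones used by Taatgen & Anderson:

import math

d, F, f, tau = 0.5, 1.0, 1.0, -1.0   # placeholder parameter values

def base_level_activation(retrieval_times, now):
    # B(t) = log( sum_j (t - t_j)^(-d) ) over all past retrievals of the chunk.
    return math.log(sum((now - tj) ** (-d) for tj in retrieval_times))

def activation(B, context=0.0, noise=0.0):
    # A = B + context + noise (context is a constant factor in these models).
    return B + context + noise

def retrieval_time(A):
    # Time = F * exp(-f * A); retrieval fails if A is below the threshold tau.
    if A <= tau:
        return None  # retrieval failure
    return F * math.exp(-f * A)

def expected_outcome(P, C, G, noise=0.0):
    # Expected outcome = P*G - C + noise.
    return P * G - C + noise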
A Hybrid, Analogy-based Account
• ACT-R’s production rule mechanism learns new rules by
combining two rules that have fired consecutively into one:
RULE LEARNED-REGULAR-RULE
IF   the goal is to find the past tense of a word and slots stem and suffix are empty
THEN set the suffix slot to ED and set the stem slot to the word of which you want the past tense
10
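As a rough illustration of this idea: two productions that fire in sequence can be collapsed into one, and the resulting specialized rule then fills both slots in a single step. The composition mechanism sketched here is a strong simplification of ACT-R's actual production compilation:

def compose(rule1, rule2):
    # Collapse two productions that fired consecutively into one new production.
    def combined(goal, examples):
        rule1(goal, examples)
        rule2(goal, examples)
    return combined

def learned_regular_rule(goal, examples=None):
    # Specialized result: fill STEM with the OF value and SUFFIX with 'ED' at once.
    goal.stem = goal.of
    goal.suffix = "ED"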
A Hybrid, Analogy-based Account
[Figure: simulation results from Taatgen & Anderson (2002), Cognition 86, 123–155]
11
Innateness of Language
• Central claim: humans have innate knowledge of language
• Assumption: all languages have a common structural basis
• Argument from the Poverty of the Stimulus (Chomsky 1965)
• Linguistic experience of children is not sufficiently rich for
learning the grammar of the language, hence they must have
some innate specification of grammar
• Assumption: knowing a language involves knowing a grammar
• Universal Grammar (UG)
• A set of rules which organize language in the human brain
12
Principles & Parameters
• A framework for representing UG
• A finite set of fundamental principles that are common to all
languages
• E.g., “a sentence must have a subject”
• A finite set of parameters that determine syntactic variability
amongst languages
• E.g., a binary parameter that determines whether the subject of
a sentence must be overtly pronounced
• Learning involves identifying the correct grammar
• I.e., setting UG parameters to proper values for the current
language
13
Computational Implementation of P&P
• Formal parameter setting models for a small set of grammars
• Clark 1992, Gibson & Wexler 1994, Niyogi & Berwick 1996, Briscoe 2000
• General approach:
• Analyze current input string and set the parameters accordingly
• Set a parameter when receiving evidence from an example which
exhibits that parameter (trigger)
• Representative models:
• Triggering Learning Algorithm or TLA [Gibson & Wexler, 1994]
• Structural Triggers Learner or STL [Fodor, 1998]
• Variational Learner or VL [Yang, 2002]
14
Computational Implementation of P&P
• TLA: randomly modifies a parameter value if it cannot
parse the input
• STL: learns sub-trees (treelets) as parameter values
• VL: assigns a weight to each parameter, and rewards or
penalizes these weights depending on parsing success
15
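A minimal sketch of the TLA idea, assuming a toy grammar represented as a list of binary parameters and an externally supplied parses(params, sentence) oracle (both are assumptions; only the general scheme follows Gibson & Wexler, 1994):

import random

def tla_step(params, sentence, parses):
    # One learning step of the Triggering Learning Algorithm.
    if parses(params, sentence):
        return params                      # current grammar already works; no change
    i = random.randrange(len(params))      # Single Value Constraint: flip one parameter
    candidate = params.copy()
    candidate[i] = 1 - candidate[i]
    if parses(candidate, sentence):        # Greediness: keep the change only if it helps
        return candidate
    return params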
Computational Implementation of P&P
• What if the trigger is ambiguous?
• TLA: chooses one of the possible interpretations of the
ambiguous trigger
• STL: ignores ambiguous triggers and waits for unambiguous
ones
• VL: each interpretation is parsed and the parameter weights
are changed accordingly
16
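The Variational Learner can be sketched as weights over parameter values that are rewarded or penalized after each parse attempt, so ambiguous input needs no special treatment. The linear reward–penalty update below is an assumption in the spirit of Yang (2002), not his exact formulation:

import random

def vl_step(weights, sentence, parses, gamma=0.02):
    # weights[i] is the probability of choosing value 1 for parameter i.
    # Sample a full grammar, try to parse, then reward or penalize the sampled values.
    grammar = [1 if random.random() < w else 0 for w in weights]
    success = parses(grammar, sentence)
    for i, v in enumerate(grammar):
        p = weights[i] if v == 1 else 1.0 - weights[i]  # probability of the sampled value
        if success:
            p = p + gamma * (1.0 - p)   # linear reward
        else:
            p = (1.0 - gamma) * p       # linear penalty
        weights[i] = p if v == 1 else 1.0 - p
    return weights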
Computational Challenges of P&P
• Practical limitations:
• Formalizing a UG that covers existing languages is a challenge
• Learning relies on well-formed sentences as input
• P&P framework predicts a huge space of possible grammars
• E.g., 20 binary parameters already yield 2^20 (> 1 million) possible grammars
• Search spaces for a grammar contain local maxima
• I.e. learner may converge to an incorrect grammar
• Most of the P&P models are psychologically implausible
• They predict that a child may repeatedly revisit the same
hypothesis or jump randomly around the hypothesis space
17
Usage-based Accounts of Language
Acquisition
• Main claims:
• Children learn language regularities from input alone, without
guidance from innate principles
• Mechanisms of language learning are not domain-specific
• Verb Island Hypothesis (Tomasello, 1992)
• Children build their linguistic knowledge around individual items
rather than adjusting general grammar rules they already possess
• Children use cognitive processes to gradually categorize the
syntactic structure of their item-based constructions
• General-purpose cognitive tools are used for this purpose:
imitation, analogy, structure mapping
18
Distributional Representation as an
Alternative to Grammar
• Knowing a language is not equated with knowing a grammar
• Knowledge of language is developed to perform communicative
tasks of comprehension and production
• Neural networks for language representation and acquisition
• Different levels of linguistic representation are emergent structures
that a network develops in the course of learning
• E.g., Elman (1990, 1991), Allen (1997), Allen & Seidenberg (1999)
19
Case Study: Elman (1990)
• A model of learning lexical classes and word order
[Simple recurrent network: input units and context units feed the hidden units, which feed the output units; the network is trained to predict the next word, and a copy of the hidden units is kept as the context for the next time step]
Input: 2–3 word sentences
20
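A compact numpy sketch of a simple recurrent (Elman) network of this kind; the localist word encoding, the layer sizes and the forward-only pass are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)

N_WORDS, N_HID = 31, 150   # 31-bit localist word vectors; hidden layer size is illustrative
Wxh = rng.normal(0, 0.1, (N_HID, N_WORDS))   # input -> hidden
Whh = rng.normal(0, 0.1, (N_HID, N_HID))     # context (previous hidden state) -> hidden
Why = rng.normal(0, 0.1, (N_WORDS, N_HID))   # hidden -> output (next-word prediction)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_sequence(word_indices):
    # Run a word sequence through the SRN and return next-word predictions.
    # After every word, the hidden state is copied into the context units.
    context = np.zeros(N_HID)
    predictions = []
    for idx in word_indices:
        x = np.zeros(N_WORDS)
        x[idx] = 1.0                          # localist (one-hot) word encoding
        hidden = sigmoid(Wxh @ x + Whh @ context)
        predictions.append(sigmoid(Why @ hidden))
        context = hidden.copy()               # context units = copy of hidden units
    return predictions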
Word Categories
TABLE 3
Categories of Lexical Items Used in Sentence Simulation
Category Examples
NOUN-HUM man, woman
NOUN-ANIM cat, mouse
NOUN-INANIM book, rock
NOUN-AGRESS dragon, monster
NOUN-FRAG glass, plate
NOUN-FOOD cookie, bread
VERB-INTRAN think, sleep
VERB-TRAN see, chase
VERB-AGPAT move, break
VERB-PERCEPT smell, see
VERB-DESTROY break, smash
VERB-EAT eat
21
Templates for Sentence Generation
TABLE 4
Templates for Sentence Generator
WORD 1 WORD 2 WORD 3
NOUN-HUM VERB-EAT NOUN-FOOD
NOUN-HUM VERB-PERCEPT NOUN-INANIM
NOUN-HUM VERB-DESTROY NOUN-FRAG
NOUN-HUM VERB-INTRAN
NOUN-HUM VERB-TRAN NOUN-HUM
NOUN-HUM VERB-AGPAT NOUN-INANIM
NOUN-HUM VERB-AGPAT
NOUN-ANIM VERB-EAT NOUN-FOOD
NOUN-ANIM VERB-TRAN NOUN-ANIM
NOUN-ANIM VERB-AGPAT NOUN-INANIM
NOUN-ANIM VERB-AGPAT
NOUN-INANIM VERB-AGPAT
NOUN-AGRESS VERB-DESTROY NOUN-FRAG
NOUN-AGRESS VERB-EAT NOUN-HUM
NOUN-AGRESS VERB-EAT NOUN-ANIM
NOUN-AGRESS VERB-EAT NOUN-FOOD
Sentences were concatenated into a continuous stream, with no breaks between successive sentences. A fragment of the input stream is shown in Column 1 of Table 5, with the English gloss for each vector in parentheses; the desired output is given in Column 2.
22
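A small sketch of how such a template-based sentence generator might work, using a handful of the categories and templates above; the concatenation into one unbroken word stream follows the description on the slide:

import random

CATEGORIES = {
    "NOUN-HUM": ["man", "woman"],
    "NOUN-FOOD": ["cookie", "bread"],
    "NOUN-FRAG": ["glass", "plate"],
    "VERB-EAT": ["eat"],
    "VERB-DESTROY": ["break", "smash"],
    "VERB-INTRAN": ["think", "sleep"],
}

TEMPLATES = [
    ("NOUN-HUM", "VERB-EAT", "NOUN-FOOD"),
    ("NOUN-HUM", "VERB-DESTROY", "NOUN-FRAG"),
    ("NOUN-HUM", "VERB-INTRAN"),
]

def generate_stream(n_sentences):
    # Fill random templates with random category members and concatenate the
    # resulting sentences into one continuous word stream, with no breaks.
    stream = []
    for _ in range(n_sentences):
        template = random.choice(TEMPLATES)
        stream.extend(random.choice(CATEGORIES[cat]) for cat in template)
    return stream

print(generate_stream(5))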
Sample Training Sequence
TABLE 5
Fragment of Training Sequences for Sentence Simulation
Input | Output
Words were encoded as 31-bit vectors and presented one at a time, in order; the task on each input cycle was to predict the next word.
23
Analysis of Hidden Unit Activation
Patterns
[Figure: hierarchical cluster analysis of the hidden-unit activation patterns; the lexicon splits into VERBS (subclustered by whether a direct object is taken optionally, obligatorily, or not at all) and NOUNS (animates: humans and animals; inanimates: breakables, food, etc.)]
24
Learning Grammar from Corpora
• Many computational models show the possibility of
learning a grammar from corpus data
• Machine learning techniques induce a grammar that fits data
• Jones, Gobet, & Pine (2000), Clark (2001), Gobet, Freudenthal, & Pine
(2004), Solan, Horn, Ruppin & Edelman (2004)
• Common properties:
• Most of these models are not incremental
• They mostly focus on the acquisition of syntax (usually a CFG),
but not semantics
25
Case Study: MOSAIC (Jones et al., 2000)
• MOSAIC (Model Of Syntax Acquisition In Children; Jones et al 2000)
• Learns from raw text using a discrimination network, and produces utterances similar to those children produce
[Diagram: a discrimination network; from the root node, branches for ‘Eat’ and ‘See’ lead to nodes for phrases such as {the apple}, {the pear}, {the pie}, {the ball}; nodes reached under both ‘Eat’ and ‘See’ (e.g. {the pear}) share common links and are connected by a similarity link]
Case Study: MOSAIC (Jones et al., 2000)
• Underlying mechanisms
• Learning: expand the network based on input data
• Production: traverse the network and output the contents of the nodes
• Generalization
• Generative links allow limited generalization abilities
• Lack of semantic knowledge prevents meaningful generalization
• Generalized sentences are limited to high-frequency terms
• Evaluation
• The model was trained on a subset of CHILDES
• It was used to simulate the verb island phenomenon, optional infinitives
in English, subject omission, ...
27
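A very rough sketch of the discrimination-network idea behind MOSAIC: nodes index utterances word by word, learning extends the network with new branches, and production traverses it. The node structure and method names are assumptions for illustration, not MOSAIC's actual implementation:

class Node:
    # A node in a simple discrimination network keyed by words.
    def __init__(self):
        self.children = {}   # word -> child Node

    def learn(self, words):
        # Learning: extend the network along the word sequence of one utterance.
        if words:
            child = self.children.setdefault(words[0], Node())
            child.learn(words[1:])

    def produce(self, prefix=()):
        # Production: traverse the network and output the stored word sequences.
        if not self.children:
            yield list(prefix)
        for word, child in self.children.items():
            yield from child.produce(prefix + (word,))

root = Node()
for utterance in ["eat the apple", "eat the pear", "see the pear"]:
    root.learn(utterance.split())
print(list(root.produce()))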