2 Theory Visualizations for Bilingual Models of Lexical Ambiguity Resolution

Ben Falandays and Michael J. Spivey

Introduction
We often think of bilingualism as “adding” a new and different language
system to one’s existing language system. There might be a tiny bit of
truth to this metaphor when learning a second language (L2) late in life.
However, it surely must be a poor metaphor for bilinguals who learn their
two languages relatively early in life. Rather than modeling bilingualism as
involving the “adding” of a second processor for the L2, perhaps bilingu-
alism itself can be treated as just another set of dimensions in the massive
state space in which a speaker’s linguistic representations are organized,
along with dimensions of situational context, text genre, grammatical
gender, linguistic register, syntax, phonology, semantics, and so on
(e.g., Onnis & Spivey, 2012). When these various aspects of language
are treated not as submodules within the language module but instead as
dimensions in a single state space, then suddenly new insights can be
gained in understanding how language is processed in general and how
bilinguals process lexical ambiguity in particular. In fact, the very concept
of a lexical representation changes dramatically when one switches from
a computer (or dictionary) metaphor of the lexicon to a dynamical system
account of word knowledge (Elman, 2004).
In this chapter, we review some connectionist models of bilingualism
and discuss how they might deal with lexical ambiguity; but, first, we
examine what lexical ambiguity itself “looks like” in the state space of
a language processing system. By treating the representational parameters
of a model as dimensions in a state space, one can more systematically
identify the range of behaviors (and regions visited) in that volumetric
space. In a state space that combines a variety of linguistic aspects,
a word representation can be seen as extending not only across
a semantic field (e.g., Lehrer, 1974) but indeed across a lexical field that
combines the semantics, phonology, and situational context of how the
word is typically used (e.g., Elman, 2009; see also Lyons, 1963). By
studying the real-time temporal dynamics of lexical ambiguity resolution

Downloaded from https://www.cambridge.org/core. University of Toronto, on 02 Jan 2020 at 12:05:12, subject to the Cambridge Core terms of
use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781316535967.003
18 Theoretical and Methodological Considerations

in bilinguals (e.g., Altarriba & Gianico, 2003), it may be possible to better
see inside the structure of this state space in which language is
represented.
Traditional approaches to understanding lexical ambiguity resolution
relied heavily on the computer metaphor of the mind, positing a modular
processor for lexical access followed by a subsequent processor for context
effects (Swinney, 1979; Tanenhaus, Leiman, & Seidenberg, 1979).
Experiencing the uncertainty of reading or hearing a word like bug –
which could mean insect or spy device – was likened to activating two
separate dictionary entries, one of which would soon have to be deacti-
vated by the context processor for comprehension to be successful.
Rather than relying on this box-and-arrow computer metaphor, one can
instead treat the word bug as having one simple circumscribed region in
the phonological dimensions (since it is a homophone) but projecting
onto two disparate regions in the semantic dimensions of the massive
state space of language (since it has two rather different meanings). When
those phonological and semantic dimensions are combined to form one
phono-semantic state space, the region dedicated to the word bug is seen
as a single bounded, but very nonconvex, shape. In fact, when only certain
dimensions are shown and compressed just right, the lexical field for bug
would be shaped roughly like a letter V, as in Figure 2.1(a). It is exactly
this nonconvexity of the shape that gives us insight into what lexical
representations might “look like” in bilinguals. In this dynamical system
account, when hearing or reading the word bug, the human mind visits
portions of this bounded shape, and the other contextual dimensions
(some semantic, discourse, and situational dimensions not depicted
here) help push the state of the system toward one or the other arm of
that V-shape, to gradually achieve a contextually appropriate understand-
ing of the word.
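The contrast between a nonconvex field (bug) and a more convex one (as with dusted, discussed next) can be made concrete with a toy calculation. This is our own illustration, not the authors' model: the two-dimensional coordinates, the arm width, and the function names are hypothetical. A homonym-like field is drawn as two thin arms that share only the word-form apex, a polysemy-like field as the filled region between the same arms, and the code checks how often the midpoint of two points inside a field also lies inside it; a convex set passes that test for every pair.

```python
import math
from itertools import combinations
from typing import Callable

Point = tuple[float, float]
# Hypothetical toy coordinates: x is a semantic dimension, y a second dimension.
# The word form sits at APEX; LEFT and RIGHT stand for the two meanings.
APEX, LEFT, RIGHT = (0.0, 0.0), (-1.0, 1.0), (1.0, 1.0)


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def in_v_field(p: Point, width: float = 0.15) -> bool:
    """Homonym-like field: two thin arms that meet only at the apex (a V)."""
    return (dist_to_segment(p, APEX, LEFT) <= width
            or dist_to_segment(p, APEX, RIGHT) <= width)


def in_filled_field(p: Point, eps: float = 1e-9) -> bool:
    """Polysemy-like field: the arms plus the interstitial region (a triangle)."""
    def cross(q: Point, a: Point, b: Point) -> float:
        return (q[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (q[1] - b[1])
    signs = [cross(p, APEX, LEFT), cross(p, LEFT, RIGHT), cross(p, RIGHT, APEX)]
    # Inside (or on the edge) when the signs never disagree.
    return not (any(s < -eps for s in signs) and any(s > eps for s in signs))


def midpoint_closure(member: Callable[[Point], bool]) -> float:
    """Share of member pairs whose midpoint is also a member (1.0 if convex)."""
    grid = [(i / 10, j / 10) for i in range(-12, 13) for j in range(-2, 13)]
    pts = [p for p in grid if member(p)]
    pairs = list(combinations(pts, 2))
    inside = sum(member(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)) for a, b in pairs)
    return inside / len(pairs)


print("V-shaped field:", round(midpoint_closure(in_v_field), 3))
print("filled field:  ", round(midpoint_closure(in_filled_field), 3))
```

The filled field scores exactly 1.0, while the V-shaped field falls below 1.0: for instance, the midpoint between its two meaning endpoints lies in the empty gap between the arms, which is the geometric sense in which an ambiguous word's lexical field is nonconvex.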
However, not all ambiguous words have meanings that are unrelated to
one another, like bug. Take, for example, the verb dusted in Sentence (2.1)
below.
(2.1) The chef dusted the cake with powdered sugar, but then the maid
dusted it clean.
The verb dusted is typically referred to as polysemous, rather than ambig-
uous, because its different meanings/usages are at least somewhat seman-
tically related to one another (Gibbs & Matlock, 2001). Yet instead of
treating ambiguous words and polysemous words as if they were cate-
gorically different phenomena, a dynamical-systems state space descrip-
tion allows one to visualize the graded similarity between the two
phenomena. Figure 2.1(b) shows how the lexical field for dusted would
be somewhat similar to that for bug but notably different in that the
semantic regions used for its different meanings are spatially contiguous
with one another, allowing for blends across that semantic spectrum.

[Figure 2.1: each panel plots phonological dimension(s) against semantic dimension(s); (a) “bug” with (insect) and (device) regions; (b) “dusted” with (baking) and (cleaning) regions; (c) “stup...” with (dumb) and (amazing) regions]
Figure 2.1 Theory visualizations of lexical fields in linguistic state space:
(a) lexical ambiguity involves a highly nonconvex shape that covers
unrelated regions of semantic space; (b) polysemy involves a relatively
more convex shape that includes interstitial regions of semantic space;
and (c) temporary phonological ambiguity, as with cohorts, often
involves a highly nonconvex shape again, one that heavily depends on
temporal dynamics

In addition to ambiguous words and polysemous words, another form
of lexical ambiguity arises temporarily during the first couple of hundred
milliseconds of hearing a spoken word. For example, halfway through
hearing the word candle, a listener will briefly exhibit partial activation of
a similar-sounding cohort word like candy (e.g., Marslen-Wilson, 1987;
McClelland & Elman, 1986) and will even look at a picture of a candy
before finally looking at the target object, a candle (Allopenna,
Magnuson, & Tanenhaus, 1998; Spivey-Knowlton, 1996). Similarly, if one
reads Sentence (2.2) out loud, a listener may find that the context leading
up to the first syllable of the final word steers them somewhat toward
expecting the word stupid rather than stupendous; Figure 2.1(c) provides
a rough sketch of what that temporally dynamic lexical field might look
like in linguistic state space.
(2.2) In his ridiculous costumes, Sacha Baron Cohen looks just totally
stupendous.
We will revisit those temporally dynamic lexical fields later in our dis-
cussion, after we have reviewed some of the literature on how connec-
tionist models of bilingualism might address lexical ambiguity and the
literature on how bilinguals actually process spoken words. For now, we
return to temporally static treatments of linguistic state space.
When one considers the wide range of idiosyncratic linguistic experi-
ences that each language user undergoes, it seems clear that the topology
of any one person’s linguistic state space will be at least subtly different
from everyone else’s. Individual differences account for a substantial
amount of the variance in language learning and processing in both
monolinguals and bilinguals (e.g., Dörnyei, 2005; Grosjean, 1994). For
example, lexical ambiguity resolution has been shown to function rather
differently for people with high vs. low memory spans (Miyake, Just, &
Carpenter, 1994). (However, some of the variance attributed to memory
span might instead be explained by degree of language experience;
MacDonald & Christiansen, 2002.) People with high memory spans are
able to understand the correct meaning of boxer in Sentence (2.3) more
readily than people with low memory spans.
(2.3) Since Ken really liked the boxer, he took a bus to the pet store to
buy the animal.
Someone with a low memory span (or limited language experience with
English) might have a lexical field for the word boxer that, functionally
speaking, spans only a narrow, relatively convex range of semantic
space (Figure 2.2(a)). Thus, when reading the word boxer, that person
might automatically settle into the pugilist meaning of the word and then
encounter some difficulty understanding the rest of Sentence (2.3). By
contrast, someone with a high memory span (or extensive language
experience with English) might have a lexical field for boxer that stretches
out into a variety of regions of linguistic state space (Figure 2.2(b)).
Therefore, when reading the word boxer, that person might not settle
too deeply into any one tendril of that lexical field; and when the rest of
the sentence finally provides the disambiguating context, they are ready
and able to transition into the contextually appropriate region of semantic
space.

[Figure 2.2: panels (a) and (b) each show the lexical field for “boxer” in semantic dimensions, among regions labeled pugilism, packaging, factory, and dog breeds; panel (b) adds more semantic dimensions]
Figure 2.2 Individual differences in lexical fields: (a) a person with low
memory span or limited English experience would have a functionally
narrow lexical field for the word boxer, whereas (b) a person with high
memory span or extensive English experience would have a more
tentacular lexical field for boxer, with tendrils that stretch into a variety
of semantic spaces

Like memory span and language experience, contextual diversity will
also introduce substantial individual differences in the topology of this
linguistic state space. Contextual diversity measures how widely a word’s
occurrences are spread across substantially different contexts. Take, for example,
the words posterior and piglet. They both have the same overall lexical
frequency: 240 occurrences each in a 560-million-word corpus.

[Figure 2.3: two panels plotted over contexts/genres (academic, fiction, newspapers, magazines); (a) the lexical field for “posterior”; (b) the lexical field for “piglet”]
Figure 2.3 Contextual diversity of lexical fields: (a) one region of
linguistic-genre space in which the lexical field for posterior shows itself
to be nondiverse and rather convex; (b) another region of the same space
in which the lexical field for piglet stretches itself nonconvexly into
diverse contexts

Therefore, traditional approaches in psycholinguistics would predict that
these two words should exhibit equal latency in reading and reaction time
tasks (e.g., Forster & Chambers, 1973). However, about 60 percent of the
occurrences of posterior take place in academic texts, while only 10–15 per-
cent of its occurrences are in fiction, magazine, and newspaper contexts
each – and it almost never shows up in spoken contexts. Therefore, con-
textually speaking, posterior is a relatively nondiverse word. In our linguistic
state space framework, this would mean that its lexical field is relatively
simple and convex (Figure 2.3(a)). As a result, if the language system started
out in a random or neutral location in state space, and was forced to
traverse its way to the region for posterior, it might have a long distance to
travel, thus producing a somewhat long response time. By contrast, the
word piglet has a much more evenly distributed pattern of occurrences
across these different contexts. Only 40 percent of its occurrences take
place in fiction, about 20 percent each in magazines and newspapers, and
10 percent each in academic and spoken contexts. Therefore, if the lan-
guage system started out in a random or neutral location in state space, and
was forced to travel to the piglet region, it would likely have a relatively short
distance to travel, and thus produce a short response time – even though it
has the same overall lexical frequency as posterior.
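One way to put a number on the contrast between posterior and piglet is the normalized Shannon entropy of each word’s distribution across genres: 0 when every occurrence falls in one genre, 1 when occurrences are spread evenly. This is our illustration, not necessarily the contextual-diversity measure used in the studies discussed here, and the point values below are assumptions chosen inside the ranges quoted in the text (each distribution sums to 1).

```python
import math

# Assumed genre proportions, chosen within the ranges given in the text.
POSTERIOR = {"academic": 0.60, "fiction": 0.13, "magazines": 0.13,
             "newspapers": 0.13, "spoken": 0.01}
PIGLET = {"academic": 0.10, "fiction": 0.40, "magazines": 0.20,
          "newspapers": 0.20, "spoken": 0.10}


def genre_entropy(dist: dict[str, float]) -> float:
    """Normalized Shannon entropy of a genre distribution (0 = one genre, 1 = even)."""
    assert math.isclose(sum(dist.values()), 1.0), "proportions must sum to 1"
    h = -sum(p * math.log2(p) for p in dist.values() if p > 0)
    return h / math.log2(len(dist))


for word, dist in (("posterior", POSTERIOR), ("piglet", PIGLET)):
    print(f"{word}: {genre_entropy(dist):.2f}")
```

With these assumed values, posterior scores about 0.71 and piglet about 0.91, even though both words have the same raw frequency; a frequency count alone cannot separate them.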
That data pattern is exactly what Adelman, Brown, and Quesada
(2006) found when they reanalyzed the data from six word identification
experiments. Contextual diversity predicted fast and slow response times
more robustly than did lexical frequency. Results like this have been
replicated and extended to word learning (Hills et al., 2010; Johns,
Dye, & Jones, 2016) and to eye movement measures of whole sentence
reading (Chen et al., 2017; Plummer, Perea, & Rayner, 2014). After
decades of assuming that word frequency was a bedrock foundation for
psycholinguistics, it now appears that the language system does not
actually care how many times a lexical representation has been
instantiated; it cares how far in state space it has to travel right now
in order to reach that lexical representation.
Given these complex transformations of state space that are generated
by individual differences in working memory, or language experience, or
contextual diversity, just imagine the transformations that must take
place as a result of being bilingual. Rather than assuming that bilinguals
process lexical ambiguity in some categorically different way than mono-
linguals do, perhaps this graded range of idiosyncratic state space topol-
ogies (in Figures 2.1–2.3) allows one to consider a bilingual’s linguistic
state space as just another variety of these kinds of individual differences –
but an especially interesting one, to be sure. In this framework, almost
every word that a bilingual hears will have a few extra tendrils in its lexical
field, compared to a monolingual, that provide potential branchings-off
into different regions of linguistic state space. Now that we are equipped
with theory visualizations for the kinds of shapes that lexical ambiguity
can take in the state space of the language system, we turn to discussing
connectionist models of lexical ambiguity resolution and of bilingualism.

Parallel Distributed Processing Models of Word Recognition
The bilingual interactive activation (BIA) and BIA+ models (Dijkstra & van
Heuven, 2002; van Heuven, Dijkstra, & Grainger, 1998) are extensions
of an earlier parallel distributed processing (PDP) model – the interactive
activation model (IAM) (McClelland & Rumelhart, 1981). Therefore, to
better understand how the BIA models function, it is worthwhile to first
discuss the IAM and related PDP models more generally. To that end,
this section describes the structure and mechanisms of PDP models in the
simplest case of unambiguous word recognition by monolinguals. Then,
the following section describes how PDP models account for lexical
ambiguity resolution. We then go on to describe how the BIA models
account for bilingual-specific phenomena involving homographs, homo-
nyms, cognates, and interlingual cohorts.
The IAM (McClelland & Rumelhart, 1981; Rumelhart & McClelland,
1982) is a multilevel connectionist architecture originally designed to
simulate the word superiority effect, a classic perceptual phenomenon
whereby identification of visually presented letters is faster when the
letters are inside words rather than nonwords (McClelland & Johnston,
1977). The logic underlying the IAM is that recognition of letters begins
first with recognition of basic visual features, followed by activation of
letters containing those features, which in turn activates words containing
those letters. The IAM simulates the word superiority effect by allowing
feedback connections from words to letters, such that recognition of
letters is facilitated when words become active.
Structurally, the IAM includes three layers of interconnected nodes:
a feature layer, a letter layer, and a word layer (see Figure 2.4(a)). The solid
lines with arrows indicate excitatory connections, while the dashed lines with
circles represent inhibitory connections. Current activation of each node is
represented by the thickness of the border around the node. The feature
layer contains nodes that become active in the presence of specific, simple
visual features, analogous to orientation-selective cells in the visual
system. The letter and word layers contain nodes corresponding to all of the
known letters and words, respectively. Letter position within a string may
also be encoded, such that there is a node for each letter at each possible
position. Individual feature detectors have excitatory connections with
every letter in which they are found. Individual letters, in
turn, have excitatory connections with every word in which they are
found. Meanwhile, each word has inhibitory connections with all other
words; and, crucially, there are both excitatory and inhibitory feedback
connections from the word layer to the letter layer, such that word nodes
send excitation to any letter they contain and inhibition to any they do
not. It is these feedback connections that allow the model to reproduce
the word superiority effect.
On presentation of a visual stimulus, the features contained in the
string of letters first become active. The feature nodes then pass their
activation to nodes representing letters at their specified location in

the string. This creates a set of candidate letters that the system is
“considering,” consisting of all letters containing features present in
the input. The letter nodes then pass activation to words that are
consistent with those letters. Competition among words, via their
mutual inhibitory connections, results in the most highly consistent
word (or words, in the case that there is ambiguity) becoming more
active, while all other words are suppressed. The active word nodes
then pass activation to the letters they contain. In this way, letters that
are presented in the context of a word receive additional activation,
relative to letters in nonwords, which is how the model accounts for
the word superiority effect.
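The interactive dynamics just described can be sketched in a few dozen lines. This is a deliberately stripped-down illustration under our own assumptions, not McClelland and Rumelhart’s implementation: it omits the feature layer (presented letters receive input directly), uses a hypothetical five-word lexicon, and uses illustrative parameter values (the dictionary `P`). The update rule follows the general interactive-activation form, in which positive net input drives a node toward its maximum and negative net input toward its minimum, with decay toward a resting level of 0.

```python
# Toy lexicon and parameter values are illustrative assumptions only.
LEXICON = ["ROOM", "BOOM", "READ", "WORK", "FORK"]
ALPHABET = sorted(set("".join(LEXICON)) | set("QXZ"))
P = {"input": 0.4, "lw_exc": 0.07, "lw_inh": 0.04, "ww_inh": 0.21,
     "wl_exc": 0.3, "wl_inh": 0.1, "decay": 0.07, "max": 1.0, "min": -0.2}


def out(a: float) -> float:
    """Only positive activation is passed on to other nodes."""
    return max(a, 0.0)


def update(a: float, net: float) -> float:
    """Push activation toward max (net > 0) or min (net <= 0), decaying to rest 0."""
    if net > 0:
        a += net * (P["max"] - a) - P["decay"] * a
    else:
        a += net * (a - P["min"]) - P["decay"] * a
    return min(P["max"], max(P["min"], a))


def run(stimulus: str, cycles: int = 30):
    """Simulate position-specific letter nodes and word nodes for a 4-letter string."""
    positions = range(len(stimulus))
    letters = {(i, c): 0.0 for i in positions for c in ALPHABET}
    words = {w: 0.0 for w in LEXICON}
    for _ in range(cycles):
        word_net = {}
        for w in LEXICON:
            support = sum(out(letters[i, w[i]]) for i in positions)
            conflict = sum(out(a) for (i, c), a in letters.items() if c != w[i])
            rivals = sum(out(words[v]) for v in LEXICON if v != w)  # word-word inhibition
            word_net[w] = (P["lw_exc"] * support - P["lw_inh"] * conflict
                           - P["ww_inh"] * rivals)
        letter_net = {}
        for (i, c) in letters:
            # Top-down feedback: excite contained letters, inhibit the rest.
            feedback = sum(out(words[w]) * (P["wl_exc"] if w[i] == c else -P["wl_inh"])
                           for w in LEXICON)
            bottom_up = P["input"] if stimulus[i] == c else 0.0
            letter_net[i, c] = bottom_up + feedback
        # Synchronous update: all nets were computed from last cycle's state.
        letters = {k: update(a, letter_net[k]) for k, a in letters.items()}
        words = {w: update(a, word_net[w]) for w, a in words.items()}
    return letters, words


in_word, _ = run("ROOM")
in_nonword, _ = run("RQXZ")
print("R in ROOM:", round(in_word[0, "R"], 3), "| R in RQXZ:", round(in_nonword[0, "R"], 3))
```

In this toy run, the letter R ends up more active inside the word ROOM than inside the nonword RQXZ, because only in the word case does a word node (ROOM) become active and feed excitation back down to its letters.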
While the original IAM dealt primarily with visual word recognition,
the TRACE model (McClelland & Elman, 1986; and its reimplementa-
tion, jTRACE: Strauss, Harris, & Magnuson, 2007) extended the same
principles to model spoken word recognition. In TRACE, the letter layer
of the original IAM is replaced with a phoneme layer, and the feature layer
now consists of nodes responding gradiently to various acoustic dimen-
sions rather than visual features. Since speech unfolds over time, the input
to TRACE is a sequence of acoustic features, which stands in contrast to
the way that the visual IAM is presented with all visual information
simultaneously. As a result of this sequential presentation, even unambig-
uous speech input is temporarily ambiguous at the word level: Onsets are
consistent with many possible words and, as more of the input is received,
the pool of consistent words is narrowed until finally the offset leaves only
a single candidate.
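The narrowing of the candidate pool can be illustrated with a few lines of code. This sketch is ours: letters stand in for phonemes, the small lexicon is hypothetical, and candidates are filtered with strict, all-or-none prefix matching (the Cohort-style idealization discussed in the next paragraph), whereas TRACE itself tracks graded activation over time.

```python
# Hypothetical mini-lexicon; letters stand in for phonemes.
LEXICON = ["candle", "candy", "can", "cancel", "handle", "sandal"]


def candidates(heard: str, lexicon: list[str]) -> list[str]:
    """Words still consistent with every segment heard so far."""
    return [w for w in lexicon if w.startswith(heard)]


word = "candle"
for t in range(1, len(word) + 1):
    print(f"{word[:t]:<7}{candidates(word[:t], LEXICON)}")
```

The pool shrinks from four candidates at the onset to two (candle, candy) after the fourth segment and to one at the offset. Note that the rhyme handle never enters the pool under this strict rule.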
In this way, TRACE captures the predictions of the Cohort model of
speech processing (Marslen-Wilson, 1987), which held that lexical
access occurs as a sequential search by method of elimination.
Importantly, lexical access in Cohort is all-or-none in that words that
are inconsistent with an onset are eliminated from consideration. As
a result, the Cohort model cannot recover in the case of degraded
information or mispronunciations. In contrast, TRACE is a continuous
mapping model, meaning that activation flows continuously between
layers, such that a given word unit can still receive activation, even if
some part of the input is inconsistent with it. As a result, TRACE
provides a better fit to the behavioral data, which show, for example,
that listeners partially activate rhyme-cohorts that have a different
onset (e.g., making eye fixations to a speaker when the spoken input
is beaker; Allopenna et al., 1998). It is worth noting, however, that
TRACE does not provide a perfect fit to behavioral data: There is also
evidence that listeners partially activate anadromes – words with the
same phonemes in a different order (e.g., making eye fixations to a sub
when the input is bus; Toscano, Anderson, & McMurray, 2013).
TRACE encodes information about temporal ordering by including
copies of each phoneme node corresponding to each of the possible
positions in an input stream (which can be similarly implemented with
letter position in the IAM), but the aforementioned results suggest
this might not be perfectly representative of human speech processing.
Still, TRACE has stood the test of time as one of the best general
models of speech processing and is able to capture a wide range of
phenomena related to lexical ambiguity resolution, as we will discuss
in more detail in the next section.
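The difference between all-or-none elimination and continuous mapping can be sketched by scoring each word in two ways. The scoring rule is our simplifying assumption, not TRACE’s actual mechanism (TRACE uses interactive activation over time-specific units): here a word’s graded support is just the share of heard segments that match it at the same position, with letters standing in for phonemes and a hypothetical lexicon. The input candre is our invented mispronunciation of candle.

```python
# Hypothetical mini-lexicon; letters stand in for phonemes.
LEXICON = ["candle", "handle", "candy", "cancel", "sandal"]


def all_or_none(heard: str, lexicon: list[str]) -> dict[str, float]:
    """Cohort-style: 1.0 if the word matches every segment heard, else 0.0."""
    return {w: 1.0 if w.startswith(heard) else 0.0 for w in lexicon}


def graded(heard: str, lexicon: list[str]) -> dict[str, float]:
    """Toy continuous mapping: share of heard segments matching at the same position."""
    return {w: sum(a == b for a, b in zip(heard, w)) / len(heard) for w in lexicon}


for heard in ("candle", "candre"):  # "candre" is a mispronounced "candle"
    print(heard, "all-or-none:", all_or_none(heard, LEXICON))
    print(heard, "graded:     ", {w: round(s, 2) for w, s in graded(heard, LEXICON).items()})
```

Under the all-or-none rule, the rhyme handle gets no support for candle, and the mispronounced candre leaves no candidate at all. Under the graded rule, handle still receives partial support (5 of 6 segments), and candle remains the best match for candre, which is the kind of recovery the continuous mapping property allows.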
These PDP models provide the foundation for the BIA models. The
earliest form of the BIA model (van Heuven et al., 1998) extended the
orthographic-only IAM (McClelland & Rumelhart, 1981) by adding two
lexicons (one per language) and language nodes (see Figure 2.4(d)). The later BIA+
model (Dijkstra & van Heuven, 2002) made the conceptual addition of
phonological encoding, similar to that of TRACE (McClelland & Elman,
1986). The continuous mapping property of these interactive activation
models means that even unambiguous stimuli result in temporary uncer-
tainty in the network. As a result, these models extend gracefully to
ambiguous inputs, which are dealt with in the same fashion as unambig-
uous ones. In the next section, we examine the behavior of PDP models in
the specific case of lexical ambiguity resolution, and we show that they
again provide a robust fit to behavioral data.

Lexical Ambiguity Resolution in PDP Models


Early behavioral evidence in lexical ambiguity resolution showed that
both meanings of an ambiguous word appear to become active at least
briefly, irrespective of preceding context, suggesting that lexical access
occurs first in a context-free stage of processing, followed by a second
stage of processing that integrates context (Swinney, 1979; Tanenhaus
et al., 1979). Later findings challenged those results. For example,
Tabossi (1988) demonstrated that in sentential contexts that are suffi-
ciently biasing, the contextually appropriate meaning of a homograph is
selectively activated. In the parlance of Figure 2.1(a), one can imagine
both a weak context that places the system in a region of the bug lexical
field that is roughly equidistant from its two semantic endpoints and
a strong context that places the system already deep into one of those
semantic endpoints. Vu, Kellas, and Paul (1998) extended Tabossi’s
findings by showing that multiple sources of contextual bias can be seen
to influence lexical access independently, such that priming of a target
word is influenced by a convergence of biases from multiple cues. These
and other studies began to swing the balance of evidence in favor of a PDP
type of account, and there is by now a very large body of work demon-
strating continuous bidirectional interaction between subsystems of the
language system (for review, see Spevack et al., 2018).
PDP models are able to capture the general pattern of behavioral data
regarding lexical ambiguity resolution. To understand how, let us con-
sider Kawamoto’s (1993) influential PDP model of lexical ambiguity
resolution, shown in Figure 2.4(b). In contrast to the IAM and TRACE
models discussed in the previous section, Kawamoto’s model makes use
of distributed rather than localist representations. Localist models have
a one-to-one mapping between nodes and represented entities as well as
hard-coded connections between entities. For example, the IAM and
TRACE have a single node for each feature, letter, and word, and the
connections between them are specified by the modeler in advance. In
those models, access of a lexical entry corresponds to activation of the
corresponding lexical node, and hence these models make it simple to
compare the activity of multiple word nodes over time.
Distributed models, on the other hand, encode representations as
a pattern of activity across many neurons that represent various features
or microfeatures. In Kawamoto’s (1993) model, each lexical entry corre-
sponds to a vector of activation values for 216 nodes, which are meant to
capture all features of a word: The first 48 nodes encode visual features
that define the orthography of the word; the next 48 nodes encode
phonetic features in specified positions, corresponding to pronunciation;
the next 24 nodes encode part-of-speech; and the last 96 nodes encode
meaning. While the total pattern across all nodes is unique with respect to
each lexical entry, each individual feature (meaning each possible value
for any of the 216 nodes) is consistent with several lexical entries. As
a result, the representation of each lexical entry is partially overlapping
with several other entries.
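The layout of these 216-node vectors can be sketched directly. The field sizes follow the description above; everything else is our assumption: features take +1/-1 values (in the spirit of the +/- input vector in Figure 2.4(b)), the patterns are random placeholders rather than Kawamoto’s actual feature codes, and the helper names are hypothetical. Two entries for an ambiguous word such as bug share their spelling, pronunciation, and part-of-speech fields but have independent meaning fields, so they overlap far more than two unrelated entries do.

```python
import random

# Field sizes from the text; +1/-1 feature values are an assumption.
FIELDS = {"spelling": 48, "phonology": 48, "part_of_speech": 24, "meaning": 96}
SIZE = sum(FIELDS.values())  # 216 nodes in total


def random_field(n: int, rng: random.Random) -> list[int]:
    return [rng.choice((-1, 1)) for _ in range(n)]


def make_entry(rng: random.Random, **shared: list[int]) -> list[int]:
    """Concatenate the four fields, reusing any field passed by name in `shared`."""
    vec: list[int] = []
    for name, n in FIELDS.items():
        vec += shared[name] if name in shared else random_field(n, rng)
    return vec


def field(vec: list[int], name: str) -> list[int]:
    """Slice one named field out of a full 216-node vector."""
    start = 0
    for f, n in FIELDS.items():
        if f == name:
            return vec[start:start + n]
        start += n
    raise KeyError(name)


def overlap(a: list[int], b: list[int]) -> float:
    """Proportion of nodes on which two entries agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
bug_insect = make_entry(rng)
bug_device = make_entry(rng, **{f: field(bug_insect, f)
                                for f in ("spelling", "phonology", "part_of_speech")})
unrelated = make_entry(rng)
print(len(bug_insect), round(overlap(bug_insect, bug_device), 2),
      round(overlap(bug_insect, unrelated), 2))
```

The two bug entries agree on at least the 120 shared form and part-of-speech nodes, plus roughly half of the meaning nodes by chance, whereas two unrelated random entries agree on only about half of all 216 nodes.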
Another important difference between distributed and localist models
is that, in the former, the strength of connections between nodes must be
learned by the network, rather than coded by the researcher. Kawamoto’s
model is fully connected, meaning there are bidirectional links between
each of the 216 nodes. While it would, of course, be infeasible to manually
code all connections in a network of this size, this property of distributed
networks is actually a feature and not a bug: These models are intended to
capture developmental phenomena by teaching a lexicon to the network.
The network begins with connection strengths of 0 between all nodes.
During training, lexical entries (vectors of 216 activation values) are
presented to the network, which spreads activation according to its con-
nection strengths, eventually settling into a stable activity pattern.
Initially, this stable pattern will not match the target pattern correspond-
ing to the lexical entry, so an error correction algorithm is used to modify
the connection strengths after each training trial, bringing the output
closer to the target pattern. After training, features that co-occur in
a word develop stronger connections, such that when some subset of
a word’s features is presented to the network (e.g., only orthography
or pronunciation), the full pattern of activity for that lexical entry may
emerge in the network. A lexical entry has been accessed by the network
when the full pattern of activity settles into a stable state that matches
some lexical entry.
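The training-and-completion cycle can be sketched with a simple auto-associator trained by the delta (error-correction) rule. This is our simplification, not a reimplementation of Kawamoto’s network: it is scaled down to 36 nodes (keeping the 48:48:24:96 proportions as 8:8:4:16), it uses mutually orthogonal +1/-1 patterns built from Hadamard rows so that a single pass of activation suffices, and it reads out the sign of each node instead of letting the network settle iteratively. As in the text, the weights start at 0 and are nudged toward the target after each trial.

```python
# Scaled-down field sizes (assumption): 1/6 of 48:48:24:96.
FIELDS = {"spelling": 8, "phonology": 8, "part_of_speech": 4, "meaning": 16}
N = sum(FIELDS.values())  # 36 nodes


def hadamard(n: int) -> list[list[int]]:
    """Sylvester construction: mutually orthogonal +1/-1 rows (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def make_patterns(k: int = 4) -> list[list[int]]:
    """Toy lexical entries whose fields are distinct Hadamard rows (all orthogonal)."""
    h8, h4, h16 = hadamard(8), hadamard(4), hadamard(16)
    return [h8[i] + h8[7 - i] + h4[i] + h16[2 * i + 1] for i in range(k)]


def train(patterns: list[list[int]], lr: float = 0.01, epochs: int = 50) -> list[list[float]]:
    """Delta rule from all-zero weights: W += lr * (target - W p) p^T."""
    w = [[0.0] * N for _ in range(N)]
    for _ in range(epochs):
        for p in patterns:
            y = [sum(w[i][j] * p[j] for j in range(N)) for i in range(N)]
            for i in range(N):
                err = p[i] - y[i]  # error between target and network output
                for j in range(N):
                    w[i][j] += lr * err * p[j]
    return w


def complete(w: list[list[float]], cue: list[float]) -> list[int]:
    """Spread activation once from a partial cue and read out each node's sign."""
    net = [sum(w[i][j] * cue[j] for j in range(N)) for i in range(N)]
    return [1 if v > 0 else -1 for v in net]


patterns = make_patterns()
weights = train(patterns)
spell = FIELDS["spelling"]
for p in patterns:
    cue = p[:spell] + [0] * (N - spell)  # spelling only, everything else silent
    print(complete(weights, cue) == p)
```

Each line should print True: presenting only the spelling field recovers the pronunciation, part of speech, and meaning of the trained entry. With non-orthogonal (more realistic) patterns, crosstalk between entries would make iterative settling necessary, which this sketch deliberately avoids.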
The behavior of the network can be best understood as operating in
a high-dimensional state space, where each node serves as a dimension,
and the activation of all nodes is a set of coordinates that describes the
location of the system in the state space at that point in time. When the
network is presented with an ambiguous word, its location in the state
space (i.e., its activation pattern) moves in a direction that is somewhat
toward both regions that belong to the two meanings of that word.
Gradually, as context and other factors bias the system’s interpretation
of this ambiguous word, the trajectory will curve toward the region in
state space that corresponds to the contextually appropriate meaning.
This nonlinear trajectory of the system, as it moves through state space,
can be mathematically described as following along the contours of an
energy landscape that is imposed on the volume of the state space by
external inputs, context, and its neural connectivity pattern of excitatory
and inhibitory synapses. This energy landscape describes how certain
regions of state space have a strong attracting force and other regions
may have a weak attracting force. Interspersed among these basins of
attraction in the state space are other regions that repel the system away
from them (peaks in the energy landscape). Each of the simplified sketches
of basins of attraction in Figures 2.1–2.3 has an associated energy
landscape that makes some portions of a basin more strongly attracting
and other portions less so. For instance, Figure 2.4(c) shows an example of an
energy landscape in which the state space of the system corresponds to
the two-dimensional floor of the three-dimensional plot, and the height
dimension corresponds to the potential energy of the system. Much like
a marble rolling under gravity and momentum, the state of the system
(indicated as a black circle on the manifold surface of Figure 2.4(c)) rolls
down the energy landscape’s nonlinear slopes and settles into an attractor
basin (which corresponds to a region of state space that belongs to a word’s
meaning).
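The marble image can be made concrete in a few lines of code. The following is a minimal sketch, not Kawamoto’s (1993) network: it assumes a two-dimensional state space, two Gaussian-shaped basins standing in for a dominant (deeper) and a subordinate (shallower) meaning, and plain gradient descent as the settling rule. The basin positions, depths, the width S, and the step size are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical 2-D energy landscape: two Gaussian basins (assumed shapes and values).
S = 0.8  # basin width
BASINS = [
    ((1.0, 1.0), 1.0),   # dominant meaning: deeper basin
    ((-1.0, 1.0), 0.6),  # subordinate meaning: shallower basin
]


def energy(x, y):
    """E(x, y) = -sum_i depth_i * exp(-|p - c_i|^2 / (2 S^2)); lower means more attracting."""
    return -sum(d * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * S ** 2))
                for (cx, cy), d in BASINS)


def gradient(x, y):
    """Analytic gradient of energy() with respect to (x, y)."""
    gx = gy = 0.0
    for (cx, cy), d in BASINS:
        g = d * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * S ** 2)) / S ** 2
        gx += g * (x - cx)
        gy += g * (y - cy)
    return gx, gy


def settle(start=(0.0, 0.0), lr=0.1, tol=1e-4, max_steps=10_000):
    """Roll downhill from start and return the visited states (the trajectory)."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        gx, gy = gradient(x, y)
        if math.hypot(gx, gy) < tol:  # the slope is (nearly) flat: the system has settled
            break
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


path = settle()
print("first step:", path[1])
print("settled at:", path[-1], "after", len(path) - 1, "steps")
```

Starting from a neutral state midway between the two basins, the first step heads mostly straight up, partway toward both meanings, and only afterward does the trajectory curve into the deeper basin, which is the nonlinear path described above.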
[Figure 2.4 image: (a) word, letter, and feature layers with letter input R; (b) input vector with phonology, meaning, spelling, and part-of-speech fields; (c) energy landscape with dominant and subordinate meaning basins; (d) L1 and L2 language nodes above L1 and L2 words, with semantic (or other, e.g., sensorimotor), orthography, phonology, visual feature, and acoustic feature layers receiving visual word-form and speech input.]

Figure 2.4 (a) McClelland and Rumelhart’s (1981) interactive activation
model processing the letter R; (b) Kawamoto’s (1993) PDP model of
lexical ambiguity resolution with a sample of all connections shown; (c)
an example energy landscape that determines the trajectory of a system
as it traverses its state space; and (d) Dijkstra and van Heuven’s (2002)
BIA+ model

In Kawamoto’s (1993) simulations, unambiguous words were recognized
(and settled in their energy landscapes) more quickly than biased
ambiguous words (words having one sense that is more common or
dominant than the other). This makes sense because an unambiguous
word will have only one attractor basin, and a biased ambiguous word
(Figure 2.4c) will have two attractor basins, resulting in some competition
or vacillation between those two regions in state space. Equi-biased
ambiguous words, however, were recognized even more slowly than the
biased ambiguous words: although biased ambiguous words have two
attractor basins, one of them is much steeper (stronger) than the other.
By contrast, the equi-biased ambiguous words have two attractor
basins that are nearly equal in strength, so the system takes longer to
finally settle into one of them. Importantly, Kawamoto found that
sentence context has a differential effect on biased and equi-biased
ambiguous words. With equi-biased ambiguous words, context was highly
effective at tipping the balance and causing the system to settle into the
contextually appropriate attractor basin. However, with biased ambiguous
words, only a very strongly biasing context could push the system
toward the less common (or subordinate) meaning of that word.
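Kawamoto’s ordering (unambiguous faster than biased, biased faster than equi-biased) and the asymmetric effect of context can be reproduced qualitatively with an even simpler, one-dimensional landscape. In the sketch below, which is an illustrative assumption rather than Kawamoto’s network, position +1 stands for the dominant meaning A and -1 for the subordinate meaning B; each meaning is a Gaussian basin whose depth encodes its relative frequency, and sentence context is modeled, hypothetically, as a shift of the starting state toward B. A tiny offset EPS stands in for noise, because an equi-biased word started exactly at 0 would sit on the ridge between its two basins indefinitely.

```python
import math

S = 0.5     # assumed basin width (narrow enough that x = 0 is a ridge between equal basins)
EPS = 1e-3  # tiny offset standing in for noise


def grad(x, depth_a, depth_b):
    """Gradient of E(x) = -depth_a*g(x, +1) - depth_b*g(x, -1), with Gaussian wells g."""
    g_a = math.exp(-(x - 1.0) ** 2 / (2 * S ** 2))
    g_b = math.exp(-(x + 1.0) ** 2 / (2 * S ** 2))
    return (depth_a * g_a * (x - 1.0) + depth_b * g_b * (x + 1.0)) / S ** 2


def settle(depth_a, depth_b, x0=EPS, lr=0.05, tol=0.05, max_steps=10_000):
    """Descend from x0; return (steps, meaning) once within tol of a basin center."""
    x = x0
    for step in range(max_steps):
        if abs(x - 1.0) < tol:
            return step, "A"
        if abs(x + 1.0) < tol:
            return step, "B"
        x -= lr * grad(x, depth_a, depth_b)
    return max_steps, None  # never settled


WORDS = {
    "unambiguous": (1.0, 0.0),  # only meaning A has a basin
    "biased": (1.0, 0.4),       # A dominant, B subordinate
    "equi-biased": (1.0, 1.0),  # two equally deep basins
}

for name, depths in WORDS.items():
    print(name, settle(*depths))

for label, x0 in [("weak context toward B", -0.05), ("strong context toward B", -0.3)]:
    print(label, "| biased ->", settle(*WORDS["biased"], x0=x0)[1],
          "| equi-biased ->", settle(*WORDS["equi-biased"], x0=x0)[1])
```

With these values the unambiguous word settles fastest and the equi-biased word slowest, and a weak contextual push is enough to tip the equi-biased word into meaning B but not the biased word, which reaches its subordinate meaning only under the strong context. The outcomes depend on the assumed parameters; they illustrate the logic, not Kawamoto’s reported numbers.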
The work reviewed here illustrates the power of PDP models for
explaining lexical ambiguity resolution. Through the imagery of a high-
dimensional state space, with an energy landscape determining its
dynamics, it becomes clear how these models can capture both delayed
effects of context (Swinney, 1979; Tanenhaus et al., 1979) and early
effects of context (Tabossi, 1988; Vu et al., 1998).

Bilingual Interactive Activation


Although early theories of bilingual language processing proposed that
bilinguals could selectively activate one of their languages and deactivate
the other (Macnamara & Kushnir, 1971), the behavioral data now
overwhelmingly support a parallel interactive account, with orthographic
or phonological input simultaneously activating representations in both
languages. For example, eye-tracking studies have shown that hearing
spoken words in one language can lead to eye fixations on a distractor
object whose name is phonologically similar in the task-irrelevant
language (Marian & Spivey, 2003a; Spivey & Marian, 1999). When
instructed to pick up the marker, Russian-English bilinguals frequently
look first at a stamp (called marka in Russian) before finally fixating the
marker (Spivey & Marian, 1999). Importantly, the magnitude of this
interlingual cohort effect is dependent on several factors, including language
experience (Weber & Cutler, 2004), phonetic featural similarity (Ju &
Luce, 2004), and recent use (Marian & Spivey, 2003b). Similar results
have been obtained for written input (De Groot, Delmaar, & Lupker,
2000; Dijkstra, Grainger, & van Heuven, 1999), with activation of
words in the irrelevant language being possible even when there is ortho-
graphic but no phonological overlap (Marian & Kaushanskaya, 2004) or
vice versa (Kaushanskaya & Marian, 2004). Furthermore, cross-
linguistic interference has been found to be dependent on the number
of orthographic neighbors of the target word in the nontarget language
(van Heuven, Dijkstra, & Grainger, 1998). Taken together, the
experimental evidence indicates that, for bilingual speakers, both
orthography and phonology can activate consistent words in both languages,
orthography activates phonology and vice versa, and there are important
roles for language history and stimulus characteristics (van Hell &
Tanner, 2012). As such, these results are consistent with a PDP account
of bilingual lexical processing, where multiple parameters are brought
together as dimensions in a high-dimensional state space (Onnis &
Spivey, 2012).
The BIA (van Heuven et al., 1998) and BIA+ (Dijkstra & van Heuven,
2002) were built on top of the original IAM (McClelland & Rumelhart,
1981) to deal with the case of bilingual language processing, in which
there are words from two or more languages that may overlap in features.
The BIA model (Figure 2.4d), like the IAM, includes layers with localist
nodes encoding features, letters, and words, respectively (although
a distributed-coding version of this model has been proposed; French,
1998; Jacquet & French, 2002). These layers work identically to those of
the IAM: feature nodes activate letters (in a specified position within the
word), nodes for letters in each position activate words with which they
are consistent, and word nodes have feedback connections with letter
nodes and inhibitory connections with all other word nodes. The BIA+
added layers for phonology and semantics (for simplicity,
hereafter our discussion will focus on this version of the model). The
lexicon, in the case of the BIA+, now includes words from two languages
instead of one. Importantly, this architecture models bilinguals as having
a unified lexicon: Letters activate words in both languages indiscrimi-
nately and words across languages retain inhibitory connections.
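The unified lexicon can be sketched as a single list of word nodes, each carrying a language tag that only the language layer reads. In the minimal sketch below, bottom-up letter evidence is computed IAM-style, position by position (+1 for a consistent letter, -1 for an inconsistent or missing one); these unit weights and the small toy lexicon are assumptions for illustration, not the BIA+’s published parameters. Because the scoring function never consults the language tag, identically spelled words in two languages receive identical bottom-up support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordNode:
    spelling: str
    language: str  # tag read only by the language layer, never by letter -> word links
    gloss: str


# Toy unified lexicon (hypothetical entries): both languages live in one list.
LEXICON = [
    WordNode("main", "EN", "primary"),
    WordNode("main", "FR", "hand"),
    WordNode("maid", "EN", "servant"),
    WordNode("actor", "EN", "actor"),
    WordNode("actor", "ES", "actor"),
]


def letter_support(letters: str, node: WordNode) -> int:
    """Position-specific evidence: +1 per consistent slot, -1 per inconsistent or empty slot."""
    score = 0
    for i in range(max(len(letters), len(node.spelling))):
        seen = letters[i] if i < len(letters) else None
        expected = node.spelling[i] if i < len(node.spelling) else None
        score += 1 if seen == expected else -1
    return score


for node in LEXICON:
    print(f"{node.spelling:6} {node.language}  {letter_support('main', node):+d}")
```

For the input main, the English and French main nodes tie at +4 (as the two actor nodes would for the input actor), so any preference between them has to come from elsewhere in the network: the language nodes or semantic context.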
The most important difference between the IAM and BIA+ lies in the
addition of a top-most language layer. This layer includes two nodes – one
for each language – that have bidirectional excitatory connections with all
words in that language and inhibitory connections with all words in the
other language. This layer models the concept of a language mode, as
suggested by Grosjean (2001), whereby recent exposure to one language
will prime that language, resulting in processing costs when switching
languages (Altarriba et al., 1996; Meuter & Allport, 1999).
While lexical ambiguity in the monolingual case refers to intralingual
homonyms, homographs, and homophones, bilingual models need to
account for the addition of interlingual ambiguities as well. Cross-
language lexical ambiguities are functionally represented in the BIA
models by the inclusion of two separate word nodes, one in each lan-
guage, which differ in some of their connections to the orthography,
phonology, and semantic nodes, and exclusively activate their respective
language nodes. For the present purposes, interlingual ambiguities can be
divided into three classes. First, cognates are pairs of words that have the
same spelling and meaning in two languages. For example, actor has the
same meaning in English and Spanish but slightly different phonology. In
the model, the two word nodes corresponding to a pair of cognates will
have the same connections to the orthography and semantic nodes and
some of the same connections to the phonology layer (depending on the
degree of phonological similarity across the two languages). Orthographic
input to the model will activate both words equally, which will then
mutually inhibit each other via the inhibitory connections between all
words. Hence, this type of ambiguity cannot be resolved without help
from the language nodes. Prior unambiguous input to the model in one of
the two languages will selectively activate the corresponding language
node, which then acts to inhibit all word nodes in the opposing language.
This alters the starting activation levels of the word nodes, allowing the
node in the relevant language to more strongly inhibit its counterpart and
win the competition.
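The cognate case can be simulated with two word nodes and the interactive-activation update rule of McClelland and Rumelhart (1981), in which positive net input drives a node toward its maximum, negative net input drives it toward its minimum, and decay pulls it back toward rest. In this minimal sketch the parameter values are illustrative assumptions, and the primed language node is treated, hypothetically, as a constant top-down input that excites its own word and inhibits the other language’s word.

```python
# Assumed interactive-activation parameters (illustrative, not fitted values).
A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, 0.0, 0.1
BOTTOM_UP = 0.1                  # identical letter evidence for both members of the pair
INHIBITION = 0.2                 # word-to-word lateral inhibition
LANG_EXC, LANG_INH = 0.05, 0.05  # hypothetical language-node -> word weights


def ia_update(act, net):
    """One cycle of the interactive-activation rule."""
    if net > 0:
        effect = net * (A_MAX - act)
    else:
        effect = net * (act - A_MIN)
    return act + effect - DECAY * (act - REST)


def resolve_cognate(lang_en, lang_es, cycles=200):
    """Compete 'actor' (EN) against 'actor' (ES); lang_* are fixed language-node activations."""
    en = es = REST
    for _ in range(cycles):
        net_en = BOTTOM_UP - INHIBITION * max(0.0, es) + LANG_EXC * lang_en - LANG_INH * lang_es
        net_es = BOTTOM_UP - INHIBITION * max(0.0, en) + LANG_EXC * lang_es - LANG_INH * lang_en
        en, es = ia_update(en, net_en), ia_update(es, net_es)
    return en, es


print("no language priming:", resolve_cognate(0.0, 0.0))
print("English node primed:", resolve_cognate(1.0, 0.0))
```

Without priming, the two nodes remain locked at identical activations (about 0.29 with these values), a stalemate that bottom-up input alone cannot break. With the English node primed, English actor approaches 0.6 while Spanish actor is pushed below rest.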
Next, false cognates, or interlingual homographs, are pairs of words
with the same spelling but different meanings in each language. For
example, main is a synonym for primary in English but in French means
hand (with a fairly different pronunciation). These would be represented
in the BIA models as word nodes in each language that are identical in
their connections to the orthography layer, partly different in their
connections to the phonology layer, and completely different in their
connections to the semantic layer. Ambiguity resolution in this case could
again occur through priming of the language nodes or, instead, through
contextual bias. If, for example, sentential context activates semantic nodes
corresponding to one of the two resolutions of the ambiguity, this alters the
initial state of the system to be closer to one option. A sufficiently biasing
sentential context, even in the nontarget language, could override the
influence of the language nodes, leading the system to correctly recognize
a code-switched word that does not match the language of the sentential
context. This is consistent with experimental evidence showing that the
processing cost of switching languages is dependent on contextual bias
(Li, 1996; Moreno, Federmeier, & Kutas, 2002). Furthermore, results
have shown that code-switching is easier when the phonology of the code-
switched word is different from that of the context language (Grosjean,
1995; Li, 1996). In the BIA models, this would be accounted for by the
fact that code-switched words with minimal phonological overlap with
the context language will activate fewer competitors in the context lan-
guage, leading to faster resolution.
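The false-cognate case adds one more input: semantic support from sentence context. The sketch below reuses the same assumed interactive-activation rule and parameters; the semantic input is a hypothetical stand-in for context-driven activation of meaning nodes. The sentence is taken to be in French, so the French language node is primed, while its meaning favors the English reading of main (“primary”). Because language-node input and semantic input simply add into each word’s net input, either one can dominate.

```python
# Assumed interactive-activation parameters (illustrative, not fitted values).
A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, 0.0, 0.1
BOTTOM_UP, INHIBITION = 0.1, 0.2
LANG_EXC, LANG_INH = 0.05, 0.05  # hypothetical language-node -> word weights


def ia_update(act, net):
    """One cycle of the interactive-activation rule."""
    effect = net * (A_MAX - act) if net > 0 else net * (act - A_MIN)
    return act + effect - DECAY * (act - REST)


def read_main(semantic_en, lang_fr=1.0, cycles=200):
    """Compete 'main' (EN, 'primary') against 'main' (FR, 'hand').

    semantic_en: hypothetical top-down support from context for the English sense.
    lang_fr: activation of the primed French language node (held constant here).
    """
    en = fr = REST
    for _ in range(cycles):
        net_en = BOTTOM_UP + semantic_en - INHIBITION * max(0.0, fr) - LANG_INH * lang_fr
        net_fr = BOTTOM_UP + LANG_EXC * lang_fr - INHIBITION * max(0.0, en)
        en, fr = ia_update(en, net_en), ia_update(fr, net_fr)
    return en, fr


for label, bias in [("weak semantic bias", 0.05), ("strong semantic bias", 0.3)]:
    en, fr = read_main(bias)
    print(f"{label}: EN={en:+.3f} FR={fr:+.3f} -> {'EN' if en > fr else 'FR'} reading wins")
```

Under the weak bias the primed French reading wins; under the strong bias the code-switched English reading wins despite the language node. Where the crossover falls depends entirely on the assumed weights.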
Finally, partial cognates, or interlingual cohorts, are pairs of words
across languages in which there is partial overlap in spelling or phonology.

[Figure 2.5 image: (a) the input “shark” near sharp and shark, plotted on phonological and semantic dimensions; (b) the same input near sharp, shark, and sharik, plotted on phono-semantic dimensions.]

Figure 2.5 (a) For a monolingual, the linguistic input shark has
orthographic and phonological similarity to both shark and sharp, and
a few other words; (b) For a bilingual, that same input has similarity
with even more lexical representations, thus producing an extremely
nonconvex lexical field, and an even more nonlinear trajectory

For example, the English word shark is an interlingual cohort of sharik
(the Russian word for balloon). Because the bottom-up connections in the
BIA models are not language-selective, any input will send activation to
orthographic or phonological neighbors in both languages, and the degree
of competition in the network will be dependent on the number of
neighbors. Figure 2.5 uses the lexical-fields framework from Figures
2.1–2.3 to depict the regions of state space that can be visited while the
word shark is being presented to a monolingual English speaker (Figure
2.5a) or to a Russian-English bilingual (Figure 2.5b). For a monolingual,
the lexical field (or energy landscape) for that word stretches into a few
different regions of semantic state space, and the trajectory (or activation
pattern over time) will be somewhat nonlinear as it curves slightly toward
the wrong word. By contrast, a bilingual’s lexical field stretches out into
several more regions of state space, resulting in an exceptionally curved
trajectory. While shark is being presented, the patterns of activation in
BIA+ would mimic this kind of state-space trajectory, moving somewhat
close to an interlingual cohort competitor before finally settling into
the correct pattern of activation.
However, feedback from the language nodes in the BIA+ model can
lead to asymmetric competition, whereby intralingual competitors in the
primed language will exert more influence than the interlingual competi-
tors from the other language. As an example from human data, when
Marian and Spivey (2003a) placed Russian-English bilinguals into
a relatively monolingual Russian language mode (with a consent form in
Russian, the experimenter speaking only native Russian, and Russian
music in the background), those participants exhibited substantial lexical
competition from intralingual competitors in Russian but not as much
from interlingual competitors in English. Ultimately, however, resolution
in the case of partial cognates will reliably be accomplished in the BIA+
model purely through bottom-up information (without the need for con-
text), since words in either language with partially inconsistent orthogra-
phy or phonology will receive less activation than the target word.
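The claim that partial cognates resolve through bottom-up information alone can be checked with a small competition among shark, its English neighbor sharp, and its Russian cohort sharik (transliterated). The sketch below is a minimal illustration under stated assumptions: letter evidence is scored position by position and scaled by a hypothetical gain, words inhibit one another regardless of language, and the language nodes are left out entirely. It also records each word’s peak activation, to show that competitors are partially activated before being suppressed.

```python
# Assumed interactive-activation parameters (illustrative, not fitted values).
A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, 0.0, 0.1
INPUT_GAIN, INHIBITION = 0.1, 0.2

LEXICON = {"shark": "EN", "sharp": "EN", "sharik": "RU"}  # sharik transliterated from Russian


def letter_evidence(letters, spelling):
    """(+1 per matching slot, -1 per mismatching or empty slot) / input length."""
    slots = max(len(letters), len(spelling))
    score = sum(1 if i < len(letters) and i < len(spelling) and letters[i] == spelling[i] else -1
                for i in range(slots))
    return score / len(letters)


def ia_update(act, net):
    """One cycle of the interactive-activation rule."""
    effect = net * (A_MAX - act) if net > 0 else net * (act - A_MIN)
    return act + effect - DECAY * (act - REST)


def recognize(letters, cycles=200):
    act = {w: REST for w in LEXICON}
    peak = dict(act)
    for _ in range(cycles):
        # Net input for every word is computed before any update (synchronous cycle);
        # inhibition comes from all other words, whatever their language.
        net = {w: INPUT_GAIN * letter_evidence(letters, w)
                  - INHIBITION * sum(max(0.0, a) for v, a in act.items() if v != w)
               for w in LEXICON}
        act = {w: ia_update(act[w], net[w]) for w in LEXICON}
        peak = {w: max(peak[w], act[w]) for w in LEXICON}
    return act, peak


final, peak = recognize("shark")
for w in LEXICON:
    print(f"{w:7} ({LEXICON[w]}) peak={peak[w]:+.3f} final={final[w]:+.3f}")
```

Both competitors rise above rest early on (the partly misdirected trajectory of Figure 2.5) before lateral inhibition drives them below rest, leaving shark as the sole winner without any help from context.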
As in Kawamoto’s (1993) model, contextual priming in
the BIA+ model can also influence the initial state of the system and
thus bias it toward one meaning of an ambiguous word. For example,
Schwartz and Kroll (2006) found that, when sentence context was
weak, cognates were processed faster by bilingual participants than
words that exist in only one language, indicating that lexical representations
from both languages affected processing. However, when sentence
context was strong, this effect disappeared, suggesting that compre-
hension was guided selectively to the meaning in only one of the
languages (see also Libben & Titone, 2009).
Because work using the BIA models has not specifically focused on
simulating lexical ambiguity resolution tasks, it is important to note that
the account we have given here is somewhat speculative in nature.
However, with a general understanding of PDP principles and the struc-
ture of the BIA models, we expect that this account will by now be
intuitively clear. Since the BIA models allow parallel, bottom-up activa-
tion of words in both languages, interlingual ambiguities are really not
that different from intralingual ambiguities in monolinguals. How
quickly the system can resolve these ambiguities, and which resolution
ultimately wins, depends on the starting state of the system (set via
priming of language or semantic nodes) and on the overall energy
landscape, which determines the degree of attraction toward various
outcomes.

Discussion
Obviously, bilingualism does not involve having a new and separate
lexicon module inserted into a person’s cortex. Learning an L2, early or
late in life, involves rewiring the existing connectivity of multiple language
areas of the brain. This network of networks has some portions that
are mostly specialized for one or the other language (Kim et al., 1997),
but it also has many portions that are used by both languages (Marian,
Spivey, & Hirsch, 2003). The BIA and BIA+ models of bilingual lan-
guage processing have pursued that general kind of architecture and
produced results that correspond well with human data (Dijkstra & van
Heuven, 2002; van Heuven et al., 1998). As a result of that type of cortical
connectivity in a bilingual, reading or hearing a word from one language
can inadvertently partially activate a lexical representation in the other
language. It turns out that this process in bilinguals is not that different
from related processes in monolinguals. When monolinguals read or hear
a word in their language, they also exhibit inadvertent partial activation of
other related lexical representations.
Rather than thinking of these lexical representations as line entries in
a mental dictionary, some of which get partially activated, we have chosen
a different framework here for understanding how ambiguity (temporary
or otherwise) causes the language system to vacillate between multiple
possible interpretations. We have chosen a state-space framework,
wherein lexical representations exist as attractor basins, some with
a strong or weak pull, some with partial overlap with one another, and
some with tendrils that stretch out to semantically disparate regions of
state space. Those tentacular lexical attractor basins, whose tendrils reach
out in many directions in state space, may be unusually prevalent in
bilinguals, compared to monolinguals.
While the dictionary framework is clearly a metaphor, intended to help
one imagine how words might be organized in the language system, the
state-space framework need not be conceived as a metaphor (Onnis &
Spivey, 2012). When one takes a neural network, such as a brain or
connectionist model, and treats each node’s activation as a coordinate
in a state space, this serves as a mathematical description of the state of the
actual system (Elman, 2004; Spivey, 2008), not a metaphor. Scientific
metaphors always break down at some point and can provide misleading
insights (Hoffman, 1980). In the case of a simulated neural network
processing two languages, as its state changes from timestep to timestep,
one can access all the data necessary to provide an accurate state-space
description of this system – perhaps performing a dimensionality reduc-
tion down to two or three dimensions for purposes of data visualization
(Elman, 1991). In the case of an actual brain-and-body processing two
languages, however, it is of course not possible to measure every node in
the network. Nonetheless, we can measure quite a bit; and, when the right
behavioral measures are chosen carefully and sampled as continuously as
possible (e.g., Louwerse et al., 2012), those behaviors can be seen as
performing something similar to the dimensionality reduction performed
on the simulated neural network, thus allowing us to witness a low-
dimensional record of the high-dimensional mental trajectory (Spivey &
Dale, 2006, p. 209). Importantly, even with quantitatively abstracted
data, from recorded behaviors that result from hidden neural processes,
we are still not using a metaphor when we plot those data into a state space
for data visualization. The neural dimensions have been reduced by the
motor system in poorly understood ways, but there is no figurative ana-
logy being used to liken linguistic processes to something else, such as
a book with lexical entries listed in alphabetical order.
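The dimensionality reduction mentioned above can be illustrated with principal component analysis, one common choice (assumed here, not necessarily the method any particular study used). The sketch below builds a hypothetical eight-node “network” whose activation over time traces a curved two-dimensional path, then recovers a two-dimensional view by orthogonal (subspace) iteration on the covariance matrix, written in plain Python to stay self-contained. Because the toy data lie in a two-dimensional subspace by construction, the two recovered components capture essentially all of the variance; real recordings would not be so tidy.

```python
import math


def mean_center(rows):
    dims = len(rows[0])
    mean = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return [[r[d] - mean[d] for d in range(dims)] for r in rows]


def covariance(rows):
    n, dims = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(dims)] for i in range(dims)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def top_two_components(cov, iters=200):
    """Orthogonal iteration: two orthonormal vectors spanning the dominant 2-D subspace."""
    dims = len(cov)
    v1 = [1.0] * dims                          # arbitrary start vectors (assumed generic)
    v2 = [(-1.0) ** d for d in range(dims)]
    for _ in range(iters):
        v1 = normalize([dot(row, v1) for row in cov])
        w = [dot(row, v2) for row in cov]
        v2 = normalize([b - dot(v1, w) * a for a, b in zip(v1, w)])  # Gram-Schmidt step
    return v1, v2


# Hypothetical 8-node activation record: a curved 2-D path (t, t^2) embedded in 8 dimensions.
EMBED = [(math.sin(i + 1), math.cos(0.5 * i)) for i in range(8)]
states = [[0.5 + a * t + b * t * t for a, b in EMBED]
          for t in (k / 20 for k in range(21))]

X = mean_center(states)
cov = covariance(X)
v1, v2 = top_two_components(cov)
trajectory_2d = [(dot(x, v1), dot(x, v2)) for x in X]  # low-dimensional record to plot

total = sum(cov[i][i] for i in range(len(cov)))
captured = sum(p ** 2 + q ** 2 for p, q in trajectory_2d) / len(X)
print(f"variance captured by 2 components: {captured / total:.6f}")
```

The list trajectory_2d is the kind of low-dimensional record that could be plotted as a state-space trajectory; with real data, the fraction of variance captured indicates how much of the high-dimensional path such a plot leaves out.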
In this chapter, we have provided a series of theory visualizations as
proxies for those data visualizations. Armed with state-space trajec-
tories of connectionist networks addressing lexical ambiguity resolu-
tion in monolingual conditions and in bilingual conditions, one can
see that the attractor basins corresponding to word representations
come in a wide variety of shapes and sizes. Bilingualism may not
instigate a qualitatively different format of processing but instead
may just introduce a quantitative change in the distribution of those
different shapes and sizes. Compared to monolinguals, bilinguals may
experience a little more phonological (and in some cases ortho-
graphic) overlap in their lexical fields, which may result in a little
more lexical competition on a regular basis. Perhaps it is this incessant
practice with increased lexical competition that trains a bilingual’s
brain to have greater cognitive control (e.g., Kroll & Bialystok,
2013; Spivey & Cardon, 2015). If one must have a metaphor,
then – far from being a dictionary – the mental lexicon is perhaps
more like a high-dimensional golf course with sandpits, greens, and
fairways all interlacing among one another; and a bilingual’s golf
course is especially tangled.

Keywords
Ambiguous words, Bilingual interactive activation (BIA) model,
Bilingual interactive activation Plus (BIA+) model, Bilingual lexical
processing, Code-switching, Cognates, Cohort model, Connectionist
models, Contextual diversity, Continuous mapping model,
Cross-linguistic interference, Distributed networks, Distributed representation,
Dynamical-systems state space, False cognates, Feature layer,
Homographs, Homonyms, Homophones, Individual differences,
Inhibitory connections, Interlingual ambiguities, Interlingual cohort
effect, Interactive activation model (IAM), jTRACE, Language
experience, Language mode, Language module, Letter layer, Lexical access,
Lexical ambiguity, Lexical entries, Lexical-fields framework, Localist
models, Microfeatures, Parallel distributed processing (PDP), Partial
cognates, Phoneme layer, Phonetic featural similarity, Phono-semantic
state space, Phonologically similar, Polysemous, Rhyme-cohorts,
Semantic field, Semantics, Sentence context, Sequential
search, Situation context, Theory visualizations of bilingual lexical
ambiguity, Theory Visualizations of Lexical Fields, TRACE, Word superiority
effect

Thought Questions
1. What are the pros and cons of localist versus distributed connec-
tionist models of bilingualism? Is one more appropriate than the
other?
2. Age of acquisition is not modeled in the BIA or BIA+ but is known to
have important effects. How might age of acquisition be integrated with
these models?
3. Views of embodied cognition (e.g., Barsalou, 2008) suggest that
action, planning, and sensorimotor representations may also play roles
in language processing. How might these or other cues influence bilingual
processing?

Internet Sites
Connectionism: www.ucs.louisiana.edu/~isb9112/dept/phil341/wisconn.html
Connectionism as an Approach: www.iep.utm.edu/connect/
Bilingual Interactive Activation Plus: www.wikivisually.com/wiki/Bilingual_interactive_activation_plus
Interactive Activation Models: www.psychology.nottingham.ac.uk/staff/wvh/jiam/
What is Connectionism: www.mind.ilstu.edu/curriculum/connectionism_intro/connectionism_1.php

Further Reading
Dale, R., Fusaroli, R., Duran, N. D., & Richardson, D. C. (2013). The
self-organization of human interaction. In Psychology of Learning and
Motivation, 59, 43–95.

References
Adelman, J. S., Brown, G. D., & Quesada, J. F. (2006). Contextual diversity, not
word frequency, determines word-naming and lexical decision times.
Psychological Science, 17(9), 814–823.
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the
time course of spoken word recognition using eye movements: Evidence for
continuous mapping models. Journal of Memory and Language, 38(4), 419–439.
Altarriba, J., & Gianico, J. L. (2003). Lexical ambiguity resolution across
languages: A theoretical and empirical review. Experimental Psychology, 50(3),
159–170.
Altarriba, J., Kroll, J. F., Sholl, A., & Rayner, K. (1996). The influence of lexical
and conceptual constraints on reading mixed-language sentences: Evidence
from eye fixations and naming times. Memory and Cognition, 24(4), 477–492.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59,
617–645.
Chen, Q., Huang, X., Bai, L., Xu, X., Yang, Y., & Tanenhaus, M. K. (2017). The
effect of contextual diversity on eye movements in Chinese sentence reading.
Psychonomic Bulletin and Review, 24(2), 510–518.
De Groot, A. M., Delmaar, P., & Lupker, S. J. (2000). The processing of
interlexical homographs in translation recognition and lexical decision:
Support for non-selective access to bilingual memory. The Quarterly Journal of
Experimental Psychology, 53A(2), 397–428.
Dijkstra, T., Grainger, J., & van Heuven, W. J. (1999). Recognition of cognates
and interlingual homographs: The neglected role of phonology. Journal of
Memory and Language, 41(4), 496–518.
Dijkstra, T., & van Heuven, W. J. (2002). The architecture of the bilingual word
recognition system: From identification to decision. Bilingualism: Language and
Cognition, 5(3), 175–197.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences
in second language acquisition. Routledge.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and
grammatical structure. Machine Learning, 7(2–3), 195–225.
Elman, J. L. (2004). An alternative view of the mental lexicon. Trends in Cognitive
Sciences, 8(7), 301–306.
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical
knowledge without a lexicon. Cognitive Science, 33(4), 547–582.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time.
Journal of Memory and Language, 12(6), 627–635.
French, R. M. (1998). A simple recurrent network model of bilingual memory. In
M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the 20th Annual Cognitive
Science Society Conference (pp. 368–373). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Gibbs, R., & Matlock, T. (2001). Psycholinguistic perspectives on polysemy. In
H. Cuyckens & B. Zawada (Eds.), Polysemy in cognitive linguistics (pp.
213–239). Amsterdam: John Benjamins.
Grosjean, F. (1994). Individual bilingualism. In The encyclopedia of language and
linguistics (pp. 1656–1660). Oxford: Pergamon Press.
Grosjean, F. (1995). A psycholinguistic approach to code-switching: The
recognition of guest words by bilinguals. In L. Milroy & P. Muysken (Eds.),
One speaker, two languages (pp. 259–275). Cambridge: Cambridge University
Press.
Grosjean, F. (2001). The bilingual’s language modes. In J. Nicol (Ed.), One mind,
two languages: Bilingual language processing (pp. 1–22). Oxford: Blackwell.
Hills, T. T., Maouene, J., Riordan, B., & Smith, L. (2010). The associative
structure of language: Contextual diversity in early word learning. Journal of
Memory and Language, 63(3), 259–273.
Hoffman, R. R. (1980). Metaphor in science. In R. P. Honeck & R. R. Hoffman
(Eds.), The psycholinguistics of figurative language. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Jacquet, M., & French, R. M. (2002). The BIA++: Extending the BIA+ to
a dynamical distributed connectionist framework. Bilingualism: Language and
Cognition, 5(3), 202–205.
Johns, B. T., Dye, M., & Jones, M. N. (2016). The influence of contextual
diversity on word learning. Psychonomic Bulletin and Review, 23(4),
1214–1220.
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears: Constraints on bilingual
lexical activation. Psychological Science, 15(5), 314–318.
Kaushanskaya, M., & Marian, V. (2004). Activation of non-target language
phonology during bilingual visual word recognition: Evidence from eye-
tracking. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th
Annual Meeting of the Cognitive Science Society (pp. 654–659). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Kawamoto, A. H. (1993). Nonlinear dynamics in the resolution of lexical
ambiguity: A distributed processing account. Journal of Memory and Language,
32, 474–516.
Kim, K. H., Relkin, N. R., Lee, K. M., & Hirsch, J. (1997). Distinct cortical areas
associated with native and second languages. Nature, 388(6638), 171–174.
Kroll, J. F., & Bialystok, E. (2013). Understanding the consequences of
bilingualism for language processing and cognition. Journal of Cognitive
Psychology, 25(5), 497–514.
Lehrer, A. (1974). Semantic fields and lexical structure, Amsterdam: John
Benjamins.
Li, P. (1996). Spoken word recognition of code-switched words by Chinese–
English bilinguals. Journal of Memory and Language, 35(6), 757–774.
Libben, M. R., & Titone, D. A. (2009). Bilingual lexical access in context:
Evidence from eye movements during reading. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 35(2), 381–390.
Louwerse, M. M., Dale, R., Bard, E. G., & Jeuniaux, P. (2012). Behavior
matching in multimodal communication is synchronized. Cognitive Science,
36(8), 1404–1426.
Lyons, J. (1963). Structural semantics. Oxford: Blackwell.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working
memory: Comment on Just and Carpenter (1992) and Waters and Caplan
(1996). Psychological Review, 109(1), 35–54.
Macnamara, J., & Kushnir, S. L. (1971). Linguistic independence of bilinguals:
The input switch. Journal of Memory and Language, 10(5), 480.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural
bilinguals. Journal of Memory and Language, 51(2), 190–201.
Marian, V., & Spivey, M. (2003a). Bilingual and monolingual processing of
competing lexical items. Applied Psycholinguistics, 24(2), 173–193.
Marian, V., & Spivey, M. (2003b). Competing activation in bilingual language
processing: Within- and between-language competition. Bilingualism: Language
and Cognition, 6(2), 97–115.
Marian, V., Spivey, M., & Hirsch, J. (2003). Shared and separate systems in
bilingual language processing: Converging evidence from eyetracking and brain
imaging. Brain and Language, 86(1), 70–82.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken
word-recognition. Cognition, 25(1–2), 71–102.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech
perception. Cognitive Psychology, 18(1), 1–86.
McClelland, J. L., & Johnston, J. C. (1977). The role of familiar units in
perception of words and nonwords. Perception and Psychophysics, 22(3),
249–261.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of
context effects in letter perception: I. An account of basic findings. Psychological
Review, 88(5), 375–407.
Meuter, R. F., & Allport, A. (1999). Bilingual language switching in naming:
Asymmetrical costs of language selection. Journal of Memory and Language,
40(1), 25–40.
Miyake, A., Just, M. A., & Carpenter, P. A. (1994). Working memory constraints
on the resolution of lexical ambiguity: Maintaining multiple interpretations in
neutral contexts. Journal of Memory and Language, 33(2), 175–202.
Moreno, E. M., Federmeier, K. D., & Kutas, M. (2002). Switching languages,
switching palabras (words): An electrophysiological study of code switching.
Brain and Language, 80(2), 188–207.
Onnis, L., & Spivey, M. J. (2012). Toward a new scientific visualization for the
language sciences. Information, 3, 124–150.
Plummer, P., Perea, M., & Rayner, K. (2014). The influence of contextual
diversity on eye movements in reading. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 40(1), 275–283.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation
model of context effects in letter perception: II. The contextual
enhancement effect and some tests and extensions of the model.
Psychological Review, 89(1), 60–94.
Schwartz, A. I., & Kroll, J. F. (2006). Bilingual lexical activation in sentence
context. Journal of Memory and Language, 55(2), 197–212.
Spevack, S. C., Falandays, J. B., Batzloff, B., & Spivey, M. J. (2018). Interactivity
of language. Language and Linguistics Compass, 12(7), e12282.
Spivey, M. J. (2008). The continuity of mind. New York: Oxford University Press.
Spivey, M. J., & Cardon, C. D. (2015). Methods for studying adult bilingualism.
In J. Schwieter (Ed.), The Cambridge handbook of bilingual language processing
(pp. 108–132). New York: Cambridge University Press.
Spivey, M. J., & Dale, R. (2006). Continuous dynamics in real-time cognition.
Current Directions in Psychological Science, 15(5), 207–211.
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second
languages: Partial activation of an irrelevant lexicon. Psychological Science,
10(3), 281–284.
Spivey-Knowlton, M. J. (1996). Integration of visual and linguistic information:
Human data and model simulations. Unpublished doctoral dissertation,
University of Rochester.
Strauss, J., Harris, H. D., & Magnuson, J. S. (2007). jTRACE:
A reimplementation and extension of the TRACE model of speech perception
and spoken word recognition. Behavior Research Methods, 39(1), 19–30.
Swinney, D. A. (1979). Lexical access during sentence comprehension:
(Re)consideration of context effects. Journal of Verbal Learning and Verbal
Behavior, 18(6), 645–659.
Tabossi, P. (1988). Accessing lexical ambiguity in different types of sentential
contexts. Journal of Memory and Language, 27, 324–340.
Tanenhaus, M. K., Leiman, J. M., & Seidenberg, M. S. (1979). Evidence for
multiple stages in the processing of ambiguous words in syntactic contexts.
Journal of Verbal Learning and Verbal Behavior, 18(4), 427–440.
Toscano, J. C., Anderson, N. D., & McMurray, B. (2013). Reconsidering the role
of temporal order in spoken word recognition. Psychonomic Bulletin and Review,
20(5), 981–987.
van Hell, J. G., & Tanner, D. (2012). Second language proficiency and cross-
language lexical activation. Language Learning, 62, 148–171.
van Heuven, W. J., Dijkstra, T., & Grainger, J. (1998). Orthographic
neighborhood effects in bilingual word recognition. Journal of Memory and
Language, 39(3), 458–483.
Vu, H., Kellas, G., & Paul, S. T. (1998). Sources of sentence constraint on lexical
ambiguity resolution. Memory and Cognition, 26(5), 979–1001.
Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word
recognition. Journal of Memory and Language, 50(1), 1–25.