Introduction
We often think of bilingualism as “adding” a new and different language
system to one’s existing language system. There might be a tiny bit of
truth to this metaphor when learning a second language (L2) late in life.
However, it surely must be a poor metaphor for bilinguals who learn their
two languages relatively early in life. Rather than modeling bilingualism as
involving the “adding” of a second processor for the L2, perhaps bilingualism
itself can be treated as just another set of dimensions in the massive
state space in which a speaker’s linguistic representations are organized,
along with dimensions of situational context, text genre, grammatical
gender, linguistic register, syntax, phonology, semantics, and so on
(e.g., Onnis & Spivey, 2012). When these various aspects of language
are treated not as submodules within the language module but instead as
dimensions in a single state space, then suddenly new insights can be
gained in understanding how language is processed in general and how
bilinguals process lexical ambiguity in particular. In fact, the very concept
of a lexical representation changes dramatically when one switches from
a computer (or dictionary) metaphor of the lexicon to a dynamical system
account of word knowledge (Elman, 2004).
In this chapter, we review some connectionist models of bilingualism
and discuss how they might deal with lexical ambiguity; but, first, we
examine what lexical ambiguity itself “looks like” in the state space of
a language processing system. By treating the representational parameters
of a model as dimensions in a state space, the range of behaviors (and
regions visited) in that volumetric space can be identified more
systematically. In a state space that combines a variety of linguistic aspects,
a word representation can be seen as extending not only across
a semantic field (e.g., Lehrer, 1974) but indeed across a lexical field that
combines the semantics, phonology, and situational context of how the
word is typically used (e.g., Elman, 2009; see also Lyons, 1963). By
studying the real-time temporal dynamics of lexical ambiguity resolution
Downloaded from https://www.cambridge.org/core. University of Toronto, on 02 Jan 2020 at 12:05:12, subject to the Cambridge Core terms of
use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781316535967.003
18 Theoretical and Methodological Considerations
Theory Visualizations for Bilingual Models 19
[Figure panels, each plotting Phonological Dimension(s) against Semantic Dimension(s): “bug” with meaning regions (device) and (insect); (b) “dusted” with (cleaning) and (baking); (c) “stup...” with (amazing) and (dumb).]
be somewhat similar to that for bug but notably different in that the
semantic regions used for its different meanings are spatially contiguous
with one another, allowing for blends across that semantic spectrum.
[Figure 2.2 panels (a) and (b): the lexical field for “boxer” drawn over semantic dimensions, with regions labeled packaging factory, pugilism, and dog breeds.]
Figure 2.2 Individual differences in lexical fields: (a) a person with low
memory span or limited English experience would have a functionally
narrow lexical field for the word boxer, whereas (b) a person with high
memory span or extensive English experience would have a more
tentacular lexical field for boxer, with tendrils that stretch into a variety
of semantic spaces
[Figure 2.3 panels: lexical fields for “posterior” and (b) “piglet” drawn over contexts/genres (Academic, Fiction, Newspapers, Magazines).]
the words posterior and piglet. They both have the same overall lexical
frequency: 240 occurrences each in a 560 million word corpus.
Therefore, traditional approaches in psycholinguistics would predict that
these two words should exhibit equal latency in reading and reaction time
tasks (e.g., Forster & Chambers, 1973). However, about 60 percent of the
occurrences of posterior take place in academic texts, while only 10–15 percent
of its occurrences fall in each of the fiction, magazine, and newspaper
contexts, and it almost never shows up in spoken contexts. Therefore,
contextually speaking, posterior is a relatively nondiverse word. In our linguistic
state space framework, this would mean that its lexical field is relatively
simple and convex (Figure 2.3a). As a result, if the language system started
out in a random or neutral location in state space, and was forced to
traverse its way to the region for posterior, it might have a long distance to
travel, thus producing a somewhat long response time. By contrast, the
the string. This creates a set of candidate letters that the system is
“considering,” consisting of all letters containing features present in
the input. The letter nodes then pass activation to words that are
consistent with those letters. Competition among words, via their
mutual inhibitory connections, results in the most highly consistent
word (or words, in the case that there is ambiguity) becoming more
active, while all other words are suppressed. The active word nodes
then pass activation to the letters they contain. In this way, letters that
are presented in the context of a word receive additional activation,
relative to letters in nonwords, which is how the model accounts for
the word superiority effect.
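The flow of activation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published IAM: the feature layer is replaced by a direct bottom-up support value for each letter, the three-word lexicon and the nonword XEZM are hypothetical, and the parameter values and the leaky update rule are chosen only for readability.

```python
def clip(x, lo=0.0, hi=1.0):
    """Keep an activation inside its allowed range."""
    return max(lo, min(hi, x))


def run_iam(letter_input, lexicon, steps=60, rate=0.2,
            excite=0.2, inhibit=0.3, feedback=0.2):
    """Toy interactive activation over letter and word nodes.

    letter_input holds one dict per letter position, mapping each candidate
    letter to its bottom-up support (a stand-in for the feature layer).
    All parameter values are illustrative assumptions.
    Returns (letters, words) with the final activations.
    """
    letters = [{ch: 0.0 for ch in pos} for pos in letter_input]
    words = {w: 0.0 for w in lexicon}
    for _ in range(steps):
        # Letters excite the words that contain them; words inhibit one another.
        word_net = {}
        for w in lexicon:
            support = sum(letters[i].get(ch, 0.0) for i, ch in enumerate(w))
            rivals = sum(a for other, a in words.items() if other != w)
            word_net[w] = excite * support - inhibit * rivals
        # Each letter receives its input plus feedback from words containing it.
        letter_net = []
        for i, pos in enumerate(letter_input):
            letter_net.append({
                ch: inp + feedback * sum(a for w, a in words.items() if w[i] == ch)
                for ch, inp in pos.items()
            })
        # Synchronous leaky update: every node moves toward its net input.
        for w in lexicon:
            words[w] = clip(words[w] + rate * (word_net[w] - words[w]))
        for i, net in enumerate(letter_net):
            for ch, value in net.items():
                letters[i][ch] = clip(letters[i][ch] + rate * (value - letters[i][ch]))
    return letters, words


def spell(string, strength=0.6):
    """Unambiguous bottom-up support for each letter of a string."""
    return [{ch: strength} for ch in string]


LEXICON = ["ROOM", "BOOM", "BOOK"]  # hypothetical toy lexicon
word_letters, word_acts = run_iam(spell("ROOM"), LEXICON)
nonword_letters, nonword_acts = run_iam(spell("XEZM"), LEXICON)
```

In this sketch the final letter M ends up more active when it appears in ROOM than when it appears in the nonword, because more word-level feedback reaches it, which is the qualitative signature of the word superiority effect.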
While the original IAM dealt primarily with visual word recognition,
the TRACE model (McClelland & Elman, 1986; and its reimplementation,
jTRACE: Strauss, Harris, & Magnuson, 2007) extended the same
principles to model spoken word recognition. In TRACE, the letter layer
of the original IAM is replaced with a phoneme layer, and the feature layer
now consists of nodes responding gradiently to various acoustic
dimensions rather than visual features. Since speech unfolds over time, the input
to TRACE is a sequence of acoustic features, which stands in contrast to
the way that the visual IAM is presented with all visual information
simultaneously. As a result of this sequential presentation, even
unambiguous speech input is temporarily ambiguous at the word level: Onsets are
consistent with many possible words and, as more of the input is received,
the pool of consistent words is narrowed until finally the offset leaves only
a single candidate.
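The temporary ambiguity that arises from sequential input can be illustrated with a small sketch. It assumes a hypothetical five-word lexicon and uses letters as stand-ins for phonemes. It also filters candidates in an all-or-none fashion, which is simpler than the graded activation TRACE actually uses; the point is only to show the pool of consistent words shrinking as input arrives.

```python
def candidates(lexicon, heard_so_far):
    """Words still consistent with the input received so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def narrowing(lexicon, spoken_word):
    """Candidate pool after each successive segment of the input."""
    return [candidates(lexicon, spoken_word[:t])
            for t in range(1, len(spoken_word) + 1)]


# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON = ["beaker", "beetle", "beak", "bell", "speaker"]

for t, pool in enumerate(narrowing(LEXICON, "beaker"), start=1):
    print("beaker"[:t].ljust(6), pool)
```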
In this way, TRACE captures the predictions of the Cohort model of
speech processing (Marslen-Wilson, 1987), which held that lexical
access occurs as a sequential search by method of elimination.
Importantly, lexical access in Cohort is all-or-none in that words that
are inconsistent with an onset are eliminated from consideration. As
a result, the Cohort model cannot recover in the case of degraded
information or mispronunciations. In contrast, TRACE is a continuous
mapping model, meaning that activation flows continuously between
layers, such that a given word unit can still receive activation, even if
some part of the input is inconsistent with it. As a result, TRACE
provides a better fit to the behavioral data, which show, for example,
that listeners partially activate rhyme-cohorts that have a different
onset (e.g., making eye fixations to a speaker when the spoken input
is beaker; Allopenna et al., 1998). It is worth noting, however, that
TRACE does not provide a perfect fit to behavioral data: There is also
evidence that listeners partially activate anadromes – words with the
same phonemes in a different order (e.g., making eye fixations to a sub
and other studies began to swing the balance of evidence in favor of a PDP
type of account, and there is by now a very large body of work
demonstrating continuous bidirectional interaction between subsystems of the
language system (for review, see Spevack et al., 2018).
PDP models are able to capture the general pattern of behavioral data
regarding lexical ambiguity resolution. To understand how, let us
consider Kawamoto’s (1993) influential PDP model of lexical ambiguity
resolution, shown in Figure 2.4(b). In contrast to the IAM and TRACE
models discussed in the previous section, Kawamoto’s model makes use
of distributed rather than localist representations. Localist models have
a one-to-one mapping between nodes and represented entities as well as
hard-coded connections between entities. For example, the IAM and
TRACE have a single node for each feature, letter, and word, and the
connections between them are specified by the modeler in advance. In
those models, access of a lexical entry corresponds to activation of the
corresponding lexical node, and hence these models make it simple to
compare the activity of multiple word nodes over time.
Distributed models, on the other hand, encode representations as
a pattern of activity across many neurons that represent various features
or microfeatures. In Kawamoto’s (1993) model, each lexical entry
corresponds to a vector of activation values for 216 nodes, which are meant to
capture all features of a word: The first 48 nodes encode visual features
that define the orthography of the word; the next 48 nodes encode
phonetic features in specified positions, corresponding to pronunciation;
the next 24 nodes encode part-of-speech; and the last 96 nodes encode
meaning. While the total pattern across all nodes is unique with respect to
each lexical entry, each individual feature (meaning each possible value
for any of the 216 nodes) is consistent with several lexical entries. As
a result, the representation of each lexical entry is partially overlapping
with several other entries.
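The layout of such a vector can be made concrete with a short sketch. The field sizes come from the description above; everything else is an assumption made for illustration: feature values are random and coded as +1 or -1, and the two entries (named after the two meanings of bug) are hypothetical rather than taken from Kawamoto's lexicon.

```python
import random

# Field sizes as described in the text (48 + 48 + 24 + 96 = 216 nodes).
FIELDS = {"spelling": 48, "phonology": 48, "part_of_speech": 24, "meaning": 96}


def field_slices(fields):
    """Map each field name to the slice of the vector it occupies."""
    slices, start = {}, 0
    for name, size in fields.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, N_NODES = field_slices(FIELDS)


def make_entry(rng, shared_fields=(), source=None):
    """Random +1/-1 entry; fields in shared_fields are copied from source."""
    entry = [rng.choice((-1, 1)) for _ in range(N_NODES)]
    for name in shared_fields:
        entry[SLICES[name]] = source[SLICES[name]]
    return entry


def overlap(a, b, field=None):
    """Proportion of nodes with identical values, overall or within one field."""
    span = SLICES[field] if field else slice(0, N_NODES)
    matches = [x == y for x, y in zip(a[span], b[span])]
    return sum(matches) / len(matches)


rng = random.Random(0)
bug_insect = make_entry(rng)
# A homograph shares form and part of speech but not meaning.
bug_device = make_entry(
    rng, shared_fields=("spelling", "phonology", "part_of_speech"), source=bug_insect
)
```

Calling overlap(bug_insect, bug_device, "spelling") returns 1.0, while the overlap on the meaning field is only partial, so the two entries are distinct overall yet share a large part of their representation.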
Another important difference between distributed and localist models
is that, in the former, the strength of connections between nodes must be
learned by the network, rather than coded by the researcher. Kawamoto’s
model is fully connected, meaning there are bidirectional links between
each of the 216 nodes. While it would, of course, be infeasible to manually
code all connections in a network of this size, this property of distributed
networks is actually a feature and not a bug: These models are intended to
capture developmental phenomena by teaching a lexicon to the network.
The network begins with connection strengths of 0 between all nodes.
During training, lexical entries (vectors of 216 activation values) are
presented to the network, which spreads activation according to its
connection strengths, eventually settling into a stable activity pattern.
Initially, this stable pattern will not match the target pattern
corresponding to the lexical entry, so an error correction algorithm is used to modify
the connection strengths after each training trial, bringing the output
closer to the target pattern. After training, features that co-occur in
a word develop stronger connections, such that when some subset of
a word’s features are presented to the network (e.g., only orthography
or pronunciation), the full pattern of activity for that lexical entry may
emerge in the network. A lexical entry has been accessed by the network
when the full pattern of activity settles into a stable state that matches
some lexical entry.
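The training and retrieval cycle can be sketched on a much smaller scale. The following is a simplified illustration, not Kawamoto's implementation: it uses eight nodes instead of 216, two hypothetical patterns, a one-step linear output during training in place of full settling, and a delta-rule update with an arbitrary learning rate. Retrieval uses a clipped additive update as an assumed stand-in for the model's settling dynamics.

```python
def matvec(W, a):
    """Net input to every node given activations a."""
    return [sum(w * x for w, x in zip(row, a)) for row in W]


def train(patterns, epochs=20, lr=0.5):
    """Error-correction (delta rule) learning on a fully connected network."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]  # all connection strengths start at 0
    for _ in range(epochs):
        for target in patterns:
            output = matvec(W, target)
            error = [t - o for t, o in zip(target, output)]
            # Strengthen or weaken each connection in proportion to the error.
            for i in range(n):
                for j in range(n):
                    W[i][j] += lr * error[i] * target[j] / n
    return W


def settle(W, cue, max_steps=50):
    """Spread activation from a partial cue until the pattern stops changing."""
    a = list(cue)
    for _ in range(max_steps):
        net = matvec(W, a)
        new = [max(-1.0, min(1.0, x + dx)) for x, dx in zip(a, net)]
        if new == a:
            break
        a = new
    return a


# Hypothetical entries: nodes 0-3 stand for spelling, nodes 4-7 for meaning.
WORD_1 = [1, 1, 1, 1, -1, -1, -1, -1]
WORD_2 = [1, -1, 1, -1, 1, -1, 1, -1]

W = train([WORD_1, WORD_2])
spelling_only = WORD_1[:4] + [0, 0, 0, 0]  # meaning nodes left unspecified
retrieved = settle(W, spelling_only)
```

Presenting only the spelling half of the first entry is enough for the network to fill in the meaning half, so retrieved equals the full pattern for WORD_1.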
The behavior of the network can be best understood as operating in
a high-dimensional state space, where each node serves as a dimension,
and the activation of all nodes is a set of coordinates that describes the
location of the system in the state space at that point in time. When the
network is presented with an ambiguous word, its location in the state
space (i.e., its activation pattern) moves in a direction that is somewhat
toward both regions that belong to the two meanings of that word.
Gradually, as context and other factors bias the system’s interpretation
of this ambiguous word, the trajectory will curve toward the region in
state space that corresponds to the contextually appropriate meaning.
This nonlinear trajectory of the system, as it moves through state space,
can be mathematically described as following along the contours of an
energy landscape that is imposed on the volume of the state space by
external inputs, context, and its neural connectivity pattern of excitatory
and inhibitory synapses. This energy landscape describes how certain
regions of state space have a strong attracting force and other regions
may have a weak attracting force. Interspersed among these basins of
attraction in the state space are other regions that repel the system away
from them (peaks in the energy landscape). The simplified sketches of
basins of attraction in Figures 2.1–2.3 have associated with them energy
landscapes that make some portions of them more strongly attracting and
other portions less so. For instance, Figure 2.4(c) shows an example of an
energy landscape where the state space of the system would correspond to
the two-dimensional floor of that three-dimensional space, and the height
dimension corresponds to the potential energy of the system. Much like
a marble would roll with gravity and momentum, the state of the system
(indicated as a black circle on the manifold surface of Figure 2.4(c)) rolls
down the energy landscape’s nonlinear slopes and settles into an attractor
basin (which corresponds to a location in space that belongs to a word’s
meaning).
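The marble analogy can be made concrete with a small sketch. It assumes an eight-node network with Hebbian weights storing two hypothetical patterns that share their form nodes (0 to 3) but differ on their meaning nodes (4 to 7), and it uses a standard quadratic energy function with a clipped downhill update. None of these choices is taken from Kawamoto's model; they are only one simple way to obtain an energy landscape with two basins.

```python
MEANING_A = [1, 1, 1, 1, 1, 1, -1, -1]   # shared form + first meaning
MEANING_B = [1, 1, 1, 1, -1, -1, 1, 1]   # shared form + second meaning


def hebbian_weights(patterns):
    """Symmetric weights in which co-occurring features support each other."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
            for i in range(n)]


def energy(W, a, bias):
    """Height of the landscape at state a (lower means more stable)."""
    n = len(a)
    quad = sum(W[i][j] * a[i] * a[j] for i in range(n) for j in range(n))
    return -0.5 * quad - sum(b * x for b, x in zip(bias, a))


def roll_downhill(W, start, bias, steps=40, rate=0.5):
    """Follow the landscape's slope, recording the energy at every step."""
    a = list(start)
    trace = [energy(W, a, bias)]
    for _ in range(steps):
        net = [sum(w * x for w, x in zip(row, a)) + b for row, b in zip(W, bias)]
        a = [max(-1.0, min(1.0, x + rate * n)) for x, n in zip(a, net)]
        trace.append(energy(W, a, bias))
    return a, trace


W = hebbian_weights([MEANING_A, MEANING_B])
ambiguous = [1, 1, 1, 1, 0, 0, 0, 0]                      # form known, meaning open
context_a = [0, 0, 0, 0] + [0.05 * m for m in MEANING_A[4:]]  # weak contextual bias
no_context = [0.0] * 8

final_state, energies = roll_downhill(W, ambiguous, context_a)
stuck_state, _ = roll_downhill(W, ambiguous, no_context)
```

With the weak contextual bias, the state rolls into the basin for the first meaning and the recorded energy never increases along the way. Without any bias, the state stays balanced between the two basins, which corresponds to sitting on the ridge that separates them.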
In Kawamoto’s (1993) simulations, unambiguous words were recognized
(and settled in their energy landscapes) more quickly than biased
[Figure 2.4, panels (a)–(d). Recoverable labels: Word Layer (ROOM, BOOM), Letter Layer (R, B, K, F), Feature Layer, and Input; Spelling, Phonology, Pt of Spch, Meaning, and Input Vector; Subordinate Meaning; L1 Node, L2 Node, L1 Words, L2 Words, Lexicon layer, Orthography Layer, Phonology Layer, and Semantic (or other, e.g. sensorimotor).]
divided into three classes. First, cognates are pairs of words that have the
same spelling and meaning in two languages. For example, actor has the
same meaning in English and Spanish but slightly different phonology. In
the model, the two word nodes corresponding to a pair of cognates will
have the same connections to the orthography and semantic nodes and
some of the same connections to the phonology layer (depending on the
degree of phonological similarity across the two languages). Orthographic
input to the model will activate both words equally, which will then
mutually inhibit each other via the inhibitory connections between all
words. Hence, this type of ambiguity cannot be resolved without help
from the language nodes. Prior unambiguous input to the model in one of
the two languages will selectively activate the corresponding language
node, which then acts to inhibit all word nodes in the opposing language.
This alters the starting activation levels of the word nodes, allowing the
node in the relevant language to more strongly inhibit its counterpart and
win the competition.
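The competition between the two word nodes of a cognate pair can be sketched as follows. This is a minimal illustration with assumed parameter values, not the BIA implementation: each word node receives the same orthographic input, the two nodes inhibit each other, and an active language node is treated as a constant inhibitory input to the word node of the other language rather than as a change in starting activation.

```python
def clip(x, lo=0.0, hi=1.0):
    """Keep an activation inside its allowed range."""
    return max(lo, min(hi, x))


def compete(input_l1, input_l2, lang_l1=0.0, lang_l2=0.0,
            steps=100, rate=0.1, inhibit=0.6, top_down=0.5):
    """Two word nodes (one per language) competing for the same letter string.

    lang_l1 and lang_l2 are the activations of the two language nodes; each
    one inhibits the word node belonging to the other language.
    Parameter values are illustrative assumptions.
    """
    l1 = l2 = 0.0
    for _ in range(steps):
        net_l1 = input_l1 - inhibit * l2 - top_down * lang_l2
        net_l2 = input_l2 - inhibit * l1 - top_down * lang_l1
        l1, l2 = (clip(l1 + rate * (net_l1 - l1)),
                  clip(l2 + rate * (net_l2 - l2)))
    return l1, l2


# Same spelling in both languages, no prior context: the nodes stay tied.
tied = compete(0.5, 0.5)
# Prior input in the first language has activated its language node.
resolved = compete(0.5, 0.5, lang_l1=0.5)
```

In the first call the two activations remain equal, so orthography alone cannot separate the pair. In the second call the word node in the first language wins and its counterpart is suppressed.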
Next, false cognates, or interlingual homographs, are pairs of words
with the same spelling but different meanings in each language. For
example, main is a synonym for primary in English but in French means
hand (with a fairly different pronunciation). These would be represented
in the BIA models as word nodes in each language that are identical in
their connections to the orthography layer, partly different in their
connections to the phonology layer, and completely different in their
connections to the semantic layer. Ambiguity resolution in this case could occur
again by priming of the language nodes or instead through contextual
bias. If, for example, sentential context activates semantic nodes
corresponding to one of the two resolutions of the ambiguity, this alters the
initial state of the system to be closer to one option. A sufficiently biasing
sentential context, even in the nontarget language, could override the
influence of the language nodes, leading the system to correctly recognize
a code-switched word that does not match the language of the sentential
context. This is consistent with experimental evidence showing that the
processing cost of switching languages is dependent on contextual bias
(Li, 1996; Moreno, Federmeier, & Kutas, 2002). Furthermore, results
have shown that code-switching is easier when the phonology of the
code-switched word is different from that of the context language (Grosjean,
1995; Li, 1996). In the BIA models, this would be accounted for by the
fact that code-switched words with minimal phonological overlap with
the context language will activate fewer competitors in the context
language, leading to faster resolution.
Finally, partial cognates, or interlingual cohorts, are pairs of words
across languages in which there is partial overlap in spelling or phonology.
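[Figure 2.5 panels: trajectories for the input “shark” across phonological and semantic dimensions, with lexical regions labeled sharp and shark in one panel and sharp, shark, and sharik in the other.]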
Figure 2.5 (a) For a monolingual, the linguistic input shark has
orthographic and phonological similarity to both shark and sharp, and
a few other words; (b) for a bilingual, that same input has similarity
with even more lexical representations, thus producing an extremely
nonconvex lexical field, and an even more nonlinear trajectory
Discussion
Obviously, bilingualism does not involve having a new and separate
lexicon module inserted into a person’s cortex. Learning an L2, early or
late in life, involves rewiring the existing connectivity of multiple language
areas of the brain. This network of networks has some portions of it that
are mostly specialized for one or the other language (Kim et al., 1997),
but it also has many portions that are used by both languages (Marian,
Spivey, & Hirsch, 2003). The BIA and BIA+ models of bilingual language
processing have pursued that general kind of architecture and
produced results that correspond well with human data (Dijkstra & van
Heuven, 2002; van Heuven et al., 1998). As a result of that type of cortical
connectivity in a bilingual, reading or hearing a word from one language
can inadvertently partially activate a lexical representation in the other
language. It turns out that this process in bilinguals is not that different
from related processes in monolinguals. When monolinguals read or hear
a word in their language, they also exhibit inadvertent partial activation of
other related lexical representations.
Rather than thinking of these lexical representations as line entries in
a mental dictionary, some of which get partially activated, we have chosen
a different framework here for understanding how ambiguity (temporary
or otherwise) causes the language system to vacillate between multiple
possible interpretations. We have chosen a state-space framework,
wherein lexical representations exist as attractor basins, some with
a strong or weak pull, some with partial overlap with one another, and
some with tendrils that stretch out to semantically disparate regions of
state space. Those tentacular lexical attractor basins, whose tendrils reach
out in many directions in state space, may be unusually prevalent in
bilinguals, compared to monolinguals.
While the dictionary framework is clearly a metaphor, intended to help
one imagine how words might be organized in the language system, the
state-space framework need not be conceived as a metaphor (Onnis &
Spivey, 2012). When one takes a neural network, such as a brain or
connectionist model, and treats each node’s activation as a coordinate
in a state space, this serves as a mathematical description of the state of the
actual system (Elman, 2004; Spivey, 2008) – not a metaphor. Scientific
metaphors always break down at some point and can provide misleading
insights (Hoffman, 1980). In the case of a simulated neural network
processing two languages, as its state changes from timestep to timestep,
one can access all the data necessary to provide an accurate state-space
description of this system – perhaps performing a dimensionality
reduction down to two or three dimensions for purposes of data visualization
(Elman, 1991). In the case of an actual brain-and-body processing two
languages, however, it is of course not possible to measure every node in
the network. Nonetheless, we can measure quite a bit; and, when the right
behavioral measures are chosen carefully and sampled as continuously as
possible (e.g., Louwerse et al., 2012), those behaviors can be seen as
performing something similar to the dimensionality reduction performed
on the simulated neural network, thus allowing us to witness a
low-dimensional record of the high-dimensional mental trajectory (Spivey &
Dale, 2006, p. 209). Importantly, even with quantitatively abstracted
data, from recorded behaviors that result from hidden neural processes,
we are still not using a metaphor when we plot those data into a state space
for data visualization. The neural dimensions have been reduced by the
motor system in poorly understood ways, but there is no figurative
analogy being used to liken linguistic processes to something else, such as
a book with lexical entries listed in alphabetical order.
In this chapter, we have provided a series of theory visualizations as
proxies for those data visualizations. Armed with state-space
trajectories of connectionist networks addressing lexical ambiguity
resolution in monolingual conditions and in bilingual conditions, one can
see that the attractor basins corresponding to word representations
come in a wide variety of shapes and sizes. Bilingualism may not
instigate a qualitatively different format of processing but instead
may just introduce a quantitative change in the distribution of those
different shapes and sizes. Compared to monolinguals, bilinguals may
experience a little more phonological (and in some cases
orthographic) overlap in their lexical fields, which may result in a little
more lexical competition on a regular basis. Perhaps it is this incessant
practice with increased lexical competition that trains a bilingual’s
brain to have greater cognitive control (e.g., Kroll & Bialystok,
2013; Spivey & Cardon, 2015). If one must have a metaphor,
then – far from being a dictionary – the mental lexicon is perhaps
more like a high-dimensional golf course with sandpits, greens, and
fairways all interlacing among one another; and a bilingual’s golf
course is especially tangled.
Keywords
Ambiguous words, Bilingual interactive activation (BIA) model,
Bilingual interactive activation Plus (BIA+) model, Bilingual lexical
Thought Questions
1. What are the pros and cons of localist versus distributed
connectionist models of bilingualism? Is one more appropriate than the
other?
2. Age of acquisition is not modeled in the BIA or BIA+ but is known to
have important effects. How might age of acquisition be integrated with
these models?
3. Views of embodied cognition (e.g., Barsalou, 2008) suggest that
action, planning, and sensorimotor representations may also play roles
in language processing. How might these or other cues influence bilingual
processing?
Internet Sites
Connectionism: www.ucs.louisiana.edu/~isb9112/dept/phil341/wisconn.html
Connectionism as an Approach: www.iep.utm.edu/connect/
Bilingual Interactive Activation Plus: www.wikivisually.com/wiki/Bilingual_interactive_activation_plus
Interactive Activation Models: www.psychology.nottingham.ac.uk/staff/wvh/jiam/
What is Connectionism: www.mind.ilstu.edu/curriculum/connectionism_intro/connectionism_1.php
Further Reading
Dale, R., Fusaroli, R., Duran, N. D., & Richardson, D. C. (2013). The
self-organization of human interaction. In Psychology of Learning and
Motivation, 59, 43–95.
References
Adelman, J. S., Brown, G. D., & Quesada, J. F. (2006). Contextual diversity, not
word frequency, determines word-naming and lexical decision times.
Psychological Science, 17(9), 814–823.
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the
time course of spoken word recognition using eye movements: Evidence for
continuous mapping models. Journal of Memory and Language, 38(4), 419–439.
Altarriba, J., & Gianico, J. L. (2003). Lexical ambiguity resolution across
languages: A theoretical and empirical review. Experimental Psychology, 50(3),
159–170.
Altarriba, J., Kroll, J. F., Sholl, A., & Rayner, K. (1996). The influence of lexical
and conceptual constraints on reading mixed-language sentences: Evidence
from eye fixations and naming times. Memory and Cognition, 24(4), 477–492.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59,
617–645.
Chen, Q., Huang, X., Bai, L., Xu, X., Yang, Y., & Tanenhaus, M. K. (2017). The
effect of contextual diversity on eye movements in Chinese sentence reading.
Psychonomic Bulletin and Review, 24(2), 510–518.
De Groot, A. M., Delmaar, P., & Lupker, S. J. (2000). The processing of
interlexical homographs in translation recognition and lexical decision:
Support for non-selective access to bilingual memory. The Quarterly Journal of
Experimental Psychology, 53A(2), 397–428.
Dijkstra, T., Grainger, J., & van Heuven, W. J. (1999). Recognition of cognates
and interlingual homographs: The neglected role of phonology. Journal of
Memory and Language, 41(4), 496–518.
Dijkstra, T., & van Heuven, W. J. (2002). The architecture of the bilingual word
recognition system: From identification to decision. Bilingualism: Language and
Cognition, 5(3), 175–197.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences
in second language acquisition. Routledge.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and
grammatical structure. Machine Learning, 7(2–3), 195–225.
Elman, J. L. (2004). An alternative view of the mental lexicon. Trends in Cognitive
Sciences, 8(7), 301–306.
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical
knowledge without a lexicon. Cognitive Science, 33(4), 547–582.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time.
Journal of Memory and Language, 12(6), 627–635.
French, R. M. (1998). A simple recurrent network model of bilingual memory. In
M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the 20th Annual Cognitive
Louwerse, M. M., Dale, R., Bard, E. G., & Jeuniaux, P. (2012). Behavior
matching in multimodal communication is synchronized. Cognitive Science, 36
(8), 1404–1426.
Lyons, J. (1963). Structural semantics. Oxford: Blackwell.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working
memory: Comment on Just and Carpenter (1992) and Waters and Caplan
(1996). Psychological Review, 109(1), 35–54.
Macnamara, J., & Kushnir, S. L. (1971). Linguistic independence of bilinguals:
The input switch. Journal of Memory and Language, 10(5), 480.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural
bilinguals. Journal of Memory and Language, 51(2), 190–201.
Marian, V., & Spivey, M. (2003a). Bilingual and monolingual processing of
competing lexical items. Applied Psycholinguistics, 24(2), 173–193.
Marian, V., & Spivey, M. (2003b). Competing activation in bilingual language
processing: Within- and between-language competition. Bilingualism: Language
and Cognition, 6(2), 97–115.
Marian, V., Spivey, M., & Hirsch, J. (2003). Shared and separate systems in
bilingual language processing: Converging evidence from eyetracking and brain
imaging. Brain and Language, 86(1), 70–82.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken
word-recognition. Cognition, 25(1–2), 71–102.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech
perception. Cognitive Psychology, 18(1), 1–86.
McClelland, J. L., & Johnston, J. C. (1977). The role of familiar units in
perception of words and nonwords. Perception and Psychophysics, 22(3),
249–261.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of
context effects in letter perception: I. An account of basic findings. Psychological
Review, 88(5), 375–407.
Meuter, R. F., & Allport, A. (1999). Bilingual language switching in naming:
Asymmetrical costs of language selection. Journal of Memory and Language, 40
(1), 25–40.
Miyake, A., Just, M. A., & Carpenter, P. A. (1994). Working memory constraints
on the resolution of lexical ambiguity: Maintaining multiple interpretations in
neutral contexts. Journal of Memory and Language, 33(2), 175–202.
Moreno, E. M., Federmeier, K. D., & Kutas, M. (2002). Switching languages,
switching palabras (words): An electrophysiological study of code switching.
Brain and Language, 80(2), 188–207.
Onnis, L., & Spivey, M. J. (2012). Toward a new scientific visualization for the
language sciences. Information, 3, 124–150.
Plummer, P., Perea, M., & Rayner, K. (2014). The influence of contextual
diversity on eye movements in reading. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 40(1), 275–283.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation
model of context effects in letter perception: II. The contextual
enhancement effect and some tests and extensions of the model.
Psychological Review, 89(1), 60–94.