Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Abstract
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. We present a novel task for
grounded language understanding: disambiguating a sentence given a visual scene
which depicts one of the possible interpretations of that sentence. To this end, we
introduce a new multimodal corpus containing ambiguous sentences, representing
a wide range of syntactic, semantic and discourse ambiguities, coupled with videos
that visualize the different interpretations for each sentence. We address this task
by extending a vision model which determines if a sentence is depicted by a video.
We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing us to disambiguate sentences in a unified fashion across the different ambiguity types.
1 Introduction
Ambiguity is one of the defining characteristics of human languages, and language understanding
crucially relies on the ability to obtain unambiguous representations of linguistic content. While
some ambiguities can be resolved using intra-linguistic contextual cues, the disambiguation of many
linguistic constructions requires integration of world knowledge and perceptual information obtained
from other modalities.
We focus on the problem of grounding language in the visual modality, and introduce a novel task
for language understanding which requires resolving linguistic ambiguities by utilizing the visual
context in which the linguistic content is expressed. This type of inference is frequently called for in
human communication that occurs in a visual environment, and is crucial for language acquisition,
when much of the linguistic content refers to the visual surroundings of the child.
Our task is also fundamental to the problem of grounding vision in language, by focusing on
phenomena of linguistic ambiguity, which are prevalent in language, but typically overlooked when
using language as a medium for expressing understanding of visual content. Due to such ambiguities,
a superficially appropriate description of a visual scene may in fact not be sufficient for demonstrating
a correct understanding of the relevant visual content. Our task addresses this issue by introducing a
deep validation protocol for visual understanding, requiring not only providing a surface description
of a visual activity but also demonstrating structural understanding at the levels of syntax, semantics
and discourse.
To enable the systematic study of visually grounded processing of ambiguous language, we create
a new corpus, LAVA (Language and Vision Ambiguities). This corpus contains sentences with
linguistic ambiguities that can only be resolved using external information. The sentences are paired
with short videos that visualize different interpretations of each sentence. Our sentences encompass a
wide range of syntactic, semantic and discourse ambiguities, including ambiguous prepositional and verb phrase attachments, conjunctions,
logical forms, anaphora and ellipsis. Overall, the corpus contains 237 sentences, with 2 to 3
interpretations per sentence, and an average of 3.37 videos that depict visual variations of each
sentence interpretation, corresponding to a total of 1679 videos.
Using this corpus, we address the problem of selecting the interpretation of an ambiguous sentence
that matches the content of a given video. Our approach for tackling this task extends the sentence
tracker. The sentence tracker produces a score which determines if a sentence is depicted by a
video. This earlier work had no concept of ambiguities; it assumed that every sentence had a single
interpretation. We extend this approach to represent multiple interpretations of a sentence, enabling
us to pick the interpretation that is most compatible with the video.
2 Related Work
Previous language and vision studies focused on the development of multimodal word and sentence
representations as well as methods for describing images and videos in natural language. While these
studies handle important challenges in multimodal processing of language and vision, they do not
provide explicit modeling of linguistic ambiguities.
Previous work relating ambiguity in language to the visual modality addressed the problem of word
sense disambiguation. However, this work is limited to context independent interpretation of individual words, and does not consider structure-related ambiguities. Discourse ambiguities were previously
studied in work on multimodal coreference resolution. Our work expands this line of research, and
addresses further discourse ambiguities in the interpretation of ellipsis. More importantly, to the best
of our knowledge our study is the first to present a systematic treatment of syntactic and semantic
sentence level ambiguities in the context of language and vision.
The interactions between linguistic and visual information in human sentence processing have been
extensively studied in psycholinguistics and cognitive psychology. A considerable fraction of this
work focused on the processing of ambiguous language, providing evidence for the importance of
visual information for linguistic ambiguity resolution by humans. Such information is also vital
during language acquisition, when much of the linguistic content perceived by the child refers to their
immediate visual environment. Over time, children develop mechanisms for grounded disambiguation
of language, manifested among others by the usage of iconic gestures when communicating ambiguous linguistic content. Our study leverages such insights to develop a complementary framework that
enables addressing the challenge of visually grounded disambiguation of language in the realm of
artificial intelligence.
3 Task
We provide a concrete framework for the study of language understanding with visual context by
introducing the task of grounded language disambiguation. This task requires choosing the correct
linguistic representation of a sentence given a visual context depicted in a video. Specifically, provided
with a sentence, n candidate interpretations of that sentence and a video that depicts the content of
the sentence, one needs to choose the interpretation that corresponds to the content of the video.
To illustrate this task, consider the example in figure 1, where we are given the sentence “Sam approached the chair with a bag” along with two different linguistic interpretations. In the first interpretation, which corresponds to parse 1(a), Sam has the bag. In the second interpretation, associated with parse 1(b), the bag is on the chair rather than with Sam. Given the visual context from figure 1(c), the task is to choose which interpretation is most appropriate for the sentence.
4 Approach Overview
To address the grounded language disambiguation task, we use a compositional approach for determining whether a specific interpretation of a sentence is depicted by a video. A sentence and an accompanying interpretation, encoded in first order logic, give rise to a grounded model that matches a video against the provided sentence interpretation.
The model is comprised of Hidden Markov Models (HMMs) which encode the semantics of words,
and trackers which locate objects in video frames. To represent an interpretation of a sentence, word
models are combined with trackers through a cross-product which respects the semantic representation
of the sentence to create a single model which recognizes that interpretation.
Given a sentence, we construct an HMM based representation for each interpretation of that sentence.
We then detect candidate locations for objects in every frame of the video. Together, the representation of the sentence interpretation and the candidate object locations are combined to form a model which
can determine if a given interpretation is depicted by the video. We test each interpretation and report
the interpretation with highest likelihood.
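As a rough sketch of this decision procedure (the pair-based interface and the score method are hypothetical placeholders, not the actual implementation), the disambiguation step reduces to scoring each candidate interpretation against the video and returning the highest-scoring one:

```python
def disambiguate(interpretation_models, video_detections):
    """Return the interpretation whose grounded model best explains the video.

    interpretation_models -- list of (interpretation, model) pairs, where each
                             model combines the word HMMs and trackers for that
                             interpretation and exposes
                             model.score(detections) -> log-likelihood
    video_detections      -- candidate object detections for every video frame
    """
    return max(interpretation_models,
               key=lambda pair: pair[1].score(video_detections))[0]
```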
5 Corpus
To enable a systematic study of linguistic ambiguities that are grounded in vision, we compiled
a corpus with ambiguous sentences describing visual actions. The sentences are formulated such
that the correct linguistic interpretation of each sentence can only be determined using external,
non-linguistic, information about the depicted activity. For example, in the sentence “Bill held the
green chair and bag”, the correct scope of “green” can only be determined by integrating additional
information about the color of the bag. This information is provided in the accompanying videos,
which visualize the possible interpretations of each sentence. Figure 2 presents the syntactic parses
for this example along with frames from the respective videos. Although our videos contain visual
uncertainty, they are not ambiguous with respect to the linguistic interpretation they are presenting,
and hence a video always corresponds to a single candidate representation of a sentence.
The corpus covers a wide range of well
known syntactic, semantic and discourse ambiguity classes. While the ambiguities are associated
with various types, different sentence interpretations always represent distinct sentence meanings,
and are hence encoded semantically using first order logic. For syntactic and discourse ambiguities
we also provide an additional, ambiguity type specific encoding as described below.
• Syntax Syntactic ambiguities include Prepositional Phrase (PP) attachments, Verb Phrase
(VP) attachments, and ambiguities in the interpretation of conjunctions. In addition to
logical forms, sentences with syntactic ambiguities are also accompanied with Context Free
Grammar (CFG) parses of the candidate interpretations, generated from a deterministic CFG
parser.
• Semantics The corpus addresses several classes of semantic quantification ambiguities, in
which a syntactically unambiguous sentence may correspond to different logical forms. For
each such sentence we provide the respective logical forms.
• Discourse The corpus contains two types of discourse ambiguities, Pronoun Anaphora and
Ellipsis, offering examples comprising two sentences. In anaphora ambiguity cases, an
ambiguous pronoun in the second sentence is given its candidate antecedents in the first
sentence, as well as a corresponding logical form for the meaning of the second sentence. In
ellipsis cases, a part of the second sentence, which can constitute either the subject and the
verb, or the verb and the object, is omitted. We provide both interpretations of the omission
in the form of a single unambiguous sentence, and its logical form, which combines the
meanings of the first and the second sentences.
Table 2 lists examples of the different ambiguity classes, along with the candidate interpretations of
each example.
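To make these encodings concrete, a single corpus entry could be represented along the following lines; this layout, the file names, and the logical forms shown are purely illustrative and are not the released file format:

```python
# Hypothetical in-memory representation of one LAVA entry (the PP attachment
# example from table 2); the actual corpus format may differ, and the logical
# forms below are illustrative rather than copied from it.
lava_entry = {
    "sentence": "Claire left the green chair with a yellow bag.",
    "ambiguity": "Syntax / PP attachment",
    "interpretations": [
        {
            "bracketing": "Claire [left the green chair] [with a yellow bag].",
            "logical_form": "chair(x) & green(x) & bag(y) & yellow(y) & "
                            "leave(Claire, x) & with(Claire, y)",
            "videos": ["pp_attach_01_a.mp4"],   # the bag is with Claire
        },
        {
            "bracketing": "Claire left [the green chair with a yellow bag].",
            "logical_form": "chair(x) & green(x) & bag(y) & yellow(y) & "
                            "leave(Claire, x) & with(x, y)",
            "videos": ["pp_attach_01_b.mp4"],   # the bag is on the chair
        },
    ],
}
```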
The corpus is generated using Part of Speech (POS) tag sequence templates. For each template, the
POS tags are replaced with lexical items from the corpus lexicon, described in table 3, using all the
visually applicable assignments. This generation process yields a total of 237 sentences,
of which 213 sentences have 2 candidate interpretations, and 24 sentences have 3 interpretations.
Table 1 presents the corpus templates for each ambiguity class, along with the number of sentences
generated from each template.
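The generation scheme can be illustrated with a small sketch; the lexicon below is only a subset of table 3, the template is a simplified fragment of the PP template in table 1, and the visual-applicability filter is omitted:

```python
from itertools import product

# Small illustrative subset of the lexicon in table 3 (not the full lexicon).
LEXICON = {
    "NNP": ["Claire", "Bill"],
    "V":   ["picked up", "held"],
    "JJ":  ["green", "yellow"],
    "NN":  ["chair", "bag"],
}

def expand_template(template):
    """Instantiate a POS template with every lexical assignment from LEXICON.
    The real generator additionally filters out assignments that are not
    visually applicable; that step is omitted here."""
    tokens = template.split()
    slots = [i for i, tok in enumerate(tokens) if tok in LEXICON]
    for choice in product(*(LEXICON[tokens[i]] for i in slots)):
        words = list(tokens)
        for i, item in zip(slots, choice):
            words[i] = item
        yield " ".join(words) + "."

# Simplified fragment of the PP template from table 1:
# for s in expand_template("NNP V the JJ NN"):
#     print(s)   # e.g. "Claire held the green chair."
```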
The corpus videos are filmed in an indoor environment containing background objects and pedestrians.
To account for the manner of performing actions, videos are shot twice with different actors. Whenever
applicable, we also filmed the actions from two different directions (e.g. approach from the left,
and approach from the right). Finally, all videos were shot with two cameras from two different
viewpoints. Taking these variations into account, the resulting video corpus contains an average of 7.1 videos per sentence and 3.37 videos per sentence interpretation, corresponding to a total of 1679 videos.
Table 1: POS templates for generating the sentences in our corpus. The rightmost column represents
the number of sentences in each category. The sentences are produced by replacing the POS tags
with all the visually applicable assignments of lexical items from the corpus lexicon shown in table 3.
Ambiguity                  Templates                                 #
Syntax     PP              NNP V DT [JJ] NN1 IN DT [JJ] NN2.         48
           VP              NNP1 V [IN] NNP2 V [JJ] NN.               60
           Conjunction     NNP1 [and NNP2] V DT JJ NN1 and NN2.      40
                           NNP V DT NN1 or DT NN2 and DT NN3.
           Total                                                     148
Semantics  Logical Form    NNP1 and NNP2 V a NN.                     35
                           Someone V the NNS.
Discourse  Anaphora        NNP V DT NN1 and DT NN2. It is JJ.        36
           Ellipsis        NNP1 V NNP2. Also NNP3.                   18
           Total                                                     54
Total                                                                237
The average video length is 3.02 seconds (90.78 frames), amounting to an overall of 1.4 hours of footage (152,434 frames).
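As a rough consistency check on these statistics (all reported figures are rounded, so the arithmetic only matches approximately):

\[
\begin{aligned}
\text{interpretations} &= 213 \cdot 2 + 24 \cdot 3 = 498,\\
\text{videos} &\approx 498 \cdot 3.37 \approx 1678 \approx 1679, \qquad 1679 / 237 \approx 7.1 \text{ videos per sentence},\\
\text{frames} &\approx 1679 \cdot 90.78 \approx 152{,}420 \approx 152{,}434, \qquad 90.78 / 3.02 \approx 30 \text{ frames per second}.
\end{aligned}
\]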
A custom corpus is required for this task because no existing corpus, containing either videos or images, systematically covers multimodal ambiguities. Some datasets control for more aspects of the videos than just the main action being performed, but they do not provide the range of ambiguities discussed here. The closest dataset controls for object appearance, color, action, and direction of motion, making it more likely to be suitable for evaluating disambiguation tasks. Unfortunately, that dataset was designed to avoid ambiguities, and is therefore not suitable for evaluating the work described here.
6 Model
To perform the disambiguation task, we extend the sentence recognition model which represents
sentences as compositions of words. Given a sentence, its first order logic interpretation and a
video, our model produces a score which determines if the sentence is depicted by the video. It
simultaneously tracks the participants in the events described by the sentence while recognizing the
events themselves. This allows it to be flexible in the presence of noise by integrating top-down information from the sentence
with bottom-up information from object and property detectors. Each word in the query sentence is
represented by an HMM, which recognizes tracks (i.e. paths of detections in a video for a specific
object) that satisfy the semantics of the given word. In essence, this model can be described as having
two layers, one in which object tracking occurs and one in which words observe tracks and filter
tracks that do not satisfy the word constraints.
Given a sentence interpretation, we construct a sentence-specific model which recognizes if a video
depicts the sentence as follows. Each predicate in the first order logic formula has a corresponding
HMM, which can recognize if that predicate is true of a video given its arguments. Each variable has
a corresponding tracker which attempts to physically locate the bounding box corresponding to that
variable in each frame of a
video. This creates a bipartite graph: HMMs that represent predicates are connected to trackers that
represent variables. The trackers themselves are similar to the HMMs, in that they comprise a lattice
of potential bounding boxes in every frame. To construct a joint model for a sentence interpretation,
we take the cross product of HMMs and trackers, taking only those cross products dictated by the
structure of the formula corresponding to the desired interpretation. Given a video, we employ an
object detector to generate candidate detections in each frame, construct trackers which select one of
these detections in each frame, and finally construct the overall model from HMMs and trackers.
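As a schematic sketch of this bipartite construction (the class layout and the helpers make_tracker and make_word_hmm are hypothetical placeholders, standing in for the actual lattice construction):

```python
from dataclasses import dataclass, field

@dataclass
class Predicate:
    name: str       # e.g. "chair", "approach", "not_equal"
    args: tuple     # variable names, e.g. ("x",) or ("u", "x")

@dataclass
class JointModel:
    trackers: dict = field(default_factory=dict)  # variable -> tracker lattice
    factors: list = field(default_factory=list)   # (predicate HMM, its trackers)

def build_joint_model(formula, detections_per_frame):
    """Combine word HMMs and trackers according to a first order logic formula:
    one tracker per variable, one HMM per predicate, linked by the formula's
    argument structure (theta)."""
    model = JointModel()
    for pred in formula:
        for var in pred.args:                      # one tracker per variable
            if var not in model.trackers:
                model.trackers[var] = make_tracker(detections_per_frame)
        hmm = make_word_hmm(pred.name)             # one HMM per predicate
        model.factors.append((hmm, [model.trackers[v] for v in pred.args]))
    return model
```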
Table 2: An overview of the different ambiguity types, along with examples of ambiguous sentences
with their linguistic and visual interpretations. Note that similarly to semantic ambiguities, syntactic
and discourse ambiguities are also provided with first order logic formulas for the resulting sentence
interpretations. Table 4 shows additional examples for each ambiguity type, with frames from sample
videos corresponding to the different interpretations of each sentence.
PP
  Example: Claire left the green chair with a yellow bag.
  Interpretation: Claire [left the green chair] [with a yellow bag].  Visual setup: The bag is with Claire.
  Interpretation: Claire left [the green chair with a yellow bag].  Visual setup: The bag is on the chair.

VP
  Example: Claire looked at Bill picking up a chair.
  Interpretation: Claire looked at [Bill [picking up a chair]].  Visual setup: Bill picks up the chair.
  Interpretation: Claire [looked at Bill] [picking up a chair].  Visual setup: Claire picks up the chair.

Conjunction
  Example: Claire held a green bag and chair.
  Interpretation: Claire held a [green [bag and chair]].  Visual setup: The chair is green.
  Interpretation: Claire held a [[green bag] and [chair]].  Visual setup: The chair is not green.

  Example: Claire held the chair or the bag and the telescope.
  Interpretation: Claire held [[the chair] or [the bag and the telescope]].  Visual setup: Claire holds the chair.
  Interpretation: Claire held [[the chair or the bag] and [the telescope]].  Visual setup: Claire holds the chair and the telescope.

Logical Form
  Example: Claire and Bill moved a chair.
  Interpretation: chair(x), move(Claire, x), move(Bill, x)  Visual setup: Claire and Bill move the same chair.
  Interpretation: chair(x), chair(y), x ≠ y, move(Claire, x), move(Bill, y)  Visual setup: Claire and Bill move different chairs.

  Example: Someone moved the two chairs.
  Interpretation: chair(x), chair(y), x ≠ y, person(u), move(u, x), move(u, y)  Visual setup: One person moves both chairs.
  Interpretation: chair(x), chair(y), x ≠ y, person(u), person(v), u ≠ v, move(u, x), move(v, y)  Visual setup: Each chair is moved by a different person.

Anaphora
  Example: Sam picked up the bag and the chair. It is yellow.
  Interpretation: It = bag  Visual setup: The bag is yellow.
  Interpretation: It = chair  Visual setup: The chair is yellow.

Ellipsis
  Example: Sam left Bill. Also Clark.
  Interpretation: Sam left Bill and Clark.  Visual setup: Sam left Bill and Clark.
  Interpretation: Sam and Clark left Bill.  Visual setup: Sam and Clark left Bill.
Table 3: The lexicon used to instantiate the templates in table 1 in order to generate the corpus.
Syntactic Category Visual Category Words
Nouns Objects, People chair, bag, telescope, someone, proper names
Verbs Actions pick up, put down, hold, move (transitive), look at, approach, leave
Prepositions Spatial Relations with, left of, right of, on
Adjectives Visual Properties yellow, green
Provided an interpretation and its corresponding formula composed of P predicates and V variables, along with a collection of object detections b^t_i, where i indexes the candidate detections in frame t, for a video of length T, the model computes the score of the video-sentence pair by finding the optimal detection for each participant in every frame. This is in essence the Viterbi algorithm, the MAP algorithm for HMMs, applied to finding the optimal object detection i^t_v for each participant v, and the optimal state k^t_p for each predicate HMM p, in every frame t. Each detection is scored by its confidence from the object detector, f, and each object track is scored by a motion coherence metric, g, which determines if the motion of the track agrees with the underlying optical flow. Each predicate p is scored by the probability h_p of observing a particular detection in a given state, and by the probability a_p of transitioning between states. The structure of the formula, and the fact that multiple predicates often refer to the same variables, is recorded by θ, a mapping between predicates and their arguments. The model computes the MAP estimate as

\[
\max_{\substack{i^1_1, \ldots, i^T_V \\ k^1_1, \ldots, k^T_P}}
\sum_{v=1}^{V} \left( f\!\left(b^1_{i^1_v}\right) + \sum_{t=2}^{T} g\!\left(b^{t-1}_{i^{t-1}_v},\, b^t_{i^t_v}\right) \right)
+ \sum_{p=1}^{P} \left( \sum_{t=1}^{T} \log h_p\!\left(k^t_p,\, b^t_{i^t_{\theta_p(1)}},\, b^t_{i^t_{\theta_p(2)}}\right)
+ \sum_{t=2}^{T} \log a_p\!\left(k^{t-1}_p,\, k^t_p\right) \right)
\tag{1}
\]
for sentences whose words refer to at most two tracks (i.e. transitive verbs or binary predicates); the formulation extends trivially to arbitrary arities. Figure 3 provides a visual overview of the model as a cross-product of tracker models and word models.
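To make the dynamic program concrete, the following sketch scores a single tracker observed by a single unary predicate HMM; it is a simplified instance of equation (1), not the full cross-product over V trackers and P predicates, and all score functions are supplied by the caller:

```python
import math

def viterbi_score(detections, f, g, h, a, num_states):
    """Simplified instance of equation (1): one tracker, one unary predicate.

    detections -- detections[t] is the list of candidate boxes in frame t
    f(b)       -- detection confidence score
    g(b0, b1)  -- motion coherence score between consecutive detections
    h(k, b)    -- observation probability of HMM state k for detection b
    a(k0, k1)  -- HMM state transition probability
    Returns the best joint score over detection tracks and state sequences.
    """
    # Joint state = (detection index in this frame, HMM state).
    prev = {(i, k): f(b) + math.log(h(k, b))
            for i, b in enumerate(detections[0])
            for k in range(num_states)}
    for t in range(1, len(detections)):
        cur = {}
        for i, b in enumerate(detections[t]):
            for k in range(num_states):
                cur[(i, k)] = math.log(h(k, b)) + max(
                    prev[(ip, kp)]
                    + g(detections[t - 1][ip], b)   # track coherence
                    + math.log(a(kp, k))            # state transition
                    for ip in range(len(detections[t - 1]))
                    for kp in range(num_states))
        prev = cur
    return max(prev.values())
```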
Our model extends the earlier sentence tracker approach in several ways. First, we depart from the dependency based
representation used in that work, and recast the model to encode first order logic formulas. Note
that some complex first order logic formulas cannot be directly encoded in the model and require
additional inference steps. This extension enables us to represent ambiguities in which a given
sentence has multiple logical interpretations for the same syntactic parse.
Second, we introduce several model components which are not specific to disambiguation, but are
required to encode linguistic constructions that are present in our corpus and could not be handled by the earlier model. These new components are the predicate “not equal”, disjunction, and conjunction. The
key addition among these components is support for the new predicate “not equal”, which enforces
that two tracks, i.e. objects, are distinct from each other. For example, in the sentence “Claire and Bill
moved a chair” one would want to ensure that the two movers are distinct entities. In earlier work,
this was not required because the sentences tested in that work were designed to distinguish objects
based on constraints rather than identity. In other words, there might have been two different people
but they were distinguished in the sentence by their actions or appearance. To faithfully recognize
that two actors are moving the chair in the earlier example, we must ensure that they are disjoint
from each other. In order to do this we create a new HMM for this predicate, which assigns low
probability to tracks that heavily overlap, forcing the model to fit two different actors in the previous
example. By adopting the new first order logic based semantic representation in place of a syntactic representation, and combining it with a more expressive model, we can encode the sentence interpretations required to perform the disambiguation task.
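As an illustration of how such a predicate can be scored (the IoU formulation and the threshold are illustrative choices, not necessarily those used in the actual model):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def not_equal_observation(box_a, box_b, overlap_threshold=0.3):
    """Illustrative observation probability for the 'not equal' predicate:
    close to one when the two tracked boxes occupy distinct regions, and low
    when they heavily overlap (i.e. likely bind to the same physical object)."""
    overlap = iou(box_a, box_b)
    return 0.05 if overlap > overlap_threshold else 1.0 - overlap
```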
Figure 3(left) shows an example of two different interpretations of the above discussed sentence
“Claire and Bill moved a chair”. Object trackers, which correspond to variables in the first order
logic representation of the sentence interpretation, are shown in red. Predicates which constrain the
possible bindings of the trackers, corresponding to predicates in the representation of the sentence, are
shown in blue. Links represent the argument structure of the first order logic formula, and determine
the cross products that are taken between the predicate HMMs and tracker lattices in order to form
the joint model which recognizes the entire interpretation in a video.
The resulting model provides a single unified formalism for representing all the ambiguities in table
2. Moreover, this approach can be tuned to different levels of specificity. We can create models that
are specific to one interpretation of a sentence or that are generic, and accept multiple interpretations
by eliding constraints that are not common between the different interpretations. This allows the model, like humans, to defer deciding on a particular interpretation or to infer that multiple interpretations of the sentence are plausible.
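One simple way to realize such a generic model (a hypothetical helper over sets of predicates; the actual implementation may elide constraints differently):

```python
def generic_formula(interpretations):
    """Keep only the predicates shared by all candidate interpretations,
    yielding a model that accepts any of them.  Predicates are treated as
    hashable atoms here; a real system would compare them up to variable
    renaming before discarding the non-common constraints."""
    common = set(interpretations[0])
    for interp in interpretations[1:]:
        common &= set(interp)
    return common
```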
7 Experimental Results
We tested the performance of the model described in the previous section on the LAVA dataset
presented in section 5. Each video in the dataset was pre-processed with object detectors for humans,
bags, chairs, and telescopes. We employed a mixture of CNN and DPM detectors, trained on held
out sections of our corpus. For each object class we generated proposals from both the CNN and
the DPM detectors, and trained a scoring function to map both results into the same space. The
scoring function consisted of a sigmoid over the confidence of the detectors trained on the same held
out portion of the training set. As none of the disambiguation examples discussed here rely on the
specific identity of the actors, we did not detect their identity. Instead, any sentence which contains
names was automatically converted to one which contains arbitrary “person” labels.
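A minimal sketch of such a calibration step, using scikit-learn's logistic regression as a stand-in for fitting the sigmoid; the actual training setup is not specified beyond what is described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_calibration(raw_scores, is_true_positive):
    """Fit a sigmoid over a detector's raw confidences on held-out data, so
    that CNN and DPM proposals can be compared on a common probability scale."""
    clf = LogisticRegression()
    clf.fit(np.asarray(raw_scores).reshape(-1, 1),
            np.asarray(is_true_positive))
    return lambda s: clf.predict_proba([[s]])[0, 1]

# calibrate_cnn = fit_calibration(cnn_scores, cnn_labels)   # held-out data
# calibrate_dpm = fit_calibration(dpm_scores, dpm_labels)
# Scores from both detectors now live on the same [0, 1] scale.
```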
The sentences in our corpus have either two or three interpretations. Each interpretation has one or
more associated videos where the scene was shot from a different angle, carried out either by different
actors, with different objects, or in different directions of motion. For each sentence-video pair, we
performed a 1-out-of-2 or 1-out-of-3 classification task to determine which of the interpretations of
the corresponding sentence best fits that video. Overall chance performance on our dataset is 49.04%,
slightly lower than 50% due to the 1-out-of-3 classification examples.
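One way to see why chance lies slightly below 50%, assuming chance is averaged over individual sentence-video classification instances:

\[
\text{chance} = \frac{N_2 \cdot \tfrac{1}{2} + N_3 \cdot \tfrac{1}{3}}{N_2 + N_3} < \frac{1}{2},
\]

where \(N_2\) and \(N_3\) denote the numbers of classification instances whose sentence has 2 or 3 candidate interpretations; the reported 49.04% corresponds to this weighted average over the corpus.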
The model presented here achieved an accuracy of 75.36% over the entire corpus, averaged across all ambiguity categories. This demonstrates that the model is largely capable of capturing the underlying task and that similar compositional cross-modal models may do the same. Across the 3 major ambiguity classes, the accuracy was 84.26% for syntactic ambiguities, 72.28% for semantic ambiguities, and 64.44% for discourse ambiguities.
The most significant source of model failures is poor object detections. Objects are often rotated
and presented at angles that are difficult to recognize. Certain object classes like the telescope
are much more difficult to recognize due to their small size and the fact that hands tend to largely
occlude them. This accounts for the degraded performance of the semantic ambiguities relative to the
syntactic ambiguities, as many more semantic ambiguities involved the telescope. Object detector
performance is similarly responsible for the lower performance of the discourse ambiguities, which relied much more on the accuracy of the person detector, as many sentences involve only people
interacting with each other without any additional objects. This degrades performance by removing a
helpful constraint for inference, according to which people tend to be close to the objects they are
manipulating. In addition, these sentences introduced more visual uncertainty as they often involved
three actors.
The remaining errors are due to the event models. HMMs can fixate on short sequences of events
which seem as if they are part of an action, but in fact are just noise or the prefix of another action.
Ideally, one would want an event model which has a global view of the action: if an object went up from the beginning to the end of the video while a person was holding it, it is likely that the object was being picked up. The event models used here cannot enforce this constraint; they merely assert that the object was moving up for some number of frames, an event which can happen due to noise in the
object detectors. Enforcing such local constraints instead of the global constraint of the motion of the
object over the video makes joint tracking and event recognition tractable in the framework presented
here but can lead to errors. Finding models which strike a better balance between local information
and global constraints while maintaining tractable inference remains an area of future work.
8 Conclusion
We present a novel framework for studying ambiguous utterances expressed in a visual context. In
particular, we formulate a new task for resolving structural ambiguities using visual signal. This is a
fundamental task for humans, involving complex cognitive processing, and is a key challenge for
language acquisition during childhood. We release a multimodal corpus that makes it possible to address this task, as well as to support further investigation of ambiguity related phenomena in visually grounded language processing. Finally, we present a unified approach for resolving ambiguous descriptions of videos, achieving good performance on our corpus.
While our current investigation focuses on structural inference, we intend to extend this line of work
to learning scenarios, in which the agent has to deduce the meaning of words and sentences from
structurally ambiguous input. Furthermore, our framework can be beneficial for image and video
retrieval applications in which the query is expressed in natural language. Given an ambiguous query,
our approach will enable matching and clustering the retrieved results according to the different query
interpretations.