
Indian Institute of Science Education and Research, Bhopal

End Semester Examination
Course: DSE 407/607
Course Name: Natural Language Processing
Full marks: 30
Date: 22 Nov 2022

Section A: The following questions are two marks each. Answer any ten of them.

1. Describe
a. ReLU
b. Sigmoid
c. A use-case where you will prefer (a) over (b) and vice-versa.

Ans: (a) ReLU is an activation function defined as y = ReLU(z) = max(z, 0), where z is the input variable and y is the output variable.

(b) Sigmoid is an activation function defined as y = σ(z) = 1 / (1 + e^(−z)). It squashes the outliers toward 0 or 1.

(c) In the hidden-layer units of a feed-forward neural network, ReLU is the preferred activation function. In the output layer of multi-label classification, sigmoid is the preferred activation function. (A short numerical sketch of both functions follows question 2.)

2. What are the different ways with which you can convert a list of word embeddings to a sentence embedding? Define the use-cases in which you will prefer one method over another.

Ans: There are two ways in which we can convert a list of word embeddings to a sentence embedding: (a) pooling and (b) concatenation. In pooling we apply a max/min/average operation across each dimension of the word embeddings to get the sentence embedding; here the dimensions of the word and sentence embeddings are the same. On the other hand, in concatenation we concatenate the word embeddings horizontally to get the sentence embedding. Thus, if we have a list of m words, each with dimension 1×n, the sentence vector will be of dimension 1×mn.

Concatenation preserves the word representations more effectively than pooling, but at the same time it requires more memory to store the sentence embeddings. Thus, if we have ample system memory (GPU/RAM) we can opt for concatenation; on the contrary, pooling is a reasonable choice for low-memory settings.
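As a quick numerical illustration of the two activation functions from question 1, a minimal NumPy sketch (the sample inputs are arbitrary):

import numpy as np

def relu(z):
    # ReLU(z) = max(z, 0), applied element-wise
    return np.maximum(z, 0.0)

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); squashes values toward 0 or 1
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])
print(relu(z))     # [0.  0.  0.  0.5 4. ]
print(sigmoid(z))  # approx [0.018 0.378 0.5 0.622 0.982]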

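Likewise, a minimal NumPy sketch of the two sentence-embedding strategies from question 2, using made-up 3-dimensional embeddings for a 4-word sentence (m = 4, n = 3):

import numpy as np

# hypothetical word embeddings for a 4-word sentence, each of dimension 1x3
words = np.array([[ 0.1,  0.3, -0.2],
                  [ 0.4,  0.0,  0.5],
                  [-0.3,  0.2,  0.1],
                  [ 0.2, -0.1,  0.4]])

# pooling: reduce across the words, so the sentence embedding keeps dimension n = 3
avg_pooled = words.mean(axis=0)
max_pooled = words.max(axis=0)

# concatenation: lay the word vectors side by side, giving dimension m*n = 12
concatenated = words.reshape(-1)

print(avg_pooled.shape, max_pooled.shape, concatenated.shape)  # (3,) (3,) (12,)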
3. What are the advantages and disadvantages of neural language models (NLM) over n-gram language models? Describe a case where you would prefer an n-gram language model over an NLM.

Ans: Some of the advantages NLMs have over n-gram based LMs are:
(i) NLMs can handle much longer histories, theoretically up to infinite length.
(ii) NLMs can generalize better over contexts of similar words as compared to n-gram models.
(iii) They are more accurate at word prediction than n-gram models.

The disadvantages of NLMs with respect to n-gram models are:
(i) NLMs are more complex, slower and need more energy to train than n-gram models.
(ii) The decision process of NLMs is not as transparent as it is in n-gram models.

So for smaller tasks and in low-cost settings, where the model does not expect to encounter many unknown words, an n-gram language model is still the right tool.

4. Answer the following in the context of the language modeling task:
a. What are the limitations and advantages of feed-forward neural networks?
b. What are the advantages and disadvantages of LSTMs?

Ans: Limitations of FFNN:
1. In an FFNN we cannot give an arbitrary-length context (previous words) to predict the next word. Thus the decision process is locally dependent.
2. It needs simultaneous access to all context words. This makes it difficult to use for applied LM tasks like on-the-spot translation.

Benefits of FFNN:
1. The computation is easily parallelizable.
2. With the help of pretrained embeddings, the context can be generalized.
3. It is more accurate at word prediction than n-gram models.

Advantages of LSTM:
1. It solves the issue of vanishing and exploding gradients that we encounter in RNNs.
2. Context management is better in an LSTM, which makes the decision process more accurate.

Disadvantages of LSTM:
1. The inherently sequential nature of recurrent networks makes it hard to do computation in parallel.
2. Passing information through an extended series of recurrent connections leads to information loss and difficulties in training.

5. What are positional encodings? What is their role in transformer networks?

Ans: Transformers by themselves have no notion of the relative or absolute positions of the tokens in the input. Thus, to preserve positional information, we modify the input embeddings by combining them with positional information specific to each position in an input sequence. The simplest way to generate a positional embedding is to take the position of the token from the starting position and append it to the input embedding. Alternatively, we can use a static function that maps integer positions to real-valued vectors in a way that captures the inherent relationships among the positions.

6. What is perplexity? How can it help us to identify better language models?

Ans: Perplexity is used to measure the quality of a language model. The perplexity (PP) of a model θ on an unseen test set is the inverse probability that θ assigns to the test set, normalized by the test set length: for a test set W = w1 w2 ... wN, PP(W) = P(w1 w2 ... wN)^(−1/N). Better models will have lower perplexity.

7. Define
a. Teacher forcing. What is its usage?
b. Auto-regression. What is its usage?

Ans: The idea that we always give the model the correct history sequence to predict the next word (rather than feeding it its own best guess from the previous time step) is called teacher forcing. It is used in the training of sequence-to-sequence tasks.

Autoregression, on the other hand, is the idea that we give the currently generated history as input to generate the output for the next step. It is typically used in the deployment/prediction/test phase.
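A minimal sketch contrasting the two regimes from question 7. The decoder_step function here is a hypothetical stand-in for one step of any sequence-to-sequence decoder (it takes the previous token and a state, and returns a predicted token and an updated state):

def decode_with_teacher_forcing(decoder_step, gold_tokens, state):
    # training-time decoding: the input at each step is the correct previous token
    predictions = []
    prev = "<s>"
    for gold in gold_tokens:
        pred, state = decoder_step(prev, state)
        predictions.append(pred)
        prev = gold               # feed the gold history, not the model's own prediction
    return predictions

def decode_autoregressively(decoder_step, max_len, state):
    # test-time decoding: the input at each step is the model's own previous prediction
    predictions = []
    prev = "<s>"
    for _ in range(max_len):
        pred, state = decoder_step(prev, state)
        if pred == "</s>":
            break
        predictions.append(pred)
        prev = pred               # feed the generated history back in
    return predictions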
8. What is
a. a stacked RNN? What are its advantages and disadvantages?
b. a bidirectional RNN? What are its advantages and disadvantages?

Ans: (a) Stacked RNNs consist of multiple RNN layers where the output of one layer serves as the input to a subsequent layer.

Advantages:
1. Stacked RNNs generally outperform single-layer networks, because the network induces representations at differing levels of abstraction across layers.

Disadvantages:
1. Computation is costlier and directly proportional to the number of layers.

(b) Bidirectional RNNs are RNNs which take both the left and the right context as input to produce the output for the current hidden state. Technically, they are a combination of two simple RNNs, where one RNN reads the left context and the other reads the right context to produce hidden representations. These representations at each token are concatenated to get the final representation. Other simple ways to combine the representations include element-wise addition or multiplication.

Advantages:
1. For tasks where the whole input sequence is available, this RNN produces better results due to the availability of both contexts.

Disadvantages:
1. We need both the left and the right context to use this type of RNN.
2. Computational complexity is high as compared to simple RNNs.
3. It also suffers from the vanishing and exploding gradient problems.

9. What is a residual connection? What are its advantages?

Ans: Residual connections are connections that pass information from a lower layer to a higher layer without going through the intermediate layer. Here we add a layer's input vector to its output vector before passing it forward.

Advantages: Allowing information from the activation going forward, and the gradient going backwards, to skip a layer improves learning and gives higher-level layers direct access to information from lower layers. This improves the overall performance of the model.

10. Describe multi-head attention. How is it helpful in building NLP models?

Ans: A multi-head attention layer consists of a set of self-attention layers, called heads, that reside in parallel at the same depth in a model, each with its own set of parameters. Each head i is provided with its own set of key, query and value matrices, which are used to project the inputs into separate key, query and value embeddings for that head. Given these distinct sets of parameters, each head can learn different aspects of the relationships that exist among inputs at the same level of abstraction. In NLP, different words in a sentence can relate to each other in many different ways simultaneously; for example, distinct syntactic, semantic and discourse relationships can hold between verbs and their arguments in a sentence. With the help of multi-head self-attention, we can learn these aspects simultaneously. (A small sketch of the per-head projections follows question 11.)

11. What is referential density? How is it causing a problem in machine translation?

Ans: The referential density of a language refers to the density of pronoun usage in that language. Some languages, like Japanese and Chinese, tend to omit pronouns far more than Spanish and English. Translating a text from a referentially less dense language into a more dense language is a problem because the model must somehow identify each pronoun gap/zero and recover who or what is being talked about in order to insert the proper pronouns.
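A minimal NumPy sketch of the per-head projections described in question 10. The dimensions and random weights are only illustrative; a trained layer would learn the projection matrices:

import numpy as np

def softmax(scores):
    # numerically stable softmax over the last axis
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: one projection matrix per head
    heads = []
    for Wq_h, Wk_h, Wv_h in zip(Wq, Wk, Wv):
        Q, K, V = X @ Wq_h, X @ Wk_h, X @ Wv_h   # separate query/key/value spaces per head
        scores = Q @ K.T / np.sqrt(Q.shape[-1])  # scaled dot-product comparisons
        alpha = softmax(scores)                  # relevance weights between all token pairs
        heads.append(alpha @ V)                  # weighted sum of values for this head
    return np.concatenate(heads, axis=-1)        # concatenate the per-head outputs

rng = np.random.default_rng(0)
seq_len, d_model, n_heads, d_head = 5, 8, 2, 4
X = rng.normal(size=(seq_len, d_model))
Wq = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
Wk = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
Wv = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
print(multi_head_self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)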
Section B: The following questions are five marks each. Answer any two of them.

12. Describe the hidden Markov model. Can you use it for named entity recognition? If yes, how will you do it?

Ans: An HMM is a probabilistic sequence model: given a sequence of units (words, letters, morphemes, sentences, etc.), it computes a probability distribution over possible sequences of labels and chooses the best label sequence. The HMM is based on augmenting the Markov chain.

A Markov chain makes a very strong assumption that if we want to predict the future in the sequence, all that matters is the current state. All the states before the current state have no impact on the future except via the current state. Formally, consider a sequence of state variables q1, q2, ..., qi; then P(qi = a | q1...qi−1) = P(qi = a | qi−1).

In many cases, however, the events we are interested in are hidden, i.e. we do not observe them directly, for example the part-of-speech tags or named entity tags of a given input text. A hidden Markov model (HMM) allows us to model both observed events and hidden events that we think of as causal factors in our probabilistic model. Formally, let

Q = {q1, q2, ..., qN} be a set of N states;
A = {a11, ..., aij, ..., aNN} be the transition probability matrix, where each aij represents the probability of moving from state i to state j;
O = {o1, o2, ..., oT} be a sequence of T observations, each one drawn from a vocabulary V = {v1, v2, ..., vV};
B = bi(ot) be the emission probabilities, each expressing the probability of an observation ot being generated from a state qi;
π = π1, π2, ..., πN be the initial probability distribution over states, where πi is the probability that the Markov chain will start in state i.

A first-order hidden Markov model instantiates two simplifying assumptions. First is the Markov assumption, i.e. P(qi | q1, ..., qi−1) = P(qi | qi−1). Second, the probability of an output observation oi depends only on the state qi that produced the observation and not on any other states or observations: P(oi | q1, ..., qi, ..., qT, o1, ..., oi, ..., oT) = P(oi | qi). The probability of a state sequence given the observations is then

P(q1, q2, ..., qn | o1, o2, ..., on) = P(o1, o2, ..., on | q1, q2, ..., qn) P(q1, q2, ..., qn) / P(o1, o2, ..., on),

where the conditional probabilities are replaced by the transition and emission probability matrix values under the above two assumptions.

For NER, the observable states are the vocabulary words and the hidden states are the named entity tags.
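To make the NER use concrete, a minimal sketch of HMM decoding with the Viterbi algorithm; the two tags (O, PER), the two-word vocabulary and all probability tables below are made up purely for illustration:

import numpy as np

tags = ["O", "PER"]
vocab = {"john": 0, "runs": 1}
pi = np.array([0.7, 0.3])              # initial tag probabilities
A = np.array([[0.8, 0.2],              # A[i, j] = P(tag j | tag i)  (transitions)
              [0.6, 0.4]])
B = np.array([[0.1, 0.9],              # B[i, o] = P(word o | tag i) (emissions)
              [0.9, 0.1]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    T, N = len(obs), len(tags)
    v = np.zeros((T, N))                # best path probability ending in each tag
    back = np.zeros((T, N), dtype=int)  # backpointers
    v[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            back[t, j] = scores.argmax()
            v[t, j] = scores.max()
    best = [int(v[-1].argmax())]        # follow backpointers from the best final tag
    for t in range(T - 1, 0, -1):
        best.append(int(back[t, best[-1]]))
    return [tags[i] for i in reversed(best)]

print(viterbi(["john", "runs"]))        # ['PER', 'O'] under these toy parameters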
13. Describe the mechanism of self-attention networks. How is it helping the transformer networks to attain more parallel processing?

Ans: A self-attention layer allows a network to directly extract and use information from arbitrarily large contexts without the need to pass it through intermediate recurrent connections as in RNNs. It maps input sequences (x1, ..., xn) to output sequences of the same length (y1, ..., yn). When processing each item in the input, the model has access to all of the inputs up to and including the one under consideration, but it has no access to information about inputs beyond the current one. The computation performed for each item is independent of all the other computations, so we can easily parallelize both forward inference and training of such models.

At the core of an attention-based approach is the ability to compare an item of interest to a collection of other items in a way that reveals their relevance in the current context. In the case of self-attention, the comparisons are to other elements within the given sequence, and the result of these comparisons is used to compute an output for the current input. For example, the computation of y3 is based on a set of comparisons between the input x3 and its preceding elements x1 and x2, and to x3 itself. The simplest form of comparison between elements in a self-attention layer is a dot product, score(xi, xj) = xi · xj. The scores for element i are normalized into weights αij, where αij indicates the proportional relevance of each input j to the input element i that is the current focus of attention. Given the proportional scores in α, we then generate an output value yi by taking the sum of the inputs seen so far, weighted by their respective α values: yi = Σ_{j ≤ i} αij xj.

14. Describe the encoder-decoder architecture. How will you incorporate attention mechanisms to make it better?

Ans: An encoder-decoder network, also known as a sequence-to-sequence network, is a type of architecture where we map from a sequence of input words or tokens to an output sequence that is not merely a direct mapping from individual words. These models are capable of generating contextually appropriate, arbitrary-length output sequences. The key idea underlying these networks is the use of an encoder network that takes an input sequence and creates a contextualized representation of it, often called the context. This representation is then passed to a decoder which generates a task-specific output sequence.

Mathematically, it can be written as c = h^e_n, h^d_t = g(y_{t−1}, h^d_{t−1}, c) and y_t = softmax(h^d_t), where c is the context vector, h^d_t / h^e_t refer to the decoder/encoder hidden state at time step t, and y_t refers to the output at time step t.

In a typical encoder-decoder network, this context vector (the final encoder hidden vector) is either fed to the first unit of the decoder or to every unit of the decoder to generate the output. Thus, the only thing the decoder knows about the source text is what is in this context vector, and this final hidden state therefore acts as a bottleneck.

The attention mechanism is a solution to the bottleneck problem. It allows the decoder to get information from all the hidden states of the encoder, not just the last hidden state. The idea of attention is to create the single fixed-length vector c by taking a weighted sum of all the encoder hidden states. The weights focus on ('attend to') a particular part of the source text that is relevant for the token the decoder is currently producing.
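A minimal NumPy sketch of the attention step from question 14: the context vector for one decoder time step is computed as a weighted sum of all encoder hidden states. Dot-product scoring is assumed here, and the vectors are arbitrary:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    # compare the current decoder state with every encoder hidden state
    scores = encoder_states @ decoder_state   # one score per source token
    weights = softmax(scores)                 # how much to 'attend to' each source token
    context = weights @ encoder_states        # weighted sum of all encoder hidden states
    return context, weights

rng = np.random.default_rng(1)
encoder_states = rng.normal(size=(6, 4))      # 6 source tokens, hidden size 4
decoder_state = rng.normal(size=4)            # current decoder hidden state
c, w = attention_context(decoder_state, encoder_states)
print(c.shape, w.sum())                       # (4,) and weights summing to 1.0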
15. What are the advantages of LSTM? Describe the different gate mechanisms it has.

Ans: RNNs need to backpropagate the error signal back through time. As a result, during the backward pass of training, the hidden layers are subject to repeated multiplications, as determined by the length of the sequence. A frequent result of this process is that the gradients are eventually driven to zero, a situation called the vanishing gradients problem.

Long short-term memory (LSTM) provides a solution by adding an explicit context layer to the architecture (in addition to the usual recurrent hidden layer) and through the use of specialized neural units that make use of gates to control the flow of information into and out of the units that comprise the network layers.

The gates in an LSTM share a common design pattern: (i) a feed-forward layer, (ii) followed by a sigmoid activation function, (iii) followed by a pointwise multiplication with the layer being gated. The choice of the sigmoid as the activation function arises from its tendency to push its outputs to either 0 or 1. Combining this with a pointwise multiplication has an effect similar to that of a binary mask: values in the layer being gated that align with values near 1 in the mask are passed through nearly unchanged, while values corresponding to lower values are essentially erased.

An LSTM has three gates: (i) the forget gate, (ii) the add gate, and (iii) the output gate.

Forget gate: The purpose of this gate is to delete information from the context that is no longer needed. The forget gate computes a weighted sum of the previous state's hidden layer and the current input and passes that through a sigmoid. This mask is then multiplied element-wise with the context vector to remove the information from the context that is no longer required.

Add gate: Here the task is to compute the actual information we need to extract from the previous hidden state and the current input (the same basic computation we have been using for all our recurrent networks). We then generate the mask for the add gate to select the information to add to the current context, and add this to the modified context vector to get our new context vector.

Output gate: It is used to decide what information is required for the current hidden state (as opposed to what information needs to be preserved for future decisions).

Given the appropriate weights for the various gates, an LSTM accepts as input the context layer and hidden layer from the previous time step, along with the current input vector, and it generates updated context and hidden vectors as output.
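A minimal NumPy sketch of a single LSTM step following the gate description above; the weights are random placeholders (and bias terms are omitted for brevity), so this only shows the data flow, not a trained unit:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    z = np.concatenate([h_prev, x])    # previous hidden state plus current input
    f = sigmoid(W["f"] @ z)            # forget gate: what to erase from the context
    i = sigmoid(W["i"] @ z)            # add gate: which new information to write
    g = np.tanh(W["g"] @ z)            # candidate information from h_prev and x
    o = sigmoid(W["o"] @ z)            # output gate: what the new hidden state exposes
    c = f * c_prev + i * g             # updated context vector
    h = o * np.tanh(c)                 # updated hidden vector
    return h, c

rng = np.random.default_rng(2)
d_in, d_hid = 3, 4
W = {k: rng.normal(size=(d_hid, d_in + d_hid)) for k in ["f", "i", "g", "o"]}
x, h, c = rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(x, h, c, W)
print(h.shape, c.shape)                # (4,) (4,)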
