Course: DD2424 - Assignment 4
In this assignment you will train an RNN to synthesize English text character
by character. You will train a vanilla RNN with outputs, as described in
lecture 9, using the text from the book The Goblet of Fire by J.K. Rowling.
The variation of SGD you will use for the optimization will be Adam. The
final version of your code should contain these major components:
• Preparing Data: Read in the training data, determine the number
of unique characters in the text and set up mapping functions - one
mapping each character to a unique index and another mapping each
index to a character.
• Back-propagation: The forward and the backward pass of the back-
propagation algorithm for a vanilla RNN to efficiently compute the
gradients.
• Adam updating of your RNN’s parameters.
• Synthesizing text from your RNN: Given a learnt set of parameters for the RNN, a default initial hidden state h0 and an initial input vector x0 from which to bootstrap, you will write a function to generate a sequence of text.
Background 1: A Vanilla RNN
The mathematical details of the RNN you will implement are as follows.
Given a sequence of input vectors, x1 , . . . , xτ , where each xt has size d × 1
and an initial hidden state h0 , the RNN outputs at each time-step t a vector
of probabilities, pt (K × 1), for each possible character and a hidden state
ht of size m × 1. That is
for t = 1, 2, . . . , τ
at = W ht−1 + U xt + b (1)
ht = tanh(at ) (2)
ot = V ht + c (3)
pt = SoftMax(ot ) (4)
The loss for a single labelled sequence is the average of the cross-entropy loss
at each time-step:

L(x_{1:τ}, y_{1:τ}, Θ) = (1/τ) Σ_{t=1}^{τ} l_t = −(1/τ) Σ_{t=1}^{τ} y_t^T log(p_t)    (5)
where Θ = {b, c, W, U, V }, x1:τ = {x1 , . . . , xτ } and y1:τ is defined similarly.
The equations for the gradient computations of the back-propagation algo-
rithm for such an RNN are given in Lecture 9. Note in the lecture notes the
bias vectors have been omitted. It is left as an exercise for you to compute
the gradient w.r.t. the two bias vectors.
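As a concrete reference, here is a minimal numpy sketch of equations (1)-(5) applied to a single labelled sequence. It assumes the parameters are stored in a dictionary RNN (as suggested later in this document) and that X and Y are K × τ matrices holding the one-hot encoded inputs and targets as columns; the function name and return values are purely illustrative, and it does not store the intermediate quantities you will later need for back-propagation.

import numpy as np

def forward_loss(RNN, X, Y, h0):
    """Equations (1)-(5) for one labelled sequence.
    X, Y: K x tau one-hot matrices (inputs/targets), h0: m x 1 initial hidden state."""
    tau = X.shape[1]
    h = h0
    loss = 0.0
    for t in range(tau):
        a = RNN['W'] @ h + RNN['U'] @ X[:, t:t+1] + RNN['b']   # eq (1)
        h = np.tanh(a)                                         # eq (2)
        o = RNN['V'] @ h + RNN['c']                            # eq (3)
        p = np.exp(o - np.max(o))                              # eq (4), numerically stable SoftMax
        p = p / np.sum(p)
        loss += -np.sum(Y[:, t:t+1] * np.log(p))               # cross-entropy term l_t
    return loss / tau, h                                       # eq (5): average over tau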
Background 2: Adam algorithm for optimization
In this assignment you will implement a variant of adaptive SGD called
Adam. To refresh your memory, the update steps for Adam are defined as
(more details are in the lecture notes)
m_{θ,t′} = β1 m_{θ,t′−1} + (1 − β1) g_{t′}    (6)
v_{θ,t′} = β2 v_{θ,t′−1} + (1 − β2) g_{t′}^2    (7)
m̂_{θ,t′} = m_{θ,t′} / (1 − β1^{t′})    (8)
v̂_{θ,t′} = v_{θ,t′} / (1 − β2^{t′})    (9)
θ_{t′+1} = θ_{t′} − η m̂_{θ,t′} / (√(v̂_{θ,t′}) + ϵ)    (10)
where
• θ is a generic placeholder for the parameter vector/matrix under
consideration,
• t′ refers to the iteration of the SGD update (not to be confused with the
t used to denote the input and output vectors of the labelled training
sequence),
• g_{t′} is the gradient vector ∂L/∂θ, and
• in an abuse of notation the operations of division, raising to the power
of two and square root are applied to each entry of the vector/matrix
independently.
The standard default settings for the hyper-parameters are β1 = .9, β2 = .999
and ϵ = 1e−8.
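As a minimal sketch, assuming the parameters and their gradients are stored in dictionaries with matching keys (as suggested later in this document), equations (6)-(10) could be applied to every parameter as follows; the names adam_m, adam_v and t_step are illustrative, and adam_m/adam_v should be initialized to zero arrays with the same shapes as the parameters.

import numpy as np

def adam_step(RNN, grads, adam_m, adam_v, t_step,
              eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, equations (6)-(10), applied entry-wise to each parameter."""
    for k in RNN:
        adam_m[k] = beta1 * adam_m[k] + (1 - beta1) * grads[k]       # eq (6)
        adam_v[k] = beta2 * adam_v[k] + (1 - beta2) * grads[k] ** 2  # eq (7)
        m_hat = adam_m[k] / (1 - beta1 ** t_step)                    # eq (8)
        v_hat = adam_v[k] / (1 - beta2 ** t_step)                    # eq (9)
        RNN[k] = RNN[k] - eta * m_hat / (np.sqrt(v_hat) + eps)       # eq (10)

Here t_step counts SGD updates starting from 1, matching t′ in the equations above.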
Exercise 1: Implement and train a vanilla RNN
In the following I will sketch the different parts you will need to write to
complete the assignment. Note that this is only a guideline. You can, of course,
use a different design, but you should read the outline as it helps explain how
different parameters and design choices are made.
0.1 Read in the data
First you need to read in the training data from the text file of The Goblet
of Fire available for download at the Canvas webpage. To save you some
time here is code that will read in the contents of this text file.
book_fname = book_dir + 'goblet_book.txt'
fid = open(book_fname, "r")
book_data = fid.read()
fid.close()
All the characters of the book are now in book_data. To get a vector con-
taining the unique characters in book_data apply
unique_chars = list(set(book_data))
Once you have this list, its length K corresponds to the dimensionality of
both the output and the input vectors of your RNN.
To allow you to easily go between a character and its one-hot encoding and
in the other direction, you should initialize two dictionaries - char_to_ind
and ind_to_char. For char_to_ind you should use the characters of your
alphabet as its keys and assign each an integer value (keep things simple
and use the position where the character appears in the list unique_chars as
its value). Similarly, for ind_to_char use the integers 0 to K-1 as its keys
and assign the appropriate character as the value for each integer. You will use
these dictionaries when you convert a sequence of characters into a sequence
of one-hot encoded vectors and then when you convert a synthesized
sequence of one-hot encodings back into a sequence of characters.
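A minimal sketch of how the two dictionaries could be built, together with a hypothetical helper chars_to_onehot that one-hot encodes a string of characters into a K × n matrix (the helper name is purely illustrative):

import numpy as np

char_to_ind = {ch: i for i, ch in enumerate(unique_chars)}
ind_to_char = {i: ch for i, ch in enumerate(unique_chars)}
K = len(unique_chars)

def chars_to_onehot(chars, char_to_ind, K):
    """Return a K x len(chars) matrix whose t-th column one-hot encodes chars[t]."""
    X = np.zeros((K, len(chars)))
    for t, ch in enumerate(chars):
        X[char_to_ind[ch], t] = 1
    return X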
0.2 Set hyper-parameters & initialize the RNN’s parameters
The one hyper-parameter needed to define the RNN’s architecture is the
dimensionality of its hidden state m. For this assignment you should set
m=100. The other hyper-parameters you need to set are those associated
with training, namely the learning rate eta and the length of the input
sequences (seq_length) you use during training. The default settings for
this assignment are eta=.001 and seq_length=25.
In my code I found it easiest to store the parameters of the network in a
dictionary called RNN. I initialized the bias vectors RNN['b'] and RNN['c']
to be zero vectors of size m×1 and K×1 respectively. Note that for this task
the dimensionality of the input and output vectors is the same. The weight
matrices are randomly initialized as
RNN['U'] = (1/np.sqrt(2*K)) * rng.standard_normal(size=(m, K))
RNN['W'] = (1/np.sqrt(2*m)) * rng.standard_normal(size=(m, m))
RNN['V'] = (1/np.sqrt(m)) * rng.standard_normal(size=(K, m))
where rng is defined as in Assignment 1. Please note that due to how arrays are
normally stored in memory in Python, it is probably more efficient to store
each input example in X as a row vector as opposed to a column vector. In
that case X would have size seq_length × K and the RNN's parameters
should have transposed dimensionality, that is RNN['U'] has size K × m
and RNN['V'] size m × K. Applying RNN['U'] to X is then done with
np.matmul(X, RNN['U']) etc. For the rest of this document it is assumed
that X stores each separate input as a column, but converting to the row
storage is straightforward and just requires converting the vectors in the
equations in the first part of this document from column vectors to row vectors
and making the adjustments accordingly.
0.3 Synthesize text from your randomly initialized RNN
Before you begin training your RNN, you should write a function that will
synthesize a sequence of characters using the current parameter values in
your RNN. Besides RNN, it will take as input a vector h0 (the hidden state
at time 0), another vector x0 which will represent the first (dummy) input
vector to your RNN (it can be some character like a full stop), and an integer
n denoting the length of the sequence you want to generate. In the body of
the function you will write code to implement the equations (1-4). There is
just one major difference - you have to generate the next input vector xnext
from the current input vector x. At each time step t when you generate a
vector of probabilities for the labels, you then have to sample a label (i.e. an
integer) from this discrete probability distribution. This sample will then
be the (t + 1)th character in your sequence and will be the input vector for
the next time-step of your RNN.
Here is one way to randomly select a character based on the output proba-
bility scores p:
cp = np.cumsum(p, axis=0)
a = rng.uniform(size=1)
ii = np.argmax(cp - a > 0)
First you compute the vector containing the cumulative sum of the proba-
bilities. Then you generate a random draw, a, from a uniform distribution
in the range 0 to 1. Next you find the smallest index 0≤ii≤K-1 such that
cp[ii] > a (i.e. cp[ii-1] ≤ a < cp[ii]). You should store each index you sample for 0≤t≤n-1 and
let your function output the matrix Y (size K×n) where Y is the one-hot
encoding of each sampled character. Given Y you can then use ind_to_char
to convert it to a sequence of characters and view what text your RNN has
generated.
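Putting these pieces together, here is a minimal sketch of such a synthesis function, assuming the parameters are stored in the dictionary RNN and that rng is the numpy random generator used elsewhere in the assignment; the function and argument names are illustrative.

import numpy as np

def synthesize(RNN, h0, x0, n, rng):
    """Generate n characters. h0: m x 1 hidden state, x0: K x 1 one-hot dummy input.
    Returns Y of size K x n, the one-hot encoding of each sampled character."""
    K = RNN['V'].shape[0]
    Y = np.zeros((K, n))
    h, x = h0, x0
    for t in range(n):
        a = RNN['W'] @ h + RNN['U'] @ x + RNN['b']
        h = np.tanh(a)
        o = RNN['V'] @ h + RNN['c']
        p = np.exp(o - np.max(o))
        p = p / np.sum(p)
        # sample a character index from the discrete distribution p
        cp = np.cumsum(p, axis=0)
        draw = rng.uniform(size=1)
        ii = int(np.argmax(cp - draw > 0))
        Y[ii, t] = 1
        x = Y[:, t:t+1]      # the sampled character becomes the next input
    return Y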
0.4 Implement the forward & backward pass of back-prop
Next up is writing the code to compute the gradients of the loss w.r.t. the
parameters of the model. While you write this code, you should use the first
seq_length characters of book_data as your labelled sequence for debugging,
that is
X_chars = book_data[0:seq_length]
Y_chars = book_data[1:seq_length+1]
Note the label for an input character is the next character in the book.
Once you have X_chars and Y_chars, you then have to convert them to the
matrices X and Y containing the one-hot encoding of the characters of the
sequence. Both X and Y have size K×seq_length and each column of the
respective matrices corresponds to an input vector and its target output
vector. You should also set h0 to the zero vector. Given this labelled
sequence and initial hidden state you are in a position to write and call a
function that performs the forward-pass of the back-prop algorithm. This
function should apply the equations (1-4) to the input data just described
and return the loss and also the final and intermediary output vectors at
each time step needed by the backward-pass of the algorithm.
Once you have computed the forward-pass then the next step is to write the
code for the backward pass of the back-prop algorithm. Here you should
implement the equations given in Lecture 9. As per usual you should store
the computed gradients in a dictionary. This will allow you to write more
streamlined code for gradient checking and to implement the Adam updates,
that is, something akin to the following (but with the updates of the Adam
algorithm in place of vanilla SGD):
for kk in grads.keys():
    RNN[kk] = RNN[kk] - eta * grads[kk]
After you have written the code to compute the forward and backward pass,
you then have to, as per usual, check your gradient computations. On the
Canvas website I have provided a skeleton version of the PyTorch code needed
to calculate the gradients with its automatic differentiation engine, with lines
missing in the forward pass that you have to fill in. You need to write these
lines with the appropriate torch operations instead of numpy ones. To avoid
numerical issues you should do your checks with a network with m=10 and
seq_length=25.
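For the comparison itself, a minimal sketch could look like the following, assuming your analytic gradients are in a dictionary grads and the gradients produced by the PyTorch skeleton have been copied into a numpy dictionary grads_torch with matching keys (both names are illustrative):

import numpy as np

for k in grads:
    num = np.max(np.abs(grads[k] - grads_torch[k]))
    den = max(1e-12, np.max(np.abs(grads[k]) + np.abs(grads_torch[k])))
    print(f"{k}: max relative error = {num / den:.2e}")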
0.5 Train your RNN using Adam
You are now ready to write the high-level loop to train your RNN with the
text in book_data. The general high-level approach will be as follows. Let
e (initialized to 0) be the integer that keeps track of where in the book
you are. At each iteration of the SGD training you should grab a labelled
training sequence of seq_length characters. Thus your sequence of
input characters corresponds to book_data[e:e+seq_length] and the labels
for this sequence are book_data[e+1:e+seq_length+1]. You should convert
these sequences of characters into the matrices X and Y (the one-hot encoding
vectors of each character in the input and output sequences).
However, before you pass this labelled sequence into your forward and back-
ward functions you also need to define hprev. If e=0 then hprev should be
the zero vector, while if e>0 then hprev should be set to the last hidden state
computed by the forward pass in the previous iteration. Thus (hopefully)
you have a hprev that has stored the context of all the prior characters it
has seen so far in the book! Now you have all the inputs needed for the
forward and backward pass functions to compute the gradient. Once you
have computed the gradients then you can apply the Adam update step to
all the parameters of your RNN.
Your forward pass function should also return the loss for the labelled train-
ing sequence. As we are implementing SGD, the loss from one training
sequence to the next will vary a lot, and it is too expensive to compute
the loss over the entire training data, so it is useful to keep track of a smoothed
version of the loss over the iterations via a weighted sum of the smoothed
loss and the current loss such as:
smooth_loss = .999 * smooth_loss + .001 * loss
You should print out smooth_loss regularly (say after every 100th update
step) to see if the smoothed loss is, in general, reducing. What I found is
that learning is initially very fast and then it slows. After the 1st epoch
learning is slower and you can see the smoothed loss going up and down
according to which part of the novel is harder or easier to predict, but at
corresponding points in the novel there is a general trend for the smoothed
loss to get gradually smaller. For reference, the smooth loss at the beginning
of the 3rd epoch of training (∼133,000 update steps) was for me ∼1.55.
You should also synthesize text (of length 200 characters) from your RNN
regularly (say after every 1,000th update step) during training (you can let
out a shout of hurrah when you see your first synthesized Harry, Hermione,
Dumbledore, or . . .). This allows you to see if your training is doing some-
thing sensible. You can do this by calling your function where h0 is the same
hprev as used in the forward pass and x0 is X[:, 0:1] (the first character
of the labelled input sequence for the current iteration).
At the end of an update step you should then increase the counter e by
seq_length. If this results in e > len(book_data)-seq_length-1 then you
should reset e to 0 and loop through the characters in the book again.
When you reset e you have completed one epoch of training. Also, when you
reset e to 0 you should reset hprev to the zero vector.
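A minimal sketch of this high-level loop, assuming the hypothetical helpers sketched earlier (chars_to_onehot, adam_step, synthesize) plus a hypothetical forward_and_backward function that returns the loss, the gradient dictionary and the last hidden state, and a hypothetical onehot_to_chars for printing; all of these names, and the value of n_update_steps, are illustrative.

import numpy as np

n_update_steps = 100000
e, smooth_loss = 0, None
hprev = np.zeros((m, 1))
adam_m = {k: np.zeros_like(v) for k, v in RNN.items()}
adam_v = {k: np.zeros_like(v) for k, v in RNN.items()}

for t_step in range(1, n_update_steps + 1):
    X = chars_to_onehot(book_data[e:e+seq_length], char_to_ind, K)
    Y = chars_to_onehot(book_data[e+1:e+seq_length+1], char_to_ind, K)

    h0 = hprev                                        # context carried over from the previous iteration
    loss, grads, hprev = forward_and_backward(RNN, X, Y, h0)
    adam_step(RNN, grads, adam_m, adam_v, t_step)

    smooth_loss = loss if smooth_loss is None else 0.999 * smooth_loss + 0.001 * loss
    if t_step % 100 == 0:
        print(t_step, smooth_loss)
    if t_step % 1000 == 0:
        print(onehot_to_chars(synthesize(RNN, h0, X[:, 0:1], 200, rng), ind_to_char))

    e += seq_length
    if e > len(book_data) - seq_length - 1:           # one epoch completed
        e = 0
        hprev = np.zeros((m, 1))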
To help you debug, here are snapshots of text sequences I generated at
different stages of my training:
iter = 1, smooth loss=4.604919523909445
HFjY .)1B}d"^Ljt’ü"iui04!.’"W(Kf
_jUPhTSnpGKnUe’)SoCzklq9D-H jm6Qci}3:r)l.OYk-ornyskWnüa99n•4};JvzWzqssH 1’9"?Y/C7)cugdüh!6aap0üSwT3F7:uG2G_4"s9x!ac0bHIty2d0 0_gRzk04ZE
ITbk}qFU6}T:hQ•(DyOBYc):C9HDhYM
iter = 1,000, smooth loss=3.380624181004383
ved wasderon.. "oo"r
sob waxe nlehe singe vfsrle dnd if id ans ofe too sHerhing dinhee hoack hinprewafed ce hotak dis woat wimedhldgeY,I. jramd doent eard.kanmth srend?no
iter = 4,000, smooth loss=2.2598227679427607
It tily."
"ht blantehe hhinn whe five hess add bony.
He," s." Heofing, cher comt twer suvery thouket llichuther. Wetsext Mrpange coemetwoo dha fioul ofon the ickearimtly of I to Harov statteapeters. e
iter = 30,000, smooth loss=1.7861039794613953:
less at they lords, Harry punte bubben angry baster mince wroue a sawn to be shony there zork. . stow hit lead ressered to loos.
"I ow whase, seid like, tid squacly mpeoved lark that the glorn, sum th
iter = 150,000, smooth loss=1.5343960296531385:
ngraction thank amplewing.
"This moing the mart noomed almant wesesiven name. You took air had bark"
Harry. . . But the toreed an hour a saver to weet Hawr down said, spook fintort. "Sire Cumbbattani
To complete the assignment:
To pass the assignment you need to upload:
1. The code for this assignment.
2. A brief pdf report with the following content:
i) State how you checked your analytic gradient computations and
whether you think that your gradient computations are bug free
for your RNN.
ii) Include a graph of the smooth loss function for a longish training
run (at least 2 epochs).
iii) Show the evolution of the text synthesized by your RNN during
training by including a sample of synthesized text (200 characters
long) before the first and before every 10,000th update step when
you train for 100,000 update steps.
iv) A passage of length 1000 characters synthesized from your best
model (the one that achieved the lowest loss).
Exercise 2: Optional for bonus points
Improve the performance of the network
There are lots of ways the RNN you have just trained could be improved
upon. Here are some avenues you can explore:
1. In the basic assignment each training sequence was visited in the same
order for each epoch of training. It might be better to train with
sequences from random locations in the text at each update iteration.
Training in this fashion means hprev has to be re-set before calculating
the gradients. Or one could use a compromise solution: split the
book into L chunks (perhaps randomising this split for each epoch),
randomly choose the order in which these chunks are used for training,
and then run through each chunk sequentially as in the basic assignment.
In this scenario hprev only has to be re-set at the start of each chunk. I have
not tried either approach so it would be interesting to see if either has
any effect on speed of convergence or the results. To get a measure of
performance you could set aside one chunk of the book as a validation
set and use this to see if there is any quantitative difference between
the predictions made on this set with the two different ways of training.
(1 point)
2. In the basic assignment we effectively had batches of size 1. In con-
junction with the previous approach it would be interesting to see if
batches bigger than one could be used to speed up convergence (with
respect to the number of update steps) and also result in a better
trained model (see the previous comments). (1 point)
3. When generating text people play around with how they sample from
the discrete probability vector output by the RNN. One simple idea is
to introduce a temperature parameter T when applying the SoftMax
operator that is:
p̃_t = exp(o_t / T) / (1^T exp(o_t / T))    (11)
Setting T ∈ (0, 1) makes the distribution more peaky than the default
SoftMax and the “peakiness” increases as T decreases. Low temper-
atures skew the distribution towards the high probability characters
and if you sample from p̃t instead of pt then you will probably in-
crease the quality of the generated text; however, it will decrease its
diversity.
The paper The Curious Case of Neural Text Degeneration by Ari
Holtzman et al., ICLR 2020 proposes a variation on this alternative
called Nucleus Sampling. Here you define a threshold θ ∈ (0, 1]. Let
the indices i1 , i2 , . . . , iK correspond to the indices of pt when it is
sorted in descending order (ie pt,i1 is the highest entry in pt and pt,iK
is the lowest). Then find the smallest integer kt s.t.
Σ_{j=1}^{k_t} p_{t,i_j} ≥ θ    (12)
then let
p′ = Σ_{j=1}^{k_t} p_{t,i_j}    (13)
Create a new probability vector p̃t to sample from during the text
generation phase by setting, for i = 1, . . . , K,
p̃_{t,i} = p_{t,i} / p′  if p_{t,i} ≥ p_{t,i_{k_t}},  and  p̃_{t,i} = 0  otherwise    (14)
In effect you will only sample from the top kt probabilities. The size of
the sampling set will differ from one time step to the next depending
on the shape of pt and the threshold θ.
You should use the two different strategies for sampling, temperature
and Nucleus Sampling, with low, medium and high values of their
respective parameters T and θ, and show examples of the qualitative
effect on the text generated; a sketch of both sampling strategies is
given after this item. (If you were to continue in this direction
in a more quantitative manner then you would need to define a quality
metric such as BLEU/perplexity or use human evaluation.) (2 points)
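A minimal sketch of the two sampling tweaks referred to above, operating on the output scores o_t (temperature) and on the probability vector p_t (nucleus sampling); the function names are illustrative and any ties in p_t are ignored for simplicity.

import numpy as np

def temperature_probs(o, T):
    """Eq (11): SoftMax with temperature T applied to the output scores o (K x 1)."""
    s = o / T
    p = np.exp(s - np.max(s))
    return p / np.sum(p)

def nucleus_probs(p, theta):
    """Eqs (12)-(14): keep the smallest set of top probabilities whose mass is >= theta."""
    flat = p.flatten()
    order = np.argsort(flat)[::-1]              # indices i_1, ..., i_K in descending order of p
    csum = np.cumsum(flat[order])
    k_t = int(np.argmax(csum >= theta)) + 1     # smallest k_t with cumulative mass >= theta
    p_tilde = np.zeros_like(flat)
    kept = order[:k_t]
    p_tilde[kept] = flat[kept] / csum[k_t - 1]  # renormalise by p' (eq 13)
    return p_tilde.reshape(p.shape)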
4. You can speed up your gradient computations with the following (a
short sketch illustrating the last two ideas is given after this item):
• Some of the computations in the forward and backward pass can
be pre-computed with matrix operations and stored before en-
tering the for loop through time, with the pre-computed quantities
then accessed inside the loop. Similarly, some operations can be computed
after the for loop and applied as a batch matrix operation if the
appropriate data is stored, such as the final output and softmax
layers in the forward pass.
• The matrix X is a one-hot encoding of the character data and is
very sparse. Thus in the forward and backward passes, instead of
computing matrix multiplications with sparse matrices, one can
use the indices of the non-zero entries of X to speed up
computations. This is especially relevant for the gradient com-
putations of U.
• In the backward pass the gradient computations for W require
computing the outer product of two vectors. If you use np.matmul(.,.)
to compute this outer product, replacing it with np.outer(.,.)
should lead to speed ups.
Make these changes and report by how much you were able to speed
up your code. It would also be fun to compare the speed of your fast
gradient computations to that of PyTorch. (1 point)
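As a minimal sketch of the last two ideas (exploiting the sparsity of the one-hot inputs for the gradient of U, and using np.outer for the gradient of W), under the assumption that grad_a[t] holds dL/da_t as a length-m vector, h_list[t] holds h_{t-1}, and x_idx[t] is the index of the non-zero entry of the one-hot input x_t (all names are hypothetical):

import numpy as np

def fast_W_U_grads(grad_a, h_list, x_idx, m, K):
    """Accumulate dL/dW and dL/dU over the time steps of one sequence."""
    dW = np.zeros((m, m))
    dU = np.zeros((m, K))
    for t in range(len(grad_a)):
        dW += np.outer(grad_a[t], h_list[t])  # outer product instead of np.matmul on (m,1)x(1,m)
        dU[:, x_idx[t]] += grad_a[t]          # x_t is one-hot, so only one column of dU is touched
    return dW, dU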
Bonus Points Available: You can complete any of the above suggestions and
earn up to a maximum of 4 bonus points. To get the bonus points you must
submit:
1. Your code.
2. A pdf document reporting briefly on the upgrades you performed to
the basic assignment and the results you achieved.