Commit 45393e5

Fix all warnings
1 parent dee5503 commit 45393e5

5 files changed

Lines changed: 75 additions & 75 deletions

beginner_source/blitz/autograd_tutorial.py

Lines changed: 4 additions & 4 deletions
@@ -89,10 +89,10 @@
 # You should have got a matrix of ``4.5``. Let’s call the ``out``
 # *Variable* “:math:`o`”.
 # We have that :math:`o = \frac{1}{4}\sum_i z_i`,
-# :math:`z_i = 3(x_i+2)^2` and :math:`z_i\bigr\rvert_{x_i=1} = 27`.
-# Therefore,
-# :math:`\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)`, hence
-# :math:`\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5`.
+# :math:`z_i = 3(x_i+2)^2` and :math:`z_i\bigr\rvert_{x_i=1} = 27`.
+# Therefore,
+# :math:`\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)`, hence
+# :math:`\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5`.

 ###############################################################
 # You can do many crazy things with autograd!
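
The derivation in the comment lines of this hunk can be checked numerically. The following is a minimal sketch in plain Python that uses a central finite difference instead of autograd. The helper names `out`, `analytic_grad` and `numeric_grad` are hypothetical, and the input is assumed to be four ones (the factor of 1/4 in the comment implies four elements).

```python
def out(x):
    """o = (1/4) * sum_i z_i with z_i = 3 * (x_i + 2)^2, as in the comment."""
    return sum(3.0 * (xi + 2.0) ** 2 for xi in x) / 4.0


def analytic_grad(x):
    """Closed form from the comment: d o / d x_i = (3/2) * (x_i + 2)."""
    return [1.5 * (xi + 2.0) for xi in x]


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


x = [1.0, 1.0, 1.0, 1.0]   # assumed input: four ones
print(out(x))              # 27.0
print(analytic_grad(x))    # [4.5, 4.5, 4.5, 4.5]
print(numeric_grad(out, x))
```

The finite-difference values should agree with the closed form to several decimal places, which matches the ``4.5`` quoted in the comment.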

beginner_source/blitz/tensor_tutorial.py

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@
 ###############################################################
 # .. note::
 # ``torch.Size`` is in fact a tuple, so it supports the same operations
+#
 # Operations
 # ^^^^^^^^^^
 # There are multiple syntaxes for operations. Let's see addition as an example

beginner_source/former_torchies_tutorial.rst

Lines changed: 2 additions & 0 deletions
@@ -7,8 +7,10 @@ In this tutorial, you will learn the following:
 1. Using torch Tensors, and important difference against (Lua)Torch
 2. Using the autograd package
 3. Building neural networks
+
 - Building a ConvNet
 - Building a Recurrent Net
+
 4. Use multiple GPUs


beginner_source/nlp/advanced_tutorial.py

Lines changed: 67 additions & 70 deletions
@@ -4,7 +4,7 @@
 ======================================================

 Dyanmic versus Static Deep Learning Toolkits
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------------------------------------

 Pytorch is a *dynamic* neural network kit. Another example of a dynamic
 kit is `Dynet <https://github.com/clab/dynet>`__ (I mention this because
@@ -47,76 +47,73 @@
 the code more closely resembling the host language (by that I mean that
 Pytorch and Dynet look more like actual Python code than Keras or
 Theano).
-"""

+Bi-LSTM Conditional Random Field Discussion
+-------------------------------------------

-#####################################################################
-# Bi-LSTM Conditional Random Field Discussion
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#
-# For this section, we will see a full, complicated example of a Bi-LSTM
-# Conditional Random Field for named-entity recognition. The LSTM tagger
-# above is typically sufficient for part-of-speech tagging, but a sequence
-# model like the CRF is really essential for strong performance on NER.
-# Familiarity with CRF's is assumed. Although this name sounds scary, all
-# the model is is a CRF but where an LSTM provides the features. This is
-# an advanced model though, far more complicated than any earlier model in
-# this tutorial. If you want to skip it, that is fine. To see if you're
-# ready, see if you can:
-#
-# - Write the recurrence for the viterbi variable at step i for tag k.
-# - Modify the above recurrence to compute the forward variables instead.
-# - Modify again the above recurrence to compute the forward variables in
-# log-space (hint: log-sum-exp)
-#
-# If you can do those three things, you should be able to understand the
-# code below. Recall that the CRF computes a conditional probability. Let
-# :math:`y` be a tag sequence and :math:`x` an input sequence of words.
-# Then we compute
-#
-# .. math:: P(y|x) = \frac{\exp{(\text{Score}(x, y)})}{\sum_{y'} \exp{(\text{Score}(x, y')})}
-#
-# Where the score is determined by defining some log potentials
-# :math:`\log \psi_i(x,y)` such that
-#
-# .. math:: \text{Score}(x,y) = \sum_i \log \psi_i(x,y)
-#
-# To make the partition function tractable, the potentials must look only
-# at local features.
-#
-# In the Bi-LSTM CRF, we define two kinds of potentials: emission and
-# transition. The emission potential for the word at index :math:`i` comes
-# from the hidden state of the Bi-LSTM at timestep :math:`i`. The
-# transition scores are stored in a :math:`|T|x|T|` matrix
-# :math:`\textbf{P}`, where :math:`T` is the tag set. In my
-# implementation, :math:`\textbf{P}_{j,k}` is the score of transitioning
-# to tag :math:`j` from tag :math:`k`. So:
-#
-# .. math:: \text{Score}(x,y) = \sum_i \log \psi_\text{EMIT}(y_i \rightarrow x_i) + \log \psi_\text{TRANS}(y_{i-1} \rightarrow y_i)
-#
-# .. math:: = \sum_i h_i[y_i] + \textbf{P}_{y_i, y_{i-1}}
-#
-# where in this second expression, we think of the tags as being assigned
-# unique non-negative indices.
-#
-# If the above discussion was too brief, you can check out
-# `this <http://www.cs.columbia.edu/%7Emcollins/crf.pdf>`__ write up from
-# Michael Collins on CRFs.
-#
-# Implementation Notes
-# ~~~~~~~~~~~~~~~~~~~~
-#
-# The example below implements the forward algorithm in log space to
-# compute the partition function, and the viterbi algorithm to decode.
-# Backpropagation will compute the gradients automatically for us. We
-# don't have to do anything by hand.
-#
-# The implementation is not optimized. If you understand what is going on,
-# you'll probably quickly see that iterating over the next tag in the
-# forward algorithm could probably be done in one big operation. I wanted
-# to code to be more readable. If you want to make the relevant change,
-# you could probably use this tagger for real tasks.
-#####################################################################
+For this section, we will see a full, complicated example of a Bi-LSTM
+Conditional Random Field for named-entity recognition. The LSTM tagger
+above is typically sufficient for part-of-speech tagging, but a sequence
+model like the CRF is really essential for strong performance on NER.
+Familiarity with CRF's is assumed. Although this name sounds scary, all
+the model is is a CRF but where an LSTM provides the features. This is
+an advanced model though, far more complicated than any earlier model in
+this tutorial. If you want to skip it, that is fine. To see if you're
+ready, see if you can:
+
+- Write the recurrence for the viterbi variable at step i for tag k.
+- Modify the above recurrence to compute the forward variables instead.
+- Modify again the above recurrence to compute the forward variables in
+log-space (hint: log-sum-exp)
+
+If you can do those three things, you should be able to understand the
+code below. Recall that the CRF computes a conditional probability. Let
+:math:`y` be a tag sequence and :math:`x` an input sequence of words.
+Then we compute
+
+.. math:: P(y|x) = \frac{\exp{(\text{Score}(x, y)})}{\sum_{y'} \exp{(\text{Score}(x, y')})}
+
+Where the score is determined by defining some log potentials
+:math:`\log \psi_i(x,y)` such that
+
+.. math:: \text{Score}(x,y) = \sum_i \log \psi_i(x,y)
+
+To make the partition function tractable, the potentials must look only
+at local features.
+
+In the Bi-LSTM CRF, we define two kinds of potentials: emission and
+transition. The emission potential for the word at index :math:`i` comes
+from the hidden state of the Bi-LSTM at timestep :math:`i`. The
+transition scores are stored in a :math:`|T|x|T|` matrix
+:math:`\textbf{P}`, where :math:`T` is the tag set. In my
+implementation, :math:`\textbf{P}_{j,k}` is the score of transitioning
+to tag :math:`j` from tag :math:`k`. So:
+
+.. math:: \text{Score}(x,y) = \sum_i \log \psi_\text{EMIT}(y_i \rightarrow x_i) + \log \psi_\text{TRANS}(y_{i-1} \rightarrow y_i)
+
+.. math:: = \sum_i h_i[y_i] + \textbf{P}_{y_i, y_{i-1}}
+
+where in this second expression, we think of the tags as being assigned
+unique non-negative indices.
+
+If the above discussion was too brief, you can check out
+`this <http://www.cs.columbia.edu/%7Emcollins/crf.pdf>`__ write up from
+Michael Collins on CRFs.
+
+Implementation Notes
+--------------------
+
+The example below implements the forward algorithm in log space to
+compute the partition function, and the viterbi algorithm to decode.
+Backpropagation will compute the gradients automatically for us. We
+don't have to do anything by hand.
+
+The implementation is not optimized. If you understand what is going on,
+you'll probably quickly see that iterating over the next tag in the
+forward algorithm could probably be done in one big operation. I wanted
+to code to be more readable. If you want to make the relevant change,
+you could probably use this tagger for real tasks.
+"""
 # Author: Robert Guthrie

 import torch
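
The docstring moved by this hunk describes the score, the partition function, and the log-space forward recurrence. The following is a small plain-Python sketch of those three pieces, not the tutorial's implementation. It assumes that `trans[j][k]` follows the docstring's convention (score of moving to tag `j` from tag `k`), that there are no START/STOP tags, and that the first position has no transition term. The emission and transition values are made up for illustration, and all function names are hypothetical.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(emissions, trans, tags):
    """Score(x, y) = sum_i h_i[y_i] + P[y_i][y_{i-1}] (no transition at i = 0)."""
    total = emissions[0][tags[0]]
    for i in range(1, len(tags)):
        total += emissions[i][tags[i]] + trans[tags[i]][tags[i - 1]]
    return total


def log_partition(emissions, trans):
    """Forward algorithm in log space: log of sum over all y' of exp(Score(x, y'))."""
    n_tags = len(trans)
    alpha = list(emissions[0])  # forward variables at the first position
    for i in range(1, len(emissions)):
        alpha = [
            emissions[i][j]
            + log_sum_exp([alpha[k] + trans[j][k] for k in range(n_tags)])
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha)


# Illustrative values only: 3 words, 2 tags.
emissions = [[0.5, 1.0], [1.5, -0.5], [0.0, 2.0]]
trans = [[0.2, -1.0], [0.3, 0.8]]

log_z = log_partition(emissions, trans)
# Brute-force check: enumerate every tag sequence explicitly.
brute = log_sum_exp([score(emissions, trans, y) for y in product(range(2), repeat=3)])
log_prob = score(emissions, trans, (1, 0, 1)) - log_z  # log P(y | x)
print(log_z, brute, log_prob)
```

The brute-force sum grows exponentially with sentence length, while the forward recurrence costs one pass over positions and tag pairs. This is why the docstring insists on potentials that look only at local features.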
@@ -358,7 +355,7 @@ def forward(self, sentence): # dont confuse this with _forward_alg above.

 ######################################################################
 # Exercise: A new loss function for discriminative tagging
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# --------------------------------------------------------
 #
 # It wasn't really necessary for us to create a computation graph when
 # doing decoding, since we do not backpropagate from the viterbi path
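
The context lines above note that decoding does not need a computation graph. As a rough illustration of that point, here is a Viterbi decoder sketched over plain Python lists. It is not the tutorial's code. It assumes the same conventions as the docstring (`trans[j][k]` is the score of moving to tag `j` from tag `k`), omits START/STOP tags, and uses made-up scores; the names `path_score` and `viterbi` are hypothetical.

```python
from itertools import product


def path_score(emissions, trans, tags):
    """Sum of emission and transition scores along one tag sequence."""
    total = emissions[0][tags[0]]
    for i in range(1, len(tags)):
        total += emissions[i][tags[i]] + trans[tags[i]][tags[i - 1]]
    return total


def viterbi(emissions, trans):
    """Return (best score, best tag sequence) using max instead of log-sum-exp."""
    n_tags = len(trans)
    delta = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for i in range(1, len(emissions)):
        new_delta, pointers = [], []
        for j in range(n_tags):
            # Best previous tag k for arriving at tag j.
            best_k = max(range(n_tags), key=lambda k: delta[k] + trans[j][k])
            pointers.append(best_k)
            new_delta.append(delta[best_k] + trans[j][best_k] + emissions[i][j])
        delta = new_delta
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    best_last = max(range(n_tags), key=lambda j: delta[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[best_last], path


# Illustrative values only: 3 words, 2 tags.
emissions = [[0.5, 1.0], [1.5, -0.5], [0.0, 2.0]]
trans = [[0.2, -1.0], [0.3, 0.8]]

best_score, best_path = viterbi(emissions, trans)
print(best_path)  # [0, 0, 1]
```

Replacing `max` with a log-sum-exp over the same candidates turns this recurrence into the forward algorithm, which is the modification the tutorial's readiness checklist asks for.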

intermediate_source/reinforcement_q_learning.py

Lines changed: 1 addition & 1 deletion
@@ -465,4 +465,4 @@ def optimize_model():

 env.close()
 plt.ioff()
-plt.show()
+plt.show()
