

Commit 2294ad7

hunkim and chsasank authored and committed
Added Data Parallelism tutorial in blitz (#174)
* Added Data Parallelism tutorial in blitz
* Added a note in the cifar10 tutorial
* Kill line and all warnings
1 parent 761e369 commit 2294ad7

5 files changed

Lines changed: 265 additions & 2 deletions

File tree

_static/img/data_parallel.png (15.5 KB, new binary file)
beginner_source/blitz/cifar10_tutorial.py
beginner_source/blitz/data_parallel_tutorial.py
beginner_source/deep_learning_60min_blitz.rst
intermediate_source/dist_tuto.rst

beginner_source/blitz/cifar10_tutorial.py

Lines changed: 5 additions & 0 deletions
@@ -297,6 +297,11 @@ def forward(self, x):
 # - Understanding PyTorch's Tensor library and neural networks at a high level.
 # - Train a small neural network to classify images
 #
+# Training on multiple GPUs
+# -------------------------
+# If you want to see even more MASSIVE speedup using all of your GPUs,
+# please check out :doc:`data_parallel_tutorial`.
+#
 # Where do I go next?
 # -------------------
 #
beginner_source/blitz/data_parallel_tutorial.py

Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@

"""
Optional: Data Parallelism
==========================
**Authors**: `Sung Kim <https://github.com/hunkim>`_ and `Jenny Kang <https://github.com/jennykang>`_

In this tutorial, we will learn how to use multiple GPUs using ``DataParallel``.

It's very easy to use GPUs with PyTorch. You can put the model on a GPU:

.. code:: python

    model.cuda()

Then, you can copy all your tensors to the GPU:

.. code:: python

    mytensor = my_tensor.cuda()

Please note that just calling ``my_tensor.cuda()`` won't move the tensor in
place; it returns a new copy on the GPU. You need to assign the result to a
new tensor and use that tensor on the GPU.

It's natural to execute your forward and backward propagations on multiple
GPUs. However, PyTorch will only use one GPU by default. You can easily run
your operations on multiple GPUs by making your model run in parallel with
``DataParallel``:

.. code:: python

    model = nn.DataParallel(model)

That's the core idea behind this tutorial. We will explore it in more detail below.
"""


######################################################################
# Imports and parameters
# ----------------------
#
# Import PyTorch modules and define the parameters.
#

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100


######################################################################
# Dummy DataSet
# -------------
#
# Make a dummy (random) dataset. You just need to implement the
# ``__getitem__`` method.
#

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)


######################################################################
# Simple Model
# ------------
#
# For the demo, our model just gets an input, performs a linear operation, and
# gives an output. However, you can use ``DataParallel`` on any model (CNN, RNN,
# Capsule Net etc.)
#
# We've placed a print statement inside the model to monitor the size of the
# input and output tensors.
# Please pay attention to what is printed at batch rank 0.
#

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("    In Model: input size", input.size(),
              "output size", output.size())

        return output


######################################################################
# Create Model and DataParallel
# -----------------------------
#
# This is the core part of the tutorial. First, we need to make a model
# instance and check whether we have multiple GPUs. If we have multiple GPUs,
# we can wrap our model using ``nn.DataParallel``. Then we can put our model
# on the GPUs with ``model.cuda()``.
#

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

if torch.cuda.is_available():
    model.cuda()


######################################################################
# Run the Model
# -------------
#
# Now we can see the sizes of the input and output tensors.
#

for data in rand_loader:
    if torch.cuda.is_available():
        input_var = Variable(data.cuda())
    else:
        input_var = Variable(data)

    output = model(input_var)
    print("Outside: input size", input_var.size(),
          "output_size", output.size())


######################################################################
# Results
# -------
#
# When we batch 30 inputs and 30 outputs, the model gets 30 and outputs 30 as
# expected. But if you have multiple GPUs, you will see results like these.
#
# 2 GPUs
# ~~~~~~
#
# If you have 2 GPUs, you will see:
#
# .. code:: bash
#
#     # on 2 GPUs
#     Let's use 2 GPUs!
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#         In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
#         In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
#     Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
#
# 3 GPUs
# ~~~~~~
#
# If you have 3 GPUs, you will see:
#
# .. code:: bash
#
#     Let's use 3 GPUs!
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#         In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#     Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
#
# 8 GPUs
# ~~~~~~
#
# If you have 8 GPUs, you will see:
#
# .. code:: bash
#
#     Let's use 8 GPUs!
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#     Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#         In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
#     Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
#


######################################################################
# Summary
# -------
#
# DataParallel splits your data automatically and sends job orders to multiple
# models on several GPUs. After each model finishes its job, DataParallel
# collects and merges the results before returning them to you.
#
# For more information, please check out
# http://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html.
#
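For reference, the pattern the new tutorial demonstrates can also be written with the newer device-handling API (``torch.device`` and ``.to()`` instead of the ``Variable`` wrapper). This is a minimal sketch, not the tutorial file itself; it mirrors the tutorial's sizes and falls back to CPU when no GPU is present:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

input_size, output_size, batch_size = 5, 2, 30

# Random data standing in for a real dataset (100 samples of 5 features).
loader = DataLoader(TensorDataset(torch.randn(100, input_size)),
                    batch_size=batch_size, shuffle=True)

model = nn.Linear(input_size, output_size)

# Wrap the model so each input batch is split along dim 0 across all
# visible GPUs; with zero or one GPU this branch is simply skipped.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for (batch,) in loader:
    out = model(batch.to(device))
    print("Outside: input size", batch.size(), "output size", out.size())
```

As in the tutorial, the loop sees full 30-row batches on the outside while each replica (if any) receives a slice of them.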

beginner_source/deep_learning_60min_blitz.rst

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,7 @@ Goal of this tutorial:
     /beginner/blitz/autograd_tutorial
     /beginner/blitz/neural_networks_tutorial
     /beginner/blitz/cifar10_tutorial
+    /beginner/blitz/data_parallel_tutorial

 .. galleryitem:: /beginner/blitz/tensor_tutorial.py
     :figure: /_static/img/tensor_illustration_flat.png
@@ -37,6 +38,9 @@ Goal of this tutorial:
 .. galleryitem:: /beginner/blitz/cifar10_tutorial.py
     :figure: /_static/img/cifar10.png

+.. galleryitem:: /beginner/blitz/data_parallel_tutorial.py
+    :figure: /_static/img/data_parallel.png
+
 .. raw:: html

     <div style='clear:both'></div>

intermediate_source/dist_tuto.rst

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 Writing Distributed Applications with PyTorch
-===============================
+=============================================
 **Author**: `Séb Arnold <http://seba1511.com>`_

 In this short tutorial, we will be going over the distributed package of PyTorch. We'll see how to set up the distributed setting, use the different communication strategies, and go over some of the internals of the package.
@@ -373,7 +373,7 @@ world.
         dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
         param.grad.data /= size

-*Et voilà *! We successfully implemented distributed synchronous SGD and
+*Et voilà*! We successfully implemented distributed synchronous SGD and
 could train any model on a large computer cluster.

**Note:** While the last sentence is *technically* true, there are `a
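The gradient-averaging step visible in this diff's context lines can be packaged as a small helper. The sketch below uses the current ``dist.ReduceOp`` spelling rather than the older ``dist.reduce_op``, and spins up a single-process ``gloo`` group purely for illustration (the address and port values are placeholders); real training launches one process per rank:

```python
import os
import torch
import torch.distributed as dist

def average_gradients(model):
    # All-reduce each parameter's gradient across ranks, then divide by
    # the world size so every rank holds the average gradient.
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# Single-process group for demonstration only.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
model(torch.randn(8, 4)).sum().backward()
grad_before = model.weight.grad.clone()

average_gradients(model)  # a no-op when world_size == 1

dist.destroy_process_group()
```

With more than one rank, each process would call ``average_gradients(model)`` between ``backward()`` and the optimizer step, which is exactly where the tutorial places it.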
