``beginner_source/ddp_series_fault_tolerance.rst``
`Introduction <ddp_series_intro.html>`__ \|\|
`What is DDP <ddp_series_theory.html>`__ \|\|
`Single-Node Multi-GPU Training <ddp_series_multigpu.html>`__ \|\|
**Fault Tolerance** \|\|
`Multi-Node training <../intermediate/ddp_series_multinode.html>`__ \|\|
`minGPT Training <../intermediate/ddp_series_minGPT.html>`__

Fault-tolerant Distributed Training with ``torchrun``
======================================================

Why use ``torchrun``
~~~~~~~~~~~~~~~~~~~~

don't need to. For instance,

- You don't need to set environment variables or explicitly pass the ``rank`` and ``world_size``; ``torchrun`` assigns this along with several other `environment variables <https://pytorch.org/docs/stable/elastic/run.html#environment-variables>`__.
- No need to call ``mp.spawn`` in your script; you only need a generic ``main()`` entry point, and launch the script with ``torchrun``. This way the same script can be run in non-distributed as well as single-node and multinode setups.
- Gracefully restarting training from the last saved training snapshot.

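As a rough illustration of the first point, here is a minimal sketch (not part of the tutorial's own code) of how a script launched by ``torchrun`` can pick up its rank from the environment, falling back to single-process defaults so the same entry point also runs without ``torchrun``:

```python
import os

def get_dist_info():
    # torchrun exports RANK, LOCAL_RANK, and WORLD_SIZE (among other
    # variables) to every worker it launches; fall back to single-process
    # defaults so the same script also works when run directly with python.
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size

rank, local_rank, world_size = get_dist_info()
print(f"rank {rank} of {world_size} (local rank {local_rank})")
```

Launching this with ``torchrun --nproc_per_node=4 script.py`` would print one line per worker, each with a distinct rank, without any ``mp.spawn`` call in the script itself.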
Graceful restarts
~~~~~~~~~~~~~~~~~

For graceful restarts, you should structure your train script like:

       save_snapshot(snapshot_path)

If a failure occurs, ``torchrun`` will terminate all the processes and restart them.
Each process entry point first loads and initializes the last saved snapshot, and continues training from there.
So at any failure, you only lose the training progress from the last saved snapshot.
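A framework-agnostic sketch of this save/load-snapshot structure is below; the helper names are illustrative (in the tutorial's real code, ``torch.save``/``torch.load`` would persist model and optimizer state, for which ``pickle`` stands in here):

```python
import os
import pickle
import tempfile

# Hypothetical snapshot location; any shared, durable path works.
snapshot_path = os.path.join(tempfile.gettempdir(), "train_snapshot.pkl")

def save_snapshot(path, state):
    # Persist enough state to resume: epoch counter, model/optimizer state.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_snapshot(path):
    # Resume from the last snapshot if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0}

state = load_snapshot(snapshot_path)
for epoch in range(state["epoch"], 3):
    # ... run one epoch of training here ...
    state["epoch"] = epoch + 1
    save_snapshot(snapshot_path, state)  # checkpoint progress each epoch
```

Because every restarted process begins by calling ``load_snapshot``, a crash at any point costs at most the work done since the last checkpoint.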

In elastic training, whenever there are any membership changes (adding or removing nodes), ``torchrun`` will terminate and spawn processes
Process group initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``torchrun`` assigns ``RANK`` and ``WORLD_SIZE`` automatically,
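In practice this means process-group setup can be reduced to something like the sketch below (assuming PyTorch is installed; the ``TORCHELASTIC_RUN_ID`` guard is one way to detect a ``torchrun`` launch):

```python
import os

def ddp_setup():
    # Under torchrun, init_process_group defaults to the "env://" method
    # and reads RANK and WORLD_SIZE from the environment, so neither has
    # to be passed explicitly.
    import torch.distributed as dist
    dist.init_process_group(backend="nccl")

# torchrun sets TORCHELASTIC_RUN_ID for the workers it launches; when run
# directly with python, this guard skips distributed initialization.
if os.environ.get("TORCHELASTIC_RUN_ID"):
    ddp_setup()
```

Compare this with the manual setup from the earlier parts of the series, where ``rank`` and ``world_size`` had to be computed and passed to ``init_process_group`` by hand.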