
Commit baaf226

Author: Svetlana Karslioglu
Pyspelling: add config for beginner/.rst files (#2517)
1 parent: b4e6207

14 files changed: 287 additions & 185 deletions

.pyspelling.yml

Lines changed: 65 additions & 0 deletions

@@ -51,3 +51,68 @@ matrix:
       - code
       - pre
   - pyspelling.filters.url:
+- name: reST
+  sources:
+  - beginner_source/*.rst
+  dictionary:
+    wordlists:
+    - en-wordlist.txt
+  pipeline:
+  - pyspelling.filters.text:
+  - pyspelling.filters.context:
+      context_visible_first: true
+      delimiters:
+        # Ignore text between inline back ticks
+        - open: '(div style|iframe).*'
+          close: '\n'
+        - open: '(- )?(?P<open>`+)'
+          close: '(?P=open)'
+        - open: ':figure:.*'
+          close: '\n'
+        # Ignore reStructuredText roles
+        - open: ':(?:(class|file|func|math|ref|octicon)):`'
+          content: '[^`]*'
+          close: '`'
+        - open: ':width:'
+          close: '$'
+        # Exclude raw directive
+        - open: '\.\. (raw|grid-item-card|galleryitem|includenodoc)::.*$\n*'
+          close: '\n'
+        # Ignore reStructuredText literals
+        - open: '::$'
+          close: '(?P<literal>(?:((?P<indent>[ ]+).*$)|(\n))+)'
+        # Ignore reStructuredText hyperlinks
+        - open: '\s'
+          content: '\w*'
+          close: '_'
+        # Ignore hyperlink in the DDP tutorials
+        - open: '`.*'
+          close: '`__'
+        # Ignore reStructuredText header ---
+        - open: '^'
+          content: '--*'
+          close: '$'
+        # Ignore reStructuredText header '''
+        - open: '^'
+          content: '''''*'
+          close: '$'
+        # Ignore reStructuredText block directives
+        - open: '\.\. (code-block|math)::.*$\n*'
+          content: '(?P<first>(^(?P<indent>[ ]+).*$\n))(?P<other>(^([ \t]+.*|[ \t]*)$\n)*)'
+          close: '(^(?![ \t]+.*$))'
+        - open: '\.\. (raw)::.*$\n*'
+          close: '^\s*$'
+        # Ignore reStructuredText substitution definitions
+        - open: '^\.\. \|[^|]+\|'
+          close: '$'
+        # Ignore reStructuredText substitutions
+        - open: '\|'
+          content: '[^|]*'
+          close: '\|_?'
+        # Ignore reStructuredText toctree
+        - open: '\.\.\s+toctree::'
+          close: '(?P<toctree>(?:((?P<indent>[ ]+).*$)|(\n))+)'
+        # Ignore directives
+        - open: '\.\.\s+(image|include|only)::'
+          close: '$'
+  - pyspelling.filters.url:
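The delimiters added above tell pyspelling's context filter to hide reST markup from the spell checker. This sketch (not part of the commit) approximates the inline-literal delimiter with a single combined Python regex — the real filter scans open/content/close separately, so treat this as an illustration only:

```python
import re

# Approximation of the commit's back-tick delimiter as one combined regex:
# an optional "- " prefix, one or more opening back ticks, the literal text,
# then the same number of closing back ticks (via the (?P=open) backreference).
INLINE_LITERAL = re.compile(r"(- )?(?P<open>`+)[^`]*(?P=open)")

def strip_inline_literals(text: str) -> str:
    """Replace inline ``literal`` spans with a space, mimicking how the
    context filter hides code identifiers from the spell checker."""
    return INLINE_LITERAL.sub(" ", text)

sample = "Use ``nn.module`` or the `TransformerEncoder` class."
print(strip_inline_literals(sample))
```

With the literals hidden, identifiers like ``nn.module`` never reach the dictionary lookup, which is why they don't need entries in ``en-wordlist.txt``.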

beginner_source/bettertransformer_tutorial.rst

Lines changed: 7 additions & 6 deletions

@@ -8,11 +8,11 @@ In this tutorial, we show how to use Better Transformer for production
 inference with torchtext. Better Transformer is a production ready fastpath to
 accelerate deployment of Transformer models with high performance on CPU and GPU.
 The fastpath feature works transparently for models based either directly on
-PyTorch core nn.module or with torchtext.
+PyTorch core ``nn.module`` or with torchtext.
 
 Models which can be accelerated by Better Transformer fastpath execution are those
-using the following PyTorch core `torch.nn.module` classes `TransformerEncoder`,
-`TransformerEncoderLayer`, and `MultiHeadAttention`. In addition, torchtext has
+using the following PyTorch core ``torch.nn.module`` classes ``TransformerEncoder``,
+``TransformerEncoderLayer``, and ``MultiHeadAttention``. In addition, torchtext has
 been updated to use the core library modules to benefit from fastpath acceleration.
 (Additional modules may be enabled with fastpath execution in the future.)

@@ -32,7 +32,8 @@ To follow this example in Google Colab, `click here
 
 Better Transformer Features in This Tutorial
 --------------------------------------------
-* Load pre-trained models (pre-1.12 created without Better Transformer)
+
+* Load pretrained models (created before PyTorch version 1.12 without Better Transformer)
 * Run and benchmark inference on CPU with and without BT fastpath (native MHA only)
 * Run and benchmark inference on (configurable) DEVICE with and without BT fastpath (native MHA only)
 * Enable sparsity support

@@ -48,9 +49,9 @@ Additional information about Better Transformer may be found in the PyTorch.Org
 
 1. Setup
 
-1.1 Load pre-trained models
+1.1 Load pretrained models
 
-We download the XLM-R model from the pre-defined torchtext models by following the instructions in
+We download the XLM-R model from the predefined torchtext models by following the instructions in
 `torchtext.models <https://pytorch.org/text/main/models.html>`__. We also set the DEVICE to execute
 on-accelerator tests. (Enable GPU execution for your environment as appropriate.)
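The tutorial's "run and benchmark inference" steps reduce to timing repeated calls to a model. A library-agnostic timing sketch (not from the commit; the model here is a hypothetical stand-in, not torchtext code):

```python
import time
from statistics import median

def benchmark(fn, *args, warmup=2, iters=10):
    """Return the median seconds per call to fn.
    Warmup runs are discarded so one-time setup cost is not measured."""
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return median(times)

# Hypothetical stand-in for a model's forward pass:
def fake_model(n):
    return sum(i * i for i in range(n))

print(f"median latency: {benchmark(fake_model, 10_000):.6f}s")
```

Reporting the median rather than the mean keeps one slow outlier run (GC pause, first-touch allocation) from skewing the comparison between the fastpath and non-fastpath configurations.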

beginner_source/colab.rst

Lines changed: 6 additions & 6 deletions

@@ -51,18 +51,18 @@ file can't be found.
 To fix this, we'll copy the required file into our Google Drive account.
 
 1. Log into Google Drive.
-2. In Google Drive, make a folder named **data**, with a subfolder named
-   **cornell**.
+2. In Google Drive, make a folder named ``data``, with a subfolder named
+   ``cornell``.
 3. Visit the Cornell Movie Dialogs Corpus and download the movie-corpus ZIP file.
 4. Unzip the file on your local machine.
-5. Copy the file **utterances.jsonl** to the **data/cornell** folder that you
+5. Copy the file ``utterances.jsonl`` to the ``data/cornell`` folder that you
    created in Google Drive.
 
 Now we'll need to edit the file in\_ \_Colab to point to the file on
 Google Drive.
 
 In Colab, add the following to top of the code section over the line
-that begins *corpus\_name*:
+that begins ``corpus\_name``:
 
 ::
 

@@ -71,8 +71,8 @@ that begins *corpus\_name*:
 
 Change the two lines that follow:
 
-1. Change the **corpus\_name** value to **"cornell"**.
-2. Change the line that begins with **corpus** to this:
+1. Change the ``corpus\_name`` value to ``"cornell"``.
+2. Change the line that begins with ``corpus`` to this:
 
 ::
 
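The literal replacement lines sit inside ``::`` blocks that this page does not render, so they are elided here. The snippet below is a plausible reconstruction of the two edited Colab lines; the Drive mount path is an assumption on my part, not taken from this commit:

```python
import os

# Hypothetical reconstruction of the edited Colab cell.
# The mount path below is an assumption, not taken from this commit.
DRIVE_DATA_DIR = "/content/drive/My Drive/data"

corpus_name = "cornell"                             # step 1: new corpus name
corpus = os.path.join(DRIVE_DATA_DIR, corpus_name)  # step 2: point at Drive

print(corpus)
```

In a real Colab session you would first mount Drive (``drive.mount`` from ``google.colab``) so that path exists before the corpus is read.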

beginner_source/ddp_series_fault_tolerance.rst

Lines changed: 66 additions & 64 deletions

@@ -1,7 +1,9 @@
-`Introduction <ddp_series_intro.html>`__ \|\| `What is DDP <ddp_series_theory.html>`__ \|\| `Single-Node
-Multi-GPU Training <ddp_series_multigpu.html>`__ \|\| **Fault
-Tolerance** \|\| `Multi-Node
-training <../intermediate/ddp_series_multinode.html>`__ \|\| `minGPT Training <../intermediate/ddp_series_minGPT.html>`__
+`Introduction <ddp_series_intro.html>`__ \|\|
+`What is DDP <ddp_series_theory.html>`__ \|\|
+`Single-Node Multi-GPU Training <ddp_series_multigpu.html>`__ \|\|
+**Fault Tolerance** \|\|
+`Multi-Node training <../intermediate/ddp_series_multinode.html>`__ \|\|
+`minGPT Training <../intermediate/ddp_series_minGPT.html>`__
 
 
 Fault-tolerant Distributed Training with ``torchrun``

@@ -61,8 +63,8 @@ Why use ``torchrun``
 don't need to. For instance,
 
 - You don't need to set environment variables or explicitly pass the ``rank`` and ``world_size``; ``torchrun`` assigns this along with several other `environment variables <https://pytorch.org/docs/stable/elastic/run.html#environment-variables>`__.
-- No need to call ``mp.spawn`` in your script; you only need a generic ``main()`` entrypoint, and launch the script with ``torchrun``. This way the same script can be run in non-distributed as well as single-node and multinode setups.
-- Gracefully restarting training from the last saved training snapshot
+- No need to call ``mp.spawn`` in your script; you only need a generic ``main()`` entry point, and launch the script with ``torchrun``. This way the same script can be run in non-distributed as well as single-node and multinode setups.
+- Gracefully restarting training from the last saved training snapshot.
 
 
 Graceful restarts

@@ -84,7 +86,7 @@ For graceful restarts, you should structure your train script like:
       save_snapshot(snapshot_path)
 
 If a failure occurs, ``torchrun`` will terminate all the processes and restart them.
-Each process entrypoint first loads and initializes the last saved snapshot, and continues training from there.
+Each process entry point first loads and initializes the last saved snapshot, and continues training from there.
 So at any failure, you only lose the training progress from the last saved snapshot.
 
 In elastic training, whenever there are any membership changes (adding or removing nodes), ``torchrun`` will terminate and spawn processes

@@ -101,52 +103,51 @@ Process group initialization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 - ``torchrun`` assigns ``RANK`` and ``WORLD_SIZE`` automatically,
-  amongst `other env
-  variables <https://pytorch.org/docs/stable/elastic/run.html#environment-variables>`__
-
-.. code:: diff
-
-   - def ddp_setup(rank, world_size):
-   + def ddp_setup():
-   - """
-   - Args:
-   -     rank: Unique identifier of each process
-   -     world_size: Total number of processes
-   - """
-   - os.environ["MASTER_ADDR"] = "localhost"
-   - os.environ["MASTER_PORT"] = "12355"
-   - init_process_group(backend="nccl", rank=rank, world_size=world_size)
-   + init_process_group(backend="nccl")
+  among `other envvariables <https://pytorch.org/docs/stable/elastic/run.html#environment-variables>`__
+
+.. code-block:: diff
+
+   - def ddp_setup(rank, world_size):
+   + def ddp_setup():
+   - """
+   - Args:
+   -     rank: Unique identifier of each process
+   -     world_size: Total number of processes
+   - """
+   - os.environ["MASTER_ADDR"] = "localhost"
+   - os.environ["MASTER_PORT"] = "12355"
+   - init_process_group(backend="nccl", rank=rank, world_size=world_size)
+   + init_process_group(backend="nccl")
     torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
 
-Use Torchrun-provided env variables
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Use torchrun-provided environment variables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. code:: diff
+.. code-block:: diff
 
-   - self.gpu_id = gpu_id
-   + self.gpu_id = int(os.environ["LOCAL_RANK"])
+   - self.gpu_id = gpu_id
+   + self.gpu_id = int(os.environ["LOCAL_RANK"])
 
 Saving and loading snapshots
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Regularly storing all the relevant information in snapshots allows our
 training job to seamlessly resume after an interruption.
 
-.. code:: diff
+.. code-block:: diff
 
-   + def _save_snapshot(self, epoch):
-   +     snapshot = {}
-   +     snapshot["MODEL_STATE"] = self.model.module.state_dict()
-   +     snapshot["EPOCHS_RUN"] = epoch
-   +     torch.save(snapshot, "snapshot.pt")
-   +     print(f"Epoch {epoch} | Training snapshot saved at snapshot.pt")
+   + def _save_snapshot(self, epoch):
+   +     snapshot = {}
+   +     snapshot["MODEL_STATE"] = self.model.module.state_dict()
+   +     snapshot["EPOCHS_RUN"] = epoch
+   +     torch.save(snapshot, "snapshot.pt")
+   +     print(f"Epoch {epoch} | Training snapshot saved at snapshot.pt")
 
-   + def _load_snapshot(self, snapshot_path):
-   +     snapshot = torch.load(snapshot_path)
-   +     self.model.load_state_dict(snapshot["MODEL_STATE"])
-   +     self.epochs_run = snapshot["EPOCHS_RUN"]
-   +     print(f"Resuming training from snapshot at Epoch {self.epochs_run}")
+   + def _load_snapshot(self, snapshot_path):
+   +     snapshot = torch.load(snapshot_path)
+   +     self.model.load_state_dict(snapshot["MODEL_STATE"])
+   +     self.epochs_run = snapshot["EPOCHS_RUN"]
+   +     print(f"Resuming training from snapshot at Epoch {self.epochs_run}")
 
 

@@ -155,14 +156,14 @@ Loading a snapshot in the Trainer constructor
 When restarting an interrupted training job, your script will first try
 to load a snapshot to resume training from.
 
-.. code:: diff
+.. code-block:: diff
 
-   class Trainer:
-       def __init__(self, snapshot_path, ...):
-           ...
-   +       if os.path.exists(snapshot_path):
-   +           self._load_snapshot(snapshot_path)
-           ...
+   class Trainer:
+       def __init__(self, snapshot_path, ...):
+           ...
+   +       if os.path.exists(snapshot_path):
+   +           self._load_snapshot(snapshot_path)
+           ...
 
 

@@ -171,34 +172,35 @@ Resuming training
 Training can resume from the last epoch run, instead of starting all
 over from scratch.
 
-.. code:: diff
+.. code-block:: diff
 
-   def train(self, max_epochs: int):
-   -   for epoch in range(max_epochs):
-   +   for epoch in range(self.epochs_run, max_epochs):
-           self._run_epoch(epoch)
+   def train(self, max_epochs: int):
+   -   for epoch in range(max_epochs):
+   +   for epoch in range(self.epochs_run, max_epochs):
+           self._run_epoch(epoch)
 
 
 Running the script
 ~~~~~~~~~~~~~~~~~~
-Simply call your entrypoint function as you would for a non-multiprocessing script; ``torchrun`` automatically
+
+Simply call your entry point function as you would for a non-multiprocessing script; ``torchrun`` automatically
 spawns the processes.
 
-.. code:: diff
+.. code-block:: diff
 
-   if __name__ == "__main__":
-       import sys
-       total_epochs = int(sys.argv[1])
-       save_every = int(sys.argv[2])
-   -   world_size = torch.cuda.device_count()
-   -   mp.spawn(main, args=(world_size, total_epochs, save_every,), nprocs=world_size)
-   +   main(save_every, total_epochs)
+   if __name__ == "__main__":
+       import sys
+       total_epochs = int(sys.argv[1])
+       save_every = int(sys.argv[2])
+   -   world_size = torch.cuda.device_count()
+   -   mp.spawn(main, args=(world_size, total_epochs, save_every,), nprocs=world_size)
+   +   main(save_every, total_epochs)
 
-.. code:: diff
+.. code-block:: diff
 
-   - python multigpu.py 50 10
-   + torchrun --standalone --nproc_per_node=4 multigpu_torchrun.py 50 10
+   - python multigpu.py 50 10
+   + torchrun --standalone --nproc_per_node=4 multigpu_torchrun.py 50 10
 
 Further Reading
 ---------------
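The save/load/resume pattern the tutorial's diffs describe can be sketched without any PyTorch dependency. Below, JSON stands in for ``torch.save``/``torch.load`` and a list of completed epochs stands in for the model state; the class and method names mirror the tutorial but this is an illustration, not the tutorial's code:

```python
import json
import os

class Trainer:
    """Torch-free sketch of the snapshot pattern: save progress after each
    epoch, and on construction resume from the last snapshot if one exists."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.epochs_run = 0
        self.completed = []
        if os.path.exists(snapshot_path):
            self._load_snapshot(snapshot_path)

    def _save_snapshot(self, epoch):
        # Record that epochs 0..epoch are done, so a restart skips them.
        with open(self.snapshot_path, "w") as f:
            json.dump({"EPOCHS_RUN": epoch + 1}, f)

    def _load_snapshot(self, snapshot_path):
        with open(snapshot_path) as f:
            self.epochs_run = json.load(f)["EPOCHS_RUN"]

    def train(self, max_epochs):
        # Resume from self.epochs_run instead of epoch 0.
        for epoch in range(self.epochs_run, max_epochs):
            self.completed.append(epoch)  # stand-in for _run_epoch
            self._save_snapshot(epoch)

# Simulate a crash after 3 epochs, then a restart that resumes at epoch 3.
path = "snapshot.json"
if os.path.exists(path):
    os.remove(path)
first = Trainer(path)
first.train(3)
second = Trainer(path)   # the "restarted" process loads the snapshot
second.train(5)
print(second.completed)  # → [3, 4]
```

The restarted ``Trainer`` runs only epochs 3 and 4, which is exactly the guarantee the tutorial states: at any failure, you lose only the progress since the last saved snapshot.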

beginner_source/ddp_series_intro.rst

Lines changed: 8 additions & 7 deletions

@@ -1,7 +1,8 @@
-**Introduction** \|\| `What is DDP <ddp_series_theory.html>`__ \|\| `Single-Node
-Multi-GPU Training <ddp_series_multigpu.html>`__ \|\| `Fault
-Tolerance <ddp_series_fault_tolerance.html>`__ \|\| `Multi-Node
-training <../intermediate/ddp_series_multinode.html>`__ \|\| `minGPT Training <../intermediate/ddp_series_minGPT.html>`__
+**Introduction** \|\| `What is DDP <ddp_series_theory.html>`__ \|\|
+`Single-Node Multi-GPU Training <ddp_series_multigpu.html>`__ \|\|
+`Fault Tolerance <ddp_series_fault_tolerance.html>`__ \|\|
+`Multi-Node training <../intermediate/ddp_series_multinode.html>`__ \|\|
+`minGPT Training <../intermediate/ddp_series_minGPT.html>`__
 
 Distributed Data Parallel in PyTorch - Video Tutorials
 ======================================================

@@ -34,9 +35,9 @@ You will need multiple CUDA GPUs to run the tutorial code. Typically,
 this can be done on a cloud instance with multiple GPUs (the tutorials
 use an Amazon EC2 P3 instance with 4 GPUs).
 
-The tutorial code is hosted at this `github
-repo <https://github.com/pytorch/examples/tree/main/distributed/ddp-tutorial-series>`__. Clone the repo and
-follow along!
+The tutorial code is hosted in this
+`github repo <https://github.com/pytorch/examples/tree/main/distributed/ddp-tutorial-series>`__.
+Clone the repository and follow along!
 
 Tutorial sections
 -----------------

0 commit comments