# GPU compute and the observed speedup may be less significant.
#
# You may also see different speedup results depending on the chosen ``mode``
# argument. The ``"reduce-overhead"`` mode uses CUDA graphs to further reduce
# the overhead of Python. For your own models,
# you may need to experiment with different modes to maximize speedup. You can
# read more about modes `here <https://pytorch.org/get-started/pytorch-2.0/#user-experience>`__.
#
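# As a minimal, hypothetical sketch of trying different modes (``f`` below
# is a toy stand-in function, not the model from this tutorial):

```python
import torch

# A toy function standing in for a real model's forward pass.
def f(x):
    return torch.sin(x) + torch.cos(x)

# "default" balances compile time and runtime speed; "reduce-overhead"
# targets per-call overhead (using CUDA graphs on GPU); "max-autotune"
# spends more compile time searching for faster kernels.
f_default = torch.compile(f)
f_overhead = torch.compile(f, mode="reduce-overhead")
f_autotune = torch.compile(f, mode="max-autotune")
```

# Note that ``torch.compile`` is lazy: compilation for each mode only
# happens on the first call with a given input shape.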
# You might also notice that the second time we run our model with ``torch.compile`` is significantly
# slower than the other runs, although it is much faster than the first run. This is because the ``"reduce-overhead"``
# mode runs a few warm-up iterations for CUDA graphs.
#
# For general PyTorch benchmarking, you can try using ``torch.utils.benchmark`` instead of the ``timed``
# function we defined above. We wrote our own timing function in this tutorial to show
# ``torch.compile``'s compilation latency.
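# As a rough sketch of that alternative (``f`` and ``x`` below are
# hypothetical stand-ins), ``torch.utils.benchmark.Timer`` handles warm-up,
# repetition, and statistics for us:

```python
import torch
import torch.utils.benchmark as benchmark

# A toy workload to time; replace with your own model call.
def f(x):
    return torch.sin(x) + torch.cos(x)

x = torch.randn(1000)

# Timer runs the statement repeatedly and reports timing statistics;
# on CUDA it also inserts the necessary synchronization for you.
t = benchmark.Timer(stmt="f(x)", globals={"f": f, "x": x})
m = t.timeit(100)  # a Measurement with per-call times in seconds
print(m)
```

# Unlike a hand-rolled loop, ``Timer`` also makes results comparable
# across runs via ``Measurement`` attributes such as ``m.median``.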