diff --git a/R/LearnerTorch.R b/R/LearnerTorch.R
index fe6526012..4902b25b6 100644
--- a/R/LearnerTorch.R
+++ b/R/LearnerTorch.R
@@ -44,6 +44,17 @@
 #' * multi-class classification: The `factor` target variable of a [`TaskClassif`][mlr3::TaskClassif] is a label-encoded
 #'   [`torch_long`][torch::torch_long] with shape `(batch_size)` where the label-encoding goes from `1` to `n_classes`.
 #'
+#' @section Important Runtime Considerations:
+#' There are a few hyperparameter settings that can have a considerable impact on the runtime of the learner.
+#' These include:
+#'
+#' * `device`: Use a GPU if possible.
+#' * `num_threads`: Set this to the number of CPU cores available if training on CPU.
+#' * `tensor_dataset`: Set this to `TRUE` (or `"device"` if on a GPU) if the dataset fits into memory.
+#' * `batch_size`: Especially for very small models, choose a larger batch size.
+#'
+#' Also, see the *Early Stopping and Internal Tuning* section for how to terminate training early.
+#'
 #' @template param_id
 #' @template param_task_type
 #' @template param_param_vals
diff --git a/man-roxygen/paramset_torchlearner.R b/man-roxygen/paramset_torchlearner.R
index 8753f3843..46e62e5cb 100644
--- a/man-roxygen/paramset_torchlearner.R
+++ b/man-roxygen/paramset_torchlearner.R
@@ -62,7 +62,6 @@
 #' **Dataloader**:
 #' * `batch_size` :: `integer(1)`\cr
 #'   The batch size (required).
-#'   When working with small models or datasets, choosing a larger batch size can considerably speed up training.
 #' * `shuffle` :: `logical(1)`\cr
 #'   Whether to shuffle the instances in the dataset. This is initialized to `TRUE`,
 #'   which differs from the default (`FALSE`).
diff --git a/man/mlr_learners_torch.Rd b/man/mlr_learners_torch.Rd
index 6f2988498..ae9d042b9 100644
--- a/man/mlr_learners_torch.Rd
+++ b/man/mlr_learners_torch.Rd
@@ -60,6 +60,20 @@ is also ensured to be the first factor level) is \code{1} and the negative class
 }
 }
 
+\section{Important Runtime Considerations}{
+
+There are a few hyperparameter settings that can have a considerable impact on the runtime of the learner.
+These include:
+\itemize{
+\item \code{device}: Use a GPU if possible.
+\item \code{num_threads}: Set this to the number of CPU cores available if training on CPU.
+\item \code{tensor_dataset}: Set this to \code{TRUE} (or \code{"device"} if on a GPU) if the dataset fits into memory.
+\item \code{batch_size}: Especially for very small models, choose a larger batch size.
+}
+
+Also, see the \emph{Early Stopping and Internal Tuning} section for how to terminate training early.
+}
+
 \section{Model}{
 
 The Model is a list of class \code{"learner_torch_model"} with the following elements:
@@ -145,7 +159,6 @@ Is initialized to 0.
 \itemize{
 \item \code{batch_size} :: \code{integer(1)}\cr
 The batch size (required).
-When working with small models or datasets, choosing a larger batch size can considerably speed up training.
 \item \code{shuffle} :: \code{logical(1)}\cr
 Whether to shuffle the instances in the dataset. This is initialized to \code{TRUE},
 which differs from the default (\code{FALSE}).
diff --git a/man/mlr_pipeops_torch_model.Rd b/man/mlr_pipeops_torch_model.Rd
index 3998458bf..8d3c6e47f 100644
--- a/man/mlr_pipeops_torch_model.Rd
+++ b/man/mlr_pipeops_torch_model.Rd
@@ -79,7 +79,6 @@ Is initialized to 0.
 \itemize{
 \item \code{batch_size} :: \code{integer(1)}\cr
 The batch size (required).
-When working with small models or datasets, choosing a larger batch size can considerably speed up training.
 \item \code{shuffle} :: \code{logical(1)}\cr
 Whether to shuffle the instances in the dataset. This is initialized to \code{TRUE},
 which differs from the default (\code{FALSE}).
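As a usage sketch (not part of the patch itself), the runtime-related hyperparameters documented above could be set together when constructing a torch learner. The learner id `"classif.mlp"` and all concrete values below are illustrative assumptions, not taken from the diff:

```r
# Hedged sketch: applying the documented runtime hints to a torch learner.
# The learner id "classif.mlp" and every value here are assumptions.
library(mlr3)
library(mlr3torch)

learner = lrn("classif.mlp",
  epochs         = 10,     # training budget (required)
  batch_size     = 256,    # larger batches can speed up very small models
  device         = "cuda", # use a GPU if one is available
  num_threads    = 8,      # number of CPU cores when training on CPU
  tensor_dataset = TRUE    # keep the dataset in memory ("device" on a GPU)
)
```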