diff --git a/doc/modules/pipeline.rst b/doc/modules/pipeline.rst
index 4356b3fe8d640..232b3ed72bbda 100644
--- a/doc/modules/pipeline.rst
+++ b/doc/modules/pipeline.rst
@@ -16,11 +16,16 @@ into one. This is useful as there is often a fixed sequence
 of steps in processing the data, for example feature selection, normalization
 and classification. :class:`Pipeline` serves two purposes here:
 
-    **Convenience**: You only have to call ``fit`` and ``predict`` once on your
+Convenience and encapsulation
+    You only have to call ``fit`` and ``predict`` once on your
     data to fit a whole sequence of estimators.
-
-    **Joint parameter selection**: You can :ref:`grid search `
+Joint parameter selection
+    You can :ref:`grid search `
     over parameters of all estimators in the pipeline at once.
+Safety
+    Pipelines help avoid leaking statistics from your test data into the
+    trained model in cross-validation, by ensuring that the same samples are
+    used to train the transformers and predictors.
 
 All estimators in a pipeline, except the last one, must be transformers
 (i.e. must have a ``transform`` method).
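
For reviewers, the three points in the new definition list can be illustrated with a minimal sketch. It is not part of the patch. The synthetic dataset, the step names (``scale``, ``select``, ``clf``), and the parameter grid are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed here purely for illustration.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Convenience and encapsulation: one estimator wraps the whole sequence.
# The step names are hypothetical labels, not fixed by the library.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Safety: within each fold, the scaler and the selector are fitted only on
# that fold's training samples, so no test-fold statistics reach the model.
scores = cross_val_score(pipe, X, y, cv=5)

# Joint parameter selection: parameters of any step are addressed as
# "<step name>__<parameter name>" and searched together.
param_grid = {"select__k": [3, 5, 10], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
```

Fitting ``StandardScaler`` or ``SelectKBest`` on the full dataset before calling ``cross_val_score`` would instead let held-out samples influence the transformers, which is the leakage the "Safety" entry describes.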