DOC a note on data leakage and pipeline (#9510)

jnothman · ogrisel · commit 7761a8e4fdfe · 2017-08-11T16:36:41.000+02:00
diff --git a/doc/modules/pipeline.rst b/doc/modules/pipeline.rst
@@ -16,11 +16,16 @@ into one. This is useful as there is often a fixed sequence
 of steps in processing the data, for example feature selection, normalization
 and classification. :class:`Pipeline` serves two purposes here:
 
-    **Convenience**: You only have to call ``fit`` and ``predict`` once on your
+Convenience and encapsulation
+    You only have to call ``fit`` and ``predict`` once on your
     data to fit a whole sequence of estimators.
-
-    **Joint parameter selection**: You can :ref:`grid search <grid_search>`
+Joint parameter selection
+    You can :ref:`grid search <grid_search>`
     over parameters of all estimators in the pipeline at once.
+Safety
+    Pipelines help avoid leaking statistics from your test data into the
+    trained model in cross-validation, by ensuring that the same samples are
+    used to train the transformers and predictors.
 
 All estimators in a pipeline, except the last one, must be transformers
 (i.e. must have a ``transform`` method).
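
As an illustration of the "Safety" entry added above, here is a minimal sketch (not part of the commit) of the kind of leakage a pipeline helps avoid. The synthetic noise data, the choice of ``SelectKBest`` with ``LogisticRegression``, and the names ``X_leaky``, ``leaky_scores`` and ``safe_scores`` are all assumptions made for this example. Because the labels are random, any cross-validated accuracy well above chance would point to information leaking from the held-out folds.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for this sketch): pure noise features and
# random binary labels, so there is no real signal to learn.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 2000))
y = rng.randint(0, 2, size=60)

# Leaky variant: the selector is fit on ALL samples before cross-validation,
# so statistics from every held-out fold influence which features are kept.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_scores = cross_val_score(LogisticRegression(), X_leaky, y, cv=5)

# Pipeline variant: cross_val_score clones and refits the whole pipeline on
# each training split, so the selector and the classifier are both fit on
# the same training samples and never see the held-out fold.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", LogisticRegression()),
])
safe_scores = cross_val_score(pipe, X, y, cv=5)

print("selection outside CV:", leaky_scores.mean())
print("selection inside pipeline:", safe_scores.mean())
```

On data like this, the first score tends to be optimistically biased, while the pipeline score should stay near chance level. The exact numbers depend on the random seed and the scikit-learn version.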