Commit 7761a8e

jnothman authored and ogrisel committed
DOC a note on data leakage and pipeline (#9510)
1 parent f2d66b8 commit 7761a8e


1 file changed: +8, -3 lines changed

doc/modules/pipeline.rst

Lines changed: 8 additions & 3 deletions
@@ -16,11 +16,16 @@ into one. This is useful as there is often a fixed sequence
 of steps in processing the data, for example feature selection, normalization
 and classification. :class:`Pipeline` serves two purposes here:
 
-    **Convenience**: You only have to call ``fit`` and ``predict`` once on your
+Convenience and encapsulation
+    You only have to call ``fit`` and ``predict`` once on your
     data to fit a whole sequence of estimators.
-
-    **Joint parameter selection**: You can :ref:`grid search <grid_search>`
+Joint parameter selection
+    You can :ref:`grid search <grid_search>`
     over parameters of all estimators in the pipeline at once.
+Safety
+    Pipelines help avoid leaking statistics from your test data into the
+    trained model in cross-validation, by ensuring that the same samples are
+    used to train the transformers and predictors.
 
 All estimators in a pipeline, except the last one, must be transformers
 (i.e. must have a ``transform`` method).
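To illustrate the Safety point added by this commit, here is a minimal sketch (not part of the commit itself), assuming scikit-learn and NumPy are installed. It compares feature selection fitted on the whole dataset before cross-validation (leaky) with the same selector wrapped in a `Pipeline`, where `cross_val_score` refits it on each training fold only. The data is pure random noise, so an honest accuracy estimate should sit near chance. The step names `"select"` and `"clf"`, the seed, and all sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Pure noise: 100 samples, 10,000 features, labels unrelated to X.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10000))
y = rng.randint(0, 2, size=100)

# Leaky: features are chosen using ALL samples, including those that later
# serve as test folds, so test-fold statistics leak into the trained model.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# Safe: the selector is a pipeline step, so it is fit on the training folds
# only and merely applied (transform) to each held-out fold.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", LogisticRegression()),
])
safe_acc = cross_val_score(pipe, X, y, cv=5).mean()

print(f"selection outside pipeline: {leaky_acc:.2f}")
print(f"selection inside pipeline:  {safe_acc:.2f}")

# Convenience: one fit/predict call drives every step in order.
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```

On noise data like this the leaky estimate typically comes out well above 0.5 while the pipeline estimate stays near chance; the exact numbers depend on the seed and the scikit-learn version.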

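Joint parameter selection works because each step is named, and a parameter of a step is addressed as `<step name>__<parameter>`. A minimal sketch, again assuming scikit-learn is installed, that grid-searches the number of selected features and the regularization strength `C` together on the bundled iris dataset; the step names and grid values are arbitrary illustrations. Every step except the last (`StandardScaler`, `SelectKBest`) is a transformer, as the last paragraph of the diff requires.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# All steps but the last are transformers (they implement ``transform``);
# the final step is the predictor.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "<step>__<param>" targets a parameter of one step, so a single grid
# can span several estimators in the pipeline at once.
param_grid = {
    "select__k": [2, 3, 4],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5)
# For every candidate, the scaler and selector are refit on each training fold.
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```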