-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Add verbose option for Pipeline and Feature Union. #8568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/pipeline.py
Outdated
return last_step.fit(Xt, y, **fit_params).transform(Xt) | ||
Xt = last_step.fit(Xt, y, **fit_params).transform(Xt) | ||
if self.verbose: | ||
self._print_final_step(final_step_start_time, time_elapsed_so_far) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have included if self.verbose
condition in the method itself, to avoid repeated checks at every calls, but this way it felt semantically more appropriate - "if verbose then print final step".
Else simply the call would state "print final step" while it internally wouldn't have printed if not verbose. I hope that is fine ?
|
This is a neat work... Thanks! |
@@ -148,7 +149,7 @@ object:: | |||
>>> pipe # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | |||
Pipeline(..., | |||
steps=[('reduce_dim', PCA(copy=True,...)), | |||
('clf', SVC(C=1.0,...))]) | |||
('clf', SVC(C=1.0,...))], verbose=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an irrelevant detail, maybe use ellipsis? (...
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm ambiguous on this...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well then i'll wait for others' suggestion meanwhile. I'm not sure which should be preferred either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set an analogy, transformer_weights
are shown here in same documentation. They're not having any value in this particular example.
>>> estimators = [('linear_pca', PCA()), ('kernel_pca', KernelPCA())]
>>> combined = FeatureUnion(estimators)
>>> combined # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
FeatureUnion(n_jobs=1,
transformer_list=[('linear_pca', PCA(copy=True,...)),
('kernel_pca', KernelPCA(alpha=1.0,...))],
transformer_weights=None, verbose=False)
We can use this as a tie breaker 😆
@@ -53,6 +53,10 @@ New features | |||
Enhancements | |||
............ | |||
|
|||
- Added optional parameter ``verbose`` in :class:`pipeline.Pipeline` and | |||
:class:`pipeline.FeatureUnion` for showing progress and timing of each | |||
step. :issue:`8568` by :user:`Karan Desai <karandesai-96>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add your name and link to the bottom of the page :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
/ping: @jnothman Resolved conflicts with master which moved forward after I opened my PR. |
@karandesai-96 You should not merge master into your branch but rebase your branch to master. |
@glemaitre I used the web interface for a very small conflict resolving, tried it for the first time. Agreed with you, not impressive. Reverting, rebasing. |
Rebased. |
- Each line printed by Pipeline and FeatureUnion, when their verbosity mode is on, will be 70 characters long.
i often merge master into my branch. usually works just fine. I'll try look
at this soon
…On 10 Apr 2017 3:10 pm, "Karan Desai" ***@***.***> wrote:
Rebased.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz67jDPLMIYHq-FvNnJaRYmbQzNPP6ks5rubmogaJpZM4MY6aU>
.
|
I cannot say that this is a good practice or the right way to do it. I
just got it while contributing in scikit-image in which I did merge and
have been forced to reset and rebase:
http://scikit-image.org/docs/0.11.x/contribute.html#divergence-between-upstream-master-and-your-feature-branch
It seems that you have the same in the guide in scikit-learn:
http://scikit-learn.org/stable/developers/contributing.html#how-to-contribute
From the top of the head, I was thinking that the reasons were: merging
make the history difficult to read but rebasing rewrite the history.
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
|
I think Github deals with merging in master better than it used to. I no
longer recommend rebases: that requires resolving conflicts on every commit
rather than just between the heads (and squash-merging the final PR can
handle whatever you did along the way as it'll just diff the final heads as
long as they are merge-safe). It also messes up historical commit
references and requires force pushes and other Dangerous Ideas.
…On 12 April 2017 at 21:29, Guillaume Lemaitre ***@***.***> wrote:
I cannot say that this is a good practice or the right way to do it. I
just got it while contributing in scikit-image in which I did merge and
have been forced to reset and rebase:
http://scikit-image.org/docs/0.11.x/contribute.html#
divergence-between-upstream-master-and-your-feature-branch
It seems that you have the same in the guide in scikit-learn:
http://scikit-learn.org/stable/developers/contributing.html#how-to-
contribute
From the top of the head, I was thinking that the reasons were: merging
make the history difficult to read but rebasing rewrite the history.
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wjMpKHgN5NOr0o9iujLHWDvRSa6ks5rvLW0gaJpZM4MY6aU>
.
|
I completely agree with the issue linked to resolving every conflicts,
especially when resurrecting old PRs.
In fact, I personally do not have a strong opinion regarding the policy
to follow. Merge has only 1 con regarding the readability which is
partially solved when we are actually squashing/merge at the end.
Do you think that we should change the contributing guide in this regard
if there is consensus behind a only merging policy?
…--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
|
I'd be in favour of updating the docs, yes
…On 12 Apr 2017 10:31 pm, "Guillaume Lemaitre" ***@***.***> wrote:
I completely agree with the issue linked to resolving every conflicts,
especially when resurrecting old PRs.
In fact, I personally do not have a strong opinion regarding the policy
to follow. Merge has only 1 con regarding the readability which is
partially solved when we are actually squashing/merge at the end.
Do you think that we should change the contributing guide in this regard
if there is consensus behind a only merging policy?
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wiC4PMDDhZmw4c8nRaBj9sEilLdks5rvMQWgaJpZM4MY6aU>
.
|
|
||
Parameters | ||
---------- | ||
step_info : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this preferred over description
, elapsed
or something else that further ensures consistency?
I would like to see logger.short_format_time(elapsed)
used as in grid search, and this code refactored into utils with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the code in grid search you're talking about? The only logging is within joblib, right?
could you please resolve the conflicts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be an awesome feature to have, but it should really live in a logger or something. Right now it's cluttering up the already complicated pipeline code.
Hey @amueller, sorry I was away for a while I couldn't respond. I agree with your review. I will resolve the conflicts meanwhile I'll look into a more sane way for logging as you say. |
The verbosity from grid search lives in model_selection/_validate.py I
think.
…On 22 Jul 2017 2:00 pm, "Karan Desai" ***@***.***> wrote:
Hey @amueller <https://github.com/amueller>, sorry I was away for a while
I couldn't respond. I agree with your review. I will resolve the conflicts
meanwhile I'll look into a more sane way for logging as you say.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz61rnEjFevRB91GKuw2gr5qQcQuAYks5sQXPXgaJpZM4MY6aU>
.
|
Indeed, and it uses joblib.logger |
I'd like to see this progress... Are you continuing to work on it @karandesai-96 |
Verbosity in pipeline very helpful feature, especially for debugging. Any plans to merge this PR ? |
@EBazarov as soon as there is a reasonable implementation. |
Reference Issue
Fixes #5298, #5321
What does this implement/fix? Explain your changes.
This PR adds a new argument
verbose
toPipeline
andFeatureUnion
, which is False by default. Setting thisTrue
will print useful information during execution. Better to be explained by a code snippet:Common code snippet (imports and data loading):
Pipeline Snippet:
Expected Output:
FeatureUnion Snippet:
Expected Output:
Any other comments?
I had open PRs else where, so it took me a while to get back to this. Thanks @jnothman for mentioning me on issue thread.