[MRG] Add verbose option for Pipeline and Feature Union. #8568

Closed
wants to merge 4 commits into from

kdexd

@kdexd kdexd commented Mar 10, 2017

Reference Issue

Fixes #5298, #5321

What does this implement/fix? Explain your changes.

This PR adds a new argument verbose to Pipeline and FeatureUnion, which is False by default. Setting it to True prints the number, name and elapsed time of each step, followed by the total time elapsed. This is best explained with a code snippet:

Common code snippet (imports and data loading):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import Imputer, StandardScaler

data = load_iris(return_X_y=True)

Pipeline Snippet:

p = Pipeline([('imputer', Imputer()), ('linreg', LinearRegression())],
             verbose=True)

p.fit(data[0], data[1])
y_pred = p.predict(data[0][0].reshape(1, -1))
print("Predicted: %f" % y_pred[0])

Expected Output:

[Pipeline] (step 1 of 2) imputer ............................ 0.00053s
[Pipeline] (step 2 of 2) linreg ............................. 0.00043s
[Pipeline] Total time elapsed: .............................. 0.00096s

FeatureUnion Snippet:

f = FeatureUnion([('scaler', StandardScaler()),
                  ('pca', PCA(n_components=2))], verbose=True)

f.fit(data[0], data[1])
data_t = f.transform(data[0][0].reshape(1, -1))
print("Transformed: %r" % data_t)

Expected Output:

[FeatureUnion] (step 1 of 2) scaler ......................... 0.00014s
[FeatureUnion] (step 2 of 2) pca ............................ 0.00032s
[FeatureUnion] Total time elapsed: .......................... 0.00065s
Transformed: array([[-0.90068117,  1.03205722, -1.3412724 , -1.31297673, -2.68420713,
         0.32660731]])

Any other comments?

TODO (I will keep editing this description):
  • Initial PR work: add verbosity to Pipeline.
  • Add tests for verbosity of Pipeline.
  • Add verbosity to FeatureUnion.
  • Add tests for verbosity of FeatureUnion.
  • Make both classes print every line exactly 70 characters long.
  • Add CHANGELOG entry.
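A minimal sketch of the 70-character layout from the TODO list, assuming each line is built from a bracketed source name, a step message, dot padding and the elapsed time in seconds with five decimals, as in the expected output above. `format_step_line` is a hypothetical helper, not the code in this PR.

```python
def format_step_line(source, message, elapsed, width=70):
    """Return one verbose line, padded with dots to ``width`` characters.

    Hypothetical helper: ``source`` is e.g. "Pipeline", ``message`` is e.g.
    "(step 1 of 2) imputer", ``elapsed`` is a duration in seconds.
    """
    head = "[%s] %s " % (source, message)
    tail = " %.5fs" % elapsed
    n_dots = width - len(head) - len(tail)
    # Keep at least three dots; a line with a very long name then
    # exceeds ``width`` instead of losing its separator.
    return head + "." * max(n_dots, 3) + tail


if __name__ == "__main__":
    print(format_step_line("Pipeline", "(step 1 of 2) imputer", 0.00053))
    print(format_step_line("Pipeline", "(step 2 of 2) linreg", 0.00043))
    print(format_step_line("Pipeline", "Total time elapsed:", 0.00096))
```

Letting overly long names overflow the width, rather than truncating them, is just one possible choice.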

I had open PRs elsewhere, so it took me a while to get back to this. Thanks @jnothman for mentioning me on the issue thread.

- return last_step.fit(Xt, y, **fit_params).transform(Xt)
+ Xt = last_step.fit(Xt, y, **fit_params).transform(Xt)
+ if self.verbose:
+     self._print_final_step(final_step_start_time, time_elapsed_so_far)
Author

I could have put the if self.verbose check inside the method itself, to avoid repeating it at every call site, but this way felt semantically more appropriate: "if verbose, then print the final step".

Otherwise the call would simply read "print final step" even though internally it would print nothing when not verbose. I hope that is fine?
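The two placements of the verbose check can be contrasted in a small sketch. `VerboseSketch`, `_maybe_print_final_step` and the one-argument `_print_final_step(elapsed)` are hypothetical simplifications (the method in the diff takes `final_step_start_time` and `time_elapsed_so_far`); output goes to a configurable stream only so that it is easy to check.

```python
import sys


class VerboseSketch:
    """Hypothetical stand-in for an estimator with a ``verbose`` flag."""

    def __init__(self, verbose=False, stream=None):
        self.verbose = verbose
        # Default to stdout; a custom stream makes the output easy to capture.
        self.stream = sys.stdout if stream is None else stream

    # Placement used in the PR: the helper always prints, and the caller
    # states the condition, so the call reads "if verbose, print final step".
    def _print_final_step(self, elapsed):
        print("final step took %.5fs" % elapsed, file=self.stream)

    def finish_checked_at_call_site(self, elapsed):
        if self.verbose:
            self._print_final_step(elapsed)

    # Alternative placement: the helper checks the flag itself, so callers
    # never repeat the check, but a bare call may silently print nothing.
    def _maybe_print_final_step(self, elapsed):
        if self.verbose:
            print("final step took %.5fs" % elapsed, file=self.stream)

    def finish_checked_in_helper(self, elapsed):
        self._maybe_print_final_step(elapsed)
```

Both behave identically; the difference is only in how the call site reads.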

@kdexd
Author

kdexd commented Mar 10, 2017

Pipeline runs its steps sequentially, while FeatureUnion can fit its transformers in parallel jobs. So for Pipeline, the total elapsed time is the sum of the times taken by the individual steps (and that is how it is implemented). For FeatureUnion, on the other hand, the total elapsed time depends on the number of jobs, so it is measured separately and need not equal the sum of the per-step times.
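A small sketch of this difference, assuming `time.sleep` stands in for fitting a transformer and a `ThreadPoolExecutor` stands in for the parallel backend (FeatureUnion itself dispatches work through joblib, not through this code). The sequential total is the sum of the step times, while the parallel total is wall-clock time measured around the whole batch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fit(name, seconds):
    """Stand-in for fitting one transformer; returns (name, elapsed seconds)."""
    start = time.perf_counter()
    time.sleep(seconds)
    return name, time.perf_counter() - start


def sequential_total(steps):
    """Pipeline-style: steps run one after another, so total = sum of steps."""
    timings = [fake_fit(name, seconds) for name, seconds in steps]
    return timings, sum(elapsed for _, elapsed in timings)


def parallel_total(steps, n_jobs):
    """FeatureUnion-style: total is wall-clock time around the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        timings = list(pool.map(lambda step: fake_fit(*step), steps))
    return timings, time.perf_counter() - start


if __name__ == "__main__":
    steps = [("scaler", 0.2), ("pca", 0.2)]
    print("sequential total: %.2fs" % sequential_total(steps)[1])
    print("parallel total:   %.2fs" % parallel_total(steps, n_jobs=2)[1])
```

With two 0.2 s steps and n_jobs=2, the sequential total is roughly 0.4 s while the parallel total is roughly 0.2 s, smaller than the sum of the per-step times.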

@kdexd kdexd changed the title [WIP] Add verbose option for Pipeline and Feature Union. [MRG] Add verbose option for Pipeline and Feature Union. Mar 10, 2017
@raghavrv
Member

This is neat work... Thanks!

@kdexd
Author

kdexd commented Mar 10, 2017

Thanks @raghavrv 😄
Also, if merged, this PR should close #5321.

@@ -148,7 +149,7 @@ object::
 >>> pipe # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
 Pipeline(...,
          steps=[('reduce_dim', PCA(copy=True,...)),
-                ('clf', SVC(C=1.0,...))])
+                ('clf', SVC(C=1.0,...))], verbose=False)
Member

Since this is an irrelevant detail, maybe use an ellipsis (...)?

Member

(I'm ambivalent about this...)

Author

Then I'll wait for others' suggestions in the meantime. I'm not sure which should be preferred either.

Author

@kdexd kdexd Mar 13, 2017

By analogy, transformer_weights is shown in the same documentation even though it has no meaningful value (it is None) in that example:

>>> estimators = [('linear_pca', PCA()), ('kernel_pca', KernelPCA())]
>>> combined = FeatureUnion(estimators)
>>> combined # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
FeatureUnion(n_jobs=1,
             transformer_list=[('linear_pca', PCA(copy=True,...)),
                               ('kernel_pca', KernelPCA(alpha=1.0,...))],
             transformer_weights=None, verbose=False)

We can use this as a tie breaker 😆

@@ -53,6 +53,10 @@ New features
Enhancements
............

- Added optional parameter ``verbose`` in :class:`pipeline.Pipeline` and
  :class:`pipeline.FeatureUnion` for showing progress and timing of each
  step. :issue:`8568` by :user:`Karan Desai <karandesai-96>`.
Member

Please add your name and link to the bottom of the page :)

Author

Done ✅

@kdexd
Author

kdexd commented Apr 10, 2017

/ping @jnothman: I resolved conflicts with master, which had moved forward since I opened my PR.

@glemaitre
Member

@karandesai-96 You should not merge master into your branch; rebase your branch onto master instead.
Revert to the previous commit and rebase.

@kdexd
Author

kdexd commented Apr 10, 2017

@glemaitre I used the web interface to resolve a very small conflict; it was my first time trying it. I agree with you, it's not impressive.

Reverting, rebasing.

@kdexd
Author

kdexd commented Apr 10, 2017

Rebased.

Karan Desai added 2 commits April 10, 2017 10:40
- Each line printed by Pipeline and FeatureUnion, when their
  verbosity mode is on, will be 70 characters long.
@jnothman
Member

jnothman commented Apr 12, 2017 via email

@glemaitre
Member

glemaitre commented Apr 12, 2017 via email

@jnothman
Member

jnothman commented Apr 12, 2017 via email

@glemaitre
Member

glemaitre commented Apr 12, 2017 via email

@jnothman
Member

jnothman commented Apr 12, 2017 via email


Parameters
----------
step_info : str
Member

Why is this preferred over description, elapsed or something else that further ensures consistency?

I would like to see logger.short_format_time(elapsed) used as in grid search, and this code refactored into utils with that.
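A sketch of the kind of shared helper being suggested, assuming it would live in a utils module and be called by both Pipeline and FeatureUnion. `format_elapsed` and `step_message` are hypothetical names; `format_elapsed` only imitates the idea of a compact time string and is not joblib's `short_format_time`.

```python
def format_elapsed(seconds):
    """Compact elapsed-time string (hypothetical; not joblib's implementation).

    Below one minute the value is shown in seconds with one decimal,
    otherwise in minutes with one decimal.
    """
    if seconds >= 60:
        return "%.1fmin" % (seconds / 60.0)
    return "%.1fs" % seconds


def step_message(source, message, elapsed):
    """Hypothetical shared helper that Pipeline and FeatureUnion could both use."""
    return "[%s] %s, total=%s" % (source, message, format_elapsed(elapsed))
```

A compact format is easier to read for long fits, at the cost of the sub-millisecond precision shown in the PR's current output.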

Member

What's the code in grid search you're talking about? The only logging is within joblib, right?

@amueller
Member

amueller commented Jul 7, 2017

Could you please resolve the conflicts?

Member

@amueller amueller left a comment

I think this would be an awesome feature to have, but it should really live in a logger or something. Right now it's cluttering up the already complicated pipeline code.
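One way to take timing and printing out of the pipeline code, sketched under assumptions: a hypothetical `log_elapsed` context manager times a block and reports through the standard `logging` module, so the fit loop only needs one `with` statement per step. The logger name `pipeline_sketch` and `fit_steps` (which applies plain functions in order) are illustrative, not scikit-learn API.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pipeline_sketch")  # hypothetical logger name


@contextmanager
def log_elapsed(source, message, verbose):
    """Time the ``with`` block and log one line about it if ``verbose``."""
    if not verbose:
        # No timing and no logging at all when verbosity is off.
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("[%s] %s took %.5fs", source, message,
                    time.perf_counter() - start)


def fit_steps(steps, X, verbose=False):
    """Apply (name, function) steps in order; timing lives in log_elapsed."""
    for i, (name, func) in enumerate(steps, 1):
        msg = "(step %d of %d) %s" % (i, len(steps), name)
        with log_elapsed("Pipeline", msg, verbose):
            X = func(X)
    return X
```

Because messages go through `logging`, nothing is shown unless the logger is enabled at INFO level, e.g. with `logging.basicConfig(level=logging.INFO)`; the `verbose` flag then only decides whether timing happens at all.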

@kdexd
Author

kdexd commented Jul 22, 2017

Hey @amueller, sorry, I was away for a while and couldn't respond. I agree with your review. I will resolve the conflicts, and in the meantime I'll look into a saner way to do the logging, as you suggest.

@jnothman
Member

jnothman commented Jul 23, 2017 via email

@amueller
Member

@jnothman
Member

I'd like to see this progress... Are you continuing to work on it, @karandesai-96?

@EBazarov

Verbosity in Pipeline would be a very helpful feature, especially for debugging. Are there any plans to merge this PR?

@amueller
Member

@EBazarov as soon as there is a reasonable implementation.

Successfully merging this pull request may close these issues.

Verbose option on Pipeline
6 participants