Implement SLEP009: keyword-only arguments #15005

Closed · 39 of 40 tasks
jnothman opened this issue Sep 17, 2019 · 26 comments

@jnothman (Member) commented Sep 17, 2019

SLEP009 is all but accepted.

It proposes to make most parameters keyword-only.

We should do this by:

  • Merging ENH Add Deprecating Position Arguments Helper #13311 (a minimal sketch of how such a helper could work follows this list)
  • Perhaps getting some stats on usage of positional arguments as per SLEP009: keyword only arguments enhancement_proposals#19 (comment)
  • Applying the deprecation to each subpackage (checked means a PR has at least been opened).
    • base
    • calibration
    • cluster
    • compose
    • covariance
    • cross_decomposition
    • datasets
    • decomposition
    • discriminant_analysis
    • dummy
    • ensemble
    • feature_extraction
    • feature_selection
    • gaussian_process
    • impute
    • inspection
    • isotonic
    • kernel_approximation
    • kernel_ridge
    • linear_model
    • manifold
    • metrics
    • metrics.pairwise
    • mixture
    • model_selection
    • multiclass
    • multioutput
    • naive_bayes
    • neighbors
    • neural_network
    • pipeline
    • preprocessing
    • random_projection
    • semi_supervised
    • svm
    • tree
    • utils
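
A minimal sketch of how such a deprecation helper could work, assuming the target signatures already use the bare * marker to declare keyword-only parameters. This is only an illustration of the approach, not the exact helper from #13311; the name deprecate_positional_args and the trimmed PCA signature below are made up for the example.

import warnings
from functools import wraps
from inspect import Parameter, signature

def deprecate_positional_args(func):
    """Warn when arguments declared keyword-only are still passed positionally."""
    sig = signature(func)
    # Parameters that may legitimately be passed positionally: everything
    # declared before the bare * in the signature (including self).
    allowed = [name for name, p in sig.parameters.items()
               if p.kind in (Parameter.POSITIONAL_ONLY,
                             Parameter.POSITIONAL_OR_KEYWORD)]
    kwonly = [name for name, p in sig.parameters.items()
              if p.kind == Parameter.KEYWORD_ONLY]

    @wraps(func)
    def wrapper(*args, **kwargs):
        n_extra = len(args) - len(allowed)
        if n_extra > 0:
            names = kwonly[:n_extra]
            warnings.warn("Pass {} as keyword args; passing them positionally "
                          "is deprecated and will become an error in a future "
                          "release.".format(", ".join(names)), FutureWarning)
            # Remap the surplus positional values onto their keyword names so
            # the call keeps working during the deprecation period.
            kwargs.update(zip(names, args[len(allowed):]))
            args = args[:len(allowed)]
        return func(*args, **kwargs)

    return wrapper

class PCA:
    @deprecate_positional_args
    def __init__(self, n_components=None, *, copy=True, whiten=False):
        self.n_components, self.copy, self.whiten = n_components, copy, whiten

PCA(2)        # fine: n_components stays positional
PCA(2, True)  # FutureWarning: pass copy as a keyword argument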

We might along the way establish rules of thumb and principles like "are the semantics reasonably clear when the argument is passed positionally?" As I noted on the mailing list, I think they are clear for PCA's components, for Pipeline's steps, and for GridSearchCV's estimator and parameter grid. Other parameters of those estimators seem more suitable for keyword-only. Trickier is whether n_components in TSNE should follow PCA in being positional... It's not as commonly set by users.

@jnothman jnothman added this to the 0.22 milestone Sep 17, 2019
@adrinjalali (Member):

Realistically, this should be in 0.23. We already have too much on our plate, I think.

@jnothman (Member, Author) commented Sep 18, 2019 via email

@qinhanmin2014 (Member):

I think they are clear for PCA's components, for Pipeline's steps, and for GridSearchCV's estimator and parameter grid.

I'm wondering whether it's easy to decide which parameters should be positional, e.g., do we think PCA(2) is reasonable? At least I don't like it.

And how shall we make the decision? I guess we don't want to open a vote for every class/function, right? So is +2 enough, or do we need +3?

@rth (Member) commented Sep 18, 2019

e.g., do we think PCA(2) is reasonable? At least I don't like it.

It's about leaving some of this to the user's discretion versus forcing them to use what we think is best. Yes, PCA(2) is bad, while PCA(n_components) is reasonable. At least I wouldn't object to a PR using it, would you? Users can resent a limitation of their freedom (when to use positional args or not) in cases where there is no overwhelming reason for it.

So is +2 enough, or do we need +3?

I would say +2 since the SLEP was accepted, but wait a bit before merging to give a chance for feedback?

@qinhanmin2014 (Member):

At least I wouldn't object to a PR using it, would you?

I don't like it but I'm not opposed to it.

Users can resent a limitation of their freedom (when to use positional args or not) in cases where there is no overwhelming reason for it.

Hmm, not sure, but the SLEP has passed?

@rth (Member) commented Sep 18, 2019

At least I wouldn't object to a PR using it, would you?

I don't like it but I'm not opposed to it.

Let's put it another way. As a user, imagine I currently have a few thousand lines of perfectly fine code that uses PCA(n_components) (and other comparable use cases). If tomorrow it starts raising warnings telling me to change it to PCA(n_components=n_components) and requires me to do maintenance work without good reason, I would personally be unhappy and would complain about it to whatever project did that.

@jnothman (Member, Author) commented Sep 18, 2019 via email

@amueller (Member):

@agramfort explicitly argued for accepting PCA(n_components).

I really think the point made on the mailing list about allowing users to have clear expectations is important. If we can't write down a simple rule it's hard for users to have clear expectations.

Doing some quick stats:

from collections import Counter
from inspect import signature
from sklearn.utils.testing import all_estimators

# Count how often each parameter name appears as the first constructor
# argument across all estimators.
counts = Counter()

for name, est in all_estimators():
    sig = signature(est)
    if len(sig.parameters):
        first = list(sig.parameters.keys())[0]
        counts[first] += 1

print(len(counts), dict(counts))

There are 61 different first arguments in our estimators:

{'n_components': 25, 'alpha': 12, 'estimator': 11, 'base_estimator': 8, 'kernel': 8, 'n_clusters': 7, 'store_precision': 6, 'n_estimators': 6, 'n_neighbors': 6, 'loss': 6, 'score_func': 6, 'alphas': 5, 'fit_intercept': 5, 'criterion': 5, 'C': 3, 'eps': 3, 'input': 3, 'copy': 3, 'threshold': 3, 'penalty': 3, 'missing_values': 3, 'bandwidth': 2, 'priors': 2, 'l1_ratio': 2, 'strategy': 2, 'epsilon': 2, 'estimators': 2, 'n_bins': 2, 'n_iter': 2, 'hidden_layer_sizes': 2, 'categories': 2, 'nu': 2, 'radius': 2, 'norm': 2, 'skewedness': 1, 'with_centering': 1, 'damping': 1, 'dtype': 1, 'sample_steps': 1, 'gamma': 1, 'check_y': 1, 'transformers': 1, 'n_quantiles': 1, 'dictionary': 1, 'method': 1, 'degree': 1, 'neg_label': 1, 'steps': 1, 'patch_size': 1, 'solver': 1, 'n_features': 1, 'transformer_list': 1, 'func': 1, 'min_samples': 1, 'metric': 1, 'classes': 1, 'feature_range': 1, 'Cs': 1, 'y_min': 1, 'regressor': 1, 'n_nonzero_coefs': 1}

Maybe having a white-list of those that we allow would be useful?
Say, 'n_components', 'alpha', 'estimator', 'base_estimator', 'kernel' (this is not for SVC), 'n_clusters', 'n_estimators', 'n_neighbors', 'C', 'steps', 'regressor', 'transformers'?

Though C is a bit of an outlier, and I think having 'store_precision' be positional would not be very useful, so I didn't list it. Generally I think for all meta-estimators the first argument should be positional.

@thomasjpfan (Member):

Generally I think for all meta-estimators the first argument should be positional.

Agreed.

I prefer narrowing the list down to just clustering, decomposition, and meta estimator parameters: 'n_components', 'estimator', 'base_estimator', 'n_clusters', 'n_neighbors', 'steps', 'regressor', 'transformers'

@amueller (Member):

Hm, that's deprecating LogisticRegression(0.01) and RandomForestClassifier(100)... I'm not super opposed, but I could also see some resistance?

@jnothman (Member, Author):

LogisticRegression(0.01) isn't a thing: the first parameter is penalty

@jnothman (Member, Author) commented Sep 18, 2019 via email

@amueller (Member):

A major company recently did a giant GitHub scrape and analyzed sklearn usage. I asked them whether they could share their results.

@srggrs commented Oct 21, 2019

Hi guys,
I did some preliminary analysis for @jnothman using AST; these are the results for the aggregated data and the analysis related to the repo/file.

Let me know if you have any questions. I intend to upload the code I used to run this analysis soon (it needs a bit of cleaning!), so you can have a look.

Briefly: one script downloads the repos from a list (grabbed from the dependents tree, or by getting a list of repos via Google BigQuery on the GitHub public dataset - for the latter I ran out of free usage credits) and converts the *.ipynb files to *.py using nbconvert; then another script runs through all the repo files in parallel and searches, using AST, for calls to objects imported from all_estimators.
The code does not handle the case where the import is of the form: from sklearn.ensemble import RandomForestClassifier as RFC.
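
A rough sketch of the AST step, under the assumption that estimators are called via their imported name (the helper name and the snippet below are made up for illustration; like the original analysis, it does not resolve "import ... as" aliases):

import ast

def count_positional_args(source, estimator_names):
    """For each estimator name, record the number of positional args per call."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Handles both PCA(...) and sklearn.decomposition.PCA(...).
            name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
            if name in estimator_names:
                counts.setdefault(name, []).append(len(node.args))
    return counts

snippet = "from sklearn.decomposition import PCA\npca = PCA(2, whiten=True)"
print(count_positional_args(snippet, {"PCA"}))  # {'PCA': [1]}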

@thomasjpfan (Member):

@srggrs Thank you for the analysis!

It looks like nr_pos_args is bounded below by 1. Does this mean that all estimators use at least one positional argument?

@adrinjalali (Member):

Also, nr_pos_args max seems to be 1 for many estimators, which looks pretty odd to me.

@jnothman (Member, Author):

It looks like nr_pos_args is bounded below by 1. Does this mean that all estimators use at least one positional argument?

Because the analysis here is limited to class constructors (next version should not, I think, have this limitation), the first arg is always self, and so the lower bound of 1 makes sense.

@adrinjalali adrinjalali modified the milestones: 0.22, 0.23 Oct 29, 2019
@adrinjalali (Member):

I'm happy to have it in 0.22 if you still think we can get it into 0.22, @jnothman.

@jnothman (Member, Author) commented Oct 29, 2019 via email

@jnothman (Member, Author):

@adrinjalali I've added a list of the subpackages we've resolved this for, and those still to go, in the PR description.

@rth (Member) commented Apr 21, 2020

Once this is done, and included in the RC, we should heavily advertise the RC to make sure people discover potential issues before the final release. If there are many complaints, we might need to relax some of the most common positional arguments.

@thomasjpfan (Member):

Should we leave utils alone?

@NicolasHug (Member):

Should we leave utils alone?

I'll try to open a PR but I would agree there's no strong need to get it in for the release

@thomasjpfan (Member):

I'll try to open a PR but I would agree there's no strong need to get it in for the release

Okay, let's see what it looks like. In general, it would be "nice to have" since it would make the library more consistent and promote the usage of * in future util functions.
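
For illustration, a minimal sketch of what that looks like in a utility signature (the function below is a toy for the example, not a real scikit-learn util):

def normalize(values, *, scale=1.0, offset=0.0):
    # Everything after the bare * must be passed by keyword.
    return [scale * v + offset for v in values]

normalize([1, 2, 3], scale=2.0)  # OK -> [2.0, 4.0, 6.0]
# normalize([1, 2, 3], 2.0)      # TypeError: takes 1 positional argument but 2 were given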

@adrinjalali (Member):

I think this one is now complete. There may be missing ones, which we can deal with later with delayed deprecations.

@jnothman (Member, Author):

Thanks to everyone involved for all your effort. This is a great thing for making the parameters more findable in a year's time.
