Implement SLEP009: keyword-only arguments #15005
Comments
Realistically, this should be in 0.23. We already have too much on our plate, I think. |
Hmmmm... I don't see why, if the release target is November.
|
I'm wondering whether it's easy to decide which parameters should be positional, e.g., do we think And how shall we make the decision? I guess we don't want to open a vote for every class/function, right? So is +2 enough, or do we need +3? |
It's about leaving some of this to the user discretion, or forcing them to use what we think best. Yes,
I would say +2 since the SLEP was accepted, but wait a bit before merging to give the possibility for feedback? |
I don't like it but I'm not opposed to it.
Hmm, not sure, but the SLEP is passed? |
Let's put it another way. As a user, imagine currently I have a few 1000 lines of perfectly fine code that uses |
There's no reason to have a higher standard of review here than any other
deprecation, after the SLEP is accepted. The SLEP would have been accepted
if Andy was around.
|
@agramfort explicitly argued for accepting I really think the point made on the mailing list about allowing users to have clear expectations is important. If we can't write down a simple rule it's hard for users to have clear expectations. Doing some quick stats:

    from collections import Counter
    from inspect import signature
    from sklearn.utils.testing import all_estimators

    # Tally the name of the first constructor parameter of every estimator.
    counts = Counter()
    for name, est in all_estimators():
        sig = signature(est)
        if len(sig.parameters):
            first = list(sig.parameters.keys())[0]
            counts[first] += 1

There are 61 different first arguments in our estimators.
{'n_components': 25,
'alpha': 12,
'estimator': 11,
'base_estimator': 8,
'kernel': 8,
'n_clusters': 7,
'store_precision': 6,
'n_estimators': 6,
'n_neighbors': 6,
'loss': 6,
'score_func': 6,
'alphas': 5,
'fit_intercept': 5,
'criterion': 5,
'C': 3,
'eps': 3,
'input': 3,
'copy': 3,
'threshold': 3,
'penalty': 3,
'missing_values': 3,
'bandwidth': 2,
'priors': 2,
'l1_ratio': 2,
'strategy': 2,
'epsilon': 2,
'estimators': 2,
'n_bins': 2,
'n_iter': 2,
'hidden_layer_sizes': 2,
'categories': 2,
'nu': 2,
'radius': 2,
'norm': 2,
'skewedness': 1,
'with_centering': 1,
'damping': 1,
'dtype': 1,
'sample_steps': 1,
'gamma': 1,
'check_y': 1,
'transformers': 1,
'n_quantiles': 1,
'dictionary': 1,
'method': 1,
'degree': 1,
'neg_label': 1,
'steps': 1,
'patch_size': 1,
'solver': 1,
'n_features': 1,
'transformer_list': 1,
'func': 1,
'min_samples': 1,
'metric': 1,
'classes': 1,
'feature_range': 1,
'Cs': 1,
'y_min': 1,
'regressor': 1,
'n_nonzero_coefs': 1}
Maybe having a white-list of those that we allow would be useful? Though the C is a bit of an outlier and I think having 'store_precision' be positional would not be very useful so I didn't list it. Generally I think for all meta-estimators the first argument should be positional. |
Agreed. I prefer narrowing the list down to just clustering, decomposition, and meta estimator parameters: 'n_components', 'estimator', 'base_estimator', 'n_clusters', 'n_neighbors', 'steps', 'regressor', 'transformers' |
Hm that's deprecating |
LogisticRegression(0.01) isn't a thing: the first parameter is penalty |
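A minimal sketch of how the allow-list idea above could be checked mechanically. The `Toy*` classes are hypothetical stand-ins rather than the real scikit-learn estimators, and `POSITIONAL_OK`, `first_param` and `first_may_stay_positional` are names invented for illustration; only the allow-list entries come from the comment above.

```python
from inspect import signature

# Allow-list quoted from the comment above.
POSITIONAL_OK = {
    "n_components", "estimator", "base_estimator", "n_clusters",
    "n_neighbors", "steps", "regressor", "transformers",
}


# Hypothetical stand-ins; their signatures are assumptions for this sketch.
class ToyPCA:
    def __init__(self, n_components=None, copy=True):
        self.n_components = n_components
        self.copy = copy


class ToyLogReg:
    def __init__(self, penalty="l2", C=1.0):
        self.penalty = penalty
        self.C = C


class ToyPipeline:
    def __init__(self, steps, memory=None):
        self.steps = steps
        self.memory = memory


def first_param(cls):
    """Return the name of the first constructor parameter, or None."""
    names = list(signature(cls).parameters)  # `self` is not included
    return names[0] if names else None


def first_may_stay_positional(cls):
    """True if the first constructor parameter is on the allow-list."""
    return first_param(cls) in POSITIONAL_OK


for cls in (ToyPCA, ToyLogReg, ToyPipeline):
    print(cls.__name__, first_param(cls), first_may_stay_positional(cls))
```

In this toy setup a bare positional value passed to `ToyLogReg` would bind to `penalty`, not `C`, which is the kind of surprise a short allow-list is meant to rule out.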
I might try to get someone to help tackle this (either trying to gather the
usage statistics, or to put the APIs into a form where all can be
considered) over the next couple of days. Is someone else working on it?
|
A major company recently did a giant github scrape and analyzed sklearn usage. I asked them whether they can share their results. |
Hi guys, let me know if you have any questions. I intend to soon upload the code I used to run this analysis (it needs a bit of cleaning!), so you guys can have a look. Briefly, a piece of code downloads the repos from a list of repos (grabbed from the dependents tree or by getting a list of repos using Google BigQuery on the GitHub public dataset; in the latter case I ran out of free usage credits) and converts the |
@srggrs Thank you for the analysis! It looks like |
Also, |
Because the analysis here is limited to class constructors (next version should not, I think, have this limitation), the first arg is always self, and so the lower bound of 1 makes sense. |
I'm happy to have it in 0.22 if you still think we can have it in 0.22 @jnothman |
I don't think it's worth rushing this into 0.22. It's more important to ship the
release.
|
@adrinjalali I've added a list of the subpackages we've resolved this for, and those still to go, in the PR description. |
Once this is done, and included in the RC, we should heavily advertise the RC to make sure people discover potential issues before the final release. If there are many complaints, we might need to relax some of the most common positional arguments. |
Should we leave |
I'll try to open a PR but I would agree there's no strong need to get it in for the release |
Okay, let's see what it looks like. In general, it would be "nice to have" since it would make the library more consistent and promote the usage of |
I think this one is now complete. There may be missing ones, which we can deal with later with delayed deprecations. |
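For readers wondering what a deprecation period for positional use might look like, here is a rough sketch using only the standard library. The decorator name `deprecate_positional` and the class `ToyPCA` are hypothetical; this is one possible approach under those assumptions, not a description of the code that was merged.

```python
import functools
import inspect
import warnings


def deprecate_positional(func):
    """Warn when arguments declared keyword-only are passed positionally.

    Hypothetical helper for illustration only.
    """
    params = list(inspect.signature(func).parameters.values())
    # Names before the `*` marker (includes `self` for methods).
    positional = [p.name for p in params
                  if p.kind is inspect.Parameter.POSITIONAL_OR_KEYWORD]
    # Names after the `*` marker, in declaration order.
    kwonly = [p.name for p in params
              if p.kind is inspect.Parameter.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = args[len(positional):]
        if len(extra) > len(kwonly):
            # Too many values: let Python raise its usual TypeError.
            return func(*args, **kwargs)
        if extra:
            # Map surplus positional values onto the keyword-only names.
            names = kwonly[:len(extra)]
            warnings.warn(
                "Pass {} as keyword arguments; positional use is "
                "deprecated.".format(", ".join(names)),
                FutureWarning,
            )
            kwargs.update(zip(names, extra))
            args = args[:len(positional)]
        return func(*args, **kwargs)

    return wrapper


class ToyPCA:
    """Toy stand-in, not sklearn.decomposition.PCA."""

    @deprecate_positional
    def __init__(self, n_components=None, *, copy=True, whiten=False):
        self.n_components = n_components
        self.copy = copy
        self.whiten = whiten
```

With this in place, `ToyPCA(2, False)` still works during the deprecation period but emits a `FutureWarning` naming `copy`, while `ToyPCA(2, copy=False)` is silent.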
Thanks for all your effort to everyone involved. This is a great thing for making the parameters more findable in a year's time. |
SLEP009 is all but accepted.
It proposes to make most parameters keyword-only.
We should do this by first:
We might along the way establish rules of thumb and principles like "are the semantics reasonably clear when the argument is passed positionally?" As I noted on the mailing list, I think they are clear for PCA's components, for Pipeline's steps, and for GridSearchCV's estimator and parameter grid. Other parameters of those estimators seem more suitable for keyword-only. Trickier is whether n_components in TSNE should follow PCA in being positional... It's not as commonly set by users.
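As a small illustration of what "keyword-only" means in Python, a bare `*` in the signature separates parameters that may be passed positionally from those that must be named. `ToyPCA` below is a hypothetical stand-in whose signature is an assumption made for this sketch, not the real `PCA` constructor.

```python
class ToyPCA:
    """Toy stand-in, not sklearn.decomposition.PCA."""

    def __init__(self, n_components=None, *, copy=True, whiten=False):
        # Everything after the bare `*` must be passed by keyword.
        self.n_components = n_components
        self.copy = copy
        self.whiten = whiten


ok_positional = ToyPCA(2)            # first argument may stay positional
ok_keyword = ToyPCA(2, whiten=True)  # later arguments are named explicitly

try:
    ToyPCA(2, False)                 # ambiguous: is False `copy` or `whiten`?
    rejected = False
except TypeError:
    rejected = True                  # Python refuses the positional value
```

The call `ToyPCA(2)` reads clearly on its own, which is the rule of thumb discussed above; the rejected call shows the kind of ambiguity the proposal is meant to remove.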