Make random_state descriptions more informative and refer to Glossary #10548
Hi @jnothman, can I take this issue? Thanks
Claim a module/subpackage and have a go...
@jnothman, I am sorry for being naive, but can you elaborate on the module/subpackage? Are you referring to a subpackage like KMeans, for instance?
I think what @jnothman means is: just start with one file, for example sklearn/cluster/k_means_.py, update the
A subpackage is something like sklearn.cluster.
Thanks. Will do that and open a PR.
Hi @jnothman! Would you also like to replace the following descriptions as seen in grid_search.py? They have an extra line compared to the one you shared.
I can take grid_search.py and k_means.py (KMeans).
Leave grid_search.py alone; it is deprecated. The idea is to minimise the
content that is repeated, and available in the glossary, so that we can
give users the most informative description of random_state's role in
the particular estimator.
Thanks @jnothman. Will I need to understand these algorithms before I can replace this random_state information?
You will need to understand the algorithms broadly, but not every detail of
their implementation. You will need to be able to find where random_state
is used, if the randomisation in the algorithm is not completely obvious.
In some cases, it may be appropriate to not even give much more detail than
just linking to the glossary; we'll have to see how it goes.
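Where the randomisation is not obvious, one rough way to find where `random_state` is used is to scan the estimator's source with Python's standard `ast` module. The helper below is a hypothetical sketch (`random_state_usages` is not part of scikit-learn): it reports lines where the parameter is declared, read, stored on `self`, or forwarded as a keyword argument. It only sees direct references, so randomness that flows on through a renamed variable (such as `rng`) still has to be followed by hand.

```python
import ast


def random_state_usages(source):
    """Return sorted line numbers in `source` that reference random_state.

    Covers parameters named random_state, bare names, attributes such as
    self.random_state, and keyword arguments like f(random_state=rng).
    """
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg == "random_state":
            lines.add(node.lineno)  # declared as a function parameter
        elif isinstance(node, ast.Name) and node.id == "random_state":
            lines.add(node.lineno)  # read or assigned as a bare name
        elif isinstance(node, ast.Attribute) and node.attr == "random_state":
            lines.add(node.lineno)  # e.g. self.random_state
        elif isinstance(node, ast.keyword) and node.arg == "random_state":
            lines.add(node.value.lineno)  # forwarded as random_state=...
    return sorted(lines)
```

The source text could come from `inspect.getsource(...)` on an estimator class or from reading the module file directly.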
Okay, thank you. I will start going through the algorithms slowly. Regards,
Since @aby0 has not claimed the sklearn.cluster module yet, I would like to claim the whole module. Please let me know if I can work on it or should work on something else.
Any update, guys? It is a long holiday for us, so let me know if I can pick this up.
I'll take the
I'm claiming the Claiming
For `linear_model` module. Working towards scikit-learn#10548.
We had some trouble reaching consensus on how to strike the right balance
here, iirc.
So do pay attention to the prior PRs merged above.
@jnothman thanks! I will update the PRs to mention reproducibility when passing an int.
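For reference, the int / RandomState instance / None semantics that these descriptions point to in the Glossary can be illustrated with a simplified stand-in for `sklearn.utils.check_random_state`. The names `resolve_random_state` and `sample` are hypothetical and written only for this sketch: an int creates a freshly seeded generator on every call, so results are reproducible, while a shared `RandomState` instance is used as-is, so its state advances between calls.

```python
import numbers

import numpy as np


def resolve_random_state(seed):
    """Simplified stand-in for sklearn.utils.check_random_state (sketch only)."""
    if seed is None:
        # None: use NumPy's global RandomState singleton (a private attribute),
        # so results depend on the global seed and are not reproducible by default.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # int: a new generator seeded with that value, hence reproducible.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # RandomState instance: used as-is, so its state advances across calls.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")


def sample(random_state=None):
    """Hypothetical randomised routine that draws three integers."""
    rng = resolve_random_state(random_state)
    return rng.randint(0, 1000, size=3)


# Passing the same int twice gives identical draws.
same_int = (sample(42), sample(42))

# Passing one shared instance twice continues a single stream of draws,
# equivalent to drawing twice from one generator seeded with 0.
shared = np.random.RandomState(0)
stream = (sample(shared), sample(shared))
```

This is why the shorter descriptions single out the int case: it is the only option that guarantees the same output across separate calls.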
Claim: sklearn/ensemble/_weight_boosting.py - 188, 324, 479, 900, 1022
Claim: sklearn/multioutput.py - 578, 738
Claim:
Claim: sklearn/ensemble/_gb.py - 887, 1360
Claim: sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py - 736, 918
Claim: sklearn/neural_network/_rbm.py - 59
Claim: sklearn/svm/_classes.py - 90, 312, 546, 752
Claim: sklearn/feature_selection/_mutual_info.py - 226, 335, 414
Claim: sklearn/dummy.py - 59
@DatenBiene @GregoireMialon Thanks for all your contributions during the last sprint. There are only 3 modules left unchecked! Would you be interested / have time / have motivation to tackle those (no pressure!)?
Hi Jérémie! I'll try to have a look at it soon.
Hi @jeremiedbb! I will try to finish the 3 remaining modules today 😃
Claim: sklearn/kernel_approximation.py - 41, 143, 470
Hi @jnothman and @jeremiedbb, it looks like all the files were modified. I would be happy to help if you find any remaining issues.
Thanks a lot @DatenBiene and all the contributors that worked to close this issue! |
We recently added a Glossary to our documentation, which describes common parameters among other things. We should now replace descriptions of `random_state` parameters to make them more concise and informative (see #10415). For example, the generic description of `random_state` currently repeated in both KMeans and MiniBatchKMeans could be replaced by a shorter one that states its specific role and links to the Glossary. The description should therefore focus on the impact of `random_state` on the algorithm.
Contributors interested in contributing this change should take on one module at a time, initially.
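As an illustration of the intended style, here is a hypothetical function (`pick_initial_centers` is not a scikit-learn API) whose `random_state` entry states what the randomness controls, notes that an int makes results reproducible, and links to the Glossary instead of repeating the generic int/RandomState/None explanation. The exact wording is an assumption; each estimator in the list below needs a description specific to its own algorithm.

```python
import numpy as np


def pick_initial_centers(X, n_clusters, random_state=None):
    """Pick ``n_clusters`` distinct rows of ``X`` as initial centers.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.

    n_clusters : int
        Number of centers to pick.

    random_state : int, RandomState instance or None, default=None
        Determines random number generation for selecting the initial
        centers. Pass an int for reproducible results across multiple
        function calls. See :term:`Glossary <random_state>`.

    Returns
    -------
    centers : ndarray of shape (n_clusters, n_features)
    """
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        # Handles int and None; unlike sklearn.utils.check_random_state, None
        # here creates a fresh OS-seeded generator instead of the global one.
        rng = np.random.RandomState(random_state)
    X = np.asarray(X)
    # Sample row indices without replacement so the centers are distinct rows.
    indices = rng.choice(X.shape[0], size=n_clusters, replace=False)
    return X[indices]
```

The docstring names the specific role (choosing initial centers) and leaves the general semantics to the Glossary, which is the balance the thread converged on.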
The list of estimators to be modified is the following:
List of files to modify (found using kwinata's script):
sklearn/dummy.py - 59
sklearn/multioutput.py - 578, 738
sklearn/kernel_approximation.py - 41, 143, 470
sklearn/multiclass.py - 687
sklearn/random_projection.py - 178, 245, 464, 586
sklearn/feature_extraction/image.py - 368, 502
sklearn/utils/random.py - 39 Open PR
sklearn/utils/extmath.py - 185, 297
sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py - 736, 918
sklearn/ensemble/_hist_gradient_boosting/binning.py - 37, 112
sklearn/ensemble/_bagging.py - 503, 902
sklearn/ensemble/_gb.py - 887, 1360
sklearn/ensemble/_forest.py - 965, 1282, 1559, 1868, 2103
sklearn/ensemble/_iforest.py - 109
sklearn/ensemble/_base.py - 52
sklearn/ensemble/_weight_boosting.py - 188, 324, 479, 900, 1022
sklearn/decomposition/_truncated_svd.py - 59 Open PR
sklearn/decomposition/_kernel_pca.py - 79 Open PR
sklearn/decomposition/_dict_learning.py - 364, 485, 692, 1135, 1325 Open PR
sklearn/decomposition/_fastica.py - 205, 344 Open PR
sklearn/decomposition/_nmf.py - 290, 475, 966, 1159 Open PR
sklearn/decomposition/_pca.py - 192 Open PR
sklearn/decomposition/_sparse_pca.py - 82, 285 Open PR
sklearn/decomposition/_lda.py - 60, 79, 225 Open PR
sklearn/decomposition/_factor_analysis.py - 92 Open PR
sklearn/cluster/_kmeans.py - 56, 241, 380, 583, 700, 1150, 1370
sklearn/cluster/_spectral.py - 41, 197, 313
sklearn/cluster/_bicluster.py - 236, 383
sklearn/cluster/_mean_shift.py - 48
sklearn/preprocessing/_data.py - 2178, 2607
sklearn/impute/_iterative.py - 125
sklearn/linear_model/_ransac.py - 152 Open PR
sklearn/linear_model/_coordinate_descent.py - 580, 860, 1313, 1487, 1665, 1851, 2016, 2192 Open PR
sklearn/linear_model/_sag.py - 154 Open PR
sklearn/linear_model/_perceptron.py - 55 Open PR
sklearn/linear_model/_passive_aggressive.py - 76, 322 Open PR
sklearn/linear_model/_logistic.py - 587, 924, 1100, 1658 Open PR
sklearn/linear_model/_base.py - 65
sklearn/linear_model/_stochastic_gradient.py - 369, 811, 1419 Open PR
sklearn/linear_model/_theil_sen.py - 243 Open PR
sklearn/linear_model/_ridge.py - 325, 693, 853 Open PR
sklearn/tree/_classes.py - 653, 1033, 1322, 1552
sklearn/feature_selection/_mutual_info.py - 226, 335, 414
sklearn/metrics/cluster/_unsupervised.py - 80
sklearn/svm/_classes.py - 90, 312, 546, 752 Open PR
sklearn/svm/_base.py - 853 Open PR
sklearn/inspection/_permutation_importance.py - 81
sklearn/gaussian_process/_gpr.py - 109, 382
sklearn/gaussian_process/_gpc.py - 110, 537
sklearn/manifold/_spectral_embedding.py - 171, 387
sklearn/manifold/_locally_linear.py - 146, 252, 584
sklearn/manifold/_t_sne.py - 558
sklearn/manifold/_mds.py - 51, 198, 314
sklearn/utils/_testing.py - 521
sklearn/utils/__init__.py - 478, 623
sklearn/datasets/_kddcup99.py - 79
sklearn/datasets/_covtype.py - 69
sklearn/datasets/_rcv1.py - 114
sklearn/datasets/_samples_generator.py - 127, 323, 440, 531, 618, 688, 767, 904, 965, 1030, 1106, 1159, 1218, 1258, 1307, 1368, 1420, 1483, 1571, 1662
sklearn/datasets/_olivetti_faces.py - 64
sklearn/datasets/_base.py - 157
sklearn/datasets/_twenty_newsgroups.py - 187
sklearn/mixture/_bayesian_mixture.py - 166
sklearn/mixture/_base.py - 139
sklearn/mixture/_gaussian_mixture.py - 504
sklearn/model_selection/_validation.py - 1006, 1176
sklearn/model_selection/_split.py - 382, 588, 1091, 1196, 1250, 1390, 1492, 1605, 2049 Open PR
sklearn/model_selection/_search.py - 207, 1299
sklearn/neural_network/_multilayer_perceptron.py - 782, 1174
sklearn/neural_network/_rbm.py - 59
sklearn/neighbors/_kde.py - 233
sklearn/neighbors/_nca.py - 112
sklearn/covariance/_robust_covariance.py - 63, 233, 328, 545
sklearn/covariance/_elliptic_envelope.py - 40
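Line numbers like those above drift as files change. The helper below is a hypothetical sketch, not the kwinata script mentioned above (its contents are not shown in this issue): it regenerates entries in the same `path - line, line` format by matching numpydoc parameter lines that start with `random_state :`.

```python
import re
import sys
from pathlib import Path

# Matches a numpydoc parameter line such as "    random_state : int, ...".
PARAM_RE = re.compile(r"^\s*random_state\s*:")


def find_random_state_docs(root):
    """Yield (path, line_numbers) for each .py file under root documenting random_state."""
    root = Path(root).resolve()
    if not root.is_dir():
        return
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        # 1-based line numbers, matching what editors and GitHub display.
        hits = [n for n, line in enumerate(lines, start=1) if PARAM_RE.match(line)]
        if hits:
            # Report paths relative to root's parent, e.g. "sklearn/dummy.py".
            yield path.relative_to(root.parent).as_posix(), hits


def format_entry(path, hits):
    """Format one entry as 'path - n1, n2, ...'."""
    return f"{path} - {', '.join(map(str, hits))}"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "sklearn"
    for rel_path, hits in find_random_state_docs(target):
        print(format_entry(rel_path, hits))
```

Run from the repository root (for example `python find_random_state.py sklearn`, with a hypothetical file name), it prints one line per file in the format used by the list.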