Make random_state descriptions more informative and refer to Glossary #10548

jnothman · 2018-01-29T12:46:31Z

We recently added a Glossary to our documentation, which describes common parameters among other things. We should now replace descriptions of random_state parameters to make them more concise and informative (see #10415). For example, instead of

    random_state : int, RandomState instance or None, optional, default: None
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

in both KMeans and MiniBatchKMeans, we might have:

KMeans:
    random_state : int, RandomState instance, default=None
        Determines random number generation for centroid initialization.
        Pass an int for reproducible results across multiple function calls.
        See :term:`Glossary <random_state>`.


MiniBatchKMeans:
    random_state : int, RandomState instance, default=None
        Determines random number generation for centroid initialization and
        random reassignment.
        Pass an int for reproducible results across multiple function calls.
        See :term:`Glossary <random_state>`.

The description should therefore focus on how random_state affects the algorithm.
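
As a rough illustration of the three cases the old docstring spells out (and which the Glossary now covers), here is a minimal sketch. `resolve_random_state` is a hypothetical helper written for this example, not scikit-learn code. It only mirrors the int / RandomState / None behaviour described above, and it assumes `np.random.mtrand._rand` is the global instance behind `np.random`. scikit-learn provides its own utility for this purpose, `sklearn.utils.check_random_state`.

```python
import numbers

import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper mirroring the three cases in the old docstring."""
    if random_state is None:
        # Assumption: this is the global RandomState instance used by np.random.
        return np.random.mtrand._rand
    if isinstance(random_state, numbers.Integral):
        # An int is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError(f"{random_state!r} cannot be used as a random state")


# Passing the same int twice gives the same draws on both calls.
first = resolve_random_state(0).randint(0, 100, size=5)
second = resolve_random_state(0).randint(0, 100, size=5)
same_seed_matches = bool((first == second).all())

# Passing an instance returns that very instance, so its state keeps advancing.
rng = np.random.RandomState(0)
passthrough = resolve_random_state(rng) is rng
```

This is why the proposed wording says "Pass an int for reproducible results across multiple function calls": an instance or None carries state from one call to the next, while an int restarts the sequence every time.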

Contributors interested in making this change should initially take on one module at a time.

The list of estimators to be modified is the following:

List of files to modify using kwinata script

aby0 · 2018-01-29T13:24:21Z

Hi @jnothman, Can I take this issue? Thanks

jnothman · 2018-01-29T13:36:12Z

Claim a module/subpackage and have a go...

aby0 · 2018-01-29T14:49:16Z

@jnothman I am sorry for being naive, but can you elaborate on the module/submodule? I mean, are you referring to a sub-package like KMeans, for instance?

lesteve · 2018-01-29T17:17:01Z

I think what @jnothman means is just start with one file, for example sklearn/cluster/k_means_.py, update the random_state docstring as in the top post and open a PR.

jnothman · 2018-01-29T21:55:46Z

a subpackage is something like sklearn.cluster

aby0 · 2018-01-30T10:53:15Z

Thanks. Will do that and open a PR.

ghost · 2018-01-30T19:08:57Z

Hi! @jnothman

Would you also like to replace the following comments as seen in grid_search.py? They have an extra line as compared to the one shared by you.

    random_state : int, RandomState instance or None, optional (default=None)
        Pseudo random number generator state used for random uniform sampling
        from lists of possible values instead of scipy.stats distributions.
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

ghost · 2018-01-30T19:26:13Z

I can take grid_search.py and k_means.py (KMeans).

jnothman · 2018-01-30T21:15:39Z

Leave grid_search.py alone; it is deprecated. The idea is to minimise the content that is repeated and already available in the glossary, so that we can give users the most informative description of random_state's role in the particular estimator.

ghost · 2018-01-31T16:01:11Z

Thanks @jnothman. Will I need to understand these algorithms before I can replace this random_state information?

jnothman · 2018-01-31T21:58:16Z

You will need to understand the algorithms broadly, but not every detail of their implementation. You will need to be able to find where random_state is used, if the randomisation in the algorithm is not completely obvious. In some cases, it may be appropriate to not even give much more detail than just linking to the glossary; we'll have to see how it goes.
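
One way to locate the docstrings that still carry the repeated text is to search the source for it. The sketch below is only an illustration: `find_old_descriptions` and the sample string are made up for this example, and it is not the script used to build the file list in the top post.

```python
import re

# First line of the repeated text that the Glossary now covers.
OLD_BOILERPLATE = re.compile(
    r"If int, random_state is the seed used by the random number generator"
)


def find_old_descriptions(source):
    """Return the 1-based line numbers where the old wording starts."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if OLD_BOILERPLATE.search(line)
    ]


# Hypothetical module content used only to exercise the helper.
sample = '''\
def fit(X, random_state=None):
    """Fit.

    random_state : int, RandomState instance or None, optional, default: None
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
    """
'''

hits = find_old_descriptions(sample)
```

The reported line numbers only say where the docstring is. Working out what random_state actually influences still means reading the code that consumes it.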

ghost · 2018-02-03T03:18:28Z

Okay, thank you. I will start going through the algorithms slowly.

Regards,
Shivam Rastogi

ghost · 2018-02-10T12:01:12Z

Since @aby0 has not claimed the sklearn.cluster module yet, I would like to claim the whole module. Please let me know whether I can work on it or should work on something else.

ghost · 2018-02-15T12:00:54Z

Any update guys? It is a long holiday for us so let me know if I can pick this.

richford · 2018-02-28T20:32:31Z

I'll take the datasets module since I'm already poking around in the docstrings there for #10731.

SirR4T · 2018-08-23T06:57:03Z

I'm claiming the linear_model module. ~~will raise a PR soon.~~ #11900 raised.

Claiming decomposition module next.

jnothman · 2018-08-23T10:23:26Z

We had some trouble reaching consensus on how to strike the right balance here, iirc

jnothman · 2018-08-23T10:24:05Z

So do pay attention to the prior PRs merged above

SirR4T · 2018-08-23T10:37:43Z

@jnothman thanks! Will update the PRs to mention the reproducibility when passing an int.

DatenBiene · 2020-01-29T10:53:00Z

Claim sklearn/ensemble/_weight_boosting.py - 188, 324, 479, 900, 1022

DatenBiene · 2020-01-29T15:24:35Z

Claim sklearn/multioutput.py - 578, 738

DatenBiene · 2020-01-30T09:25:50Z

Claim :
sklearn/mixture/_bayesian_mixture.py - 166
sklearn/mixture/_base.py - 139
sklearn/mixture/_gaussian_mixture.py - 504

DatenBiene · 2020-01-30T13:00:20Z

Claim sklearn/ensemble/_gb.py - 887, 1360

DatenBiene · 2020-01-30T13:13:53Z

Claim sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py - 736, 918

GregoireMialon · 2020-01-30T13:14:53Z

Claim sklearn/neural_network/_rbm.py - 59

DatenBiene · 2020-01-30T14:00:50Z

Claim :

sklearn/svm/_classes.py - 90, 312, 546, 752
sklearn/svm/_base.py - 853

DatenBiene · 2020-01-30T14:31:05Z

Claim:

sklearn/feature_selection/_mutual_info.py - 226, 335, 414
sklearn/metrics/cluster/_unsupervised.py - 80
sklearn/utils/_testing.py - 521
sklearn/utils/__init__.py - 478, 623

DatenBiene · 2020-01-31T13:52:43Z

Claim :

sklearn/dummy.py - 59
sklearn/random_projection.py - 178, 245, 464, 586

jeremiedbb · 2020-02-12T14:53:28Z

@DatenBiene @GregoireMialon Thanks for all your contributions during last sprint. There are only 3 modules left unchecked !

Would you be interested / have time / have motivation to tackle those (no pressure !) ?

GregoireMialon · 2020-02-13T13:05:36Z

Hi Jérémie! I'll try to have a look at it soon.

DatenBiene · 2020-04-04T07:44:37Z

Hi @jeremiedbb! I will try to finish the 3 remaining modules today 😃

Claim:

sklearn/kernel_approximation.py - 41, 143, 470
sklearn/multiclass.py - 687
sklearn/ensemble/_base.py - 52

DatenBiene · 2020-04-16T09:24:34Z

Hi @jnothman and @jeremiedbb, it looks like all the files were modified. I would be happy to help if you find any remaining issues.

cmarmo · 2020-04-16T11:01:00Z

Thanks a lot @DatenBiene and all the contributors that worked to close this issue!
I think we can close this huge one!
Feel free to open new specific issues if something is still missing about random_state description.

jnothman added the Documentation, good first issue, and help wanted labels Jan 29, 2018

ghost mentioned this issue Feb 21, 2018

[MRG] Make random_state descriptions more informative and refer to Glossary for clusters module #10548 #10614

Merged

richford mentioned this issue Feb 28, 2018

[MRG+1] Reference glossary in random_state docstring entries in datasets module #10732

Merged

glemaitre added the Sprint label Apr 20, 2018

TomDLT pushed a commit that referenced this issue May 4, 2018

[MRG] Make random_state descriptions more informative and refer to Glossary for clusters module #10548 (#10614)

aca956b

SirR4T added a commit to SirR4T/scikit-learn that referenced this issue Aug 23, 2018

[MRG] Fix random_state docstrings to refer Glossary

7b05b0f

For `linear_model` module. Working towards scikit-learn#10548.

SirR4T mentioned this issue Aug 23, 2018

DOC improve random_state docstrings in the linear_model module #11900

Merged

DatenBiene mentioned this issue Jan 29, 2020

[DOC] Make random_state descriptions for AdaBoost #16278

Merged

DatenBiene mentioned this issue Jan 29, 2020

[DOC] Make random_state descriptions for ClassifierChain and RegressorChain #16291

Merged

DatenBiene mentioned this issue Jan 30, 2020

[DOC] Make random_state descriptions for Mixture Models #16307

Merged

DatenBiene mentioned this issue Jan 30, 2020

DOC Improve random_state descriptions for GradientBoosting #16314

Merged

DatenBiene mentioned this issue Jan 30, 2020

[DOC] Make random_state descriptions for Hist GradientBoosting #16315

Merged

DatenBiene mentioned this issue Jan 30, 2020

[DOC] Update random_state descriptions for SVMs #16316

Merged

GregoireMialon mentioned this issue Jan 30, 2020

DOC more informative description of random state in _rbm.py #16318

Merged

DatenBiene mentioned this issue Jan 30, 2020

[DOC] Update random_state descriptions for mutual_info, unsupervised, .… (4) #16320

Merged

DatenBiene mentioned this issue Jan 31, 2020

DOC Update random_state entry for dummy / random_projection #16347

Merged

This was referenced Apr 4, 2020

[DOC] Update random_state descriptions for Kernel Approximation #16838

Merged

DOC Update random_state description for Multiclass #16839

Merged

lorentzenchr mentioned this issue Apr 5, 2020

[MRG+3] FEA Add PolynomialCountSketch to Kernel Approximation module #13003

Merged

3 tasks

DatenBiene mentioned this issue Apr 5, 2020

DOC Update random_state descriptions for ensemble/_base #16847

Merged

cmarmo closed this as completed Apr 16, 2020

cmarmo removed the help wanted label Apr 16, 2020

thomasjpfan mentioned this issue Jan 21, 2022

DOC ensure that meaning if random_state = None is specified in spectral embedding #21427

Closed
