RFC: referring to Glossary to make parameter descriptions more focussed #10415
Comments
IMO, I like On the side, I recall a discussion with @lesteve IRL, who pointed out to me that developers read docstrings directly from the terminal, where hyperlinks could make it difficult to find the detailed documentation. However, as a bottom line, I would go for the less verbose version, since I personally think that newcomers are more prone to use the online (HTML) documentation and that |
Fine with me, I guess people will need to learn that |
We could use "see the glossary" |
The issue that I see with this is that many users look at the docs via jupyter notebooks and I don't think the links will work there, right? |
I suppose not... Hmm. So what do you think is the minimum detail to leave in? Keep it all in and also refer? |
Dear core-devs, after discussion with @glemaitre, this issue is indeed a good candidate for sprints. It has been split into more specific issues. I'm wondering if we can close this one, since it is an RFC on which consensus has apparently been reached, in favour of #10548 and #14228, for which I'm trying to summarize the list of modules that still need an update. Also, I'm checking the |
I don't think #14228 is so different, and yes I'm happy to close this as the work is covered by those other issues mostly. But to be sure, solving these issues for an estimator requires understanding how that estimator works, and how to investigate where the randomisation/parallelisation is used. It's quite a challenging issue for a newcomer (but not inherently requiring a lot of prior scikit-learn knowledge). Yes, happy to see memory refer to Glossary, but it's much less frequent and much less ambiguous as to what it's used for. |
The conclusion of these discussions was that we avoid duplicating docstrings by referring to the glossary. One approach that we took in kartothek was to introduce decorators on top of each function that auto-fill the remaining parts of the docs. For example:
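A minimal sketch of that decorator idea, using only the standard library. This is not kartothek's actual code and not a scikit-learn API: `SHARED_PARAM_DOCS`, `fill_shared_params`, `toy_kmeans` and the `${...}` placeholder syntax are all assumptions made for illustration.

```python
import inspect
from string import Template

# Hypothetical shared descriptions; the wording mirrors the short form
# proposed in this issue, not any existing scikit-learn docstring.
SHARED_PARAM_DOCS = {
    "random_state": (
        "random_state : int, RandomState instance or None, default=None\n"
        "    See :term:`random_state`."
    ),
    "n_jobs": (
        "n_jobs : int or None, default=None\n"
        "    See :term:`n_jobs`."
    ),
}


def fill_shared_params(func):
    """Replace ``${name}`` placeholders in a docstring with shared text."""
    if func.__doc__:
        # Dedent first so multi-line replacements keep their own indentation.
        doc = inspect.cleandoc(func.__doc__)
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        func.__doc__ = Template(doc).safe_substitute(SHARED_PARAM_DOCS)
    return func


@fill_shared_params
def toy_kmeans(X, n_clusters=8, random_state=None, n_jobs=None):
    """Cluster X (toy stand-in that does no real work).

    Parameters
    ----------
    X : list
        Input data.
    n_clusters : int, default=8
        Number of clusters.
    ${random_state}
    ${n_jobs}
    """
    return None


print(toy_kmeans.__doc__)
```

One trade-off of this design is that the source file no longer shows the full description next to the parameter, so contributors have to look up the shared text elsewhere.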
If maintainers like this idea, I could propose a draft. |
I would like us to refer to the Glossary in the API reference for parameter descriptions that come up frequently, or which have associated caveats that are too long for parameter descriptions, most notably n_jobs and random_state. So instead of something like:
in both KMeans and MiniBatchKMeans, we might have:
One question is how much verbosity we should have in describing how the user may parametrise random_state. We could have just
See :term:`random_state`.
or we could have
An int seeds the random number generator deterministically, while None uses the current np.random state. See :term:`random_state`.
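To make the two options concrete for readers who view docstrings in a terminal or notebook, here is a small sketch of what `help()` or `inspect.getdoc` returns in each case. The functions `toy_fit_terse` and `toy_fit_verbose` are hypothetical and do nothing; the point is only that the `:term:` role arrives as literal text rather than as a link outside the rendered HTML docs.

```python
import inspect


def toy_fit_terse(X, random_state=None):
    """Fit a toy model (hypothetical example, does nothing).

    Parameters
    ----------
    random_state : int, RandomState instance or None, default=None
        See :term:`random_state`.
    """


def toy_fit_verbose(X, random_state=None):
    """Fit a toy model (hypothetical example, does nothing).

    Parameters
    ----------
    random_state : int, RandomState instance or None, default=None
        An int seeds the random number generator deterministically, while
        None uses the current np.random state. See :term:`random_state`.
    """


# inspect.getdoc returns the dedented raw text, which is what a terminal
# user sees: the Sphinx role is not resolved to a hyperlink here.
terse_doc = inspect.getdoc(toy_fit_terse)
verbose_doc = inspect.getdoc(toy_fit_verbose)

print(terse_doc)
print("-" * 40)
print(verbose_doc)
```

With the terse form, a terminal reader learns nothing about accepted values without opening the Glossary; the verbose form keeps one sentence of guidance at the cost of repeating it in every estimator.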
Just as I see us trying to describe what is random about the algorithm when describing random_state, I would like to see n_jobs stating whether parallelism is only in fit, or in fit and predict, and what backend is used by default.
What do others think?