-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Fix #9743: Adding parameter information to docstring. #9757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks. You should use the formal Parameters section like everywhere else in the codebase. |
@jnothman Ah, I see. I think I now have it formatted properly. |
Splits the data into training/test set folds according to a predefined | ||
scheme. Each sample can be assigned to at most one test set fold, as | ||
specified by the user through the ``test_fold`` parameter. | ||
Provides train/test indices to split data into train/test sets using a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "provides" is a poor word choice here given that in this case the user provides the indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a bit tricky. The user provides via test_fold
what I want to call a "test-fold assignment" or "test-fold membership." This implicitly defines a collection of arrays which represent the indices of the data points split into each (train-fold, test-fold) pair. (There are as many folds as there are unique values (different from -1) in test_fold
.) The explicit representation of the train/test folds in the form of the collection of arrays is being "provided" by the function.
A quick way fix without introducing this colloquial term I've used above to (hopefully) help distinguish the implicit vs. explicit ways of representing a collection of test/train folds is to use "computes" insted of "provides"? Or do nothing? Another possibility is making the example in the documentation a bit larger to possibly clear up confusion, but it may be overkill. I'm open to suggestions as my only rationale to previously use "provides" is the docstring for KFold
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
okay. I can accept it as is
…On 15 Sep 2017 12:25 am, "Kye Taylor" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/model_selection/_split.py
<#9757 (comment)>
:
> @@ -1706,12 +1706,19 @@ def _validate_shuffle_split(n_samples, test_size, train_size):
class PredefinedSplit(BaseCrossValidator):
"""Predefined split cross-validator
- Splits the data into training/test set folds according to a predefined
- scheme. Each sample can be assigned to at most one test set fold, as
- specified by the user through the ``test_fold`` parameter.
+ Provides train/test indices to split data into train/test sets using a
It's subtle, but the user doesn't provide the indices that make up each
test/train fold. Rather, the user provides what I want to call a "fold
assignment" that is converted to said indices. So, instead of "provides" I
could put "converts" or "computes"... I'm open to suggestions as I
previously decided to use "provides" because it models the docstring for
KFold.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9757 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6y3kwKW7PbvyY-zHqbOjDOQXVNFEks5siTdlgaJpZM4PWz9->
.
|
I don't think this needs a second LGTM |
Thanks @taylorkm |
Cool! Thanks @jnothman |
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
* tag '0.19.1': (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* releases: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* dfsg: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
…ing. (scikit-learn#9757) * Adding parameter information to docstring. * Removing trailing whitespace from lines. * Adding details of parameter to formal Parameters section. * Shortened lines to meet requirements.
Reference Issue
Example: Fixes #9743
What does this implement/fix? Explain your changes.
Using docstring for
sklearn.model_selection.KFold
as a model, I added a description of how thetest_fold
parameter can be used.Any other comments?
There are no checks to see if
test_fold
parameter is valid, and it is possible that it has more/less entries than the number of samples.