[MRG] Standardize sample weights validation in DummyClassifier #15510
Conversation
Looks good pending CI evaluation.
```diff
@@ -141,7 +142,10 @@ def fit(self, X, y, sample_weight=None):

         self.n_outputs_ = y.shape[1]

-        check_consistent_length(X, y, sample_weight)
+        check_consistent_length(X, y)
```
I think we might still want it here, not strong opinion, though...
even though it's checked as part of the added validation?
+1 to avoid redundant checks (even if it doesn't cost much). `_check_sample_weight` should yield better error messages.
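For context, a minimal sketch of the helper's behavior, assuming scikit-learn >= 0.22 where `_check_sample_weight` lives in `sklearn.utils.validation` (illustration only, not part of this diff):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.zeros((3, 2))  # three samples

# None is expanded to unit weights, one per sample
print(_check_sample_weight(None, X))  # [1. 1. 1.]

# a length mismatch raises a single, informative ValueError
try:
    _check_sample_weight(np.ones(4), X)
except ValueError as exc:
    print(exc)  # e.g. "sample_weight.shape == (4,), expected (3,)!"
```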
Minor comment below, otherwise LGTM. Thanks @fbchow!
sklearn/tests/test_dummy.py (outdated):

```python
clf = DummyClassifier().fit(X, y, sample_weight)
assert_array_almost_equal(clf.class_prior_, [0.2 / 1.2, 1. / 1.2])

sample_weight = np.random.rand(3, 1)
```
For anything below this line, I'm not sure there is much sense in adding sample_weight tests for each estimator (unless they have some special behavior there), as they would be quite redundant; instead they should be enforced by the common tests in sklearn/utils/estimator_checks.py (and for a number of the points below they may already be). So I would remove the test below. The check specific to DummyClassifier above is quite nice though.
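For reference, the DummyClassifier-specific check reads roughly like this; X, y, and sample_weight are assumed values chosen only so the class-wise weight sums of 0.2 and 1.0 reproduce the asserted prior, not the actual test data:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.dummy import DummyClassifier

X = [[0], [1], [2]]              # assumed inputs
y = [0, 1, 1]
sample_weight = [0.2, 0.4, 0.6]  # class 0 sums to 0.2, class 1 to 1.0

clf = DummyClassifier().fit(X, y, sample_weight)
# class_prior_ holds the weighted class frequencies
assert_array_almost_equal(clf.class_prior_, [0.2 / 1.2, 1.0 / 1.2])
```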
Thanks for the review!
We removed the test below.
Something went wrong with the merge of master here. The diff should only show the changed code...
(cherry picked from commit 95929b9)
I'm Fanny's pair from the workshop. I'm not sure what happened with the merge from master. I have just branched off of master again and cherry-picked the 2 commits for ease of fixing everything. I can either push to a new branch and create a new PR or force push my current branch to overwrite the changes. Do the maintainers have a preference?
Co-authored-by: Sallie Walecka <[email protected]> (cherry picked from commit e6bced8)
In this case I think force-pushing is fine if all the comments are addressed.
Force-pushed from dfa0c46 to caa7871.
Looks like CI is failing due to some warning coming from the test. However, the test is identical to the test case above it (apart from stratified sampling), so I just ended up deleting it. Pushing changes now.
@amueller everything should be good now.
```python
check_consistent_length(X, y)

if sample_weight is not None:
    sample_weight = _check_sample_weight(sample_weight, X)
```
Actually it looks like several PRs were done for this estimator as this was added on the line below in #15505. Please remove the above 3 lines.
Otherwise LGTM.
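For clarity, a sketch of what the surviving validation amounts to once the redundant lines are removed; the `_validate` wrapper is hypothetical, used only to make the fragment self-contained:

```python
from sklearn.utils.validation import _check_sample_weight, check_consistent_length

def _validate(X, y, sample_weight=None):
    # sketch of the validation left in DummyClassifier.fit (not verbatim source)
    check_consistent_length(X, y)
    # _check_sample_weight returns unit weights when given None,
    # so no `if sample_weight is not None` guard is needed
    return _check_sample_weight(sample_weight, X)
```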
Is it worth merging at this point? Looks like the only changes left would be one linting issue and removing sample weight from check_consistent_length. Does it make more sense to close the PR instead?
@cmarmo wondering if there still needs to be work done on this one?
Merging, thanks for contributing! This indeed leaves only minor changes, and most were merged in another PR.
(scikit-learn#15510) Co-authored-by: Sallie Walecka <[email protected]>
Reference Issues/PRs

Partially addresses #15358 for DummyClassifier.

What does this implement/fix? Explain your changes.

Replaces custom validation logic with the standardized method `utils.validation._check_sample_weight` (relatively newly introduced).

Any other comments?
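To illustrate the pattern being standardized, a hedged before/after sketch; the `fit_before` body is an assumed example of ad-hoc handling, not the exact lines this PR removed:

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight, check_consistent_length

def fit_before(X, y, sample_weight=None):
    # ad-hoc validation, duplicated across estimators (assumed example)
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight)
        check_consistent_length(X, y, sample_weight)
    ...

def fit_after(X, y, sample_weight=None):
    # one shared helper validates shape and dtype, and fills in
    # unit weights when sample_weight is None
    check_consistent_length(X, y)
    sample_weight = _check_sample_weight(sample_weight, X)
    ...
```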