
[MRG+1] Check estimator pairwise #9701


Merged: 71 commits merged into scikit-learn:master on Nov 13, 2017

Conversation

@GKjohns (Contributor) commented Sep 7, 2017

Reference Issue

Fixes issue #9580.

What does this implement/fix? Explain your changes.

This allows check_estimator() to work on estimators that have the _pairwise attribute set to True. test_check_estimator_pairwise() verifies this by calling it on an SVC with a precomputed kernel.

I created a function that checks for the _pairwise attribute on the estimator being tested and builds a precomputed Gram matrix if the estimator accepts pairwise input. In all of the applicable estimator checks, I wrap the data X in this function.
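A minimal sketch of the kind of helper described above, assuming linear_kernel from sklearn.metrics.pairwise as the default kernel (the helper added in this PR is gram_matrix_if_pairwise, later renamed pairwise_estimator_convert_X):

from sklearn.metrics.pairwise import linear_kernel


def gram_matrix_if_pairwise(X, estimator, kernel=linear_kernel):
    # If the estimator expects precomputed pairwise input, replace the raw
    # feature matrix with a Gram matrix; otherwise pass X through unchanged.
    if getattr(estimator, "_pairwise", False):
        return kernel(X, X)
    return X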

Any other comments?

Some of the checks either a) can't accept precomputed kernels or b) are set to fail in cases that don't apply to precomputed kernels. In those cases I skipped the test.

@jnothman (Member) left a comment:

I've not checked how complete this is, but I like the direction it's heading in!

sklearn/base.py Outdated


def is_pairwise(estimator):
"""Returns True if the given estimator has a _pairwise attribute
Member:

PEP 257: the summary should be one line only. More description can follow after a blank line.

sklearn/base.py Outdated
"""Returns True if the given estimator has a _pairwise attribute
set to True.


Member:

One blank line only please

def check_estimator_sparse_data(name, estimator_orig):

    # Sparse precomputed kernels aren't supported
    if getattr(estimator_orig, 'kernel', None) == 'precomputed':
Member:

Shouldn't you use is_pairwise here?

Actually, I think we should be testing that an appropriate error is raised in this case.

@@ -1194,6 +1223,7 @@ def check_estimators_fit_returns_self(name, estimator_orig):
X, y = make_blobs(random_state=0, n_samples=9, n_features=4)
# some want non-negative input
X -= X.min()
X = gram_matrix_if_pairwise(X, estimator_orig)
Member:

Hmm. Sometimes pairwise is for affinities, sometimes for distances. Estimators requiring distances may not play nicely with affinities and vice-versa. This may not be something we need to deal with now, but @amueller should probably consider an estimator tag which selects between distances and affinities when pairwise. Or we can use the presence of a metric parameter as a heuristic (for now)
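A possible sketch of that heuristic (illustrative only; the helper name _uses_distances is hypothetical and not part of this PR):

def _uses_distances(estimator):
    # Heuristic suggested above: a pairwise estimator that exposes a `metric`
    # parameter is assumed to expect distances rather than kernels/affinities.
    return getattr(estimator, "_pairwise", False) and hasattr(estimator, "metric")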

sklearn/base.py Outdated
    out : bool
        True if _pairwise is set to True and False otherwise.
    """
    return getattr(estimator, "_pairwise", False)
Member:

perhaps wrap this in bool, just to be sure?
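Putting the two suggestions together (one-line PEP 257 summary, bool-wrapped return), a hedged sketch of the helper might look like:

def is_pairwise(estimator):
    """Return True if the estimator's _pairwise attribute is set to True.

    Parameters
    ----------
    estimator : object
        Estimator object to check.

    Returns
    -------
    out : bool
        True if _pairwise is set to True, False otherwise.
    """
    return bool(getattr(estimator, "_pairwise", False))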


X_train = gram_matrix_if_pairwise(X_train, classifier_orig,
                                  kernel=rbf_kernel)
X_test = gram_matrix_if_pairwise(X_test, classifier_orig,
Member:

This surely can't work. The test data needs to be the kernel applied on X_test with respect to the training.

Member:

It's a wonder that tests are passing
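A minimal sketch of the point being made, assuming rbf_kernel from sklearn.metrics.pairwise: the test Gram matrix has to be computed between X_test and X_train, not between X_test and itself.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train, X_test = rng.rand(20, 4), rng.rand(5, 4)
y_train = rng.randint(0, 2, 20)

K_train = rbf_kernel(X_train, X_train)  # shape (20, 20)
K_test = rbf_kernel(X_test, X_train)    # shape (5, 20): test samples vs. training samples

clf = SVC(kernel='precomputed').fit(K_train, y_train)
clf.predict(K_test)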

@@ -251,3 +252,9 @@ def __init__(self):
check_no_fit_attributes_set_in_init,
'estimator_name',
NonConformantEstimator)

Member:

PEP8: extra blank line required

@@ -1795,3 +1834,8 @@ def check_decision_proba_consistency(name, estimator_orig):
a = estimator.predict_proba(X_test)[:, 1]
b = estimator.decision_function(X_test)
assert_array_equal(rankdata(a), rankdata(b))


def check_pairwise_estimator():
Member:

remove this please

@codecov (bot) commented Sep 18, 2017

Codecov Report

Merging #9701 into master will decrease coverage by <.01%.
The diff coverage is 94.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9701      +/-   ##
==========================================
- Coverage   96.19%   96.19%   -0.01%     
==========================================
  Files         336      336              
  Lines       62739    62781      +42     
==========================================
+ Hits        60353    60392      +39     
- Misses       2386     2389       +3
Impacted Files Coverage Δ
sklearn/utils/tests/test_estimator_checks.py 96.83% <100%> (+0.14%) ⬆️
sklearn/neighbors/regression.py 100% <100%> (ø) ⬆️
sklearn/neighbors/tests/test_neighbors.py 99.43% <66.66%> (-0.15%) ⬇️
sklearn/utils/estimator_checks.py 93.21% <94.82%> (-0.1%) ⬇️
sklearn/ensemble/gradient_boosting.py 95.76% <0%> (-0.45%) ⬇️
sklearn/decomposition/pca.py 95.04% <0%> (-0.15%) ⬇️
sklearn/ensemble/tests/test_gradient_boosting.py 96.27% <0%> (-0.04%) ⬇️
sklearn/linear_model/stochastic_gradient.py 98.17% <0%> (ø) ⬆️
sklearn/feature_selection/base.py 94.79% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abb43c1...3116c23.

@jnothman (Member)

Only flake8 is failing now ...

@@ -404,6 +419,7 @@ def check_sample_weights_pandas_series(name, estimator_orig):
try:
    import pandas as pd
    X = pd.DataFrame([[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3]])
    X = gram_matrix_if_pairwise(X, estimator_orig)
Member:

X output here will still not be a DataFrame, will it? There's not much point in doing this unless we make the gram matrix a DataFrame, which we might as well do even if it's a bit of a weird use of a DataFrame
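A hedged sketch of that suggestion, assuming linear_kernel as the kernel: wrap the precomputed Gram matrix back into a DataFrame so the check still exercises pandas input.

import pandas as pd
from sklearn.metrics.pairwise import linear_kernel

X = pd.DataFrame([[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3]])
# keep the pandas container: pairwise estimators then see a DataFrame-wrapped Gram matrix
X = pd.DataFrame(linear_kernel(X.values, X.values))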

@@ -1795,3 +1835,4 @@ def check_decision_proba_consistency(name, estimator_orig):
a = estimator.predict_proba(X_test)[:, 1]
b = estimator.decision_function(X_test)
assert_array_equal(rankdata(a), rankdata(b))

Member:

Why the extra blank line?

# check that check_estimator() works on estimator with _pairwise
# attribute set
est = SVC(kernel='precomputed')
check_estimator(est)
Member:

Can we do this for an estimator based on a metric as well as a kernel? It's very possible that doing so will break.

@GKjohns (Contributor Author):

Got it. Can you give a quick example of an estimator based on a metric as well as a kernel?

Member:

KNeighborsRegressor or AgglomerativeClustering
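A hedged sketch of such a test, mirroring the SVC case above but with a distance-based pairwise estimator:

from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils.estimator_checks import check_estimator

# distance-based pairwise estimator, as opposed to the kernel-based
# SVC(kernel='precomputed') tested above
est = KNeighborsRegressor(metric='precomputed')
check_estimator(est)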

check_estimator(est)


def test_check_estimator_metric_and_kernel():
Member:

There's no kernel here. But you also need metric=precomputed for this to pertain.

Perhaps to make sure these tests are doing what they're meant to, you should assert that the estimator is pairwise
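For example, a minimal sketch using the est defined in the test above:

# sanity check that the estimator under test really is pairwise
assert est._pairwise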

@GKjohns (Contributor Author) commented Oct 30, 2017

@amueller fixed, knn now works for sparse X where metric != 'precomputed'.

@amueller (Member)

thanks, lgtm. @jnothman, still good?

@jnothman (Member) left a comment:

Otherwise looks good.

Could you please add a new heading in the changelog called "Changes to estimator checks" and note this change there. I'll add a blurb there eventually.

"different from the number of features"
" in fit.".format(name)):
classifier.decision_function(X.T)
if not _is_pairwise(classifier):
Member:

Remind me why we don't have the decision_function pairwise case?

Member:

Or predict_proba

@GKjohns (Contributor Author):

Looking at it now. I think my initial reaction was that transposing the pairwise matrix won't raise an error. I'll get it up and running 👍🏽
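A small sketch of that reasoning: a precomputed Gram matrix is square, so transposing it cannot change the apparent number of features.

import numpy as np
from sklearn.metrics.pairwise import linear_kernel

X = np.random.RandomState(0).rand(9, 4)
K = linear_kernel(X, X)       # shape (9, 9)
# K.T has the same shape as K, so a "wrong number of features" error never triggers
assert K.T.shape == K.shape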

@jnothman (Member) commented Nov 1, 2017

I'm happy to merge once you've updated the changelog in doc/whats_new/v0.20.rst

@jnothman (Member) left a comment:

Sorry, my last look has uncovered a few minor things.

Changes to estimator checks
---------------------------

- Pairwise Estimators
Member:

Please include a full description of what you're now checking, with reference to the PR and attribution, just like the other changelog entries. Thanks


- Allow tests in :func:`estimator_checks.check_estimator` to test functions
that accept pairwise data.
:issue:`9701` by :user:`Andreas Mueller <amueller>`
Member:

This conventionally mentions the contributor of the fix, not the person who raised the issue.

@@ -139,6 +140,11 @@ def predict(self, X):
y : array of int, shape = [n_samples] or [n_samples, n_outputs]
    Target values
"""
if issparse(X) and self.metric == 'precomputed':
    raise ValueError(
        "Sparse matricies not supported for prediction with "
Member:

matricies -> matrices

assert_true(np.mean(knn.predict(X2).round() == y) > 0.95)
# sparse precomputed distance matrices not supported for prediction
if knn.metric == 'precomputed':
    assert_raises(ValueError, knn.predict, csr_matrix(X2))
Member:

this is never actually run it seems...

def pairwise_estimator_convert_X(X, estimator, kernel=linear_kernel):

    if len(X.shape) == 1:
        X = X.reshape(-1, 1)
Member:

when is this needed? It seems the line is not currently covered by tests.

@GKjohns (Contributor Author):

I added it in case X is, for some reason, a 1-D array. I'll remove it.

@jnothman (Member) left a comment:

It may still be a good idea to assert that predicting on a sparse precomputed matrix in knn raises a ValueError, but the test where you put that assertion didn't run it with metric=precomputed.
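A hedged sketch of a test that would exercise that error path directly (assuming the ValueError added in this PR for sparse precomputed input to predict):

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils.testing import assert_raises

rng = np.random.RandomState(0)
X, y = rng.rand(10, 3), rng.rand(10)
D = euclidean_distances(X)  # precomputed (dense) distance matrix

knn = KNeighborsRegressor(n_neighbors=3, metric='precomputed').fit(D, y)
# predicting on a *sparse* precomputed matrix should raise ValueError
assert_raises(ValueError, knn.predict, csr_matrix(D))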

@jnothman (Member) commented Nov 9, 2017

Happy to merge when green

@GKjohns (Contributor Author) commented Nov 13, 2017

@jnothman is there anything else I should do before you merge?

@jnothman (Member)

I'd sort of expected someone in a different timezone would hit the green button!

Thanks for the ping, and for your work!

@jnothman jnothman merged commit de0581a into scikit-learn:master Nov 13, 2017
@GKjohns GKjohns deleted the check_estimator_pairwise branch November 13, 2017 23:19
@amueller (Member)

Congrats @GKjohns :)

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017