[MRG+1] Raise error when SparseSeries is passed into classification metrics #7373

nielsenmarkus11 · 2016-09-08T23:34:20Z

Reference Issue

What does this implement/fix? Explain your changes.

This change raises an error when either the y_true or y_score input is a pandas SparseSeries.

Any other comments?

maniteja123 · 2016-09-09T06:25:45Z

sklearn/metrics/ranking.py

@@ -22,6 +22,7 @@
 import warnings
 import numpy as np
 from scipy.sparse import csr_matrix
+import pandas


Hi, thanks for the PR. But AFAIR, the general convention is to import pandas inside a try/except block (like here). I suppose this is necessary since pandas is not a required dependency of scikit-learn, and this import would raise an error if pandas isn't installed. Other than that, LGTM.

Actually, you cannot import pandas AT ALL. What you refer to is a test, and you can't really do the same thing here. You can check the class name, though, if you like.

@amueller sorry for my ignorance. Will keep it in mind from next time.

@maniteja123 don't sweat it, any help is appreciated :)

Okay. I will be removing this line. Thanks.

jnothman · 2016-09-12T12:56:05Z

This shouldn't just be in roc_* but should fix all metrics by putting the check in type_of_target. Needs tests.

nielsenmarkus11 · 2016-09-12T17:07:28Z

@jnothman, just for clarity, since roc_* doesn't check using the type_of_target would you still recommend adding the type_of_target call to the function within roc_* or to _binary_clf_curve?

Thanks


nielsenmarkus11 · 2016-09-12T18:45:35Z

I've removed the import pandas code and incorporated a check within the type_of_target code. Tested both in Python 3.5.2 and 2.7.12. Are there any additional tests that I need to do?

amueller · 2016-09-12T18:50:15Z

you should add a test. I'm not sure if we have a mock for a sparse series but there are several pandas mocks that you could use.

nielsenmarkus11 · 2016-09-12T19:00:10Z

Just to make sure I understand correctly... I should add a new test to this code in sklearn/utils/tests/test_multiclass.py, correct?

def test_type_of_target():
    for group, group_examples in iteritems(EXAMPLES):
        for example in group_examples:
            assert_equal(type_of_target(example), group,
                         msg=('type_of_target(%r) should be %r, got %r'
                              % (example, group, type_of_target(example))))

    for example in NON_ARRAY_LIKE_EXAMPLES:
        msg_regex = 'Expected array-like \(array or non-string sequence\).*'
        assert_raises_regex(ValueError, msg_regex, type_of_target, example)

    for example in MULTILABEL_SEQUENCES:
        msg = ('You appear to be using a legacy multi-label data '
               'representation. Sequence of sequences are no longer supported;'
               ' use a binary array or sparse matrix instead.')
        assert_raises_regex(ValueError, msg, type_of_target, example)

amueller · 2016-09-12T20:27:11Z

yeah that sounds like a reasonable place

nielsenmarkus11 · 2016-09-12T22:40:32Z

sklearn/metrics/ranking.py

@@ -294,6 +294,11 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
    thresholds : array, shape = [n_thresholds]
        Decreasing score values.
    """
+    # Check to make sure y_true is valid
+    y_type = type_of_target(y_true)
+    if y_type != "binary":


Also, in some of the other testing I'm getting the error: ValueError: multiclass format is not supported. I was under the impression that the _binary_clf_curve required 'binary' data. Should it also be allowed to accept 'multiclass' data?
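For context on the question above: a multiclass y_true becomes a well-defined binary problem once a positive label is chosen, because every sample can be compared against that label. The following is a minimal pure-Python sketch of that one-vs-rest step, not the scikit-learn implementation; `binarize_against` is a hypothetical helper introduced only for illustration.

```python
def binarize_against(y_true, pos_label):
    """Return 1 where the label equals pos_label, else 0 (hypothetical helper)."""
    return [1 if label == pos_label else 0 for label in y_true]


# Three classes collapse to "class 2 vs. the rest" once pos_label is given.
y_multiclass = [0, 1, 2, 2, 1, 0]
y_binary = binarize_against(y_multiclass, pos_label=2)
print(y_binary)
```

Without a pos_label there is no single obvious way to do this reduction, which is why a multiclass target with pos_label=None is the case that gets rejected later in this thread.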

nielsenmarkus11 · 2016-09-12T22:43:14Z

sklearn/utils/tests/test_multiclass.py

+        from pandas import SparseSeries
+    except ImportError:
+        pass
+    y = SparseSeries([1, 0, 0, 1, 0])


So I'm seeing this error in the automatic checks: UnboundLocalError: local variable 'SparseSeries' referenced before assignment. Do I need to put all of the test code within the try block?
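The error above can be reproduced without pandas. An import statement inside a function makes the imported name a local variable, so when the import fails and the exception is swallowed with pass, the name is never bound. This is a sketch under that assumption; the module name is deliberately fake and stands in for a missing pandas.

```python
def broken_test():
    try:
        # Stand-in for `from pandas import SparseSeries` on a machine
        # without pandas; this module name is deliberately fake.
        from module_that_is_not_installed import SparseSeries
    except ImportError:
        pass  # the import failed, so SparseSeries was never bound
    # SparseSeries is local to this function (the import would have
    # assigned it), so reading it here raises UnboundLocalError.
    return SparseSeries([1, 0, 0, 1, 0])


caught = None
try:
    broken_test()
except UnboundLocalError as exc:
    caught = type(exc).__name__
print(caught)
```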


nielsenmarkus11 · 2016-09-13T15:42:25Z

Okay. I think it should be good now. I understand the supposed issue with test_precision_recall_curve_pos_label and put this back in. Perhaps the documentation can be updated to include the pos_label exception for the y_true input. I've added the test back in as well as a test_binary_clf_curve function.

jnothman

Otherwise, this LGTM.

jnothman · 2017-05-28T14:35:40Z

sklearn/utils/tests/test_multiclass.py

+        msg = "y cannot be class 'SparseSeries'."
+        assert_raises_regex(ValueError, msg, type_of_target, y)
+    except ImportError:
+        pass


Please wrap only the import statement and use raise SkipTest("Pandas not found") as elsewhere
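A minimal sketch of the requested pattern, assuming unittest.SkipTest as the skip mechanism: only the import sits inside the try block, and a failed import skips the test instead of silently passing. The module name is a deliberately fake stand-in for an optional dependency, and the function name is hypothetical.

```python
from unittest import SkipTest


def check_needs_optional_dependency():
    try:
        # Only the import is guarded. The module name is a deliberately
        # fake stand-in for an optional dependency such as pandas.
        from module_that_is_not_installed import SparseSeries
    except ImportError:
        raise SkipTest("Pandas not found")

    # Reached only when the import succeeded, so the name is always bound.
    y = SparseSeries([1, 0, 0, 1, 0])
    assert y is not None


try:
    check_needs_optional_dependency()
    outcome = "ran"
except SkipTest as exc:
    outcome = "skipped: {}".format(exc)
print(outcome)
```

Keeping the assertions outside the try block also means a genuine ImportError raised by the code under test is not mistaken for a missing dependency.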

nielsenmarkus11 · 2017-05-30T15:56:19Z

sklearn/utils/tests/test_multiclass.py

+
+    y = SparseSeries([1, 0, 0, 1, 0])
+    msg = "y cannot be class 'SparseSeries'."
+    assert_raises_regex(ValueError, msg, type_of_target, y)



Per @jnothman 's request, I've only wrapped the import and added raise SkipTest("Pandas not found") otherwise.

jnothman

LGTM

GaelVaroquaux · 2017-06-01T11:47:00Z

sklearn/metrics/ranking.py

+    y_type = type_of_target(y_true)
+    if not (y_type == "binary" or
+            (y_type == "multiclass" and pos_label is not None)):
+        raise ValueError("{0} format is not supported".format(y_type))


It's a nitpick, but it would help the user to give a different error message if y_type == "multiclass" and pos_label is None.

Besides, I am surprised: is it really the case that in multiclass settings we require pos_label not to be specified? I would have thought the opposite. Is there an error in the condition above, or in my assumptions about our code?

As @jnothman pointed out: the code is correct, I was confused by the double negation.

Still, a different error message would help.

Still, a different error message would help.

Agreed. The error message template I generally try to follow is something like:

Allowed values for parameter_name are ['value1', 'value2', 'value3']. Instead you provided 'parameter_name={parameter_value}'

@lesteve, my choice of error message was copy pasted from other portions of this code. I chose the language to be consistent with the other instances of similar errors in ranking.py.
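One way the suggestion in this thread could look, as a sketch only: the multiclass-without-pos_label case gets its own message, loosely following the template quoted above, while other unsupported types keep the existing wording. `validate_target_type` is a hypothetical helper that takes the target type as a plain string; it is not the code that was merged.

```python
def validate_target_type(y_type, pos_label=None):
    """Hypothetical validator; y_type is a string such as 'binary'."""
    if y_type == "binary":
        return
    if y_type == "multiclass":
        if pos_label is None:
            # Dedicated message for the case discussed in the review.
            raise ValueError(
                "Allowed values for y_type are ['binary'] unless pos_label "
                "is given. Instead you provided y_type='multiclass' with "
                "pos_label=None.")
        return
    # Wording kept consistent with the existing message in the diff.
    raise ValueError("{0} format is not supported".format(y_type))


messages = []
for y_type, pos_label in [("multiclass", None), ("continuous", None)]:
    try:
        validate_target_type(y_type, pos_label)
    except ValueError as exc:
        messages.append(str(exc))

for message in messages:
    print(message)
```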

lesteve · 2017-06-01T11:59:30Z

Hmmm I am a bit confused on this one. I commented on the issue, see #7352 (comment).

jnothman · 2017-06-01T13:27:15Z

Gael, I think the confusion is yours in reading a double negation.
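The double negation is easier to read when the condition from the diff is expanded with De Morgan's law. This sketch enumerates a few cases and checks that both forms agree; the function names are hypothetical and the target type is passed as a plain string.

```python
def is_rejected(y_type, pos_label):
    # Same condition as in the reviewed diff.
    return not (y_type == "binary" or
                (y_type == "multiclass" and pos_label is not None))


def is_rejected_expanded(y_type, pos_label):
    # De Morgan form: not binary, and (not multiclass or no pos_label).
    return y_type != "binary" and (y_type != "multiclass" or pos_label is None)


for y_type in ("binary", "multiclass", "continuous"):
    for pos_label in (None, 1):
        rejected = is_rejected(y_type, pos_label)
        assert rejected == is_rejected_expanded(y_type, pos_label)
        print(y_type, pos_label, "rejected" if rejected else "accepted")
```

In other words, a multiclass target is accepted only when pos_label is specified, which matches the reading that the code is correct.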


lesteve · 2017-06-01T14:00:51Z

sklearn/utils/multiclass.py

@@ -234,6 +234,10 @@ def type_of_target(y):
        raise ValueError('Expected array-like (array or non-string sequence), '
                         'got %r' % y)

+    sparseseries = (y.__class__.__name__ == 'SparseSeries')


Just testing the name of the class is a bit dodgy, I think it would be better to use an isinstance.

So, I've gone back to using the name of the class, per prior comments from @amueller on commit d21c7e38674388f97e146aef67f42bef2fe5d2d2: pandas should not be imported at all except in tests.

jnothman · 2017-06-01T22:09:27Z

You could be more specific, if that's a concern, by checking that the type's module name starts with 'pandas.'. You could also avoid __class__ with type() etc. You could check the object's MRO to see if it is a subclass of SparseSeries. As I've said, I think this is fine.
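The stricter variants mentioned above can be combined into one check that still never imports pandas. This is a sketch, not the merged code: `is_pandas_sparse_series` is a hypothetical helper, and the stand-in classes and the module path 'pandas.core.sparse.series' are chosen only for illustration.

```python
def is_pandas_sparse_series(obj):
    """Hypothetical check by class name and module, without importing pandas."""
    for cls in type(obj).__mro__:  # walking the MRO also covers subclasses
        module = getattr(cls, "__module__", "") or ""
        if cls.__name__ == "SparseSeries" and (
                module == "pandas" or module.startswith("pandas.")):
            return True
    return False


# Stand-in class that only mimics the name and module of the pandas type.
FakeSparseSeries = type(
    "SparseSeries", (object,), {"__module__": "pandas.core.sparse.series"})


class MySeries(FakeSparseSeries):
    """Subclass defined outside pandas; still detected through the MRO."""


class SparseSeries(object):
    """Same class name, but not from a pandas module; must not match."""


print(is_pandas_sparse_series(FakeSparseSeries()))
print(is_pandas_sparse_series(MySeries()))
print(is_pandas_sparse_series(SparseSeries()))
print(is_pandas_sparse_series([1, 0, 0, 1, 0]))
```

The trade-off is the one raised in the review: a name-based check is less rigorous than isinstance, but it keeps pandas an optional dependency.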


jnothman · 2017-10-03T03:32:28Z

Another review here?

nielsenmarkus11 · 2017-10-06T13:56:55Z

Ping @amueller

GaelVaroquaux · 2017-10-06T15:00:36Z

LGTM. Merging given @jnothman 's +1

GaelVaroquaux · 2017-10-06T15:00:46Z

Thanks!

nielsenmarkus11 · 2017-10-06T22:13:12Z

Thank you!

@jnothman

…etrics (scikit-learn#7373)

* Raise error when SparseSeries is passed into roc_curve
* Changed "y_true" in second if block to "y_score"
* Remove code to import pandas and add sparseseries check to 'type_of_target' function. Finally, add 'type_of_target' call to _binary_clf_curve
* Remove pandas import and old comparison in roc_curve.
* Add test for 'type_of_target' function
* Add white space after commas
* Correct other white space issues
* Move type_of_target test into try clause, remove test_precision_recall_curve_pos_label since as multiclass it doesn't make sense
* Add test_precision_recall_curve_pos_label back in and also add test_binary_clf_curve to test new logic in _binary_clf_curve function
* Correct syntax and formatting.
* Remove trailing white space
* Correct validation logic
* Update test_multiclass.py per @jnothman 's request.
* Import SkipTest function.
* Remove extra white space from line 303

* tag '0.19.1': (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...



mniels17 and others added 2 commits September 8, 2016 17:29

Raise error when SparseSeries is passed into roc_curve

d21c7e3

Changed "y_true" in second if block to "y_score"

9c0ca7a

maniteja123 reviewed Sep 9, 2016

mniels17 and others added 5 commits September 12, 2016 12:19

Remove code to import pandas and add sparseseries check to 'type_of_target' function. Finally, add 'type_of_target' call to _binary_clf_curve

ab44db1

Merge pull request #1 from scikit-learn/master

01299b0

Update from Original

Merge pull request #2 from nielsenmarkus11/master

0bcb604

Update from Original

Merge branch 'sparse' of https://github.com/nielsenmarkus11/scikit-learn into sparse

5bb4589

Remove pandas import and old comparison in roc_curve.

1c56ea4

Add test for 'type_of_target' function

bdffb30

nielsenmarkus11 reviewed Sep 12, 2016

Add white space after commas

c469546

nielsenmarkus11 reviewed Sep 12, 2016

mniels17 added 6 commits September 12, 2016 16:59

Correct other white space issues

4f47906

Move type_of_target test into try clause, remove test_precision_recall_curve_pos_label since as multiclass it doesn't make sense

0a4c9cf

Add test_precision_recall_curve_pos_label back in and also add test_binary_clf_curve to test new logic in _binary_clf_curve function

0aa27c9

Correct syntax and formatting.

643d257

Remove trailing white space

81e308d

Correct validation logic

41e51ed

amueller added this to the 0.19 milestone Oct 27, 2016

jnothman approved these changes May 28, 2017


jnothman changed the title ~~Raise error when SparseSeries is passed into roc_curve~~ [MRG+1] Raise error when SparseSeries is passed into roc_curve May 28, 2017

Update test_multiclass.py per @jnothman 's request.

d435ee1

nielsenmarkus11 commented May 30, 2017


mniels17 and others added 2 commits May 31, 2017 16:06

Merge remote-tracking branch 'upstream/master' into sparse

af1fbb2

Remove extra white space from line 303

28b452a

jnothman approved these changes Jun 1, 2017


GaelVaroquaux reviewed Jun 1, 2017


lesteve reviewed Jun 1, 2017


nielsenmarkus11 force-pushed the sparse branch from a4b2fb6 to 28b452a Compare June 1, 2017 21:26

jnothman added Bug Enhancement Waiting for Reviewer labels Jun 14, 2017

jnothman changed the title ~~[MRG+1] Raise error when SparseSeries is passed into roc_curve~~ [MRG+1] Raise error when SparseSeries is passed into classification metrics Jun 18, 2017

Merge branch 'master' into sparse

e1890b1

GaelVaroquaux merged commit 3a48f0a into scikit-learn:master Oct 6, 2017

nielsenmarkus11 deleted the sparse branch October 6, 2017 22:14

nielsenmarkus11 restored the sparse branch October 6, 2017 22:14

amueller mentioned this pull request Oct 20, 2017

Release of version 0.19.1 #9607

Merged

varunagrawal mentioned this pull request Feb 25, 2018

precision_recall_curve no longer supports multilabel-indicator type #10690

Closed
