ENH Support for XFAIL/XPASS in common tests #16306


Closed
rth wants to merge 13 commits from the common-test-known-failure branch

Conversation

@rth (Member) commented Jan 30, 2020

This adds support for marking common tests as a known failure in pytest.

Motivation

There are common checks that should pass for all estimators but in reality fail. These are typically either skipped by raising SkipTest, or kept out of master until all estimators pass (e.g. #15015).

With this approach we can instead mark such tests as known failures; they will not show up as errors but will be listed in the final test report, e.g.,

XFAIL sklearn/tests/test_common.py::test_estimators[BernoulliRBM()-check_methods_subset_invariance]
  reason: score_samples of BernoulliRBM is not invariant when applied to a subset.

In addition, we can mark such a test as a known failure without raising an exception: the test continues to execute and is reported as XFAIL (if it fails) or XPASS (if it passes) at the end. This is implemented by (optionally) passing the pytest request fixture to the check. The use case for XPASS is when the failure was fixed in a PR but the corresponding common check was not updated.
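
The sketch below illustrates the idea with a hypothetical helper name (_mark_xfail); the actual code in this PR differs in its details, e.g. it also checks the sys._is_pytest_session flag set in conftest.py.

from unittest import SkipTest

import pytest


def _mark_xfail(reason, request=None):
    # hypothetical helper: mark the current test as a known failure
    if request is None:
        # no request fixture available: fall back to skipping, as before
        raise SkipTest('XFAIL ' + reason)
    # attach an xfail marker to the running test; the check keeps executing
    # and is reported as XFAIL if it fails or XPASS if it passes
    request.applymarker(pytest.mark.xfail(reason=reason))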

If pytest is not installed or we are not inside a scikit-learn pytest session, this does not change the behavior in any way and a SkipTest is raised, as before. I don't think this feature is useful for contrib projects that use check_estimator, since the checks marked as known failures concern scikit-learn estimators exclusively.

The goal is that, for checks requiring compliance from multiple estimators, we first add the check marked as XFAIL, then let contributors open PRs to fix individual estimators, while master shows up-to-date information at any point about what has been fixed and what hasn't. I don't think this needs additional documentation at this point; rather, individual issues should explain what needs to be done in each case.

In particular I would like to apply this for #15015 (comment), #16290 and possibly #16286 so it would be nice if it was merged during the Paris sprint.

Possibly cc @glemaitre @adrinjalali @jeremiedbb @lesteve

@adrinjalali (Member)

I don't know pytest well enough to comment on the correctness of it. My question is, how can we then run the tests and get the list of failures if we want to?

@rth (Member, Author) commented Jan 30, 2020

My question is, how can we then run the tests and get the list of failures if we want to?

Yes, forgot about that. Running pytest with the --runxfail option will run tests marked as a known failure (but also all the normal tests), and errors will be reported with tracebacks as usual.

@@ -13,7 +13,7 @@ addopts =
--ignore maint_tools
--doctest-modules
--disable-pytest-warnings
-  -rs
+  -rxXs
@rth (Member, Author):
This tells pytest to show XFAIL and XPASS tests in the final summary report (which previously only included SKIP).

@rth (Member, Author) commented Jan 30, 2020

BTW, the current XPASS output for common tests,

XPASS sklearn/tests/test_common.py::test_estimators[MiniBatchSparsePCA()-check_methods_subset_invariance] transform of MiniBatchSparsePCA is not invariant when applied to a subset.
XPASS sklearn/tests/test_common.py::test_estimators[NuSVC()-check_methods_subset_invariance] decision_function of NuSVC is not invariant when applied to a subset.
XPASS sklearn/tests/test_common.py::test_estimators[SparsePCA()-check_methods_subset_invariance] transform of SparsePCA is not invariant when applied to a subset.

indicates that while these checks were skipped in the past, they currently pass without an exception (at least on my laptop).

@adrinjalali (Member)

Yes, forgot about that. Running pytest with the --runxfail option will run tests marked as a known failure (but also all the normal tests), and errors will be reported with tracebacks as usual.

Could we have this in our guides somewhere please? :d.

@cmarmo fixing these failing checks, or unmarking the ones that currently pass, could make a bunch of good first issues, if you have a chance to create issues for them every now and then :)

@rth (Member, Author) commented Jan 30, 2020

Could we have this in our guides somewhere please? :d.

It's in the pytest documentation (https://docs.pytest.org/en/latest/skipping.html#ignoring-xfail). I don't think we should copy the pytest documentation into scikit-learn :)

@@ -87,13 +88,20 @@ def _tested_estimators():


@parametrize_with_checks(_tested_estimators())
-def test_estimators(estimator, check):
+def test_estimators(estimator, check, request):
@rth (Member, Author):
request is a built-in pytest fixture that provides information about the requesting test function.
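
For illustration, a minimal standalone example (not code from this PR): any test can declare request as an argument and pytest injects the fixture automatically.

import pytest


def test_something(request):
    # the fixture describes the running test item
    assert request.node.name == "test_something"
    # markers can also be attached to the test at runtime, which is what
    # this PR relies on
    request.applymarker(pytest.mark.xfail(reason="example"))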

@adrinjalali (Member)

tips.rst does have some pytest tips though, which is where I learned some useful ones when I started. pytest's docs were just too much to go through to find the relevant ones. But no strong feelings.

request: default=None
result of the pytest request fixture.
"""
if not getattr(sys, "_is_pytest_session", False):
@rth (Member, Author) commented Jan 30, 2020:
This is set at the beginning of the test session in our conftest.py.
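
For context, a minimal sketch of what such a hook in conftest.py could look like (pytest_configure/pytest_unconfigure are standard pytest hooks; apart from the sys._is_pytest_session flag name taken from the diff above, this is an assumption rather than the exact code in this PR):

import sys


def pytest_configure(config):
    # flag that we are running inside a scikit-learn pytest session
    sys._is_pytest_session = True


def pytest_unconfigure(config):
    # clean up the flag when the session ends
    sys._is_pytest_session = False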

@adrinjalali (Member)

I'm happy to have this so that those tests can move forward and we can fix them gradually.

@jnothman (Member) left a comment:
Not really reviewed yet

else:
# mark test as XFAIL and continue execution to see if it will
# actually fail.
request.applymarker(pytest.mark.xfail(run=False, reason=reason))
Reviewer (Member):
I would think we need to set run=True to continue execution.

Suggested change:
- request.applymarker(pytest.mark.xfail(run=False, reason=reason))
+ request.applymarker(pytest.mark.xfail(run=True, reason=reason))

But it seems like the function will continue to run regardless of the parameter. (This marker will not stop the function from running).

Either way, this is the desired behavior.

@rth (Member, Author) commented Jan 31, 2020:
But it seems like the function will continue to run regardless of the parameter. (This marker will not stop the function from running).

Hah, yes, I found this in some GitHub discussion a while ago. I'm not too sure about the different parameters, but it does work as expected :)

Edit: reverted to the default run=True, which is consistent with your comment.

@jeremiedbb (Member)

According to the coverage, the branch for NaiveBayes is never exercised.

raise SkipTest('XFAIL ' + str(reason))
try:
import pytest
if request is None:
Reviewer (Member):
can request be None?

@rth (Member, Author):
Not in the way it's used now. You are right -- simplified this function.

# mark test as XFAIL and continue execution to see if it will
# actually fail.
request.applymarker(pytest.mark.xfail(reason=reason))
except ImportError:
Reviewer (Member):
time to make pytest a dependency for the whole test suite? :D
(I'm kidding, I don't want to start a discussion here)

@rth (Member, Author) commented Jan 31, 2020

According to the coverage, the branch for NaiveBayes is never exercised.

Yes, because they don't have class_weight so the check is never run. It's not really a known failure. Removed it.

@jeremiedbb (Member) left a comment:
I'm fine with this implementation

@thomasjpfan (Member)

Having a check in the public API be able to xfail itself based on the estimator name still feels a little strange.

Can this be more generic, i.e. something like this:

from functools import partial
from inspect import signature
from unittest import SkipTest

import pytest


def _skiptest(reason):
    raise SkipTest('XFAIL ' + reason)

def check_class_weight_classifiers(name, estimator_orig, xfailed=_skiptest):
    if name == "NuSVC":
        xfailed("Not testing NuSVC class weight as it is ignored.")
    ...


# in test_common.py
def _xfailed_func(request, reason):
    request.applymarker(pytest.mark.xfail(reason=reason))

def test_estimators(estimator, check, request):
    ...
    args = {}
    if "xfailed" in signature(check).parameters:
        # bind the request fixture so the check can mark itself as XFAIL
        args['xfailed'] = partial(_xfailed_func, request)
    check(estimator, **args)

@glemaitre (Member)

I think I prefer this solution. It is close to what is already written. In all cases, we should either move toward solving these failures or just tag them as known with estimator tags.
The alternative solution with a global dictionary would duplicate the effort of the estimator tags.

@jnothman (Member) left a comment:
I agree with @thomasjpfan that it remains awkward that the duty of marking an estimator as exceptional belongs to the check. It makes more sense for that marking to happen in test_common, and now that we have turned check_estimator into a check generator, it seems as if that should be possible. One difficulty here is that the check is only skipped for selected methods. Does that not then sound like a limitation that could be expressed through tags? Or are tags inappropriate because they imply that this is correct behaviour, while here you are trying to express that it is incorrect behaviour?

@glemaitre (Member)

it remains awkward that the duty of marking an estimator as exceptional belongs to the check

I am fine with the changes proposed here: #16306 (comment)

I just feel the approach in this PR (with the proposed changes as well) is better suited than the alternative #16328, which introduces a new global dictionary (something we tried to get rid of when we had global lists for skipping some tests).

One difficulty here is that the check is only skipped for selected methods. Does that not then sound like a limitation that could be expressed through tags? Or are tags inappropriate because they imply that this is correct behaviour, while here you are trying to express that it is incorrect behaviour?

Maybe the check should be more granular and the _skip_test tag could be a list of the tests to be skipped?

@rth (Member, Author) commented Feb 20, 2020

I would be OK with making this work with estimator tags, by generalizing _skip_test to take a list of regexps and introducing an equivalent _xfail_test estimator tag.

That would address @thomasjpfan's concerns without having to maintain a dict of skipped checks in test_estimators (or globally), and it would generalize nicely to contrib projects.
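
To illustrate, a rough sketch of what the proposed _xfail_test tag could look like on an estimator (the tag name and value format are assumptions based on this discussion, not the API that was eventually merged):

from sklearn.base import BaseEstimator


class MyEstimator(BaseEstimator):
    def _more_tags(self):
        # hypothetical tag mirroring _skip_test: a list of common-check name
        # patterns that should be marked XFAIL instead of being skipped
        return {"_xfail_test": [r"check_methods_subset_invariance"]}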

@rth (Member, Author) commented Feb 20, 2020

Thanks for the suggestion in #16306 (comment) @thomasjpfan, I agree it's an improvement; I haven't had time to address it so far. Now, however, I'm leaning more toward an estimator tag solution.

@glemaitre glemaitre closed this Feb 20, 2020
@rth rth deleted the common-test-known-failure branch February 20, 2020 21:33