TST: Mark slow tests in astropy/samp #16095

Merged: 1 commit merged into astropy:main on Mar 12, 2024

Conversation

neutrinoceros
Contributor

Description

A quick follow-up to #16064.
I found the 10 longest (non-slow, non-parametrized) tests with pytest astropy/ --timer-top-n 20 (using pytest-timer), which is also about the same set of tests that take longer than a second on my machine (supposedly a pretty fast M2).
On my install, pytest astropy takes about 2 min 15 s on main and 1 min 55 s with this branch, so I estimate that skipping those 10 tests (out of ~24k) could save about 10% of CI time.
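
For context, this relies on pytest's standard marker machinery; a minimal sketch (generic pytest, not astropy's actual configuration):

# conftest.py
def pytest_configure(config):
    # Register the marker so strict marker checking accepts @pytest.mark.slow.
    config.addinivalue_line("markers", "slow: mark a test as slow to run")

# Fast CI runs can then deselect the marked tests:
#   pytest astropy/ -m "not slow"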

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that warrants a "What's new?" entry? If so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Contributor

👋 Thank you for your draft pull request! Did you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?
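
For example (a hypothetical commit message):

git commit -m "TST: tweak sleep durations [ci skip]"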

@neutrinoceros neutrinoceros marked this pull request as ready for review February 23, 2024 08:15
@neutrinoceros neutrinoceros requested a review from saimn as a code owner February 23, 2024 08:15
@pllim pllim added this to the v6.1.0 milestone Feb 23, 2024
@pllim
Member

pllim commented Feb 23, 2024

I'll have to ponder later whether the cost of not running these for every combination is worth the benefit. It's not like we just sit staring at the CI until it finishes.

@mhvk
Contributor

mhvk commented Feb 25, 2024

I like this idea, but it would be good to get some input on whether these specific tests are likely to fail on just one architecture.

Also, some of the slowest tests may be ones that are excessively parametrized (or the hypothesis ones in astropy.time.tests.test_precision). I think there may be a way to look at those - maybe numpy/numpy#25472 (comment)? But obviously fine to do that later.
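
For instance, pytest's built-in duration report lists each parametrized case separately and can surface these without extra plugins (a generic invocation, not necessarily what was used here; --durations-min needs pytest >= 6.2):

pytest astropy/ --durations=20 --durations-min=0.5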

pllim
pllim previously requested changes Feb 26, 2024
Member

@pllim pllim left a comment

In general, I am okay with marking SAMP as slow, but not comfortable with io.fits and utils. FITS is a can of worms and utils is the backbone of many things. I want them both to be tested widely. A bit of slowness in CI is a price I am willing to pay (not that we're paying, har har).

Thank you for your understanding!

@@ -204,6 +204,7 @@ def test_disable_image_compression(self):
        with fits.open(self.data("comp.fits")) as hdul:
            assert isinstance(hdul[1], fits.CompImageHDU)

    @pytest.mark.slow
Contributor

Hmm, looking at the actual test, I think the only reason it is slow is because it sleeps for 1 second! Can we change that to 0.1 s?

Contributor Author

It works on my system but I'm not sure how portable it is; maybe the test would fail on slower archs/containers/VMs? Anyway, time.sleep(1) is used in combination with pytest.mark.slow in other similar tests, but admittedly the difference is that in this one case the sleep is actually responsible for most of the test's time, so let's try that.
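
For reference, the pattern under discussion looks roughly like this (a hypothetical stand-in, not the actual io.fits test; the point is that the sleep dominates the runtime):

import time

def test_update_mode_roundtrip(tmp_path):
    path = tmp_path / "data.txt"
    path.write_text("original")
    # Was time.sleep(1); cut to 0.1 s since the sleep accounts for almost
    # all of this test's runtime. The risk: flakiness on slow systems.
    time.sleep(0.1)
    path.write_text("updated")
    assert path.read_text() == "updated"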

@@ -48,6 +48,7 @@ def test_open(self):
        assert ghdu.data[0].data.shape == naxes[::-1]
        assert ghdu.data[0].parnames == parameters

    @pytest.mark.slow
Contributor

And the same here! It sleeps for 1 second.

@@ -927,6 +927,7 @@ def test_image_update_header(self):

    # The test below raised a `ResourceWarning: unclosed transport` exception
    # due to a bug in Python <=3.10 (cf. cpython#90476)
    @pytest.mark.slow
Contributor

Here too!

Contributor Author

I note that even after cutting 90% of the sleep time, this test is still about 5 times longer than the second slowest in the module. That's about 0.2 s on my machine, which is probably acceptable. Let's try!

Member

Like I said before, I am not comfortable marking anything in io.fits as slow here. I will defer to @saimn .

Contributor

Well, there is no good value here; it depends on the system, disk, caching, etc.
So 1 s is maybe a bit too much, but it seems reasonable if it avoids wasting time debugging issues on CI or on exotic platforms.

@@ -17,6 +17,7 @@ def test_SAMPHubServer():
    SAMPHubServer(web_profile=False, mode="multiple", pool_size=1)


@pytest.mark.slow
Contributor

And another one that sleeps 1 sec.

Contributor Author

The very next test also combines sleep(1) + pytest.mark.slow. In the other tests, the file system would be the bottleneck and I think it's reasonable to assume that 1 s is (too) generous, but here I'm actually not so sure: starting a server seems like a much more involved task than changing flags on a file.
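
For reference, the start/sleep/stop pattern being discussed looks roughly like this (a sketch built from the surrounding diff; the real test body may differ):

import time
import pytest
from astropy.samp import SAMPHubServer

@pytest.mark.slow
def test_SAMPHubServer_run():
    hub = SAMPHubServer(web_profile=False, mode="multiple", pool_size=1)
    hub.start()
    time.sleep(1)  # deliberately generous: server start-up time varies widely
    hub.stop()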

@@ -155,6 +155,7 @@ def test_progress_bar_as_generator():
    assert sum == 1225


@pytest.mark.slow
Contributor

test_progress_bar_func.func also sleeps (though not as long)

Contributor Author

For this one, the sleep time is actually not dominant, so I think it should still be marked as slow with no other changes.
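
A sketch of that shape (a hypothetical example, not the actual test): each mapped call sleeps only briefly, so trimming the sleep buys little and the test stays marked slow:

import time
import pytest
from astropy.utils.console import ProgressBar

def _work(i):
    time.sleep(0.01)  # small per-call sleep; the total comes from the many calls
    return i * 2

@pytest.mark.slow
def test_progress_bar_map():
    results = ProgressBar.map(_work, range(100))
    assert list(results) == [i * 2 for i in range(100)]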

@mhvk
Contributor

mhvk commented Feb 26, 2024

Looking through the actual tests, quite a few are slow just because they are sleeping. With 12000 tests, one really should not use even sleep(1) as a matter of course (but then, the FITS tests are not the newest...). I'm less sure why the utils/data tests are so slow; those tests construct whole lists of fake URLs, but by inspection it is not clear how many there actually are.

@namurphy
Contributor

namurphy commented Mar 1, 2024

Over in PlasmaPy, we've started marking tests as slow if they take longer than ∼0.25–0.5 seconds. By doing that, and by caching .tox between runs (PlasmaPy/PlasmaPy#2552), we've been able to get our "skip slow" tests to finish in about a minute. That covers ∼90% of our ∼4300 tests. We still run the full test suite once in CI to get code coverage, and we use cron jobs to run our test suite across multiple architectures and Python versions. For CI, it's been incredibly helpful to get rapid feedback.

This is a long way of saying...thank you for doing this! 😺

@mhvk
Contributor

mhvk commented Mar 1, 2024

I like the idea of caching - thanks for linking to that example setup!!

@neutrinoceros neutrinoceros force-pushed the tests/rfc/mark_slow_tests branch from 5fd66b3 to 6d53ac6 on March 11, 2024 14:23
@@ -925,9 +925,6 @@ def test_image_update_header(self):
        with fits.open(self.temp("test0.fits")) as hdul:
            assert (orig_data == hdul[1].data).all()

    # The test below raised a `ResourceWarning: unclosed transport` exception
    # due to a bug in Python <=3.10 (cf. cpython#90476)
    @pytest.mark.filterwarnings("ignore:unclosed transport <asyncio.sslproto")
Contributor Author

Since this cleanup is actually orthogonal, I've opened #16183 for it.

@neutrinoceros
Contributor Author

Sorry, I forgot about this one for a couple of weeks!
In addressing @mhvk's review, I actually reverted all the slow markers I had previously added in io.fits tests, and instead made those tests faster (or less sleepy, depending on how you want to look at it).
@pllim, would you be happy with the change if I also reverted all the markers I added to utils tests?

@neutrinoceros neutrinoceros force-pushed the tests/rfc/mark_slow_tests branch from 6d53ac6 to 93f55eb on March 11, 2024 14:48
Contributor

@mhvk mhvk left a comment

Looks good. I'll approve for all but utils, deferring to @pllim for that.

Member

I am also not comfortable with marking utils tests as slow.

Contributor Author

reverted

Member

Thanks! Much appreciated.

@neutrinoceros neutrinoceros force-pushed the tests/rfc/mark_slow_tests branch from 93f55eb to 40fdec3 on March 11, 2024 15:59
@neutrinoceros neutrinoceros changed the title from "TST: mark top 10 slowest tests with pytest.mark.slow (save ~10% on CI)" to "TST: Mark slow tests in astropy/samp and reduce sleeping time in long running io/fits tests" on Mar 11, 2024
@pllim pllim dismissed their stale review March 11, 2024 16:08

utils addressed, deferring fits

@pllim pllim removed the utils label Mar 11, 2024
@saimn
Contributor

saimn commented Mar 12, 2024

Finally coming to this one (after catching up with the flood of notifications; taking some time off is getting harder...).
So when adding the slow mark, we used to have tests that take way more than 10 sec (see for example the run with slow tests: https://github.com/astropy/astropy/actions/runs/8011602338/job/21885207661#step:10:1721).

Now, for a run without slow tests (https://github.com/astropy/astropy/actions/runs/8011602338/job/21885207469#step:10:1717), it seems more reasonable, with only a few tests taking more than 2 sec (on CI, which is usually much slower):

4.10s call     docs/timeseries/lombscarglemb.rst::lombscarglemb.rst
3.35s call     .tox/py310-test-alldeps/Lib/site-packages/astropy/coordinates/tests/test_angles.py::test_angle_multithreading
2.34s call     .tox/py310-test-alldeps/Lib/site-packages/astropy/nddata/tests/test_nduncertainty.py::test_for_leak_with_uncertainty
2.13s call     .tox/py310-test-alldeps/Lib/site-packages/astropy/coordinates/tests/test_angles.py::test_str_repr_angles_nan[input0-nan-nan deg-Angle]
2.06s setup    .tox/py310-test-alldeps/Lib/site-packages/astropy/coordinates/tests/test_angles.py::test_str_repr_angles_nan[input0-nan-nan deg-Angle]
2.00s call     .tox/py310-test-alldeps/Lib/site-packages/astropy/io/fits/tests/test_image.py::TestImageFunctions::test_open_scaled_in_update_mode

So unless we decide to skip all tests taking more than 1 s (or even less), I would prefer to keep the 3 tests doing a sleep in io.fits.

@saimn
Contributor

saimn commented Mar 12, 2024

And local results with a faster CPU:

2.85s call     docs/timeseries/lombscarglemb.rst::lombscarglemb.rst
2.01s call     astropy/io/fits/tests/test_image.py::TestImageFunctions::test_open_scaled_in_update_mode
1.90s call     astropy/nddata/tests/test_nduncertainty.py::test_for_leak_with_uncertainty
1.52s call     astropy/timeseries/periodograms/lombscargle_multiband/tests/test_lombscargle_multiband.py::test_unit_conversions[False-flexible]
1.50s call     astropy/timeseries/periodograms/lombscargle_multiband/tests/test_lombscargle_multiband.py::test_unit_conversions[True-flexible]
1.35s call     astropy/time/tests/test_precision.py::test_sidereal_lat_independent[mean]
1.26s teardown astropy/samp/tests/test_web_profile.py::TestWebProfile::test_main
1.12s teardown astropy/samp/tests/test_web_profile.py::TestWebProfile::test_web_profile
1.01s call     astropy/samp/tests/test_hub.py::test_SAMPHubServer_run
1.00s call     astropy/io/fits/hdu/compressed/tests/test_compressed.py::TestCompressedImage::test_open_comp_image_in_update_mode
1.00s call     astropy/io/fits/tests/test_groups.py::TestGroupsFunctions::test_open_groups_in_update_mode
...
================= 28290 passed, 323 skipped, 231 xfailed in 173.34s (0:02:53) ==================

Sure, we could speed up even more by removing some tests, but it seems difficult to set a limit (and what about tests that run faster but are repeated dozens of times?).

@neutrinoceros
Contributor Author

Thanks for the insight, I had no idea that 20 s tests were a thing 🤯
Let me just revert the changes to io/fits completely so we can all forget about this!

@neutrinoceros neutrinoceros force-pushed the tests/rfc/mark_slow_tests branch from 40fdec3 to 0016bad on March 12, 2024 18:26
@neutrinoceros neutrinoceros changed the title from "TST: Mark slow tests in astropy/samp and reduce sleeping time in long running io/fits tests" to "TST: Mark slow tests in astropy/samp" on Mar 12, 2024
Member

@pllim pllim left a comment

Thanks for your understanding!

@pllim pllim removed the io.fits label Mar 12, 2024
@pllim pllim enabled auto-merge March 12, 2024 18:31
@pllim pllim merged commit 2390bf8 into astropy:main Mar 12, 2024
@neutrinoceros neutrinoceros deleted the tests/rfc/mark_slow_tests branch March 12, 2024 21:55