From d1cd4c50975f6fd1f05353452e2b64d20894c5cc Mon Sep 17 00:00:00 2001 From: Nathan Goldbaum Date: Thu, 23 May 2024 12:11:26 -0600 Subject: [PATCH 1/5] Add thread safety section to flaky test docs --- AUTHORS | 1 + changelog/12356.doc.rst | 2 ++ doc/en/explanation/flaky.rst | 6 ++++++ 3 files changed, 9 insertions(+) create mode 100644 changelog/12356.doc.rst diff --git a/AUTHORS b/AUTHORS index cc53ce10d4f..18c60750e30 100644 --- a/AUTHORS +++ b/AUTHORS @@ -289,6 +289,7 @@ Mike Lundy Milan Lesnek Miro Hrončok mrbean-bremen +Nathan Goldbaum Nathaniel Compton Nathaniel Waisbrot Ned Batchelder diff --git a/changelog/12356.doc.rst b/changelog/12356.doc.rst new file mode 100644 index 00000000000..312c26d3298 --- /dev/null +++ b/changelog/12356.doc.rst @@ -0,0 +1,2 @@ +Added a subsection to the documentation for debugging flaky tests to mention +lack of thread safety in pytest as a possible source of flakyness. diff --git a/doc/en/explanation/flaky.rst b/doc/en/explanation/flaky.rst index 41cbe847989..c86f81a7a68 100644 --- a/doc/en/explanation/flaky.rst +++ b/doc/en/explanation/flaky.rst @@ -30,6 +30,12 @@ Overly strict assertion Overly strict assertions can cause problems with floating point comparison as well as timing issues. :func:`pytest.approx` is useful here. +Pytest is not thread safe +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Pytest is not designed to be safe to use in a multithreaded environment. Multiple pytest tests cannot run simultaneously in different threads within a single Python process and pytest assumes that only one test per process is ever executing. + +It is possible to use threads within a single test, but care must be taken to avoid using primitives provided by pytest inside a multithreaded context. For example, :func:`pytest.warns` is not thread safe because it is implemented using the standard library :class:`warnings.catch_warnings` context manager, which is not thread safe.
Fixtures are also not automatically thread safe and care should be taken sharing the values returned by fixtures between threads. If you are running a test that uses threads and are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. Pytest features ^^^^^^^^^^^^^^^ From 8fc5d7098df1bb504ca55aec1265502b469941b7 Mon Sep 17 00:00:00 2001 From: Bruno Oliveira Date: Fri, 24 May 2024 07:53:46 -0300 Subject: [PATCH 2/5] Apply suggestions from code review --- doc/en/explanation/flaky.rst | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/doc/en/explanation/flaky.rst b/doc/en/explanation/flaky.rst index c86f81a7a68..8f2009b7efb 100644 --- a/doc/en/explanation/flaky.rst +++ b/doc/en/explanation/flaky.rst @@ -30,12 +30,19 @@ Overly strict assertion Overly strict assertions can cause problems with floating point comparison as well as timing issues. :func:`pytest.approx` is useful here. -Pytest is not thread safe -~~~~~~~~~~~~~~~~~~~~~~~~~ +pytest thread safety +~~~~~~~~~~~~~~~~~~~~ + +pytest is single-threaded, executing its tests always in the same thread, sequentially, never spawning any threads itself. + +Even in case of plugins which run tests in parallel, for example `pytest-xdist`_, usually work by spawns multiple *processes* and running tests in batches, without using threads. + +It is of course possible (and common) for tests and fixtures to spawn threads themselves as part of their testing workflow (for example, a fixture that starts a server thread in the background, or a test which executes production code which itself spawns threads), but some care must be taken: -Pytest is not designed to be safe to use in a multithreaded environment. Multiple pytest tests cannot run simultaneously in different threads within a single Python process and pytest assumes that only one test per process is ever executing. 
+* Make sure to eventually wait on any spawned threads -- for example at the end of a test, or during teardown of a fixture. +* Avoid using primitives provided by pytest (:func:`pytest.warns`, :func:`pytest.raises`, etc) from multiple threads, as they are not thread-safe. -It is possible to use threads within a single test, but care must be taken to avoid using primitives provided by pytest inside a multithreaded context. For example, :func:`pytest.warns` is not thread safe because it is implemented using the standard library :class:`warnings.catch_warnings` context manager, which is not thread safe. Fixtures are also not automatically thread safe and care should be taken sharing the values returned by fixtures between threads. If you are running a test that uses threads and are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. +If your test suite uses threads and your are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. 
Pytest features ^^^^^^^^^^^^^^^ From 7ec5055f8c6a14a61907e3f3ae880bb025d1291d Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 24 May 2024 10:54:04 +0000 Subject: [PATCH 3/5] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/en/explanation/flaky.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/en/explanation/flaky.rst b/doc/en/explanation/flaky.rst index 8f2009b7efb..7ab85f13d38 100644 --- a/doc/en/explanation/flaky.rst +++ b/doc/en/explanation/flaky.rst @@ -40,9 +40,9 @@ Even in case of plugins which run tests in parallel, for example `pytest-xdist`_ It is of course possible (and common) for tests and fixtures to spawn threads themselves as part of their testing workflow (for example, a fixture that starts a server thread in the background, or a test which executes production code which itself spawns threads), but some care must be taken: * Make sure to eventually wait on any spawned threads -- for example at the end of a test, or during teardown of a fixture. -* Avoid using primitives provided by pytest (:func:`pytest.warns`, :func:`pytest.raises`, etc) from multiple threads, as they are not thread-safe. +* Avoid using primitives provided by pytest (:func:`pytest.warns`, :func:`pytest.raises`, etc) from multiple threads, as they are not thread-safe. -If your test suite uses threads and your are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. +If your test suite uses threads and your are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. 
Pytest features ^^^^^^^^^^^^^^^ From b67dab9eca95a9f1c2cc98da789a4ceeb30ea425 Mon Sep 17 00:00:00 2001 From: Bruno Oliveira Date: Fri, 24 May 2024 08:03:03 -0300 Subject: [PATCH 4/5] Update flaky.rst --- doc/en/explanation/flaky.rst | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/doc/en/explanation/flaky.rst b/doc/en/explanation/flaky.rst index 7ab85f13d38..68e2880848f 100644 --- a/doc/en/explanation/flaky.rst +++ b/doc/en/explanation/flaky.rst @@ -30,22 +30,22 @@ Overly strict assertion Overly strict assertions can cause problems with floating point comparison as well as timing issues. :func:`pytest.approx` is useful here. -pytest thread safety -~~~~~~~~~~~~~~~~~~~~ +Thread safety +~~~~~~~~~~~~~ pytest is single-threaded, executing its tests always in the same thread, sequentially, never spawning any threads itself. -Even in case of plugins which run tests in parallel, for example `pytest-xdist`_, usually work by spawns multiple *processes* and running tests in batches, without using threads. +Even in case of plugins which run tests in parallel, for example `pytest-xdist`_, usually work by spawning multiple *processes* and running tests in batches, without using multiple threads. -It is of course possible (and common) for tests and fixtures to spawn threads themselves as part of their testing workflow (for example, a fixture that starts a server thread in the background, or a test which executes production code which itself spawns threads), but some care must be taken: +It is of course possible (and common) for tests and fixtures to spawn threads themselves as part of their testing workflow (for example, a fixture that starts a server thread in the background, or a test which executes production code that spawns threads), but some care must be taken: -* Make sure to eventually wait on any spawned threads -- for example at the end of a test, or during teardown of a fixture. 
+* Make sure to eventually wait on any spawned threads -- for example at the end of a test, or during the teardown of a fixture. * Avoid using primitives provided by pytest (:func:`pytest.warns`, :func:`pytest.raises`, etc) from multiple threads, as they are not thread-safe. If your test suite uses threads and your are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself. -Pytest features -^^^^^^^^^^^^^^^ +Related features +^^^^^^^^^^^^^^^^ Xfail strict ~~~~~~~~~~~~ @@ -136,3 +136,6 @@ Resources * `Flaky Tests at Google and How We Mitigate Them `_ by John Micco, 2016 * `Where do Google's flaky tests come from? `_ by Jeff Listfield, 2017 + + +.. _pytest-xdist: https://github.com/pytest-dev/pytest-xdist From 7be9ad45923f84222c42fc2b18f13edca2025409 Mon Sep 17 00:00:00 2001 From: Bruno Oliveira Date: Fri, 24 May 2024 08:10:53 -0300 Subject: [PATCH 5/5] Update flaky.rst --- doc/en/explanation/flaky.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/en/explanation/flaky.rst b/doc/en/explanation/flaky.rst index 68e2880848f..cb6c3983424 100644 --- a/doc/en/explanation/flaky.rst +++ b/doc/en/explanation/flaky.rst @@ -18,7 +18,7 @@ System state Broadly speaking, a flaky test indicates that the test relies on some system state that is not being appropriately controlled - the test environment is not sufficiently isolated. Higher level tests are more likely to be flaky as they rely on more state. -Flaky tests sometimes appear when a test suite is run in parallel (such as use of pytest-xdist). This can indicate a test is reliant on test ordering. +Flaky tests sometimes appear when a test suite is run in parallel (such as use of `pytest-xdist`_). This can indicate a test is reliant on test ordering. - Perhaps a different test is failing to clean up after itself and leaving behind data which causes the flaky test to fail. 
- The flaky test is reliant on data from a previous test that doesn't clean up after itself, and in parallel runs that previous test is not always present