Description
Proposed new feature or change:
The issue
Several SimPEG tests need to create some form of synthetic data through a pseudo-random generator, usually via the numpy.random module. A best practice for reproducible tests is to set a random seed, so that every run of the test suite uses the same random values. These seeds are currently set with the numpy.random.seed() function.
In several tests, these seeds are defined globally, outside the test functions and methods. For example:
simpeg/tests/dask/test_grav_inversion_linear.py
Lines 17 to 27 in 875fc32
from SimPEG.potential_fields import gravity
import shutil

np.random.seed(43)


class GravInvLinProblemTest(unittest.TestCase):
    def setUp(self):
        # Create a self.mesh
        dx = 5.0
Because these module-level calls run at import time, pytest executes them while collecting the tests, before any test runs. All tests therefore share a single global random state, seeded by whichever module was collected last. This means that the random values a test sees depend on the order in which the tests are run. If, for example, someone introduces additional tests "in the middle", the random state of the following tests changes, and they may fail.
Minimal working example
For example, let's say we have two identical test files: test_first.py and test_second.py
import numpy as np

np.random.seed(5)


def test_value():
    random_value = np.random.randint(low=0, high=10)
    assert random_value == 3

When running pytest to run all the tests inside them, we see that the second one fails:
pytest test_first.py test_second.py

===================================== test session starts =====================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/santi/tmp/testing-random
plugins: anyio-3.6.2
collected 2 items
test_first.py . [ 50%]
test_second.py F [100%]
========================================== FAILURES ===========================================
_________________________________________ test_value __________________________________________
    def test_value():
        random_value = np.random.randint(low=0, high=10)
>       assert random_value == 3
E       assert 6 == 3
test_second.py:8: AssertionError
=================================== short test summary info ===================================
FAILED test_second.py::test_value - assert 6 == 3
================================= 1 failed, 1 passed in 0.21s =================================
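The failure can be reproduced without pytest. The following is a minimal sketch, assuming that both modules are imported during collection before any test runs, so that the two tests end up drawing consecutive values from the same global stream. The variable names are illustrative only.

```python
import numpy as np

# Both test modules call np.random.seed(5) at import time, i.e. during collection.
np.random.seed(5)  # runs when test_first.py is imported
np.random.seed(5)  # runs again when test_second.py is imported

# The tests run only after collection, sharing one global random state.
first_draw = np.random.randint(low=0, high=10)   # value seen by test_first.py
second_draw = np.random.randint(low=0, high=10)  # value seen by test_second.py

# Reseeding replays the first draw only; the second test never sees it.
np.random.seed(5)
replayed = np.random.randint(low=0, high=10)

print(first_draw, second_draw, replayed)
```

Here `first_draw` and `replayed` agree, while `second_draw` is simply the next value in the stream, which is why the assertion in test_second.py compares against a different number.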
Solution
One way to solve this is to define local random seeds within each test and ditch the globally defined seeds.
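As a minimal sketch of that idea, the test from the example above could seed the generator inside the test function. The helper `draw_value` and the test name are hypothetical and only serve the illustration; they are not part of SimPEG.

```python
import numpy as np


def draw_value(seed=5):
    """Seed the global generator locally, then draw one integer in [0, 10)."""
    np.random.seed(seed)
    return np.random.randint(low=0, high=10)


def test_value_is_order_independent():
    first = draw_value()
    # Consume some random numbers, as an unrelated test might do in between.
    np.random.random(size=100)
    second = draw_value()
    # Seeding right before the draw makes the result independent of test order.
    assert first == second
```

This still relies on the global state of numpy.random, so it only helps as long as every test seeds before drawing.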
Moreover, it would be nice to move from the np.random.seed() and np.random.___() functions to NumPy's new random number generator objects.
For example, to create an array of 100 random elements with a gaussian distribution we can do the following:
import numpy as np
random_num_generator = np.random.default_rng(seed=42)
gaussian = random_num_generator.normal(loc=0.0, scale=10.0, size=100)

These random number generator objects are future-proof (NumPy's RandomState is considered legacy) and they make the code clearer, since it is easy to see which random state is being used when generating random numbers.
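A test built on a generator object could look like the sketch below. It assumes a hypothetical helper, `make_noisy_data`, that creates its own generator from a seed; the names are illustrative and not taken from the SimPEG test suite.

```python
import numpy as np


def make_noisy_data(seed, size=100):
    """Return Gaussian samples drawn from a generator local to this call."""
    rng = np.random.default_rng(seed=seed)
    return rng.normal(loc=0.0, scale=10.0, size=size)


def test_noisy_data_is_reproducible():
    first = make_noisy_data(seed=42)
    # Touching the global state does not affect the local generator.
    np.random.seed(0)
    np.random.random(size=10)
    second = make_noisy_data(seed=42)
    np.testing.assert_array_equal(first, second)
```

Because each call owns its generator, the result does not depend on the order in which tests run or on what other tests do with numpy.random.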
Related Issues and PRs
I started working on this in #1286; feel free to use that as inspiration. It would be nice to continue the work with the other test functions.