MNT: Set local random seeds in all SimPEG tests #1289

@santisoler

Description

Proposed new feature or change:

The issue

Several SimPEG tests create synthetic data with a pseudo-random generator, usually through the numpy.random module. A best practice for reproducible tests is to set a random seed, so that every run of the test suite uses the same random values. These seeds are currently set with the numpy.random.seed() function.

In several tests, these seeds are defined globally, outside the test functions and methods. For example:

from SimPEG.potential_fields import gravity
import shutil

np.random.seed(43)


class GravInvLinProblemTest(unittest.TestCase):
    def setUp(self):
        # Create a self.mesh
        dx = 5.0

Pytest imports every test module while collecting the tests, so all of these module-level np.random.seed() calls run before any test is executed. The tests then share a single global random state, and the values each test draws depend on how many random numbers the tests before it have consumed. In other words, the random state of a test can change depending on the order in which the tests run. If, for example, someone introduces additional tests "in the middle", they might change the random state of the tests that follow, with the chance of making them fail.

Minimal working example

For example, let's say we have two identical test files, test_first.py and test_second.py, with the following content:

import numpy as np

np.random.seed(5)

def test_value():
    random_value = np.random.randint(low=0, high=10)
    assert random_value == 3

Running pytest on both files shows that the second test fails:

pytest test_first.py test_second.py
===================================== test session starts =====================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/santi/tmp/testing-random
plugins: anyio-3.6.2
collected 2 items

test_first.py .                                                                         [ 50%]
test_second.py F                                                                        [100%]

========================================== FAILURES ===========================================
_________________________________________ test_value __________________________________________

    def test_value():
        random_value = np.random.randint(low=0, high=10)
>       assert random_value == 3
E       assert 6 == 3

test_second.py:8: AssertionError
=================================== short test summary info ===================================
FAILED test_second.py::test_value - assert 6 == 3
================================= 1 failed, 1 passed in 0.21s =================================
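
The same behaviour can be sketched without pytest. The snippet below is only an illustration of the mechanism, assuming that pytest imports both modules (running both seed calls) before executing any test; the variable names are hypothetical. In the session above, the two draws came out as 3 and 6.

```python
import numpy as np

# Collection phase: pytest imports both modules, so both module-level
# seed calls run before any test body executes.
np.random.seed(5)  # from test_first.py
np.random.seed(5)  # from test_second.py

# Execution phase: both tests draw from the same global random state.
first = np.random.randint(low=0, high=10)   # draw made by test_first.py
second = np.random.randint(low=0, high=10)  # draw made by test_second.py

# Reseeding right before a draw reproduces the first value, which is
# what the test in test_second.py implicitly expected.
np.random.seed(5)
again = np.random.randint(low=0, high=10)
assert again == first
print(first, second)
```

The second draw is simply the next value in the sequence started by the last seed call, which is why its result depends on what ran before it.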

Solution

A way to solve this is to define local random seeds within each test and ditch the globally defined seeds.

Moreover, it would be nice to move from the np.random.seed() and np.random.___() functions to NumPy's newer random number generator objects (numpy.random.Generator).

For example, to create an array of 100 random elements with a Gaussian distribution we can do the following:

import numpy as np

random_num_generator = np.random.default_rng(seed=42)
gaussian = random_num_generator.normal(loc=0.0, scale=10.0, size=100)

These random number generator objects are future-proof (NumPy's RandomState is considered legacy) and lead to clearer code, because it is easy to see which random state is being used when generating random numbers.
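
As a rough sketch of how this could look inside tests, each test can build its own generator from a local seed. The helper make_synthetic_data and the test names below are hypothetical and are not taken from the SimPEG code base.

```python
import numpy as np


def make_synthetic_data(seed, size=100):
    """Hypothetical helper: Gaussian synthetic data from a local generator."""
    rng = np.random.default_rng(seed=seed)
    return rng.normal(loc=0.0, scale=10.0, size=size)


def test_first():
    data = make_synthetic_data(seed=42)
    assert data.shape == (100,)


def test_second():
    # Consuming the global random state has no effect on the local generator.
    np.random.rand(10)
    data = make_synthetic_data(seed=42)
    np.testing.assert_array_equal(data, make_synthetic_data(seed=42))
```

Because every generator is created and seeded inside the test that uses it, the values drawn should not depend on which tests ran earlier or on the order in which pytest collects them.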

Related Issues and PRs

I started working on this in #1286; feel free to use that as inspiration. It would be nice to continue the work with the other test functions.

Metadata

Labels: maintenance (Maintaining code base without actual functionality changes)
