DOC Fix the formatting for environment variables in docs #22833

Merged (2 commits) on Mar 15, 2022
168 changes: 89 additions & 79 deletions doc/computing/parallelism.rst
@@ -163,14 +163,16 @@ Python runtime

:func:`sklearn.set_config` controls the following behaviors:

:assume_finite:
`assume_finite`
~~~~~~~~~~~~~~~

used to skip validation, which enables faster computations but may
lead to segmentation faults if the data contains NaNs.
Used to skip validation, which enables faster computations but may lead to
segmentation faults if the data contains NaNs.

:working_memory:
`working_memory`
~~~~~~~~~~~~~~~~

the optimal size of temporary arrays used by some algorithms.
The optimal size of temporary arrays used by some algorithms.

.. _environment_variable:

@@ -179,83 +181,91 @@ Environment variables

These environment variables should be set before importing scikit-learn.

:SKLEARN_ASSUME_FINITE:
`SKLEARN_ASSUME_FINITE`
~~~~~~~~~~~~~~~~~~~~~~~

Sets the default value for the `assume_finite` argument of
:func:`sklearn.set_config`.

`SKLEARN_WORKING_MEMORY`
~~~~~~~~~~~~~~~~~~~~~~~~

Sets the default value for the `working_memory` argument of
:func:`sklearn.set_config`.

`SKLEARN_SEED`
~~~~~~~~~~~~~~

Sets the seed of the global random generator when running the tests, for
reproducibility.

Note that scikit-learn tests are expected to run deterministically with
explicit seeding of their own independent RNG instances instead of relying on
the numpy or Python standard library RNG singletons to make sure that test
results are independent of the test execution order. However, some tests might
forget to use explicit seeding and this variable is a way to control the initial
state of the aforementioned singletons.

`SKLEARN_TESTS_GLOBAL_RANDOM_SEED`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Controls the seeding of the random number generator used in tests that rely on
the `global_random_seed` fixture.

All tests that use this fixture accept the contract that they should
deterministically pass for any seed value from 0 to 99 included.

If the `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` environment variable is set to
`"any"` (which should be the case on nightly builds on the CI), the fixture
will choose an arbitrary seed in the above range (based on the BUILD_NUMBER or
the current day) and all fixtured tests will run for that specific seed. The
goal is to ensure that, over time, our CI will run all tests with different
seeds while keeping the test duration of a single run of the full test suite
limited. This will check that the assertions of tests written to use this
fixture are not dependent on a specific seed value.

The range of admissible seed values is limited to [0, 99] because it is often
not possible to write a test that can work for any possible seed and we want to
avoid having tests that randomly fail on the CI.

Valid values for `SKLEARN_TESTS_GLOBAL_RANDOM_SEED`:

- `SKLEARN_TESTS_GLOBAL_RANDOM_SEED="42"`: run tests with a fixed seed of 42
- `SKLEARN_TESTS_GLOBAL_RANDOM_SEED="40-42"`: run the tests with all seeds
between 40 and 42 included
- `SKLEARN_TESTS_GLOBAL_RANDOM_SEED="any"`: run the tests with an arbitrary
seed selected between 0 and 99 included
- `SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all"`: run the tests with all seeds
between 0 and 99 included. This can take a long time: only use for individual
tests, not the full test suite!

If the variable is not set, then 42 is used as the global seed in a
deterministic manner. This ensures that, by default, the scikit-learn test
suite is as deterministic as possible to avoid disrupting our friendly
third-party package maintainers. Similarly, this variable should not be set in
the CI config of pull-requests to make sure that our friendly contributors are
not the first people to encounter a seed-sensitivity regression in a test
unrelated to the changes of their own PR. Only the scikit-learn maintainers who
watch the results of the nightly builds are expected to be annoyed by this.

When writing a new test function that uses this fixture, please use the
following command to make sure that it passes deterministically for all
admissible seeds on your local machine:

Sets the default value for the `assume_finite` argument of
:func:`sklearn.set_config`.

:SKLEARN_WORKING_MEMORY:

Sets the default value for the `working_memory` argument of
:func:`sklearn.set_config`.

:SKLEARN_SEED:

Sets the seed of the global random generator when running the tests,
for reproducibility.

Note that scikit-learn tests are expected to run deterministically with
explicit seeding of their own independent RNG instances instead of relying
on the numpy or Python standard library RNG singletons to make sure that
test results are independent of the test execution order. However, some
tests might forget to use explicit seeding and this variable is a way to
control the initial state of the aforementioned singletons.

:SKLEARN_TESTS_GLOBAL_RANDOM_SEED:

Controls the seeding of the random number generator used in tests that
rely on the `global_random_seed` fixture.

All tests that use this fixture accept the contract that they should
deterministically pass for any seed value from 0 to 99 included.

If the SKLEARN_TESTS_GLOBAL_RANDOM_SEED environment variable is set to
"any" (which should be the case on nightly builds on the CI), the fixture
will choose an arbitrary seed in the above range (based on the BUILD_NUMBER
or the current day) and all fixtured tests will run for that specific seed.
The goal is to ensure that, over time, our CI will run all tests with
different seeds while keeping the test duration of a single run of the full
test suite limited. This will check that the assertions of tests
written to use this fixture are not dependent on a specific seed value.

The range of admissible seed values is limited to [0, 99] because it is
often not possible to write a test that can work for any possible seed and
we want to avoid having tests that randomly fail on the CI.

Valid values for SKLEARN_TESTS_GLOBAL_RANDOM_SEED:

- SKLEARN_TESTS_GLOBAL_RANDOM_SEED="42": run tests with a fixed seed of 42
- SKLEARN_TESTS_GLOBAL_RANDOM_SEED="40-42": run the tests with all seeds
between 40 and 42 included
- SKLEARN_TESTS_GLOBAL_RANDOM_SEED="any": run the tests with an arbitrary
seed selected between 0 and 99 included
- SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all": run the tests with all seeds
between 0 and 99 included

If the variable is not set, then 42 is used as the global seed in a
deterministic manner. This ensures that, by default, the scikit-learn test
suite is as deterministic as possible to avoid disrupting our friendly
third-party package maintainers. Similarly, this variable should not be set
in the CI config of pull-requests to make sure that our friendly
contributors are not the first people to encounter a seed-sensitivity
regression in a test unrelated to the changes of their own PR. Only the
scikit-learn maintainers who watch the results of the nightly builds are
expected to be annoyed by this.

When writing a new test function that uses this fixture, please use the
following command to make sure that it passes deterministically for all
admissible seeds on your local machine:
.. prompt:: bash $

SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -v -k test_your_test_name
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -v -k test_your_test_name

:SKLEARN_SKIP_NETWORK_TESTS:
`SKLEARN_SKIP_NETWORK_TESTS`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When this environment variable is set to a non-zero value, the tests
that need network access are skipped. When this environment variable is
not set, network tests are also skipped.
When this environment variable is set to a non-zero value, the tests that need
network access are skipped. When this environment variable is not set, network
tests are also skipped.

:SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES:
`SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When this environment variable is set to a non-zero value, the `Cython`
directive `boundscheck` is set to `True`. This is useful for finding
segfaults.
When this environment variable is set to a non-zero value, the `Cython`
directive `boundscheck` is set to `True`. This is useful for finding
segfaults.
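The environment variables documented above are plain strings read from the process environment. As a rough illustration, the sketch below shows one way `SKLEARN_ASSUME_FINITE`, `SKLEARN_WORKING_MEMORY` and `SKLEARN_SKIP_NETWORK_TESTS` could be interpreted. The helpers `default_config_from_env` and `network_tests_skipped` are hypothetical (not scikit-learn API), and the parsing rules (any non-empty `SKLEARN_ASSUME_FINITE` counts as true, `SKLEARN_WORKING_MEMORY` is an integer, only `"0"` enables network tests) are assumptions to check against the scikit-learn source.

```python
import os
from typing import Mapping, Optional


def default_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: map environment variables to set_config defaults.

    None means "variable not set: keep scikit-learn's built-in default".
    Assumption: any non-empty SKLEARN_ASSUME_FINITE string counts as True.
    """
    env = os.environ if env is None else env
    assume_finite = env.get("SKLEARN_ASSUME_FINITE")
    working_memory = env.get("SKLEARN_WORKING_MEMORY")
    return {
        "assume_finite": None if assume_finite is None else bool(assume_finite),
        # Assumption: working_memory is an integer size, so parse it with int().
        "working_memory": None if working_memory is None else int(working_memory),
    }


def network_tests_skipped(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper following the documented SKIP_NETWORK_TESTS rule.

    Unset -> skipped; set to a non-zero value -> skipped.
    Assumption: only the exact value "0" enables the network tests.
    """
    env = os.environ if env is None else env
    return env.get("SKLEARN_SKIP_NETWORK_TESTS", "1") != "0"
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the helpers easy to test; in real use the variables still have to be exported before scikit-learn is imported.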
4 changes: 4 additions & 0 deletions doc/themes/scikit-learn-modern/static/css/theme.css
@@ -48,6 +48,10 @@ h1 code, h2 code, h3 code, h4 code, h5 code, h6 code {
background-color: transparent;
}

h4 .section-number, h5 .section-number, h6 .section-number {
display: none;
}

h1:hover a.headerlink,
h2:hover a.headerlink,
h3:hover a.headerlink,
6 changes: 3 additions & 3 deletions sklearn/tests/random_seed.py
@@ -6,7 +6,7 @@
See the documentation for the SKLEARN_TESTS_GLOBAL_RANDOM_SEED
variable for instructions on how to use this fixture.

https://scikit-learn.org/dev/computing/parallelism.html#environment-variables
https://scikit-learn.org/dev/computing/parallelism.html#sklearn-tests-global-random-seed
"""
import pytest
from os import environ
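The fixture module reads `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` through `os.environ`. The sketch below shows one way the documented values (`"42"`, `"40-42"`, `"any"`, `"all"`, or unset) could be turned into a list of seeds. `parse_seed_spec` is a hypothetical helper, not the actual code in `sklearn/tests/random_seed.py`, and deriving the `"any"` seed by seeding `random.Random` with the build number or today's date is an assumption.

```python
import datetime
import random
from typing import List, Optional

MIN_SEED, MAX_SEED = 0, 99  # documented admissible range: [0, 99]


def parse_seed_spec(value: Optional[str], build_number: Optional[str] = None) -> List[int]:
    """Hypothetical sketch: map a SKLEARN_TESTS_GLOBAL_RANDOM_SEED value to seeds."""
    if value is None:
        return [42]  # variable not set: deterministic default seed
    if value == "all":
        return list(range(MIN_SEED, MAX_SEED + 1))
    if value == "any":
        # Assumption: derive the "arbitrary" seed from BUILD_NUMBER when
        # available, otherwise from the current day, so it changes over time.
        basis = build_number if build_number is not None else datetime.date.today().isoformat()
        return [random.Random(basis).randint(MIN_SEED, MAX_SEED)]
    if "-" in value:
        # Inclusive range such as "40-42" -> [40, 41, 42]
        start, stop = (int(part) for part in value.split("-"))
        seeds = list(range(start, stop + 1))
    else:
        seeds = [int(value)]
    if not all(MIN_SEED <= seed <= MAX_SEED for seed in seeds):
        raise ValueError(f"seeds must be in [{MIN_SEED}, {MAX_SEED}], got {value!r}")
    return seeds
```

Rejecting out-of-range values mirrors the documented contract that fixtured tests only promise to pass for seeds 0 to 99.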
@@ -63,7 +63,7 @@ def global_random_seed(self, request):
See the documentation for the SKLEARN_TESTS_GLOBAL_RANDOM_SEED
variable for instructions on how to use this fixture.

https://scikit-learn.org/dev/computing/parallelism.html#environment-variables
https://scikit-learn.org/dev/computing/parallelism.html#sklearn-tests-global-random-seed
"""
yield request.param

@@ -77,5 +77,5 @@ def pytest_report_header(config):
"To reproduce this test run, set the following environment variable:",
f' SKLEARN_TESTS_GLOBAL_RANDOM_SEED="{config.option.random_seeds[0]}"',
"See: https://scikit-learn.org/dev/computing/parallelism.html"
"#environment-variables",
"#sklearn-tests-global-random-seed",
]
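The updated links point at `#sklearn-tests-global-random-seed`, the id generated for the new `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` section title. The following simplified sketch roughly mirrors the id rule used by docutils' `nodes.make_id` (lowercase, collapse runs of other characters into a hyphen, trim leading digits/hyphens and trailing hyphens). `section_anchor` is a hypothetical name, and the real function additionally normalizes Unicode and whitespace.

```python
import re


def section_anchor(title: str) -> str:
    """Simplified sketch of how a section title becomes an HTML anchor id."""
    anchor = title.lower()
    # Any run of characters outside [a-z0-9] (underscores, spaces, ...) -> "-"
    anchor = re.sub(r"[^a-z0-9]+", "-", anchor)
    # Ids may not start with digits or hyphens, nor end with hyphens.
    anchor = re.sub(r"^[-0-9]+|-+$", "", anchor)
    return anchor
```

For example, `section_anchor("Environment variables")` gives `environment-variables`, the anchor the old links used.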