Add n_jobs and random_state to global config #23732

Open
@davidgilbertson

Description

Describe the workflow you want to enable

I would like to be able to set n_jobs=-1 in one place and have this take effect in any function with an n_jobs parameter. Same for random_state.

Perhaps there are other parameters that fit the theme: "if a user sets this in one instance, they probably want to set it in all instances".

Describe your proposed solution

Expand the accepted parameters of sklearn.set_config, and update all functions to fall back to the config value if the parameter isn't passed.

This would require changing each default of None to a sentinel (à la _NoValue in NumPy), so that a user can still override the global config by explicitly passing None, while the code can detect whether the argument was passed at all.

Rudimentary mockup:

_sentinel = object()

config = {}


def resolve_arg_value(arg_name, passed_value, default_value):
    if passed_value is not _sentinel:
        return passed_value
    
    if arg_name in config:
        return config[arg_name]
    
    return default_value


def do_something(random_state=_sentinel):
    random_state = resolve_arg_value("random_state", random_state, None)
    print(f"{random_state!r}")  # repr, so strings print with quotes as in the comments below


do_something('Hi')  # 'Hi'
do_something()  # None
do_something(None)  # None

config['random_state'] = 77

do_something('Hi')  # 'Hi'
do_something()  # 77
do_something(None)  # None

config.pop('random_state')

do_something('Hi')  # 'Hi'
do_something()  # None
do_something(None)  # None

I'll admit that adding something like resolve_arg_value("random_state", random_state, None) to tons of functions sounds painful, but I think users would benefit from being able to set and forget a random state or other parameters, potentially based on an environment variable:

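import os
import sklearn

# Proposed usage: these set_config parameters do not exist yet.
sklearn.set_config(
    random_state=None if os.environ.get("PROD") else 0,  # .get() tolerates an unset PROD
    n_jobs=-1,
)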
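
On the pain point of calling resolve_arg_value in every function body: one possible way to reduce the boilerplate is a decorator that performs the same lookup. The following is only a rough sketch under assumptions, not a proposal for how scikit-learn should implement it. The names uses_global_config, config, and _sentinel are hypothetical and exist only for this illustration.

```python
import functools
import inspect

_sentinel = object()  # marks "argument was not passed"
config = {}  # hypothetical stand-in for the global config


def uses_global_config(**fallbacks):
    """Hypothetical decorator: fill sentinel-valued arguments from `config`.

    `fallbacks` maps a parameter name to the value used when the argument
    was not passed and the name is not set in `config`.
    """

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # unpassed parameters become _sentinel
            for name, default in fallbacks.items():
                if bound.arguments[name] is _sentinel:
                    bound.arguments[name] = config.get(name, default)
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@uses_global_config(random_state=None, n_jobs=None)
def do_something(random_state=_sentinel, n_jobs=_sentinel):
    # By the time the body runs, the sentinel has been resolved.
    return random_state, n_jobs
```

With this sketch, each function still needs the sentinel as its default, but the resolution logic lives in one place. An explicit None passed by the caller is preserved, because only the sentinel triggers the config lookup.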

Describe alternatives you've considered, if relevant

No response

Additional context

I see that the tests already get a global random seed (#22749).
