Description
Describe the workflow you want to enable
I would like to be able to set n_jobs=-1 in one place and have this take effect in any function with an n_jobs parameter. Same for random_state.
Perhaps there are other parameters that fit the theme: "if a user sets this in one instance, they probably want to set it in all instances".
Describe your proposed solution
Expand the accepted parameters of sklearn.set_config, and update all functions to fall back to the config value if the parameter isn't passed.
This would require changing a default of None to a sentinel (à la _NoValue in NumPy), to allow a user to override the global config with the value None, while still allowing the code to detect whether the argument was passed.
Rudimentary mockup:
_sentinel = object()  # unique marker meaning "argument was not passed"
config = {}  # stand-in for the global config

def resolve_arg_value(arg_name, passed_value, default_value):
    # An explicitly passed value (including None) always wins.
    if passed_value is not _sentinel:
        return passed_value
    # Otherwise fall back to the global config, then to the default.
    if arg_name in config:
        return config[arg_name]
    return default_value

def do_something(random_state=_sentinel):
    random_state = resolve_arg_value("random_state", random_state, None)
    print(f"{random_state!r}")

do_something('Hi')  # 'Hi'
do_something()      # None
do_something(None)  # None

config['random_state'] = 77
do_something('Hi')  # 'Hi'
do_something()      # 77
do_something(None)  # None

config.pop('random_state')
do_something('Hi')  # 'Hi'
do_something()      # None
do_something(None)  # None
I'll admit, having to add something like resolve_arg_value("random_state", random_state, None) to tons of functions sounds painful, but I think for the user, being able to set and forget a random state and other params, potentially based on an environment variable, would be nice.
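# Proposed usage: relies on the expanded set_config described above.
import os
import sklearn

sklearn.set_config(
    # .get avoids a KeyError when PROD is unset
    random_state=None if os.environ.get("PROD") else 0,
    n_jobs=-1,
)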
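One way the per-function boilerplate might be reduced is a decorator that resolves sentinel-valued arguments in one place. This is only a rough sketch building on the mockup above: `uses_global_config` is a hypothetical name, `config` is the same stand-in dict rather than scikit-learn's real config, and nothing here reflects existing scikit-learn code.

```python
import functools
import inspect

_sentinel = object()  # marks "argument was not passed"
config = {}           # stand-in for the global config (assumption)


def uses_global_config(**fallbacks):
    """Hypothetical decorator: fill sentinel-valued arguments from config.

    Each keyword maps an argument name to the default used when neither
    the caller nor the global config supplies a value.
    """
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, default in fallbacks.items():
                # Only arguments the caller left untouched are resolved.
                if bound.arguments[name] is _sentinel:
                    bound.arguments[name] = config.get(name, default)
            return func(*bound.args, **bound.kwargs)

        return wrapper
    return decorator


@uses_global_config(random_state=None, n_jobs=None)
def do_something(random_state=_sentinel, n_jobs=_sentinel):
    return random_state, n_jobs


config["n_jobs"] = -1
print(do_something())             # (None, -1)
print(do_something(n_jobs=None))  # (None, None)
```

With this shape, each function would only need the decorator and sentinel defaults, at the cost of some signature-binding overhead on every call.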
Describe alternatives you've considered, if relevant
No response
Additional context
I see that the tests already get a global random seed: #22749