lesteve
Member

@lesteve lesteve commented Sep 24, 2025

Fix #32087

The failures in #32087 show up in different test functions, but the problematic code is the same each time, in sklearn/utils/_metadata_requests.py.

I can reproduce locally with the following (it fails ~5-10 times out of 20 on my machine on main):

for i in $(seq 20); do pytest --parallel-threads 10 --iterations 1 sklearn/tests/test_pipeline.py -k 'test_metadata_routing_for_pipeline[decision_function]'; done
Failure details
____________________________________________________________________________________________________________________________ ERROR at call of test_metadata_routing_for_pipeline[decision_function] ____________________________________________________________________________________________________________________________

method = 'decision_function'

    @pytest.mark.parametrize("method", sorted(set(METHODS) - {"split", "partial_fit"}))
    @config_context(enable_metadata_routing=True)
    def test_metadata_routing_for_pipeline(method):
        """Test that metadata is routed correctly for pipelines."""
    
        def set_request(est, method, **kwarg):
            """Set requests for a given method.
    
            If the given method is a composite method, set the same requests for
            all the methods that compose it.
            """
            if method in COMPOSITE_METHODS:
                methods = COMPOSITE_METHODS[method]
            else:
                methods = [method]
    
            for method in methods:
                getattr(est, f"set_{method}_request")(**kwarg)
            return est
    
        X, y = np.array([[1]]), np.array([1])
        sample_weight, prop, metadata = [1], "a", "b"
    
        # test that metadata is routed correctly for pipelines when requested
        est = SimpleEstimator()
>       est = set_request(est, method, sample_weight=True, prop=True)

sklearn/tests/test_pipeline.py:2233: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/tests/test_pipeline.py:2225: in set_request
    getattr(est, f"set_{method}_request")(**kwarg)
sklearn/utils/_metadata_requests.py:1352: in func
    requests = _instance._get_metadata_request()
sklearn/utils/_metadata_requests.py:1540: in _get_metadata_request
    requests=self._get_class_level_metadata_request_values(method),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'sklearn.tests.test_pipeline.SimpleEstimator'>, method = 'predict'

    @classmethod
    def _get_class_level_metadata_request_values(cls, method: str):
        """Get class level metadata request values.
    
        This method first checks the `method`'s signature for passable metadata and then
        updates these with the metadata request values set at class level via the
        ``__metadata_request__{method}`` class attributes.
    
        This method (being a class-method), does not take request values set at
        instance level into account.
        """
        # Here we use `isfunction` instead of `ismethod` because calling `getattr`
        # on a class instead of an instance returns an unbound function.
        if not hasattr(cls, method) or not inspect.isfunction(getattr(cls, method)):
            return dict()
        # ignore the first parameter of the method, which is usually "self"
        signature_items = list(
            inspect.signature(getattr(cls, method)).parameters.items()
        )[1:]
        params = defaultdict(
            str,
            {
                param_name: None
                for param_name, param_info in signature_items
                if param_name not in {"X", "y", "Y", "Xt", "yt"}
                and param_info.kind
                not in {param_info.VAR_POSITIONAL, param_info.VAR_KEYWORD}
            },
        )
        # Then overwrite those defaults with the ones provided in
        # `__metadata_request__{method}` class attributes, which take precedence over
        # signature sniffing.
    
        # need to go through the MRO since this is a classmethod and
        # ``vars`` doesn't report the parent class attributes. We go through
        # the reverse of the MRO so that child classes have precedence over
        # their parents.
        substr = f"__metadata_request__{method}"
        for base_class in reversed(inspect.getmro(cls)):
>           for attr, value in vars(base_class).items():
E           RuntimeError: dictionary changed size during iteration

sklearn/utils/_metadata_requests.py:1497: RuntimeError

Here is my current understanding of the problem:

  • when debugging what has changed in the dictionary at the time of the failure, it is always the __slotnames__ attribute that has been added
  • the __slotnames__ attribute is added to a class when copy.deepcopy is called on an instance of this class, and the metadata routing code uses copy.deepcopy.
  • in this metadata routing code, we only care about attributes that start with __metadata_request__. We don't care whether __slotnames__ has been added or not.
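
The failure mode described in the bullets above does not need threads to show up. Here is a minimal single-threaded sketch, assuming a CPython version where copy.copy caches __slotnames__ on the class (as in the traceback above). The class A and the function name are made up for illustration. The copy is triggered from inside the loop to mimic what another thread can do in the free-threaded build:

```python
import copy


def reproduce_single_threaded():
    """Return the error message raised when the class dict grows mid-iteration."""

    # Fresh class, so __slotnames__ has not been cached on it yet.
    class A:
        pass

    assert "__slotnames__" not in vars(A)
    try:
        for attr, value in vars(A).items():
            # Copying an instance adds __slotnames__ to vars(A) while we
            # are still iterating over it.
            copy.copy(A())
    except RuntimeError as exc:
        return str(exc)
    return None


message = reproduce_single_threaded()
print(message)
```

On the affected code path the two steps happen in different threads, which is why the failure is intermittent rather than systematic.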

If we want to avoid doing a copy, I guess another option would be to use a lock to ensure that the for loop and the copy.deepcopy cannot happen at the same time, but that seems a bit more complicated than making the copy.
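
For comparison, the lock-based alternative could look roughly like the sketch below. This is not what the PR does: _class_dict_lock, collect_request_attrs and locked_deepcopy are hypothetical names that do not exist in scikit-learn, and the sketch only protects copies that go through locked_deepcopy.

```python
import copy
import inspect
import threading

# Hypothetical module-level lock; nothing like this exists in scikit-learn.
_class_dict_lock = threading.Lock()


def collect_request_attrs(cls, method):
    """Collect class attributes whose name contains the request marker."""
    substr = f"__metadata_request__{method}"
    found = {}
    with _class_dict_lock:
        # Same traversal as the loop shown in the traceback above.
        for base_class in reversed(inspect.getmro(cls)):
            for attr, value in vars(base_class).items():
                # Substring match: a double-underscore attribute defined in
                # a class body is name-mangled with the class name.
                if substr in attr:
                    found[attr] = value
    return found


def locked_deepcopy(obj):
    """Deep-copy while holding the same lock as the loop above."""
    with _class_dict_lock:
        return copy.deepcopy(obj)
```

Any copy.copy, copy.deepcopy or pickling of an instance done elsewhere without taking the lock would still be able to add __slotnames__ during the loop, which is part of why this option is more involved than copying the dictionary.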

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e305d1a.

Comment on lines +1497 to +1502
# Copy is needed with free-threaded context to avoid
# RuntimeError: dictionary changed size during iteration.
# copy.deepcopy applied on an instance of base_class adds
# __slotnames__ attribute to base_class.
base_class_items = vars(base_class).copy().items()
for attr, value in base_class_items:
Member

Where's this copy coming from? Where do we have the slot names (other than tags)?

Also, I think this only lowers the probability of encountering the issue, since you can still hit it in the middle of vars(base_class).copy()?

I'd like to understand where the issue actually comes from. __slotnames__ seems to be just there:

>>> class Test:
...     pass
...     
>>> a = Test()
>>> a.b = 10
>>> import copy
>>> dir(copy.deepcopy(a))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', 'b']
>>> dir(a)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', 'b']

and not necessarily added?

Member Author

@lesteve lesteve Sep 25, 2025

To try to make my original statement more precise: __slotnames__ is not present after the class definition but is added when you do a copy (or deepcopy).

import copy

class A: pass

print(f'after class definition: {"__slotnames__" in vars(A)=}')
copy.copy(A())
print(f'after copy: {"__slotnames__" in vars(A)=}')

Output:

after class definition: "__slotnames__" in vars(A)=False
after copy: "__slotnames__" in vars(A)=True

Here is the stack-trace that shows where the addition of the __slotnames__ attribute (through deepcopy) comes from:

-> self._bootstrap_inner()
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/threading.py(1081)_bootstrap_inner()
-> self._context.run(self.run)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/threading.py(1023)run()
-> self._target(*self._args, **self._kwargs)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/site-packages/pytest_run_parallel/plugin.py(60)closure()
-> fn(*args, **kwargs)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/contextlib.py(85)inner()
-> return func(*args, **kwds)
  /home/lesteve/dev/scikit-learn/sklearn/tests/test_pipeline.py(2234)test_metadata_routing_for_pipeline()
-> est = set_request(est, "fit", sample_weight=True, prop=True)
  /home/lesteve/dev/scikit-learn/sklearn/tests/test_pipeline.py(2225)set_request()
-> getattr(est, f"set_{method}_request")(**kwarg)
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1352)func()
-> requests = _instance._get_metadata_request()
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1530)_get_metadata_request()
-> requests = get_routing_for_object(self._metadata_request)
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1218)get_routing_for_object()
-> return deepcopy(obj)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/copy.py(146)deepcopy()
-> rv = reductor(4)
> /home/lesteve/micromamba/envs/py314t/lib/python3.14t/copyreg.py(160)_slotnames()

Successfully merging this pull request may close these issues.

⚠️ CI failed on Linux_free_threaded.pylatest_free_threaded (last failure: Sep 25, 2025) ⚠️