FIX Fix free-threaded failure because dictionary changed size during iteration #32264

lesteve · 2025-09-24T13:10:32Z

Fix #32087

This was seen in #32087 in different test functions, but the problematic code is the same each time and lives in sklearn/utils/_metadata_requests.py.

I can reproduce locally with the following (it fails ~5-10 times out of 20 on my machine on main):

for i in $(seq 20); do pytest --parallel-threads 10 --iterations 1 sklearn/tests/test_pipeline.py -k 'test_metadata_routing_for_pipeline[decision_function]'; done

Failure details

____________________________________________________________________________________________________________________________ ERROR at call of test_metadata_routing_for_pipeline[decision_function] ____________________________________________________________________________________________________________________________

method = 'decision_function'

    @pytest.mark.parametrize("method", sorted(set(METHODS) - {"split", "partial_fit"}))
    @config_context(enable_metadata_routing=True)
    def test_metadata_routing_for_pipeline(method):
        """Test that metadata is routed correctly for pipelines."""
    
        def set_request(est, method, **kwarg):
            """Set requests for a given method.
    
            If the given method is a composite method, set the same requests for
            all the methods that compose it.
            """
            if method in COMPOSITE_METHODS:
                methods = COMPOSITE_METHODS[method]
            else:
                methods = [method]
    
            for method in methods:
                getattr(est, f"set_{method}_request")(**kwarg)
            return est
    
        X, y = np.array([[1]]), np.array([1])
        sample_weight, prop, metadata = [1], "a", "b"
    
        # test that metadata is routed correctly for pipelines when requested
        est = SimpleEstimator()
>       est = set_request(est, method, sample_weight=True, prop=True)

sklearn/tests/test_pipeline.py:2233: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/tests/test_pipeline.py:2225: in set_request
    getattr(est, f"set_{method}_request")(**kwarg)
sklearn/utils/_metadata_requests.py:1352: in func
    requests = _instance._get_metadata_request()
sklearn/utils/_metadata_requests.py:1540: in _get_metadata_request
    requests=self._get_class_level_metadata_request_values(method),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'sklearn.tests.test_pipeline.SimpleEstimator'>, method = 'predict'

    @classmethod
    def _get_class_level_metadata_request_values(cls, method: str):
        """Get class level metadata request values.
    
        This method first checks the `method`'s signature for passable metadata and then
        updates these with the metadata request values set at class level via the
        ``__metadata_request__{method}`` class attributes.
    
        This method (being a class-method), does not take request values set at
        instance level into account.
        """
        # Here we use `isfunction` instead of `ismethod` because calling `getattr`
        # on a class instead of an instance returns an unbound function.
        if not hasattr(cls, method) or not inspect.isfunction(getattr(cls, method)):
            return dict()
        # ignore the first parameter of the method, which is usually "self"
        signature_items = list(
            inspect.signature(getattr(cls, method)).parameters.items()
        )[1:]
        params = defaultdict(
            str,
            {
                param_name: None
                for param_name, param_info in signature_items
                if param_name not in {"X", "y", "Y", "Xt", "yt"}
                and param_info.kind
                not in {param_info.VAR_POSITIONAL, param_info.VAR_KEYWORD}
            },
        )
        # Then overwrite those defaults with the ones provided in
        # `__metadata_request__{method}` class attributes, which take precedence over
        # signature sniffing.
    
        # need to go through the MRO since this is a classmethod and
        # ``vars`` doesn't report the parent class attributes. We go through
        # the reverse of the MRO so that child classes have precedence over
        # their parents.
        substr = f"__metadata_request__{method}"
        for base_class in reversed(inspect.getmro(cls)):
>           for attr, value in vars(base_class).items():
E           RuntimeError: dictionary changed size during iteration

sklearn/utils/_metadata_requests.py:1497: RuntimeError

Here is my current understanding of the problem:

When debugging what changed in the dictionary when it fails, it is always the __slotnames__ attribute that has been added.
The __slotnames__ attribute is added to a class when copy.deepcopy is called on an instance of this class, and the metadata routing code uses copy.deepcopy.
In this metadata routing code, we only care about attributes that start with __metadata_request__. We don't care whether __slotnames__ has been added or not.
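
To make the mechanism above concrete, here is a minimal single-threaded sketch. It is not the scikit-learn code: `make_estimator_class` and `scan` are hypothetical names, and the call to `copy.copy` inside the loop only stands in for what another thread could do at that moment. The filter uses a substring check as an assumption for illustration.

```python
import copy


def make_estimator_class():
    # Hypothetical stand-in for an estimator class. A fresh class is built on
    # each call so that __slotnames__ has not been cached on it yet.
    return type(
        "Estimator", (), {"__metadata_request__fit": {"sample_weight": True}}
    )


def scan(cls, snapshot):
    """Collect class attributes whose name contains "__metadata_request__"."""
    # With snapshot=True we iterate over a copy of the class dictionary,
    # otherwise over a live view of it.
    items = vars(cls).copy().items() if snapshot else vars(cls).items()
    found = {}
    for attr, value in items:
        # Stand-in for another thread copying an instance mid-iteration:
        # the first copy caches __slotnames__ on the class.
        copy.copy(cls())
        if "__metadata_request__" in attr:
            found[attr] = value
    return found


try:
    scan(make_estimator_class(), snapshot=False)
    unsafe_outcome = "no error"
except RuntimeError as exc:
    unsafe_outcome = str(exc)

safe_result = scan(make_estimator_class(), snapshot=True)
print(unsafe_outcome)
print(safe_result)
```

Iterating over the live view fails as soon as the class dictionary grows, while iterating over the snapshot is unaffected because the new key lands in the original dictionary only.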

If we want to avoid making a copy, I guess another option would be to use a lock to ensure that the for loop and the copy.deepcopy cannot happen at the same time, but that seems a bit more complicated than making the copy.
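
For comparison, the lock-based alternative could look roughly like the sketch below. Everything here is an assumption for illustration: `_class_attrs_lock`, `scan_metadata_attrs`, `guarded_deepcopy` and `run_demo` are hypothetical names and none of them exist in scikit-learn.

```python
import copy
import threading

# Hypothetical module-level lock shared by the scan and the copy.
_class_attrs_lock = threading.Lock()


def scan_metadata_attrs(cls):
    # Iterate over the class attributes while holding the lock.
    with _class_attrs_lock:
        return {
            attr: value
            for attr, value in vars(cls).items()
            if "__metadata_request__" in attr
        }


def guarded_deepcopy(obj):
    # Any __slotnames__ caching triggered by the copy happens under the lock.
    with _class_attrs_lock:
        return copy.deepcopy(obj)


def run_demo(n_classes=200, n_threads=4):
    # Classes shared by all threads, each with one metadata request attribute.
    classes = [
        type("Estimator", (), {"__metadata_request__fit": {"sample_weight": True}})
        for _ in range(n_classes)
    ]
    errors = []

    def worker(index):
        try:
            for cls in classes:
                if index % 2:
                    guarded_deepcopy(cls())
                else:
                    assert len(scan_metadata_attrs(cls)) == 1
        except Exception as exc:  # collect failures instead of losing them
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


errors = run_demo()
print(errors)
```

The drawback hinted at above is visible in the sketch: the lock only helps if every code path that can add an attribute to the class goes through it, whereas the copy needs a change in a single place.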

github-actions · 2025-09-24T13:11:34Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e305d1a.

adrinjalali · 2025-09-25T10:31:09Z

sklearn/utils/_metadata_requests.py

+            # Copy is needed with free-threaded context to avoid
+            # RuntimeError: dictionary changed size during iteration.
+            # copy.deepcopy applied on an instance of base_class adds
+            # __slotnames__ attribute to base_class.
+            base_class_items = vars(base_class).copy().items()
+            for attr, value in base_class_items:


Where's this copy coming from? Where do we have the slot names (other than tags)?

Also, I think this only lowers the probability of encountering the issue, since you can still hit it in the middle of vars(base_class).copy()?

I'd like to understand where the issue actually comes from. __slotnames__ seems to be just there:

>>> class Test:
...     pass
...
>>> a = Test()
>>> a.b = 10
>>> import copy
>>> dir(copy.deepcopy(a))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', 'b']
>>> dir(a)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', 'b']

and not necessarily added?

lesteve · 2025-09-25

To try to make my original statement more precise: __slotnames__ is not present after the class definition but is added when you do a copy (or deepcopy).

import copy

class A:
    pass

print(f'after class definition: {"__slotnames__" in vars(A)=}')
copy.copy(A())
print(f'after copy: {"__slotnames__" in vars(A)=}')

Output:

after class definition: "__slotnames__" in vars(A)=False
after copy: "__slotnames__" in vars(A)=True

Here is the stack-trace that shows where the addition of the __slotnames__ attribute (through deepcopy) comes from:

-> self._bootstrap_inner()
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/threading.py(1081)_bootstrap_inner()
-> self._context.run(self.run)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/threading.py(1023)run()
-> self._target(*self._args, **self._kwargs)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/site-packages/pytest_run_parallel/plugin.py(60)closure()
-> fn(*args, **kwargs)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/contextlib.py(85)inner()
-> return func(*args, **kwds)
  /home/lesteve/dev/scikit-learn/sklearn/tests/test_pipeline.py(2234)test_metadata_routing_for_pipeline()
-> est = set_request(est, "fit", sample_weight=True, prop=True)
  /home/lesteve/dev/scikit-learn/sklearn/tests/test_pipeline.py(2225)set_request()
-> getattr(est, f"set_{method}_request")(**kwarg)
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1352)func()
-> requests = _instance._get_metadata_request()
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1530)_get_metadata_request()
-> requests = get_routing_for_object(self._metadata_request)
  /home/lesteve/dev/scikit-learn/sklearn/utils/_metadata_requests.py(1218)get_routing_for_object()
-> return deepcopy(obj)
  /home/lesteve/micromamba/envs/py314t/lib/python3.14t/copy.py(146)deepcopy()
-> rv = reductor(4)
> /home/lesteve/micromamba/envs/py314t/lib/python3.14t/copyreg.py(160)_slotnames()

lesteve added 2 commits September 24, 2025 14:52

FIX Fix free-threaded failure because dictionary changed size during iteration (33f1d0d)

[free-threaded] (e305d1a)

github-actions bot added the module:utils label Sep 24, 2025

adrinjalali reviewed Sep 25, 2025

adrinjalali approved these changes Sep 25, 2025
