Making the extension contract stable through version upgrades #31728

Open
@fkiraly

Describe the workflow you want to enable

Currently, every time scikit-learn releases a new minor version (e.g., 1.5.0, 1.6.0, 1.7.0), compliant extensions such as custom transformers and classifiers break. Specifically, they stop passing the API conformance checks run through check_estimator or parametrize_with_checks.

These repeated breakages in the "extender contract" contrast with the usage contract, which is stable and professionally managed.

For a package like scikit-learn, which aims to be a standard not just for ML algorithms but also for the API that everyone uses, this is not a good state to be in: "do not break user code" is the maxim that gets broken for power users writing extensions.

Of course, maintaining backward compatibility is not always possible, but nothing should break without a proper warning.

Describe your proposed solution

The main reason this keeps happening, imo, is that scikit-learn does not use a design pattern that ensures stability of the extension contract, nor any secondary deprecation patterns around it.
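To illustrate what a deprecation pattern for the extension contract could look like, here is a minimal sketch. All names (`BaseSketch`, `validate_inputs`, `_validate_inputs`, `LegacyExtension`) are hypothetical and are not scikit-learn APIs; the point is only that a renamed helper keeps working for one cycle and warns instead of breaking.

```python
import warnings


class BaseSketch:
    """Hypothetical base class; names are illustrative, not scikit-learn's."""

    def validate_inputs(self, X):
        # New name of the helper that extensions are expected to call.
        return [list(row) for row in X]

    def _validate_inputs(self, X):
        # Old name, kept as a thin alias for a deprecation cycle so that
        # extensions written against it warn instead of failing.
        warnings.warn(
            "_validate_inputs is deprecated; use validate_inputs instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.validate_inputs(X)


class LegacyExtension(BaseSketch):
    """Third-party estimator still written against the old helper name."""

    def fit(self, X, y=None):
        X = self._validate_inputs(X)
        self.n_rows_ = len(X)
        return self
```

With this in place, `LegacyExtension().fit(...)` still runs and emits a `FutureWarning` that tells the extension author what to change.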

A simple pattern that could improve a lot would be the template method pattern, applied specifically to separate the parts likely to change, such as boilerplate (e.g., validate_data vs _validate_data), from the extension locus.
Reference: https://refactoring.guru/design-patterns/template-method

Examples of how this can be used to improve stability:

  • sktime, for a different API, has a separation between fit calling an internal _fit, where change-prone boilerplate is sandwiched between a stable user contract (fit) and a stable extender contract (_fit); similarly predict and _predict
  • feature-engine overrides the BaseTransformer scikit-learn extension contract with a similar pattern using super() calls in fit etc.
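The fit/_fit separation described in these examples can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: `TemplateEstimator`, `_check_inputs`, and `MajorityClassifier` are hypothetical names, and the code is not taken from sktime, feature-engine, or scikit-learn.

```python
class TemplateEstimator:
    """Hypothetical base class showing the fit/_fit template split."""

    def fit(self, X, y=None):
        # Stable user contract: public signature and return value stay fixed.
        X, y = self._check_inputs(X, y)  # change-prone boilerplate
        self._fit(X, y)                  # stable extender contract
        self.is_fitted_ = True
        return self

    def predict(self, X):
        if not getattr(self, "is_fitted_", False):
            raise RuntimeError("call fit before predict")
        X, _ = self._check_inputs(X, None)
        return self._predict(X)

    def _check_inputs(self, X, y):
        # Internal boilerplate: free to change between releases because
        # extensions never call or override it.
        X = [list(row) for row in X]
        if y is not None:
            y = list(y)
            if len(y) != len(X):
                raise ValueError("X and y have different lengths")
        return X, y

    def _fit(self, X, y):
        # Extension point: subclasses implement only this.
        raise NotImplementedError

    def _predict(self, X):
        # Extension point: subclasses implement only this.
        raise NotImplementedError


class MajorityClassifier(TemplateEstimator):
    """Example extension: touches only the _fit/_predict hooks."""

    def _fit(self, X, y):
        self.majority_ = max(set(y), key=y.count)

    def _predict(self, X):
        return [self.majority_ for _ in X]
```

An extension author writes only `_fit` and `_predict`; validation and fitted-state bookkeeping live in the base class and can be refactored without touching extension code.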

In particular, the fit/_fit pairing, which combines the strategy and template patterns, can be introduced easily via pure internal refactoring. I would go with that, and also add a layer of tests that checks the stability of the extension contract.
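One possible shape for such a test layer is sketched below. It assumes a frozen description of the extension hooks and their parameter names; `BaseEstimatorSketch`, `EXTENSION_CONTRACT`, `PUBLIC_METHODS`, and `check_extension_contract` are all hypothetical and only illustrate the idea.

```python
import inspect


class BaseEstimatorSketch:
    """Hypothetical base class with a fit/_fit split."""

    def fit(self, X, y=None):
        self._fit(X, y)
        return self

    def _fit(self, X, y):
        raise NotImplementedError


# Frozen extension contract (an assumption of this sketch):
# hook name -> expected parameter names.
EXTENSION_CONTRACT = {"_fit": ["self", "X", "y"]}
PUBLIC_METHODS = ["fit"]


def check_extension_contract(base, extension):
    """Return a list of contract violations; an empty list means compliant."""
    problems = []
    for name, params in EXTENSION_CONTRACT.items():
        # Guard against the base class silently changing a hook signature.
        base_params = list(inspect.signature(getattr(base, name)).parameters)
        if base_params != params:
            problems.append(f"base changed the signature of {name}")
        # Check that the extension implements the hook as specified.
        ext_params = list(inspect.signature(getattr(extension, name)).parameters)
        if ext_params != params:
            problems.append(f"{extension.__name__}.{name} takes {ext_params}")
    for name in PUBLIC_METHODS:
        # In this sketch, extensions are expected to leave public methods alone.
        if getattr(extension, name) is not getattr(base, name):
            problems.append(f"{extension.__name__} overrides public {name}")
    return problems


class GoodExtension(BaseEstimatorSketch):
    def _fit(self, X, y):
        self.n_rows_ = len(X)


class BadExtension(BaseEstimatorSketch):
    def fit(self, X, y=None):  # overrides the user-facing method
        return self

    def _fit(self, X):  # hook signature drifts from the contract
        pass
```

Run in the library's own test suite against a frozen set of reference extensions, a check of this kind would flag an accidental change to a hook before release.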

As a side effect, this would likely reduce the amount of copy-pasted boilerplate among the core estimators in scikit-learn proper as well.

Describe alternatives you've considered, if relevant

The obvious alternative is "do nothing" - which will lead to packages with extensions producing their own version of mitigating patterns for the unstable status quo.

The problem with this is that the extending packages are mutually incompatible in their extender patterns. It would be better to have a single stable extension pattern prescribed, or at least recommended, by scikit-learn.

Additional context

No response
