
RFC: Breaking Changes for Version 2 #25776


Closed
adrinjalali opened this issue Mar 7, 2023 · 35 comments
Labels: Breaking Change (issue resolution would not be easily handled by the usual deprecation cycle), RFC

Comments

@adrinjalali
Member

A while ago we talked about the possibility of a version 2 with breaking changes. Specifically, this came up in the context of SLEP006: metadata routing, but there are other things we have wanted to break which we can do all in a single release. In this issue we can talk about the possibility of such a release, and maybe a few things we'd like to include in it.

I'll try to summarize the challenges of keeping backward compatibility (BC) in the context of SLEP006:

  • MetaEstimator(Estimator()).fit(X, y, sample_weight) will raise according to SLEP006, since sample_weight is not requested by any object. For backward compatibility, during the transition period, it'll only warn and assume sample_weight is requested by Estimator().
  • What should the behavior of GridSearchCV(Estimator(), scorer=scorer).fit(X, y, sample_weight) be during the transition period? If we keep backward compatibility, it should warn (whatever the warning may be) and route sample_weight to Estimator, but not to the scorer. And that's only the case because we're keeping BC; from the user's perspective, it's very weird.
  • Pipeline, at the moment, routes *_params only to the fit/etc. of the last estimator in certain methods like fit, fit_transform, and predict, while decision_function, transform, and score_samples have no routing at all. Keeping this backward compatible proves to be challenging, and a bit nasty. ref: ENH SLEP006: add routing to Pipeline #24270
  • In quite a few meta-estimators, we check whether the sub-estimator has sample_weight in the signature of its fit, and pass the given sample weight to it only in that case. With routing, we would have a much better idea of when to pass sample weight and when not to, but keeping BC makes it really hard to tell whether we should forward sample weights. Both AdaBoost (FEAT add routing to AdaBoost's fit #24026) and Bagging (SLEP006: Metadata routing for bagging #24250) have been very challenging, to the point that we might end up having to check whether a sub-estimator is itself a meta-estimator, check the signature of the sub-estimator's fit, and check the given routing info, all just to keep BC (see the sketch after this list).
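To make the last bullet concrete, here is a minimal sketch of the kind of signature check meta-estimators rely on today (illustration only, not scikit-learn's actual code; sklearn's has_fit_parameter helper does roughly this):

from inspect import signature

def accepts_sample_weight(estimator):
    # True if the sub-estimator's fit explicitly lists sample_weight; this is
    # what decides the routing today, whereas under SLEP006 the explicit
    # request set via set_fit_request would decide instead
    return "sample_weight" in signature(estimator.fit).parameters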

I'm pretty sure there are other issues we haven't foreseen, and they'll come up as we work on this project. We thought it'd be easy once we got to this point, but BC has proven to make it very tricky and has stalled the progress.

All of the above adds substantial complexity, which is error-prone and probably buggy. We have had to spend quite a bit of time every time we go back to review those bits, just to understand why we did what we did. Even as a temporary solution, they're not very maintainable.

My proposal here is to have a version 2 which is not BC with previous releases in certain ways, and to target a few breaking changes for it. We can aim to release it as version 2 in October 2023 or April/May 2024.

cc @scikit-learn/core-devs

@github-actions github-actions bot added the Needs Triage Issue requires triage label Mar 7, 2023
@adrinjalali adrinjalali added RFC Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. and removed Needs Triage Issue requires triage labels Mar 7, 2023
@thomasjpfan
Member

I prefer not to introduce multiple breaking changes in a single release. This would make it much harder to update to v2. If we need to release a breaking v2, I prefer to only break for SLEP006 and nothing else. From reviewing the PRs in SLEP006, I agree it is much clearer to release SLEP006 with breaking changes.

Assuming we adopt SLEP006 with breaking changes in v2, my remaining concerns are:

  1. Silently changing behavior. Are there cases where existing code works, but will have new behavior under SLEP006 without raising an error?
  2. Migration guide for third party estimator developers. Given that v2 is a breaking change, it will take a little longer for users to move to v2, so it's reasonable to say that third party estimator developers would want to support v1 and v2 for a little while. How much work does a third party developer need to do to make sure their code works with v1 and v2?

@adrinjalali
Member Author

Silently changing behavior. Are there cases where existing code works, but will have new behavior under SLEP006 without raising an error?

AFAICT, no. All previously magically working code will raise and expect the user to explicitly specify routing requests.

Migration guide for third party estimator developers. Given that v2 is a breaking change, it will take a little longer for users to move to v2, so it's reasonable to say that third party estimator developers would want to support v1 and v2 for a little while. How much work does a third party developer need to do to make sure their code works with v1 and v2?

For third party estimators, if:

  • they inherit from BaseEstimator
  • their required/accepted parameters for their fit/predict/etc methods are explicitly specified in those methods rather than *args and **kwargs

there is no work to be done. Everything works out of the box.
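For illustration, a minimal sketch of such a third-party estimator (the class and its behavior are hypothetical): it inherits from BaseEstimator and spells out its fit parameters explicitly instead of taking **kwargs, so the routing machinery can pick them up.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class ThirdPartyClassifier(ClassifierMixin, BaseEstimator):
    def fit(self, X, y, sample_weight=None):
        # the sample_weight request is derived from this explicit signature
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # trivial placeholder prediction
        return np.full(shape=len(X), fill_value=self.classes_[0])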

For meta-estimators, they can check our version and do either the old routing or the new one, but I'm not sure we want to recommend that, since their users' experience would depend on which sklearn version is installed. They are probably better off doing a breaking-change release the same way we do.

For people who don't want to inherit from BaseEstimator, they're kinda on their own. But if I have the time, I might submit a PR to one of them to have an example of how it can be done. They'd need to vendor different versions of our routing logic.

@glemaitre
Member

they can check our version, and do the old routing or the new one, but I'm not sure if we want to recommend that

Having a library that silently behaves differently depending on the version of a dependency is a killer. We should probably make explicit recommendations.

They'd need to vendor different versions of our routing logic.

From some experience in imbalanced-learn, it is quite nice when you can vendor a framework by copy-pasting a single file. Here, since there are some breaking changes involved and due to the previous point, I would think that you can indeed import from scikit-learn directly instead of vendoring the file.

IMO, code-wise it is worth making the breaking change. As a user, I don't think it is a big issue to start raising an error instead of a BC warning, because it is mostly linked to cases where careful methodological attention should be paid to the delegation of the weights.

Thus, my only concern is third-party libraries: if a package starts to pin for sklearn>=2.0 (because we advocate for breaking changes) and has dependencies that also depend on scikit-learn but that did not make the switch to the new API, then you end up with an awful user experience because you cannot install the package. I don't know how to make this transition as smooth as possible, indeed.

@jeremiedbb
Member

Does it necessarily have to be a breaking change? I'm thinking that we could have a long-running deprecation and raise a FutureWarning saying that the behavior will break in 2.0, even if it lasts longer than 2 releases. Obviously users will start getting tons of warnings, but we could add a global config option to disable all these warnings.

@adrinjalali
Member Author

Thus, my only concern is third-party libraries: if a package starts to pin for sklearn>=2.0 (because we advocate for breaking changes) and has dependencies that also depend on scikit-learn but that did not make the switch to the new API, then you end up with an awful user experience because you cannot install the package. I don't know how to make this transition as smooth as possible, indeed.

That to me is less of a concern. For those packages, pip will choose a pre-2.0 sklearn to install. This is a very normal thing to me. Users can't immediately use a new tensorflow version, or a new python version, or any other dependency when there is a new release; they have to wait until the packages they use support the new version.

But to make the transition easier, we can certainly help third parties with the adaptation. Again, this only affects meta-estimators and people who don't inherit from BaseEstimator.

Does it necessarily have to be a breaking change ?

@jeremiedbb the whole point of this issue is that, from our experience and for the reasons I outlined above, it's pretty much impossible/intractable/unmaintainable to have a non-breaking implementation, and in many cases the two together just don't make sense.

@jeremiedbb
Member

I meant "silent breaking change"

and in many cases the two together just don't make sense.

That's why I did not say we should have both at the same time. I just propose to warn now when users are, let's say, passing sample_weight to a meta-estimator. The warning would just say "you won't be able to do that in the future, check out <some link pointing to info on the future api>". It would reach a lot more users and third party developers imo and give them more time to get prepared and understand how they'll need to change their code base.

@adrinjalali
Member Author

I see. I'd say the effort of doing that and figuring out where exactly we should be warning is not worth it, and most users can't do anything about it anyway. It's only to warn third party developers. We better open an issue in their corresponding repositories and warn them that this is coming.

@adrinjalali
Member Author

So it seems we don't have a disagreement on doing a breaking release for SLEP006, we might talk about other things to include later. I'll prepare a PR to simplify the code accordingly then.

@thomasjpfan
Member

We better open an issue in their corresponding repositories and warn them that this is coming.

Opening an issue in every repository that depends on scikit-learn does not scale well.

Concretely, here are the pain points I see with a breaking transition:

  1. Supporting SLEP006 will be harder without inheriting from BaseEstimator.
  2. Libraries that want to support v1 and v2 will have a hard time. Ultimately, they will have to decide between bounding scikit-learn by <2.0 or >=2.0. (Both are valid decisions)
  3. Installing multiple libraries will be harder because the bounds for scikit-learn can be incompatible, i.e. <2.0 and >=2.0.

NumPy and Pandas have recently used feature flags to trigger backward-incompatible behavior, to allow third party libraries to adapt to the changes and to teach them about the behavior change. With a feature flag, we maintain both the pre-SLEP006 and post-SLEP006 routing behavior, but we do not go through the deprecation cycle. We state that in v2.0, the feature flag will do nothing and we will only use the SLEP006 behavior.

What does everyone think about using a feature flag?

@adrinjalali
Member Author

@thomasjpfan how do you propose we maintain both routing behaviors? Our implementations are changing signatures of our main methods, as well as introducing routing. I'm not sure how your suggestion would be implemented here, and how it would reduce the complexity of the implementation. Would it mean:

	if use_slep6:
		new_routing()
	else:
		use_old_code()

everywhere in our meta-estimators? You do realize the code won't be as clean as the snippet above, right? 😁

@lorentzenchr
Member

Let us acknowledge that

  1. we already release breaking changes, but usually with our 2-release deprecation cycle; and
  2. there are major improvements that cannot reasonably be deprecated, like SLEP006.

Ideally, the SLEP would have discussed that, but it popped up later during implementation - which is fine.

I think that we explicitly need to decide the following:
Do we want to make SLEP006 compatibility (of 3rd party objects) optional or mandatory?

My opinion is clearly mandatory.

The available options then depend on the answer. Out of my head they are:

  1. Do a breaking change without deprecation. Therefore call it v2.0
  2. Make a new lightweight package for BaseEstimator, v1 as is, v2 with SLEP006.
  3. Opt-in option (via global config?) to choose within scikit-learn if SLEP006 is enabled
  4. Please add your additional option

Assuming the above summary has no major flaw or missing pieces, I am leaning towards 1 and maybe 2.

@adrinjalali
Member Author

Supporting SLEP006 will be harder without inheriting from BaseEstimator.

With all the modern machinery we have in BaseEstimator, we really shouldn't worry about supporting people who don't want to inherit from it.

Do a breaking change without deprecation. Therefore call it v2.0

+1

Make a new lightweight package for BaseEstimator, v1 as is, v2 with SLEP006.

I think this would be a good idea anyway; it would help folks add that as a dependency if they're reluctant to add scikit-learn as a dependency. But I think of that as orthogonal to this discussion: they would need to pin their dependency to scikit-learn-core instead of scikit-learn.

Opt-in option (via global config?) to choose within scikit-learn if SLEP006 is enabled

I can't imagine a feasibly good implementation for this.

@lorentzenchr
Member

Do we need a SLEP for making a BaseEstimator / scikit-learn-core package?

@betatim
Member

betatim commented Mar 15, 2023

I wasn't part of the team when the decision was made to go with SLEP6, so I will make general comments.

In general having a breaking release (without deprecation cycles) is a major pain for everyone involved. Including the maintainers, though they get a bit of a benefit because the implementation is often less complex. A major source of the pain is that the breakage will catch lots of people by surprise. For example, a third-party library will usually only specify scikit-learn or scikit-learn>0.19 as a dependency. Hardly any add the <2 specifier. You can have opinions on whether they should add this upper limit or not, but in practice today nearly no one has it. This means once scikit-learn v2 exists, all these packages will break until there is a new release of that package (with an upper version specified). And even then you will still get failures in cases where people install a third-party scikit-learn package that is a dependency of some other package, as some of those packages will say "we depend on third-party-sklearn=1.2.3", where that third-party-sklearn release is one that doesn't contain an upper version limit in its dependencies (you and I have opinions on whether you should have dependencies like this, but that doesn't matter because people do this :-/).

Given enough time, this will sort itself out. But in the meantime people will arrive in the scikit-learn repo with questions and complaints about things not working. These need dealing with, and this is how the maintainers who hoped to save themselves work by making a breaking release without a deprecation cycle end up with additional work/pain.

How long is the time frame? How many people will be affected? I don't know. My experience with Jupyter and mybinder.org makes me say "much longer than you think and many more (vocal) people than you think".

This means I think it is worth trying to estimate the number of impacted packages based on code search/inspection on GitHub. Yes, opening an issue in third-party repos doesn't scale, but maybe we are lucky and there aren't that many? Similarly dealing with complaints from users of third party packages doesn't scale either :-/

I like the idea of feature flags. It is a bit fiddly and for a few releases will lead to duplicated code, but it can be a good way to transition. Python is flexible enough so that you can have from sklearn.experimental import sleep6 modify what gets imported when the user runs from sklearn.foo import BarClassifier (to address the function signature concern).
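As a sketch of how such an import-time flag could work (module and flag names are hypothetical, mirroring the pattern of sklearn.experimental.enable_hist_gradient_boosting rather than any agreed design):

# hypothetical contents of sklearn/experimental/enable_slep6.py:
import sklearn

# importing this module is the explicit opt-in; meta-estimators would
# consult this module-level flag to pick the routing behavior
sklearn._slep6_enabled = True

# hypothetical user code:
#   from sklearn.experimental import enable_slep6  # noqa: F401
#   from sklearn.pipeline import Pipeline          # picks up the new routing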

TL;DR: IMHO we need some data on how many people will be surprised by breakage from this particular backwards-incompatible change without deprecation cycles.


As an alternative for Christian's "your alternative here" list: just because a SLEP has been accepted doesn't mean it has to be implemented. If, since accepting a SLEP, people have realised that implementing it will be much more costly than they thought, it can be wise to just stop working on it/reconsider. Otherwise known as the oh-so-popular sunk cost fallacy.

@adrinjalali
Member Author

I don't think the situation is as dire as we think.

This doesn't affect the 99% of people who never pass sample_weight or any other metadata anywhere, and even if they do, it'd affect only the users who actually use our meta-estimators, which is sadly not that many. And the ones who do are advanced users who know how to fix the issues.

.... from sklearn.experimental import sleep6 ...

I really think y'all are underestimating the complexity this brings. The only feasible way I see to do that is to copy the whole sklearn folder and have two versions: freeze the old version and never touch it again, and only apply improvements to the new folder. Otherwise, the required code to implement this is completely all over the place, i.e. in every single meta-estimator.

Also, I do understand the challenges for the users when it comes to breaking changes, but I don't really see a feasible way forward at this point which doesn't involve that, and this is keeping us behind. I like @lorentzenchr 's expression of "5 years too late".

On a personal level, I've been working on this topic for over 4 years now, and I have been under the impression that we care about this, and I think it'd be a pity to shelve this feature.

@betatim
Member

betatim commented Mar 16, 2023

.... from sklearn.experimental import sleep6 ...

I really think y'all are underestimating the complexity this brings.

I'd put it the other way around: I might be overestimating the complexity of the change, and as a result thinking that doing something complicated like modifying code through an import statement is "a good trade-off". The reason I am thinking that this change is a complex one, both in terms of code and for users, is that I've read the SLEP, some of the discussions, peeked at the related PRs, and left confused/thinking "not sure I can wrap my head around this".

Also, I do understand the challenges for the users when it comes to breaking changes, but I don't really see a feasible way forward at this point which doesn't involve that, and this is keeping us behind. I like @lorentzenchr 's expression of "5 years too late".

On a personal level, I've been working on this topic for over 4 years now, and I have been under the impression that we care about this. and I think it'd be a pity to shelf this feature.

It is hard, especially because so much work and thought has gone into it. For me as a bystander it is amazing how long this work was under discussion and then how long people have worked on it. It also means that as an outsider I look at it as a project with a huge cost and unclear benefits. But I think a large part of the unclearness is because I've not been involved a lot. There are also plenty of things that I've bet against in the past that turned out to be winners, so what do I know about the future.

The reason I wanted to explicitly bring up the idea of "stop working on this" is that I see the same warning signs I've learnt to look for when out and about in the mountains. They include things like "we've invested so much" (e.g. we spent a day travelling here, then paid for a hotel, permits, ...), "we are nearly there, so let's push" (we've done 95% of the ascent and there are only 50 meters left), "let's make an exception just this one time" (don't ski slopes over 30 degrees when the avalanche risk is Considerable, but today is different!), lots of people in the group going quiet ("I had a bad feeling, but no one else said anything, so I didn't say anything either"), etc. In a lot of cases where people end up in trouble, it is a factor like those that leads to it. The only way I know to help deal with this "get-there-itis" is to explicitly talk about it, even when you don't think you are in a situation where you are affected by it. To make it a normal thing that you always talk about. People don't get in trouble 100m from the start, people get in trouble 100m from the top.

So my goal is not to suggest to stop this, but to have it as an explicit option that is discussed and then decided upon.

@lorentzenchr
Member

@betatim What I learned is that I should go in the mountains with you 🗻

@adrinjalali To better weigh the consequences of a decision, could you summarize (or point to if already existing) what a breaking change for SLEP006 will actually break, within scikit-learn users and for 3rd party library developers?

@adrinjalali
Member Author

Things which will NOT break:

Estimator().fit(X, y, sample_weight)

for all estimators (not meta-estimators), no matter whether they inherit from BaseEstimator or not.

Also, no code will break if no metadata (e.g. sample_weight) is passed to fit. So a complex pipeline like this example still works.

What breaks:

GridSearchCV(estimator=Estimator(), param_grid=grid).fit(X, y, sample_weight)

It will raise with "Metadata (sample_weight) is provided but not explicitly requested", and it would need to be fixed as:

GridSearchCV(
	estimator=Estimator().set_fit_request(sample_weight=True),
	param_grid=grid
).fit(X, y, sample_weight)

For Pipeline, passing metadata to steps completely changes:

pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
pipe.fit(X_train, y_train, svc__sample_weight=sw)

will raise. Instead, the code becomes:

pipe = Pipeline(
    [
        ("scaler", StandardScaler().set_fit_request(sample_weight=False)),
        ("svc", SVC().set_fit_request(sample_weight=True)),
    ]
)
pipe.fit(X_train, y_train, sample_weight=sw)

For other meta-estimators, it's case by case; for example, for AdaBoostClassifier:

AdaBoostClassifier(SVC()).fit(X, y)

should probably now raise, and the user needs to say:

AdaBoostClassifier(SVC().set_fit_request(sample_weight=True)).fit(X, y)

to support passing sample_weight to the learner. Note that this is very specific to AdaBoostClassifier because of the way it works.

Another meta-estimator such as BaggingClassifier would also break, but would become a lot more consistent: right now we check the signature of fit of the estimator given to BaggingClassifier, and that decides whether we should pass sample_weight to it or not; with this breaking change, the user needs to define the request values, and those are used to route sample_weight instead (see the sketch below).
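A minimal before/after sketch for BaggingClassifier (X, y, sw are placeholders), assuming the same set_fit_request mechanism as above:

# today: sample_weight is forwarded only because SVC's fit signature happens to accept it
BaggingClassifier(SVC()).fit(X, y, sample_weight=sw)

# after the breaking change: the explicit request decides the routing
BaggingClassifier(
    SVC().set_fit_request(sample_weight=True)
).fit(X, y, sample_weight=sw)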

@adrinjalali
Member Author

Regarding the discussions around this issue, they fall into different categories:

  • what do we think of the syntax (also raised by @betatim )

    • These issues were discussed in detail during the development of the SLEP. I'm happy to build on top of those discussions, but I don't think it's fruitful for us to go back to basics and discuss whether we want this feature or not, or whether we want to change the overall syntax. A reasonable path forward would be to discuss pain points of the syntax, and find convenient methods/context managers/ways to make things easier for some usual use cases. An example is making it easier for a Pipeline to pass sample_weight around, which we could achieve with a context manager for instance.
  • what the implications are for third party libraries (raised by @thomasjpfan and @glemaitre )

    • This is something we've figured out during the implementation of the SLEP, and at this point it seems some libraries would have to do a breaking-change release to support sklearn>=2.0. We still would like to make it as easy as we can for those libraries to migrate, and for that we do two things:
      • whenever they get an error, they should have a very clear path to follow to fix their library. Basically, good documentation. Luckily, through different re-implementations we've gotten to the point where implementing this is actually easy, with very few lines of code.
      • proactively try and fix issues with widely used libraries to make most users' lives easier.

Does this sound like a good summary of people's worries and concerns?

@adrinjalali
Member Author

Although one thing we could consider: if we're doing a breaking-change release, would we break things in a different way? As in, we had aimed for a smooth-transition kind of solution, but if we are breaking things anyway, would we do something different? @jnothman reminded me of this.

@adrinjalali
Member Author

re: non-verbose declaration, we have this issue: #23928

@lorentzenchr
Member

@adrinjalali Thanks for the good and precise summary 🙏

@amueller
Member

I like @betatim's point of being explicit about the trade-off of doing it vs not doing it. @adrinjalali gave a great summary of one aspect of the downsides of doing it (breaking code). I am more concerned about the added complexity to the codebase. Hearing that it would be near-impossible to hide behind a feature flag makes me a bit queasy about the complexity of the implementation.

@adrinjalali
Member Author

@amueller the reason it's not easy to enable this with a feature flag is not the added complexity. It's rather due to the inconsistencies and bugs we have in the codebase right now when it comes to metadata routing. With the current implementation, this is how you'd implement a meta-estimator with the routing:

# imports as in the development branch at the time (module paths may differ)
from sklearn.base import BaseEstimator, MetaEstimatorMixin, RegressorMixin, clone
from sklearn.utils.metadata_routing import MetadataRouter, process_routing


class MetaRegressor(MetaEstimatorMixin, RegressorMixin, BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        # ask the router which of the given params go to the sub-estimator's fit
        params = process_routing(self, "fit", fit_params)
        self.estimator_ = clone(self.estimator).fit(X, y, **params.estimator.fit)
        return self

    def get_metadata_routing(self):
        # expose the sub-estimator's requests, mapping each method one-to-one
        router = MetadataRouter(owner=self.__class__.__name__).add(
            estimator=self.estimator, method_mapping="one-to-one"
        )
        return router
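
A hypothetical usage, for illustration (Ridge, X, y, sw are placeholders; the request API is as described in SLEP006):

from sklearn.linear_model import Ridge

# the sub-estimator declares that it wants sample_weight in fit;
# MetaRegressor then routes it from **fit_params
meta = MetaRegressor(estimator=Ridge().set_fit_request(sample_weight=True))
meta.fit(X, y, sample_weight=sw)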

I would argue the complexity added above is rather minimal.

However, while working on this project, I've realized that in many of our meta-estimators we're not consistent with the way we route metadata. Sometimes that means the wrong logic inside the meta-estimator; sometimes it means simply not having it at the moment, which would mean the fix changes the signature of the method, e.g. adding a props or fit_params argument to a fit or transform, etc.

In this case, what would it mean to do things with a feature flag? It would mean keeping two routing logics for quite a while and maintaining both, and in some cases implementing new, inconsistent behavior for cases where we're adding parameters to existing methods.

The feature flag idea is, however, still doable, compared to staying backward compatible. But that means an if/else statement in pretty much every meta-estimator. Is that what we wanna do? I would rather not, I think.

@lorentzenchr
Member

It's rather due to the inconsistencies and bugs we have in the codebase right now when it comes to metadata routing.

Just a stupid question: Would it make sense to first fix those bugs before introducing SLEP006?

@adrinjalali
Member Author

@lorentzenchr I personally don't want to go through the whole meta-estimator codebase twice for this 😁 And I think doing so, would probably add a year or two to the whole timeline lol.

@thomasjpfan
Member

thomasjpfan commented Mar 30, 2023

For the feature flag option, I was thinking of using the SLEP006 routing mechanism to define the backward compatible routing. For example, when the SLEP006 feature flag is off, AdaBoostClassifier will assume that the base_estimator requested the sample weights without looking at base_estimator.

With the SLEP006 feature flag off, Pipeline translates the dunder syntax from fit and assumes every estimator that was explicitly passed a sample_weight in pipeline.fit is requesting it.

Concretely, I am thinking of this design:

class MyMetaEstimator(...):

    def fit(self, X, y, **fit_params):
        if SLEP006_ON:
            params = process_routing(self, "fit", fit_params)
        else:
            # define params using backward compatible semantics
            params = ...

        self.estimator_ = clone(self.estimator).fit(X, y, **params.estimator.fit)

(There is another design where each meta-estimator defines its own backward compatible MetadataRouter, but I suspect writing params is easier with the pre-SLEP006 semantics.)

@betatim
Member

betatim commented Mar 31, 2023

I can't make it to the dedicated meeting today (🛩️). After some thinking and discussing, I think my main worry is about making the "I don't know either, scikit-learn, please just make it happen, you are the expert after all" user story harder for little gain. To personify the project for a moment: scikit-learn is an expert in machine learning; people do and should delegate decisions to it. Delegation is convenient and for the vast majority of people the outcomes are better (day trading vs passive index investing, wiring your own house vs a professional electrician doing it, using Python to do science vs C++).

For me this means that as a user I've expressed everything that there is to express with

est = AdaBoostClassifier(LogisticRegression())
est.fit(X, y, sample_weight=sw)

Most likely the average user doesn't even know that there are options and what the trade-offs between them are. As an expert who does know, most of the time I am not using this in "expert mode" either, so it is inconvenient to have to make choices.

I also believe that if there are sample weights involved then everything in the "chain" (be that a Pipeline or an estimator that does internal CV or an estimator that uses a metric to do early stopping) has to support weights. Here "supports" means "does something sensible when weights are involved". This sensible thing could be "take them into account when re-sampling samples" (say for some kind of stratified sampler) or "ignore the weights" (say for something like MinMaxScaler).

Something to avoid are cases like #25906. To me this is a bug. If a user passes sample weights in the metric inside LogisticRegressionCV has to support that. Otherwise the user selected an invalid combination and should receive an exception.

This does not mean we should forbid/make it impossible for experts to re-jig things so that they can use LogisticRegressionCV with weights for the estimator but ignore them during metric calculation. However this is the case where extra effort from the user should be required.

I wrote more about this in #23928 (comment)

It feels like introducing "your chain of tools has to be consistent when it comes to using/not using sample weights" is something that can be done in a backwards-incompatible way with a deprecation cycle. For additional sample-aligned properties like protected group membership, I think we have fewer constraints from current users' code because AFAIK it is virtually impossible to do this now anyway. This means I am less worried about requiring explicit routing setup for these use-cases. It would still be nice if there was a way to make the "happy path" work without much extra code from the user.

Most users have no sample weights, fewer have sample weights, even fewer have additional sample-aligned properties, and nearly no users are experts-doing-expert-things. The defaults and convenience of using scikit-learn should reflect that (least amount of extra code for no weights, most extra code and "know what you are doing" for the experts doing expert things).

@jeremiedbb
Member

jeremiedbb commented Mar 31, 2023

I also believe that if there are sample weights involved then everything in the "chain" (be that a Pipeline or an estimator that does internal CV or an estimator that uses a metric to do early stopping) has to support weights. Here "supports" means "does something sensible when weights are involved". This sensible thing could be "take them into account when re-sampling samples" (say for some kind of stratified sampler) or "ignore the weights" (say for something like MinMaxScaler).

To me it also feels like a bug that not everything in the chain has to be sample-weight aware. I think that if we want to handle sample weights properly and consistently in scikit-learn, we first need to be clear about what a sample weight is. There are already discussions about that (which I can't retrieve right now), but the way I see sample weights is that if an observation point has a weight w, then everything should act as if there were w copies of this point in the dataset.
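As a small numeric illustration of that "w copies" view (plain NumPy, just for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([1, 2, 3])

# a weighted statistic should equal the same statistic computed on a
# dataset containing w copies of each point
assert np.isclose(np.average(x, weights=w), np.repeat(x, w).mean())  # both ~2.333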

Something to avoid are cases like #25906. To me this is a bug. If a user passes sample weights in the metric inside LogisticRegressionCV has to support that. Otherwise the user selected an invalid combination and should receive an exception.

Thinking about sample weights as I described above, any step that silently ignores sample weights is a bug in my mind. Thus, even if adding support for sample weights to an estimator changes the behavior when it's used in a meta-estimator, I consider it a bug fix. This is why I tend to think that meta-estimators should raise if any of the steps does not support sample weights.

I understand however that metadata routing is not just about sample weights and that all meta-estimators have different ways of doing things for technical reasons, and I haven't spent enough time on this subject, unlike @adrinjalali :), to have a strong opinion on how to design the best API to handle that, so my thoughts are not very deep.

@ogrisel
Member

ogrisel commented Mar 31, 2023

I agree with Tim's point that we should strive to have scikit-learn "make it easy/simple to use ML correctly". I am not 100% sure what is the correct way to handle sample_weight in complex pipelines (with transformers that accept sample weights), scorers (e.g. #25906) or meta estimators but this is probably something we should improve.

However I acknowledge that fixing the sample_weight bugs everywhere might take time, and I don't think we should constrain ourselves to fixing all the consistency issues of sample_weight handling in scikit-learn before making it possible to implement the other use cases that SLEP6 would enable (e.g. routing nested CV groups or other side metadata for fairness-constrained learning or fairness evaluation by dedicated scorers).

So here is an idea: what about merging a partial integration of SLEP6 to route any metadata except sample_weight? This way we would have:

  • no breaking changes for the first release that ships SLEP6;
  • no extra API complexity for the most common scikit-learn usage cases;
  • but we could still release an API extension that allows the new use cases (nested group-aware CV and fairness evaluation) that are currently impossible to implement with the current scikit-learn API.

Then in parallel we fix the sample_weight bugs for scorers and meta-estimators. And once we have a clear consensus on what the default sample_weight routing logic for pipelines should be (maybe the one described by @jeremiedbb above), then we can discuss how to extend the metadata router to make it possible for the user to customize sample_weight routing away from the default routing strategy.

@adrinjalali
Member Author

The issue with this alternative, @ogrisel , is:

  • complicated routing mechanism in meta-estimators: instead of doing routing the way we do in SLEP006, we'd need to keep the current logic, and the slep6 routing.
  • all third party developers would need to do the same, which is more complicated to explain, or else the user would have a very odd inconsistent experience
  • it's buggy, as we already see in our codebase: how do we treat a passed sample_weight=None? slep6 knows how to deal with it; without it we don't. There are some cases where we've already special-cased for backward compatibility in scorers.
  • in itself, it's inconsistent, and we're treating sample_weight differently than other metadata, which isn't good UX.
  • we'll have all the same issues which we're trying to fix with slep6, i.e:
    • changing signature of a method would change behavior of existing code
    • we won't know what to do if there are **kwargs instead of an explicit sample_weight
    • passing a meta-estimator which doesn't have sample_weight in its signature but has a sub-estimator which has sample_weight in its signature is still broken, as it is now

@adrinjalali
Member Author

This is the conclusion/summary of the meeting we had now.

Present: @agramfort @lorentzenchr @glemaitre @jeremiedbb @ogrisel @adrinjalali

Feature Flag

Release with a feature flag, which can be set through our config mechanism, either globally or via a context manager. By default the feature is not enabled. We can immediately start warning users that they should use the new feature, and since we're doing a warning, we can switch to this feature at some future time without having to do a breaking release, since we're doing our usual deprecation cycle here (probably longer than the usual cycle).

It will probably look like:

sklearn.set_config(metadata_routing/slep6/advanced_metadata_routing/etc="enabled")

I'll create a dedicated issue to track this.
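
For the context-manager variant mentioned above, a hypothetical sketch (the flag name and the search, X, y, sw objects are placeholders; nothing was decided at this point):

import sklearn

# scoped opt-in via the existing config_context mechanism
with sklearn.config_context(enable_metadata_routing=True):
    search.fit(X, y, sample_weight=sw)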

Default Request Values

Allow users to set default request values globally. By default, users would have to set request values everywhere, but using a config value (globally or in a context manager) they can set request values to true everywhere. This will look like:

sklearn.set_global_fit_request(sample_weight=True)
sklearn.set_global_score_request(sample_weight=True)

As a result of the above code, sample_weight would be routed everywhere it can go.

Additionally, we will develop a default request value specifically for sample_weight, which users can enable by setting request values to "auto". It will be subject to change from version to version, but it will follow our recommendations. This is not a blocker for the release of this feature, and can be developed in parallel, and afterwards.

I'll create corresponding issues for this topic as well.

This is in combination with an inspection tool we shall develop to show users where the metadata they pass can go, for which we have an issue: #18936


This should be a good compromise to satisfy the concerns raised here so far. Anything I missed?

I'll create the corresponding issues and discussions for this next week and we can discuss them further in the next drafting meeting we have scheduled next week.

@lorentzenchr
Member

A little add-on to @adrinjalali's summary above. We agree on:

  • The current state of routing, in particular for sample_weight is broken.
    Example: Use sample_weight when validating LogisticRegressionCV #25906
  • Metadata routing according to SLEP006 will solve that problem in a general way.
  • sample_weight has a special place and needs more attention.
    According to our tradition and claim, we should provide ways to make simple things simple, e.g. by good defaults, global config options, context managers, etc., and by fixing bugs.

@jjerphan
Member

For maintainers' information and as previously mentioned to @adrinjalali, I prefer abstaining from participating in discussions on SLEP006: I am unaware of past discussions' conclusions, and I think there are already enough and better-informed maintainers participating.

I trust you to make the best decisions.

@adrinjalali
Member Author

With all the points raised here having their own separate issue, and us not doing a breaking release now, I think we can close this one. Thanks everybody for the fruitful discussions.
