RFC: Breaking Changes for Version 2 #25776
I prefer not to introduce multiple breaking changes in a single release. This would make it much harder to update to v2. If we need to release a breaking v2, I prefer to only break for SLEP006 and nothing else. From reviewing the SLEP006 PRs, I agree it is much clearer to release SLEP006 with breaking changes. Assuming we adopt SLEP006 with breaking changes in v2, my remaining concerns are:
AFAICT, no. All previously magically working code will raise and expect the user to explicitly specify routing requests.
For third party estimators, if:
there is no work to be done; everything works out of the box. For meta-estimators, they can check our version and do either the old routing or the new one, but I'm not sure we want to recommend that, since their users' experience would then depend on which scikit-learn version they have installed. They are probably better off doing a breaking-change release the same way we do. For people who don't want to inherit from
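A hedged sketch of the "check the installed scikit-learn version" approach mentioned above; `ThirdPartyMeta` and the 2.0 version gate are hypothetical names used only for illustration, not an existing or recommended pattern.

```python
import sklearn
from packaging.version import Version

# Hypothetical: assume strict SLEP006 routing lands in scikit-learn 2.0.
SKLEARN_HAS_STRICT_ROUTING = Version(sklearn.__version__) >= Version("2.0")


class ThirdPartyMeta:
    """Hypothetical third-party meta-estimator (names are illustrative)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        if SKLEARN_HAS_STRICT_ROUTING:
            # New behavior: resolve fit_params via explicit routing requests.
            ...
        else:
            # Old behavior: blindly forward everything to the sub-estimator.
            self.estimator_ = sklearn.clone(self.estimator).fit(X, y, **fit_params)
        return self
```

This is exactly the situation described above: the same user code behaves differently depending on which scikit-learn version happens to be installed.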
Having a library that silently behaves differently depending on the version of a dependency is a killer. We should probably make explicit recommendations.
From some experience, IMO it is worth making the breaking change, code-wise. As a user, I don't think starting to raise an error instead of a BC warning is a big issue, because it is mostly linked to cases where careful methodological attention should be paid to the delegation of the weights. Thus, my only concern is third-party libraries: if a package starts to pin for
Does it necessarily have to be a breaking change? I'm thinking that we could have a long-running deprecation and raise a future warning saying that the behavior will break in 2.0, even if it lasts longer than 2 releases. Obviously users will start getting tons of warnings, but we could add a global config option to disable all these warnings.
That to me is less of a concern for those packages. But to make the transition easier, we can certainly help third parties with the adaptation. Again, this only affects meta-estimators and people who don't inherit from
@jeremiedbb the whole point of this issue is that from our experience and the reasons I outlined above, it's pretty impossible/intractable/unmaintainable to have a non-breaking implementation, and in many cases the two together just don't make sense.
I meant "silent breaking change"
That's why I did not say we should have both at the same time. I just propose to warn now when users are, let's say, passing sample_weight to a meta-estimator. The warning would just say "you won't be able to do that in the future, check out <some link pointing to info on the future API>". It would reach a lot more users and third-party developers IMO and give them more time to get prepared and understand how they'll need to change their code base.
I see. I'd say the effort of doing that and figuring out where exactly we should be warning is not worth it, and most users can't do anything about it anyway. It would only warn third-party developers. We'd better open an issue in their corresponding repositories and warn them that this is coming.
So it seems we don't have a disagreement on doing a breaking release for SLEP006; we might talk about other things to include later. I'll prepare a PR to simplify the code accordingly then.
Opening an issue in repositories that depend on scikit-learn does not scale well. Concretely, here are the pain points I see with a breaking transition:
NumPy and pandas have recently used feature flags to trigger backward-incompatible behavior, to allow third-party libraries to adapt to the changes and to teach them about the behavior change. With a feature flag, we maintain both the pre-SLEP006 and post-SLEP006 routing behavior, but we do not go through the deprecation cycle. We state that in v2.0 the feature flag will do nothing and we will only use the SLEP006 behavior. What does everyone think about using a feature flag?
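For context, a concrete instance of the kind of feature flag referred to above is the pandas copy-on-write opt-in (assuming pandas >= 2.0); this is only an illustration of the pattern, not part of the proposal here.

```python
# pandas lets users opt in to the future copy-on-write semantics before it
# becomes the only behavior; the same pattern is being proposed for routing.
import pandas as pd

pd.set_option("mode.copy_on_write", True)  # opt in to the new behavior

df = pd.DataFrame({"a": [1, 2, 3]})
col = df["a"]
col.iloc[0] = 10        # with copy-on-write enabled, df is left untouched
print(df["a"].iloc[0])  # 1
```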
@thomasjpfan how do you propose we maintain both routing behaviors? Our implementations are changing the signatures of our main methods, as well as introducing routing. I'm not sure how your suggestion would be implemented here, and how it would reduce the complexity of the implementation. Would it mean:

```python
if use_slep6:
    new_routing()
else:
    use_old_code()
```

everywhere in our meta-estimators? You do realize the code won't be as clean as the snippet above, right? 😁
Let us acknowledge that
Ideally, the SLEP would have discussed that, but it popped up later during implementation - which is fine. I think that we explicitly need to decide the following: My opinion is clearly mandatory. The available options then depend on the answer. Off the top of my head, they are:
Assuming the above summary has no major flaws or missing pieces, I am leaning towards 1 and maybe 2.
With all the modern machinery we have in
+1
I think this would be a good idea anyway. It would help folks to add that as a dependency if they're reluctant to add
I can't imagine a feasible, good implementation for this.
Do we need a SLEP for making a
I wasn't part of the team when the decision was made to go with SLEP006, so I will make general comments.

In general, having a breaking release (without deprecation cycles) is a major pain for everyone involved, including the maintainers, though they get a bit of a benefit because the implementation is often less complex. A major source of the pain is that the breakage will catch lots of people by surprise. For example, a third-party library will usually only specify

Given enough time, this will sort itself out. But in the meantime people will arrive in the scikit-learn repo with questions and complaints about things not working. These need dealing with, and this is how the maintainers who hoped to save themselves work by making a breaking release without a deprecation cycle end up having additional work/pain.

How long is the time frame? How many people will be affected? I don't know. My experience with Jupyter and mybinder.org makes me say "much longer than you think and many more (vocal) people than you think". This means I think it is worth trying to estimate the number of impacted packages based on code search/inspection on GitHub. Yes, opening an issue in third-party repos doesn't scale, but maybe we are lucky and there aren't that many? Similarly, dealing with complaints from users of third-party packages doesn't scale either :-/

I like the idea of feature flags. It is a bit fiddly and for a few releases will lead to duplicated code, but it can be a good way to transition. Python is flexible enough so that you can have

TL;DR: IMHO we need some data on how many people will be surprised with breakage by this particular backwards-incompatible change without deprecation cycles.

As an alternative for Christian's "your alternative here" list: just because a SLEP has been accepted doesn't mean it has to be implemented. If since accepting a SLEP people realised that implementing it will be much more costly than they thought, it can be wise to just stop working on it/reconsider. Otherwise known as the oh so popular sunk cost fallacy.
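Regarding the "Python is flexible enough" remark above, scikit-learn already ships one import-based feature flag that could serve as a model. The snippet below shows that existing pattern; a routing equivalent would be purely hypothetical.

```python
# Existing precedent in scikit-learn: IterativeImputer is only importable
# after explicitly enabling the experimental feature.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

imputer = IterativeImputer()
```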
I don't think the situation is as dire as we think. This doesn't affect 99% of people who never pass
I really think y'all are underestimating the complexity this brings. The only feasible way I see we can do that is to copy the whole

Also, I do understand the challenges for users when it comes to breaking changes, but I don't really see a feasible way forward at this point which doesn't involve that, and this is keeping us behind. I like @lorentzenchr's expression of "5 years too late". On a personal level, I've been working on this topic for over 4 years now, and I have been under the impression that we care about this, and I think it'd be a pity to shelve this feature.
I'd put it the other way around: I might be overestimating the complexity of the change, and as a result thinking that doing something complicated like modifying code through an import statement is "a good trade-off". The reason I am thinking that this change is a complex one, both in terms of code and for users, is that I've read the SLEP, some of the discussions, peeked at the related PRs, and left confused/thinking "not sure I can wrap my head around this".
It is hard, especially because so much work and thought has gone into it. For me as a bystander it is amazing for how long this work was under discussion and then how long people have worked on it. It also means that as an outsider I look at it as a project with a huge cost and unclear benefits. But I think a large part of the unclearness is because I've not been involved a lot. There are also plenty of things that I've bet against in the past that turned out to be winners, so what do I know about the future.

The reason I wanted to explicitly bring up the idea of "stop working on this" is that I see the same warning signs I've learnt to look for when out and about in the mountains. They include things like "we've invested so much" (e.g. we spent a day travelling here, then paid for a hotel, permits, ...), "we are nearly there, so let's push" (we've done 95% of the ascent and there are only 50 meters left), "let's make an exception just this one time" (don't ski slopes over 30 degrees when the avalanche risk is Considerable, but today is different!), lots of people in the group going quiet ("I had a bad feeling, but no one else said anything, so I didn't say anything either"), etc. In a lot of cases where people end up in trouble, it is a factor like these that leads to it.

The only way I know to help deal with this "get-there-itis" is to explicitly talk about it, even when you don't think you are in a situation where you are affected by it. To make it a normal thing that you always talk about. People don't get in trouble 100m from the start, people get in trouble 100m from the top.

So my goal is not to suggest stopping this, but to have it as an explicit option that is discussed and then decided upon.
@betatim What I learned is that I should go to the mountains with you 🗻

@adrinjalali To better weigh the consequences of a decision, could you summarize (or point to, if it already exists) what a breaking change for SLEP006 will actually break, for scikit-learn users and for 3rd-party library developers?
Things which will NOT break:

```python
Estimator().fit(X, y, sample_weight)
```

for all estimators (not meta-estimators), no matter whether they inherit from

Also, no code will break if there is no metadata (e.g.

What breaks:

```python
GridSearchCV(estimator=Estimator(), param_grid=grid).fit(X, y, sample_weight)
```

It will raise; instead, the code becomes:

```python
GridSearchCV(
    estimator=Estimator().set_fit_request(sample_weight=True),
    param_grid=grid
).fit(X, y, sample_weight)
```

For a `Pipeline`,

```python
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
pipe.fit(X_train, y_train, svc__sample_weight=sw)
```

will raise. Instead, the code becomes:

```python
pipe = Pipeline(
    [
        ("scaler", StandardScaler().set_fit_request(sample_weight=False)),
        ("svc", SVC().set_fit_request(sample_weight=True)),
    ]
)
pipe.fit(X_train, y_train, sample_weight=sw)
```

For other meta-estimators, it's case by case. For example,

```python
AdaBoostClassifier(SVC()).fit(X, y)
```

should probably now raise, and the user needs to write:

```python
AdaBoostClassifier(SVC().set_fit_request(sample_weight=True)).fit(X, y)
```

to support passing sample weights. Another meta-estimator such as
Regarding the discussions around this issue, they fall into different categories:
Does this sound like a good summary of people's worries and concerns?
Although one thing we could consider: if we're doing a breaking-change release, would we break things in a different way for this? As in, we wanted to do a smooth-transition kind of solution, but if we are breaking things anyway, would we do something different? @jnothman reminded me of this.
re: non-verbose declaration, we have this issue: #23928
@adrinjalali Thanks for the good and precise summary 🙏
I like @betatim's point of being explicit about the trade-off of doing it vs not doing it. @adrinjalali gave a great summary of one aspect of the downsides of doing it (breaking code). I am more concerned about the added complexity to the codebase. Hearing that it would be near-impossible to hide behind a feature flag makes me a bit queasy about the complexity of the implementation.
@amueller the reason it's not easy to enable this with a feature flag is not the added complexity. It's rather due to the inconsistencies and bugs we have in the codebase right now when it comes to metadata routing. With the current implementation, this is how you'd implement a meta-estimator with the routing:

```python
class MetaRegressor(MetaEstimatorMixin, RegressorMixin, BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        params = process_routing(self, "fit", fit_params)
        self.estimator_ = clone(self.estimator).fit(X, y, **params.estimator.fit)

    def get_metadata_routing(self):
        router = MetadataRouter(owner=self.__class__.__name__).add(
            estimator=self.estimator, method_mapping="one-to-one"
        )
        return router
```

I would argue the complexity added above is rather minimal. However, while working on this project, I've realized that in many of our meta-estimators we're not consistent with the way we route metadata. Sometimes that means the wrong logic inside the meta-estimator, sometimes it means simply not having it ATM, which would mean the fix would change the signature of the method, e.g. adding a

In this case, what would it mean to do things with a feature flag? It would mean keeping two routing logics for quite a while and maintaining both, and in some cases implementing new inconsistent behavior for cases where we're adding parameters to existing methods.

The feature flag idea is, however, still doable, compared to staying backward compatible. But that means an if/else statement pretty much in every meta-estimator. Is that what we wanna do? I would rather not, I think.
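To make the snippet above concrete, here is a hedged usage sketch. It assumes the routing helpers referenced in the class (`process_routing`, `MetadataRouter`) are in scope and that `set_fit_request` works as described in this thread; `Ridge` is used purely as an example sub-estimator.

```python
# Hypothetical usage of the MetaRegressor sketched above.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)
sw = np.ones(20)

# The user must explicitly request sample_weight on the sub-estimator;
# otherwise fit(..., sample_weight=sw) is expected to raise under SLEP006.
meta = MetaRegressor(estimator=Ridge().set_fit_request(sample_weight=True))
meta.fit(X, y, sample_weight=sw)
```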
Just a stupid question: Would it make sense to first fix those bugs before introducing SLEP006?
@lorentzenchr I personally don't want to go through the whole meta-estimator codebase twice for this 😁 And I think doing so would probably add a year or two to the whole timeline lol.
For the feature flag option, I was thinking of using the SLEP006 routing mechanism to define the backward-compatible routing. For example, when the SLEP006 feature flag is off,

With the SLEP006 feature flag off,

Concretely, I am thinking of this design:

```python
class MyMetaEstimator(...):
    def fit(self, X, y, **fit_params):
        if SLEP006_ON:
            params = process_routing(self, "fit", fit_params)
        else:
            # define params using backward compatible semantics
            params = ...
        self.estimator_ = clone(self.estimator).fit(X, y, **params.estimator.fit)
```

(There is another design where each meta-estimator defines its own backward-compatible
I can't make it to the dedicated meeting today (🛩️). After some thinking and discussing, I think my main worry is about making the "I don't know either scikit-learn, please just make it happen, you are the expert after all" user story harder for little gain.

To personify the project for a moment: scikit-learn is an expert in machine learning, people do and should delegate decisions to it. Delegation is convenient and for the vast majority of people the outcomes are better (day trading vs passive index investing, wiring your own house vs a professional electrician doing it, using Python to do science vs C++). For me this means that as a user I've expressed everything that there is to express with

```python
est = AdaBoostClassifier(LogisticRegression())
est.fit(X, y, sample_weight=sw)
```

Most likely the average user doesn't even know that there are options and what the trade-offs between them are. As an expert who does know, most of the time I am not using this in "expert mode" either, so it is inconvenient to have to make choices. I also believe that if there are sample weights involved then everything in the "chain" (be that a

Something to avoid are cases like #25906. To me this is a bug. If a user passes sample weights in the metric inside

This does not mean we should forbid/make it impossible for experts to re-jig things so that they can use

I wrote more about this in #23928 (comment)

It feels like introducing "your chain of tools has to be consistent when it comes to using/not using sample weights" is something that can be done in a backwards-incompatible-with-deprecation-cycle way. For additional sample-aligned properties like protected group membership, I think we have fewer constraints from current users' code because AFAIK it is virtually impossible to do this now anyway. This means I am less worried about requiring explicit routing setup for these use cases.

It would still be nice if there was a way to make the "happy path" work without much extra code from the user. Most users have no sample weights, a few less have sample weights, even fewer have additional sample-aligned properties, and nearly no users are experts doing expert things. The defaults and convenience of using scikit-learn should reflect that (least amount of extra code for no weights, most extra code and "know what you are doing" for the experts doing expert things).
To me it also feels like a bug that not everything in the chain has to be sample weight aware. I think that if we want to handle sample weights properly and consistently in scikit-learn, we first need to be clear about what a sample weight is. There are already discussions about that (which I can't retrieve), but the way I see sample weights is that if an observation point has a weight w, then everything should act as if there were w copies of this point in the dataset.
Thinking about sample weights like I described above, any step that silently ignores sample weights is a bug in my mind. Thus, even if adding support for sample weights to an estimator changes the behavior when it's used in a meta-estimator, I consider it a bug fix. This is why I tend to think that meta-estimators should raise if any of the steps does not support sample weights. I understand however that metadata routing is not just about sample weights, and that all meta-estimators have different ways of doing things for technical reasons, and I haven't spent enough time on this subject, unlike @adrinjalali :), to have a strong opinion on how to design the best API to handle that, so my thoughts are not very deep.
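A small, hedged illustration of the "weight w acts like w copies" interpretation described above, using plain estimators with no routing involved; the exact equality holds for ordinary least squares, while other estimators may only match approximately.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.1, 1.9, 3.2])

# Give the last point a weight of 3 ...
w = np.array([1.0, 1.0, 1.0, 3.0])
weighted = LinearRegression().fit(X, y, sample_weight=w)

# ... which should behave like having that point three times in the data.
X_rep = np.vstack([X, X[[3, 3]]])
y_rep = np.concatenate([y, y[[3, 3]]])
repeated = LinearRegression().fit(X_rep, y_rep)

print(np.allclose(weighted.coef_, repeated.coef_))            # True
print(np.allclose(weighted.intercept_, repeated.intercept_))  # True
```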
I agree with Tim's point that we should strive to have scikit-learn "make it easy/simple to use ML correctly". I am not 100% sure what is the correct way to handle

However, I acknowledge that fixing the

So here is an idea: what about merging a partial integration of SLEP6 to route any metadata except
Then in parallel we fix the |
The issue with this alternative, @ogrisel, is:
This is the conclusion/summary of the meeting we had now. Present: @agramfort @lorentzenchr @glemaitre @jeremiedbb @ogrisel @adrinjalali

Feature Flag

Release with a feature flag, which can be set through our config mechanism, either globally or via a context manager. By default the feature is not enabled. We can immediately start warning users that they should use the new feature, and since we're doing a warning, we can switch to this feature at some future time w/o having to do a breaking release, since we're doing our usual deprecation cycle here (probably longer than the usual cycle).

It will probably look like:

```python
sklearn.set_config(metadata_routing/slep6/advanced_metadata_routing/etc="enabled")
```

I'll create a dedicated issue to track this.

Default Request Values

Allow users to set default request values globally. By default, users would have to set request values everywhere, but using a config value (globally or in a context manager) they can set the request value to

```python
sklearn.set_global_fit_request(sample_weight=True)
sklearn.set_global_score_request(sample_weight=True)
```

As a result of the above code,

Additionally, we will develop a default request value explicitly for

I'll create corresponding issues for this topic as well. This is in combination with an inspection tool we shall develop to show users where the metadata they pass can go, for which we have an issue: #18936

This should be a good compromise to satisfy the concerns raised here so far. Anything I missed? I'll create the corresponding issues and discussions for this next week and we can discuss them further in the next drafting meeting we have scheduled.
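To illustrate how the feature flag above might be used in practice, here is a hedged sketch built on the existing `sklearn.set_config` / `sklearn.config_context` mechanism; the flag name `metadata_routing` is only one of the naming candidates listed above, not a decided API.

```python
# Hypothetical opt-in to SLEP006 routing via the global config; the flag
# name and value below are assumptions taken from the candidates above.
import sklearn

# Enable globally ...
sklearn.set_config(metadata_routing="enabled")

# ... or only within a limited scope.
with sklearn.config_context(metadata_routing="enabled"):
    ...
```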
A little add-on to @adrinjalali's summary above. We agree on:
For maintainers' information and as previously mentioned to @adrinjalali, I prefer abstaining from participating in discussions on SLEP006: I am unaware of past discussions' conclusions, and I think there are already enough, better-informed maintainers participating. I trust you to make the best decisions.
With all the points raised here having their own separate issue, and us not doing a breaking release now, I think we can close this one. Thanks everybody for the fruitful discussions.
A while ago we talked about the possibility of a version 2 with breaking changes. Specifically, this came up in the context of SLEP006: metadata routing, but there are other things we have wanted to break which we can do all in a single release. In this issue we can talk about the possibility of such a release, and maybe a few things we'd like to include in it.
I'll try to summarize the challenges of keeping backward compatibility (BC) in the context of SLEP006:

- `MetaEstimator(Estimator()).fit(X, y, sample_weight)` will raise according to SLEP006 since `sample_weight` is not requested by any object. For backward compatibility, during the transition period, it'll only warn and assume `sample_weight` is requested by `Estimator()`.
- What should the behavior of `GridSearchCV(Estimator(), scorer=scorer).fit(X, y, sample_weight)` be during the transition period? If we keep backward compatibility, it should warn (whatever the warning be), route `sample_weight` to `Estimator`, but not to the `scorer`. And that's only the case because we're keeping BC. From the user's perspective, it's very weird.
- `Pipeline` ATM, in certain methods like `fit`, `fit_transform`, `predict`, routes `*_params` to the `fit`/etc. of the last estimator only. And `decision_function`, `transform`, and `score_samples` have no routing at all. Keeping this BC proves to be challenging, and a bit nasty. ref: ENH SLEP006: add routing to Pipeline #24270
- ATM we check whether the sub-estimator has `sample_weight` in the signature of `fit`, and we pass the given sample weight to it if that's the case, and not if that's not the case. With routing, we would have a much better idea of when to pass sample weight and when not, but if we're keeping BC, it's really challenging to see if we should forward sample weights or not. Both AdaBoost (FEAT add routing to AdaBoost's fit #24026) and Bagging (SLEP006: Metadata routing for bagging #24250) have been very challenging, to the point that we might end up having to check if a sub-estimator is a meta-estimator itself or not, check the signature of the sub-estimator's `fit`, and check the given routing info to keep BC (see the sketch after this list).

I'm pretty sure there are other issues we haven't foreseen, and they'll come up as we work on this project. We thought it'd be easy once we get to this point, but BC has proven to make it very tricky and has stalled the progress.
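As referenced in the last bullet above, the legacy heuristic boils down to inspecting the sub-estimator's `fit` signature. A minimal sketch of that check, using scikit-learn's existing `has_fit_parameter` helper; the surrounding decision logic is illustrative only, not the actual meta-estimator code.

```python
# Legacy-style check: forward sample_weight only if the sub-estimator's fit()
# signature accepts it. This is the kind of heuristic that becomes hard to
# keep consistent once explicit routing is introduced.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.utils.validation import has_fit_parameter

for est in (SVC(), KNeighborsClassifier()):
    if has_fit_parameter(est, "sample_weight"):
        print(type(est).__name__, "-> forward sample_weight to fit()")
    else:
        print(type(est).__name__, "-> silently drop sample_weight")
```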
All of the above adds substantial complexity which is probably error-prone and buggy. We have had to spend quite a bit of time every time we go back to review those bits to understand why we've done what we've done. Even as temporary solutions, they're not very maintainable.
My proposal here is to have a version 2 which is not BC with previous releases in certain ways, and to target a few breaking changes for it. We can aim to make the October 2023 or April/May 2024 release version 2.
cc @scikit-learn/core-devs