
RFC SLEP006: verbose vs non-verbose declaration in meta-estimator #23928

Open · Tracked by #22893
adrinjalali opened this issue Jul 16, 2022 · 15 comments

@adrinjalali (Member) commented Jul 16, 2022

As the proposal and the implementation of meta-estimator routing (SLEP006) stand, a user who wants to use sample_weight needs to be quite verbose in how they declare the estimators. Taking AdaBoostClassifier as an example, and imagining that AdaBoostClassifier used the sub-estimator's score method, the user would have to write:

est = AdaBoostClassifier(
    LogisticRegression()
    .set_fit_request(sample_weight=True)
    .set_score_request(sample_weight=True)
).fit(X, y, sample_weight=sw)

which is considerably more verbose than the code users currently need to write:

est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)

There have been concerns about making users write verbose code in cases where the current pattern seems reasonable.

Without changing everything related to SLEP006, there are three paths we can take:

Option 1: Helper function

We can introduce helper functions to make the above code simpler. For instance, a weighted function could request sample_weight on all methods of a given estimator which accept sample_weight. Then the above code would look like:

est = AdaBoostClassifier(weighted(LogisticRegression())).fit(X, y, sample_weight=sw)

and if the sub-estimator is a pipeline:

est = AdaBoostClassifier(
    make_pipeline(weighted(StandardScaler()), weighted(LogisticRegression()))
).fit(X, y, sample_weight=sw)

Implementing weighted for a Pipeline (or other meta-estimators) would be tricky, since set_fit_request is only available on consumers and not on non-consumer routers; the user would therefore need to repeat the weighted call for all sub-estimators, as in the example above.
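
For illustration, a minimal sketch of what such a weighted helper might look like (the helper itself and the method list are hypothetical, and it assumes the per-method set_{method}_request setters that SLEP006 generates for consumers):

import inspect

def weighted(estimator):
    # Hypothetical sketch, not part of scikit-learn: request sample_weight on
    # every method that both accepts sample_weight and has a request setter.
    for method in ("fit", "partial_fit", "score"):
        func = getattr(estimator, method, None)
        setter = getattr(estimator, f"set_{method}_request", None)
        if func is not None and setter is not None:
            if "sample_weight" in inspect.signature(func).parameters:
                setter(sample_weight=True)
    return estimator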

Option 2: Different meta-estimators

Have two classes of meta-estimators (or, to be specific, routers).

In this scenario, we divide meta-estimators into two classes, simple and complex. Simple routers are the ones which simply forward **kwargs to sub-estimators, and by default they assume sub-estimators have requested those metadata. This simplifies users' code and keeps the existing code for simple meta-estimators working, but it raises a few issues.

The first is that there will be two classes of meta-estimators, and the user would need to know which estimator belongs to which class. It's also not clear what we should do if the user explicitly sets request values for metadata (we can probably respect those if present).

Another issue is that if a meta-estimator's behavior changes, it needs to become a complex meta-estimator if we want to keep backward compatibility. This doesn't seem like a good pattern.

Option 3: Keep as is

Do nothing; leave things as they are.

I'm in favor of option 1 because:

  • with the helper function the user code doesn't look too verbose
  • using metadata is not a beginner-level feature, so this API doesn't hamper beginners' experience with the library
  • it keeps consistency among meta-estimators/consumers

xref: #22986 (review)

@github-actions bot added the Needs Triage label Jul 16, 2022
@thomasjpfan added the API and RFC labels and removed the Needs Triage label Jul 21, 2022
@jnothman (Member)

The increased verbosity in simple cases indeed makes me a bit sad.

AdaBoostClassifier is a bit unusual because the meta-estimator needs to send sample_weight to the base estimator regardless of whether weights are passed in. As such, AdaBoostClassifier might be seen as a consumer of the sample_weight, which it then modifies. What it is doing in relation to its base estimator and sample_weight is certainly not routing, but rather producing. Thus a consumer should only have to set_fit_request(sample_weight=True) to be passed into AdaBoostClassifier in the case that it is wrapped in a Pipeline or similar... which makes sense! How to explain this apparent "exception" in the documentation? I'm not sure...
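
Concretely, under that reading the request would only be needed on AdaBoostClassifier itself, and only when something routes to it. A sketch, assuming metadata routing is enabled and X, y, sw are defined:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# AdaBoostClassifier as a *consumer* of sample_weight: the request is only
# needed because the surrounding Pipeline routes metadata to it.
clf = AdaBoostClassifier(LogisticRegression()).set_fit_request(sample_weight=True)
make_pipeline(StandardScaler(), clf).fit(X, y, sample_weight=sw)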

@jnothman (Member)

The cases where, to me, we're increasing verbosity unnecessarily are those where the meta-estimator always has one destination, for example MultiOutputRegressor. This is eligible for the "simple routers" treatment, which is simple precisely because it's not routing, just passing.
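
In code, the hope is that the status quo keeps working with no request needed (a sketch of today's behavior, which the simple-router treatment would preserve; X, Y, sw assumed defined):

from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

# A "simple router": fit kwargs are passed straight through to the
# single sub-estimator, as they are today.
MultiOutputRegressor(Ridge()).fit(X, Y, sample_weight=sw)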

@adrinjalali (Member, Author)

The cases where, to me, we're increasing verbosity unnecessarily are those where the meta-estimator always has one destination, for example MultiOutputRegressor. This is eligible for the "simple routers" treatment, which is simple precisely because it's not routing, just passing.

Can you elaborate a bit? The sub-estimator of a MultiOutputRegressor can always be a pipeline.

In general, we can't check for signatures as we do now. Signature checks are a hack that kind of works, but they're quite error-prone.

What you're suggesting sounds to me like option (2) in the post above, where you'd like to have two classes of routers, simple and complex. Simple ones always route everything and work like a [network] bridge; complex ones do some sort of routing.

@jnothman (Member)

The sub-estimator of a MultiOutputRegressor can always be a pipeline.

But then we can still pass whatever args are given to MultiOutputRegressor.fit to its base estimator. If it's a Pipeline, it will know how to route those args.

Not sure if that's the elaboration you sought.

@adrinjalali (Member, Author)

So with that, you're arguing for option (2) in my top post. I don't mind it, and we do have the tools to support it. As mentioned above, I just worry about the education part, since it creates two classes of meta-estimators. But I'm happy to go down that route if you think it makes sense.

ping @scikit-learn/core-devs

@lorentzenchr (Member)

What about a 4th option: not a helper function, but a helper method:

est = AdaBoostClassifier(LogisticRegression().set_all_request_sample_weight()).fit(X, y, sample_weight=sw)

Or set_all_request(sample_weight=True).
Name to be discussed.
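
A rough sketch of what such a method could do, assuming it lives on the mixin that provides the per-method setters (the class name and method list are placeholders, and a real implementation would also need to skip methods that don't accept the given metadata):

class RequestAllMixin:
    def set_all_request(self, **metadata):
        # Hypothetical: apply the same request values to every request
        # setter the estimator exposes, and return self for chaining.
        for method in ("fit", "partial_fit", "score", "transform"):
            setter = getattr(self, f"set_{method}_request", None)
            if setter is not None:
                setter(**metadata)
        return self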

@jnothman (Member)

I like the readability of weighted, but I think there's too much implicit in it: it's hard for the reader to know that scoring is weighted, and I'd prefer to be explicit there. I can imagine many users defining weighted at the top of their scripts to avoid .set_fit_request(sample_weight=True), but I'm not sure something based on sniffing the places where sample_weight may be requested is a good idea.

@jnothman (Member)

Having different kinds of meta-estimators is okay, but we need to be clear on what qualifies... Otherwise, it's okay to leave things as they are for now.

@adrinjalali (Member, Author)

Another solution would be something like:

with sklearn.config_context(sample_weight_requested=True):
    est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)

cc @betatim

@betatim (Member) commented Mar 29, 2023

I'm coming at this from the perspective of a user, or of someone having to explain things to users. Because I haven't dug into the code related to SLEP6, my comments don't take into account what would be easier or harder to implement/maintain.

I'm not sure I understand why a user has to say more than:

est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)

It seems like with this line I've already expressed all there is to say: I want to use AdaBoost with a logistic regression classifier, and my samples have weights. Now please do your thing, scikit-learn.

My assumption is that the vast majority of users do not know, and do not want to have to know, the subtleties involved in using AdaBoost with sample weights. This means they can't make an educated choice, which in turn means they are likely to make a worse choice than if they had delegated to an expert.

I think there should be an accepted way of dealing with sample weights when doing AdaBoost, and it should be the default behaviour. There might be a small percentage of experts who want to do something different, and for those it would be good to expose an "escape hatch" where they can override the default behaviour. This escape hatch machinery could even be what AdaBoostClassifier uses to set things up the mainstream way.

Another example that came up recently is #25906. For me it is clearly a bug: the user passed sample weights but the scoring did not use them. There are two outcomes I'd expect: (1) it works and weights are used everywhere (where they can be used); (2) I get an error telling me that I requested something impossible (e.g. I'm using a metric for which sample weights are not supported), ideally with a pointer to an escape hatch carrying a big warning: "here be dragons, by using this you void your warranty".

TL;DR: simple things should be simple for the non-expert, with an option for experts to do expert things. That would be my dream scenario.

@adrinjalali (Member, Author)

There are a few issues with the above example:

  1. What if LogisticRegression doesn't support sample_weight in version x, and starts supporting it in version x+1? Then the same code would silently have two different behaviors. That's not what we do in scikit-learn.

  2. How is AdaBoost supposed to know that LogisticRegression supports sample_weight? Is it because its fit accepts the argument? What if, instead of LogisticRegression, I pass something which is not a simple consumer? Should there be a way for estimators to tell router objects whether they are consumers or not? If yes, that adds quite a bit more burden to third-party developers. It would also mean moving from "almost all third-party estimators work out of the box with the current change" to "none of them work out of the box; all devs have to do something to give us more info about their estimators", and we'd have to modify all our estimators to add that info as well, which we're not doing now.

  3. What if I want to put the above AdaBoost in a Pipeline? The Pipeline is definitely not going to know whether AdaBoost supports sample_weight, and the user would then need to go and set request values anyway. It would be simpler if the user set request values from the beginning.

  4. AdaBoost basically follows one of two strategies depending on whether the sub-estimator supports sample_weight or not, and the user needs to know that. Otherwise we're not encouraging them to do good machine learning.

@betatim (Member) commented Mar 31, 2023

  1. What if LogisticRegression doesn't support sample_weight in version x, and starts supporting it in version x+1? Then the same code would silently have two different behaviors. That's not what we do in scikit-learn.

I'm not sure. In version x you could not use it in a pipeline or other setup where you pass sample_weight to fit, because I think having a mix of "some of my steps in a pipeline support weights and others don't" is a bug. Here "support" means "can do something when weights are passed", which might be "ignore them" where that is appropriate. For example, MinMaxScaler's behaviour is probably the same whether there are weights or not, so it "supports" using them, but ignores them.

This means that in version x you can only have a setup that does not support passing in sample weights. In version x+1 it becomes possible to pass in sample weights (assuming LogisticRegression was the only thing not supporting them). However, this is not a silent change, as the user will have to change their code to pass in weights.

  2. How is AdaBoost supposed to know that LogisticRegression supports sample_weight? Is it because its fit accepts the argument? What if, instead of LogisticRegression, I pass something which is not a simple consumer?

AdaBoost would find out by interrogating the thing passed to it, maybe by inspecting the signature of fit, or because the thing passed in has an annotation or a method that lets you find out. I'm not sure I have an opinion on how exactly to implement it. The thing I care about is that there is a default, so that as a user I don't have to set it explicitly.

What I'm imagining is something like the following inside, say, fit of Pipeline:

@property
def supports_weights(self):
    return all(step.supports_weights for step in self.steps)

@property
def supports_no_weights(self):
    return all(step.supports_no_weights for step in self.steps)

def fit(..., sample_weights=None):
    if sample_weights is not None and not self.supports_weights:
        raise RuntimeError("You passed weights but not all steps support them.")
    # For completeness, though I'm not sure I can imagine a case where this would happen.
    if sample_weights is None and not self.supports_no_weights:
        raise RuntimeError("You did not pass weights but at least one step requires them.")
    ...
    # If weights are passed in, we pass them to every step.
    if sample_weights is not None:
        for step in self.steps:
            step.fit(..., sample_weights=sample_weights)
    ...

Each step of the pipeline either knows what the value of self.supports_{no_}weights is or interrogates its components to compute it.

The logic for supports_weights should consider "everything" that happens inside the particular estimator, so there might be a few where it has to be handcrafted for that estimator.

This is also a backwards-incompatible change and would require third-party developers to make changes. I think we can introduce it without a breaking change, though, by warning for two releases that a third-party estimator has no supports_weights and will stop working soon. Similarly for users that use step_name__sample_weights=blah.

For some estimators, metrics, etc., both supports_weights and supports_no_weights can be true. For example, an accuracy metric can work with weights and without. Same with LogisticRegressionCV, unless the user instantiated it with a metric that does not support weights; in that case supports_weights would be false.

Finally, maybe the mechanism for implementing supports_weights and overriding the default values is the metadata routing that SLEP6 proposes. I don't know.

I think this is covered by what I wrote above. The pipeline needs to inspect its steps and each step needs to come up with an answer.

  4. AdaBoost basically follows one of two strategies depending on whether the sub-estimator supports sample_weight or not, and the user needs to know that. Otherwise we're not encouraging them to do good machine learning.

This I disagree with. A lot of users could probably not explain the AdaBoost (or gradient boosting, or probably even random forest) algorithm without making a mistake. That is OK. I personally can't explain to you how to multiply two matrices together; I rely on NumPy knowing and having an efficient implementation. Delegating to experts is super efficient. That is why I think it is fine for users not to know how AdaBoost works, or that it works differently when I pass weights/no weights, as long as scikit-learn gives me an error when I request something inconsistent (e.g. passing weights to LogisticRegressionCV.fit but using a metric that does not support weights). The vast majority of users are better off trusting that scikit-learn does the right thing, and that they could find out what it does via the documentation if they ever have to.

@adrinjalali (Member, Author)

I'm not sure. In version x you could not use it in a pipeline or other setup where you pass sample_weight to fit, because I think having a mix of "some of my steps in a pipeline support weights and others don't" is a bug.

I'm not talking about the pipeline; I'm talking about a meta-estimator such as AdaBoost, which can handle sub-estimators that don't support sample_weight. From the user's side, the API would look the same with your suggestion, but suddenly the behavior changes. This is a core reason why we're not having any defaults.

AdaBoost would find out by interrogating the thing passed to it, maybe by inspecting the signature of fit, or because the thing passed in has an annotation or a method that lets you find out. I'm not sure I have an opinion on how exactly to implement it. The thing I care about is that there is a default, so that as a user I don't have to set it explicitly.

This is how things are now, and it's very brittle; it breaks easily when a meta-estimator is passed instead of a very simple consumer.

What I'm imagining is something like the following inside, say, fit of Pipeline:

Again, that means the behavior changes when an estimator inside the pipeline suddenly starts supporting the metadata/sample weight, which means adding support for sample weights to an estimator would be a backward-incompatible change.

@adrinjalali (Member, Author)

A dedicated issue to talk about the proposal which also addresses this: #26050

@betatim (Member) commented Apr 3, 2023

I'm not talking about the pipeline; I'm talking about a meta-estimator such as AdaBoost, which can handle sub-estimators that don't support sample_weight. From the user's side, the API would look the same with your suggestion, but suddenly the behavior changes. This is a core reason why we're not having any defaults.

AFAIK AdaBoost does not support estimators that do not support sample weights.

Again, that means the behavior changes when an estimator inside the pipeline suddenly starts supporting the metadata/sample weight, which means adding support for sample weights to an estimator would be a backward-incompatible change.

I disagree. There should be an exception if you pass sample weights to a pipeline that contains a step that does not handle weights. Therefore, when sample weight support is added to an estimator that did not support it before, there should not be any pipelines out there that both include this estimator and were being used with weights. Adding sample weight support is IMHO a new feature, not a backwards-incompatible change.
