RFC SLEP006: verbose vs non-verbose declaration in meta-estimator #23928
Comments
The increased verbosity in simple cases indeed makes me a bit sad.
The cases where, to me, we're increasing verbosity unnecessarily are cases where the meta-estimator always has one destination. For example, `MultiOutputRegressor`.
Can you elaborate a bit? The sub-estimator of a `MultiOutputRegressor` can itself be a router such as a `Pipeline`. In general, we can't check for signatures as we're doing now. Signature checks are a hack which kind of works, but they're quite error prone. What you're suggesting, to me, sounds like option (2) in the post above, where you'd like to have two classes of routers, simple and complex ones. Simple ones always route everything and work like a [network] bridge; complex ones do some sort of routing.
But then we can still pass whatever args are given to MultiOutputRegressor.fit to its base estimator. If it's a Pipeline, it will know how to route those args. Not sure if that's the elaboration you sought.
So with that, you're arguing for option (2) in my top post. I don't mind it, and we do have the tools to support it. As mentioned above, I just worry about the education part, since it creates two classes of meta-estimators. But I'm happy to go down that route if you think it makes sense. ping @scikit-learn/core-devs
What about a 4th option: not a helper function, but a helper method?

    est = AdaBoostClassifier(LogisticRegression().set_all_request_sample_weight()).fit(X, y, sample_weight=sw)

Or …
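A rough sketch of what such a helper method could do; `set_all_request_sample_weight` does not exist in scikit-learn, so this is only an approximation built on the SLEP006 request setters:

```python
# Hypothetical sketch only: set_all_request_sample_weight is not part of
# scikit-learn; this mixin approximates the idea using the SLEP006 request
# setters (which require metadata routing to be enabled).
class SampleWeightRequestMixin:
    def set_all_request_sample_weight(self):
        # Ask for sample_weight on every method that exposes a request setter.
        for setter_name in ("set_fit_request", "set_partial_fit_request",
                            "set_score_request"):
            setter = getattr(self, setter_name, None)
            if setter is not None:
                setter(sample_weight=True)
        return self
```

With such a method available on estimators, the one-liner above would request `sample_weight` on all supporting methods at once.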
I like the readability of that approach.
Having different kinds of meta-estimators is okay, but we need to be clear on what qualifies... Otherwise okay to leave as is for now.
Another solution would be something like:

    with sklearn.config_context(sample_weight_requested=True):
        est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)

cc @betatim
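A minimal sketch of how a router might consult such a flag; `sample_weight_requested` is the key proposed here, not an existing scikit-learn configuration option:

```python
# Sketch only: "sample_weight_requested" is hypothetical. get_config() is the
# real scikit-learn API for reading the global configuration; the .get(...)
# default keeps this runnable even though the key does not exist today.
import sklearn


def sample_weight_requested_by_default():
    return sklearn.get_config().get("sample_weight_requested", False)
```

Under this proposal, a router would treat every consumer as if it had requested `sample_weight` whenever the flag is set inside the surrounding `config_context`.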
I'm coming at this from the perspective of a user, or someone having to explain things to users. I've not dug into the code related to SLEP006, so my comments don't take into account what would be easier or harder to implement and maintain.

I'm not sure I understand why a user has to say more than:

    est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)

It seems like with this line I've already expressed all there is to say: I want to use Ada boosting, with a logistic regression classifier, and my samples have weights. Now please do your thing, scikit-learn.

My assumption is that the vast majority of users do not know, and do not want to have to know, the subtleties involved in using AdaBoost with sample weights. This means they can't make an educated choice, which in turn means they are likely to make a choice that is worse than if they delegated to an expert. I think there should be an accepted way of dealing with sample weights when doing Ada boosting, and it should be the default behaviour. There might be a small percentage of experts who want to do something different, and for those it would be good to expose an "escape hatch" where they can override the default behaviour. And this escape hatch machinery could even be what the SLEP006 request API provides.

Another example that came up recently is #25906. For me it is clearly a bug: the user passed sample weights but the scoring did not use them. There are two outcomes I'd expect: (1) it works and weights are used everywhere they can be used, or (2) I get an error telling me that I requested something that is impossible (e.g. I'm using a metric for which sample weights are not supported), ideally with a pointer to an escape hatch with a big warning on it: "here be dragons, by using this you void your warranty".

TL;DR: simple things should be simple for the non-expert, with an option for experts to do expert things. That would be my dream scenario.
There are a few issues with the above example: …
I'm not sure. In version … This means in version …
What I'm imagining is something like the following inside, say, a `Pipeline`:
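A hedged sketch of that mechanism, with hypothetical helper names: each step of the pipeline answers whether it can consume `sample_weight`, and the pipeline raises if weights were passed that some step cannot honour:

```python
# Hypothetical helpers, not scikit-learn API.
import inspect


def _step_accepts_sample_weight(step):
    # Each step "comes up with an answer"; naively via its fit signature here,
    # although the thread points out that signature checks are brittle and a
    # real mechanism would more likely rely on an explicit tag or method.
    return "sample_weight" in inspect.signature(step.fit).parameters


def _check_sample_weight_support(steps, sample_weight):
    if sample_weight is None:
        return
    for name, step in steps:
        if not _step_accepts_sample_weight(step):
            raise ValueError(
                f"sample_weight was passed, but step {name!r} does not support it."
            )
```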
Each step of the pipeline either knows what the value of … is, or …

The logic for …

This is also a backwards incompatible change and would require third party developers to make changes. I think we can introduce it without a breaking change though, by warning for two releases that your third party estimator has no …

For some estimators, metrics, etc. both …

Finally, maybe the method for implementing the mechanism of … I think this is covered by what I wrote above: the pipeline needs to inspect its steps and each step needs to come up with an answer.
This I disagree with. A lot of users could probably not explain the AdaBoost (or gradient boosting, or probably even random forest) algorithm without making a mistake. That is OK. I personally can't explain to you how to multiply two matrices together; I rely on NumPy knowing how and having an efficient implementation. Delegating to experts is super efficient. That is why I think it is fine for users to not know how AdaBoost works, or that it works differently when they pass weights or not. As long as scikit-learn gives me an error when I request something inconsistent (e.g. passing weights to …).
Not talking about the pipeline. Talking about a meta-estimator such as `AdaBoostClassifier`.
This is how things are now, and it's very brittle, and breaks easily when a meta-estimator is passed instead of a very simple consumer.
Again, that means the behavior changes when an estimator inside the pipeline suddenly starts supporting the metadata/sample weight, and that means adding support for sample weights to an estimator would be a backward incompatible change.
A dedicated issue to talk about the proposal which also addresses this: #26050
AFAIK …
I disagree. There should be an exception if you pass sample weights to a pipeline that contains a step that does not handle weights. Therefore, when sample weight support is added to an estimator that did not support it before, there should not be any pipelines out there that both include this estimator and were being used with weights. Adding sample weight support is IMHO a new feature, not a backwards incompatible change.
As the proposal and the implementation of meta-estimator routing (SLEP006) stand, if the user wants to use `sample_weight`, they need to be quite verbose in how they declare the estimators. Taking `AdaBoostClassifier` as an example, and imagining that `AdaBoostClassifier` would use the sub-estimator's `score` method, the user would have to write something like the first snippet below, which is quite a bit more verbose than the current code users need to write (the second snippet).
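A sketch of the two variants, assuming the request methods from the routing API (`set_fit_request` / `set_score_request`); treat this as an approximation of the intended snippets, not the exact code:

```python
import sklearn
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

# In released versions that ship the routing API, it has to be enabled first.
sklearn.set_config(enable_metadata_routing=True)

# Verbose SLEP006 declaration: the user states, per method, that the
# sub-estimator wants sample_weight.
est = AdaBoostClassifier(
    LogisticRegression()
    .set_fit_request(sample_weight=True)
    .set_score_request(sample_weight=True)
)
# est.fit(X, y, sample_weight=sw)

# Current pattern, for comparison:
est = AdaBoostClassifier(LogisticRegression())
# est.fit(X, y, sample_weight=sw)
```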
There have been concerns about making users write quite verbose code in cases where the current pattern seems quite reasonable.
Without changing everything related to SLEP006, there are three paths we can take:
Option 1: Helper function
We can introduce helper functions to make the above code simpler. For instance, a `weighted` function could request `sample_weight` on all methods which accept `sample_weight` for a given estimator. Then the above code would look like the first snippet below, and the second snippet shows the case where the sub-estimator is a pipeline.
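Since `weighted` is only a proposal, here is a sketch of both the helper itself and the two usages it would enable; everything other than the scikit-learn imports is an assumption:

```python
import inspect

import sklearn
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

sklearn.set_config(enable_metadata_routing=True)


def weighted(estimator):
    """Hypothetical helper: request sample_weight on every method of
    ``estimator`` that accepts it, via the SLEP006 request setters."""
    for method in ("fit", "partial_fit", "score"):
        func = getattr(estimator, method, None)
        setter = getattr(estimator, f"set_{method}_request", None)
        if func is None or setter is None:
            continue
        if "sample_weight" in inspect.signature(func).parameters:
            setter(sample_weight=True)
    return estimator


# With a plain consumer as the sub-estimator:
est = AdaBoostClassifier(weighted(LogisticRegression()))
# est.fit(X, y, sample_weight=sw)

# And if the sub-estimator is a pipeline, the call has to reach the consumer
# inside it:
est = AdaBoostClassifier(
    make_pipeline(StandardScaler(), weighted(LogisticRegression()))
)
# est.fit(X, y, sample_weight=sw)
```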
Implementing `weighted` for a `Pipeline` (or other meta-estimators) would be tricky since `set_fit_request` is only available for consumers and not for non-consumer routers; therefore the user needs to repeat the `weighted` call for all sub-estimators.

Option 2: Different meta-estimators
Have two classes of meta-estimators (or routers to be specific).
In this scenario, we divide meta-estimators into two classes, simple and complex. Simple routers are the ones which simply forward `**kwargs` to sub-estimators, and by default they assume sub-estimators have requested those metadata. This simplifies users' code and keeps existing code for simple meta-estimators working, but it raises a few issues.

First, there will be two classes of meta-estimators, and the user would need to know which estimator is of which class. It's also not clear what we should do if the user explicitly sets request values for metadata (we can probably respect those if present).
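As an illustration of what a simple router's `fit` would do (the class below is made up for this example and is not a scikit-learn estimator):

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone


class SimpleRouter(BaseEstimator, MetaEstimatorMixin):
    """Made-up example of a "simple" router: no request inspection at all."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        # Everything passed to the meta-estimator is forwarded unchanged to
        # its single destination, as if the sub-estimator had requested it.
        self.estimator_ = clone(self.estimator).fit(X, y, **fit_params)
        return self
```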
Another issue is that if a meta-estimator changes behavior, it needs to become a complex meta-estimator if we want to keep backward compatibility for it. This doesn't seem like a good pattern.
Option 3: Keep as is
Do nothing; things stay as they are.
I'm in favor of option 1 because: …
xref: #22986 (review)