[WIP] Label power set multilabel classification strategy #2461
Conversation
This PR is ready for review. |
doc/modules/multiclass.rst
Outdated
Label power set
===============

:class:`LabelPowerSetClassifier` is problem transformation method and
a problem ...
This is so funny, @arjoly are you reviewing your own pull request? :) |
Yep ;-) |
@rsivapr ENH means enhancement(s) |
@arjoly That was quick! :) I literally deleted that post within a second when I realized it must mean that. |
I receive an email whenever you post a message on my pull request. ;-) |
(ping @glouppe ) |
doc/modules/multiclass.rst
Outdated
-------------------

Label power set can be used for multi-class classification, but this is
equivalent to a nop. |
not sure everybody knows what you mean by nop ;)
My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works? Otherwise this looks good, good job :) I am not entirely happy with the testing as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation would be done by hand there for a small problem and in an obvious way and compare against the estimator. But maybe that is overkill. wdyt? |
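To make the testing suggestion concrete, here is a minimal sketch (assumed toy data, not the PR's actual test code) of doing the label power set transformation by hand on a small problem, in the obvious way, so it could be compared against the estimator's internal encoding:

```python
import numpy as np

# Toy multi-label indicator matrix: 4 samples, 3 labels.
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])

# Hand-computed label power set transformation: each distinct row
# (label set) of Y becomes one multi-class label.
rows = [tuple(row) for row in Y]
label_sets = sorted(set(rows))
y_multiclass = np.array([label_sets.index(r) for r in rows])

print(label_sets)               # 3 distinct label sets -> 3 classes
print(y_multiclass.tolist())    # [1, 0, 1, 2]
```

A test could then assert that the estimator's transformed targets (up to a permutation of class ids) match `y_multiclass`.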
sklearn/multiclass.py
Outdated

The maximum number of classes is bounded by the number of samples and
the number of possible label sets in the training set. This strategy
allows to take into account the correlation between the labels contrarily
"allows to" -> "may".
"contrarily to one-vs-the-rest" -> "unlike one-vs-rest".
Also, please add a warning that complexity blows up exponentially with the number of labels, restricting its use to roughly 10 or fewer.
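To illustrate the warning: with n binary labels there are up to 2 ** n distinct label sets, so the induced multi-class problem can blow up exponentially before being capped by the number of training samples (a pure-Python illustration, not code from the PR):

```python
# The number of power set classes is bounded by both the number of
# training samples and the number of distinct label sets; the latter
# can grow as fast as 2 ** n_labels.
n_samples = 1000
bounds = {n: min(n_samples, 2 ** n) for n in (5, 10, 20)}
print(bounds)   # {5: 32, 10: 1000, 20: 1000}
```

Beyond roughly 10 labels the sample-count bound dominates, but the base estimator still faces up to `n_samples` distinct classes.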
Yes, this works. For instance, on the yeast dataset the LPS meta-estimator shines on several metrics compared to OvR.
Should I add the script to the examples? |
Do you suggest creating a |
Hm, strange: the common tests are failing and do not detect that this is a meta-estimator. |
sklearn/multiclass.py
Outdated
y_coded = self.estimator.predict(X)
binary_code_size = len(self.label_binarizer_.classes_)

if binary_code_size == 2 and self.label_binarizer_:
FIX the second condition. It needs better tests.
What is this for, the case where it's really binary classification? I'd add a comment.
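For context, the decode step under discussion can be sketched as follows (assumed attribute names; the PR's real code goes through `self.label_binarizer_` rather than a stored array):

```python
import numpy as np

# Hypothetical attribute standing in for whatever fit() stored:
# the distinct label sets seen in the training data, one per class id.
label_sets_ = np.array([[0, 1, 0],
                        [1, 0, 1],
                        [1, 1, 0]])

y_coded = np.array([2, 0, 1])    # stand-in for self.estimator.predict(X)
Y_pred = label_sets_[y_coded]    # decode class ids back to indicator rows
print(Y_pred.tolist())           # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```

The special-casing in the diff above presumably handles the degenerate situation where only two label sets exist and the binarizer collapses to a single binary column.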
rebase on top of master |
Writeup of our discussion:
Apart from this lgtm, 👍 |
There is an API discrepancy for those classes. |
see #2451 for more information |
Isn't there some way of renormalizing the output of OvR to be comparable? |
Apparently I replied to you completely out of context. Yes, there is a way: the one we discussed today. I am working on it. I need to add tests for when some label sets are missing, i.e. the label sets presented at fit time are not the same as those seen at predict time. |
|
Yes, it should have one. Switched from MRG to WIP since I am progressing slowly on this. |
Hi everyone, this seems to be almost in a completed state but not merged yet. Was there any reason for that? I would also like to ask the same about the classifier chain algorithm in #3727 and MultiOutput Bagging in #4848. I would like to complete these PRs if no one else is currently working on them and if that is okay. Thanks! |
@maniteja123 I think you can feel free to take this over if you like, I think @arjoly is busy. |
given the lack of requests for this, I'd suggest moving this to scikit-learn-extras. |
I agree. Should we close this and create an issue there? |
Add one of the simplest and most common multi-label classification strategies, which uses a multi-class classifier as a base estimator.
The core code is functional, but there are still things to do:
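The strategy described in this PR can be sketched end to end as follows (a minimal, dependency-free illustration with assumed names, not the PR's implementation, which builds on `LabelBinarizer` and scikit-learn's estimator API):

```python
class _OneNN:
    """Tiny 1-nearest-neighbour multi-class classifier, used only so
    the sketch runs without external dependencies."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, r))
                     for r in self.X_]
            out.append(self.y_[dists.index(min(dists))])
        return out


class LabelPowerSet:
    """Label power set sketch: encode each distinct label set as one
    multi-class label, fit the base estimator, decode at predict time."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, Y):
        self.label_sets_ = sorted({tuple(row) for row in Y})
        y = [self.label_sets_.index(tuple(row)) for row in Y]
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        codes = self.estimator.predict(X)
        return [list(self.label_sets_[c]) for c in codes]


X = [[0, 0], [1, 1], [0, 1]]
Y = [[1, 0], [0, 1], [1, 1]]
clf = LabelPowerSet(_OneNN()).fit(X, Y)
print(clf.predict([[0, 0], [1, 1]]))   # [[1, 0], [0, 1]]
```

Because each label set becomes one class, label correlations are modelled for free, at the cost of only ever predicting label sets that were seen during fit.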