
[WIP] Label power set multilabel classification strategy #2461


Closed
wants to merge 4 commits into from

Conversation

arjoly
Member

@arjoly arjoly commented Sep 20, 2013

Add one of the simplest and most common multi-label classification strategies, which uses
a multi-class classifier as a base estimator.

The core code is functional, but there are still things to do:

  • Add some words about binary relevance in the OvR narrative doc
  • Write a narrative doc about LP
  • Add some references
  • Add some regression tests
  • "making your remark about overfitting a bit more explicit maybe?"
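Since the PR's own code isn't quoted here, a minimal sketch of the label powerset transformation the description refers to (hypothetical helper names, plain NumPy, not the PR's actual implementation): each distinct label set in the indicator matrix becomes one multiclass target.

```python
import numpy as np

def powerset_encode(Y):
    """Map each distinct row (label set) of a binary indicator matrix Y
    to a single multiclass target (hypothetical helper, for illustration)."""
    combos, y = np.unique(Y, axis=0, return_inverse=True)
    return combos, y.ravel()  # inverse indices as a flat array

def powerset_decode(y, combos):
    """Recover the binary indicator matrix from the encoded targets."""
    return combos[y]

Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])
combos, y = powerset_encode(Y)
# 4 samples but only 3 distinct label sets, hence 3 multiclass targets;
# decoding restores the original indicator matrix exactly
assert len(combos) == 3
assert (powerset_decode(y, combos) == Y).all()
```

Any multiclass base estimator can then be fit on `y`, with its predictions mapped back to label sets through `combos`.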

@arjoly
Member Author

arjoly commented Sep 25, 2013

This PR is ready for review.

Label power set
===============

:class:`LabelPowerSetClassifier` is problem transformation method and
Member Author

a problem ...

@glouppe
Contributor

glouppe commented Sep 26, 2013

This is so funny, @arjoly are you reviewing your own pull request? :)

@arjoly
Member Author

arjoly commented Sep 26, 2013

Yep ;-)

@arjoly
Member Author

arjoly commented Sep 26, 2013

@rsivapr ENH means enhancement(s)

@rsivapr
Contributor

rsivapr commented Sep 26, 2013

@arjoly That was quick! :) I literally deleted that post within a second when I realized it must mean that.

@arjoly
Member Author

arjoly commented Sep 26, 2013

I receive an email whenever you post a message on my pull request. ;-)

@arjoly
Member Author

arjoly commented Oct 16, 2013

(ping @glouppe )

-------------------

Label power set can be used for multi-class classification, but this is
equivalent to a nop.
Member

not sure everybody knows what you mean by nop ;)
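To make the "nop" point concrete: in a plain multi-class problem every label set is a singleton, so the powerset encoding maps each class to itself. A quick illustrative check (not the PR's code):

```python
import numpy as np
from sklearn.preprocessing import label_binarize

y = np.array([0, 2, 1, 2])
Y = label_binarize(y, classes=[0, 1, 2])  # one label per sample
combos, enc = np.unique(Y, axis=0, return_inverse=True)
enc = enc.ravel()
# Every "label set" is a singleton, so decoding the powerset classes
# gives back exactly the original multi-class targets: a no-op.
assert (combos[enc].argmax(axis=1) == y).all()
```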

@amueller
Member

amueller commented Nov 4, 2013

My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?
Also, I think we should warn the user that this can only produce label combinations that actually exist in the training set (making your remark about overfitting a bit more explicit maybe?)

Otherwise this looks good, good job :)

I am not entirely happy with the testing as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation would be done by hand there for a small problem and in an obvious way and compare against the estimator. But maybe that is overkill. wdyt?


The maximum number of class is bounded by the number of samples and
the number of possible label sets in the training set. This strategy
allows to take into account the correlation between the labels contrarily
Member

"allows to" -> "may".
"contrarily to one-vs-the-rest" -> "unlike one-vs-rest".

Also, please add a warning that complexity blows out exponentially with the number of classes, restricting its use to ?<=10.
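The blow-up the review warns about is easy to quantify: with k labels there are up to 2**k distinct label sets, though the training data can only exhibit as many sets as it has samples. A rough illustration on random labels (illustrative numbers only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_labels = 500, 15
Y = rng.integers(0, 2, size=(n_samples, n_labels))

n_seen = len(np.unique(Y, axis=0))
# The induced multiclass problem has at most min(n_samples, 2**n_labels)
# classes; here only a small fraction of the 32768 possible sets appear.
assert n_seen <= min(n_samples, 2 ** n_labels)
print(n_seen, "of", 2 ** n_labels, "possible label sets observed")
```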

@arjoly
Member Author

arjoly commented Dec 9, 2013

@amueller @jnothman Thanks for the review !!! I will try to find some time to work on all your comments.

@arjoly
Member Author

arjoly commented Dec 10, 2013

My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?

Yes, this works. For instance, on the yeast dataset the LPS meta-estimator shines on several metrics compared to OvR:

{'hamming_loss': {'dummy': 0.23298021498675806,
                  'lps svm': 0.25775042841564105,
                  'ova svm': 0.23298021498675806},
 'jaccard': {'dummy': 0.33653822852296145,
             'lps svm': 0.43881512586528965,
             'ova svm': 0.33653822852296145},
 'macro-f1': {'dummy': 0.12221934801958166,
              'lps svm': 0.23575486447032259,
              'ova svm': 0.12221934801958166},
 'micro-f1': {'dummy': 0.47828362114076395,
              'lps svm': 0.56270648870093831,
              'ova svm': 0.47828362114076395},
 'samples-f1': {'dummy': 0.45689163011954953,
                'lps svm': 0.547173739867307,
                'ova svm': 0.45689163011954953},
 'subset_accuracy': {'dummy': 0.017448200654307525,
                     'lps svm': 0.14612868047982552,
                     'ova svm': 0.017448200654307525},
 'weighted-f1': {'dummy': 0.30083303670803041,
                 'lps svm': 0.43848139536413128,
                 'ova svm': 0.30083303670803041}}

Should I add the script to the examples?
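Since `LabelPowerSetClassifier` never landed in scikit-learn, a hedged stand-in for such a comparison script on synthetic data (the dataset generator and `LogisticRegression` are real scikit-learn APIs; the powerset encoding is done inline and is only a sketch of the idea):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, Y = make_multilabel_classification(n_samples=300, n_classes=5,
                                      n_labels=2, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Label powerset: each distinct training label set becomes one class
combos, y_tr = np.unique(Y_tr, axis=0, return_inverse=True)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr.ravel())

# Predicted classes are mapped back to full label sets, so only
# combinations seen at fit time can ever be predicted
Y_pred = combos[clf.predict(X_te)]
score = f1_score(Y_te, Y_pred, average="micro")
print("micro-F1:", round(score, 3))
```

This also illustrates the warning above: every row of `Y_pred` is drawn from `combos`, i.e. from label combinations present in the training set.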

@arjoly
Member Author

arjoly commented Dec 10, 2013

I am not entirely happy with the testing as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation would be done by hand there for a small problem and in an obvious way and compare against the estimator. But maybe that is overkill. wdyt?

Do you suggest creating a LabelPowerSetTransformer?

@arjoly
Member Author

arjoly commented Dec 10, 2013

Hm, strange: the common tests are failing and do not detect that this is a meta-estimator.

y_coded = self.estimator.predict(X)
binary_code_size = len(self.label_binarizer_.classes_)

if binary_code_size == 2 and self.label_binarizer_:
Member Author

FIX the second condition. It needs better tests.

Member

What is this for, the case where it's really binary classification? I'd add a comment.

@arjoly
Member Author

arjoly commented Jul 19, 2014

Rebased on top of master.

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling b5f1a9c on arjoly:labelpowerset into 0807e19 on scikit-learn:master.

@arjoly
Member Author

arjoly commented Jul 19, 2014

@amueller and @vene Is it good for you?

@vene
Member

vene commented Jul 19, 2014

Writeup of our discussion:

  • add a test for the zero-label class being handled correctly in predict_proba
  • marginalize to get p(label | x) (btw how would this relate to what OvR gets?)

Apart from this lgtm, 👍
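The marginalization in the second bullet is just a matrix product: p(label_j = 1 | x) is the sum of p(S | x) over every label set S that contains label j. A sketch (hypothetical helper name, not this PR's code):

```python
import numpy as np

def marginalize(proba, combos):
    """Turn powerset-class probabilities (n_samples, n_sets) into
    per-label probabilities (n_samples, n_labels): sum the probability
    of every label set that contains each label."""
    return proba @ combos

proba = np.array([[0.5, 0.3, 0.2]])          # over 3 label sets
combos = np.array([[1, 0], [1, 1], [0, 1]])  # sets {0}, {0,1}, {1}
print(marginalize(proba, combos))            # [[0.8 0.5]]
```

Unlike OvR's independently fitted per-label scores, these marginals come from a joint distribution over whole label sets, which is presumably why renormalizing the OvR output came up below.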

@arjoly
Member Author

arjoly commented Jul 19, 2014

marginalize to get p(label | x) (btw how would this relate to what OvR gets?)

There is an API discrepancy for those classes.

@arjoly
Member Author

arjoly commented Jul 19, 2014

see #2451 for more information

@vene
Member

vene commented Jul 19, 2014

Isn't there some way of renormalizing the output of OvR to be comparable?

@arjoly
Member Author

arjoly commented Jul 19, 2014

Isn't there some way of renormalizing the output of OvR to be comparable?

Apparently my answer was totally off. Yes, there is a way: the one we discussed today. I am working on it.

I need to add tests for the case where some label sets are missing, i.e. the label sets presented at fit time are not the same as those seen at predict time.

@vene
Member

vene commented Jul 20, 2014

LabelPowerSetClassifier doesn't have a classes_ attribute. Should it have one?

@arjoly
Member Author

arjoly commented Jul 20, 2014

LabelPowerSetClassifier doesn't have a classes_ attribute. Should it have one?

yes, it should have one

Switching from MRG to WIP since I am progressing slowly on this.

@arjoly arjoly changed the title [MRG] Label power set multilabel classification strategy [WIP] Label power set multilabel classification strategy Jul 20, 2014
@maniteja123
Contributor

Hi everyone, this seems to be almost in a completed state but was never merged. Was there a reason for that? I would also like to ask the same about the classifier chain algorithm in #3727 and MultiOutput Bagging in #4848. I would like to complete these PRs if no one else is currently working on them and if that is okay. Thanks!

@amueller
Member

@maniteja123 I think you can feel free to take this over if you like, I think @arjoly is busy.

@amueller
Member

given the lack of requests for this, I'd suggest moving this to scikit-learn-extras.

@amueller amueller added the Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository label Jul 14, 2019
Base automatically changed from master to main January 22, 2021 10:48
@haiatn
Contributor

haiatn commented Jul 29, 2023

given the lack of requests for this, I'd suggest moving this to scikit-learn-extras.

I agree. Should we close this and create an issue there?

Labels
module:utils Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository New Feature