[FIX] Fixing Issue #3550 - hard clamping. #3758

Closed
kpysniak wants to merge 3 commits into scikit-learn:master from kpysniak:propag

Conversation

kpysniak
Contributor

Fixing issue #3550

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling db37aa6 on kpysniak:propag into 0dfa4d0 on scikit-learn:master.

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 027bf1f on kpysniak:propag into 0dfa4d0 on scikit-learn:master.

@amueller
Member

Alpha = 1 means hard clamping, right? Can you maybe add that to the docs? They are very rudimentary at the moment :-/

@amueller
Member

Do you think it makes sense to test for other values of alpha? I just looked around but it doesn't seem like the original publication had that parameter. Do you know which paper this is from?

@@ -420,8 +428,7 @@ def _build_graph(self):
         self.nn_fit = None
         n_samples = self.X_.shape[0]
         affinity_matrix = self._get_kernel(self.X_)
-        laplacian = graph_laplacian(affinity_matrix, normed=True)
-        laplacian = -laplacian
+        laplacian = -graph_laplacian(affinity_matrix, normed=True)
Member

would be better to do:

laplacian = graph_laplacian(affinity_matrix, normed=True)
laplacian *= -1

to spare a memory copy.
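
A minimal illustration of the difference, assuming a dense NumPy array (the same reasoning applies to the matrix returned by graph_laplacian):

import numpy as np

laplacian = np.arange(9, dtype=float).reshape(3, 3)

# Unary minus allocates a second array holding the negated values.
negated_copy = -laplacian

# In-place multiplication negates the existing buffer instead, so no extra
# n_samples x n_samples array is ever created.
laplacian *= -1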

@ogrisel
Member

ogrisel commented Mar 4, 2015

I am not familiar with how clamping is supposed to work in LabelPropagation and co. Could you please improve the docstring for the parameters (in particular for alpha) to explain how they impact the model?

@amueller
Member

amueller commented Mar 4, 2015

I think we need a reference. This PR basically inverts the meaning of alpha as far as I can tell. I have no way of knowing which one is the correct variant.

@ogrisel
Member

ogrisel commented Mar 5, 2015

I think we need a reference. This PR basically inverts the meaning of alpha as far as I can tell. I have no way of knowing which one is the correct variant.

The narrative doc loosely defines the meaning of alpha, and indeed it does not seem to be correctly respected by the implementation. I agree that we need to compare our implementation with some official reference for the algorithm.

@amueller
Member

amueller commented Mar 5, 2015

Maybe @clayw can give us insights into what alpha should mean and where it is defined in the literature.

@amueller
Member

amueller commented Mar 5, 2015

It seems to be defined here:
http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=FDD1A24102936B6FE8E329EB196E3BF4?doi=10.1.1.115.3219&rep=rep1&type=pdf
That is for label spreading, however, where alpha=1 doesn't seem to make sense.
Still looking for the parameter in label propagation.

@amueller
Member

amueller commented Mar 5, 2015

I think LabelPropagation should not have a public alpha parameter, as it is not in the original publication and I can not find any other reference to it in this setting. So we should deprecate the public part and under the hood set it to zero. From the label spreading paper, it looks to me as if 0 (which is not allowed there) would be hard clamping.

@amueller
Member

amueller commented Mar 5, 2015

It also looks to me like both algorithms can be implemented in closed form, with a single SVD (label spreading) or by solving a linear equation (label propagation), so there is no need to iterate.
Alpha in the label spreading paper is in ]0, 1[ and is the amount of information that flows: alpha=0 means don't spread any information, alpha=1 would be "replace all previous information", and neither is allowed. They use alpha=.99 in all experiments.
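
For reference, the label spreading iteration and its closed form as I read them in the Zhou et al. paper (transcribed here, so worth double-checking against the original):

F^{(t+1)} = \alpha S F^{(t)} + (1 - \alpha) Y, \qquad S = D^{-1/2} W D^{-1/2}

F^{*} = \lim_{t \to \infty} F^{(t)} = (1 - \alpha) (I - \alpha S)^{-1} Y

Under that convention, alpha close to 0 keeps F pinned to the initial labels Y, which is consistent with reading alpha=0 as hard clamping.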

@amueller
Member

amueller commented Mar 5, 2015

So todo:

  • Fix the docs (which document alpha wrongly)
  • Deprecate alpha in LabelPropagation (it should always be zero)
  • Implement the closed-form versions (a rough sketch follows below)
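
A rough sketch of what the closed-form label propagation step could look like, in the Zhu & Ghahramani formulation (the function name and calling convention are made up for illustration; this is not the scikit-learn API):

import numpy as np

def propagate_labels_closed_form(W, y_l):
    """Solve (I - P_uu) f_u = P_ul y_l instead of iterating.

    W   : (n, n) symmetric affinity matrix, labeled points first
    y_l : (n_labeled, n_classes) one-hot labels of the labeled points
    """
    n_labeled = y_l.shape[0]
    # Row-normalize the affinities into a transition matrix P.
    P = W / W.sum(axis=1, keepdims=True)
    P_uu = P[n_labeled:, n_labeled:]
    P_ul = P[n_labeled:, :n_labeled]
    # The labeled points stay hard-clamped to y_l; the unlabeled
    # distributions come out of a single linear solve.
    f_u = np.linalg.solve(np.eye(P_uu.shape[0]) - P_uu, P_ul @ y_l)
    return np.vstack([y_l, f_u])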

@amueller
Member

amueller commented Mar 5, 2015

Anyone interested or should I go for it?

@amueller
Member

amueller commented Mar 5, 2015

Also, is there a reference for the word LabelSpreading? ^^

@ogrisel
Member

ogrisel commented Mar 6, 2015

Thanks for running this investigation. Here are pictures from the Semi-Supervised Learning book that is referenced in our documentation. The algorithm is indeed iterative and it references the Zhou et al. paper.

[Five screenshots of the relevant pages from the Semi-Supervised Learning book]

@clayw
Contributor

clayw commented Mar 6, 2015

Catching up here. That is indeed the original book referenced in the docs, and it looks like there's some solution planned out. Testing could always be more thorough to catch these issues.

Are there any action items that I could take here?

@amueller
Member

amueller commented Mar 6, 2015

Huh, ok I only looked at the papers, not the book (where is it referenced?)
Is it this one?

Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux. In Semi-Supervised Learning

I did not realize that was a reference to a book, I guess it is here:
http://www.acad.bg/ebook/ml/MITPress-%20SemiSupervised%20Learning.pdf

The papers have closed-form algorithms.
I'll look at the book again.
@clayw the main action items would be to clarify the meaning of alpha (it is inconsistent between the docs and the code), clarify its meaning in LabelPropagation, and make hard clamping the default there.

@amueller
Member

amueller commented Mar 6, 2015

Also, the citation should be

Bengio, Yoshua, Olivier Delalleau, and Nicolas Le Roux. "Label propagation and quadratic criterion." Semi-supervised learning 10 (2006).

Which would make it much much easier to find. I did not realize that was the reference.

@amueller
Member

amueller commented Mar 6, 2015

@clayw the book also doesn't mention alpha for Label Propagation, right?

@clayw
Contributor

clayw commented Mar 6, 2015

Alpha isn't described in the literature for LabelPropagation in the book (only for LabelSpreading), but I don't think it's such a crazy thing to support in this implementation. However, if there's no hard requirement to keep it around, then it can be deprecated.

Thanks for fixing the code to catch those cases and sprucing up the docs. Can you run the example at http://scikit-learn.org/stable/auto_examples/semi_supervised/plot_label_propagation_digits.html and verify that there aren't any performance issues with this change?
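
A quick check along those lines could look like the following (this only roughly follows the digits example; the exact parameters there may differ):

import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import LabelSpreading

digits = datasets.load_digits()
rng = np.random.RandomState(0)
indices = rng.permutation(len(digits.data))[:340]

X = digits.data[indices]
y = digits.target[indices]

n_labeled = 40
y_train = np.copy(y)
y_train[n_labeled:] = -1  # -1 marks the unlabeled points

model = LabelSpreading(gamma=0.25, max_iter=20)
model.fit(X, y_train)
predicted = model.transduction_[n_labeled:]
print("accuracy on unlabeled points:", accuracy_score(y[n_labeled:], predicted))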

@amueller
Member

amueller commented Mar 6, 2015

Hm, there is actually an alpha in Algorithm 11.2; is that what you were implementing?
Does that recover Algorithm 11.1 for some setting of alpha?

@amueller
Member

amueller commented Mar 6, 2015

The docs say

The LabelPropagation algorithm performs hard clamping of input labels, which means \alpha=1. This clamping factor can be relaxed, to say \alpha=0.8, which means that we will always retain 80 percent of our original label distribution, but the algorithm gets to change it’s confidence of the distribution within 20 percent.

Can you point me to where this is explained in the paper?
link: http://www.iro.umontreal.ca/~lisa/pointeurs/bengio_ssl.pdf

@bryandeng
Contributor

@amueller The three TODOs look like ideal tasks for me.

I'll open a new issue for the closed-form version. We can discuss it further there.
And I'll now start working on the two smaller TODOs.

@amueller
Member

Go for it :)
I think the most urgent part is making sure label propagation does hard clamping by default and that the documentation is consistent with the meaning of alpha.
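
As a purely illustrative sketch of what hard clamping means in the iterative scheme (not the actual scikit-learn code; P is any propagation matrix built from the graph):

import numpy as np

def propagate_hard_clamped(P, Y0, labeled_mask, n_iter=30):
    """With hard clamping, labeled points are reset to their original
    distribution after every propagation step, so they can never drift."""
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = P @ Y                            # spread distributions along the graph
        Y[labeled_mask] = Y0[labeled_mask]   # hard clamp the labeled points
    return Y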

@bryandeng
Contributor

Got it.

@amueller amueller modified the milestones: 0.16, 0.17 Sep 11, 2015
@amueller amueller modified the milestones: 0.18, 0.17 Sep 20, 2015
@boechat107

@kpysniak, do you plan to finish this PR?

@jnothman
Member

@kpysniak, do you plan to finish this PR?

I think we can assume the answer is "no". Takers?

@maniteja123
Contributor

Can I take it up? I am not an expert, but I will try my best. Thanks.

@boechat107

I proposed a solution in #6727 months ago.

@maniteja123
Contributor

Hi, sorry for missing the link to the PR. Thanks for letting me know.

@jnothman
Member

Great, @boechat107 sorry we missed it.

@jnothman jnothman removed this from the 0.18 milestone Aug 31, 2016
@boechat107

No worries, guys ;-)

@jnothman
Member

jnothman commented Jul 5, 2017

Fixed in #9239

@jnothman jnothman closed this Jul 5, 2017
9 participants