[MRG] Fixing label clamping for LabelPropagation #6727
Conversation
The previous way was breaking the test sklearn.tests.test_common.test_all_estimators
Based on the original paper.
Sorry for not reviewing this so far. I think it's an important fix. The maintainers are pretty busy :(
No worries, I understand. ;-)
This needs some love soon.
Can you elaborate on the directed graph issue? The construction creates a symmetric graph matrix, right?
@amueller I don't think the default graph construction guarantees a symmetric matrix. I corrected it by making the connectivity matrix symmetric explicitly: https://github.com/musically-ut/semi_supervised/blob/master/semi_supervised/label_propagation.py#L136 Does this answer your question?
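To make the point concrete, here is a minimal sketch of what symmetrizing a kNN connectivity matrix can look like (the use of `kneighbors_graph` and the averaging rule are assumptions for illustration, not the code behind the linked repository):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).rand(20, 2)

# A kNN graph is generally directed: i can be among j's neighbors
# without j being among i's.
W = kneighbors_graph(X, n_neighbors=3, mode='connectivity')

# Averaging with the transpose gives an explicitly symmetric affinity matrix.
W_sym = 0.5 * (W + W.T)

print(np.allclose(W_sym.toarray(), W_sym.T.toarray()))  # True
```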
Can you please add the test suggested in #5774 (at least)?
@@ -291,7 +292,7 @@ class LabelPropagation(BaseLabelPropagation):
        Parameter for knn kernel

    alpha : float
        Clamping factor
        DEPRECATED: It's internally fixed to zero and it'll be removed in 0.19
Deprecated in 0.19 and removed in 0.21. You should raise a deprecation warning if it is set.
alpha=.0, max_iter=30, tol=1e-3, n_jobs=1):
    # alpha is deprecated for LabelPropagation because it doesn't have any
    # theoretical meaning (from the reference paper).
    alpha = .0
You can just put `alpha=0` in the line below, set `alpha` above to `None`, and raise a `DeprecationWarning` if `alpha is not None`.
@musically-ut yeah, sorry, of course. I was thinking about distance graphs... getting late over here...
We might want to think about renaming `alpha`.
@amueller, no worries. Great to see progress on this though!
@amueller I'm not inclined to change the name of the parameter; it is called `alpha` in the referenced papers.
@musically-ut we should have fixed this a long, long time ago. Do you want to open a separate issue about symmetry?
@amueller Sure, I can do that. IIRC, I wasn't sure what the relevant option was supposed to do there.
We usually try to avoid Greek-letter parameters and prefer natural-language parameters, though not everywhere (if I were to rewrite the linear models, I'd call alpha "regularization"). I don't have a strong opinion though. The question is whether the parameter is actually called `alpha` in the references.
@amueller
Deprecating parameters is not really tricky, but I'm fine with keeping alpha. I was also thinking about the docs, which currently say "clamping factor", which is bad.
I'm going to apply the suggestions regarding the deprecation of `alpha`.
Now that the tests are failing, I remember why I didn't take that approach in the first place. Suggestions?
"Deprecated in 0.19 and it's going to be removed in 0.21.", | ||
DeprecationWarning | ||
) | ||
alpha = .0 |
don't put this here. Put it where it's used
Ok, I'll set the `alpha` of the line below to `.0`. But what about the `alpha` of the constructor above? You said it should be set to `None`, but the tests fail because the two constructors have different default values (`None` and `0`). What am I missing?
It shouldn't be changed in the constructor; it should be replaced in `fit`.
Should the `fit` function behave differently for `LabelPropagation` and `LabelSpreading`? I mean, should they have different implementations of `fit`?
It's a bit unfortunate, but it looks like `LabelPropagation` needs to get its own `fit` which calls `super(LabelPropagation, self).fit`. But before that, it checks whether alpha is set and raises a deprecation warning.
Then in `BaseLabelPropagation` you need to introduce a local variable `alpha` that has the right value. One way to do this is to check `isinstance(self, LabelSpreading)`, which is not great.
The other would be to set `self._alpha = 0` in `LabelPropagation.fit` and do `alpha = getattr(self, "_alpha", self.alpha)` in `BaseLabelPropagation.fit`.
If you have a nicer way, that's also fine ;)
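For reference, here is a minimal sketch of that second option (class bodies trimmed to the `alpha` handling only; this illustrates the suggestion and is not the actual scikit-learn code):

```python
import warnings


class BaseLabelPropagation(object):
    def __init__(self, alpha=1):
        self.alpha = alpha

    def fit(self, X, y):
        # Use the internally pinned value if a subclass set one,
        # otherwise fall back to the public parameter.
        alpha = getattr(self, "_alpha", self.alpha)
        # ... the actual propagation loop would use `alpha` here ...
        return self


class LabelPropagation(BaseLabelPropagation):
    def __init__(self, alpha=None):
        super(LabelPropagation, self).__init__(alpha=alpha)

    def fit(self, X, y):
        if self.alpha is not None:
            warnings.warn("alpha was deprecated in 0.19 and will be "
                          "removed in 0.21.", DeprecationWarning)
        # alpha has no theoretical meaning for LabelPropagation,
        # so pin the value used by the base class to zero.
        self._alpha = 0
        return super(LabelPropagation, self).fit(X, y)
```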
Considering the testing restrictions, I can't see any better solution. I chose to adopt your second suggestion.
This solution isn't great, but it sets the correct value for alpha without violating the restrictions imposed by the tests.
Ideally, we would have a test that the DeprecationWarning fires when it is meant to and when not.
Please add a bug fix entry to what's new.
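A sketch of such a test might look like the following (the dataset setup and the `sklearn.utils.testing` helpers are assumptions based on the existing test suite):

```python
from sklearn.datasets import make_classification
from sklearn.semi_supervised import label_propagation
from sklearn.utils.testing import assert_warns, assert_no_warnings


def test_alpha_deprecation():
    X, y = make_classification(n_samples=100, random_state=0)
    y[::3] = -1  # mark every third sample as unlabeled

    # A user-supplied alpha should trigger the DeprecationWarning.
    lp = label_propagation.LabelPropagation(alpha=0.1)
    assert_warns(DeprecationWarning, lp.fit, X, y)

    # The default (alpha left unset) should fit without warnings.
    lp_default = label_propagation.LabelPropagation()
    assert_no_warnings(lp_default.fit, X, y)
```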
@@ -291,7 +295,7 @@ class LabelPropagation(BaseLabelPropagation):
        Parameter for knn kernel

    alpha : float
        Clamping factor
        DEPRECATED: Deprecated in 0.19 and it's going to be removed in 0.21.
You can use Sphinx's `.. deprecated::` directive.
OK, I'll check it out.
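For illustration, the directive in a numpydoc-style parameter description could look roughly like this (the exact wording below is only a placeholder, not the final docstring text):

```python
class LabelPropagation(object):
    """Label propagation classifier.

    Parameters
    ----------
    alpha : float
        Clamping factor.

        .. deprecated:: 0.19
            ``alpha`` has no effect for LabelPropagation and will be
            removed in 0.21.
    """
```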
# theoretical meaning (from the reference paper). Look at PR 6727.
if self.alpha is not None:
    warnings.warn(
        "Deprecated in 0.19 and it's going to be removed in 0.21.",
State "Parameter alpha was ... and will be ..."
I agree.
warnings.warn(
    "Deprecated in 0.19 and it's going to be removed in 0.21.",
    DeprecationWarning
)
Should we be setting `_alpha` to the user-provided `alpha` until deprecation is completed? Otherwise our deprecation warning should state that alpha is being ignored.
In the case of `LabelPropagation`, setting a different `alpha` value is not something supported by the algorithm's description in the cited references.
def fit(self, X, y):
    # alpha is deprecated for LabelPropagation because it doesn't have any
    # theoretical meaning (from the reference paper). Look at PR 6727.
    if self.alpha is not None:
You need to implement `__init__` such that `alpha=None` by default, and this message only appears when a non-default value is used.
To follow your suggestion and pass the tests, I would need to change the `__init__` of `BaseLabelPropagation`, which is probably not what we want since `LabelSpreading` can really use different `alpha` values.
This problem was discussed with @amueller before, in his code review.
Do you have another suggestion?
I don't see how. I have no problem putting a new `__init__` in `LabelPropagation` which passes `alpha=None` to `BaseLabelPropagation.__init__`, while still passing the tests.
I'd like to see this merged in the coming release!
@@ -381,7 +402,10 @@ class LabelSpreading(BaseLabelPropagation):
        parameter for knn kernel

    alpha : float
        clamping factor
        Clamping factor [0, 1], it specifies the relative amount of the
A bit more verbose for clarity: "Clamping factor, in [0, 1], specifies the extent to which a sample's label should derive from its neighbors in preference to its initial label."
By a new `__init__` do you mean something like this (f609105)?
What should be the default value for `alpha` in `BaseLabelPropagation.__init__`? What about `LabelPropagation.__init__`? The tests were failing because I ended up with `alpha=1` in `BaseLabelPropagation.__init__` and `alpha=None` in `LabelPropagation.__init__`, i.e., this constructor set a different default value for `alpha` than the former.
I've not yet looked at test failures or otherwise, but I'd leave `alpha=None` until fit, or even just remove it from base's `__init__` if it's not relevant to all subclasses.
The point is that you should still not have to modify anything in `__init__`, and leave the warning until fit.
If I don't modify `__init__`, the warning on `LabelPropagation.fit` will be echoed by default. Is this what you meant?
I was suggesting you do modify `__init__`, but don't modify the parameter before setting the attribute.
Changes to fixing scikit-learn#5774 (label clamping)
It seems you'll need to resolve a merge conflict. I've now patched this to my liking. Could we get at least one other review? @amueller?
Yes, I noticed. I'm gonna fix it.
X, y = make_classification(n_samples=100)
y[::3] = -1
lp_default = label_propagation.LabelPropagation()
lp_default_y = assert_no_warnings(lp_default.fit, X, y).transduction_
This is getting a warning (although not the one we're concerned about) and the assertion fails:
======================================================================
FAIL: sklearn.semi_supervised.tests.test_label_propagation.test_alpha_deprecation
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/miniconda/envs/testenv/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/travis/build/scikit-learn/scikit-learn/sklearn/semi_supervised/tests/test_label_propagation.py", line 71, in test_alpha_deprecation
lp_default_y = assert_no_warnings(lp_default.fit, X, y).transduction_
File "/home/travis/build/scikit-learn/scikit-learn/sklearn/utils/testing.py", line 230, in assert_no_warnings
', '.join(str(warning) for warning in w)))
AssertionError: Got warnings when calling fit: [{message : RuntimeWarning('underflow encountered in exp',), category : 'RuntimeWarning', filename : '/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/pairwise.py', lineno : 837, line : None}]
I've realised that I didn't fix the `random_state` in `make_classification`. We should do that. But I also haven't investigated why we should be getting exp underflow. Do you know?
It is not something that comes to mind right now. I'll have a look at it.
I don't seem to get that warning on my system, but all Travis runs get it.
Could I suggest you try setting `kernel='knn'` in the test here, and we'll see if that works?
I've made a couple of finishing touches at #9192.
Closing in favor of #9239.
Referenced issues
This PR fixes #3550 and #5774. It's similar to PR #3758, which seems to be abandoned.
Important note: my development files `hack-dev/*` would be removed before merging this PR.

Explanations

From the comments in PR #3758, this PR improves the documentation of the parameter `alpha`, deprecates it for `LabelPropagation`, and fixes the label clamping (the credit is all to @musically-ut).

From [1] and [3], we can see that `LabelPropagation` doesn't have a clamping factor. In the case of `LabelSpreading`, from [2] and [3] we have the iteration Ŷ(t+1) = alpha * S * Ŷ(t) + (1 - alpha) * Ŷ(0), which means that `alpha = 0` keeps the initial label information `Ŷ(0)`. As suggested by @amueller in #3758, this should be the only possible value for `LabelPropagation`.

In the current implementation, line 239, `alpha = 0` would not propagate label information to the unlabeled instances, but would propagate label information to the labeled instances. I saw this exact behavior with my data, but I haven't yet found a simple test case to catch it.
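As a toy illustration of the `alpha = 0` case (made-up data, not the scikit-learn implementation), the LabelSpreading-style iteration above leaves the initial label matrix unchanged:

```python
import numpy as np

alpha = 0.0
Y0 = np.array([[1.0, 0.0],    # labeled sample, class 0
               [0.0, 1.0],    # labeled sample, class 1
               [0.5, 0.5]])   # unlabeled sample
S = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])  # toy normalized graph matrix

Y = Y0.copy()
for _ in range(10):
    Y = alpha * S.dot(Y) + (1 - alpha) * Y0

print(np.allclose(Y, Y0))  # True: alpha = 0 keeps the initial labels
```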
Some additional comments

In #5774, @musically-ut discussed a very important point about the current implementation: the graph construction. There is no mention of directed graphs in the referenced papers, but that's what the current implementation does (line 137). Maybe a specific issue should be created for this.
Referenced papers
[1] Zhu, Xiaojin, and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
[2] Zhou, Dengyong, et al. "Learning with local and global consistency." Advances in neural information processing systems 16.16 (2004): 321-328.
[3] Bengio, Yoshua, Olivier Delalleau, and Nicolas Le Roux. "Label propagation and quadratic criterion." Semi-supervised learning (2006): 193-216.