[WIP] Try finish #6727: alpha deprecation in LabelPropagation #9192
Conversation
The previous approach was breaking the test sklearn.tests.test_common.test_all_estimators.
Based on the original paper.
This solution isn't great, but it sets the correct value for alpha without violating the restrictions imposed by the tests.
Changes towards fixing scikit-learn#5774 (label clamping).
@boechat107, it looks like …
I've made another couple of small documentation fixes here.
Just a minor comment related to narrative docs.
@@ -378,6 +378,11 @@ Bug fixes
- Fix bug where stratified CV splitters did not work with
  :class:`linear_model.LassoCV`. :issue:`8973` by `Paulo Haddad <paulochf>`.

- Fix :class:`semi_supervised.LabelPropagation` to always do hard clamping.
  Its ``alpha`` parameter now defaults to 0 and the parameter is deprecated
I'm confused: the algorithm and the documentation say that alpha
is the percentage of the initial class distribution that will be maintained. Quoting the documentation:
"The LabelPropagation algorithm performs hard clamping of input labels, which means \alpha=1."
In that case, why does this default to zero, assuming the definition of alpha is the same?
Nvm, I just read the documentation below. Can you please change the narrative documentation so that there is no confusion?
Maybe I need to check a bit further that the semantics of this fix are right. I feel the what's new entry is understating the change, particularly because of the change from …
Okay, so I understand why the test was failing: part of it was the implementation of the algorithm and part of it was misinterpretation of the clf._build_graph() function.

The test should have looked like the following:

```python
n_classes = 2
X, y = make_classification(n_classes=n_classes, n_samples=200, random_state=0)
y[::3] = -1
clf = SS.label_propagation.LabelSpreading().fit(X, y)
# adopting notation from Zhou et al:
# W = clf._build_graph()
# D = np.diag(W.sum(axis=1))
# Dinvroot = scipy.linalg.sqrtm(np.linalg.inv(D))
# S = np.dot(np.dot(Dinvroot, W), Dinvroot)
S = clf._build_graph()
Y = np.zeros((len(y), n_classes + 1))
Y[np.arange(len(y)), y] = 1
Y = Y[:, :-1]
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
    expected = np.dot(np.linalg.inv(np.eye(len(S)) - alpha * S), Y)
    expected /= expected.sum(axis=1)[:, np.newaxis]
    clf = SS.label_propagation.LabelSpreading(max_iter=10000, alpha=alpha)
    clf.fit(X, y)
    assert_array_almost_equal(expected, clf.label_distributions_, 4)
```

That is, the clf._build_graph() function directly returns S instead of returning W.

Then, the "actual" algorithm talked about in the paper (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219) is the following (our modifications are commented out):

```python
# clamp_weights = np.ones((n_samples, 1))
# clamp_weights[~unlabeled, 0] = alpha
# TODO TESTING
clamp_weights = alpha * np.ones((n_samples, 1))
# ...
if alpha > 0.:
    y_static *= 1 - alpha
# TODO TESTING
# y_static[unlabeled] = 0
```

I have this version implemented in this branch (https://github.com/musically-ut/scikit-learn/blob/tmp-semi-supervised/sklearn/semi_supervised/label_propagation.py#L251) of my local fork; one can check out that branch and verify that the sanity-check test does succeed in that case.

I can sort of see that the modifications we have made to the algorithm make sense. Similar guarantees probably can be worked out for this modified version as well, but I would be far more comfortable just using the version from the paper or finding a reference rather than using unpublished versions.

What do you think?
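As a side note, the closed form used for expected above is just the fixed point of the Zhou et al. iteration F(t+1) = alpha * S @ F(t) + (1 - alpha) * Y; the (1 - alpha) factor cancels once rows are normalised. A tiny self-contained sketch of that equivalence (the W and Y here are made up for illustration, not taken from the estimator):

```python
import numpy as np

rng = np.random.RandomState(0)
n, k, alpha = 20, 2, 0.5
W = rng.rand(n, n)
W = (W + W.T) / 2                       # symmetric affinities
D_inv_root = np.diag(1 / np.sqrt(W.sum(axis=1)))
S = D_inv_root @ W @ D_inv_root         # normalised graph, as in Zhou et al.
Y = np.eye(k)[rng.randint(k, size=n)]   # one-hot initial labels

F = Y.copy()
for _ in range(1000):                   # power iteration: F <- alpha*S*F + (1-alpha)*Y
    F = alpha * S @ F + (1 - alpha) * Y

closed = np.linalg.inv(np.eye(n) - alpha * S) @ Y
closed /= closed.sum(axis=1)[:, np.newaxis]
F /= F.sum(axis=1)[:, np.newaxis]
np.testing.assert_allclose(F, closed, atol=1e-4)
```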
Yes, confusing W for S would make a lot of sense. I didn't read into _build_graph.

I would rather implement what they have in their paper. Let's apply alpha to all. But then we also need to have an explicit ValueError in the case that alpha=0, or we need to handle it specially as a limiting case, particularly if LabelPropagation relies on it.

So we could say:
* alpha > 0: F = alpha * F + (1 - alpha) * Y
* alpha = 0: F = {F[i] if i unlabelled else Y[i]}

Any harm keeping soft clamping available for LabelPropagation?
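For concreteness, a minimal NumPy sketch of the two clamping rules above (the function and argument names are illustrative, not the estimator's API):

```python
import numpy as np

def clamp(F, Y, labelled, alpha):
    """Sketch of the two proposed rules.

    alpha > 0  -> soft clamping: blend the propagated distribution with Y.
    alpha == 0 -> hard clamping: labelled rows are reset to Y exactly,
                  unlabelled rows keep their propagated values.
    """
    if alpha > 0:
        return alpha * F + (1 - alpha) * Y
    F = F.copy()
    F[labelled] = Y[labelled]
    return F
```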
Thanks for this great investigation, Manoj. I really appreciate being able
to close a chapter on another dodgy implementation...
I think I might have been a bit rough in my notation above. And I'm still a
bit uneasy about it.
Update: No, it does make sense. Rethinking. While we are at it, I also recommend adding a warning if the method did not converge after max_iterations, akin to this: https://github.com/musically-ut/semi_supervised/blob/master/semi_supervised/label_propagation.py#L271
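A minimal sketch of what such a warning could look like (the function and variable names are illustrative, not the code in label_propagation.py):

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning


def propagate(S, Y, alpha=0.2, max_iter=30, tol=1e-3):
    """Sketch of the propagation loop with a non-convergence warning."""
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * S @ F + (1 - alpha) * Y
        if np.abs(F_next - F).sum() < tol:   # converged within tolerance
            return F_next
        F = F_next
    warnings.warn("max_iter=%d was reached without convergence." % max_iter,
                  ConvergenceWarning)
    return F
```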
I think Y[i] is supplied by the user where i is labelled. The idea here is that alpha=0 corresponds to hard clamping.
…On 26 June 2017 at 00:01, Utkarsh Upadhyay wrote:

So we could say:
- alpha > 0: F = alpha * F + (1 - alpha) * Y
- alpha = 0: F = {F[i] if i unlabelled else Y[i]}

Err, I don't think the last case makes sense because the initial Y[i] are not supplied by the user (they are all 0s). It will be weird to return this without any hint of a warning/error. Personally, throwing a ValueError makes more sense.

While we are at it, I also recommend adding a warning if the method did not converge after max_iterations, akin to this: https://github.com/musically-ut/semi_supervised/blob/master/semi_supervised/label_propagation.py#L271
Okay, my suggestion is that we disallow both the extreme values of alpha. For …
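A sketch of what rejecting the extreme values could look like (the helper name is hypothetical, not part of the estimator):

```python
def _check_alpha(alpha):
    """Hypothetical helper: reject the extreme values so that the blend
    alpha * F + (1 - alpha) * Y stays a genuine mixture."""
    if not 0 < alpha < 1:
        raise ValueError("alpha=%r is invalid: it must be inside "
                         "the open interval (0, 1)" % alpha)


_check_alpha(0.2)    # fine
# _check_alpha(0.0)  # would raise ValueError
```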
Actually, I could use some help; my numpy matrix-fu might be wrong somewhere. I'm trying to replicate the calculations done in eqn. (12) in the reference for LabelPropagation. Am I doing the correct things here?

```python
n_classes = 2
X, y = make_classification(n_classes=n_classes, n_samples=200, random_state=0)
y[::3] = -1
# Using Zhu's 2002 notation:
clf = label_propagation.LabelPropagation().fit(X, y)
T_bar = clf._build_graph()
Y = np.zeros((len(y), n_classes + 1))
Y[np.arange(len(y)), y] = 1
unlabelled_idx = Y[:, (-1,)].nonzero()[0]
labelled_idx = (Y[:, (-1,)] == 0).nonzero()[0]
Tuu = T_bar[np.meshgrid(unlabelled_idx, unlabelled_idx, indexing='ij')]
Tul = T_bar[np.meshgrid(unlabelled_idx, labelled_idx, indexing='ij')]
Y = Y[:, :-1]
Y_u = np.dot(np.dot(np.linalg.inv(np.eye(Tuu.shape[0]) - Tuu), Tul), Y[labelled_idx])
expected = Y.copy()
expected[unlabelled_idx, :] = Y_u
expected /= expected.sum(axis=1)[:, np.newaxis]
assert_array_almost_equal(expected, clf.label_distributions_, 4)
```

Feedback on making this more efficient/easier to read also welcome.
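A small readability note (plain NumPy, not tested against the estimator): np.ix_ builds the same open index mesh as np.meshgrid(..., indexing='ij'), so the submatrices could read as Tuu = T_bar[np.ix_(unlabelled_idx, unlabelled_idx)] and Tul = T_bar[np.ix_(unlabelled_idx, labelled_idx)]. A self-contained check that the two styles agree:

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(6, 6)
rows, cols = np.array([0, 2, 5]), np.array([1, 3])
# np.ix_ and meshgrid(indexing='ij') extract the same submatrix
assert np.array_equal(A[np.ix_(rows, cols)],
                      A[tuple(np.meshgrid(rows, cols, indexing='ij'))])
```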
Okay, I've pushed the changes to a branch on my fork which branches on … I've also added the tests and have changed the old test from using …

Curiously, adding a step to do the following, i.e.,

```python
# ...
self.label_distributions_ = safe_sparse_dot(
    graph_matrix, self.label_distributions_)
if alpha is None:
    # LabelPropagation
    normalizer = np.sum(self.label_distributions_, axis=1)[:, np.newaxis]
    self.label_distributions_ /= normalizer
# clamp
# ...
```

did not change the outcome of the test, while increasing the …

Things still left to do in this PR:
…

Maybe:
…
I'm not surprised that normalising or not does not change whether the test passes (the test itself normalises the final distribution, and everything else is affine, but I've not fully thought it through). Using …
I know there's been a lot of this, but I'd be happy for you to take over here if you wish, or to send me a PR.
I've added throwing of ValueError for invalid alpha (including tests) and the normalization step before clamping in LabelPropagation.

I'm sort of at a loss while designing tests, though. For example, I would love to have tests which:

- would fail if the normalization in LabelPropagation was not done but succeed afterwards (there must exist such a case because T is not row-normalized, only column-normalized);
- would verify that the implementation works for sparse matrices;
- would test the limits of numerical stability.

I've tried using assert_allclose and it succeeds after playing with rtol a little for LabelPropagation. However, looking at the definition of convergence in the code:

```python
def _not_converged(y_truth, y_prediction, tol=1e-3):
    """basic convergence check"""
    return np.abs(y_truth - y_prediction).sum() > tol
```

I don't think we should be using relative tolerance.

re: this PR; do you mean to close this discussion and start a new one?
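For the tolerance point, a sketch of what an absolute-tolerance comparison could look like in a test (the arrays and thresholds are illustrative only):

```python
import numpy as np
from numpy.testing import assert_allclose

# Compare against an expected distribution with an absolute rather than a
# relative tolerance: entries of label_distributions_ can be near zero, and
# rtol blows up on near-zero reference values.
expected = np.array([[0.9, 0.1], [0.2, 0.8]])
actual = np.array([[0.9001, 0.0999], [0.2001, 0.7999]])
assert_allclose(actual, expected, rtol=0, atol=1e-3)
```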
I don't mind if we close and start anew.
Testing for sparse should be easy. Is it not already tested?
I think we might have to leave this broken for 0.19.0, and aim to merge it soon after the release.
I don't think so. Does something like …
Not ideal, but sounds reasonable.
I'd rather keep this context around somehow. I missed the discussion on this thread for quite a while (~1 week?) because I wasn't automatically subscribed to it. Note to self: explicitly ping everyone involved on the new PR which will eventually be created.
Why not just generate dense X and then convert to sparse?
Actually, the graph_matrix can only be sparse if the kernel returns a sparse matrix, and only the kNN kernel returns that (which I am avoiding fixing in this PR). Hence, we don't need to test the sparse implementation right away. However, tests for in-the-loop normalization and numerical stability (how?) would be nice to have.

I am fairly confident that the code implements the algorithm correctly and merging it as-is will move the implementation in the correct direction (i.e. from flat Earth -> spherical Earth). The tests would help move it to the 'oblate spheroid' realm in my head, but coming up with tests is ... difficult. Thoughts?
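If we do later want a test that exercises the sparse graph path, a rough sketch (assuming, as noted above, that kernel='knn' is what yields a sparse graph_matrix; the parameters and assertions are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Fitting with kernel='knn' should go through the sparse graph_matrix branch;
# here we only check that fitting succeeds and the output has a sane shape.
X, y = make_classification(n_classes=2, n_samples=200, random_state=0)
y_partial = y.copy()
y_partial[::3] = -1                      # mark every third point as unlabelled
clf = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y_partial)
assert clf.label_distributions_.shape == (200, 2)
assert np.isfinite(clf.label_distributions_).all()
```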
Do you mean my PR fixing it, or your implementation?
Sorry, I've lost the time to focus on this.
I meant a new PR with your changes + the tweaks/tests we've developed over this thread, which are on my branch. I'll presently create a PR from it and link to it here.
@musically-ut can you please give us a summary of what remains to be done for this PR? Is this ready for final review? If so, please update the title from …
Ah, actually I understand that this should be closed in favor of #9239.
Closes #3550, #5774, #3758, #6727.
Travis likes it.