
[WIP] Try finish #6727: alpha deprecation in LabelPropagation #9192


Closed
wants to merge 20 commits

Conversation

jnothman
Member

@jnothman jnothman commented Jun 21, 2017

Closes #3550, #5774, #3758, #6727.

Travis likes it.

@jnothman jnothman added this to the 0.19 milestone Jun 21, 2017
@jnothman
Member Author

@boechat107, it looks like kernel='knn' makes Travis happy. If you'd rather adopt this fix in your branch, that's fine. In either case, I hope we can get some quick reviews and merge soon.

@jnothman
Member Author

I've made another couple of small documentation fixes here.

Member

@MechCoder MechCoder left a comment

Just a minor comment related to narrative docs.

@@ -378,6 +378,11 @@ Bug fixes
- Fix bug where stratified CV splitters did not work with
  :class:`linear_model.LassoCV`. :issue:`8973` by `Paulo Haddad <paulochf>`.

- Fix :class:`semi_supervised.LabelPropagation` to always do hard clamping.
  Its ``alpha`` parameter now defaults to 0 and the parameter is deprecated
Member

I'm confused: the algorithm and the documentation say that alpha is the proportion of the initial class distribution that will be maintained. Quoting the documentation:

"The LabelPropagation algorithm performs hard clamping of input labels, which means \alpha=1."

In that case, why does this default to zero assuming the definition of alpha is the same?

Member

Never mind, I just read the documentation below. Can you please change the narrative documentation so that there is no confusion?

@MechCoder MechCoder changed the title Try finish #6727: alpha deprecation in LabelPropagation [MRG+1] Try finish #6727: alpha deprecation in LabelPropagation Jun 21, 2017
@jnothman
Member Author

Maybe I need to check a bit further that the semantics of this fix are right. I feel the what's new entry is understating the change, particularly because of the change from ...[unlabelled] to ...[~unlabelled].

@musically-ut
Contributor

Okay, so I understand why the test was failing: part of it was the implementation of the algorithm and part of it was a misinterpretation of the clf._build_graph() function.

The test should have looked like the following:

# imports assumed for this snippet
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.datasets import make_classification
from sklearn import semi_supervised as SS

n_classes = 2
X, y = make_classification(n_classes=n_classes, n_samples=200, random_state=0)
y[::3] = -1
clf = SS.label_propagation.LabelSpreading().fit(X, y)

# adopting notation from Zhou et al:
# W = clf._build_graph()
# D = np.diag(W.sum(axis=1))
# Dinvroot = scipy.linalg.sqrtm(np.linalg.inv(D))
# S = np.dot(np.dot(Dinvroot, W), Dinvroot)

S = clf._build_graph()
Y = np.zeros((len(y), n_classes + 1))
Y[np.arange(len(y)), y] = 1
Y = Y[:, :-1]
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
    expected = np.dot(np.linalg.inv(np.eye(len(S)) - alpha * S), Y)
    expected /= expected.sum(axis=1)[:, np.newaxis]
    clf = SS.label_propagation.LabelSpreading(max_iter=10000, alpha=alpha)
    clf.fit(X, y)
    assert_array_almost_equal(expected, clf.label_distributions_, 4)

That is, the clf._build_graph() function directly returns S instead of returning W.

Then, the "actual" algorithm talked about in the paper is the following (our modifications are commented out):

        # clamp_weights = np.ones((n_samples, 1))
        # clamp_weights[~unlabeled, 0] = alpha

        # TODO TESTING
        clamp_weights = alpha * np.ones((n_samples, 1))

        # ...

        if alpha > 0.:
            y_static *= 1 - alpha
        # TODO TESTING
        # y_static[unlabeled] = 0

I have this version implemented in this branch of my local fork; one can check out that branch and verify that the sanity-check test does succeed in that case.


I can sort of see that the modifications we have made to the algorithm make sense. Similar guarantees can probably be worked out for this modified version as well, but I would be far more comfortable just using the version from the paper, or finding a reference, rather than using an unpublished variant.

What do you think?

@jnothman
Member Author

jnothman commented Jun 25, 2017 via email

@jnothman
Member Author

jnothman commented Jun 25, 2017 via email

@jnothman
Member Author

jnothman commented Jun 25, 2017 via email

@musically-ut
Contributor

musically-ut commented Jun 25, 2017

So we could say:

  • alpha > 0: F = alpha * F + (1 - alpha) * Y
  • alpha = 0: F = {F[i] if i unlabelled else Y[i]}

Err, I don't think the last case makes sense because the initial Y[i] are not supplied by the user (they are all 0s). It will be weird to return this without any hint of something having gone wrong.

Personally, throwing a ValueError makes more sense.

Update: No, it does make sense. Rethinking.


While we are at it, I also recommend adding a warning if the method did not converge after max_iterations, akin to this.
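For the record, a minimal sketch of the kind of warning I mean (the names n_iter_ and the placement inside fit are only placeholders, not the current code):

import warnings
from sklearn.exceptions import ConvergenceWarning

# hypothetical sketch: at the end of fit(), after the propagation loop;
# n_iter_ stands for however many iterations were actually run
if n_iter_ == self.max_iter:
    warnings.warn('max_iter=%d was reached without convergence.'
                  % self.max_iter, category=ConvergenceWarning)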

@jnothman
Member Author

jnothman commented Jun 25, 2017 via email

@musically-ut
Contributor

Okay, my suggestion is that we disallow both the extreme values of alpha, i.e. 0 and 1. The paper requires alpha to be in the open interval (0, 1) because in one case we completely ignore the transduction and in the other the input labels. Hence, I'll be happy to throw a ValueError in both cases for LabelSpreading.

For LabelPropagation, I'm working on handling the case alpha == None gracefully and writing a similar sanity check for it.
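Roughly, the check I have in mind is something like the following sketch (the exact message and its placement in fit are not final):

# sketch: validation near the top of fit(); placement and wording not final
alpha = self.alpha
if alpha is not None and not 0.0 < alpha < 1.0:
    # LabelSpreading: the reference requires alpha in the open interval (0, 1)
    raise ValueError('alpha=%s is invalid: it must be inside the open '
                     'interval (0, 1)' % alpha)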

@musically-ut
Contributor

Actually, I can use some help; my numpy matrix-fu might be wrong somewhere.

I'm trying to replicate the calculations done in eqn. (12) in the reference for LabelPropagation.
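Written out, the quantity the snippet below computes is

Y_U = (I - \bar{T}_{UU})^{-1} \bar{T}_{UL} Y_L,

where \bar{T} is the normalized graph returned by clf._build_graph(), U indexes the unlabelled points and L the labelled ones.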

Am I doing the correct things here?

# imports assumed for this snippet
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.datasets import make_classification
from sklearn.semi_supervised import label_propagation

n_classes = 2
X, y = make_classification(n_classes=n_classes, n_samples=200, random_state=0)
y[::3] = -1

# Using the notation of Zhu (2002):

clf = label_propagation.LabelPropagation().fit(X, y)
T_bar = clf._build_graph()

Y = np.zeros((len(y), n_classes + 1))
Y[np.arange(len(y)), y] = 1

unlabelled_idx = Y[:, (-1,)].nonzero()[0]
labelled_idx = (Y[:, (-1,)] == 0).nonzero()[0]

Tuu = T_bar[np.meshgrid(unlabelled_idx, unlabelled_idx, indexing='ij')]
Tul = T_bar[np.meshgrid(unlabelled_idx, labelled_idx, indexing='ij')]

Y = Y[:, :-1]
Y_u = np.dot(np.dot(np.linalg.inv(np.eye(Tuu.shape[0]) - Tuu), Tul), Y[labelled_idx])

expected = Y.copy()
expected[unlabelled_idx, :] = Y_u
expected /= expected.sum(axis=1)[:, np.newaxis]

assert_array_almost_equal(expected, clf.label_distributions_, 4)

Feedback on making this more efficient/easier to read also welcome.

@musically-ut
Contributor

musically-ut commented Jun 25, 2017

Okay, I've pushed the changes to a branch on my fork which branches on alpha is None to differentiate LabelSpreading from LabelPropagation. This is a leaky abstraction, but one I can live with.

I've also added the tests and have changed the old test from using the knn kernel to using the rbf kernel because of #8008. I've adjusted the gamma parameter so that exp underflow is not a problem.
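(To illustrate the underflow: for a large enough exponent the rbf weight becomes exactly zero; the numbers here are made up.)

import numpy as np

gamma, d2 = 20.0, 50.0    # made-up gamma and squared distance
np.exp(-gamma * d2)       # 0.0: the edge weight underflows to zero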

Curiously, adding a step to do the following:

  • LabelPropagation row-normalizes Y to be a valid probability distribution, while LabelSpreading places no such constraint on the analogous F(T). I suppose we should change this behaviour to be true to the original algorithms, depending on whether fit is called from LabelPropagation or LabelSpreading.

i.e.,

            # ...
            self.label_distributions_ = safe_sparse_dot(
                graph_matrix, self.label_distributions_)

            if alpha is None:
                # LabelPropagation
                normalizer = np.sum(self.label_distributions_, axis=1)[:, np.newaxis]
                self.label_distributions_ /= normalizer
            
            # clamp
            # ...

did not change the outcome of the test, while increasing gamma to even 10 brought about numerical instability in the tests (perhaps in the algorithm as well?), making the results diverge and the test fail.


Things still left to do in this PR:

  • Throw a ValueError for alpha = 0 or alpha = 1.
  • Normalization step for LabelPropagation before clamping.
  • More varied tests, esp. to check for numerical stability and with sparse matrices.

Maybe:

@jnothman
Member Author

I'm not surprised that normalising or not makes no difference to whether the test passes (the test itself normalises the final distribution, and everything else is affine, but I've not fully thought it through).

Using assert_array_almost_equal might be less robust to changes in gamma than assert_allclose, which uses relative tolerance. I might be wrong, though.

@jnothman
Member Author

I know there's been a lot of this, but I'd be happy for you to take over here if you wish, or to send me a PR.

@musically-ut
Contributor

musically-ut commented Jun 26, 2017

I've added the ValueError for invalid alpha (including tests) and the normalization step before clamping in LabelPropagation.

I'm sort of at a loss while designing tests, though. For example, I would love to have tests which:

  • would fail without the normalization in LabelPropagation but pass with it (there must exist such a case because T is not row-normalized, only column-normalized).
  • would verify that the implementation works for sparse matrices.
  • would probe the limits of numerical stability.

I've tried using assert_allclose and it succeeds after playing with rtol a little for LabelPropagation. However, looking at the definition of convergence in the code:

def _not_converged(y_truth, y_prediction, tol=1e-3):
    """basic convergence check"""
    return np.abs(y_truth - y_prediction).sum() > tol

I don't think we should be using relative tolerance.
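For concreteness, the two helpers behave quite differently on small values (standard numpy behaviour; the arrays here are arbitrary):

import numpy as np

a = np.array([1e-8, 1.0])
b = np.array([2e-8, 1.0])

# absolute check: passes, since |a - b| < 1.5 * 10**-4 everywhere
np.testing.assert_array_almost_equal(a, b, decimal=4)

# relative check (rtol=1e-7, atol=0 by default): raises AssertionError,
# because the first entries differ by 100% in relative terms
np.testing.assert_allclose(a, b)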


re: this PR; do you mean to close this discussion and start a new one?

@jnothman
Member Author

jnothman commented Jun 26, 2017 via email

@jnothman
Member Author

I think we might have to leave this broken for 0.19.0, and aim to merge it soon after the release.

@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 27, 2017
@musically-ut
Contributor

"testing for sparse should be easy. is it not already tested?"

I don't think so. Does something like make_classification exist to generate sparse X?

"I think we might have to leave this broken for 0.19.0, and aim to merge it soon after the release."

Not ideal, but sounds reasonable.

"I don't mind if we close and start anew [pull request]."

I'd rather keep this context around somehow. I missed the discussion on this thread for quite a while (~ 1 week?) because I wasn't automatically subscribed to it.


Note to self: Explicitly ping everyone involved on the new PR which will eventually be created.

@MechCoder
Member

"Does something like make_classification exist to generate sparse X?"

Why not just generate dense X and then convert to sparse?
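e.g. (a sketch; whether the estimator currently accepts sparse input is exactly what would be under test):

import scipy.sparse as sp
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
X_sparse = sp.csr_matrix(X)   # same data, sparse container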

@musically-ut
Contributor

Actually, the graph_matrix can only be sparse if the kernel returns a sparse matrix and only the kNN kernel returns that (which I am avoiding fixing in this PR).

Hence, we don't need to test the sparse implementation right away. However, tests for in-the-loop normalization and numerical stability (how?) would be nice to have.

I am fairly confident that the code implements the algorithm correctly and merging it as-is will move the implementation in the correct direction (i.e. from flat Earth -> spherical Earth). The tests would help move it to the 'oblate spheroid' realm in my head, but coming up with tests is ... difficult.

Thoughts?

@jnothman
Member Author

jnothman commented Jun 27, 2017 via email

@musically-ut
Contributor

I meant a new PR with your changes + the tweaks/tests we've developed over this thread, which are on my branch. I'll presently create a PR from it and link to it here.

@ogrisel
Member

ogrisel commented Jun 29, 2017

@musically-ut can you please give us a summary of what remains to be done for this PR? Is this ready for final review? If so please update the title from [WIP] to [MRG].

@ogrisel
Member

ogrisel commented Jun 29, 2017

Actually, I understand that this should be closed in favor of #9239.

@ogrisel ogrisel closed this Jun 29, 2017
@ogrisel ogrisel modified the milestones: 0.19, 0.20 Jun 29, 2017
@musically-ut
Contributor

@ogrisel Thanks, yes, #9239 supersedes this PR.
