BUG: clamping is implemented incorrectly in sklearn.semi_supervised.LabelSpreading #5774

Closed
@musically-ut

The code which does clamping in sklearn.semi_supervised.LabelSpreading appears to be incorrect:

    clamp_weights = np.ones((n_samples, 1))
    clamp_weights[unlabeled, 0] = self.alpha

    # ...

    y_static = np.copy(self.label_distributions_)
    if self.alpha > 0.:
        y_static *= 1 - self.alpha
    y_static[unlabeled] = 0

    # ...

    while ...:
        ...
        self.label_distributions_ = safe_sparse_dot(
            graph_matrix, self.label_distributions_)
        # clamp
        self.label_distributions_ = np.multiply(
            clamp_weights, self.label_distributions_) + y_static

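This does the following (writing M for graph_matrix, y_old for the label distributions before the step, and y_init for the initial label distributions):

  1. If the i-th sample is labeled, then: y_new[i] = 1.0 * M * y_old[i] + (1 - alpha) * y_init[i]
  2. If the i-th sample is unlabeled, then: y_new[i] = alpha * M * y_old[i] + 0.0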
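
The two cases above can be checked numerically. What follows is a minimal NumPy sketch, not the library code: the 3x3 matrix `M` is a made-up stand-in for `graph_matrix`, and the values of `alpha` and `y_init` are assumptions chosen only for illustration.

```python
import numpy as np

alpha = 0.2  # illustrative value, not taken from the issue
# Hypothetical stand-in for graph_matrix.
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
# One-hot initial labels; the third sample is unlabeled (all zeros).
y_init = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
unlabeled = np.array([False, False, True])

# Clamping set-up as quoted from LabelSpreading.
clamp_weights = np.ones((3, 1))
clamp_weights[unlabeled, 0] = alpha
y_static = np.copy(y_init)
if alpha > 0.:
    y_static *= 1 - alpha
y_static[unlabeled] = 0

# One pass through the quoted loop body.
propagated = M @ y_init
y_new = np.multiply(clamp_weights, propagated) + y_static

# Labeled rows: coefficient 1.0 on the propagated term, plus (1 - alpha) * y_init.
labeled_expected = 1.0 * propagated[~unlabeled] + (1 - alpha) * y_init[~unlabeled]
# Unlabeled rows: coefficient alpha on the propagated term, nothing added.
unlabeled_expected = alpha * propagated[unlabeled]

labeled_ok = np.allclose(y_new[~unlabeled], labeled_expected)
unlabeled_ok = np.allclose(y_new[unlabeled], unlabeled_expected)
print(labeled_ok, unlabeled_ok)
```

Both checks pass for these toy inputs, which matches the per-row coefficients listed above.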

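This is incorrect: labeled samples take the propagated term at full weight on top of (1 - alpha) of their initial labels, while unlabeled samples are scaled down by alpha on every iteration. The correct way to do this is:

  1. If the i-th sample is labeled, then: y_new[i] = alpha * M * y_old[i] + (1 - alpha) * y_init[i]
  2. If the i-th sample is unlabeled, then: y_new[i] = 1.0 * M * y_old[i] + 0.0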
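
The proposed rule can be written as a single step function. This is a sketch under stated assumptions: `proposed_update` is a hypothetical helper, not part of scikit-learn, and the toy `M`, `y_init`, and `alpha` values are invented for illustration.

```python
import numpy as np

def proposed_update(M, y_old, y_init, unlabeled, alpha):
    """One step of the update rule proposed in this issue (hypothetical helper).

    `unlabeled` is assumed to be a boolean mask over the samples.
    """
    propagated = M @ y_old
    y_new = propagated.copy()          # unlabeled rows: 1.0 * propagated + 0.0
    labeled = ~unlabeled
    # Labeled rows: blend the propagated term with the initial labels.
    y_new[labeled] = alpha * propagated[labeled] + (1 - alpha) * y_init[labeled]
    return y_new

# Toy inputs (assumptions, not from the issue).
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
y_init = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
unlabeled = np.array([False, False, True])

step = proposed_update(M, y_init, y_init, unlabeled, alpha=0.2)
print(step)
```

With these inputs the labeled rows stay close to their initial one-hot labels, and the unlabeled row is just the propagated value.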

The fix is relatively simple:

    -clamp_weights[unlabeled, 0] = self.alpha
    +clamp_weights[~unlabeled, 0] = self.alpha
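
The one-character change relies on `unlabeled` being a boolean mask, which is an assumption here about how the estimator builds it (for example via `y == -1`). A small sketch of the effect, using an illustrative `alpha` and the labels from the test case in this issue:

```python
import numpy as np

labels = np.array([0, 1, -1])
unlabeled = labels == -1          # boolean mask (assumed form)
alpha = 0.2                       # illustrative value

clamp_weights = np.ones((len(labels), 1))
clamp_weights[~unlabeled, 0] = alpha   # labeled rows get alpha, unlabeled row keeps 1.0
print(clamp_weights.ravel())

# Caveat: on an integer index array, ~ is bitwise NOT rather than a logical
# complement, so the same expression would select the wrong rows.
unlabeled_idx = np.flatnonzero(unlabeled)   # array([2])
wrong_idx = ~unlabeled_idx                  # array([-3]), i.e. only the first row
print(wrong_idx)
```

If `unlabeled` were ever stored as integer indices, the complement would need to be computed differently, for example by rebuilding a boolean mask first.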

I can create a PR for this but am not sure what kind of test cases I should add to avoid a regression, if any.


Test case:

    from sklearn.semi_supervised import LabelSpreading

    # Two labeled samples and one unlabeled sample (label -1).
    samples = [[1., 0.], [0., 1.], [1., 2.5]]
    labels = [0, 1, -1]
    mdl = LabelSpreading(kernel='rbf', max_iter=5000)
    mdl.fit(samples, labels)  # This will use up all 5000 iterations without converging

With the fix in place, it takes only 6 iterations.
