BUG: clamping is implemented incorrectly in sklearn.semi_supervised.LabelSpreading

The code [which does clamping in `sklearn.semi_supervised.LabelSpreading`](https://github.com/scikit-learn/scikit-learn/blob/e5f71e7f5e0ff789fed7bdea475223b8ff6a5576/sklearn/semi_supervised/label_propagation.py#L230) appears to be incorrect:

```
    clamp_weights = np.ones((n_samples, 1))
    clamp_weights[unlabeled, 0] = self.alpha

    # ...

    y_static = np.copy(self.label_distributions_)
    if self.alpha > 0.:
        y_static *= 1 - self.alpha
    y_static[unlabeled] = 0

    # ...

    while ...:
        ...
        self.label_distributions_ = safe_sparse_dot(
            graph_matrix, self.label_distributions_)
        # clamp
        self.label_distributions_ = np.multiply(
            clamp_weights, self.label_distributions_) + y_static
```

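This does the following, where `M` is `graph_matrix`, `y_old` is the previous value of `label_distributions_`, and `y_init` is the initial label distribution:
1. If the `i`th sample is labeled: `y_new[i] = 1.0 * (M @ y_old)[i] + (1 - alpha) * y_init[i]`
2. If the `i`th sample is unlabeled: `y_new[i] = alpha * (M @ y_old)[i] + 0.0`

This is the wrong way round: `alpha` damps the propagated term for the unlabeled samples instead of the labeled ones. The correct update is:
1. If the `i`th sample is labeled: `y_new[i] = alpha * (M @ y_old)[i] + (1 - alpha) * y_init[i]`
2. If the `i`th sample is unlabeled: `y_new[i] = 1.0 * (M @ y_old)[i] + 0.0`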

The fix is relatively simple:

```
-clamp_weights[unlabeled, 0] = self.alpha
+clamp_weights[~unlabeled, 0] = self.alpha
```
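To make the difference concrete, here is a minimal NumPy-only sketch of a single propagation-and-clamp step, mirroring the snippet above. It is not the scikit-learn implementation: the helper `clamp_step`, the 3x3 matrix `M`, and `alpha = 0.2` are made up for illustration, and `M` is just a toy stand-in for `graph_matrix`.

```python
import numpy as np


def clamp_step(graph_matrix, y_old, y_init, unlabeled, alpha, use_fix):
    """One propagation + clamping step (hypothetical helper, for illustration)."""
    n_samples = y_old.shape[0]
    clamp_weights = np.ones((n_samples, 1))
    if use_fix:
        # Proposed behaviour: damp the propagated term on labeled rows.
        clamp_weights[~unlabeled, 0] = alpha
    else:
        # Current behaviour: damps the propagated term on unlabeled rows.
        clamp_weights[unlabeled, 0] = alpha

    # Static contribution of the initial labels, zero for unlabeled rows.
    y_static = np.copy(y_init)
    if alpha > 0.:
        y_static *= 1 - alpha
    y_static[unlabeled] = 0

    propagated = graph_matrix @ y_old
    return clamp_weights * propagated + y_static


# Toy data (assumed values): two labeled samples, one unlabeled sample.
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
y_init = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
unlabeled = np.array([False, False, True])
alpha = 0.2

current = clamp_step(M, y_init, y_init, unlabeled, alpha, use_fix=False)
fixed = clamp_step(M, y_init, y_init, unlabeled, alpha, use_fix=True)

print(current)  # labeled rows keep the full propagated term
print(fixed)    # labeled rows mix alpha * propagated with (1 - alpha) * y_init
```

With these toy values, the current code gives the first labeled row `[0.8, 0.5]` and shrinks the unlabeled row to `[0.1, 0.1]`, while the one-line fix gives `[0.8, 0.1]` and leaves the unlabeled row at the fully propagated `[0.5, 0.5]`.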

I can create a PR for this but am not sure what kind of test cases I should add to avoid a regression, if any.

---

Test case:

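```
from sklearn.semi_supervised import LabelSpreading

# Two labeled samples (classes 0 and 1) and one unlabeled sample (-1)
samples = [[1., 0.], [0., 1.], [1., 2.5]]
labels = [0, 1, -1]
mdl = LabelSpreading(kernel='rbf', max_iter=5000)
mdl.fit(samples, labels)  # On the affected version, this uses up all 5000 iterations without converging
```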

With the fix in place, it takes only 6 iterations.


BUG: clamping is implemented incorrectly in sklearn.semi_supervised.LabelSpreading #5774
