Conversation

scottgigante-immunai
Contributor

What does this implement/fix? Explain your changes.

This PR adds a fit_predict method to BaseLabelPropagation. The predicted labels for X are already computed, but there is no documented way of accessing them without calling predict again, which causes redundant computation.

@betatim
Member

betatim commented Nov 14, 2022

Can you add a quick test for fit_predict please? I don't know if there is a general test that checks that fit(X, y).predict(X) == fit_predict(X, y), if not that would be something to test.

Otherwise this seems like a reasonable thing to add.

```python
        Predictions for input data.
        """
        self.fit(X, y)
        return self.transduction_
```
Member


transduction_ is already documented in LabelPropagation as:

    transduction_ : ndarray of shape (n_samples)
        Label assigned to each item via the transduction.

We can improve that description if you think it's not clear, and maybe we could also document the attributes on BaseLabelPropagation for clarity. I think I might prefer that to implementing fit_predict.

Contributor Author


The reason I thought this should exist is that similar methods exist for other sklearn classes, e.g. KMeans.fit_predict does basically the same thing.
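For comparison, a minimal sketch (toy data, parameters chosen only for reproducibility) of why the KMeans analogy is tempting: there, the two call patterns agree, because `labels_` computed during `fit` are exactly the nearest-centroid assignments that `predict` would recompute.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# fit_predict returns the labels_ computed during fit...
labels_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ...which matches re-predicting on the same training data.
labels_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).predict(X)

print(np.array_equal(labels_a, labels_b))
```

The discussion below turns on the fact that this equivalence does not carry over to label propagation.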

Member


Yes, but I'm not sure if that's a good thing.

cc @scikit-learn/core-devs

Contributor

Micky774 commented Nov 17, 2022


It definitely feels (is) a bit redundant, but I think the benefit of a more "consistent" API is worth it. It helps newcomers infer usage across estimators. Overall I'm in favor if it strictly conforms to the expected semantics.

Edit: as @scottgigante-immunai mentioned, in this case fit(X, y).predict(X) == fit_predict(X, y) does not hold. I think this is not obvious a priori, and consequently I think enforcing access through transduction_ is more appropriate.

@scottgigante-immunai
Contributor Author

> Can you add a quick test for fit_predict please? I don't know if there is a general test that checks that fit(X, y).predict(X) == fit_predict(X, y), if not that would be something to test.

@betatim this fails and for good reason -- when fitting, the predictions keep the existing labels in place (unless LabelSpreading alpha > 0) while calling predict has no knowledge of the prior labels. I could write a test that only checks that the values of the unlabelled data are the same, but this is actually a different operation than calling fit(X, y).predict(X).
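To make the distinction concrete, here is a minimal sketch (toy 1-D data; kernel and gamma chosen only for illustration) of the two access paths being discussed. transduction_ holds the labels inferred during fit, with the supplied labels clamped in place for the labelled samples, whereas predict re-runs inference from the kernel alone and has no notion of which samples carried prior labels, so the two can in principle disagree.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two well-separated 1-D clusters; -1 marks unlabelled samples.
X = np.array([[0.0], [0.4], [0.8], [5.0], [5.4], [5.8]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation(kernel="rbf", gamma=1.0).fit(X, y)

# Labels assigned during fitting; given labels stay clamped in place.
via_transduction = model.transduction_

# Fresh kernel-based inference, ignorant of the prior labels.
via_predict = model.predict(X)

print(via_transduction, via_predict)
```

On this easy toy data the two usually agree, but only transduction_ is guaranteed to preserve the labels that were passed in.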

@adrinjalali
Member

> this is actually a different operation than calling fit(X, y).predict(X).

In this case, I don't think we can add this. We should better document transduction_

@betatim
Member

betatim commented Nov 18, 2022

Agreed. With fit().predict() != fit_predict() I think it would create more confusion than being helpful. 👍 for improved docs for transduction_. Maybe also a +1 to leaving a comment in the code somewhere (if there is a good place) for people from the future to let them know that someone already tried to add fit_predict() and discovered that it "isn't that easy".

@scottgigante-immunai
Contributor Author

scottgigante-immunai commented Nov 18, 2022

That all makes sense to me. I've opened a new PR #24985 which closes this and instead edits the docs.

PS: I love that we're communicating with "people from the future". Such time travel :D
