Conversation

scottgigante-immunai
Contributor

What does this implement/fix? Explain your changes.

This PR adds a fit_predict method to BaseLabelPropagation. The predicted labels for X are already computed, but there is no documented way of accessing them without calling predict again, which causes redundant computation.

@betatim
Member

betatim commented Nov 14, 2022

Can you add a quick test for fit_predict please? I don't know if there is a general test that checks that fit(X, y).predict(X) == fit_predict(X, y), if not that would be something to test.

Otherwise this seems like a reasonable thing to add.

```python
        Predictions for input data.
        """
        self.fit(X, y)
        return self.transduction_
```
Member


transduction_ is already documented in LabelPropagation as:

    transduction_ : ndarray of shape (n_samples)
        Label assigned to each item via the transduction.

We can improve that description if you think it's not clear, and maybe we could also document the attributes on BaseLabelPropagation for clarity. I think I might prefer that to implementing fit_predict.

Contributor Author


The reason I thought this should exist is that similar methods exist for other sklearn classes, e.g. KMeans.fit_predict does basically the same thing.
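For comparison, a minimal sketch (toy data, parameters chosen only for reproducibility) of why the KMeans analogy is tempting: there, the two call patterns agree, because `labels_` computed during `fit` are exactly the nearest-centroid assignments that `predict` would recompute.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# fit_predict returns the labels_ computed during fit...
labels_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ...which matches re-predicting on the same training data.
labels_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).predict(X)

print(np.array_equal(labels_a, labels_b))
```

The discussion below turns on the fact that this equivalence does not carry over to label propagation.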

Member


Yes, but I'm not sure if that's a good thing.

cc @scikit-learn/core-devs

Contributor

Micky774 commented Nov 17, 2022


It definitely feels (is) a bit redundant, but I think the benefit of a more "consistent" API is worth it. It helps newcomers infer usage across estimators. Overall I'm in favor if it strictly conforms to the expected semantics.

Edit: as @scottgigante-immunai mentioned, in this case fit(X, y).predict(X) == fit_predict(X, y) does not hold. I think this is not obvious a priori, and consequently I think enforcing access through transduction_ is more appropriate.

@scottgigante-immunai
Contributor Author

> Can you add a quick test for fit_predict please? I don't know if there is a general test that checks that fit(X, y).predict(X) == fit_predict(X, y), if not that would be something to test.

@betatim this fails and for good reason -- when fitting, the predictions keep the existing labels in place (unless LabelSpreading alpha > 0) while calling predict has no knowledge of the prior labels. I could write a test that only checks that the values of the unlabelled data are the same, but this is actually a different operation than calling fit(X, y).predict(X).
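To make the distinction concrete, here is a minimal sketch (toy 1-D data; kernel and gamma chosen only for illustration) of the two access paths being discussed. transduction_ holds the labels inferred during fit, with the supplied labels clamped in place for the labelled samples, whereas predict re-runs inference from the kernel alone and has no notion of which samples carried prior labels, so the two can in principle disagree.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two well-separated 1-D clusters; -1 marks unlabelled samples.
X = np.array([[0.0], [0.4], [0.8], [5.0], [5.4], [5.8]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation(kernel="rbf", gamma=1.0).fit(X, y)

# Labels assigned during fitting; given labels stay clamped in place.
via_transduction = model.transduction_

# Fresh kernel-based inference, ignorant of the prior labels.
via_predict = model.predict(X)

print(via_transduction, via_predict)
```

On this easy toy data the two usually agree, but only transduction_ is guaranteed to preserve the labels that were passed in.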

@adrinjalali
Member

> this is actually a different operation than calling fit(X, y).predict(X).

In this case, I don't think we can add this. We should better document transduction_

@betatim
Member

betatim commented Nov 18, 2022

Agreed. With fit().predict() != fit_predict() I think it would create more confusion than being helpful. 👍 for improved docs for transduction_. Maybe also a +1 to leaving a comment in the code somewhere (if there is a good place) for people from the future to let them know that someone already tried to add fit_predict() and discovered that it "isn't that easy".

@scottgigante-immunai
Contributor Author

scottgigante-immunai commented Nov 18, 2022

That all makes sense to me. I've opened a new PR #24985 which closes this and instead edits the docs.

PS: I love that we're communicating with "people from the future". Such time travel :D
