ENH Exposes latent mean and variance for GPCs #22227
Conversation
return_std_of_f flag for GPCs
Depending on the API decision, I would be happy to update this with the latest changes.
noashin left a comment:
What is lacking is an explanation of the meaning of the std of f. It should be emphasized that this is not the variance of \pi (the link function applied to the latent function) and that it does not translate directly to confidence intervals on the predicted probabilities. Maybe this can be added to the general documentation of GPs in scikit-learn.
As the API is now more similar to the one of GaussianProcessRegressor, without further explanation it implies that the quantity returned is equivalent to the one returned by the return_std flag in GaussianProcessRegressor (even though the name of the flag is different).
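[Editor's note: to make the distinction concrete, here is a minimal sketch, not from the PR, of why the std of the latent f does not carry over to \pi = sigmoid(f). The latent mean/std values are made up for illustration.]

import numpy as np
from scipy.special import expit  # the logistic sigmoid

# Hypothetical latent mean/std at one test point, e.g. as produced by a
# Laplace approximation (illustrative values only).
f_mean, f_std = 0.8, 1.5

rng = np.random.default_rng(0)
f_samples = rng.normal(f_mean, f_std, size=100_000)
pi_samples = expit(f_samples)

# E[sigmoid(f)] is pulled toward 0.5 relative to sigmoid(f_mean), and the
# spread of pi is bounded in [0, 1] -- it is not a simple rescaling of f_std.
print(pi_samples.mean(), expit(f_mean), pi_samples.std())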
    X = self._validate_data(X, ensure_2d=True, dtype="numeric", reset=False)
else:
    X = self._validate_data(X, ensure_2d=False, dtype=None, reset=False)
I would put the check of whether std_f can be returned here, after the kernel tests. Then the check can be similar to the one done around line 773:
if self.n_classes_ > 2:
    if return_std_of_f:
        raise ValueError(
            "Returning the standard deviation of the "
            "latent function f is only supported for GPCs "
            "that use the Laplace Approximation."
        )
    else:
        return self.base_estimator_.predict_proba(X)
else:
    return self.base_estimator_.predict_proba(
        X, return_std_of_f=return_std_of_f
    )
Agreed! Will implement these changes.
sklearn/gaussian_process/_gpc.py (outdated diff):

        return self

-    def predict(self, X):
+    def predict(self, X, return_std_of_f=False):
I think that std_of_f should be available only via predict_proba.
As predict does not provide a probabilistic estimate, returning the std of the nuisance function is somewhat out of scope.
Hi Noa,
First of all, thanks for the review!
I added the flag here as well to match the API for GPR. In the predict method for Gaussian Process Regression, you can also find a return_std. If we remove it from this predict, wouldn't that make the APIs different?
I'm happy to remove it either way, but I'm curious about what you think regarding whether the APIs for GPR and GPC should match.
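[Editor's note: for reference, the existing GPR API being mirrored looks like this; return_std is a real GaussianProcessRegressor.predict keyword, and the toy data here is made up.]

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X_train = np.array([[1.0], [3.0], [5.0]])
y_train = np.array([0.5, 1.0, 2.0])

gpr = GaussianProcessRegressor().fit(X_train, y_train)
# predict exposes the predictive std directly via a keyword flag:
y_mean, y_std = gpr.predict(np.array([[2.0]]), return_std=True)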
Maybe @adrinjalali could share his opinion on whether to keep the interfaces as similar as possible.
For what it's worth, someone mentioned the consistency with GPR in a comment on the original issue: #22226 (comment)
I agree that in this case having it only for predict_proba makes more sense.
I'll include those changes then!
Dear @noashin, again, many thanks for the review. I have added a paragraph to the documentation explaining what the keyword does, and I am hopefully clear and specific about the fact that we are returning the std of the latent function f. Let me know if I should address anything else!
adrinjalali left a comment:
I wonder if the argument could also be simply called return_std
It might be misleading! People might assume that it refers to the uncertainty over the class probabilities.
I agree. I think that return_std would be misleading.
ping @glemaitre maybe.
I'm happy to update it, but it seems we're waiting for @glemaitre to give input on what we should call it.
Ping @antoinebaker and @snath-xoc, I think you can give nicer feedback here than I can.
Thanks for the PR @miguelgondu! If I understood the original issue correctly, I see two motivations behind returning the std of f. But if you had these applications in mind, I don't think the current approach is enough. I don't have a strong opinion here, but maybe a separate method (name and signature TBD) would be better than returning it through predict_proba:

laplace_approximation(X, return_std=False, return_cov=False)
# returns f_mean, f_std or even f_cov if possible
Thanks for the PR @miguelgondu! I agree with @antoinebaker that moving this into a dedicated function may be better than doing it within predict_proba, so we don't confuse users with y_mean and f_mean (although I see that it is well-documented). Being able to quantify the uncertainty in the latent f space could be interesting but is case-specific; with the proposed "laplace_approximation" method we could also provide the option of computing cov/std-based uncertainty estimates.
Hi @antoinebaker and @snath-xoc, thanks for the feedback! Agreed, the best path forward would be to implement another method. I'll try to do it soon and then ask for a re-review.
Dear @noashin, @adrinjalali, @antoinebaker and @snath-xoc, I have addressed the suggested changes. The latent mean and variance of f are now exposed through a dedicated method, latent_mean_and_variance. For now, I only test that the returned values have the right shape, that exceptions get raised when calling the method with more than two classes, and that the method also works on string kernels. Let me know if I should include any more tests; the validity should already be covered by previous tests on the GPC. I'm also happy to clean up the git history by rebasing and squashing irrelevant commits.
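[Editor's note: a minimal sketch of the kind of shape test described above, assuming the new latent_mean_and_variance method returns one latent mean and variance per sample; the PR's actual test code and data may differ.]

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

gpc = GaussianProcessClassifier().fit(X, y)
# For binary classification, the latent f has one value per sample.
latent_mean, latent_var = gpc.latent_mean_and_variance(X)
assert latent_mean.shape == (X.shape[0],)
assert latent_var.shape == (X.shape[0],)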
antoinebaker left a comment:
Thanks @miguelgondu for the PR; latent_mean_and_variance is a very good name, I think.
A few formatting nitpicks, otherwise LGTM!
Thanks for the feedback, @antoinebaker! I have addressed it.
Hi @miguelgondu, thank you for the changes. I thought I would just add some nitpicks to ensure consistency in the naming of latent_mean and latent_var throughout the code (unless you have a good reason for calling them f_mean and var_f_star?).
Hi @snath-xoc, thanks for the feedback! I had left the previous names because of the notation in the book we cite. That said, I agree with the consistency changes you proposed; they're now implemented. Happy to address any other feedback, let me know!
LGTM now, once we get a second opinion!
LGTM too!
@noashin and @adrinjalali, sorry for the second ping. Let me know if I should provide any changes!
adrinjalali left a comment:
Thank you very much. This seems quite straightforward now, and with the two reviews we have, I'm confident with the contribution.
Reference Issues/PRs
Fixes #22226.
What does this implement/fix? Explain your changes.
Adds a return_std_of_f flag for binary Gaussian Process Classifiers. This value was already being computed in some of the methods, so this PR just exposes that variable and adds checks in the API to ensure this is only possible for binary (and not multiclass) classification using the Laplace approximation.
Any other comments?