
[WIP] Brier score binless decomposition #22233


Open · wants to merge 4 commits into main
Conversation

ColdTeapot273K
Copy link

Reference Issues/PRs

Closes #21774. See also #18268, #21718

What does this implement/fix? Explain your changes.

Described in #21774

Any other comments?

@ColdTeapot273K ColdTeapot273K changed the title Brier score binless decomposition [WIP] Brier score binless decomposition Jan 17, 2022
@ogrisel
Copy link
Member

ogrisel commented May 17, 2022

Thanks for the PR. However, before reviewing it we should settle the discussion on the related issue: #21774 (comment)

@e-pet
Copy link

e-pet commented May 31, 2022

I just tried using your implementation of brier_score_loss_decomposition and ran into a problem (at least I believe it is one). In short, I have an MWE where I obtain BS == calibration loss and refinement loss == 0, which seems incorrect to me (the predictor is far from perfect).

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from scipy import stats
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    # brier_score_loss_decomposition is provided by this PR branch

    rng = np.random.RandomState(42)

    def risk(x, group, confounder):
        """The true risk of each sample."""
        return 0.2 + x + 0.3 * group * confounder - 0.15 * group

    x = np.atleast_2d(rng.uniform(0, 0.5, size=1000)).T
    group = np.atleast_2d(rng.binomial(1, 0.5, size=1000)).T
    confounder = np.atleast_2d(rng.binomial(1, 0.2, size=1000)).T
    expected_risk = risk(x, group, confounder)
    lower, upper = -0.1, 0.1
    mu, sigma = 0, 0.05
    risk_noise = np.atleast_2d(stats.truncnorm(
        (lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma).rvs(1000)).T
    true_risk = np.maximum(np.minimum(expected_risk + risk_noise, 1), 0)
    outcomes = rng.binomial(1, p=true_risk)
    X_train, X_test, y_train, y_test, risk_train, risk_test = train_test_split(
        np.concatenate((x, group), axis=1), outcomes, true_risk,
        test_size=0.25, random_state=rng)
    idces = np.argsort(X_test[:, 0], axis=0)
    X_test = np.atleast_2d(X_test[idces, :].squeeze())
    y_test = np.atleast_2d(y_test[idces, :].squeeze()).T
    risk_test = np.atleast_2d(risk_test[idces, :].squeeze()).T
    LR = LogisticRegression()
    LR.fit(X_train, y_train.ravel())
    risk_test_pred_uncalib = LR.predict_proba(X_test)[:, 1]
    # yields bs == cl == 0.22..., rl == 0
    bs, cl, rl = brier_score_loss_decomposition(
        y_test.reshape(-1), risk_test_pred_uncalib.reshape(-1))

    plt.figure()
    df = pd.DataFrame({'risk prediction': risk_test_pred_uncalib.reshape(-1),
                       'true risk': risk_test.reshape(-1)})
    sns.scatterplot(data=df, x='risk prediction', y='true risk')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.plot([0, 1], [0, 1], '--')

    if roc_auc_score(y_test.reshape(-1), risk_test_pred_uncalib.reshape(-1)) < 1:  # AUROC = 0.67...
        assert rl > 0

@ColdTeapot273K
Copy link
Author

ColdTeapot273K commented Jun 21, 2022

@e-pet

I've checked your example.

TL;DR: it is working as intended, since this is an exact, binless implementation; but we can do something about it to give more flexibility without introducing bins.

Two points:

  1. Refinement loss grows when predictions are alike: the less the model differentiates/discriminates between samples, the more it generalizes.
  2. Since this is a binless, i.e. exact, implementation, refinement loss grows only when predictions for differently labelled samples are exactly equal (the same number, up to an epsilon). In practice this means you are likely to get a significantly nonzero refinement loss from tree-based models rather than from linear ones (although it can happen with the latter too).
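Point 2 can be checked by hand: under exact grouping, distinct predicted values each form a singleton group whose empirical event rate r is 0 or 1, so the refinement term r * (1 - r) vanishes and the entire Brier score lands in the calibration term, just as in the logistic-regression MWE above (a hand-rolled check, not this PR's code):

```python
import numpy as np

# All predicted probabilities are distinct, so every group of identical
# predictions is a single sample with empirical rate r in {0, 1}.
y = np.array([0, 1, 0, 1])
p = np.array([0.1, 0.2, 0.3, 0.4])

refinement = 0.0                      # r * (1 - r) == 0 for every singleton group
calibration = np.mean((p - y) ** 2)   # the whole Brier score, here 0.275
```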

e.g. this is how you get the maximum refinement loss:

bs, cl, rl = brier_score_loss_decomposition(
    np.array([1, 0]).reshape(-1), np.array([0.5, 0.5]).reshape(-1)
)
# 0.25, 0.0, 0.25

while the calibration loss here is 0, which is the expected result. That is also how you can game a calibration metric in general, by the way: just predict the class-balance value (countered by considering a joint-likelihood metric instead; see McElreath's "Statistical Rethinking", 2nd ed., paragraph 7.2).
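For readers without this branch installed, the same numbers fall out of a minimal stand-in for the exact grouped decomposition (decompose here is a hypothetical helper, not the PR's brier_score_loss_decomposition):

```python
import numpy as np

def decompose(y_true, y_prob):
    """Exact (binless) Brier decomposition sketch: group samples sharing an
    identical predicted probability p; with empirical event rate r per group,
    the per-group Brier score splits exactly into (p - r)**2 + r * (1 - r)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    cal = ref = 0.0
    for p in np.unique(y_prob):
        mask = y_prob == p
        w = mask.mean()          # group weight n_g / n
        r = y_true[mask].mean()  # empirical event rate in the group
        cal += w * (p - r) ** 2
        ref += w * r * (1 - r)
    return cal + ref, cal, ref

bs, cl, rl = decompose([1, 0], [0.5, 0.5])
# (0.25, 0.0, 0.25): with tied predictions the loss is pure refinement
```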


Now, this got me thinking: in practical applications we rarely treat numbers that differ only in a distant decimal place (e.g. 0.XXXXXX3 vs 0.XXXXXX4) as meaningfully different. So we might relax the "exactness" of this implementation by introducing an absolute/relative tolerance (as in math.isclose()).
So that:

  • 0.XXX3 and 0.XXX4 are treated as the same (and therefore pred[i]=0.XXX3, test[i]=0 together with pred[j]=0.XXX4, test[j]=1 would increase refinement loss)
  • 0.X3 and 0.X4 are treated as different (and therefore pred[i]=0.X3, test[i]=0 together with pred[j]=0.X4, test[j]=1 would not increase refinement loss)

The tolerance would act as an alternative parameter to binning, controlling the degree of "exactness" of this exact implementation.
Or maybe it should instead be left to the user to apply np.round() to the predicted probabilities if they so desire: a simple, explicit alternative.
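A quick sketch of the np.round() route (with a hypothetical decompose helper that groups equal predictions, not the PR's function): rounding merges near-identical predictions into one group, moving loss from the calibration term into the refinement term:

```python
import numpy as np

def decompose(y_true, y_prob):
    """Group samples with identical predicted probability p; with empirical
    event rate r per group, the weighted sum of (p - r)**2 gives calibration
    loss and of r * (1 - r) gives refinement loss."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    cal = ref = 0.0
    for p in np.unique(y_prob):
        mask = y_prob == p
        w, r = mask.mean(), y_true[mask].mean()
        cal += w * (p - r) ** 2
        ref += w * r * (1 - r)
    return cal + ref, cal, ref

y = [0, 1]
p = np.array([0.5003, 0.5004])   # differ only in a distant decimal place

bs1, cl1, rl1 = decompose(y, p)               # exact grouping: rl1 == 0
bs2, cl2, rl2 = decompose(y, np.round(p, 3))  # both round to 0.5
# rl2 == 0.25, cl2 == 0.0: the tied (rounded) predictions now register
# as refinement loss rather than calibration loss
```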

@lorentzenchr
Copy link
Member

lorentzenchr commented Jun 27, 2022

I would much prefer a more general solution to score decompositions as proposed in #23767.

Development

Successfully merging this pull request may close these issues.

Calibration and Refinement loss for Brier score loss
4 participants