
Additive Score/Metric decomposition into Miscalibration, Discrimination and Uncertainty #23767

Open
@lorentzenchr

Description


Describe the workflow you want to enable

Proposition

I'd like to decompose scores (at least the ones from consistent scoring functions for identifiable functionals) into meaningful additive components:

  • MSC ≥ 0 miscalibration (for the Brier score also known as the reliability term)
  • DSC ≥ 0 discrimination (for the Brier score also known as the resolution term)
  • UNC ≥ 0 uncertainty or entropy

as (lower is better) score_loss = MSC - DSC + UNC, see [1, 2, 3, 4]. An implementation in R is available at [5].

For classification with Brier score

from sklearn.metrics import brier_score_loss, score_decompose

model.fit(X_train, y_train)
y_pred = model.predict_proba(X_test)[:, 1]  # probability of the positive class
msc, dsc, unc = score_decompose(y_test, y_pred, brier_score_loss)

assert msc - dsc + unc == brier_score_loss(y_test, y_pred)
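Such a decomposition looks implementable with existing building blocks: the CORP approach of [2] amounts to isotonic regression (the PAV algorithm, available as sklearn.isotonic.IsotonicRegression) plus three score evaluations. A minimal sketch for the Brier score on synthetic data (score_decompose itself is only the proposed API, not an existing function):

import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)       # forecast probabilities
y_true = rng.binomial(1, y_prob**2)   # deliberately miscalibrated outcomes

# PAV-(re)calibrated forecasts: isotonic regression of y_true on y_prob.
y_cal = IsotonicRegression(y_min=0, y_max=1).fit_transform(y_prob, y_true)
# Reference forecast: the marginal event frequency.
y_marg = np.full_like(y_prob, y_true.mean())

s = brier_score_loss(y_true, y_prob)
msc = s - brier_score_loss(y_true, y_cal)   # miscalibration (reliability)
dsc = brier_score_loss(y_true, y_marg) - brier_score_loss(y_true, y_cal)  # discrimination (resolution)
unc = brier_score_loss(y_true, y_marg)      # uncertainty (entropy)

# Exact by construction; MSC, DSC >= 0 follow from the optimality of the isotonic fit.
assert np.isclose(msc - dsc + unc, s)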

For quantile regression with pinball loss

from sklearn.metrics import mean_pinball_loss, score_decompose

alpha = 0.8  # the 80% quantile
model.set_params(quantile=alpha).fit(X_train, y_train)
y_pred = model.predict(X_test)
msc, dsc, unc = score_decompose(y_test, y_pred, mean_pinball_loss, alpha=alpha)

assert msc - dsc + unc == mean_pinball_loss(y_test, y_pred, alpha=alpha)
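The same recipe should carry over to quantiles, where the pooled value of each PAV block becomes the empirical alpha-quantile instead of the mean (cf. [3]). A rough sketch on synthetic data; pav_quantile is a hypothetical helper written here for illustration, and ties in y_pred are ignored for simplicity:

import numpy as np
from sklearn.metrics import mean_pinball_loss

def pav_quantile(y, alpha):
    """Nondecreasing fit of y under pinball loss via pool-adjacent-violators."""
    blocks = [[v] for v in y]
    # method="inverted_cdf" (numpy >= 1.22) picks an exact minimizer of the
    # pinball loss within each block.
    vals = [np.quantile(b, alpha, method="inverted_cdf") for b in blocks]
    i = 0
    while i < len(blocks) - 1:
        if vals[i] > vals[i + 1]:  # adjacent violators: pool the two blocks
            blocks[i].extend(blocks.pop(i + 1))
            vals.pop(i + 1)
            vals[i] = np.quantile(blocks[i], alpha, method="inverted_cdf")
            i = max(i - 1, 0)  # pooling may create a new violation upstream
        else:
            i += 1
    return np.concatenate([np.full(len(b), v) for b, v in zip(blocks, vals)])

alpha = 0.8
rng = np.random.default_rng(0)
y_pred = rng.normal(size=500)                      # quantile forecasts
y_test = y_pred + rng.normal(scale=0.5, size=500)  # observed outcomes

# PAV-(re)calibrated forecasts: isotonic quantile regression of y_test on y_pred.
order = np.argsort(y_pred, kind="stable")
y_cal = np.empty_like(y_pred)
y_cal[order] = pav_quantile(y_test[order], alpha)
# Reference forecast: the marginal (unconditional) alpha-quantile.
y_marg = np.full_like(y_pred, np.quantile(y_test, alpha))

s = mean_pinball_loss(y_test, y_pred, alpha=alpha)
msc = s - mean_pinball_loss(y_test, y_cal, alpha=alpha)
dsc = mean_pinball_loss(y_test, y_marg, alpha=alpha) - mean_pinball_loss(y_test, y_cal, alpha=alpha)
unc = mean_pinball_loss(y_test, y_marg, alpha=alpha)

assert np.isclose(msc - dsc + unc, s)  # exact by construction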

References

[1] Pohle (2020) https://arxiv.org/abs/2005.01835
[2] Dimitriadis, Gneiting & Jordan (2020) https://arxiv.org/abs/2008.03033
[3] Gneiting & Resin (2021) https://arxiv.org/abs/2108.03210
[4] Fissler, Lorentzen & Mayer (2022) https://arxiv.org/abs/2202.12780
[5] https://github.com/aijordan/reliabilitydiag

Describe your proposed solution

From [2]:

While there is a consensus on the character and intuitive interpretation of the decomposition terms, their exact form remains subject to debate, despite a half century quest in the wake of Murphy’s [1973] Brier score decomposition. In particular, Murphy’s decomposition is exact in the discrete case, but fails to be exact under continuous forecasts, which has prompted the development of increasingly complex types of decompositions (...).

In the extant literature, it has been assumed implicitly or explicitly that the calibrated and reference forecasts can be chosen at researchers’ discretion (...). We argue otherwise and contend that the calibrated forecasts ought to be the PAV-(re)calibrated probabilities, as displayed in the CORP reliability diagram, whereas the reference forecast r ought to be the marginal event frequency (...). We refer to the resulting decomposition as the CORP score decomposition, which enjoys the following properties:
• MCB ≥ 0 with equality if the original forecast is calibrated.
• DSC ≥ 0 with equality if the PAV-calibrated forecast is constant.
• The decomposition is exact.

Perhaps surprisingly, the PAV algorithm and its appealing properties generalize from probabilistic classifiers to mean, quantile, and expectile assessments for real-valued outcomes.
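
In the notation above (MSC is called MCB in [2]): with x the original forecasts, x̂ the PAV-(re)calibrated forecasts, r the marginal event frequency and S̄ the mean score on the test set, the CORP components are

MSC = S̄(x, y) - S̄(x̂, y)
DSC = S̄(r, y) - S̄(x̂, y)
UNC = S̄(r, y)

so score_loss = S̄(x, y) = MSC - DSC + UNC holds exactly by construction, and MSC, DSC ≥ 0 follow from the optimality of the PAV solution.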

Describe alternatives you've considered, if relevant

No response

Additional context

#23132 is related and might be a good first step towards score_decomposition.

Solving this issue will also solve #21774.
