Description
Describe the workflow you want to enable
Proposition
I'd like to decompose scores (at least those of consistent scoring functions for identifiable functionals) into meaningful additive components:
- MSC ≥ 0: miscalibration (for the Brier score also known as the reliability term)
- DSC ≥ 0: discrimination (for the Brier score also known as the resolution term)
- UNC ≥ 0: uncertainty or entropy

such that (lower is better) `score_loss = MSC - DSC + UNC`, see [1, 2, 3, 4]. An implementation in R is available at [5].
For classification with the Brier score:

```python
from sklearn.metrics import brier_score_loss, score_decompose

model.fit(X_train, y_train)
y_pred = model.predict_proba(X_test)[:, 1]
msc, dsc, unc = score_decompose(y_test, y_pred, brier_score_loss)
assert msc - dsc + unc == brier_score_loss(y_test, y_pred)
```
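Note that `score_decompose` above is the proposed API, not an existing one. As a proof of concept, the CORP decomposition of the Brier score from [2] can already be computed with scikit-learn's existing `IsotonicRegression`; the sketch below uses a hypothetical helper name `corp_brier_decomposition` and synthetic data:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss


def corp_brier_decomposition(y_true, y_prob):
    """CORP decomposition of the Brier score, following [2].

    Returns (msc, dsc, unc) such that msc - dsc + unc equals the
    Brier score exactly, with all three terms nonnegative.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # PAV-(re)calibrated probabilities: isotonic regression of y on the forecasts
    y_cal = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(y_prob, y_true)
    score = np.mean((y_prob - y_true) ** 2)  # Brier score of the original forecast
    score_cal = np.mean((y_cal - y_true) ** 2)  # ... of the calibrated forecast
    score_ref = np.mean((y_true.mean() - y_true) ** 2)  # ... of the marginal frequency
    return score - score_cal, score_ref - score_cal, score_ref


rng = np.random.default_rng(0)
y_prob = rng.uniform(size=200)  # synthetic, miscalibrated forecasts
y_true = (rng.uniform(size=200) < y_prob**2).astype(int)
msc, dsc, unc = corp_brier_decomposition(y_true, y_prob)
assert np.isclose(msc - dsc + unc, brier_score_loss(y_true, y_prob))
assert msc >= 0 and dsc >= 0 and unc >= 0
```

Exactness holds by construction (the calibrated-score terms cancel), and nonnegativity of MSC and DSC follows because both the identity and any constant are feasible monotone fits for the isotonic regression.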
For quantile regression with the pinball loss:

```python
from sklearn.metrics import mean_pinball_loss, score_decompose

alpha = 0.8  # the 80% quantile
model.set_params(quantile=alpha).fit(X_train, y_train)
y_pred = model.predict(X_test)
msc, dsc, unc = score_decompose(y_test, y_pred, mean_pinball_loss, alpha=alpha)
assert msc - dsc + unc == mean_pinball_loss(y_test, y_pred, alpha=alpha)
```
References
[1] Pohle (2020) https://arxiv.org/abs/2005.01835
[2] Dimitriadis, Gneiting, Jordan (2020) https://arxiv.org/abs/2008.03033
[3] Gneiting & Resin (2021) https://arxiv.org/abs/2108.03210
[4] Fissler, Lorentzen & Mayer (2022) https://arxiv.org/abs/2202.12780
[5] reliabilitydiag (R package): https://github.com/aijordan/reliabilitydiag
Describe your proposed solution
From [2]:

> While there is a consensus on the character and intuitive interpretation of the decomposition terms, their exact form remains subject to debate, despite a half century quest in the wake of Murphy’s [1973] Brier score decomposition. In particular, Murphy’s decomposition is exact in the discrete case, but fails to be exact under continuous forecasts, which has prompted the development of increasingly complex types of decompositions (...).
>
> In the extant literature, it has been assumed implicitly or explicitly that the calibrated and reference forecasts can be chosen at researchers’ discretion (...). We argue otherwise and contend that the calibrated forecasts ought to be the PAV-(re)calibrated probabilities, as displayed in the CORP reliability diagram, whereas the reference forecast r ought to be the marginal event frequency (...). We refer to the resulting decomposition as the CORP score decomposition, which enjoys the following properties:
>
> - MCB ≥ 0 with equality if the original forecast is calibrated.
> - DSC ≥ 0 with equality if the PAV-calibrated forecast is constant.
> - The decomposition is exact.
>
> Perhaps surprisingly, the PAV algorithm and its appealing properties generalize from probabilistic classifiers to mean, quantile, and expectile assessments for real-valued outcomes.
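The generalization to real-valued outcomes can be illustrated for the mean functional, where the consistent score is the squared error and the PAV recalibration is plain isotonic regression. A sketch with synthetic data (variable names are illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
y_pred = rng.normal(size=300)  # synthetic mean forecasts
y_true = 0.5 * y_pred + rng.normal(scale=0.5, size=300)  # forecasts are miscalibrated

# PAV-(re)calibrated mean forecasts
y_cal = IsotonicRegression().fit_transform(y_pred, y_true)

score = mean_squared_error(y_true, y_pred)
score_cal = mean_squared_error(y_true, y_cal)
score_ref = mean_squared_error(y_true, np.full_like(y_true, y_true.mean()))

mcb, dsc, unc = score - score_cal, score_ref - score_cal, score_ref
assert np.isclose(mcb - dsc + unc, score)  # the decomposition is exact
assert mcb >= 0 and dsc >= 0  # both terms nonnegative, as in the quote above
```

The same three-term construction stays exact because the calibrated-score terms cancel algebraically, independently of the functional.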
Describe alternatives you've considered, if relevant
No response
Additional context
#23132 is related and might be a good first step towards `score_decompose`.
Solving this issue will also solve #21774.