
Conversation

@CloseChoice
Collaborator

@CloseChoice CloseChoice commented Oct 8, 2023

Overview

Closes #3187
Closes #2887

Description of the changes proposed in this pull request:

  • make the shap explanations outputs consistent for the binary feature interaction case for xgboost and lightgbm
  • adapt test to the new output shape

NOTE: This is a breaking change.

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

@CloseChoice CloseChoice changed the title make shap explanations consistent for xgboost and lightgbm BREAKING: make shap explanations consistent for xgboost and lightgbm Oct 8, 2023
@codecov

codecov bot commented Oct 8, 2023

Codecov Report

Attention: Patch coverage is 97.22222%, with 1 line in your changes missing coverage. Please review.

Project coverage is 61.33%. Comparing base (188e010) to head (1ab9da8).

Files Patch % Lines
shap/explainers/_linear.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3318      +/-   ##
==========================================
+ Coverage   60.74%   61.33%   +0.59%     
==========================================
  Files          90       90              
  Lines       12718    12721       +3     
==========================================
+ Hits         7725     7802      +77     
+ Misses       4993     4919      -74     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CloseChoice CloseChoice force-pushed the BREAK-consistent-outputs-xgboost-lightgbm branch from b61a3c6 to 4d31340 Compare October 9, 2023 14:52
@CloseChoice
Collaborator Author

Here are a couple of things that I think could need refactoring:

  • I think it would be good to have some kind of prediction_type which can take the values: binary_classification, multi_class_classification, multi_label_classification, regression, multi_target_regression. With this, we could simplify some code
  • the whole structure of the code, here and here, looks to me like it was built bit by bit, but it could be refactored nicely and some complexity could be removed in the process as well.

@CloseChoice CloseChoice marked this pull request as ready for review October 9, 2023 15:12
@CloseChoice
Collaborator Author

The tests will pass once PR #3325 gets merged

@thatlittleboy thatlittleboy added the BREAKING Indicates that a PR is introducing a breaking change label Oct 14, 2023
@thatlittleboy
Collaborator

I'm supportive of the proposed changes.

We'll just have to make sure the implementation is thorough and the documentation is updated accordingly to educate users on the API changes.

@thatlittleboy thatlittleboy added this to the 0.44.0 milestone Oct 22, 2023
@CloseChoice
Collaborator Author

I'm supportive of the proposed changes.

We'll just have to make sure the implementation is thorough and the documentation is updated accordingly to educate users on the API changes.

Anything else you want to be tested or added in this PR?

@connortann
Collaborator

May I suggest adding to the docstring:

  1. A description of precisely what the shapes will be, depending on the prediction type
  2. A change notice, e.g. "Changed in v0.44.0: the shape of returned shap values is changed from ... to ..."

We can also ensure we put this change notice in the release notes. The PR title and description are currently a little vague as to what exactly has changed in the API.

@connortann connortann modified the milestones: 0.44.0, 0.45.0 Dec 6, 2023
@CloseChoice
Collaborator Author

Connected to #2675. Note to self: check if the output of random forest is consistent with the other 3 outputs.

Collaborator

@connortann connortann left a comment


Some slight formatting adjustments but otherwise LGTM!

@CloseChoice
Collaborator Author

Some slight formatting adjustments but otherwise LGTM!

Merged your suggestions. Thanks once again for the review.

Collaborator

@connortann connortann left a comment


Thanks again for all your work on this. There are a few remaining docs formatting issues, but I can address them in a subsequent PR. Let's get it in!

@connortann connortann merged commit ea3bfc8 into shap:master Mar 7, 2024
@CloseChoice CloseChoice deleted the BREAK-consistent-outputs-xgboost-lightgbm branch March 7, 2024 10:29
@connortann
Collaborator

@CloseChoice I noticed a test failure on master, which might be related. Would you kindly take a look - perhaps we need to loosen the tolerances of a few of the np.testing.assert_allclose calls, and/or ensure the random seed is fixed?

Example failing run:
https://github.com/shap/shap/actions/runs/8186093687/job/22383771181

I'll re-run the failed job to see if it's reproducible...

@CloseChoice
Collaborator Author

@connortann yes, looks like the allowed tolerance is too low. Will add a pull request for this.

From the failed run logs:

Mismatched elements: 1 / 96 (1.04%)
Max absolute difference: 1.53473644e-05
Max relative difference: 4.40842603e-05
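A minimal sketch of the kind of fix being discussed (the array values here are made up to mimic the ~1.5e-5 mismatch in the log above; the tolerances are illustrative, not the ones used in the actual test):

```python
import numpy as np

# Illustrative values only -- the point is the magnitude of the mismatch
# (~1.5e-5) reported in the failing run above.
expected = np.array([0.1, 0.2, 0.3])
actual = expected + np.array([0.0, 1.5e-5, -1.0e-5])

# The default rtol=1e-7 would fail on a mismatch of this size:
# np.testing.assert_allclose(actual, expected)  # would raise AssertionError

# Loosened tolerances absorb the small numerical jitter between runs
np.testing.assert_allclose(actual, expected, rtol=1e-4, atol=1e-5)
```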

@imatiach-msft
Collaborator

The title and description for this PR are completely wrong. It does not just "make the shap explanations outputs consistent for the binary feature interaction case for xgboost and lightgbm". It breaks the output format of every explainer in shap from (# classes x # examples x # features) to (# examples x # features x # classes) and changes the output from a list of 2D np.ndarrays to a 3D np.ndarray.

This is a huge headache for downstream users and libraries that have used the old format for many years now - in just our OSS library this shap update has caused 60 test failures, when we can usually update without any issues.

Also, I don't understand why every explainer needs to be changed instead of just xgboost. The old format seemed much better and was much easier to understand - it was a lot easier to tell if the model is a classifier or a regressor. Indexing into the class to get the shap values for that class is much easier for users than figuring out how to slice the array along the third dimension - often this is the most common operation and what users pass to visualizations. Many other text/vision explainers and other libraries follow this logic as well. The new format is harder to understand and use in my opinion, and only xgboost should have been changed, rather than lightgbm and every shap explainer (kernel, deep tf/pytorch, gradient, linear, tree) in this library.
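To make the scope of the format change concrete, here is a small numpy-only sketch of the two layouts being described (the shapes and values are illustrative, not taken from shap's source):

```python
import numpy as np

n_classes, n_examples, n_features = 2, 5, 3
rng = np.random.default_rng(0)

# Old format: a list of per-class 2-D arrays,
# i.e. (# classes) entries of shape (# examples, # features)
old_values = [rng.normal(size=(n_examples, n_features)) for _ in range(n_classes)]

# New format: one 3-D array of shape (# examples, # features, # classes)
new_values = np.stack(old_values, axis=-1)
assert new_values.shape == (n_examples, n_features, n_classes)

# Old-style access to one class's values...
per_class_old = old_values[1]
# ...becomes a slice over the trailing axis in the new format
per_class_new = new_values[:, :, 1]
assert np.array_equal(per_class_old, per_class_new)
```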

@connortann
Collaborator

@imatiach-msft I'm troubled to hear that this change is causing you a huge headache to accommodate. I'm keen to see if we can collectively figure out the best way forward.

The original inconsistency described in #3187 was quite a compelling argument that the shapes of the Explanation object were hard to understand and inconsistent. There's still a huge amount of inconsistent & legacy code in shap, and the documentation is lacking, so I think it can be hard to understand and interpret the various attributes of Explainer and Explanation objects. The issue tracker was (and still is) swamped with bug reports that relate to this kind of inconsistency in shapes, which I suppose fed the motivation to try to standardise it.

As a long-time contributor to the ML interpretability ecosystem, do you have a view of what would be a better resolution to the discussion in #3187? It's theoretically possible to make another breaking change I suppose, if that is really merited.

It's worth emphasising as well that this package is direly short of maintainers, so whilst the intent is always to try to make the package more internally consistent and helpful for downstream users, changes like this do not get as much discussion and review as they perhaps deserve. #3559 has more context. Anything we can do to grow the pool of maintainers would be helpful, especially to include folk involved in downstream packages.

@imatiach-msft
Collaborator

imatiach-msft commented Jan 29, 2025

@connortann I think there are two parts to this:
1.) How can we reduce breaking changes in the future
2.) Is the new format (# examples x # features x # classes) superior to the old one (# classes x # examples x # features)

In regards to the first point, actually shap did this very well several years ago. Scott Lundberg added a new API, __call__, which returned an explanation object instead of changing the old shap_values API and format. It would be better if we could do this in the future if we want to change the output and we don't have a choice.
Another part of this is to define clearly what the explanation format from shap_values is and have tests to validate it. It's one thing to fix one explainer because it's outputting something off, it's quite another to modify every single explainer's output format. That's going to cause major issues for users, since as soon as they install a newer or older version of shap they will need to change a lot of their code, and downstream libraries that handle shap explanations now have to deal with multiple explanation formats based on the shap package version.

In regards to the second point, I still believe the old format was superior to the new one for shap_values and this format makes less sense. However, given that this breaking change was already made, and we want to reduce breaking changes, I'm not sure if doing another breaking change is a good idea. We should just make sure we don't do any more breaking changes like this. In addition, Scott added the new __call__ API on explainers which returns an explanation object which is the new output format. Our library still uses the legacy shap_values API and it's interesting that shap users/contributors are making a lot of changes to it - which seems to indicate that it is still widely used despite being the old API.
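The backwards-compatible pattern being praised here - adding a new API rather than changing the old one - can be sketched in a few lines. This is an illustration of the pattern only, not shap's actual implementation; the classes and numbers are made up:

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    """Rich result object returned by the new-style API."""
    values: list
    base_values: float

class Explainer:
    def shap_values(self, X):
        # Legacy API: keeps returning the plain list format old callers rely on
        return [0.1 * x for x in X]

    def __call__(self, X):
        # New API: wraps the same numbers in a richer object,
        # leaving shap_values() untouched for existing code
        return Explanation(values=self.shap_values(X), base_values=0.5)

explainer = Explainer()
assert explainer.shap_values([1.0, 2.0]) == [0.1, 0.2]   # old callers unaffected
assert explainer([1.0, 2.0]).values == [0.1, 0.2]        # new callers get more
```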

@imatiach-msft
Collaborator

imatiach-msft commented Jan 29, 2025

For more context on the __call__ API change see this PR from 2020:
"Major refactor to support new API (backward compat retained)"
ddcfa16
Note how there was an emphasis to keep backwards compatibility and add this new __call__ API that returns an explanation object, whereas this change completely breaks the shap_values API format that has been there for more than 7 (?) years now

@connortann
Collaborator

It would be great to hear @CloseChoice & @thatlittleboy 's perspective, as I think they are more knowledgeable about the various tradeoffs in this PR. Unfortunately there haven't been any active maintainers recently so I really value getting more input.

We are overdue for a release, so I wonder if there is an opportunity to try to mitigate the headache for downstream packages somewhat. I'm struggling to come up with a workable proposal but this seems like a very important issue so I'd like to help out if I can.

I'm not sure if doing another breaking change is a good idea.

Yeah, I'm very hesitant about this idea too. On the flip side though, it could be worthwhile if a) it makes the package better in the long term, and b) we find a way to mitigate harm to downstream packages.

Maybe there could be a configurable setting, like output_shape: Literal["classfirst", "classlast"], so that users could opt out of the new behaviour? That would add a lot of complexity, and wouldn't work for 0.45 & 0.46 unless this was somehow back-ported as a patch release.

Or alternatively, hypothetically if we switched back to the old-style output format and made a new release, downstream packages could include a dependency pin to exclude the problematic versions. We could retrospectively edit the release notes to highlight the changed behaviour and encourage users to upgrade. Whilst it's not ideal to make two breaking changes in fairly quick succession, do you think it would be better to restore the prior behaviour?
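A hedged sketch of what such an opt-out could look like - note that the output_shape keyword is hypothetical and does not exist in shap; the helper below only illustrates the axis reordering:

```python
from typing import Literal
import numpy as np

def format_shap_values(
    values: np.ndarray,  # new-style array: (# examples, # features, # classes)
    output_shape: Literal["classfirst", "classlast"] = "classlast",
) -> np.ndarray:
    """Hypothetical opt-out: optionally restore the legacy class-first order."""
    if output_shape == "classfirst":
        # (# examples, # features, # classes) -> (# classes, # examples, # features)
        return np.moveaxis(values, -1, 0)
    return values

new_style = np.zeros((5, 3, 2))  # 5 examples, 3 features, 2 classes
assert format_shap_values(new_style).shape == (5, 3, 2)               # default
assert format_shap_values(new_style, "classfirst").shape == (2, 5, 3)  # legacy
```

Keeping "classlast" as the default, as discussed below, would avoid yet another breaking change while giving downstream code an escape hatch.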

@imatiach-msft
Collaborator

imatiach-msft commented Jan 30, 2025

@connortann I like the idea of having the output_shape: Literal["classfirst", "classlast"]. I think my main complaint is that:
1.) The PR title and description are wrong. The scope of the changes implemented here was much bigger than the title and description suggest.
2.) shap should not make breaking changes so easily to every explainer in the library. There need to be tests, verification, and documentation about the expected format, and before making breaking changes like this there needs to be a lot more discussion and agreement, as this affects many notebooks, existing production code, etc.
3.) Ideally, instead of making breaking changes, a new API like the __call__ API should be introduced. Also, I'm not sure why we are making changes to the shap_values method, since it is the legacy way to call explainers and the __call__ API was supposed to be the new preferred way. The fact that we are seeing changes makes me wonder if users actually still prefer the shap_values API over the __call__ API. If we are to make breaking changes to the explanation object from the __call__ API, I think it would be useful to have a version on the explanation object so it will be easier for downstream libraries to maintain code and have different logic based on the explanation object version. Our library was written several years before the __call__ API was added and it still uses the legacy shap_values method, which returns the simple array/list-of-arrays format.

@CloseChoice
Collaborator Author

CloseChoice commented Jan 30, 2025

Totally agree that the title is misleading and we should have better documentation on this. I saw a couple of issues in here that relate to the breaking changes, so a fair number of users are affected. I would still argue that the new format is easier to grasp, but that might just be my implementation bias and doesn't help with the problem at hand.

For future changes I totally agree: we should introduce new APIs instead of breaking old ones, and if that is not possible, either back-port new changes to older versions to give users time to adapt or throw deprecation warnings.

EDIT: in retrospect, adding an output_shape: Literal["classfirst", "classlast"] would still have been a breaking change, but it would probably have kept most of the downstream code working. Would it be an option to add this parameter, but keep "classlast" as the default, to avoid another breaking change? That would make upgrading most convenient for users. I am also willing to do this.

I saw that we changed the default value of KernelExplainer's shap_values function in 0.47.0. And while I agree that shap_values is somehow the legacy interface, this is not clearly communicated in the docs, so IMO we'll need to guarantee consistency between __call__ and shap_values, or make it transparent that we'll freeze the shap_values API, not adding any new keywords but also saving it from breaking. @connortann @imatiach-msft what do you think about that?

@connortann
Collaborator

To address @imatiach-msft 's comments:

The PR title and description is wrong. The scope of changes implemented here were much bigger than in the title and description.

Yes, this is totally fair. I should have picked up on that as a reviewer. Also, I think the release notes aren't clear enough about the nature of the change and they can be improved.

shap should not make breaking changes so easily to every explainer in the library. There needs to be tests, verification and documentation about the expected format and before making breaking changes like this there needs to be a lot more discussion and agreement as this affects many notebooks, existing production code, etc.

Again, fair. I think the existing tests are actually pretty ok, but we're definitely lacking sufficient documentation about the expected outputs. I know @CloseChoice and others are trying to improve that, e.g. on #3939, which will be very helpful. We should prioritise documenting the shapes of the attributes and how to slice them.

In retrospect one mistake was not going through a careful deprecation cycle as we usually do, as tracked on #3507.

Also, I'm not sure why we are making changes to shap_values method since it is the legacy way to call explainers and the call API was supposed to be the new preferred way

On this point, I think it's preferable to have the output shapes consistent between the two methods. It would be quite confusing if the shap_values array had a different shape depending on how it was created.

There is lots of legacy code in shap, which makes it a nightmare to maintain in many ways. For example, as of a year ago half the plots expected numpy arrays, and the other half expected Explanation objects; and the main "summary plot" is an alias for a legacy function that has a newer duplicate implementation, is poorly tested, and is completely missing from the documentation.

So, if we want a hope of being able to fix bugs and even add new features that the community is requesting, I think we absolutely need to be able to deprecate old functionality to make the library a little more consistent and less duplicative. But I agree with your point, we nonetheless should still be extremely cautious about breaking changes.

in retrospect adding a output_shape: Literal["classfirst", "classlast"] would have still be a breaking change but probably kept most of the downstream code working. Would it be an option to add these parameters, but still keep "classlast" as default, to avoid another breaking change?

Yep, agreed about the default. However - thinking it through a bit - would it actually help downstream projects? Supposing hypothetically that shap added & released this new functionality, there would still be several versions without this option, so potentially quite a large range of versions would need to be excluded. Or, downstream packages could add a version constraint to only the latest shap>=1.48 (or whatever the future release is). What would be best for you @imatiach-msft: would you be interested in having this option in a future version, or at this point would it be simplest just to migrate to the new output format?

@imatiach-msft
Collaborator

Would it be an option to add these parameters, but still keep "classlast" as default, to avoid another breaking change?
Not sure about others who might experience these but I am just trying to upgrade to latest now. My hope is that we won't have such breaking changes in the future or they can be more limited. My fix is to reshape the output so that even though shap has breaking changes our library won't, since we have many internal and external customers depending on this middle layer. My hope is that I can try to support both shap before and after these changes but keep our API as is so all of the downstream code that consumes our library won't need to be changed. I would not recommend to add this classlast parameter at this point since it would just complicate the shap source code more and possibly add more issues.

Or, downstream packages could add a version constraint to only the latest shap>=1.48 (or whatever the future release is).
You always want to support as much as possible, so as many users as possible can use the package across their various different python versions and dependencies. However, at some point the spaghetti code becomes too much to handle and this is the only remaining choice left.
The other big problem is that we serialize and deserialize explanations (upload and download explanations and explainers) in various different scenarios, so on one environment you might have a newer version of shap while in another an older version. Just using the version of the shap package can cause issues in these situations; the code has to be written carefully with this in mind.

The main problem for our team is that shap is used everywhere by a ton of customers, 1P and 3P, and with this breaking change there are a ton of unknowns. In addition, I am supposed to be working on new genai projects now. Just trying to support the legacy software that is still used by many customers and then this sort of change just turns the legacy code maintenance into a big project.
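The reshape-based compatibility fix described above could look roughly like this - a sketch of the normalization idea, not the actual middle-layer code; the function name is invented for illustration:

```python
import numpy as np

def to_legacy_format(shap_output):
    """Normalize shap output to the old list-of-2D-arrays format,
    regardless of which shap version produced it (illustrative only)."""
    if isinstance(shap_output, list):
        # Old shap: already a list of (# examples, # features) arrays
        return shap_output
    arr = np.asarray(shap_output)
    if arr.ndim == 3:
        # New shap: (# examples, # features, # classes) -> per-class list
        return [arr[:, :, i] for i in range(arr.shape[-1])]
    # Regression / single-output case: wrap for a uniform interface
    return [arr]

new_style = np.zeros((4, 3, 2))
legacy = to_legacy_format(new_style)
assert len(legacy) == 2 and legacy[0].shape == (4, 3)
```

Pinning the downstream API to one canonical format like this keeps consumers insulated from which shap version is installed.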
