
Conversation

@CloseChoice
Collaborator

@CloseChoice CloseChoice commented Oct 8, 2023

Overview

Closes #3187
Closes #2887

Description of the changes proposed in this pull request:

  • make the shap explanations outputs consistent for the binary feature interaction case for xgboost and lightgbm
  • adapt test to the new output shape

NOTE: This is a breaking change.

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

@CloseChoice CloseChoice changed the title make shap explanations consistent for xgboost and lightgbm BREAKING: make shap explanations consistent for xgboost and lightgbm Oct 8, 2023
@codecov

codecov bot commented Oct 8, 2023

Codecov Report

Attention: Patch coverage is 97.22222%, with 1 line in your changes missing coverage. Please review.

Project coverage is 61.33%. Comparing base (188e010) to head (1ab9da8).

Files Patch % Lines
shap/explainers/_linear.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3318      +/-   ##
==========================================
+ Coverage   60.74%   61.33%   +0.59%     
==========================================
  Files          90       90              
  Lines       12718    12721       +3     
==========================================
+ Hits         7725     7802      +77     
+ Misses       4993     4919      -74     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CloseChoice CloseChoice force-pushed the BREAK-consistent-outputs-xgboost-lightgbm branch from b61a3c6 to 4d31340 Compare October 9, 2023 14:52
@CloseChoice
Collaborator Author

Here are a couple of things that I think could need refactoring:

  • I think it would be good to have some kind of prediction_type which can take the values: binary_classification, multi_class_classification, multi_label_classification, regression, multi_target_regression. With this, we could simplify some code
  • the whole structure of the code, here and here, looks to me like it was built bit by bit, but it could be refactored nicely and some complexity could be removed in the process as well.

@CloseChoice CloseChoice marked this pull request as ready for review October 9, 2023 15:12
@CloseChoice
Collaborator Author

The tests will pass once PR #3325 gets merged

@thatlittleboy thatlittleboy added the BREAKING Indicates that a PR is introducing a breaking change label Oct 14, 2023
@thatlittleboy
Collaborator

I'm supportive of the proposed changes.

We'll just have to make sure the implementation is thorough and the documentation is updated accordingly to educate users on the API changes.

@thatlittleboy thatlittleboy added this to the 0.44.0 milestone Oct 22, 2023
@CloseChoice
Collaborator Author

I'm supportive of the proposed changes.

We'll just have to make sure the implementation is thorough and the documentation is updated accordingly to educate users on the API changes.

Anything else you want to be tested or added in this PR?

@connortann
Collaborator

May I suggest adding to the docstring:

  1. A description of precisely what the shapes will be, depending on the prediction type
  2. A change notice, e.g. "Changed in v0.44.0: the shape of returned shap values is changed from ... to ..."

We can also ensure we put this change notice in the release notes. The PR title and description are currently a little vague as to what exactly has changed in the API.

@connortann connortann modified the milestones: 0.44.0, 0.45.0 Dec 6, 2023
@CloseChoice
Collaborator Author

Connected to #2675. Note to self: check if the output of random forest is consistent with the other 3 outputs.

Collaborator

@connortann connortann left a comment


Some slight formatting adjustments but otherwise LGTM!

@CloseChoice
Collaborator Author

Some slight formatting adjustments but otherwise LGTM!

Merged your suggestions. Thanks once again for the review.

Collaborator

@connortann connortann left a comment


Thanks again for all your work on this. There are a few remaining docs formatting issues, but I can address them in a subsequent PR. Let's get it in!

@connortann connortann merged commit ea3bfc8 into shap:master Mar 7, 2024
@CloseChoice CloseChoice deleted the BREAK-consistent-outputs-xgboost-lightgbm branch March 7, 2024 10:29
@connortann
Collaborator

@CloseChoice I noticed a test failure on master, which might be related. Would you kindly take a look - perhaps we need to loosen the tolerances of a few of the np.testing.assert_allclose calls, and/or ensure the random seed is fixed?

Example failing run:
https://github.com/shap/shap/actions/runs/8186093687/job/22383771181

I'll re-run the failed job to see if it's reproducible...

@CloseChoice
Collaborator Author

@connortann yes, looks like the allowed tolerance is too low. Will add a pull request for this.

From the failed run logs:

Mismatched elements: 1 / 96 (1.04%)
Max absolute difference: 1.53473644e-05
Max relative difference: 4.40842603e-05
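A minimal sketch of the kind of fix being discussed (the array values here are made up to mimic the ~1.5e-5 mismatch in the log above; the tolerances are illustrative, not the ones used in the actual test):

```python
import numpy as np

# Illustrative values only -- the point is the magnitude of the mismatch
# (~1.5e-5) reported in the failing run above.
expected = np.array([0.1, 0.2, 0.3])
actual = expected + np.array([0.0, 1.5e-5, -1.0e-5])

# The default rtol=1e-7 would fail on a mismatch of this size:
# np.testing.assert_allclose(actual, expected)  # would raise AssertionError

# Loosened tolerances absorb the small numerical jitter between runs
np.testing.assert_allclose(actual, expected, rtol=1e-4, atol=1e-5)
```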

@imatiach-msft
Collaborator

The title and description for this PR are completely wrong. It does not just "make the shap explanations outputs consistent for the binary feature interaction case for xgboost and lightgbm". It breaks the output format of every explainer in shap from (# classes x # examples x # features) to (# examples x # features x # classes) and changes the output from a list of 2D np.ndarrays to a 3D np.ndarray.

This is a huge headache for downstream users and libraries that have used the old format for many years now - in just our OSS library this shap update has caused 60 test failures, when we can usually update without any issues.

Also, I don't understand why every explainer needs to be changed instead of just xgboost. The old format seemed much better and was much easier to understand - it was a lot easier to tell if the model is a classifier or a regressor. Indexing into the class to get the shap values for that class is much easier for users than figuring out how to slice the array along the third dimension - often this is the most common operation and what users pass to visualizations. Many other text/vision explainers and other libraries follow this logic as well. The new format is harder to understand and use in my opinion, and only xgboost should have been changed, rather than lightgbm and every shap explainer (kernel, deep tf/pytorch, gradient, linear, tree) in this library.
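To make the scope of the format change concrete, here is a small numpy-only sketch of the two layouts being described (the shapes and values are illustrative, not taken from shap's source):

```python
import numpy as np

n_classes, n_examples, n_features = 2, 5, 3
rng = np.random.default_rng(0)

# Old format: a list of per-class 2-D arrays,
# i.e. (# classes) entries of shape (# examples, # features)
old_values = [rng.normal(size=(n_examples, n_features)) for _ in range(n_classes)]

# New format: one 3-D array of shape (# examples, # features, # classes)
new_values = np.stack(old_values, axis=-1)
assert new_values.shape == (n_examples, n_features, n_classes)

# Old-style access to one class's values...
per_class_old = old_values[1]
# ...becomes a slice over the trailing axis in the new format
per_class_new = new_values[:, :, 1]
assert np.array_equal(per_class_old, per_class_new)
```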

@connortann
Collaborator

@imatiach-msft I'm troubled to hear that this change is causing you a huge headache to accommodate. I'm keen to see if we can collectively figure out the best way forward.

The original inconsistency described in #3187 was quite a compelling argument that the shapes of the Explanation object were hard to understand and inconsistent. There's still a huge amount of inconsistent & legacy code in shap, and the documentation is lacking, so I think it can be hard to understand and interpret the various attributes of Explainer and Explanation objects. The issue tracker was (and still is) swamped with bug reports that relate to this kind of inconsistency in shapes, which I suppose fed the motivation to try to standardise it.

As a long-time contributor to the ML interpretability ecosystem, do you have a view of what would be a better resolution to the discussion in #3187? It's theoretically possible to make another breaking change I suppose, if that is really merited.

It's worth emphasising as well that this package is direly short of maintainers, so whilst the intent is always to try to make the package more internally consistent and helpful for downstream users, changes like this do not get as much discussion and review as they perhaps deserve. #3559 has more context. Anything we can do to grow the pool of maintainers would be helpful, especially to include folk involved in downstream packages.

@imatiach-msft
Collaborator

imatiach-msft commented Jan 29, 2025

@connortann I think there are two parts to this:
1.) How can we reduce breaking changes in the future
2.) Is the new format (# examples x # features x # classes) superior to the old one (# classes x # examples x # features)

In regards to the first point, actually shap did this very well several years ago. Scott Lundberg added a new API, __call__, which returned an explanation object instead of changing the old shap_values API and format. It would be better if we could do this in the future if we want to change the output and we don't have a choice.
Another part of this is to define clearly what the explanation format from shap_values is and have tests to validate it. It's one thing to fix one explainer because it's outputting something off, it's quite another to modify every single explainer's output format. That's going to cause major issues for users, since as soon as they install a newer or older version of shap they will need to change a lot of their code, and downstream libraries that handle shap explanations now have to deal with multiple explanation formats based on the shap package version.

In regards to the second point, I still believe the old format was superior to the new one for shap_values and this format makes less sense. However, given that this breaking change was already made, and we want to reduce breaking changes, I'm not sure if doing another breaking change is a good idea. We should just make sure we don't do any more breaking changes like this. In addition, Scott added the new __call__ API on explainers which returns an explanation object which is the new output format. Our library still uses the legacy shap_values API and it's interesting that shap users/contributors are making a lot of changes to it - which seems to indicate that it is still widely used despite being the old API.
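The backwards-compatible pattern being praised here - adding a new API rather than changing the old one - can be sketched in a few lines. This is an illustration of the pattern only, not shap's actual implementation; the classes and numbers are made up:

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    """Rich result object returned by the new-style API."""
    values: list
    base_values: float

class Explainer:
    def shap_values(self, X):
        # Legacy API: keeps returning the plain list format old callers rely on
        return [0.1 * x for x in X]

    def __call__(self, X):
        # New API: wraps the same numbers in a richer object,
        # leaving shap_values() untouched for existing code
        return Explanation(values=self.shap_values(X), base_values=0.5)

explainer = Explainer()
assert explainer.shap_values([1.0, 2.0]) == [0.1, 0.2]   # old callers unaffected
assert explainer([1.0, 2.0]).values == [0.1, 0.2]        # new callers get more
```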

@imatiach-msft
Collaborator

imatiach-msft commented Jan 29, 2025

For more context on the __call__ API change see this PR from 2020:
"Major refactor to support new API (backward compat retained)"
ddcfa16
Note how there was an emphasis to keep backwards compatibility and add this new __call__ API that returns an explanation object, whereas this change completely breaks the shap_values API format that has been there for more than 7 (?) years now

@connortann
Collaborator

It would be great to hear @CloseChoice & @thatlittleboy 's perspective, as I think they are more knowledgeable about the various tradeoffs in this PR. Unfortunately there haven't been any active maintainers recently so I really value getting more input.

We are overdue for a release, so I wonder if there is an opportunity to try to mitigate the headache for downstream packages somewhat. I'm struggling to come up with a workable proposal but this seems like a very important issue so I'd like to help out if I can.

I'm not sure if doing another breaking change is a good idea.

Yeah, I'm very hesitant about this idea too. On the flip side though, it could be worthwhile if a) it makes the package better in the long term, and b) we find a way to mitigate harm to downstream packages.

Maybe there could be a configurable setting, like output_shape: Literal["classfirst", "classlast"], so that users could opt out of the new behaviour? That would add a lot of complexity, and wouldn't work for 0.45 & 0.46 unless this was somehow back-ported as a patch release.

Or alternatively, hypothetically if we switched back to the old-style output format and made a new release, downstream packages could include a dependency pin to exclude the problematic versions. We could retrospectively edit the release notes to highlight the changed behaviour and encourage users to upgrade. Whilst it's not ideal to make two breaking changes in fairly quick succession, do you think it would be better to restore the prior behaviour?
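A hedged sketch of what such an opt-out could look like - note that the output_shape keyword is hypothetical and does not exist in shap; the helper below only illustrates the axis reordering:

```python
from typing import Literal
import numpy as np

def format_shap_values(
    values: np.ndarray,  # new-style array: (# examples, # features, # classes)
    output_shape: Literal["classfirst", "classlast"] = "classlast",
) -> np.ndarray:
    """Hypothetical opt-out: optionally restore the legacy class-first order."""
    if output_shape == "classfirst":
        # (# examples, # features, # classes) -> (# classes, # examples, # features)
        return np.moveaxis(values, -1, 0)
    return values

new_style = np.zeros((5, 3, 2))  # 5 examples, 3 features, 2 classes
assert format_shap_values(new_style).shape == (5, 3, 2)               # default
assert format_shap_values(new_style, "classfirst").shape == (2, 5, 3)  # legacy
```

Keeping "classlast" as the default, as discussed below, would avoid yet another breaking change while giving downstream code an escape hatch.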

@imatiach-msft
Collaborator

imatiach-msft commented Jan 30, 2025

@connortann I like the idea of having the output_shape: Literal["classfirst", "classlast"]. I think my main complaint is that:
1.) The PR title and description are wrong. The scope of the changes implemented here was much bigger than the title and description suggest.
2.) shap should not make breaking changes so easily to every explainer in the library. There need to be tests, verification, and documentation about the expected format, and before making breaking changes like this there needs to be a lot more discussion and agreement, as this affects many notebooks, existing production code, etc.
3.) Ideally, instead of making breaking changes, a new API like the __call__ API should be introduced. Also, I'm not sure why we are making changes to the shap_values method, since it is the legacy way to call explainers and the __call__ API was supposed to be the new preferred way. The fact that we are seeing changes makes me wonder if users actually still prefer the shap_values API over the __call__ API. If we are to make breaking changes to the explanation object from the __call__ API, I think it would be useful to have a version on the explanation object so it will be easier for downstream libraries to maintain code and have different logic based on the explanation object version. Our library was written several years before the __call__ API was added and it still uses the legacy shap_values method, which returns the simple array/list-of-arrays format.

@CloseChoice
Collaborator Author

CloseChoice commented Jan 30, 2025

Totally agree that the title is misleading and we should have better documentation on this. I saw a couple of issues in here that relate to the breaking changes, so a fair number of users are affected. I would still argue that the new format is easier to grasp, but that might just be my implementation bias and doesn't help with the problem at hand.

For future changes I totally agree: we should introduce new APIs instead of breaking old ones, and if that is not possible, either back-port new changes to older versions to give users time to adapt or throw deprecation warnings.

EDIT: in retrospect, adding an output_shape: Literal["classfirst", "classlast"] would still have been a breaking change, but it would probably have kept most of the downstream code working. Would it be an option to add this parameter, but keep "classlast" as the default, to avoid another breaking change? That would make upgrading most convenient for users. I am also willing to do this.

I saw that we changed the default value of KernelExplainer's shap_values function in 0.47.0. And while I agree that shap_values is somehow the legacy interface, this is not clearly communicated in the docs, so IMO we'll need to guarantee consistency between __call__ and shap_values, or make it transparent that we'll freeze the shap_values API, not adding any new keywords but also saving it from breaking. @connortann @imatiach-msft what do you think about that?

@connortann
Collaborator

To address @imatiach-msft 's comments:

The PR title and description is wrong. The scope of changes implemented here were much bigger than in the title and description.

Yes, this is totally fair. I should have picked up on that as a reviewer. Also, I think the release notes aren't clear enough about the nature of the change and they can be improved.

shap should not make breaking changes so easily to every explainer in the library. There needs to be tests, verification and documentation about the expected format and before making breaking changes like this there needs to be a lot more discussion and agreement as this affects many notebooks, existing production code, etc.

Again, fair. I think the existing tests are actually pretty ok, but we're definitely lacking sufficient documentation about the expected outputs. I know @CloseChoice and others are trying to improve that, e.g. on #3939, which will be very helpful. We should prioritise documenting the shapes of the attributes and how to slice them.

In retrospect one mistake was not going through a careful deprecation cycle as we usually do, as tracked on #3507.

Also, I'm not sure why we are making changes to shap_values method since it is the legacy way to call explainers and the call API was supposed to be the new preferred way

On this point, I think it's preferable to have the output shapes consistent between the two methods. It would be quite confusing if the shap_values array had a different shape depending on how it was created.

There is lots of legacy code in shap, which makes it a nightmare to maintain in many ways. For example, as of a year ago half the plots expected numpy arrays, and the other half expected Explanation objects; and the main "summary plot" is an alias for a legacy function that has a newer duplicate implementation, is poorly tested, and is completely missing from the documentation.

So, if we want a hope of being able to fix bugs and even add new features that the community is requesting, I think we absolutely need to be able to deprecate old functionality to make the library a little more consistent and less duplicative. But I agree with your point, we nonetheless should still be extremely cautious about breaking changes.

in retrospect adding a output_shape: Literal["classfirst", "classlast"] would have still be a breaking change but probably kept most of the downstream code working. Would it be an option to add these parameters, but still keep "classlast" as default, to avoid another breaking change?

Yep, agreed about the default. However - thinking it through a bit - would it actually help downstream projects? Supposing hypothetically that shap added & released this new functionality, there would still be several versions without this option, so potentially quite a large range of versions would need to be excluded. Or, downstream packages could add a version constraint to only the latest shap>=1.48 (or whatever the future release is). What would be best for you @imatiach-msft: would you be interested in having this option in a future version, or at this point would it be simplest just to migrate to the new output format?

@imatiach-msft
Collaborator

Would it be an option to add these parameters, but still keep "classlast" as default, to avoid another breaking change?
Not sure about others who might experience these but I am just trying to upgrade to latest now. My hope is that we won't have such breaking changes in the future or they can be more limited. My fix is to reshape the output so that even though shap has breaking changes our library won't, since we have many internal and external customers depending on this middle layer. My hope is that I can try to support both shap before and after these changes but keep our API as is so all of the downstream code that consumes our library won't need to be changed. I would not recommend to add this classlast parameter at this point since it would just complicate the shap source code more and possibly add more issues.

Or, downstream packages could add a version constraint to only the latest shap>=1.48 (or whatever the future release is).
You always want to support as much as possible, so as many users as possible can use the package across their various different python versions and dependencies. However, at some point the spaghetti code becomes too much to handle and this is the only remaining choice left.
The other big problem is that we serialize and deserialize explanations (upload and download explanations and explainers) in various different scenarios, so on one environment you might have a newer version of shap while in another an older version. Just using the version of the shap package can cause issues in these situations; the code has to be written carefully with this in mind.

The main problem for our team is that shap is used everywhere by a ton of customers, 1P and 3P, and with this breaking change there are a ton of unknowns. In addition, I am supposed to be working on new genai projects now. Just trying to support the legacy software that is still used by many customers and then this sort of change just turns the legacy code maintenance into a big project.
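The reshape-based compatibility fix described above could look roughly like this - a sketch of the normalization idea, not the actual middle-layer code; the function name is invented for illustration:

```python
import numpy as np

def to_legacy_format(shap_output):
    """Normalize shap output to the old list-of-2D-arrays format,
    regardless of which shap version produced it (illustrative only)."""
    if isinstance(shap_output, list):
        # Old shap: already a list of (# examples, # features) arrays
        return shap_output
    arr = np.asarray(shap_output)
    if arr.ndim == 3:
        # New shap: (# examples, # features, # classes) -> per-class list
        return [arr[:, :, i] for i in range(arr.shape[-1])]
    # Regression / single-output case: wrap for a uniform interface
    return [arr]

new_style = np.zeros((4, 3, 2))
legacy = to_legacy_format(new_style)
assert len(legacy) == 2 and legacy[0].shape == (4, 3)
```

Pinning the downstream API to one canonical format like this keeps consumers insulated from which shap version is installed.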
