While working on the review of #19411, @glemaitre found out that the permutation importance ranking for different metrics (accuracy, balanced accuracy, precision, recall) could vary a lot on imbalanced classification problems.
Perhaps more surprisingly, it is possible to train models where some features have a significantly negative importance: permuting those features makes the metric improve significantly! Furthermore, the same features can be ranked most important for the same model under another choice of metric (e.g. balanced accuracy instead of accuracy, or the converse).
This behavior makes it really problematic to rely on permutation importance to inspect models on imbalanced classification problems (which are very common in practice). I think we should really warn our users about this.
It's not that easy to reproduce this phenomenon on minimal reproduction cases, but it seems to happen in practice (for instance by rebalancing Adult Census to a 1/10 positive-to-negative class ratio).
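To make the kind of comparison concrete, here is a minimal sketch of how one might contrast permutation importance rankings under two metrics on a synthetic imbalanced problem. This is illustrative only (a synthetic dataset and an arbitrary classifier, not the Adult Census notebook mentioned above), and it does not guarantee the rankings will actually disagree on any given run:

```python
# Sketch: compare permutation importance rankings under two scoring metrics
# on a synthetic imbalanced classification problem (~1/10 positive ratio).
# Illustrative only; the issue was observed on a rebalanced Adult Census.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,  # imbalanced: ~10% positives
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

rankings = {}
for metric in ["accuracy", "balanced_accuracy"]:
    result = permutation_importance(
        clf, X_test, y_test, scoring=metric, n_repeats=10, random_state=0
    )
    # Rank features from most to least important under this metric;
    # negative importances_mean values mean permuting the feature
    # *improved* the metric.
    rankings[metric] = np.argsort(result.importances_mean)[::-1]

for metric, order in rankings.items():
    print(metric, order)
```

Comparing the two `rankings` arrays (and checking `importances_mean` for negative entries) is what surfaces the inconsistency described above.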
We still need to clean up @glemaitre's notebook to properly demonstrate this problem, but I wanted to open this issue now to avoid forgetting about it.