While working on the review of #19411, @glemaitre found out that the permutation importance ranking for different metrics (accuracy, balanced accuracy, precision, recall) could vary a lot on imbalanced classification problems.
Perhaps more surprisingly, it is possible to train models where some features have a significantly negative importance: permuting those features makes the metric improve significantly! Furthermore, the same features can be ranked most important for the same model under another choice of metric (e.g. balanced accuracy instead of accuracy, or the converse).
This behavior makes it really problematic to rely on permutation importance to inspect models on imbalanced classification problems (which are very common in practice). I think we should really warn our users about this.
It's not that easy to reproduce this phenomenon on minimal reproduction cases, but it seems to happen in practice (for instance by rebalancing Adult Census to a 1/10 positive-to-negative class ratio).
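To make the kind of comparison concrete, here is a minimal sketch of how one might contrast permutation importance rankings under two metrics on a synthetic imbalanced problem. This is illustrative only (a synthetic dataset and an arbitrary classifier, not the Adult Census notebook mentioned above), and it does not guarantee the rankings will actually disagree on any given run:

```python
# Sketch: compare permutation importance rankings under two scoring metrics
# on a synthetic imbalanced classification problem (~1/10 positive ratio).
# Illustrative only; the issue was observed on a rebalanced Adult Census.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,  # imbalanced: ~10% positives
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

rankings = {}
for metric in ["accuracy", "balanced_accuracy"]:
    result = permutation_importance(
        clf, X_test, y_test, scoring=metric, n_repeats=10, random_state=0
    )
    # Rank features from most to least important under this metric;
    # negative importances_mean values mean permuting the feature
    # *improved* the metric.
    rankings[metric] = np.argsort(result.importances_mean)[::-1]

for metric, order in rankings.items():
    print(metric, order)
```

Comparing the two `rankings` arrays (and checking `importances_mean` for negative entries) is what surfaces the inconsistency described above.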
We still need to clean up @glemaitre's notebook to properly demonstrate this problem, but I wanted to open this issue now to avoid forgetting about it.