Releases: mlr-org/xplainfi
# xplainfi v0.2.0: Many things work now

One might even say this is the true 0.1.0.
## User-facing API improvements

### Importance aggregation and confidence intervals
- `$importance()` gains a `ci_method` parameter for variance estimation (#40):
  - `"none"` (default): simple aggregation without confidence intervals
  - `"raw"`: uncorrected variance estimates (informative only; CIs are too narrow)
  - `"nadeau_bengio"`: variance correction by Nadeau & Bengio (2003), as recommended by Molnar et al. (2023)
  - `"quantile"`: empirical quantile-based confidence intervals
  - `"cpi"`: Conditional Predictive Impact for perturbation methods (PFI/CFI/RFI), supporting t-, Wilcoxon-, Fisher-, and binomial tests
- CPI is now properly scoped to `PerturbationImportance` methods only (not available for WVIM/LOCO or SAGE)
- `$importance()` gains a `standardize` parameter to normalize scores to the [-1, 1] range
- `$importance()` and `$scores()` gain a `relation` parameter (default: `"difference"`) to compute importances as the difference or ratio of baseline and post-modification loss
- Aggregation was moved out of `$compute()` to avoid recomputing predictions/refits when changing the aggregation method
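A minimal sketch of the new aggregation workflow: compute once, then re-aggregate with different CI methods without triggering refits. The class name `PFI` and the constructor arguments are assumptions based on the mlr3 conventions this package follows; only `ci_method`, `relation`, and `standardize` are named in these notes.

```r
library(mlr3)
library(mlr3learners)
library(xplainfi)

# Hypothetical setup: PFI on a standard mlr3 task
# (constructor signature is an assumption, not confirmed by the release notes)
pfi = PFI$new(
  task = tsk("mtcars"),
  learner = lrn("regr.ranger"),
  measure = msr("regr.mse")
)
pfi$compute()

# Aggregate with Nadeau & Bengio (2003) variance-corrected CIs ...
pfi$importance(ci_method = "nadeau_bengio")

# ... then switch aggregation without recomputing predictions/refits
pfi$importance(ci_method = "quantile", relation = "ratio", standardize = TRUE)
```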
### Data simulation helpers
- Add focused simulation DGPs for testing importance methods:
  - `sim_dgp_independent()`: baseline with additive, independent effects
  - `sim_dgp_correlated()`: highly correlated features (PFI fails, CFI succeeds)
  - `sim_dgp_mediated()`: mediation structure (total vs. direct effects)
  - `sim_dgp_confounded()`: confounding structure
  - `sim_dgp_interactions()`: interaction effects between features
- Each DGP illustrates specific methodological challenges for importance methods
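As an illustration, one might pair a DGP with the method it is designed to stress. The `n` argument and the exact return type (a task vs. a data frame) are assumptions; consult the function documentation.

```r
library(xplainfi)

# Simulate correlated-feature data, the setting where marginal permutation
# (PFI) is expected to mislead while conditional sampling (CFI) is not.
# (`n` is an assumed argument name.)
task = sim_dgp_correlated(n = 500)
```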
### Observation-wise losses and predictions
- `$obs_loss()` computes observation-wise importance scores when `measure` has a `Measure$obs_loss()` method
- `$predictions` field stores prediction objects for further analysis
### Grouped feature importance
- `PerturbationImportance` and `WVIM` methods support a `groups` parameter for grouped feature importance:
  - Example: `groups = list(effects = c("x1", "x2", "x3"), noise = c("noise1", "noise2"))`
  - In the output, the `feature` column contains group names instead of individual features
  - Allows measuring the importance of feature sets rather than individual features
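Putting the `groups` example from above into context, a grouped PFI might look like the following sketch. The constructor signature and the assumption that `groups` is passed at construction are not confirmed by the release notes; the feature names follow the example given there.

```r
library(mlr3)
library(mlr3learners)
library(xplainfi)

# Sketch: grouped PFI (constructor signature is an assumption).
# `task` is assumed to contain features x1..x3 and noise1, noise2.
pfi = PFI$new(
  task = task,
  learner = lrn("regr.ranger"),
  measure = msr("regr.mse"),
  groups = list(effects = c("x1", "x2", "x3"), noise = c("noise1", "noise2"))
)
pfi$compute()
pfi$importance()  # the `feature` column now holds "effects" and "noise"
```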
## Method-specific improvements

### WVIM (Williamson's Variable Importance Measure)
- Generalizes LOCO (Leave-One-Covariate-Out) and LOCI (Leave-One-Covariate-In)
- Implemented using `mlr3fselect` for cleaner internals
- Parameter renamed: `iters_refit` → `n_repeats` for consistency
### PerturbationImportance (PFI, CFI, RFI)
- Performance improvements:
  - Uses `learner$predict_newdata_fast()` for faster predictions (requires mlr3 >= 1.1.0)
  - Batches permutation iterations internally to reduce `sampler$sample()` calls
  - New `batch_size` parameter to control memory usage with large datasets
- Parallelization support:
  - Parallel execution via `mirai` or `future` backends
  - Set up with `mirai::daemons()` or `future::plan()`
  - Parallelizes across features within each resampling iteration
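Enabling parallelization is then a matter of configuring a backend before computing, e.g. with `mirai` (the `daemons()` calls are standard `mirai` API; the importance object itself is assumed to be set up as elsewhere in these notes):

```r
library(mirai)
library(xplainfi)

daemons(4)  # start 4 background workers

# ... construct a PerturbationImportance method (e.g. PFI) as usual ...
# $compute() now parallelizes across features within each resampling iteration

daemons(0)  # shut the workers down when finished
```

The `future` backend works analogously via `future::plan(multisession)`.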
- Parameter renamed: `iters_perm` → `n_repeats` for consistency
### Feature Samplers
- Breaking changes:
  - Refactored API separates task-based vs. external data sampling (#49):
    - `$sample(feature, row_ids)`: samples from the stored task using row IDs
    - `$sample_newdata(feature, newdata)`: samples from external data
  - Renamed sampler classes for hierarchical consistency:
    - `PermutationSampler` → `MarginalPermutationSampler`
    - `ARFSampler` → `ConditionalARFSampler`
    - `GaussianConditionalSampler` → `ConditionalGaussianSampler`
    - `KNNConditionalSampler` → `ConditionalKNNSampler`
    - `CtreeConditionalSampler` → `ConditionalCtreeSampler`
  - Standardized parameter name: `conditioning_set` for features to condition on
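The two entry points of the refactored sampler API can be sketched as follows; the method names and arguments come from these notes, while the constructor signature and the surrounding objects (`task`, `new_data`) are assumptions.

```r
library(xplainfi)

# Sketch of the refactored sampler API (constructor is an assumption)
sampler = MarginalPermutationSampler$new(task = task)

# Task-based sampling: perturb a feature for a subset of stored rows
perturbed = sampler$sample(feature = "x1", row_ids = 1:100)

# External-data sampling: perturb a feature in data not stored in the task
perturbed_new = sampler$sample_newdata(feature = "x1", newdata = new_data)
```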
- New samplers:
  - `MarginalSampler`: base class for marginal sampling methods
  - `MarginalReferenceSampler`: samples complete rows from reference data (for SAGE)
  - `KnockoffSampler`: knockoff-based sampling (#16 via @mnwright)
    - Convenience wrappers: `KnockoffGaussianSampler`, `KnockoffSequentialSampler`
    - Supports `row_ids`-based sampling
    - `iters` parameter for multiple knockoff iterations
    - Compatible with CFI (not RFI/SAGE)
### SAGE (Shapley Additive Global Importance)
- Bug fix: `ConditionalSAGE` now properly uses conditional sampling (it was accidentally using marginal sampling)
- Performance improvements:
  - Uses `learner$predict_newdata_fast()` for faster predictions
  - `batch_size` parameter controls memory usage for large coalitions
- Convergence tracking (#29, #33):
  - Enable with `early_stopping = TRUE`
  - Stops when the relative standard error falls below `se_threshold` (default: 0.01)
  - Requires at least `min_permutations` permutations (default: 3)
  - Checks convergence every `check_interval` permutations (default: 1)
  - New fields:
    - `$converged`: boolean indicating whether convergence was reached
    - `$n_permutations_used`: actual number of permutations used (may be fewer than requested)
    - `$convergence_history`: per-feature importance and SE over permutations
  - `$plot_convergence()`: visualize convergence curves
  - Convergence is tracked for the first resampling iteration only
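Tying the convergence parameters and fields together, a SAGE run with early stopping might be sketched like this. The parameter and field names are taken from these notes; the constructor signature and the surrounding objects are assumptions.

```r
library(mlr3)
library(mlr3learners)
library(xplainfi)

# Sketch: SAGE with convergence-based early stopping
# (constructor arguments other than the convergence parameters are assumptions)
sage = ConditionalSAGE$new(
  task = tsk("mtcars"),
  learner = lrn("regr.ranger"),
  measure = msr("regr.mse"),
  early_stopping = TRUE,
  se_threshold = 0.01,   # stop once the relative SE drops below this
  min_permutations = 3,  # but only after at least this many permutations
  check_interval = 1     # check convergence after every permutation
)
sage$compute()

sage$converged             # TRUE if the SE threshold was reached
sage$n_permutations_used   # may be fewer than requested
sage$plot_convergence()    # visualize per-feature convergence curves
```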