v1.4.4 ("Blitzen") is a major release, featuring numerous updates and bugfixes
(totaling 400+ commits spread across ~8 months), including
- Updates to `Lrnr_nnls` to support binary outcomes, including support for convexity of the resultant model fit and warnings on prediction quality.
- Changes to `Lrnr_cv_selector` to support improved computation of the CV-risk, averaging the risk strictly across validation/holdout sets.
- Updated `Lrnr_sl` by adding a new private slot `.cv_risk` to store the risk estimates, using this to avoid unnecessary re-computation in the `print` method (the `.cv_risk` slot is populated on the first `print` call, and only ever re-printed thereafter).
- Fixed `Lrnr_screener_importance`'s pairing of (a) covariates returned by the importance function with (b) covariates as they are defined in the task. This issue only arose when discrete covariates were automatically one-hot encoded upon task initiation (i.e., when `colnames(task$X) != task$nodes$covariates`).
- Enhanced functionality in `sl3` task's `add_interactions` method to support interactions that involve factors. This method is most commonly used by `Lrnr_define_interactions`, which is intended for use with another learner (e.g., `Lrnr_glmnet` or `Lrnr_glm`) in a `Pipeline`.
- Modified `Lrnr_gam` formula (if not specified by user) to not use `mgcv`'s default `k=10` degrees of freedom for each smooth `s` term when there are fewer than `k=10` degrees of freedom. This bypasses an `mgcv::gam` error, and tends to be relevant only for small n.
- Incorporated `min_screen` argument in `Lrnr_screener_coefs`, which tries to ensure that at least `min_screen` number of covariates are selected. If this argument is specified and the `learner` argument in `Lrnr_screener_coefs` is a `Lrnr_glmnet`, then `lambda` is increased until `min_screen` number of covariates are selected and a warning is produced. If `min_screen` is specified and the `learner` argument in `Lrnr_screener_coefs` is not a `Lrnr_glmnet`, then it will error.
- Added `formula` parameter and `process_formula` function to the base learner, `Lrnr_base`, whose methods carry over to all other learners. When a `formula` is supplied as a learner parameter, the `process_formula` function constructs a design matrix by supplying the `formula` to `model.matrix`. This implementation allows `formula` to be supplied to all learners, even those without native `formula` support. The `formula` should be an object of class `"formula"`, or a character string that can be coerced to that class.
- Added factory function for performance-based risks for binary outcomes with `ROCR` performance measures, `custom_ROCR_risk`. Supports cutoff-dependent and scalar `ROCR` performance measures. The risk is defined as 1 - performance, and is transformed back to the performance measure in `cv_risk` and `importance` functions. This change prompted the revision of argument names `loss_fun` and `loss_function` to `eval_fun` and `eval_function`, respectively, since the evaluation of predictions relative to the observations can be either a risk or a loss function. This argument name change impacted the following: `Lrnr_solnp`, `Lrnr_optim`, `Lrnr_cv_selector`, `cv_risk`, `importance`, and `CV_Lrnr_sl`.
- Incorporated stratified cross-validation when `folds` are not supplied to the `sl3_Task` and the outcome is a discrete (i.e., binary or categorical) variable.
- Added to the `importance` method the option to evaluate importance over `covariate_groups`, by removing/permuting all covariates in the same group together.
- Added `Lrnr_ga` as another metalearner.
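
To illustrate the `formula` item above, here is a minimal base-R sketch of turning a formula into a design matrix with `model.matrix`. It is not the `process_formula` implementation itself; the toy data and the column names (`y`, `age`, `sex`) are assumptions made up for the example.

```r
# Toy data; the column names are hypothetical.
dat <- data.frame(
  y   = c(1.2, 0.7, 3.1, 2.4),
  age = c(30, 45, 52, 38),
  sex = factor(c("f", "m", "m", "f"))
)

# A formula may be given as a character string and coerced to class "formula".
f <- as.formula("y ~ age * sex")

# model.matrix expands factors and interaction terms into numeric columns,
# which is what lets a learner without native formula support consume them.
X <- model.matrix(f, data = dat)
colnames(X)
```

The resulting matrix has one row per observation and one column per expanded term, so it can be handed to any routine that expects a plain numeric covariate matrix.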
See the NEWS file for complete details.
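
As a rough illustration of the stratified cross-validation item in the list above, the following base-R sketch assigns fold labels within each outcome class so that every fold sees a similar class balance. The helper `stratified_fold_ids` is hypothetical and written only for this example; it is not how `sl3` constructs its folds internally.

```r
# Hypothetical helper: assign V fold ids so that each outcome class is
# spread as evenly as possible across the folds.
stratified_fold_ids <- function(y, V = 5) {
  fold_id <- integer(length(y))
  for (idx in split(seq_along(y), y)) {
    # Shuffle the indices within this class, then deal fold labels round-robin.
    shuffled <- idx[sample.int(length(idx))]
    fold_id[shuffled] <- rep_len(seq_len(V), length(shuffled))
  }
  fold_id
}

set.seed(1)
y <- rep(c(0, 1), times = c(80, 20))  # imbalanced binary outcome
fold_id <- stratified_fold_ids(y, V = 5)

# Cross-tabulate fold membership against the outcome to inspect the balance.
table(fold_id, y)
```

Stratifying matters most for rare outcomes, where unstratified folds can end up with very few (or no) events in a validation set.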