Releases: amices/mice
mice 3.19.0
Major changes
- Added predict_mi() to generate predictions from models fitted on
  multiply imputed datasets. The function pools predictions across
  imputations using Rubin’s rules, and can return point predictions
  or prediction intervals at a specified confidence level.

  Typical workflow:
  - Fit a model separately on each completed dataset.
  - Call predict_mi() with the list of models and the corresponding
    new data (per imputation).
  - Obtain either pooled predictions (pool = TRUE) or per-imputation
    predictions (pool = FALSE).

  This functionality makes it easier to evaluate predictive performance
  on test sets while correctly accounting for imputation uncertainty.
  Contributed: @Fdvanleeuwen, @thomvolker (#720)
- Adds a correction to the Barnard-Rubin degrees of freedom calculation
  that provides more stable results for small samples and zero within-imputation
  variance. Contributed: @frederikfabriciusbjerre (#726)
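
The pooling idea behind both items above can be sketched in base R. This is not the predict_mi() implementation and it does not include the correction from #726. It is a minimal illustration of Rubin's rules with the classic Barnard-Rubin degrees of freedom, applied to fitted values for new cases. The "imputations" are crude random stand-ins for real completed datasets, and all object names (fits, new_x, pooled, and so on) are made up for the example.

```r
set.seed(1)
m <- 5    # number of imputations
n <- 50   # sample size

# One dataset in which the first 10 outcomes are treated as missing
d0 <- data.frame(x = rnorm(n))
d0$y <- 1 + 2 * d0$x + rnorm(n)
miss <- 1:10

# Stand-in for m completed datasets: refill the missing outcomes m times
fits <- lapply(seq_len(m), function(i) {
  d <- d0
  d$y[miss] <- 1 + 2 * d$x[miss] + rnorm(length(miss))
  lm(y ~ x, data = d)
})

# Per-imputation fitted values and squared standard errors for new cases
new_x <- data.frame(x = c(-1, 0, 1))
preds <- lapply(fits, predict, newdata = new_x, se.fit = TRUE)
q <- sapply(preds, function(p) p$fit)       # rows: new cases, cols: imputations
u <- sapply(preds, function(p) p$se.fit^2)

# Rubin's rules
qbar <- rowMeans(q)              # pooled point prediction
ubar <- rowMeans(u)              # within-imputation variance
b    <- apply(q, 1, var)         # between-imputation variance
tv   <- ubar + (1 + 1 / m) * b   # total variance

# Classic Barnard-Rubin degrees of freedom (uncorrected)
dfcom  <- n - 2
lambda <- (1 + 1 / m) * b / tv
df_old <- (m - 1) / lambda^2
df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
df     <- df_old * df_obs / (df_old + df_obs)

# 95% interval for the pooled fitted value
half   <- qt(0.975, df) * sqrt(tv)
pooled <- data.frame(fit = qbar, lwr = qbar - half, upr = qbar + half)
pooled
```

The interval here is for the mean prediction. A prediction interval for a new observation would also need the residual variance, which this sketch leaves out.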
Minor changes
- Adds fallback for lmer objects in pool() without requiring broom.mixed.
  Contributed: @anya-decarlo (#728)
- Explicitly load toenail data from the mice package to avoid lme4 conflict. Contributed: @bbolker (#730)
mice 3.18.0
Major changes
- Fixed a long-standing issue in the internal augment() function that affected ordered factors (#713).

  Previously, augment() would:
  - Convert ordered factors into unordered ones, and
  - Reorder their levels alphabetically, ignoring the user-specified order.

  The old behavior could degrade imputation quality for ordinal outcomes when using the "polr" method, potentially causing model convergence issues or increased noise in imputations.

  The issue did not affect methods for unordered factors ("logreg", "polyreg", "mnar.logreg"), where level order is inconsequential.

  Thanks to @mmansolf for identifying the problem and suggesting a fix. The updated augment() now correctly preserves the ordered class and level order of factor variables.
- mice will now automatically move all passive variables to the end of the visitSequence for passive methods used without a user-specified visitSequence. This change in behavior ensures greater consistency at the end of each iteration.

  The new behavior works well for simple cases. However, for more complex situations, especially when passive variables depend on other passive variables, it is recommended to manually specify a visitSequence that updates each passive variable immediately after one of its right-hand side predictors changes. (#699)
- Adds the calltype argument to mice() for mixing predictorMatrix and formulas specifications per variable-block. The calltype argument allows the user to specify some variables (or blocks of variables) by the formulas argument, and other variables by the predictorMatrix argument. (Note: This argument was called modeltype in version 3.17.1.)

  calltype is a character vector of length(blocks) elements that indicates how the imputation model is specified. Entries can take one of two values: "pred" or "formula". If calltype = "pred", the predictors of the imputation model for the block are specified by the corresponding row of the predictorMatrix. If calltype = "formula", the imputation model is specified by the relevant entry in formulas. The default depends on the presence of the formulas argument. If formulas is present, then mice() sets calltype = "formula" for any block for which a formula is specified. Otherwise, calltype = "pred".
- Introduces an optimized matchindex C++ function to improve speed of predictive mean matching (#695)
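
As an illustration of the visitSequence advice above, the following sketch specifies the sequence by hand for a single passive variable. It assumes the boys data shipped with mice and the usual passive-imputation formula for bmi. The choice to revisit bmi after each of its components is an example, not a prescription from the release notes.

```r
library(mice)

dat <- boys[, c("age", "hgt", "wgt", "bmi")]

# bmi is a passive variable: a deterministic function of hgt and wgt
meth <- make.method(dat)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Keep bmi out of the imputation models of its own components
pred <- make.predictorMatrix(dat)
pred[c("hgt", "wgt"), "bmi"] <- 0

# Update bmi immediately after each right-hand side variable changes
vis <- c("hgt", "bmi", "wgt", "bmi")

imp <- mice(dat, method = meth, predictorMatrix = pred,
            visitSequence = vis, m = 2, maxit = 2, seed = 1,
            printFlag = FALSE)

# Imputed bmi values should agree with the completed hgt and wgt
com <- complete(imp, 1)
idx <- is.na(dat$bmi)
all.equal(com$bmi[idx], (com$wgt / (com$hgt / 100)^2)[idx])
```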
Minor changes
- Updates security dependabot to dawidd6/action-download-artifact@v9
- Allow for negative adjusted R2 in pool.r.squared() (#700)
- Combines and updates tests for lasso.select.norm() and lasso.norm() into one file test-mice.impute.lasso.norm.R
- Combines and updates tests for lasso.select.logreg() and lasso.logreg() into one file test-mice.impute.lasso.logreg.R
- Adds support for roxygen markdown documentation
Bug fixes
mice 3.17.0
Major changes
- Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of mice() for imputing numerical variables, but it has long been possible to impute factors. This enhancement introduces better support for working with categorical variables in PMM. The former system translated factors into integers by ynum <- as.integer(f). However, the order of integers in ynum may have no sensible interpretation for an unordered factor. The new system quantifies ynum and could yield better results because of higher $R^2$. The method calculates the canonical correlation between y (as dummy matrix) and a linear combination of imputation model predictors x. The algorithm then replaces each category of y by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model are extracted to function as the similarity measure for the matching step.

  The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new trim argument. All you have to do to use the new technique is specify mice(..., method = "pmm", ...). Both numerical and categorical variables will then be imputed by PMM.

  Potential advantages are:
  - Simpler and faster than fitting a generalised linear model, e.g., logistic regression or the proportional odds model;
  - Should be insensitive to the order of categories;
  - No need to solve problems with perfect prediction;
  - Should inherit the good statistical properties of predictive mean matching.

  Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
- New system-independent method for pooling: This version introduces a new function pool.table() that takes a tidy table of parameter estimates stemming from m repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The pool.table() function outputs 14 pooled statistics in a tidy form. The primary use of pool.table() is to support parameter pooling for techniques that have no tidy() or glance() methods, either within R or outside R. The pool.table() function also allows for novel workflows that 1) break apart the traditional pool() function into a data-wrangling part and a parameters-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
- literanger: Adds support for the literanger package for rf imputation that is about twice as fast as ranger (#648). Thanks @stephematician for the contribution.
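
A minimal sketch of the categorical PMM usage described above, assuming the nhanes2 data shipped with mice (it contains a complete factor age, an incomplete factor hyp and two incomplete numeric variables). The trim argument is not used here because its exact form is not described in these notes.

```r
library(mice)

# Request PMM for every incomplete variable, factors included
imp <- mice(nhanes2, method = "pmm", m = 2, maxit = 2, seed = 1,
            printFlag = FALSE)

com <- complete(imp, 1)

# The imputed factor keeps its class and levels
str(com$hyp)
table(com$hyp, useNA = "ifany")
```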
Breaking changes
- The complete(..., action = "long", ...) command puts the columns named ".imp" and ".id" in the last two positions of the long data (instead of the first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names ".imp" and ".id". If you want the old behaviour, specify the argument order = "first". (#569). Contributed @stefvanbuuren
- Drops support for S4. Converts S4-related code to S3. The syntax as(df, "mids") is deprecated. Use as.mids(df) instead.
- Adopts the broom convention for naming lower and upper bounds of the confidence interval as "conf.low" and "conf.high". Do not use non-syntactic names anymore, like "2.5 %".
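
The first two breaking changes can be handled as in the sketch below, which assumes the nhanes data shipped with mice. It refers to ".imp" and ".id" by name so that it does not depend on column position, and it converts long data back with as.mids() rather than as().

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Select rows by the name of the bookkeeping column, not by its position
long <- complete(imp, action = "long")
first_imp <- long[long$.imp == 1, ]

# Request the old layout with ".imp" and ".id" as the first two columns
long_old <- complete(imp, action = "long", order = "first")
names(long_old)[1:2]

# Convert long data (including the original data) back to a mids object
long_all <- complete(imp, action = "long", include = TRUE)
imp2 <- as.mids(long_all)
```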
Minor changes
- Adds support for the dots argument to ranger::ranger(...) in mice.impute.rf() (#563). Contributed @edbonneville
- Prepares for the deprecation of the blocks argument at various places
- Removes the need for blocks in initialize_chain()
- In rbind(), when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
- Solves problem with the package documentation link
- Simplifies NEWS.md formatting to get correct version sequence on CRAN and in-package NEWS
- Initialize single-variable blocks in make.method() in a more efficient way (resolves #672)
- Prevent as.mids() from filling the imp object for complete variables
- Defines S3 class constructors for mids, mads, mira and mipo objects
Bug fixes
- Fixes the "large logo" problem. (#574). Contributed @hanneoberman
- Patches a bug in complete() that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of rbind(), where the first set of rows was imputed and the second was not).
- Replaces the internal variable type by the more informative pred (currently active row of predictorMatrix)
- Fixes a bug in filter.mids() that incorrectly removed empty components in the imp object
- Fixes a bug in ibind() that incorrectly used length(blocks) as the first dimension of the chainMean and chainVar objects
- Corrects the description of the visitSequence, chainMean and chainVar components of the mids object
- Fixes problems with zero predictors (#588)
- Fixes a problem with the minpuc argument in quickpred() (#634)
- Fixes "coef() not available on S4 object" when using with lavaan (#615, #616)
- Adds .github/dependabot.yml configuration to automate daily check (#598)
- Update documentation tags to roxygen2 7.3.1 requirements
- Repairs lost braces in the documentation
- Fixes an installation problem when Rprofile prints to stdout on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
- Fixes a bug during initialization of factor values
- Removes methods and rlang from Depends
- Removes export of non-user facing ampute() helpers
- Clears \link statements that do not pass CRAN checks
mice 3.16.0
Major changes
- Expands futuremice() functionality by allowing for external packages and user-written functions (#550). Contributed @thomvolker
- Adds GH issue templates bug_report, feature_request and help_wanted (#560). Contributed @hanneoberman
Minor changes
- Removes documentation files for rbind.mids() and cbind.mids() to conform to CRAN policy
- Adds mitml and glmnet to imports so that test code conforms to the _R_CHECK_DEPENDS_ONLY=true flag in R CMD check
- Initializes random number generator in futuremice() if there is no .Random.seed yet.
- Updates GitHub actions for package checking and site building
- Preserves user settings in predictorMatrix for case F by adding a predictorMatrix argument to make.predictorMatrix()
- Polishes mice.impute.mpmm() example code
Bug fixes
- Adds proper support for factors to mice.impute.2lonly.pmm() (#555)
- Solves function naming problems for S3 generic functions tidy(), update(), format() and sum()
- Comments out and weeds example and test code to silence R CMD check with _R_CHECK_DEPENDS_ONLY=true
- Fixes small bug in futuremice() that throws an error when the number of cores is not specified, but the number of available cores is greater than the number of imputations.
- Solves a bug in mice.impute.mpmm() that changed the column order of the data
mice 3.15.0
Major changes
- Adds a function futuremice() with support for parallel imputation using the future package (#504). Contributed @thomvolker, @gerkovink
- Adds multivariate predictive mean matching mice.impute.mpmm(). (#460). Contributed @Mingyang-Cai
- Adds convergence() for convergence evaluation (#484). Contributed @hanneoberman
- Reverts the internal seed behaviour back to mice 3.13.10 (#515). #432 introduced new local seed in response to #426. However, various issues arose with this facility (#459, #492, #502, #505). This version restores the old behaviour using global .Random.seed. Contributed @gerkovink
- Adds a custom.t argument to pool() that allows the advanced user to specify a custom rule for calculating the total variance $T$. Contributed @gerkovink
- Adds new argument exclude to mice.impute.pmm() that excludes a user-specified vector of values from matching. Excluded values will not appear in the imputations. Since the observed values are not imputed, the user-specified values are still being used to fit the imputation model (#392, #519). Contributed @gerkovink
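
A minimal sketch of parallel imputation with futuremice(), assuming the nhanes data shipped with mice and that the suggested packages future and furrr are installed. The values for m, n.core and parallelseed are arbitrary choices for the example.

```r
library(mice)

# Spread 4 imputations over 2 local R sessions.
# parallelseed makes the parallel run reproducible.
imp <- futuremice(nhanes, m = 4, n.core = 2, parallelseed = 123)

imp$m                    # number of imputations in the combined object
head(complete(imp, 1))   # first completed dataset
```

The result is an ordinary mids object, so with(), pool() and complete() can be applied to it as after a call to mice().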
Minor changes
- Styles all .R and .Rmd files
- Makes post-processing assignment consistent with lines 85/86 in sampler.R (#511)
- Edit test broken on R<4 (#501). Contributed @MichaelChirico
- Adds support for models reporting contrasts rather than terms (#498). Contributed @LukasWallrich
- Applies edits to autocorrelation function (#491). Contributed @hanneoberman
- Changes p-value calculation to more robust alternative (#494). Contributed @AndrewLawrence
- Uses inherits() to check on class membership
- Adds deprecation notices to parlmice()
- Adapt prop, patterns and weights matrices for pattern with only 1's
- Adds warning when patterns cannot be generated (#449, #317, #451)
- Adds warning on the order of model terms in D1() and D2() (#420)
- Adds example code to fit model on train data and apply to test data to mice()
- Adds example code on synthetic data generation and analysis in make.where()
- Adds testfile test-mice.impute.rf.R (#448)
Bug fixes
- Replaces .Random.seed reads from the .GlobalEnv by get(".Random.seed", envir = globalenv(), mode = "integer", inherits = FALSE)
- Repairs capitalisation problems with lastSeedValue variable name
- Solves x$lastSeedValue problem in cbind.mids() (#502)
- Fixes problems with ampute()
- Preserves stochastic nature of mice() by smarter random seed initialisation (#459)
- Repairs a drop = FALSE buglet in mice.impute.rf() (#447, #448)
- @str-amg reported that the new dependency on the withr package should have version 2.4.0 (published in January 2021) or higher. Versions withr 2.3.0 and before may give Error: object 'local_seed' is not exported by 'namespace:withr'. Either update manually, or install the patched version mice 3.14.1 from GitHub. (#445). NOTE: withr is no longer needed in mice 3.15.0
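
The seed lookup named in the first bug fix above can be tried in base R. The sketch below is an illustration of that get() call, not a copy of the mice internals. It reads the global generator state and then restores it to show that the stored value is the full state. The names seed_state, draw1 and draw2 are made up for the example.

```r
# .Random.seed only exists after the generator has been used or seeded
if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  runif(1)
}

# Read the state from the global environment only, insisting on an integer vector
seed_state <- get(".Random.seed", envir = globalenv(),
                  mode = "integer", inherits = FALSE)

draw1 <- runif(3)

# Restoring the saved state reproduces the same draws
assign(".Random.seed", seed_state, envir = globalenv())
draw2 <- runif(3)

identical(draw1, draw2)
```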
mice 3.14.0
Major changes
- Adds four new univariate functions using the lasso for automatic variable selection:

| Function | Description |
|---|---|
| mice.impute.lasso.norm() | Lasso linear regression |
| mice.impute.lasso.logreg() | Lasso logistic regression |
| mice.impute.lasso.select.norm() | Lasso selector + linear regression |
| mice.impute.lasso.select.logreg() | Lasso selector + logistic regression |

  Contributed by @EdoardoCostantini (#438).
- Adds Jamshidian and Jalal's non-parametric MCAR test, mice::MCAR() and associated plot method. Contributed by @cjvanlissa (#423).
- Adds two new functions pool.syn() and pool.scalar.syn() that specialise pooling estimates from synthetic data. The "reiter2003" pooling rule assumes that synthetic data were created from complete data. Thanks Thom Volker (#436).
- Avoids changing the global .Random.seed (#426, #432) by implementing withr::local_preserve_seed() and withr::local_seed(). This change provides more stable behavior in complex scripts. The change does not appear to break reproducibility when mice() was run with a seed. Nevertheless, if you run into a reproducibility problem, install mice 3.13.12 or before.
- Improves the imputation of parabolic data in mice.impute.quadratic(), adds a parameter quad.outcome containing the name of the outcome variable in the complete-data model. Contributed @Mingyang-Cai, @gerkovink (#408)
- By default, mice.impute.rf() now uses the faster ranger package as back-end instead of the randomForest package. If you want the old behaviour, specify the rfPackage = "randomForest" argument to the mice(...) call. Contributed @prockenschaub (#431).
- Generalises pool() so that it processes the parameters from all gamlss sub-models. Thanks Marcio Augusto Diniz (#406, #405)
- Uses the robust standard error estimate for pooling when pool() can extract robust.se from the object returned by broom::tidy() (#310)
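
The back-end switch for random forest imputation mentioned above might be used as follows. The sketch assumes the nhanes data shipped with mice and that both the ranger and randomForest packages are installed. The object names are illustrative.

```r
library(mice)

# Default back end (ranger, according to the note above)
imp_default <- mice(nhanes, method = "rf", m = 2, maxit = 2, seed = 1,
                    printFlag = FALSE)

# Old behaviour: pass rfPackage through the mice() call
imp_old <- mice(nhanes, method = "rf", rfPackage = "randomForest",
                m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Both runs return complete data; the imputed values may differ
colSums(is.na(complete(imp_default, 1)))
colSums(is.na(complete(imp_old, 1)))
```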
Bug fixes
- Contains an emergency solution as install.on.demand() broke the standard CRAN workflow. mice 3.14.0 does not call install.on.demand() anymore for recommended packages. Also, install.on.demand() will not run anymore in non-interactive mode.
- Repairs an error in the mice:::barnard.rubin() function for infinite dfcom. Thanks @huftis (#441).
- Solves problem with Xi <- as.matrix(...) in mice.impute.2l.lmer() that occurred when a cluster contains only one observation (#384)
- Edits the predictorMatrix to a monotone pattern if visitSequence = "monotone" and maxit = 1 (#316)
- Solves a problem with the plot produced by md.pattern() (#318, #323)
- Fixes the intercept in make.formulas() (#305, #324)
- Fixes seed when using newdata in mice.mids() (#313, #325)
- Solves a problem with row names of the where element created in rbind() (#319)
- Solves a bug in mnar imputation routine. Contributed by Margarita Moreno Betancur.
Minor changes
- Replaces URL to jstatsoft with DOI
- Update reference to literature (#442)
- Informs the user that pool() cannot take a mids object (#433)
- Updates documentation for post-processing functionality (#387)
- Adds Rcpp necessities
- Solves a problem with "last resort" initialisation of factors (#410)
- Documents the "flat-line behaviour" of mice.impute.2l.lmer() to indicate a problem in fitting the imputation model (#385)
- Add reprex to test (#326)
- Documents that multivariate imputation methods do not support the post parameter (#326)
mice 3.13.0
Major changes
- Updated mids2spss() replaces the foreign package by the haven package. Contributed Gerko Vink (#291)
Minor changes
mice 3.12.0
Much faster predictive mean matching
- The new matchindex C function makes predictive mean matching 50 to 600 times faster.
  The speed of pmm is now on par with normal imputation (mice.impute.norm())
  and with the miceFast package, without compromising on the statistical quality of
  the imputations. Thanks to Polkas (Polkas/miceFast#10) and
  suggestions by Alexander Robitzsch. See #236 for more details.
New ignore argument to mice
- New ignore argument to mice(). This argument is a logical vector
  of nrow(data) elements indicating which rows are ignored when creating
  the imputation model. We may use the ignore argument to split the data
  into a training set (on which the imputation model is built) and a test
  set (that does not influence the imputation model estimates). The argument
  is based on the suggestion in
  #32 (comment). See #32 for
  more background and techniques. Crafted by Patrick Rockenschaub
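
A minimal sketch of the train/test use of ignore described above, assuming the nhanes data shipped with mice (25 rows). The split into the first 20 and last 5 rows is arbitrary.

```r
library(mice)

# TRUE marks rows that must not influence the imputation model (test set)
ignore <- rep(c(FALSE, TRUE), times = c(20, 5))

imp <- mice(nhanes, ignore = ignore, m = 2, maxit = 2, seed = 1,
            printFlag = FALSE)

# Ignored rows are still imputed, using the model built on the other rows
com <- complete(imp, 1)
com[ignore, ]
```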
New filter() function for mids objects
- New filter() method that subsets a mids object (multiply-imputed data set).
  The method accepts a logical vector of length nrow(data), or an expression
  to construct such a vector from the incomplete data. (#269).
  Crafted by Patrick Rockenschaub.
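
Both forms of input to filter() can be sketched as follows, assuming the nhanes data shipped with mice. The variable age is complete in nhanes, so the expression form yields no missing values in the selection vector.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Logical vector of length nrow(data): keep the first 13 rows
imp_head <- filter(imp, c(rep(TRUE, 13), rep(FALSE, 12)))

# Expression evaluated on the incomplete data
imp_older <- filter(imp, age >= 2)

nrow(imp_head$data)
nrow(imp_older$data)
```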
Changes affecting reproducibility
- Breaking change: The matcher algorithm in pmm has changed to matchindex
  for speed improvements. If you want the old behavior, specify mice(..., use.matcher = TRUE).
Minor changes
- Corrected installation problem related to cpp11 package (#286)
- Simplifies with.mids() by calling eval_tidy() on a quosure. Does not yet solve #265.
- Improve documentation for pool() and pool.scalar() (#142, #106, #190 and others)
- Makes tidy.mipo more flexible (#276)
- Solves a problem if nelsonaalen() gets a tibble (#272)
- Add explanation of how NAs can appear in the imputed data (#267)
- Add warning to quickpred() documentation (#268)
- Styles all source files with styler
- Improves consistency in code and documentation
- Moves internally defined functions to global namespace
- Solves bug in internal sum.scores()
- Adds deprecated messages to lm.mids(), glm.mids(), pool.compare()
- Removes expandcov()
- Strips out all return() calls placed just before end-of-function
- Remove all trailing spaces
- Repairs a bug in the routine for finding the printFlag value (#258)
- Update URLs after transfer to organisation amices
mice 3.11.0
Major changes
- The Cox model does not return df.residual, which caused problematic behavior in D1(), D2(), D3(), anova() and pool(). mice now extracts the relevant information from other parts of the objects returned by survival::coxph(), which solves long-standing issues with the integration of the Cox model (#246).
- Adds missing Rcpp dependency to work with tidyr 1.1.1 (#248).
Minor changes
- Addresses warnings: Non-file package-anchored link(s) in documentation object.
- Updates on ampute documentation (#251).
- Ask user permission before installing a package from suggests.