Resampling strategies are usually used to assess the performance of a learning algorithm: The entire data set is (repeatedly) split into training sets \(D^{*b}\) and test sets \(D \setminus D^{*b}\), \(b = 1,\ldots,B\). The learner is trained on each training set, predictions are made on the corresponding test set (sometimes on the training set as well), and the performance measure \(S(D^{*b}, D \setminus D^{*b})\) is calculated. Then the \(B\) individual performance values are aggregated, most often by calculating the mean. There are various resampling strategies, for example cross-validation and bootstrap, to mention just two popular approaches.

Resampling Figure
If you want to read up on further details, the paper Resampling Strategies for Model Assessment and Selection by Simon is probably not a bad choice. Bernd has also published a paper Resampling methods for meta-model validation with recommendations for evolutionary computation which contains detailed descriptions and lots of statistical background information on resampling methods.
In mlr the resampling strategy can be defined via function makeResampleDesc(). It requires a string that specifies the resampling method and, depending on the selected strategy, further information like the number of iterations. The supported resampling strategies are:
"CV"),"LOO"),"RepCV"),"Bootstrap"),"Subsample"),"Holdout").For example if you want to use 3-fold cross-validation type:
### 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)
rdesc
## Resample description: cross-validation with 3 iterations.
## Predict: test
## Stratification: FALSEFor holdout estimation use:
### Holdout estimation
rdesc = makeResampleDesc("Holdout")
rdesc
## Resample description: holdout with 0.67 split rate.
## Predict: test
## Stratification: FALSEIn order to save you some typing mlr contains some pre-defined resample descriptions for very common strategies like holdout (hout (makeResampleDesc())) as well as cross-validation with different numbers of folds (e.g., cv5 (makeResampleDesc()) or cv10 (makeResampleDesc())).
Function resample() evaluates a Learner (makeLearner()) on a given machine learning Task() using the selected resampling strategy (makeResampleDesc()).
As a first example, the performance of linear regression (stats::lm()) on the BostonHousing (mlbench::BostonHousing()) data set is calculated using 3-fold cross-validation.
Generally, for \(K\)-fold cross-validation the data set \(D\) is partitioned into \(K\) subsets of (approximately) equal size. In the \(b\)-th of the \(K\) iterations, the \(b\)-th subset is used for testing, while the union of the remaining parts forms the training set.
As usual, you can either pass a Learner (makeLearner()) object to resample() or, as done here, provide the class name "regr.lm" of the learner. Since no performance measure is specified the default for regression learners (mean squared error, mse) is calculated.
### Specify the resampling strategy (3-fold cross-validation)
rdesc = makeResampleDesc("CV", iters = 3)
### Calculate the performance
r = resample("regr.lm", bh.task, rdesc)
## Resampling: cross-validation
## Measures: mse
## [Resample] iter 1: 21.1066581
## [Resample] iter 2: 24.9667799
## [Resample] iter 3: 25.1828780
##
## Aggregated Result: mse.test.mean=23.7521054
##
r
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.test.mean=23.7521054
## Runtime: 0.0888922The result r is an object of class resample() result. It contains performance results for the learner and some additional information like the runtime, predicted values, and optionally the models fitted in single resampling iterations.
### Peak into r
names(r)
## [1] "learner.id" "task.id" "task.desc" "measures.train"
## [5] "measures.test" "aggr" "pred" "models"
## [9] "err.msgs" "err.dumps" "extract" "runtime"
r$aggr
## mse.test.mean
## 23.75211
r$measures.test
## iter mse
## 1 1 21.10666
## 2 2 24.96678
## 3 3 25.18288r$measures.test gives the performance on each of the 3 test data sets. r$aggr shows the aggregated performance value. Its name "mse.test.mean" indicates the performance measure, mse, and the method, test.mean (aggregations()), used to aggregate the 3 individual performances. test.mean (aggregations()) is the default aggregation scheme for most performance measures and, as the name implies, takes the mean over the performances on the test data sets.
Resampling in mlr works the same way for all types of learning problems and learners. Below is a classification example where a classification tree (rpart) (rpart::rpart()) is evaluated on the Sonar (mlbench::sonar()) data set by subsampling with 5 iterations.
In each subsampling iteration the data set \(D\) is randomly partitioned into a training and a test set according to a given percentage, e.g., 2/3 training and 1/3 test set. If there is just one iteration, the strategy is commonly called holdout or test sample estimation.
You can calculate several measures at once by passing a list of Measures (makeMeasure())s to resample(). Below, the error rate (mmce), false positive and false negative rates (fpr, fnr), and the time it takes to train the learner (timetrain) are estimated by subsampling with 5 iterations.
### Subsampling with 5 iterations and default split ratio 2/3
rdesc = makeResampleDesc("Subsample", iters = 5)
### Subsampling with 5 iterations and 4/5 training data
rdesc = makeResampleDesc("Subsample", iters = 5, split = 4/5)
### Classification tree with information splitting criterion
lrn = makeLearner("classif.rpart", parms = list(split = "information"))
### Calculate the performance measures
r = resample(lrn, sonar.task, rdesc, measures = list(mmce, fpr, fnr, timetrain))
## Resampling: subsampling
## Measures: mmce fpr fnr timetrain
## [Resample] iter 1: 0.3333333 0.3750000 0.3076923 0.0340000
## [Resample] iter 2: 0.2142857 0.3157895 0.1304348 0.0280000
## [Resample] iter 3: 0.2857143 0.5000000 0.1250000 0.0170000
## [Resample] iter 4: 0.2619048 0.1666667 0.3333333 0.0230000
## [Resample] iter 5: 0.3095238 0.2500000 0.3636364 0.0190000
##
## Aggregated Result: mmce.test.mean=0.2809524,fpr.test.mean=0.3214912,fnr.test.mean=0.2520194,timetrain.test.mean=0.0242000
##
r
## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2809524,fpr.test.mean=0.3214912,fnr.test.mean=0.2520194,timetrain.test.mean=0.0242000
## Runtime: 0.222119If you want to add further measures afterwards, use addRRMeasure().
### Add balanced error rate (ber) and time used to predict
addRRMeasure(r, list(ber, timepredict))
## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2809524,fpr.test.mean=0.3214912,fnr.test.mean=0.2520194,timetrain.test.mean=0.0242000,ber.test.mean=0.2867553,timepredict.test.mean=0.0074000
## Runtime: 0.222119By default, resample() prints progress messages and intermediate results. You can turn this off by setting show.info = FALSE, as done in the code chunk below. (If you are interested in suppressing these messages permanently have a look at the tutorial page about configuring mlr.)
In the above example, the Learner (makeLearner()) was explicitly constructed. For convenience you can also specify the learner as a string and pass any learner parameters via the ... argument of resample().
r = resample("classif.rpart", parms = list(split = "information"), sonar.task, rdesc,
measures = list(mmce, fpr, fnr, timetrain), show.info = FALSE)
r
## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2809524,fpr.test.mean=0.3012637,fnr.test.mean=0.2591595,timetrain.test.mean=0.0200000
## Runtime: 0.183523Apart from the learner performance you can extract further information from the resample results, for example predicted values or the models fitted in individual resample iterations.
Per default, the resample() result contains the predictions made during the resampling. If you do not want to keep them, e.g., in order to conserve memory, set keep.pred = FALSE when calling resample().
The predictions are stored in slot $pred of the resampling result, which can also be accessed by function getRRPredictions().
r$pred
## Resampled Prediction for:
## Resample description: subsampling with 5 iterations and 0.80 split rate.
## Predict: test
## Stratification: FALSE
## predict.type: response
## threshold:
## time (mean): 0.01
## id truth response iter set
## 1 104 M M 1 test
## 2 129 M R 1 test
## 3 185 M M 1 test
## 4 91 R R 1 test
## 5 156 M R 1 test
## 6 166 M R 1 test
## ... (#rows: 210, #cols: 5)
pred = getRRPredictions(r)
pred
## Resampled Prediction for:
## Resample description: subsampling with 5 iterations and 0.80 split rate.
## Predict: test
## Stratification: FALSE
## predict.type: response
## threshold:
## time (mean): 0.01
## id truth response iter set
## 1 104 M M 1 test
## 2 129 M R 1 test
## 3 185 M M 1 test
## 4 91 R R 1 test
## 5 156 M R 1 test
## 6 166 M R 1 test
## ... (#rows: 210, #cols: 5)pred is an object of class resample() Prediction. Just as a Prediction() object (see the tutorial page on making predictions it has an element $data which is a data.frame that contains the predictions and in the case of a supervised learning problem the true values of the target variable(s). You can use as.data.frame (Prediction() to directly access the $data slot. Moreover, all getter functions for Prediction() objects like getPredictionResponse() or getPredictionProbabilities() are applicable.
head(as.data.frame(pred))
## id truth response iter set
## 1 104 M M 1 test
## 2 129 M R 1 test
## 3 185 M M 1 test
## 4 91 R R 1 test
## 5 156 M R 1 test
## 6 166 M R 1 test
head(getPredictionTruth(pred))
## [1] M M M R M M
## Levels: M R
head(getPredictionResponse(pred))
## [1] M R M R R R
## Levels: M RThe columns iter and set in the data.frame indicate the resampling iteration and the data set (train or test) for which the prediction was made.
By default, predictions are made for the test sets only. If predictions for the training set are required, set predict = "train" (for predictions on the train set only) or predict = "both" (for predictions on both train and test sets) in makeResampleDesc(). In any case, this is necessary for some bootstrap methods (b632 and b632+) and some examples are shown later on.
Below, we use simple Holdout, i.e., split the data once into a training and test set, as resampling strategy and make predictions on both sets.
### Make predictions on both training and test sets
rdesc = makeResampleDesc("Holdout", predict = "both")
r = resample("classif.lda", iris.task, rdesc, show.info = FALSE)
r
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000
## Runtime: 0.0277021
r$measures.train
## iter mmce
## 1 1 0.02(Please note that nonetheless the misclassification rate r$aggr is estimated on the test data only. How to calculate performance measures on the training sets is shown below.)
A second function to extract predictions from resample results is getRRPredictionList() which returns a list of predictions split by data set (train/test) and resampling iteration.
predList = getRRPredictionList(r)
predList
## $train
## $train$`1`
## Prediction: 100 observations
## predict.type: response
## threshold:
## time: 0.00
## id truth response
## 52 52 versicolor versicolor
## 147 147 virginica virginica
## 62 62 versicolor versicolor
## 39 39 setosa setosa
## 4 4 setosa setosa
## 105 105 virginica virginica
## ... (#rows: 100, #cols: 3)
##
##
## $test
## $test$`1`
## Prediction: 50 observations
## predict.type: response
## threshold:
## time: 0.00
## id truth response
## 101 101 virginica virginica
## 54 54 versicolor versicolor
## 120 120 virginica virginica
## 71 71 versicolor virginica
## 32 32 setosa setosa
## 69 69 versicolor versicolor
## ... (#rows: 50, #cols: 3)In each resampling iteration a Learner (makeLearner()) is fitted on the respective training set. By default, the resulting WrappedModel (makeWrappedModel())s are not included in the resample() result and slot $models is empty. In order to keep them, set models = TRUE when calling resample(), as in the following survival analysis example.
### 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("surv.coxph", lung.task, rdesc, show.info = FALSE, models = TRUE)
r$models
## [[1]]
## Model for learner.id=surv.coxph; learner.class=surv.coxph
## Trained on: task.id = lung-example; obs = 111; features = 8
## Hyperparameters:
##
## [[2]]
## Model for learner.id=surv.coxph; learner.class=surv.coxph
## Trained on: task.id = lung-example; obs = 112; features = 8
## Hyperparameters:
##
## [[3]]
## Model for learner.id=surv.coxph; learner.class=surv.coxph
## Trained on: task.id = lung-example; obs = 111; features = 8
## Hyperparameters:Keeping complete fitted models can be memory-intensive if these objects are large or the number of resampling iterations is high. Alternatively, you can use the extract argument of resample() to retain only the information you need. To this end you need to pass a function to extract which is applied to each WrappedModel (makeWrappedModel()) object fitted in each resampling iteration.
Below, we cluster the datasets::mtcars() data using the \(k\)-means algorithm with \(k = 3\) and keep only the cluster centers.
### 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)
### Extract the compute cluster centers
r = resample("cluster.kmeans", mtcars.task, rdesc, show.info = FALSE,
centers = 3, extract = function(x) getLearnerModel(x)$centers)
##
## This is package 'modeest' written by P. PONCET.
## For a complete list of functions, use 'library(help = "modeest")' or 'help.start()'.
r$extract
## [[1]]
## mpg cyl disp hp drat wt qsec vs
## 1 15.57143 8 357.2571 200.5714 3.342857 3.889143 16.64286 0.0000000
## 2 19.53333 6 187.2000 124.3333 3.533333 3.157500 18.13667 0.6666667
## 3 27.71250 4 98.4250 78.7500 4.072500 2.191625 19.03750 1.0000000
## am gear carb
## 1 0.1428571 3.285714 3.000000
## 2 0.3333333 3.833333 3.333333
## 3 0.7500000 4.000000 1.500000
##
## [[2]]
## mpg cyl disp hp drat wt qsec vs am
## 1 13.91667 8.0 379.00 254.0 3.476667 4.124167 15.95833 0.0 0.3333333
## 2 25.67000 4.4 117.19 87.1 4.057000 2.437800 18.73200 0.8 0.7000000
## 3 17.50000 7.2 270.52 145.0 2.948000 3.541000 18.42600 0.4 0.0000000
## gear carb
## 1 3.666667 4.666667
## 2 4.200000 2.100000
## 3 3.000000 2.000000
##
## [[3]]
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 22.790 4.8 131.39 106.800 3.9570 2.639500 18.46000 0.6 0.700 4.10 2.700
## 2 13.675 8.0 443.00 206.250 3.0600 4.966000 17.56750 0.0 0.000 3.00 3.500
## 3 15.950 8.0 308.80 199.375 3.1275 3.639375 16.82875 0.0 0.125 3.25 3.375As a second example, we extract the variable importances from fitted regression trees using function getFeatureImportance(). (For more detailed information on this topic see the feature selection page.)
### Extract the variable importance in a regression tree
r = resample("regr.rpart", bh.task, rdesc, show.info = FALSE, extract = getFeatureImportance)
r$extract
## [[1]]
## FeatureImportance:
## Task: BostonHousing-example
##
## Learner: regr.rpart
## Measure: NA
## Contrast: NA
## Aggregation: function (x) x
## Replace: NA
## Number of Monte-Carlo iterations: NA
## Local: FALSE
## crim zn indus chas nox rm age dis
## 1 4534.211 2261.008 3313.323 0 1079.433 8791.118 3719.244 2884.793
## rad tax ptratio b lstat
## 1 247.3488 2695.16 852.504 0 18792.25
##
## [[2]]
## FeatureImportance:
## Task: BostonHousing-example
##
## Learner: regr.rpart
## Measure: NA
## Contrast: NA
## Aggregation: function (x) x
## Replace: NA
## Number of Monte-Carlo iterations: NA
## Local: FALSE
## crim zn indus chas nox rm age dis rad
## 1 1779.241 4271.421 4448.099 0 5268.319 10620.09 4701.545 1947.569 0
## tax ptratio b lstat
## 1 1996.346 1274.499 0 16694.43
##
## [[3]]
## FeatureImportance:
## Task: BostonHousing-example
##
## Learner: regr.rpart
## Measure: NA
## Contrast: NA
## Aggregation: function (x) x
## Replace: NA
## Number of Monte-Carlo iterations: NA
## Local: FALSE
## crim zn indus chas nox rm age dis
## 1 1455.346 1706.152 5162.051 23.24497 5451.008 17749.73 4760.479 4108.835
## rad tax ptratio b lstat
## 1 54.8259 3014.486 3676.516 1177.974 12864.25There is also an convenience function getResamplingIndices() to extract the resampling indices from the ResampleResult object:
getResamplingIndices(r)
## $train.inds
## $train.inds[[1]]
## [1] 166 5 21 357 168 101 499 149 98 488 399 172 53 261 271 236 445
## [18] 482 76 338 430 315 371 132 73 225 425 150 413 118 52 418 140 324
## [35] 196 162 40 119 266 275 102 301 438 48 217 95 484 14 270 257 389
## [52] 85 175 248 395 272 146 57 403 130 339 34 247 197 318 309 408 228
## [69] 433 327 59 195 329 350 131 401 363 390 473 464 452 311 64 209 453
## [86] 254 432 68 343 300 106 330 100 113 374 290 375 177 109 328 167 229
## [103] 124 306 434 133 189 221 250 426 265 479 485 450 83 368 331 333 67
## [120] 8 471 169 107 114 31 19 313 435 122 188 104 2 89 292 231 274
## [137] 222 320 361 336 367 465 440 443 123 494 62 81 191 285 240 481 156
## [154] 314 65 28 478 116 54 241 410 249 284 203 121 276 164 280 421 47
## [171] 467 223 204 129 190 211 321 446 125 365 165 96 230 110 303 504 183
## [188] 411 91 173 293 423 46 184 459 154 253 470 457 27 373 233 4 500
## [205] 39 71 108 444 179 356 486 245 111 477 210 220 199 80 442 404 120
## [222] 201 409 200 346 90 308 370 439 469 455 349 366 163 436 502 143 414
## [239] 87 354 66 364 391 463 260 178 506 72 291 36 235 310 406 428 462
## [256] 393 70 402 148 32 251 29 77 205 503 429 348 382 182 213 342 353
## [273] 322 186 491 37 212 267 369 394 416 383 424 224 234 115 312 3 295
## [290] 427 239 58 97 419 171 160 139 490 307 378 475 11 468 396 45 460
## [307] 337 340 386 244 82 385 358 360 487 326 50 495 16 232 105 316 137
## [324] 43 281 345 282 215 207 287 30 158 351 256 392 26 493
##
## $train.inds[[2]]
## [1] 243 166 384 18 149 405 98 447 388 294 172 417 53 103 449 286 236
## [18] 445 76 338 132 73 341 298 206 372 118 52 324 162 40 119 474 275
## [35] 102 301 438 48 217 95 484 302 237 270 389 175 395 93 198 33 461
## [52] 403 176 34 247 318 309 433 202 195 56 151 44 159 363 390 464 25
## [69] 9 311 64 283 299 453 273 432 68 343 330 100 49 185 374 290 412
## [86] 458 23 498 441 375 258 109 334 420 407 229 13 124 492 434 133 288
## [103] 221 250 426 265 325 335 485 141 450 83 368 219 142 333 169 114 355
## [120] 255 17 134 435 122 104 2 347 157 89 92 216 263 476 292 231 274
## [137] 144 222 361 367 332 352 440 181 128 123 494 472 81 240 505 454 218
## [154] 381 156 65 437 28 478 1 6 249 497 284 86 466 60 276 359 75
## [171] 79 55 61 47 467 277 129 193 321 22 365 165 96 230 504 183 226
## [188] 41 411 91 51 246 293 423 127 459 154 38 500 39 227 63 501 187
## [205] 489 279 136 24 356 486 170 15 480 238 483 180 145 380 138 111 477
## [222] 220 112 199 80 323 442 404 120 297 469 455 278 349 264 366 163 135
## [239] 78 289 66 451 364 391 362 456 260 94 72 36 235 35 310 406 428
## [256] 393 70 7 402 148 32 251 77 205 503 12 429 400 174 213 353 491
## [273] 397 208 214 496 267 369 317 431 20 379 10 224 115 387 377 305 312
## [290] 3 448 344 58 171 304 161 88 69 84 396 422 45 460 252 340 269
## [307] 386 296 74 153 358 50 495 16 147 194 155 117 99 262 259 376 398
## [324] 126 282 415 319 287 158 192 256 392 242 26 42 268 152
##
## $train.inds[[3]]
## [1] 243 5 21 357 168 384 18 101 499 405 447 488 388 399 294 417 103
## [18] 449 286 261 271 482 430 315 371 341 298 225 425 150 206 413 372 418
## [35] 140 196 474 266 302 14 237 257 85 248 93 272 146 57 198 33 461
## [52] 130 339 176 197 408 228 327 59 202 56 151 44 329 350 131 159 401
## [69] 473 25 9 452 283 209 299 254 273 300 106 113 49 185 412 458 23
## [86] 498 441 258 177 334 328 420 407 167 13 492 306 288 189 325 335 479
## [103] 141 331 219 142 67 8 471 107 31 19 355 255 17 134 313 188 347
## [120] 157 92 216 263 476 144 320 336 465 332 352 181 443 128 62 472 191
## [137] 285 505 454 218 381 481 314 437 116 1 6 54 241 410 497 86 203
## [154] 466 60 121 359 75 79 164 55 280 61 421 223 204 277 190 211 193
## [171] 22 446 125 110 303 226 41 51 173 246 46 184 127 253 470 38 457
## [188] 27 373 233 4 227 63 501 187 489 71 279 136 108 444 24 179 170
## [205] 245 15 480 238 483 180 145 380 138 210 112 323 201 409 200 346 90
## [222] 308 370 297 439 278 264 436 135 78 289 502 143 414 87 354 451 362
## [239] 463 456 94 178 506 291 35 462 7 29 12 348 400 382 182 174 342
## [256] 322 186 397 37 212 208 214 496 394 416 317 383 431 20 379 10 424
## [273] 234 387 377 305 295 427 448 239 344 97 419 304 160 161 139 490 88
## [290] 307 378 475 69 11 84 468 422 337 252 269 244 82 296 74 385 153
## [307] 360 487 326 147 232 194 155 105 316 137 43 117 99 262 281 345 259
## [324] 376 398 126 415 215 319 207 30 351 192 242 42 268 493 152
##
##
## $test.inds
## $test.inds[[1]]
## [1] 1 6 7 9 10 12 13 15 17 18 20 22 23 24 25 33 35
## [18] 38 41 42 44 49 51 55 56 60 61 63 69 74 75 78 79 84
## [35] 86 88 92 93 94 99 103 112 117 126 127 128 134 135 136 138 141
## [52] 142 144 145 147 151 152 153 155 157 159 161 170 174 176 180 181 185
## [69] 187 192 193 194 198 202 206 208 214 216 218 219 226 227 237 238 242
## [86] 243 246 252 255 258 259 262 263 264 268 269 273 277 278 279 283 286
## [103] 288 289 294 296 297 298 299 302 304 305 317 319 323 325 332 334 335
## [120] 341 344 347 352 355 359 362 372 376 377 379 380 381 384 387 388 397
## [137] 398 400 405 407 412 415 417 420 422 431 437 441 447 448 449 451 454
## [154] 456 458 461 466 472 474 476 480 483 489 492 496 497 498 501 505
##
## $test.inds[[2]]
## [1] 4 5 8 11 14 19 21 27 29 30 31 37 43 46 54 57 59
## [18] 62 67 71 82 85 87 90 97 101 105 106 107 108 110 113 116 121
## [35] 125 130 131 137 139 140 143 146 150 160 164 167 168 173 177 178 179
## [52] 182 184 186 188 189 190 191 196 197 200 201 203 204 207 209 210 211
## [69] 212 215 223 225 228 232 233 234 239 241 244 245 248 253 254 257 261
## [86] 266 271 272 280 281 285 291 295 300 303 306 307 308 313 314 315 316
## [103] 320 322 326 327 328 329 331 336 337 339 342 345 346 348 350 351 354
## [120] 357 360 370 371 373 378 382 383 385 394 399 401 408 409 410 413 414
## [137] 416 418 419 421 424 425 427 430 436 439 443 444 446 452 457 462 463
## [154] 465 468 470 471 473 475 479 481 482 487 488 490 493 499 502 506
##
## $test.inds[[3]]
## [1] 2 3 16 26 28 32 34 36 39 40 45 47 48 50 52 53 58
## [18] 64 65 66 68 70 72 73 76 77 80 81 83 89 91 95 96 98
## [35] 100 102 104 109 111 114 115 118 119 120 122 123 124 129 132 133 148
## [52] 149 154 156 158 162 163 165 166 169 171 172 175 183 195 199 205 213
## [69] 217 220 221 222 224 229 230 231 235 236 240 247 249 250 251 256 260
## [86] 265 267 270 274 275 276 282 284 287 290 292 293 301 309 310 311 312
## [103] 318 321 324 330 333 338 340 343 349 353 356 358 361 363 364 365 366
## [120] 367 368 369 374 375 386 389 390 391 392 393 395 396 402 403 404 406
## [137] 411 423 426 428 429 432 433 434 435 438 440 442 445 450 453 455 459
## [154] 460 464 467 469 477 478 484 485 486 491 494 495 500 503 504Stratification with respect to a categorical variable makes sure that all its values are present in each training and test set in approximately the same proportion as in the original data set. Stratification is possible with regard to categorical target variables (and thus for supervised classification and survival analysis) or categorical explanatory variables.
Blocking refers to the situation that subsets of observations belong together and must not be separated during resampling. Hence, for one train/test set pair the entire block is either in the training set or in the test set.
For classification, it is usually desirable to have the same proportion of the classes in all of the partitions of the original data set. This is particularly useful in the case of imbalanced classes and small data sets. Otherwise, it may happen that observations of less frequent classes are missing in some of the training sets which can decrease the performance of the learner, or lead to model crashes. In order to conduct stratified resampling, set stratify = TRUE in makeResampleDesc().
### 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)
r = resample("classif.lda", iris.task, rdesc, show.info = FALSE)
r
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200053
## Runtime: 0.0306385Stratification is also available for survival tasks. Here the stratification balances the censoring rate.
Sometimes it is required to also stratify on the input data, e.g., to ensure that all subgroups are represented in all training and test sets. To stratify on the input columns, specify factor columns of your task data via stratify.cols.
If some observations “belong together” and must not be separated when splitting the data into training and test sets for resampling, you can supply this information via a blocking factor when creating the task.
### 5 blocks containing 30 observations each
task = makeClassifTask(data = iris, target = "Species", blocking = factor(rep(1:5, each = 30)))
task
## Supervised task: iris
## Type: classif
## Target: Species
## Observations: 150
## Features:
## numerics factors ordered functionals
## 4 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: TRUE
## Has coordinates: FALSE
## Classes: 3
## setosa versicolor virginica
## 50 50 50
## Positive class: NAAs already mentioned, you can specify a resampling strategy using function makeResampleDesc().
rdesc = makeResampleDesc("CV", iters = 3)
rdesc
## Resample description: cross-validation with 3 iterations.
## Predict: test
## Stratification: FALSE
str(rdesc)
## List of 4
## $ id : chr "cross-validation"
## $ iters : int 3
## $ predict : chr "test"
## $ stratify: logi FALSE
## - attr(*, "class")= chr [1:2] "CVDesc" "ResampleDesc"
str(makeResampleDesc("Subsample", stratify.cols = "chas"))
## List of 6
## $ split : num 0.667
## $ id : chr "subsampling"
## $ iters : int 30
## $ predict : chr "test"
## $ stratify : logi FALSE
## $ stratify.cols: chr "chas"
## - attr(*, "class")= chr [1:2] "SubsampleDesc" "ResampleDesc"The result rdesc inherits from class ResampleDesc (makeResampleDesc()) (short for resample description) and, in principle, contains all necessary information about the resampling strategy including the number of iterations, the proportion of training and test sets, stratification variables, etc.
Given either the size of the data set at hand or the Task(), function makeResampleInstance() draws the training and test sets according to the ResampleDesc (makeResampleDesc()).
### Create a resample instance based an a task
rin = makeResampleInstance(rdesc, iris.task)
rin
## Resample instance for 150 cases.
## Resample description: cross-validation with 3 iterations.
## Predict: test
## Stratification: FALSE
str(rin)
## List of 5
## $ desc :List of 4
## ..$ id : chr "cross-validation"
## ..$ iters : int 3
## ..$ predict : chr "test"
## ..$ stratify: logi FALSE
## ..- attr(*, "class")= chr [1:2] "CVDesc" "ResampleDesc"
## $ size : int 150
## $ train.inds:List of 3
## ..$ : int [1:100] 139 97 38 116 86 14 63 147 98 12 ...
## ..$ : int [1:100] 116 86 146 66 63 109 84 65 6 48 ...
## ..$ : int [1:100] 139 97 38 146 66 14 109 84 147 98 ...
## $ test.inds :List of 3
## ..$ : int [1:50] 1 4 6 9 21 23 25 26 27 31 ...
## ..$ : int [1:50] 2 11 12 13 14 16 20 24 29 35 ...
## ..$ : int [1:50] 3 5 7 8 10 15 17 18 19 22 ...
## $ group : Factor w/ 0 levels:
## - attr(*, "class")= chr "ResampleInstance"
### Create a resample instance given the size of the data set
rin = makeResampleInstance(rdesc, size = nrow(iris))
str(rin)
## List of 5
## $ desc :List of 4
## ..$ id : chr "cross-validation"
## ..$ iters : int 3
## ..$ predict : chr "test"
## ..$ stratify: logi FALSE
## ..- attr(*, "class")= chr [1:2] "CVDesc" "ResampleDesc"
## $ size : int 150
## $ train.inds:List of 3
## ..$ : int [1:100] 97 74 128 108 131 10 96 30 81 51 ...
## ..$ : int [1:100] 97 122 74 128 113 32 71 10 121 81 ...
## ..$ : int [1:100] 122 108 113 32 71 131 96 121 30 51 ...
## $ test.inds :List of 3
## ..$ : int [1:50] 7 9 13 14 19 21 32 35 38 44 ...
## ..$ : int [1:50] 2 3 5 8 12 15 16 20 23 24 ...
## ..$ : int [1:50] 1 4 6 10 11 17 18 22 25 26 ...
## $ group : Factor w/ 0 levels:
## - attr(*, "class")= chr "ResampleInstance"
### Access the indices of the training observations in iteration 3
rin$train.inds[[3]]
## [1] 122 108 113 32 71 131 96 121 30 51 82 110 91 48 112 93 44
## [18] 95 63 116 118 47 33 53 69 104 14 23 24 19 85 101 77 55
## [35] 105 42 66 12 129 31 146 103 59 80 21 130 49 75 136 2 61
## [52] 35 43 150 5 46 72 8 140 107 139 15 13 106 68 83 16 79
## [69] 34 76 94 87 90 65 7 41 54 9 58 102 20 78 141 126 117
## [86] 56 57 142 135 73 100 125 38 37 145 88 115 133 3 111The result rin inherits from class ResampleInstance (makeResampleInstance()) and contains lists of index vectors for the train and test sets.
If a ResampleDesc (makeResampleDesc()) is passed to resample(), it is instantiated internally. Naturally, it is also possible to pass a ResampleInstance (makeResampleInstance()) directly.
While the separation between resample descriptions, resample instances, and the resample() function itself seems overly complicated, it has several advantages:
rdesc = makeResampleDesc("CV", iters = 3)
rin = makeResampleInstance(rdesc, task = iris.task)
### Calculate the performance of two learners based on the same resample instance
r.lda = resample("classif.lda", iris.task, rin, show.info = FALSE)
r.rpart = resample("classif.rpart", iris.task, rin, show.info = FALSE)
r.lda$aggr
## mmce.test.mean
## 0.02666667
r.rpart$aggr
## mmce.test.mean
## 0.06666667ResampleDesc (makeResampleDesc()) and ResampleInstance (makeResampleInstance()) classes, but you do neither have to touch resample() nor any further methods that use the resampling strategy.Usually, when calling makeResampleInstance() the train and test index sets are drawn randomly. Mainly for holdout (test sample) estimation you might want full control about the training and tests set and specify them manually. This can be done using function makeFixedHoldoutInstance().
In each resampling iteration \(b = 1,\ldots,B\) we get performance values \(S(D^{*b}, D \setminus D^{*b})\) (for each measure we wish to calculate), which are then aggregated to an overall performance.
For the great majority of common resampling strategies (like holdout, cross-validation, subsampling) performance values are calculated on the test data sets only and for most measures aggregated by taking the mean (test.mean(aggregations())).
Each performance Measure (makeMeasure()) in mlr has a corresponding default aggregation method which is stored in slot $aggr. The default aggregation for most measures is test.mean(aggregations()). One exception is the root mean square error (rmse).
### Mean misclassification error
mmce$aggr
## Aggregation function: test.mean
mmce$aggr$fun
## function(task, perf.test, perf.train, measure, group, pred) mean(perf.test)
## <bytecode: 0x55732ef0cb08>
## <environment: namespace:mlr>
### Root mean square error
rmse$aggr
## Aggregation function: test.rmse
rmse$aggr$fun
## function(task, perf.test, perf.train, measure, group, pred) sqrt(mean(perf.test^2))
## <bytecode: 0x55732b8d7a60>
## <environment: namespace:mlr>You can change the aggregation method of a Measure (makeMeasure()) via function setAggregation(). All available aggregation schemes are listed on the aggregations() documentation page.
The aggregation schemes test.median (aggregations()), test.min (aggregations()), and text.max (aggregations()) compute the median, minimum, and maximum of the performance values on the test sets.
mseTestMedian = setAggregation(mse, test.median)
mseTestMin = setAggregation(mse, test.min)
mseTestMax = setAggregation(mse, test.max)
mseTestMedian
## Name: Mean of squared errors
## Performance measure: mse
## Properties: regr,req.pred,req.truth
## Minimize: TRUE
## Best: 0; Worst: Inf
## Aggregated by: test.median
## Arguments:
## Note: Defined as: mean((response - truth)^2)
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("regr.lm", bh.task, rdesc, measures = list(mse, mseTestMedian, mseTestMin, mseTestMax))
## Resampling: cross-validation
## Measures: mse mse mse mse
## [Resample] iter 1: 19.2223685 19.2223685 19.2223685 19.2223685
## [Resample] iter 2: 20.8992581 20.8992581 20.8992581 20.8992581
## [Resample] iter 3: 35.8694324 35.8694324 35.8694324 35.8694324
##
## Aggregated Result: mse.test.mean=25.3303530,mse.test.median=20.8992581,mse.test.min=19.2223685,mse.test.max=35.8694324
##
r
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.test.mean=25.3303530,mse.test.median=20.8992581,mse.test.min=19.2223685,mse.test.max=35.8694324
## Runtime: 0.0502353
r$aggr
## mse.test.mean mse.test.median mse.test.min mse.test.max
## 25.33035 20.89926 19.22237 35.86943

Below we calculate the mean misclassification error (mmce) on the training and the test data sets. Note that we have to set predict = "both" when calling makeResampleDesc() in order to get predictions on both training and test sets.
mmceTrainMean = setAggregation(mmce, train.mean)
rdesc = makeResampleDesc("CV", iters = 3, predict = "both")
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, mmceTrainMean))
## Resampling: cross-validation
## Measures: mmce.train mmce.test
## [Resample] iter 1: 0.0200000 0.0800000
## [Resample] iter 2: 0.0500000 0.0200000
## [Resample] iter 3: 0.0300000 0.0800000
##
## Aggregated Result: mmce.test.mean=0.0600000,mmce.train.mean=0.0333333
##
r$measures.train
## iter mmce mmce
## 1 1 0.02 0.02
## 2 2 0.05 0.05
## 3 3 0.03 0.03
r$aggr
## mmce.test.mean mmce.train.mean
## 0.06000000 0.03333333

In out-of-bag bootstrap estimation \(B\) new data sets \(D^{*1}, \ldots, D^{*B}\) are drawn from the data set \(D\) with replacement, each of the same size as \(D\). In the \(b\)-th iteration, \(D^{*b}\) forms the training set, while the remaining elements from \(D\), i.e., \(D \setminus D^{*b}\), form the test set.
The b632 and b632+ variants calculate a convex combination of the training performance and the out-of-bag bootstrap performance and thus require predictions on the training sets and an appropriate aggregation strategy.
### Use bootstrap as resampling strategy and predict on both train and test sets
rdesc = makeResampleDesc("Bootstrap", predict = "both", iters = 10)
### Set aggregation schemes for b632 and b632+ bootstrap
mmceB632 = setAggregation(mmce, b632)
mmceB632plus = setAggregation(mmce, b632plus)
mmceB632
## Name: Mean misclassification error
## Performance measure: mmce
## Properties: classif,classif.multi,req.pred,req.truth
## Minimize: TRUE
## Best: 0; Worst: 1
## Aggregated by: b632
## Arguments:
## Note: Defined as: mean(response != truth)
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, mmceB632, mmceB632plus),
show.info = FALSE)
head(r$measures.train)
## iter mmce mmce mmce
## 1 1 0.02000000 0.02000000 0.02000000
## 2 2 0.03333333 0.03333333 0.03333333
## 3 3 0.03333333 0.03333333 0.03333333
## 4 4 0.03333333 0.03333333 0.03333333
## 5 5 0.02666667 0.02666667 0.02666667
## 6 6 0.02666667 0.02666667 0.02666667
### Compare misclassification rates for out-of-bag, b632, and b632+ bootstrap
r$aggr
## mmce.test.mean mmce.b632 mmce.b632plus
## 0.05789270 0.04640152 0.04702951

The functionality described on this page allows for much control and flexibility. However, when quickly trying out some learners, it can get tedious to type all the code for defining the resampling strategy, setting the aggregation scheme and so on. As mentioned above, mlr includes some pre-defined resample description objects for frequently used strategies like, e.g., 5-fold cross-validation (cv5 (makeResampleDesc())). Moreover, mlr provides special functions for the most common resampling methods, for example holdout (resample()), crossval (resample()), or bootstrapB632 (resample()).
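For example, a pre-defined description like cv5 can be passed directly to resample(), saving the makeResampleDesc() call. This is a sketch assuming mlr (and the rpart package for the learner) is available; the resampled performance value will vary between runs.

```r
library(mlr)

### 5-fold cross-validation via the pre-defined resample description cv5
r = resample("classif.rpart", iris.task, cv5, measures = mmce, show.info = FALSE)
r$aggr
```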
crossval("classif.lda", iris.task, iters = 3, measures = list(mmce, ber))
## Resampling: cross-validation
## Measures: mmce ber
## [Resample] iter 1: 0.0000000 0.0000000
## [Resample] iter 2: 0.0200000 0.0208333
## [Resample] iter 3: 0.0400000 0.0360624
##
## Aggregated Result: mmce.test.mean=0.0200000,ber.test.mean=0.0189652
##
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000,ber.test.mean=0.0189652
## Runtime: 0.0341454
bootstrapB632plus("regr.lm", bh.task, iters = 3, measures = list(mse, mae))
## Resampling: OOB bootstrapping
## Measures: mse.train mae.train mse.test mae.test
## [Resample] iter 1: 26.3220768 3.6285816 18.6233994 3.2662127
## [Resample] iter 2: 28.8128077 3.7733788 23.3713827 3.7176257
## [Resample] iter 3: 20.0009969 3.2507700 25.6730794 3.4428892
##
## Aggregated Result: mse.b632plus=23.5444133,mae.b632plus=3.5054215
##
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.b632plus=23.5444133,mae.b632plus=3.5054215
## Runtime: 0.083518