This vignette gives you a short introductory glance at the key features of mlr. A more detailed, in-depth and continuously updated tutorial can be found on the GitHub project page:

Purpose

The main goal of mlr is to provide a unified interface for machine learning tasks such as classification, regression, cluster analysis and survival analysis in R. Without a common interface it becomes a hassle to carry out standard methods like cross-validation and hyperparameter tuning for different learners. Hence, mlr offers the following features:

  • Possibility to fit, predict, evaluate and resample models
  • Easy extension mechanism through S3 inheritance
  • Abstract description of learners and tasks by properties
  • Parameter system for learners to encode data types and constraints
  • Many convenience methods and generic building blocks for your machine learning experiments
  • Resampling like bootstrapping, cross-validation and subsampling
  • Various visualizations, e.g. of ROC curves and predictions
  • Benchmarking of learners for multiple data sets
  • Easy hyperparameter tuning using different optimization strategies
  • Variable selection with filters and wrappers
  • Nested resampling of models with tuning and feature selection
  • Cost-sensitive learning, threshold tuning and imbalance correction
  • Wrapper mechanism to extend learner functionality in complex and custom ways
  • Combination of different processing steps into a complex data mining chain that can be jointly optimized
  • Extension points to integrate your own methods
  • Built-in parallelization
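Several of these features compose naturally. As an illustration, here is a minimal sketch of hyperparameter tuning via resampling, assuming the classif.rpart learner (from the rpart package) is installed; the parameter range and budget are purely illustrative:

```r
library(mlr)
data(iris)

# Task and learner; classif.rpart is used here only as an example.
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")

# Tune the complexity parameter cp of rpart by random search
# over 10 illustrative configurations, evaluated with 3-fold CV.
ps = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ctrl = makeTuneControlRandom(maxit = 10L)
rdesc = makeResampleDesc("CV", iters = 3L)

res = tuneParams(lrn, task = task, resampling = rdesc,
  par.set = ps, control = ctrl)
print(res$x)  # best cp value found by the random search
```

The same tuning setup can be wrapped around a learner with makeTuneWrapper, which is how nested resampling is realized in mlr.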

Quick Start

To highlight the main principles of mlr we give a quick introduction to the package. We demonstrate how to perform a simple classification analysis using stratified cross-validation, which illustrates some of the major building blocks of the mlr workflow, namely tasks and learners.

library(mlr)
data(iris)

# Define the task:
task = makeClassifTask(id = "tutorial", data = iris, target = "Species")
print(task)
## Supervised task: tutorial
## Type: classif
## Target: Species
## Observations: 150
## Features:
##    numerics     factors     ordered functionals 
##           4           0           0           0 
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 3
##     setosa versicolor  virginica 
##         50         50         50 
## Positive class: NA
# Define the learner:
lrn = makeLearner("classif.lda")
print(lrn)
## Learner classif.lda from package MASS
## Type: classif
## Name: Linear Discriminant Analysis; Short name: lda
## Class: classif.lda
## Properties: twoclass,multiclass,numerics,factors,prob
## Predict-Type: response
## Hyperparameters:
# Define the resampling strategy:
rdesc = makeResampleDesc(method = "CV", stratify = TRUE)

# Do the resampling:
r = resample(learner = lrn, task = task, resampling = rdesc)
## Resampling: cross-validation
## Measures:             mmce
## [Resample] iter 1:    0.0000000
## [Resample] iter 2:    0.0000000
## [Resample] iter 3:    0.0666667
## [Resample] iter 4:    0.0000000
## [Resample] iter 5:    0.0666667
## [Resample] iter 6:    0.0000000
## [Resample] iter 7:    0.0000000
## [Resample] iter 8:    0.0000000
## [Resample] iter 9:    0.0666667
## [Resample] iter 10:   0.0000000
## 
## Aggregated Result: mmce.test.mean=0.0200000
## 
print(r)
## Resample Result
## Task: tutorial
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000
## Runtime: 0.17222
# Get the mean misclassification error:
r$aggr
## mmce.test.mean 
##           0.02
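Besides resampling, models can also be fitted and used for prediction directly, as listed among the features above. A minimal sketch using the task and learner defined in this example (the train/test split by odd and even row indices is purely illustrative):

```r
# Train on every other observation and predict on the held-out rows.
mod = train(lrn, task, subset = seq(1, 150, by = 2))
pred = predict(mod, task = task, subset = seq(2, 150, by = 2))

# Evaluate the mean misclassification error on the held-out rows.
performance(pred, measures = mmce)
```

The resample result above offers the same information: r$pred holds the pooled predictions from all cross-validation folds.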

Detailed Tutorial

The previous example demonstrated just a tiny fraction of the capabilities of mlr. More features are covered in the tutorial, which can be found online on the mlr project page. It covers, among other topics: benchmarking, preprocessing, imputation, feature selection, ROC analysis, how to implement your own learner, and the list of all supported learners. Reading it is highly recommended!