Thanks to visit codestin.com
Credit goes to github.com

Skip to content

t-kalinowski/quickr

Repository files navigation

quickr

R-CMD-check Codecov test coverage

The goal of quickr is to make your R code run quicker.

Overview

R is an extremely flexible and dynamic language, but that flexibility and dynamicism can come at the expense of speed. This package lets you trade back some of that flexibility for some speed, for the context of a single function.

The main exported function is quick(). Here is how you use it:

library(quickr)

convolve <- quick(function(a, b) {
  declare(type(a = double(NA)),
          type(b = double(NA)))
  ab <- double(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i+j-1] <- ab[i+j-1] + a[i] * b[j]
    }
  }
  ab
})

quick() returns a quicker R function. How much quicker? Let’s benchmark it! For reference, we’ll also compare it to a pure-C implementation.

slow_convolve <- function(a, b) {
  declare(type(a = double(NA)),
          type(b = double(NA)))
  ab <- double(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i+j-1] <- ab[i+j-1] + a[i] * b[j]
    }
  }
  ab
}

library(quickr)
quick_convolve <- quick(slow_convolve)

convolve_c <- inline::cfunction(
  sig = c(a = "SEXP", b = "SEXP"), body = r"({
    int na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    a = PROTECT(Rf_coerceVector(a, REALSXP));
    b = PROTECT(Rf_coerceVector(b, REALSXP));
    na = Rf_length(a); nb = Rf_length(b); nab = na + nb - 1;
    ab = PROTECT(Rf_allocVector(REALSXP, nab));
    xa = REAL(a); xb = REAL(b); xab = REAL(ab);
    for(int i = 0; i < nab; i++) xab[i] = 0.0;
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++)
            xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return ab;
})")



a <- runif(100000); b <- runif(100)

timings <- bench::mark(
  r = slow_convolve(a, b),
  quickr = quick_convolve(a, b),
  c = convolve_c(a, b),
  min_time = 2
)
timings
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 r             481ms 481.18ms      2.08     782KB     8.31
#> 2 quickr        911µs   1.06ms    940.       782KB    18.2 
#> 3 c             916µs   1.06ms    946.       782KB    17.9
plot(timings) + bench::scale_x_bench_time(base = NULL)

In the case of convolve(), quick() returns a function approximately 200 times quicker, with performance similar to the C function.

quick() can accelerate any R function, with some restrictions:

  • Function arguments must have their types and shapes declared using declare().
  • Only atomic vectors, matrices, and array are currently supported: integer, double, logical, and complex.
  • The return value must be an atomic array (e.g., not a list)
  • Named variables must have consistent shapes throughout their lifetimes.
  • NA values are not supported.
  • Only a subset of R’s vocabulary is currently supported.
#>  [1] -            :            !            !=           (           
#>  [6] [            [<-          [<<-         {            *           
#> [11] /            &            &&           %*%          %/%         
#> [16] %%           %o%          ^            +            <           
#> [21] <-           <<-          <=           =            ==          
#> [26] >            >=           |            ||           Arg         
#> [31] Conj         Fortran      Im           Mod          Re          
#> [36] abs          acos         array        as.double    asin        
#> [41] atan         backsolve    break        c            cat         
#> [46] cbind        ceiling      character    chol         chol2inv    
#> [51] cos          crossprod    declare      diag         dim         
#> [56] double       exp          floor        for          forwardsolve
#> [61] if           ifelse       integer      length       log         
#> [66] log10        logical      matrix       max          min         
#> [71] ncol         next         nrow         numeric      outer       
#> [76] print        prod         qr.solve     raw          rbind       
#> [81] repeat       runif        seq          seq_along    seq_len     
#> [86] sin          solve        sqrt         sum          t           
#> [91] tan          tcrossprod   which.max    which.min    while

Many of these restrictions are expected to be relaxed as the project matures. However, quickr is intended primarily for high-performance numerical computing, so features like polymorphic dispatch or support for complex or dynamic types are out of scope.

declare(type()) syntax:

The shape and mode of all function arguments must be declared. Local and return variables may optionally also be declared.

declare(type()) also has support for declaring size constraints, or size relationships between variables. Here are some examples of declare calls:

declare(type(x = double(NA))) # x is a 1-d double vector of any length
declare(type(x = double(10))) # x is a 1-d double vector of length 10
declare(type(x = double(1)))  # x is a scalar double

declare(type(x = integer(2, 3)))  # x is a 2-d integer matrix with dim (2, 3)
declare(type(x = integer(NA, 3))) # x is a 2-d integer matrix with dim (<any>, 3)

# x is a 4-d logical matrix with dim (<any>, 24, 24, 3)
declare(type(x = logical(NA, 24, 24, 3)))

# x and y are 1-d double vectors of any length
declare(type(x = double(NA)),
        type(y = double(NA)))

# x and y are 1-d double vectors of the same length
declare(
  type(x = double(n)),
  type(y = double(n)),
)

# x and y are 1-d double vectors, where length(y) == length(x) + 2
declare(type(x = double(n)),
        type(y = double(n+2)))

Tip: declare shapes as specifically as you can. quickr uses these size constraints both for compile-time checking and for choosing more efficient implementations. For example, if you know a matrix is square, declare it as type(A = double(n, n)) (not double(n, k)). That can allow quickr to use a faster code path for operations like solve(A, b). If the compiler can’t prove a matrix is square, it may need to fall back to a more general rectangular solver.

More examples:

viterbi

The Viterbi algorithm is an example of a dynamic programming algorithm within the family of Hidden Markov Models (https://en.wikipedia.org/wiki/Viterbi_algorithm). Here, quick() makes viterbi() approximately 50 times faster.

slow_viterbi <- function(observations, states, initial_probs, transition_probs, emission_probs) {
    declare(
      type(observations = integer(num_steps)),
      type(states = integer(num_states)),
      type(initial_probs = double(num_states)),
      type(transition_probs = double(num_states, num_states)),
      type(emission_probs = double(num_states, num_obs)),
    )

    trellis <- matrix(0, nrow = length(states), ncol = length(observations))
    backpointer <- matrix(0L, nrow = length(states), ncol = length(observations))
    trellis[, 1] <- initial_probs * emission_probs[, observations[1]]

    for (step in 2:length(observations)) {
      for (current_state in 1:length(states)) {
        probabilities <- trellis[, step - 1] * transition_probs[, current_state]
        trellis[current_state, step] <- max(probabilities) * emission_probs[current_state, observations[step]]
        backpointer[current_state, step] <- which.max(probabilities)
      }
    }

    path <- integer(length(observations))
    path[length(observations)] <- which.max(trellis[, length(observations)])
    for (step in seq(length(observations) - 1, 1)) {
      path[step] <- backpointer[path[step + 1], step + 1]
    }

    out <- states[path]
    out
}

quick_viterbi <- quick(slow_viterbi)

set.seed(1234)
num_steps <- 16
num_states <- 8
num_obs <- 16

observations <- sample(1:num_obs, num_steps, replace = TRUE)
states <- 1:num_states
initial_probs <- runif (num_states)
initial_probs <- initial_probs / sum(initial_probs)  # normalize to sum to 1
transition_probs <- matrix(runif (num_states * num_states), nrow = num_states)
transition_probs <- transition_probs / rowSums(transition_probs)  # normalize rows
emission_probs <- matrix(runif (num_states * num_obs), nrow = num_states)
emission_probs <- emission_probs / rowSums(emission_probs)  # normalize rows

timings <- bench::mark(
  slow_viterbi = slow_viterbi(observations, states, initial_probs,
                              transition_probs, emission_probs),
  quick_viterbi = quick_viterbi(observations, states, initial_probs,
                                transition_probs, emission_probs)
)
timings
#> # A tibble: 2 × 6
#>   expression         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slow_viterbi   62.36µs  72.08µs    13291.    1.59KB     29.2
#> 2 quick_viterbi   1.68µs   1.89µs   522138.        0B      0
plot(timings)

Diffusion simulation

Simulate how heat spreads over time across a 2D grid, using the finite difference method applied to the Heat Equation.

Note the use of local closures within the quick function. Here, quick() returns a function approximately 20 times faster. The speedup is relatively modest because the core operation of computing the laplacian is already expressed as a vectorized operation. If we were instead comparing against for loops that operate on individual array elements, the speedup would be much more substantial. In general, idiomatic vectorized R code is quite fast already!

diffuse_heat <- function(nx, ny, dx, dy, dt, k, steps) {
  declare(
    type(nx = integer(1)),
    type(ny = integer(1)),
    type(dx = integer(1)),
    type(dy = integer(1)),
    type(dt = double(1)),
    type(k = double(1)),
    type(steps = integer(1))
  )

  temp <- matrix(0, nx, ny)
  temp[nx %/% 2L, ny %/% 2L] <- 100

  apply_boundary_conditions <- function(temp) {
    temp[1, ] <- 0
    temp[nx, ] <- 0
    temp[, 1] <- 0
    temp[, ny] <- 0
    temp
  }

  update_temperature <- function(temp) {
    temp_new <- temp

    i <- 2:(nx - 1)
    j <- 2:(ny - 1)

    laplacian <-
      (temp[i + 1, j] - 2 * temp[i, j] + temp[i - 1, j]) / dx ^ 2 +
      (temp[i, j + 1] - 2 * temp[i, j] + temp[i, j - 1]) / dy ^ 2

    temp_new[i, j] <- temp[i, j] + k * dt * laplacian
    temp_new
  }

  for (step in seq_len(steps)) {
    temp <- temp |>
      apply_boundary_conditions() |>
      update_temperature()
  }

  temp
}


quick_diffuse_heat <- quick(diffuse_heat)

# Parameters
nx <- 100L      # Grid size in x
ny <- 100L      # Grid size in y
dx <- 1L        # Grid spacing
dy <- 1L        # Grid spacing
dt <- 0.01      # Time step
k <- 0.1        # Thermal diffusivity
steps <- 500L   # Number of time steps

timings <- bench::mark(
  diffuse_heat = diffuse_heat(nx, ny, dx, dy, dt, k, steps),
  quick_diffuse_heat = quick_diffuse_heat(nx, ny, dx, dy, dt, k, steps)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
summary(timings, relative = TRUE)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression           min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 diffuse_heat        10.1   11.6       1       4893.      Inf
#> 2 quick_diffuse_heat   1      1        13.4        1       NaN
plot(timings)

Rolling Mean

Here is quickr used to calculate a rolling mean. The CRAN package RcppRoll already provides a highly optimized rolling mean, which we include in the benchmarks for comparison.

slow_roll_mean <- function(x, weights, normalize = TRUE) {
  declare(
    type(x = double(NA)),
    type(weights = double(NA)),
    type(normalize = logical(1))
  )
  out <- double(length(x) - length(weights) + 1)
  n <- length(weights)
  if (normalize)
    weights <- weights/sum(weights)*length(weights)

  for(i in seq_along(out)) {
    out[i] <- sum(x[i:(i+n-1)] * weights) / length(weights)
  }
  out
}

quick_roll_mean <- quick(slow_roll_mean)

x <- dnorm(seq(-3, 3, len = 100000))
weights <- dnorm(seq(-1, 1, len = 100))

timings <- bench::mark(
  r = slow_roll_mean(x, weights),
  rcpp = RcppRoll::roll_mean(x, weights = weights),
  quickr = quick_roll_mean(x, weights = weights)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
timings
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 r           68.34ms  78.79ms      9.64  124.24MB    21.2 
#> 2 rcpp         5.92ms   6.29ms    155.      4.46MB     1.99
#> 3 quickr       2.15ms   2.26ms    438.    781.35KB     3.98

timings$expression <- factor(names(timings$expression), rev(names(timings$expression)))
plot(timings) + bench::scale_x_bench_time(base = NULL)

Linear regression benchmark (fastLm)

quickr supports a variety of matrix operations from base R, including most of the common linear algebra functions like %*%, crossprod(), tcrossprod(), solve(), and chol(). Performance is generally similar to running the same code in R itself, but as you start to chain multiple operations in a function, you may see speed-ups due to reduced interpreter overhead. Actual performance depends on the BLAS/LAPACK libraries linked to your R build (e.g. OpenBLAS, MKL, Accelerate).

To illustrate, here is a benchmark showing a fastLm-style implementation comparing base R, quickr, and (for reference) a compiled RcppArmadillo implementation. The fastLm function is adapted from the README of the RcppArmadillo project.

fast_lm <- function(X, y) {
  declare(
    type(X = double(n, k)),
    type(y = double(n))
  )

  n <- nrow(X)
  k <- ncol(X)
  df_residual <- n - k

  coef <- qr.solve(X, y)     # fit model y ~ X
  res <- y - X %*% coef      # residuals
  s2 <- sum(res * res) / df_residual

  XtX <- crossprod(X)
  XtX_inv <- qr.solve(XtX, diag(k))
  std_err <- sqrt(s2 * diag(XtX_inv))

  list(
    coefficients = coef,
    stderr       = std_err,
    df.residual  = df_residual
  )
}

quick_fast_lm <- quick(fast_lm)

Rcpp::sourceCpp(
code = '
#include <RcppArmadillo/Lighter>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List rcpp_armadillo_fast_lm(const arma::mat& X, const arma::colvec& y) {
    int n = X.n_rows, k = X.n_cols;

    arma::colvec coef = arma::solve(X, y);     // fit model y ~ X
    arma::colvec res  = y - X*coef;            // residuals
    double s2 = arma::dot(res, res) / (n - k); // std.errors of coefficients
    arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X)));

    return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
                              Rcpp::Named("stderr")       = std_err,
                              Rcpp::Named("df.residual")  = n - k);
}')

Benchmark

set.seed(1)

beta <- c(0.5, 1.0, -2.0, 10, 5)
X <- cbind(1, matrix(rnorm(3 * 10^5), ncol = 4))
y <- as.vector(X %*% beta + rnorm(nrow(X), sd = 2))

timings <- bench::mark(
  R = fast_lm(X, y),
  quickr = quick_fast_lm(X, y),
  RcppArmadillo = rcpp_armadillo_fast_lm(X, y)
)

timings
#> # A tibble: 3 × 6
#>   expression         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 R               1.28ms   1.82ms      545.    13.7MB     148.
#> 2 quickr          1.65ms   1.73ms      575.        0B       0 
#> 3 RcppArmadillo   1.67ms   2.02ms      491.        0B       0
plot(timings)


Parallelize loops with OpenMP

Use declare(parallel()) to annotate the next for loop or sapply() call for OpenMP parallelization. Parallel loops must be order-independent: avoid shared-state updates or inter-iteration dependencies. OpenMP adds overhead, so it can be slower for small workloads, but substantially faster for larger ones.

Here is a concrete example using colSums(). At smaller sizes, the quickr serial version is fastest (even faster than base colSums()). As sizes grow, the two serial versions converge, and the parallel version pulls ahead. However, the speedup is not linear with core count (e.g., with 12 cores, the speedup is closer to ~6x).

colSums_quick_parallel <- quick(function(x) {
  declare(type(x = double(NA, NA)))
  declare(parallel())
  sapply(seq_len(nrow(x)), \(r) sum(x[, r]))
})

colSums_quick <- quick(function(x) {
  declare(type(x = double(NA, NA)))
  sapply(seq_len(nrow(x)), \(r) sum(x[, r]))
})

r <- bench::press(
  n = 2^(4:14),
  {
    m <- array(runif(n * n), c(n, n))
    bench::mark(
      parallel = colSums_quick_parallel(m),
      quick = colSums_quick(m),
      base = colSums(m),
    )
  },
  .quiet = TRUE
)

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

r |>
  mutate(.before = 1,
         desc = attr(expression, "description")) |>
  select(desc, n, median) |>
  ggplot(aes(x = n, y = median, color = desc)) +
  geom_point() + geom_line() +
  scale_x_log10() + bench::scale_y_bench_time()

quickr does not set OpenMP thread counts. To control threads, set OMP_NUM_THREADS (and optionally OMP_THREAD_LIMIT or OMP_DYNAMIC) before calling a compiled function, e.g. Sys.setenv(OMP_NUM_THREADS = "4").

Using quickr in an R package

When called in a package, quick() will pre-compile the quick functions and place them in the ./src directory. Run devtools::load_all() or quickr::compile_package() to ensure that the generated files in ./src and ./R are in sync with each other.

In a package, you must provide a function name to quick(). For example:

my_fun <- quick(name = "my_fun", function(x) ....)

Installation

You can install quickr from CRAN with:

install.packages("quickr")

You can install the development version of quickr from GitHub with:

# install.packages("pak")
pak::pak("t-kalinowski/quickr")

You will also need a C and Fortran compiler, preferably the same ones used to build R itself.

On macOS:

  • Make sure xcode tools and gfortran are installed, as described in https://mac.r-project.org/tools/. In Terminal, run:

    sudo xcode-select --install
    # curl -LO https://mac.r-project.org/tools/gfortran-12.2-universal.pkg # R 4.4
    # sudo installer -pkg gfortran-12.2-universal.pkg -target /
    curl -LO https://mac.r-project.org/tools/gfortran-14.2-universal.pkg   # R 4.5
    sudo installer -pkg gfortran-14.2-universal.pkg -target /
  • Optional: install flang-new via Homebrew (used by quickr on macOS when available):

    brew install flang

On Windows:

  • Install the latest version of Rtools

On Linux:

  • The “Install Required Dependencies” section here provides detailed instructions for installing R build tools on various Linux flavors.

About

R to Fortran Transpiler

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Packages

No packages published

Contributors 5