-
Notifications
You must be signed in to change notification settings - Fork 19
Description
irlba seems to behave inconsistently depending on whether column-centring is performed explicitly or via center. To demonstrate, I've mocked up some single-cell RNA-seq data:
set.seed(1000)
ncells <- 100
ngenes <- 10000
counts <- matrix(as.double(rpois(ncells*ngenes, lambda=100)), ncol=ncells)
centers <- rowMeans(counts)If I apply irlba on the transposed matrix (i.e., genes are now columns, cells are rows) with explicit centring outside the function or via center, I get substantially different results:
library(irlba)
set.seed(100)
out <- irlba(t(counts - centers), nu=10, nv=10)
head(out$d)
## [1] 1105.339 1091.932 1086.880 1085.875 1083.415 1080.327
set.seed(100)
out2 <- irlba(t(counts), center=centers, nu=10, nv=10)
head(out2$d)
## [1] 3961.623 2629.205 1221.687 1190.174 1170.183 1165.110I might have expected some small differences due to vagaries of random initialization or numerical precision, but these differences in the singular values seem to be rather large. On a related note, running the following code in a fresh R session results in a segfault ("memory not mapped"):
set.seed(1000)
ncells <- 100
ngenes <- 10000
counts <- matrix(rpois(ncells*ngenes, lambda=100), ncol=ncells)
centers <- rowMeans(counts)
out <- irlba::irlba(t(counts), center=centers, nu=10, nv=10)Presumably, it's something to do with the integer nature of counts, as coercion to double-precision avoids the problem. Anyway, here's my session information:
R version 3.4.0 Patched (2017-04-24 r72627)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/R-3-4-branch_devel/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/R-3-4-branch_devel/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] irlba_2.2.1 Matrix_1.2-11
loaded via a namespace (and not attached):
[1] compiler_3.4.0 grid_3.4.0 lattice_0.20-35