Description
Randomized Singular Value Decomposition Algorithms with 'RcppEigen'.
Randomized Singular Value Decomposition (RSVD) methods proposed in the 'PCAone' paper by Li (2022) <doi:10.1101/2022.05.25.493261>. Two RSVD methods are implemented. One is based on the single-pass RSVD of Yu (2017) <arXiv:1704.07669>, extended with a power iteration scheme. The other is our new window-based RSVD.
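For intuition about what a randomized SVD with power iterations computes, here is a minimal base R sketch of the generic textbook scheme (random range sampling, a few power iterations, then a small exact SVD). It is not the single-pass or window-based algorithm implemented in this package. The helper rsvd_sketch and its arguments oversample and power_iters are hypothetical names introduced only for illustration.

```r
# Generic randomized SVD with power iterations, for illustration only.
# NOTE: rsvd_sketch is a hypothetical helper, not part of the pcaone package,
# and this is not the PCAone single-pass or window-based algorithm.
rsvd_sketch <- function(A, k, oversample = 10, power_iters = 2) {
  n <- ncol(A)
  l <- min(k + oversample, min(dim(A)))   # sketch size, capped by matrix size
  # Random test matrix; A %*% Omega samples the column space of A
  Omega <- matrix(rnorm(n * l), n, l)
  Q <- qr.Q(qr(A %*% Omega))
  # Power iterations sharpen the captured subspace;
  # re-orthonormalise after every product for numerical stability
  for (i in seq_len(power_iters)) {
    Q <- qr.Q(qr(crossprod(A, Q)))        # t(A) %*% Q, n x l
    Q <- qr.Q(qr(A %*% Q))                # m x l
  }
  # Project A onto the subspace and take a small exact SVD
  B <- crossprod(Q, A)                    # t(Q) %*% A, l x n
  s <- svd(B, nu = k, nv = k)
  list(d = s$d[seq_len(k)], u = Q %*% s$u, v = s$v)
}

set.seed(1)
# Test matrix: rank-5 signal plus small noise
m <- 100; n <- 500; r <- 5
A <- matrix(rnorm(m * r), m, r) %*% matrix(rnorm(r * n), r, n) +
  0.01 * matrix(rnorm(m * n), m, n)

res_sketch <- rsvd_sketch(A, k = 5)
exact <- svd(A, nu = 5, nv = 5)

# Largest relative error among the leading singular values
rel_err <- max(abs(res_sketch$d - exact$d[1:5]) / exact$d[1:5])
print(rel_err)
```

The test matrix is deliberately low rank with a clear spectral gap, which is the setting where this kind of approximation works well. On a matrix with slowly decaying singular values, more power iterations or a larger sketch are typically needed.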
README.md
PCAone algorithms in R with RcppEigen!
Installation
You can install the development version of PCAoneR from GitHub with:
install.packages("devtools") # if not installed
devtools::install_github("Zilong-Li/PCAoneR")
Example
This is a basic example which shows you how to use pcaone:
library(pcaone)
mat <- matrix(rnorm(100*5000), 100, 5000)
res <- pcaone(mat, k = 10)
str(res)
#> List of 3
#> $ d: num [1:10] 80.6 80.1 79.6 79.1 79 ...
#> $ u: num [1:100, 1:10] -0.0565 -0.0404 -0.0268 -0.1161 0.0132 ...
#> $ v: num [1:5000, 1:10] -0.001085 -0.01002 -0.001169 -0.000801 -0.015958 ...
#> - attr(*, "class")= chr "pcaone"
Benchmarking
Let’s see how pcaone performs compared to base svd and the rsvd package.
library(microbenchmark)
library(pcaone)
library(rsvd)
data(tiger)
timing <- microbenchmark(
'SVD' = svd(tiger, nu=150, nv=150),
'rSVD' = rsvd(tiger, k=150, q = 3),
'pcaone.alg1' = pcaone(tiger, k=150, p = 3, method = "alg1"),
'pcaone.alg2' = pcaone(tiger, k=150, p = 3, windows = 8),
times=10)
print(timing, unit='s')
#> Unit: seconds
#>         expr       min        lq      mean    median        uq        max neval
#>          SVD 6.9045770 7.0373806 7.6021626 7.1162474 7.3132482 11.9759831    10
#>         rSVD 2.9189460 2.9601497 3.0412117 3.0452150 3.1174170  3.1540262    10
#>  pcaone.alg1 0.5404496 0.5748938 0.6106437 0.5886755 0.6262602  0.8051892    10
#>  pcaone.alg2 0.8177738 0.8211726 0.8621549 0.8587051 0.8740997  0.9599908    10
The above test was run on my 2019 MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor. Note that multithreading in the external BLAS or MKL routines is disabled by export OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 MKL_NUM_THREADS=1.
References
- Zilong Li, Jonas Meisner, Anders Albrechtsen (2022). PCAone: fast and accurate out-of-core PCA framework for large scale biobank data
- Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li (2017). Single-Pass PCA of Large High-Dimensional Data
Todo
- write configure to detect and use MKL.