Description

Fast Imputations Using 'Rcpp' and 'Armadillo'.

Fast imputations under the object-oriented programming paradigm. The package also offers a few functions built to work with popular R packages such as 'data.table' or 'dplyr'. The biggest improvement in time performance can be achieved for calculations where a grouping variable has to be used. A single evaluation of a quantitative model for the multiple imputations is another major enhancement. A newer major improvement is one of the fastest predictive mean matching implementations in the R world, thanks to presorting and binary search.

miceFast

Author: Maciej Nasinski

Check the miceFast website for more details


Overview

miceFast provides fast methods for imputing missing data, leveraging an object-oriented programming paradigm and optimized linear algebra routines.
The package includes convenient helper functions compatible with data.table, dplyr, and other popular R packages.

Major speed improvements occur when:

  • Using a grouping variable, where the data is automatically sorted by group, significantly reducing computation time.
  • Performing multiple imputations, by evaluating the underlying quantitative model only once for multiple draws.
  • Running Predictive Mean Matching (PMM), thanks to presorting and binary search.
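
The last two points can be illustrated with a small base R sketch. It is not the package's implementation (which is written in C++), only a toy version of the idea under stated assumptions: the linear model is fitted once, the observed predictions are sorted once, and each draw then locates its donors with a binary search (findInterval). The helper name pmm_draw, the toy data, and the choice of k = 5 donors are all hypothetical.

```r
set.seed(1234)

# Toy data: y has missing values, x is fully observed
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
y[sample(n, 40)] <- NA
obs <- !is.na(y)

# Fit the linear model once, on the observed rows only
X <- cbind(1, x)
beta <- qr.coef(qr(X[obs, , drop = FALSE]), y[obs])
pred <- drop(X %*% beta)

# Presort the observed predictions once; all later draws reuse this order
ord <- order(pred[obs])
pred_sorted <- pred[obs][ord]
y_sorted <- y[obs][ord]

# Hypothetical helper: one PMM draw for a single predicted value
pmm_draw <- function(target, k = 5) {
  m <- length(pred_sorted)
  # Binary search: how many sorted predictions are <= target
  pos <- findInterval(target, pred_sorted)
  # The k nearest donors must lie within k positions on either side
  lo <- max(1, pos - k + 1)
  hi <- min(m, pos + k)
  cand <- lo:hi
  nearest <- cand[order(abs(pred_sorted[cand] - target))]
  nearest <- nearest[seq_len(min(k, length(nearest)))]
  # Return the observed value of one randomly chosen donor
  y_sorted[nearest[sample.int(length(nearest), 1)]]
}

# Several imputations reuse the same fit and the same sorted vector
n_draws <- 3
imputations <- replicate(n_draws, vapply(pred[!obs], pmm_draw, numeric(1)))

# One completed variable, built from the first set of draws
y_imp <- y
y_imp[!obs] <- imputations[, 1]
```

Each column of imputations is one set of draws; the model fit and the sort are never repeated, which is where the savings come from in this sketch.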

For performance details, see performance_validity.R in the extdata folder.

It is recommended to read the Advanced Usage Vignette.

Installation

You can install miceFast from CRAN:

install.packages("miceFast")

Or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("polkas/miceFast")

Quick Example

Below is a short demonstration. See the vignette for advanced usage and best practices.

library(miceFast)

set.seed(1234)
data(air_miss)

# Visualize the NA structure
upset_NA(air_miss, 6)

# Simple and naive fill
imputed_data <- naive_fill_NA(air_miss)

# Compare with other packages:
# Hmisc
library(Hmisc)
data.frame(Map(function(x) Hmisc::impute(x, "random"), air_miss))

# mice
library(mice)
mice::complete(mice::mice(air_miss, printFlag = FALSE))

Key Features

  • Object-Oriented Interface via miceFast objects (Rcpp modules).
  • Convenient Helpers:
    • fill_NA(): Single imputation (lda, lm_pred, lm_bayes, lm_noise).
    • fill_NA_N(): Multiple imputations (pmm, lm_bayes, lm_noise).
    • VIF(): Variance Inflation Factor calculations.
    • naive_fill_NA(): Automatic naive imputations.
    • compare_imp(): Compare original vs. imputed values.
    • upset_NA(): Visualize NA structure using UpSetR.
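
A minimal sketch of the object-oriented interface, assuming the method names used in the package vignette (set_data, impute, impute_N, update_var, get_data) and the built-in airquality data. The column positions are specific to this toy matrix; check the vignette before relying on the details.

```r
library(miceFast)
set.seed(1234)

# Numeric matrix: Ozone, Solar.R, Wind, Temp plus an intercept column
data <- cbind(as.matrix(airquality[, 1:4]), intercept = 1)

model <- new(miceFast)
model$set_data(data)

# Single imputation of Ozone (column 1) from Wind, Temp and the intercept
model$update_var(1, model$impute("lm_pred", 1, c(3, 4, 5))$imputations)

# Solar.R (column 2) filled using 10 lm_bayes imputations,
# now that Ozone is complete and can serve as a predictor
model$update_var(2, model$impute_N("lm_bayes", 2, c(1, 3, 4, 5), 10)$imputations)

imputed <- model$get_data()
```

Note that update_var changes the data held by the object, so the order of the calls matters: variables imputed earlier can be used as predictors later.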

Quick Reference Table:

Function          Description
new(miceFast)     Creates an OOP instance with numerous imputation methods (see the vignette).
fill_NA()         Single imputation: lda, lm_pred, lm_bayes, lm_noise.
fill_NA_N()       Multiple imputations (N repeats): pmm, lm_bayes, lm_noise.
VIF()             Computes Variance Inflation Factors.
naive_fill_NA()   Performs automatic, naive imputations.
compare_imp()     Compares imputations vs. original data.
upset_NA()        Visualizes NA structure using an UpSet plot.
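
As a rough sketch, the two main helpers might be called on the bundled air_miss data as shown below. It assumes that air_miss contains the columns Solar.R, Wind, Temp and Intercept, as in the package examples; the new column names Solar_R_pred and Solar_R_bayes are made up for illustration.

```r
library(miceFast)
set.seed(1234)
data(air_miss)

predictors <- c("Wind", "Temp", "Intercept")

# Single imputation: missing Solar.R values replaced by linear predictions
air_miss$Solar_R_pred <- fill_NA(
  x = air_miss,
  model = "lm_pred",
  posit_y = "Solar.R",
  posit_x = predictors
)

# Multiple imputations: k = 10 repeats of the lm_bayes model
air_miss$Solar_R_bayes <- fill_NA_N(
  x = air_miss,
  model = "lm_bayes",
  posit_y = "Solar.R",
  posit_x = predictors,
  k = 10
)

# Count the remaining missing values before and after imputation
colSums(is.na(air_miss[, c("Solar.R", "Solar_R_pred", "Solar_R_bayes")]))
```

Writing the result to a new column keeps the original variable intact, so the imputed and original values can still be compared afterwards.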

Performance Highlights

Benchmark testing (on R 4.2, macOS M1) shows miceFast can significantly reduce computation time, especially in these scenarios:

  • Linear Discriminant Analysis (LDA): ~5x faster.
  • Grouping Variable Imputations: ~10x faster (and can exceed 100x in some edge cases).
  • Multiple Imputations: the speedup grows roughly in proportion to the number of imputations, since the model is computed only once.
  • Variance Inflation Factors (VIF): ~5x faster, because we only compute the inverse of X'X.
  • Predictive Mean Matching (PMM): ~3x faster, thanks to presorting and binary search.
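
The VIF point rests on a standard identity that can be checked in base R: the VIFs equal the diagonal of the inverse correlation matrix of the predictors, so one matrix inversion replaces one regression per predictor. The sketch below is an illustration of that identity on the built-in airquality data, not the package's internal code.

```r
# Three predictors from the built-in airquality data, complete cases only
X <- as.matrix(na.omit(airquality[, c("Solar.R", "Wind", "Temp")]))

# One inversion: VIFs are the diagonal of the inverse correlation matrix
vif_inverse <- diag(solve(cor(X)))

# Reference: one regression per predictor, VIF = 1 / (1 - R^2)
vif_regression <- sapply(colnames(X), function(j) {
  others <- setdiff(colnames(X), j)
  r2 <- summary(lm(X[, j] ~ X[, others]))$r.squared
  1 / (1 - r2)
})

# The two approaches agree up to floating point error
cbind(vif_inverse, vif_regression)
```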


Metadata

Version

0.8.5

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows