Description

Autoencoding Random Forests.

Autoencoding Random Forests ('RFAE') provide a method to autoencode mixed-type tabular data using Random Forests ('RF'), which involves projecting the data to a latent feature space of user-chosen dimensionality (usually a lower dimension), and then decoding the latent representations back into the input space. The encoding stage is useful for feature engineering and data visualisation tasks, akin to how principal component analysis ('PCA') is used, and the decoding stage is useful for compression and denoising tasks. At its core, 'RFAE' is a post-processing pipeline on a trained random forest model. This means that it can accept any trained RF of 'ranger' object type: 'RF', 'URF' or 'ARF'. Because of this, it inherits Random Forests' robust performance and capacity to seamlessly handle mixed-type tabular data. For more details, see Vu et al. (2025) <doi:10.48550/arXiv.2505.21441>.

Autoencoding Random Forests

Autoencoding Random Forests ('RFAE') provide a method to autoencode data using Random Forests ('RF'), which involves projecting the data to a latent feature space of chosen dimensionality (usually a lower dimension), and then decoding the latent representations back into the input space. The encoding stage is useful for feature engineering and data visualisation tasks, akin to how principal component analysis ('PCA') is used, and the decoding stage is useful for compression and denoising tasks. At its core, 'RFAE' is a post-processing pipeline on a trained random forest model. This means that it can accept any trained RF of 'ranger' object type: 'RF', 'URF' or 'ARF'. Because of this, it inherits RFs' robust performance and capacity to seamlessly handle mixed-type tabular data.
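To make the PCA comparison concrete, here is a minimal base R sketch of an encode/decode round trip with PCA on the numeric columns of iris. It does not use RFAE, and the variable names (`z`, `x_hat`, `mse`) are chosen only for illustration. Unlike RFAE, plain PCA cannot take the categorical `Species` column, which is the gap the package is meant to fill.

```r
# PCA analogue of the encode/decode round trip (base R only, not RFAE itself)
x <- as.matrix(iris[, 1:4])            # numeric features only
pca <- prcomp(x, center = TRUE, scale. = FALSE)

k <- 2                                 # latent dimensionality
z <- pca$x[, 1:k, drop = FALSE]        # "encode": scores on the first k components

# "decode": map the scores back to the input space and undo the centring
x_hat <- z %*% t(pca$rotation[, 1:k, drop = FALSE])
x_hat <- sweep(x_hat, 2, pca$center, "+")

# Mean squared reconstruction error of the round trip
mse <- mean((x - x_hat)^2)
```

With `k` equal to the number of input columns, the reconstruction is exact up to floating-point error. A smaller `k` trades reconstruction accuracy for compression.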

The package can be installed by running:

devtools::install_github("bips-hb/RFAE")

You can also clone the repository and install it from the local copy:

devtools::install("RFAE")

Examples

Using Fisher's iris dataset, we train a RF and pass it through the autoencoding pipeline:

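# Load the package that provides encode(), decode_knn() and reconstruction_error()
library(RFAE)
# Set seed
set.seed(1)
# Split training and test
trn <- sample(seq_len(nrow(iris)), 100)
tst <- setdiff(seq_len(nrow(iris)), trn)
# Train RF
rf <- ranger::ranger(Species ~ ., data = iris[trn, ], num.trees = 50)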

Encode data and project test data to create new embeddings:

# Fit encoder object
emap <- encode(rf, iris[trn, ], k=2)
# Embed new test samples
emb <- predict(emap, rf, iris[tst, ])

Decode test samples back to the input space:

# Decode samples
out <- decode_knn(rf, emap, emb, k=5)$x_hat
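As a rough intuition for nearest-neighbour decoding, the sketch below reconstructs each test point from its `k` closest training points in a latent space. It is a simplified base R illustration, not the algorithm inside `decode_knn`. The embeddings are PCA stand-ins rather than forest-based ones, and `knn_decode_one` is a hypothetical helper introduced here.

```r
set.seed(1)
trn <- sample(seq_len(nrow(iris)), 100)
tst <- setdiff(seq_len(nrow(iris)), trn)

# Stand-in embeddings from PCA (assumption: RFAE derives its embeddings from the forest)
pca   <- prcomp(iris[trn, 1:4])
z_trn <- pca$x[, 1:2]
z_tst <- predict(pca, iris[tst, 1:4])[, 1:2]

# Hypothetical helper: decode one latent point from its k nearest training neighbours
knn_decode_one <- function(z, z_ref, x_ref, k = 5) {
  d  <- sqrt(rowSums(sweep(z_ref, 2, z)^2))    # Euclidean distances in latent space
  nn <- order(d)[seq_len(k)]                    # indices of the k nearest neighbours
  out <- lapply(x_ref[nn, , drop = FALSE], function(col) {
    if (is.numeric(col)) {
      mean(col)                                 # numeric feature: neighbour average
    } else {
      # categorical feature: majority vote among the neighbours
      factor(names(which.max(table(col))), levels = levels(col))
    }
  })
  as.data.frame(out)
}

# Decode every test embedding back to a row with the original columns
x_hat <- do.call(rbind, lapply(seq_len(nrow(z_tst)), function(i) {
  knn_decode_one(z_tst[i, ], z_trn, iris[trn, ], k = 5)
}))
```

Averaging numeric columns and voting on categorical ones is one simple way to return mixed-type rows. The package may weight or select neighbours differently.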

Measure the reconstruction error between decoded and actual samples:

error <- reconstruction_error(out, iris[tst, ])
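For mixed-type data, one plausible way to compute such an error by hand is to score each column by its type. The base R sketch below uses RMSE for numeric columns and misclassification rate for categorical ones. `manual_recon_error` is a hypothetical helper, and the metrics are assumptions that may differ from what `reconstruction_error` reports.

```r
# Hypothetical helper, not the package's reconstruction_error()
manual_recon_error <- function(x_hat, x_true) {
  stopifnot(identical(names(x_hat), names(x_true)),
            nrow(x_hat) == nrow(x_true))
  vapply(names(x_true), function(nm) {
    truth <- x_true[[nm]]
    est   <- x_hat[[nm]]
    if (is.numeric(truth)) {
      sqrt(mean((truth - est)^2))                     # numeric column: RMSE
    } else {
      mean(as.character(truth) != as.character(est))  # categorical: error rate
    }
  }, numeric(1))
}

# Toy check: shift one numeric column by 0.5 and mislabel 15 of 150 rows
x_true <- iris
x_hat  <- iris
x_hat$Sepal.Length <- x_hat$Sepal.Length + 0.5
x_hat$Species[1:15] <- "virginica"   # rows 1 to 15 are setosa in iris

err <- manual_recon_error(x_hat, x_true)
```

Here `err` is a named vector with one entry per column, so numeric and categorical errors stay separate instead of being mixed into a single number on incompatible scales.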

For more detailed examples, refer to the package vignette.

Python Library

The Python version of RFAE is under development. A preliminary version is currently available at RFAE_py.

References

  • Vu, B. D., Kapar, J., Wright, M., & Watson, D. S. (2025). Autoencoding Random Forests. arXiv preprint arXiv:2505.21441. NeurIPS version coming soon!
Metadata

Version

0.1.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows