Description

'GeneSelectR' - Comprehensive Feature Selection Workflow for Bulk RNAseq Datasets.

'GeneSelectR' is a versatile R package for comprehensive feature selection in bulk RNAseq datasets. Its key innovation is the integration of the 'Python' 'scikit-learn' (<https://scikit-learn.org/stable/index.html>) machine learning framework with R-based bioinformatics tools. 'GeneSelectR' performs robust machine learning (ML)-driven feature selection while leveraging 'Gene Ontology' (GO) enrichment analysis as described by Thomas PD et al. (2022) <doi:10.1002/pro.4218>, using 'clusterProfiler' (Wu et al., 2021) <doi:10.1016/j.xinn.2021.100141>, and semantic similarity analysis powered by 'simplifyEnrichment' (Gu, Huebschmann, 2021) <doi:10.1016/j.gpb.2022.04.008>. This combination of methodologies yields both computational and biological insights when analyzing complex RNAseq datasets.

GeneSelectR


Overview

GeneSelectR is an R package designed to streamline the process of gene selection and evaluation in bulk RNAseq datasets. Built on top of the powerful scikit-learn Python library via the reticulate package, GeneSelectR offers a seamless integration of machine learning and bioinformatics capabilities in a single workflow.

Features

Comprehensive Workflow

GeneSelectR provides an end-to-end solution for feature selection, combining the machine learning capabilities of scikit-learn with the bioinformatics utilities of R packages such as clusterProfiler and simplifyEnrichment.

Customizable Yet User-Friendly

While GeneSelectR offers a high degree of customization to cater to specific research needs, it also comes with preset configurations that are suitable for most use cases, making it accessible to both novice and experienced users.

Diverse Feature Selection Methods

The package includes a variety of built-in feature selection methods, such as:

  • SelectFromModel with RandomForest
  • SelectFromModel with Logistic Regression (L1 penalty)
  • Boruta
  • Univariate Filtering
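
To make the simplest of these methods concrete, here is a minimal sketch of univariate filtering: each gene is scored on its own against the class labels, and only the top-scoring genes are kept. This is an illustration under stated assumptions, not GeneSelectR's implementation; the package delegates feature selection to scikit-learn. The function names `f_score` and `select_top_k`, the gene names, and the toy expression values are all hypothetical, and the score used here is a plain one-way ANOVA F-statistic written with the Python standard library only.

```python
from statistics import mean

def f_score(values, labels):
    """One-way ANOVA F-statistic for one gene's expression across classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    n_classes, n_samples = len(groups), len(values)
    # Variance explained by class membership (between-group mean square).
    between = sum(len(g) * (mean(g) - grand_mean) ** 2
                  for g in groups.values()) / (n_classes - 1)
    # Residual variance inside each class (within-group mean square).
    within = sum((v - mean(g)) ** 2
                 for g in groups.values() for v in g) / (n_samples - n_classes)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def select_top_k(expression, labels, k):
    """Return the k gene names with the highest F-statistic (hypothetical helper)."""
    scores = {gene: f_score(values, labels) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (made up for illustration): 6 samples, 3 genes.
labels = ["ctrl", "ctrl", "ctrl", "case", "case", "case"]
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],  # strong class difference
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],  # weaker class difference
}

selected = select_top_k(expression, labels, k=2)
print(selected)  # ['geneA', 'geneC']
```

The model-based methods in the list above differ in that they rank genes by the importance a fitted model assigns to them rather than by a per-gene statistic.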

Main Functionality

The core function, GeneSelectR, performs gene selection using various methods and evaluates their performance through cross-validation. It also supports hyperparameter tuning, permutation feature importance calculation, and more.
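
The two evaluation ideas mentioned above, cross-validation and permutation feature importance, can be sketched in a few lines. The example below is a dependency-free illustration, not the code that GeneSelectR runs (the package relies on scikit-learn for this). The nearest-centroid classifier, all function names, and the toy data are assumptions introduced only for the sketch.

```python
import random
from statistics import mean

def fit_centroids(X, y):
    """Stand-in model: the mean feature vector of each class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_class.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

def k_fold_accuracy(X, y, k):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train = [(r, l) for i, (r, l) in enumerate(zip(X, y)) if i not in test_idx]
        test = [(r, l) for i, (r, l) in enumerate(zip(X, y)) if i in test_idx]
        model = fit_centroids(*zip(*train))
        scores.append(accuracy(model, *zip(*test)))
    return mean(scores)

def permutation_importance(model, X, y, n_repeats, rng):
    """Average accuracy drop after shuffling one feature column at a time."""
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(mean(drops))
    return importances

# Toy data (made up): feature 0 separates the classes, feature 1 is constant.
X = [[float(i), 1.0] for i in range(10)] + [[100.0 + i, 1.0] for i in range(10)]
y = ["ctrl"] * 10 + ["case"] * 10

cv_acc = k_fold_accuracy(X, y, k=5)
model = fit_centroids(X, y)
importances = permutation_importance(model, X, y, n_repeats=10,
                                     rng=random.Random(0))
print(cv_acc, importances)
```

Shuffling the informative feature breaks its link to the labels and lowers accuracy, while shuffling the constant feature changes nothing, so its importance is exactly zero. For brevity the importance is computed on the training data here; in practice it is more informative on held-out samples.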

Installation

GeneSelectR depends on reticulate, which creates a conda working environment. Please install the Anaconda distribution before you proceed. You can install the development version of GeneSelectR from GitHub with:

# install.packages("devtools")
devtools::install_github("dzhakparov/GeneSelectR")

Usage and Example

A tutorial detailing how to use GeneSelectR can be accessed in this vignette.

Docker Image

GeneSelectR is available as a container image on Docker Hub. You can pull and run the image using the following commands:

docker pull dzhakparov/geneselectr-image:latest
docker run -e PASSWORD=your_password -p 8787:8787 dzhakparov/geneselectr-image:latest

After running these commands, open your browser and go to localhost:8787 (http://local-ip-address:8787 on Windows). You will be prompted to enter a username and password. The default username is rstudio, and the password is the one you specified in the docker run command above.

Citation

Please cite the following paper if you use GeneSelectR in your research:

Feedback and Contribution

Any feedback is welcome and appreciated! Feel free to create issues or pull requests. For any other questions please write to: [email protected].

Metadata

Version

1.0.1

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows