Description

OPTICS K-Xi Density-Based Clustering.

Provides a novel density-based cluster extraction method, OPTICS k-Xi, and a framework to compare k-Xi models using distance-based metrics, for investigating datasets with an unknown number of clusters.

OPTICS k-Xi

This R package provides a novel cluster extraction method for the OPTICS algorithm, OPTICS k-Xi, along with ggplot2 visualizations and a framework to compare clustering models with varying parameters using distance-based metrics.

Summary

Density-based clustering methods are well adapted to the clustering of high-dimensional data and enable the discovery of core groups of various shapes despite large amounts of noise.

The opticskxi R package provides a novel density-based cluster extraction method, OPTICS k-Xi, and a framework to compare k-Xi models using distance-based metrics, for investigating datasets with an unknown number of clusters. The vignette first introduces density-based algorithms with simulated datasets, then presents and evaluates the k-Xi cluster extraction method. Finally, it describes the model comparison framework and applies it to two genetic datasets to identify groups and their discriminating features.

The k-Xi algorithm is a novel OPTICS cluster extraction method that takes the number of clusters directly as a parameter and, unlike the OPTICS Xi method, does not require fine-tuning of the steepness parameter. Combined with a framework that compares models with varying parameters, the OPTICS k-Xi method can identify groups in noisy datasets with an unknown number of clusters.
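To make the idea concrete, here is a minimal base R sketch of deriving a steepness threshold from a target number of clusters. It is a toy illustration and not the package's implementation: the reachability values, the helper names `count_valleys` and `find_xi_for_k`, the simplified rule that a steep upward jump starts a new cluster, and the fixed grid of Xi values are all assumptions made for this example.

```r
# Toy reachability profile in OPTICS order (assumed values):
# three valleys of 5 points, separated by a large and a small jump.
reach <- c(Inf, 0.2, 0.2, 0.2, 0.2,
           1.5, 0.3, 0.3, 0.3, 0.3,
           0.7, 0.3, 0.3, 0.3, 0.3)

# Count the segments with at least `pts` points, where a new segment
# starts whenever reachability rises steeply relative to the previous
# point, i.e. reach[i - 1] <= reach[i] * (1 - xi).
# Hypothetical helper, simplified compared to a real Xi extraction.
count_valleys <- function(reach, xi, pts) {
  n <- length(reach)
  steep_up <- c(TRUE, reach[-n] <= reach[-1] * (1 - xi))
  segment <- cumsum(steep_up)
  sum(table(segment) >= pts)
}

# Try decreasing Xi values and return the first one that yields at
# least k segments, or NA if none does. Hypothetical helper.
find_xi_for_k <- function(reach, k, pts,
                          xi_grid = seq(0.9, 0.1, by = -0.1)) {
  for (xi in xi_grid) {
    if (count_valleys(reach, xi, pts) >= k) return(xi)
  }
  NA_real_
}

xi_two <- find_xi_for_k(reach, k = 2, pts = 3)
xi_three <- find_xi_for_k(reach, k = 3, pts = 3)
print(c(two_clusters = xi_two, three_clusters = xi_three))
```

With this profile, asking for two clusters stops at a high Xi value that only detects the largest jump, while asking for three lowers Xi until the smaller jump is detected as well. The user never has to pick Xi by hand, which is the convenience the paragraph above describes.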

Installation

Using the devtools package in R:

  devtools::install_git('https://framagit.org/thomaschln/opticskxi.git')

Usage

Compute OPTICS profile and k-Xi clustering

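  library(opticskxi)
  data('multishapes')
  # OPTICS ordering computed on the first two columns
  optics_shapes <- dbscan::optics(multishapes[1:2])
  # k-Xi extraction with the n_xi and pts parameters
  kxi_shapes <- opticskxi(optics_shapes, n_xi = 5, pts = 30)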

Visualize with ggplot2

  ggplot_optics(optics_shapes)
  ggplot_kxi_profile(kxi_shapes)

Compare multiple k-Xi models on a dataset with an unknown number of clusters and visualize the best models:

  • Compute k-Xi models with varying parameters and their distance-based metrics
   data('hla')
   m_hla <- hla[-c(1:2)] %>% scale
   df_params_hla <- expand.grid(n_xi = 3:5, pts = c(20, 30, 40),
     dist = c('manhattan', 'euclidean', 'abscorrelation', 'abspearson'))
   df_kxi_hla <- opticskxi_pipeline(m_hla, df_params_hla)
  • Visualize the metrics and OPTICS profiles of the models with the highest average silhouette width
   ggplot_kxi_metrics(df_kxi_hla, n = 8)
   gtable_kxi_profiles(df_kxi_hla) %>% plot
  • Extract the second best model and visualize the clusters using PCA dimension reduction
   best_kxi_hla <- get_best_kxi(df_kxi_hla, rank = 2)
   clusters_hla <- best_kxi_hla$clusters
   fortify_pca(m_hla, sup_vars = data.frame(Clusters = clusters_hla)) %>%
     ggpairs('Clusters', ellipses = TRUE, variables = TRUE)
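
The steps above rank models by average silhouette width. As a reminder of what that metric measures, the following base R sketch computes it by hand on a toy dataset. The function name `avg_silhouette` and the data are assumptions made for illustration, and the sketch does not reproduce how the package computes its metrics or handles noise points.

```r
# Average silhouette width computed from its definition.
# For each point: a = mean distance to the other points of its cluster,
# b = smallest mean distance to the points of any other cluster,
# width = (b - a) / max(a, b). Requires at least two clusters.
avg_silhouette <- function(x, clusters) {
  d <- as.matrix(dist(x))
  ids <- unique(clusters)
  widths <- vapply(seq_len(nrow(d)), function(i) {
    own <- clusters == clusters[i]
    if (sum(own) == 1) return(0)  # usual convention for singletons
    a <- sum(d[i, own]) / (sum(own) - 1)  # d[i, i] is 0
    b <- min(vapply(setdiff(ids, clusters[i]),
                    function(k) mean(d[i, clusters == k]),
                    numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Four points on a line forming two obvious groups (assumed toy data)
x <- c(0, 1, 10, 11)
good <- avg_silhouette(x, c(1, 1, 2, 2))  # about 0.90
bad <- avg_silhouette(x, c(1, 2, 1, 2))   # -0.45
print(c(good = good, bad = bad))
```

A partition that matches the two groups scores close to 1, while one that mixes them scores below 0, which is why a higher average silhouette width is used to pick the better models.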

See the vignette for results and further details.

Acknowledgements

This work was inspired by Jérôme Wojcik (Precision for Medicine) and Sviatoslav Voloshynovskiy (University of Geneva).

License

This package is free and open source software, licensed under GPL-3.

Metadata

Version

0.1

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows