MyNixOS website logo
Description

Meta Clustering with Similarity Network Fusion.

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Meta clustering with Similarity Network Fusion

Brief Overview

metasnf is a package that facilitates usage of the meta clustering paradigm described in (Caruana et al. 2006) with the similarity network fusion (SNF) data integration procedure developed in (Wang et al. 2014). The package offers a comprehensive suite of tools to assist users in transforming multi-modal tabular data into cluster solutions, decision making in the clustering process, and visualization along the way with a strong emphasis on context-specific utility and principled validation of results.

Installation

You will need R version 4.1.0 or higher to install this package. metasnf can be installed from CRAN:

install.packages("metasnf")

Development versions can be installed from GitHub:

# Latest development version
devtools::install_github("BRANCHlab/metasnf")

# Install a specific tagged version
devtools::install_github("BRANCHlab/[email protected]")

Quick Start

Minimal usage of the package looks like this:

# Load the package
library(metasnf)

# Setting up the data
data_list <- generate_data_list(
    list(abcd_cort_t, "cortical_thickness", "neuroimaging", "continuous"),
    list(abcd_cort_sa, "cortical_surface_area", "neuroimaging", "continuous"),
    list(abcd_subc_v, "subcortical_volume", "neuroimaging", "continuous"),
    list(abcd_income, "household_income", "demographics", "continuous"),
    list(abcd_pubertal, "pubertal_status", "demographics", "continuous"),
    uid = "patient"
)
#> Warning in generate_data_list(list(abcd_cort_t, "cortical_thickness",
#> "neuroimaging", : 200 subject(s) dropped due to incomplete data.

# Specifying 5 different sets of settings for SNF
set.seed(42)
settings_matrix <- generate_settings_matrix(
    data_list,
    nrow = 5,
    max_k = 40
)

# This matrix has clustering solutions for each of the 5 SNF runs!
solutions_matrix <- batch_snf(data_list, settings_matrix)

Check out the tutorial vignettes below to learn about how the package can be used:

And more tutorials can be found under the “articles” section of the documentation home page: https://branchlab.github.io/metasnf/index.html

Background

Why use meta clustering?

Clustering algorithms seek solutions where members of the same cluster are very similar to each other and members of distinct clusters are very dissimilar to each other. In sufficiently noisy datasets where many qualitatively distinct solutions with similar scores of clustering quality exist, it is not necessarily the case that the top solution selected by a clustering algorithm will also be the most useful one for the user’s context.

To address this issue, the original meta clustering procedure Caruana et al., 2006 involved generating a large number of reasonable clustering solutions, clustering those solutions into qualitatively similar ones, and having the user examine those “meta clusters” to find something that seems like it’ll be the most useful.

Why use SNF?

In the clinical data setting, we often have access to patient data across a wide range of domains, such as imaging, genetics, biomarkers, demographics. When trying to extract subtypes out of all this information, direct concatenation of the data followed by cluster analysis can result in a substantial amount of lost (valuable) signal contained in each individual domain. Empirically, SNF has been demonstrated to effectively integrate highly diverse patient data for the purposes of clinical subtyping.

Documentation

Example workflows

Essential objects

Further customization of generated solutions

Additional functionality

Plotting

References

Caruana, Rich, Mohamed Elhawary, Nam Nguyen, and Casey Smith. 2006. “Meta Clustering.” In Sixth International Conference on Data Mining (ICDM’06), 107–18. https://doi.org/10.1109/ICDM.2006.103.

Wang, Bo, Aziz M. Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, and Anna Goldenberg. 2014. “Similarity Network Fusion for Aggregating Data Types on a Genomic Scale.” Nature Methods 11 (3): 333–37. https://doi.org/10.1038/nmeth.2810.

Metadata

Version

1.1.2

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows