MyNixOS website logo
Description

Circular, Periodic, or Framed Data Clustering: Fast, Optimal, and Reproducible.

Fast, optimal, and reproducible clustering algorithms for circular, periodic, or framed data. The algorithms introduced here are based on a core algorithm for optimal framed clustering the authors have developed (Debnath & Song 2021) <doi:10.1109/TCBB.2021.3077573>. The runtime of these algorithms is O(K N log^2 N), where K is the number of clusters and N is the number of circular data points. On a desktop computer using a single processor core, millions of data points can be grouped into a few clusters within seconds. One can apply the algorithms to characterize events along circular DNA molecules, circular RNA molecules, and circular genomes of bacteria, chloroplast, and mitochondria. One can also cluster climate data along any given longitude or latitude. Periodic data clustering can be formulated as circular clustering. The algorithms offer a general high-performance solution to circular, periodic, or framed data clustering.

The 'OptCirClust' R package

Overview

The package provides fast, optimal, and reproducible clustering for circular, periodic, or framed data. The performance is guaranteed by embedding divide-and-conquer, dynamic programming, and another divide-and-conquer. The runtime of the algorithms is $O(K N \log^2 N)$, where $K$ is the number of clusters and $N$ is the number of circular data points.

The core algorithm solves the framed clustering problem by minimizing the within-cluster sum of squared distances. In contrast to heuristic clustering, the efficiency and accuracy of fast optimal framed clustering is evident both theoretically and practically. The fast algorithm use divide-and-conquer inside dynamic programming inside another divide-and-conquer, guaranteeing cluster optimality---the total of within-cluster sums of squared distances is always the minimum given the number of clusters. Two plot functions visualize circular and framed clusters to reveal patterns in data. On a desktop computer using a single processor core, millions of data points can be clustered within seconds. The algorithms offer a general high-performance solution to circular, periodic, or framed data clustering.

Main methods

The function to perform fast optimal circular data clustering is CirClust(). The function to perform the fast framed clustering is FramedClust(). Both functions with default method have a runtime of $O(K N \log^2N)$, linear polylogarithmic in sample size $N$ and linear in the number of clusters $K$. The space complexity of both functions is $O(KN)$. Their implementations scale large input data to improve numerical precision.

When to use the package

One can use the algorithms to cluster circular, periodic, or framed data. Here circular data broadly refer to data points on any non-self-intersecting loop. Periodic data clustering can be formulated as circular clustering. Framed clustering finds one frame of fixed size that contains best clusters. One can use them to characterize events along circular DNA molecules, circular RNA molecules, and circular genomes of bacteria, chloroplast, and mitochondria. One can also cluster climate data along any given longitude or latitude.

To download and install the package

install.packages("OptCirClust")
Metadata

Version

0.0.4

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows