MyNixOS website logo
Description

Joint Clustering and Alignment of Functional Data.

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data which allows for jointly aligning and clustering curves. It supports functional data defined on one-dimensional domains but possibly evaluating in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the SRSF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRSF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.

fdacluster

check-standard test-coverage codecov pkgdown CRANstatus

The fdacluster package provides implementations of the $k$-means, hierarchical agglomerative and DBSCAN clustering methods for functional data. Variability in functional data is intrinsically divided into three components: amplitude, phase and ancillary variability. The first two sources of variability can be captured with a dedicated statistical analysis that integrates a curve alignment step. The $k$-means and HAC algorithms implemented in fdacluster provide clustering structures that are based either on ampltitude variation (default behavior) or phase variation. This is achieved by jointly performing clustering and alignment of a functional data set. The three main related functions are fdakmeans() for the $k$-means, fdahclust() for HAC and fdadbscan() for DBSCAN. The methods handle multivariate codomains.

Installation

You can install the official version from CRAN via:

install.packages("fdacluster")

or you can opt to install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("astamm/fdacluster")

Example

Data set

Let us consider the following simulated example of $30$ $1$-dimensional curves:

Looking at the data set, it seems that we shall expect $3$ groups if we aim at clustering based on phase variability but probably only $2$ groups if we search for a clustering structure based on amplitude variability.

$k$-means based on amplitude variability

We can perform $k$-means clustering based on amplitude variability as follows:

out1 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 21),
  n_clusters = 2,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "pearson", 
  cluster_on_phase = FALSE
)
#> Information about the data set:
#>  - Number of observations: 30
#>  - Number of dimensions: 1
#>  - Number of points: 200
#> 
#> Information about cluster initialization:
#>  - Number of clusters: 2
#>  - Initial seeds for cluster centers:          1        21
#> 
#> Information about the methods used within the algorithm:
#>  - Warping method: affine
#>  - Center method: mean
#>  - Dissimilarity method: pearson
#>  - Optimization method: bobyqa
#> 
#> Information about warping parameter bounds:
#>  - Warping options:    0.1500   0.1500
#> 
#> Information about convergence criteria:
#>  - Maximum number of iterations: 100
#>  - Distance relative tolerance: 0.001
#> 
#> Information about parallelization setup:
#>  - Number of threads: 1
#>  - Parallel method: 0
#> 
#> Other information:
#>  - Use fence to robustify: 0
#>  - Check total dissimilarity: 1
#>  - Compute overall center: 0
#> 
#> Running k-centroid algorithm:
#>  - Iteration #1
#>    * Size of cluster #0: 20
#>    * Size of cluster #1: 10
#>  - Iteration #2
#>    * Size of cluster #0: 20
#>    * Size of cluster #1: 10
#> 
#> Active stopping criteria:
#>  - Memberships did not change.

All of fdakmeans(), fdahclust() and fdadbscan() functions returns an object of class caps (for Clustering with Amplitude and Phase Separation) for which S3 specialized methods of ggplot2::autoplot() and graphics::plot() have been implemented. Therefore, we can visualize the results simply with:

plot(out1, type = "amplitude")
plot(out1, type = "phase")

$k$-means based on phase variability

We can perform $k$-means clustering based on phase variability only by switch the cluster_on_phase argument to TRUE:

out2 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 11, 21),
  n_clusters = 3,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "pearson", 
  cluster_on_phase = TRUE
)
#> Information about the data set:
#>  - Number of observations: 30
#>  - Number of dimensions: 1
#>  - Number of points: 200
#> 
#> Information about cluster initialization:
#>  - Number of clusters: 3
#>  - Initial seeds for cluster centers:          1        11        21
#> 
#> Information about the methods used within the algorithm:
#>  - Warping method: affine
#>  - Center method: mean
#>  - Dissimilarity method: pearson
#>  - Optimization method: bobyqa
#> 
#> Information about warping parameter bounds:
#>  - Warping options:    0.1500   0.1500
#> 
#> Information about convergence criteria:
#>  - Maximum number of iterations: 100
#>  - Distance relative tolerance: 0.001
#> 
#> Information about parallelization setup:
#>  - Number of threads: 1
#>  - Parallel method: 0
#> 
#> Other information:
#>  - Use fence to robustify: 0
#>  - Check total dissimilarity: 1
#>  - Compute overall center: 0
#> 
#> Running k-centroid algorithm:
#>  - Iteration #1
#>    * Size of cluster #0: 10
#>    * Size of cluster #1: 10
#>    * Size of cluster #2: 10
#>  - Iteration #2
#>    * Size of cluster #0: 10
#>    * Size of cluster #1: 10
#>    * Size of cluster #2: 10
#> 
#> Active stopping criteria:
#>  - Memberships did not change.

We can inspect the result:

plot(out2, type = "amplitude")
plot(out2, type = "phase")

We can perform similar analyses using HAC or DBSCAN instead of $k$-means. The fdacluster package also provides visualization tools to help choosing the optimal number of cluster based on WSS and silhouette values. This can be achieved by using a combination of the functions compare_caps() and plot.mcaps().

Metadata

Version

0.3.0

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows