
Performs Tests for Cluster Tendency of a Data Set.

Tests for cluster tendency (clusterability) of a data set. The implemented methods (reducing the data set to a single dimension using principal component analysis or pairwise distances, then performing a multimodality test such as the Dip Test or Silverman's Critical Bandwidth Test) are described in Adolfsson, Ackerman, and Brownstein (2019) <doi:10.1016/j.patcog.2018.10.026>. Such methods can inform whether clustering algorithms are appropriate for a data set.

Clusterability R Package

The clusterability package tests for cluster tendency of a dataset. Results of these tests can inform whether clustering algorithms are appropriate for the data.


You can install the released version of clusterability from CRAN with:

install.packages("clusterability")

If you would prefer to use a newer version of clusterability not yet available on CRAN, it can be downloaded as a binary package from this repository and installed locally. Documentation on this process can be found on the R project website.


This demonstrates the use of the clusterabilitytest function to determine if the four numeric variables of the iris dataset have a natural cluster tendency.


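library(clusterability)  # load the package
iris_numeric <- iris[, 1:4]  # keep the four numeric variables
iris_result <- clusterabilitytest(iris_numeric, "dip")
print(iris_result)  # display the test report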


Clusterability Test

Data set name: iris_numeric
Your data set has 150 observation(s) and 4 variable(s).
There were no missing values. Your data set is complete.

Data Reduced Using: PCA

Results: Dip Test of Unimodality

Null Hypothesis: number of modes = 1
Alternative Hypothesis: number of modes > 1
p-value: 0 
Dip statistic: 0.107841006841301 

Test Options Used

Default values for the optional parameters were used. To learn more about customizing the behavior of the clusterabilitytest, please see the R documentation.

Required Parameters

The data and test parameters are required when calling the clusterabilitytest() function.


data

The dataset to be used in the test. Internally, the as.matrix R function is used to coerce the data argument, so it should be a data frame, matrix, or other object that can be coerced to a matrix. The dataset should consist only of numeric values.
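Because the input is coerced with as.matrix, a single non-numeric column turns the whole matrix into character data. The base R sketch below illustrates this with iris; keep_numeric is a hypothetical helper written for this example, not part of the package.

```r
# Numeric columns only: as.matrix() yields a numeric matrix.
numeric_only <- as.matrix(iris[, 1:4])
is.numeric(numeric_only)   # TRUE

# Including the factor column Species forces a character matrix.
with_factor <- as.matrix(iris)
is.numeric(with_factor)    # FALSE

# Hypothetical helper: keep only the numeric columns of a data frame.
keep_numeric <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}
dim(as.matrix(keep_numeric(iris)))  # 150 4
```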


test

The test to be performed. Valid values are "dip", which performs the Dip Test of Unimodality, or "silverman", which performs Silverman's Critical Bandwidth Test.
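As a rough illustration, both tests can be run on the same data by passing each valid value in turn. The sketch below is guarded with requireNamespace so that it still runs where the package is not installed; the loop and the results list are illustrative choices, not something the package requires.

```r
valid_tests <- c("dip", "silverman")
results <- list()

# Only call the package if it is installed on this machine.
if (requireNamespace("clusterability", quietly = TRUE)) {
  iris_numeric <- iris[, 1:4]
  for (test_name in valid_tests) {
    # data and test are passed positionally, as in the example above.
    results[[test_name]] <-
      clusterability::clusterabilitytest(iris_numeric, test_name)
    print(results[[test_name]])
  }
}
```

Silverman's test relies on resampling, so its reported p-value may differ slightly between runs.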

Additional Parameters

The following parameters are optional and can be used to further customize the behavior of the clusterabilitytest() function.


reduction

The dimension reduction technique to be used to reduce the data to a unidimensional dataset.

  • Principal Component Analysis can be used by specifying the value "pca". This is the default behavior.
  • Pairwise Distances can be used by specifying the value "distance".
  • If the data argument is a one-dimensional data set, the "none" option can be used.
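To make the three options concrete, here is a base R sketch of what each reduction produces conceptually. It uses prcomp and dist directly and is not the package's internal code; standardization before the distance computation is omitted for brevity.

```r
x <- as.matrix(iris[, 1:4])

# PCA: scores on the first principal component (centered and scaled).
pc1 <- prcomp(x, center = TRUE, scale. = TRUE)$x[, 1]

# Pairwise distances: one value per pair of observations.
pairwise <- as.vector(dist(x, method = "euclidean"))

# No reduction: the data are already one-dimensional and are used directly.
one_dim <- x[, "Petal.Length"]

length(pc1)       # 150
length(pairwise)  # 11175, i.e. 150 * 149 / 2
length(one_dim)   # 150
```

In each case the multimodality test is then applied to the resulting one-dimensional vector.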

distance_metric

If using pairwise distances as the dimension reduction technique, this is the metric to be used in computing the distances. The default is "euclidean". See the documentation for the clusterabilitytest() function for a list of the available metrics.


distance_standardize

If using pairwise distances for dimension reduction, this is how the variables should be standardized before computing the distances. The default is "std", which standardizes each variable to have mean 0 and standard deviation 1. See the documentation for a list of the available standardization methods.
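The default standardization followed by Euclidean distances can be sketched in base R as follows. This uses scale and dist as stand-ins and is only an approximation of what the package does internally.

```r
x <- as.matrix(iris[, 1:4])

# Standardize each variable to mean 0 and standard deviation 1.
x_std <- scale(x, center = TRUE, scale = TRUE)
col_means <- colMeans(x_std)     # all numerically 0
col_sds <- apply(x_std, 2, sd)   # all 1

# Euclidean distances between every pair of standardized observations.
d_std <- dist(x_std, method = "euclidean")
summary(as.vector(d_std))
```

Without standardization, variables measured on larger scales would dominate the distances.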


pca_center

If using PCA as the dimension reduction technique, this is a logical value that determines whether the variables are shifted to be zero-centered. The default is TRUE.


pca_scale

If using PCA for dimension reduction, this is a logical value that determines whether the variables are scaled to have unit variance. The default is TRUE.
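The effect of centering and scaling can be explored with prcomp, whose center and scale. arguments play the analogous role in base R. The sketch below compares the share of variance captured by the first component with and without scaling; prop_var_pc1 is a hypothetical helper for this example.

```r
x <- as.matrix(iris[, 1:4])

pca_scaled   <- prcomp(x, center = TRUE, scale. = TRUE)
pca_unscaled <- prcomp(x, center = TRUE, scale. = FALSE)

# Hypothetical helper: proportion of total variance on the first component.
prop_var_pc1 <- function(p) p$sdev[1]^2 / sum(p$sdev^2)

prop_var_pc1(pca_scaled)
prop_var_pc1(pca_unscaled)

# With centering, the first-component scores average to zero.
mean(pca_scaled$x[, 1])
```

For iris, the unscaled first component captures a larger share of the variance because the petal measurements vary over a wider range than the sepal measurements.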


is_dist_matrix

This is a logical value indicating whether the data argument is a distance matrix. This is FALSE by default. If it is TRUE, then the lower triangular portion of data will be extracted and used.
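Only the lower triangle is needed because a distance matrix is symmetric with a zero diagonal. A base R sketch of that extraction, offered as an illustration rather than the package's own code:

```r
x <- as.matrix(iris[, 1:4])

# Full symmetric 150 x 150 distance matrix.
d <- dist(x)
dist_matrix <- as.matrix(d)

# Extract the entries strictly below the diagonal (column by column).
lower <- dist_matrix[lower.tri(dist_matrix)]

length(lower)                            # 11175
isTRUE(all.equal(lower, as.vector(d)))   # TRUE
```

The extracted values match the pairwise distances computed directly, each pair counted once.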


completecase

This is a logical value indicating whether a complete case analysis should be performed. This is FALSE by default. Missing data must be removed before a test can be performed, which can be done either manually by the user or by specifying TRUE for the completecase argument.
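Removing incomplete rows manually might look like the base R sketch below. The small data frame is made up for illustration.

```r
# Toy data with missing values in rows 2 and 3.
x <- data.frame(a = c(1, 2, NA, 4),
                b = c(10, NA, 30, 40))

# Keep only the rows with no missing values in any column.
complete_rows <- complete.cases(x)
x_complete <- x[complete_rows, , drop = FALSE]

nrow(x_complete)    # 2
anyNA(x_complete)   # FALSE
```

The resulting x_complete can then be passed as the data argument.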

Additional Parameters and Details

Parameters that customize the Dip Test are prefixed with d_, and those that customize the Silverman Test are prefixed with s_. Documentation for these parameters, along with additional details for the parameters described above, is provided in the documentation for clusterabilitytest(), which can be found by executing the following command:

?clusterabilitytest

Documentation is also available in the accompanying paper.

Supplemental Files


This contains code to test the relative computational performance of each test and dimension reduction combination.


This contains code to replicate the examples in the accompanying paper.


This contains code to replicate the plots provided in the accompanying paper.





Platforms (75)

  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows