Description

Learn Clustering Techniques Through Examples and Code.

Clustering methods that can, on request, provide step-by-step explanations of the algorithms used, as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>, together with datasets to test them on that highlight the strengths and weaknesses of each technique, as presented in the clustering section of 'scikit-learn' (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>.

clustlearn

The goal of clustlearn is to provide a set of functions to perform clustering analysis along with comprehensive explanations of the algorithms, their pros and cons, and their applications.

Installation

You can install the released version of clustlearn from CRAN with:

install.packages("clustlearn")

You can install the development version of clustlearn from GitHub with:

devtools::install_github("Ediu3095/clustlearn")

Example

This is a basic example which shows you how to cluster a dataset:

# # Load the clustlearn package
# library(clustlearn)

# Perform the clustering (the clustlearn:: prefix is not necessary)
cl <- clustlearn::dbscan(clustlearn::db1, 0.3)

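# Plot the results. Points labelled 0 are treated as outliers here:
# clustered points are drawn as filled dots coloured by cluster,
# outliers as crosses in one extra colour.
out <- cl$cluster == 0
plot(clustlearn::db1[!out, ], col = cl$cluster[!out], pch = 20, asp = 1)
points(clustlearn::db1[out, ], col = max(cl$cluster) + 1, pch = 4, lwd = 2)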

This second example shows how to view the step-by-step procedure of a clustering algorithm:

# # Load the clustlearn package
# library(clustlearn)

# Perform the clustering (the clustlearn:: prefix is not necessary)
# The details argument is set to TRUE to see the step-by-step procedure
cl <- clustlearn::agglomerative_clustering(
  clustlearn::db5[1:6, ],
  'single',
  details = TRUE,
  waiting = FALSE
)
## ________________________________________________________________________________

## EXPLANATION:

## 

## The Agglomerative Hierarchical Clustering algorithm defines a clustering hierarc

## hy for a dataset following a `n` step process, which repeats until a single clus

## ter remains:

## 

##     1. Initially, each object is assigned to its own cluster. The matrix of dist

##     ances between clusters is computed.

##     2. The two clusters with closest proximity will be joined together and the p

##     roximity matrix updated. This is done according to the specified proximity.

##     This step is repeated until a single cluster remains.

## 

## The definitions of proximity considered by this function are:

## 

##     1. `single`. Defines the proximity between two clusters as the distance betw

##     een the closest objects among the two clusters. It produces clusters where e

##     ach object is closest to at least one other object in the same cluster. It i

##     s known as SLINK, single-link or minimum-link.

##     2. `complete`. Defines the proximity between two clusters as the distance be

##     tween the furthest objects among the two clusters. It is known as CLINK, com

##     plete-link or maximum-link.

##     3. `average`. Defines the proximity between two clusters as the average dist

##     ance between every pair of objects, one from each cluster. It is also known

##     as UPGMA or average-link.

## 

## ________________________________________________________________________________

## STEP 1:

## 

## Initially, each object is assigned to its own cluster. This leaves us with the f

## ollowing clusters:

## CLUSTER #-1 (size: 1)
##           x         y
## 1 -1.578117 -1.292868
## CLUSTER #-2 (size: 1)
##           x        y
## 2 0.7027994 1.193823
## CLUSTER #-3 (size: 1)
##           x        y
## 3 0.7854535 1.191428
## CLUSTER #-4 (size: 1)
##           x           y
## 4 0.6757613 -0.04002442
## CLUSTER #-5 (size: 1)
##           x         y
## 5 0.8484305 0.2230609
## CLUSTER #-6 (size: 1)
##          x         y
## 6 0.515489 0.3014147

## 

## The matrix of distances between clusters is computed:

## Distances:
##       -1    -2    -3    -4    -5
## -2 3.374                        
## -3 3.429 0.083                  
## -4 2.579 1.234 1.236            
## -5 2.861 0.982 0.970 0.315      
## -6 2.632 0.912 0.930 0.377 0.342

## 

## ________________________________________________________________________________

## STEP 2:

## 

## The two clusters with closest proximity are identified:

## Clusters:
## CLUSTER #-2 (size: 1)
## CLUSTER #-3 (size: 1)
## Proximity:
## [1] 0.08268877

## 

## They are merged into a new cluster:

## CLUSTER #1 (size: 2) [CLUSTER #-2 + CLUSTER #-3]

## 

## The proximity matrix is updated. To do so the rows/columns of the merged cluster

## s are removed, and the rows/columns of the new cluster are added:

## Distances:
##       -1    -4    -5    -6
## -4 2.579                  
## -5 2.861 0.315            
## -6 2.632 0.377 0.342      
## 1  3.374 1.234 0.970 0.912

## 

## ________________________________________________________________________________

## STEP 3:

## 

## The two clusters with closest proximity are identified:

## Clusters:
## CLUSTER #-4 (size: 1)
## CLUSTER #-5 (size: 1)
## Proximity:
## [1] 0.3146881

## 

## They are merged into a new cluster:

## CLUSTER #2 (size: 2) [CLUSTER #-4 + CLUSTER #-5]

## 

## The proximity matrix is updated. To do so the rows/columns of the merged cluster

## s are removed, and the rows/columns of the new cluster are added:

## Distances:
##       -1    -6     1
## -6 2.632            
## 1  3.374 0.912      
## 2  2.579 0.342 0.970

## 

## ________________________________________________________________________________

## STEP 4:

## 

## The two clusters with closest proximity are identified:

## Clusters:
## CLUSTER #-6 (size: 1)
## CLUSTER #2 (size: 2)
## Proximity:
## [1] 0.342037

## 

## They are merged into a new cluster:

## CLUSTER #3 (size: 3) [CLUSTER #-6 + CLUSTER #2]

## 

## The proximity matrix is updated. To do so the rows/columns of the merged cluster

## s are removed, and the rows/columns of the new cluster are added:

## Distances:
##      -1     1
## 1 3.374      
## 3 2.579 0.912

## 

## ________________________________________________________________________________

## STEP 5:

## 

## The two clusters with closest proximity are identified:

## Clusters:
## CLUSTER #1 (size: 2)
## CLUSTER #3 (size: 3)
## Proximity:
## [1] 0.9118542

## 

## They are merged into a new cluster:

## CLUSTER #4 (size: 5) [CLUSTER #1 + CLUSTER #3]

## 

## The proximity matrix is updated. To do so the rows/columns of the merged cluster

## s are removed, and the rows/columns of the new cluster are added:

## Distances:
##      -1
## 4 2.579

## 

## ________________________________________________________________________________

## STEP 6:

## 

## The two clusters with closest proximity are identified:

## Clusters:
## CLUSTER #-1 (size: 1)
## CLUSTER #4 (size: 5)
## Proximity:
## [1] 2.578678

## 

## They are merged into a new cluster:

## CLUSTER #5 (size: 6) [CLUSTER #-1 + CLUSTER #4]

## 

## The proximity matrix is updated. To do so the rows/columns of the merged cluster

## s are removed, and the rows/columns of the new cluster are added:

## Distances:
## dist(0)

## 

## ________________________________________________________________________________

## RESULTS:

## 

## Since all clusters have been merged together, the final clustering hierarchy is:

## (Check the plot for the dendrogram representation of the hierarchy)

## 

## ________________________________________________________________________________
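The merge sequence printed above can be cross-checked without clustlearn. The sketch below retypes the six points from STEP 1 (rounded as printed), runs a naive single-link agglomeration on their distance matrix, and compares the merge distances with base R's `hclust()`. The helper `single_link_heights` is hypothetical and written only for this illustration; it is not how clustlearn is implemented.

```r
# The six points as printed in STEP 1 of the walkthrough (rounded values)
pts6 <- data.frame(
  x = c(-1.578117, 0.7027994, 0.7854535, 0.6757613, 0.8484305, 0.515489),
  y = c(-1.292868, 1.193823, 1.191428, -0.04002442, 0.2230609, 0.3014147)
)

# Hypothetical helper: naive single-link agglomeration on a distance object.
# Returns the proximity at which each successive merge happens.
single_link_heights <- function(d) {
  d <- as.matrix(d)
  diag(d) <- Inf                          # a cluster is never merged with itself
  heights <- numeric(0)
  while (nrow(d) > 1) {
    idx <- which(d == min(d), arr.ind = TRUE)[1, ]  # one closest pair (row, col)
    i <- min(idx)
    j <- max(idx)
    heights <- c(heights, d[i, j])
    merged <- pmin(d[i, ], d[j, ])        # single link: keep the smaller distance
    d[i, ] <- merged                      # the merged cluster reuses slot i
    d[, i] <- merged
    d[i, i] <- Inf
    d <- d[-j, -j, drop = FALSE]          # remove the absorbed cluster
  }
  heights
}

manual_heights <- single_link_heights(dist(pts6))

# Reference result from base R (stats package)
hc <- hclust(dist(pts6), method = "single")

print(round(manual_heights, 3))
print(round(hc$height, 3))
print(hc$merge)            # negative = single observation, positive = earlier merge
print(cutree(hc, k = 2))   # cut the hierarchy into two clusters
```

Both sets of heights should agree with the proximities reported in STEP 2 to STEP 6, up to the rounding of the retyped coordinates. `hc$merge` uses signed labels in the way documented for `hclust()`, which appears to be the same convention as the `CLUSTER #-2` and `CLUSTER #1` labels in the output above. Cutting at two clusters separates object 1 from the other five, matching the final merge in STEP 6.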

Metadata

Version

1.0.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows