Description

Determining Hierarchical Clustering Easily.

Facilitates hierarchical clustering analysis with functions to read data in 'txt', 'xlsx', and 'xls' formats, apply normalization techniques to the dataset, perform hierarchical clustering, and construct a scatter plot from principal component analysis to evaluate the groups obtained.

hclusteasy

Overview

The hclusteasy package simplifies HCA (Hierarchical Clustering Analysis) by integrating essential functions from established R packages. It reads data in .txt, .xlsx and .xls formats using utils and readxl, offers data normalization techniques from clusterSim, performs hierarchical clustering analysis with stats, and conducts PCA (Principal Component Analysis), plotting the first two components using the stats and factoextra packages.

Packages Used in hclusteasy

For more information about the packages used in the development of hclusteasy, please visit their documentation on CRAN or their repositories on GitHub.

  • utils: Documentation available in any R session with ?utils
  • readxl: Documentation on CRAN and GitHub
  • clusterSim: Documentation on CRAN
  • stats: Documentation available in any R session with ?stats
  • factoextra: Documentation on CRAN and GitHub

Installation

You can install hclusteasy from GitHub using the devtools package:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the hclusteasy package directly from GitHub
devtools::install_github("tsukubai/hclusteasy")

Introduction

hclusteasy is designed to streamline the process of hierarchical clustering analysis by integrating essential functions from popular R packages. The main goal of this package is to provide an easy-to-use interface for reading data, performing data normalization, conducting hierarchical clustering, and visualizing results through PCA.

Datasets

  • iris_uci: The Iris dataset is a classic dataset used for analysis and machine learning, containing 150 samples of iris flowers from three different species: Iris setosa, Iris versicolor, and Iris virginica. Each sample has four morphological features: sepal length, sepal width, petal length, and petal width, measured in centimeters. Created by Ronald A. Fisher in 1936, the dataset is often used for testing classification algorithms.
  • wine_uci: The Wine dataset is a well-known dataset used for classification and clustering in machine learning, containing chemical analysis results of wines grown in the same region in Italy but derived from three different cultivars. The dataset comprises 178 samples, each described by 13 continuous attributes such as alcohol content, malic acid, ash, and flavanoids. Created by Forina et al., it is commonly used to test the performance of various classification algorithms.

Functions

  • read.data: Reads data from different formats (txt, xlsx, xls) for analysis.
  • normalization: Applies data normalization techniques.
  • hca: Performs hierarchical clustering analysis on the data using Euclidean distance and returns the generated groups.
  • pca: Performs principal component analysis and generates a scatter plot from the first two principal components.

Usage

# Library
library(hclusteasy)

Read iris dataset in xlsx format

# File path: 'iris_uci.xlsx'
file_path <- system.file("extdata",
                         "iris_uci.xlsx",
                         package = "hclusteasy")


# Read iris in xlsx format
iris <- read.data(file_path, col.names = TRUE)


iris[1:3,]
#>   sepal.length sepal.width petal.length petal.width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
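
For readers curious how reading by file format might work, here is a minimal sketch that picks a reader from the file extension. The helper read_any is hypothetical and is not part of hclusteasy; it only assumes that utils::read.table handles .txt files and readxl::read_excel handles .xlsx and .xls files, as the Overview describes. The package's own read.data may be implemented differently.

```r
# Hypothetical helper (NOT part of hclusteasy): choose a reader from the extension.
read_any <- function(path, header = TRUE) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "txt") {
    utils::read.table(path, header = header)
  } else if (ext %in% c("xlsx", "xls")) {
    # readxl is only needed for Excel files
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Package 'readxl' is required to read Excel files.")
    }
    as.data.frame(readxl::read_excel(path, col_names = header))
  } else {
    stop("Unsupported file extension: ", ext)
  }
}

# Round trip through a temporary .txt file, using the iris data shipped with R
tmp <- tempfile(fileext = ".txt")
utils::write.table(head(datasets::iris, 3), tmp, row.names = FALSE)
dat <- read_any(tmp)
dat
```

Dispatching on the extension keeps the calling code identical for every supported format.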

Data normalization

# Remove "Species" column
iris <- iris[,-5]

# Data normalization using Z-score type, by column
irisN <- normalization(iris, type = "n1", norm = "column")

irisN[1:3,]
#>   sepal.length sepal.width petal.length petal.width
#> 1   -0.8976739   1.0286113    -1.336794   -1.308593
#> 2   -1.1392005  -0.1245404    -1.336794   -1.308593
#> 3   -1.3807271   0.3367203    -1.393470   -1.308593
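
As a rough cross-check of what Z-score normalization by column does, the sketch below standardizes each column with base R only. It assumes that type "n1" corresponds to the usual (x - mean) / sd standardization, as the comment above suggests. It uses the iris data shipped with R, which may not be identical to the bundled iris_uci file, so some values can differ slightly from the output above.

```r
# Base R sketch of column-wise Z-score standardization (assumed equivalent to "n1")
x <- datasets::iris[, 1:4]

# Manual version: subtract the column mean, divide by the sample standard deviation
z_manual <- as.data.frame(lapply(x, function(col) (col - mean(col)) / sd(col)))

# scale() performs the same centering and scaling in one call
z_scale <- scale(x)

# Both approaches agree up to numerical tolerance
all.equal(as.matrix(z_manual), z_scale, check.attributes = FALSE)

# After standardization each column has mean 0 and standard deviation 1
round(colMeans(z_scale), 10)
apply(z_scale, 2, sd)
```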

Generate hierarchical groups

# Perform hierarchical clustering analysis
# and return hierarchical groups
g <- hca(irisN, method = "complete", num.groups = 3)

g[1:10]
#>  1  2  3  4  5  6  7  8  9 10 
#>  1  1  1  1  1  1  1  1  1  1
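
The clustering step can be approximated with the stats package alone. The sketch below is not the package's implementation; it assumes that hca combines a Euclidean distance matrix, hclust with the chosen linkage method, and cutree to obtain the requested number of groups. It uses the iris data shipped with R rather than the bundled file.

```r
# Base R sketch of hierarchical clustering (an approximation, not hclusteasy's code)
z <- scale(datasets::iris[, 1:4])        # standardize the four measurements

d <- dist(z, method = "euclidean")       # pairwise Euclidean distances
tree <- hclust(d, method = "complete")   # complete-linkage agglomeration
groups <- cutree(tree, k = 3)            # cut the dendrogram into 3 groups

# Group sizes
table(groups)
```

Keeping the tree object around also lets you inspect the dendrogram with plot(tree) before settling on the number of groups.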

Plot principal component analysis

# Plot PCA considering the generated hierarchical groups
pca(irisN, groups = g)
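
A comparable plot can be sketched with base R graphics. This is an illustration under assumptions, not the package's plotting code: it uses prcomp from stats and plot instead of factoextra, and it recomputes the groups from the iris data shipped with R so that the example is self-contained.

```r
# Base R sketch: scatter plot of the first two principal components, colored by group
z <- scale(datasets::iris[, 1:4])
groups <- cutree(hclust(dist(z), method = "complete"), k = 3)

p <- prcomp(z)                                  # PCA on the standardized data
scores <- p$x[, 1:2]                            # coordinates on PC1 and PC2
var_explained <- p$sdev^2 / sum(p$sdev^2)       # proportion of variance per component

plot(scores,
     col = groups, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]),
     main = "PCA of standardized iris, colored by hierarchical group")
legend("topright",
       legend = paste("Group", sort(unique(groups))),
       col = sort(unique(groups)), pch = 19)
```

Well-separated colors in this plot suggest that the hierarchical groups capture structure that is also visible along the main axes of variation.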

Metadata

Version

0.1.0

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows