Description

Optimal Histogram Binning Using Shimazaki-Shinomoto Method.

Implements the Shimazaki-Shinomoto method for optimizing the bin width of a histogram. This method minimizes the mean integrated squared error (MISE) and features a 'C++' backend for high performance, plus shift-averaging to remove edge-position bias. It is well suited to time-dependent rate estimation and to identifying intrinsic data structures. Supports both 1D and 2D data distributions. For more details see Shimazaki and Shinomoto (2007) "A Method for Selecting the Bin Size of a Time Histogram" <doi:10.1162/neco.2007.19.6.1503>.
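For reference, this is the cost function from Shimazaki and Shinomoto (2007) for a single sequence of observations (n = 1 in the paper's notation) split into N bins of width Δ with counts k_1, ..., k_N. Its expected value equals the MISE up to an additive term that does not depend on Δ, so minimizing it selects the bin width. The shift-averaged form on the second line is a sketch of the general idea (averaging the cost over S bin origins φ_s in [0, Δ)); the exact shift grid used by the package's C++ backend is an assumption here, not something documented on this page.

```latex
\bar{k} = \frac{1}{N}\sum_{i=1}^{N} k_i, \qquad
v = \frac{1}{N}\sum_{i=1}^{N} \left(k_i - \bar{k}\right)^2, \qquad
C(\Delta) = \frac{2\bar{k} - v}{\Delta^2}

\bar{C}(\Delta) = \frac{1}{S}\sum_{s=1}^{S} C(\Delta;\, \phi_s), \qquad
\Delta^{*} = \arg\min_{\Delta} \bar{C}(\Delta)
```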


sshist

The sshist package implements the Shimazaki-Shinomoto method for finding the optimal number of bins in histograms.

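Unlike simple rules of thumb such as Sturges' rule (the default in base R's hist()), the Freedman-Diaconis rule, or the fixed 30 bins that ggplot2's geom_histogram() uses by default, this method chooses the bin width that minimizes the expected L2 loss (the MISE) between the histogram and the unknown underlying density function. It is particularly effective for: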

  • Time-dependent rate estimation, such as peri-stimulus time histograms (PSTH).
  • Identifying intrinsic data structures in multimodal distributions.
  • Optimizing bin sizes for both 1D and 2D data.
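To illustrate what is being minimized, here is a minimal base-R sketch of the single-sequence cost C(Δ) = (2k̄ − v)/Δ², where k̄ and v are the mean and the biased variance of the bin counts, evaluated over a grid of candidate bin counts. `ss_cost()` and `ss_search()` are hypothetical helpers, not part of sshist. The sketch omits the package's shift-averaging and C++ backend, so its optimum need not match the N = 21 reported in Example 1.

```r
# Hypothetical helpers (not part of sshist): a plain base-R sketch of the
# Shimazaki-Shinomoto cost WITHOUT shift-averaging.

# Cost C(delta) = (2 * mean(k) - biased_var(k)) / delta^2 for n_bins equal bins
ss_cost <- function(x, n_bins) {
  x_min <- min(x)
  delta <- (max(x) - x_min) / n_bins
  # Bin index of each point; pmin() puts the maximum into the last bin
  idx <- pmin(floor((x - x_min) / delta) + 1, n_bins)
  counts <- tabulate(idx, nbins = n_bins)  # includes empty bins
  k_bar <- mean(counts)
  v <- mean((counts - k_bar)^2)            # biased variance, as in the paper
  (2 * k_bar - v) / delta^2
}

# Evaluate the cost over candidate bin counts and keep the minimum
ss_search <- function(x, candidates = 2:50) {
  costs <- vapply(candidates, function(n) ss_cost(x, n), numeric(1))
  best <- which.min(costs)
  list(n_bins    = candidates[best],
       bin_width = (max(x) - min(x)) / candidates[best],
       cost      = costs[best])
}

fit <- ss_search(faithful$waiting)
fit$n_bins
```

Adding shift-averaging to this sketch would mean recomputing `counts` for several offsets of the bin origin and averaging the resulting costs before taking the minimum.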

Installation

# stable version from CRAN
install.packages("sshist")

You can install the development version of sshist like so:

# install.packages("devtools")
devtools::install_github("celebithil/sshist")

Example 1: Basic 1D Usage

Here is a basic example using the Old Faithful Geyser data.

library(sshist)

# Load data
data(faithful)
x_data <- faithful$waiting

# Calculate optimal binning
res <- sshist(x_data)

# Print summary
print(res)
#> Shimazaki-Shinomoto Histogram Optimization
#> ------------------------------------------
#> Optimal Bins (N): 21 
#> Bin Width (D):    2.524 
#> Cost Minimum:     -8.525

# Plot a density-scaled histogram using the optimal bin edges
hist(res$data, breaks = res$edges, freq = FALSE,
     main = paste0("Optimal Hist (N = ", res$opt_n, ")"),
     col = "lightblue", border = "white", xlab = "Waiting Time (min)")
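For comparison with the result above, base R's grDevices package provides the classical bin-count rules. This is a small illustrative snippet that uses only functions shipped with R; these rules depend on the sample size and spread of the data rather than on the shape of its distribution.

```r
x_data <- faithful$waiting

# Classical rules of thumb from grDevices (attached by default in R)
rules <- c(
  Sturges = nclass.Sturges(x_data),  # ceiling(log2(n) + 1)
  Scott   = nclass.scott(x_data),    # based on the standard deviation
  FD      = nclass.FD(x_data)        # Freedman-Diaconis, based on the IQR
)
rules
```

For the 272 waiting times, Sturges' rule gives 10 bins, roughly half of the N = 21 selected by sshist.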

Example 2: Integration with ggplot2

sshist calculates the optimal parameters, which you can easily pass to ggplot2.

library(ggplot2)

# Create a data frame
df <- data.frame(waiting = x_data)

ggplot(df, aes(x = waiting)) +
  geom_histogram(breaks = res$edges, fill = "#69b3a2", color = "white", alpha = 0.8) +
  geom_rug(alpha = 0.1) +
  ggtitle(paste0("Shimazaki-Shinomoto Optimization (N = ", res$opt_n, ")")) +
  theme_minimal()

Example 3: 2D Histogram Optimization

For bivariate data, sshist_2d finds the optimal binning for both X and Y axes simultaneously.

# Second variable: eruption duration (paired with waiting time, a bimodal 2D sample)
y_data <- faithful$eruptions

# Optimize
res2d <- sshist_2d(x_data, y_data)

# Print summary
print(res2d)
#> Shimazaki-Shinomoto 2D Histogram Optimization
#> ---------------------------------------------
#> Optimal Bins X:   9 
#> Optimal Bins Y:   20 
#> Bin Width X:      5.889 
#> Bin Width Y:      0.175 
#> Cost Minimum:     -5.717
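To make the 2D objective concrete, here is a base-R sketch that scores an nx by ny grid of equal-width bins. It assumes the natural 2D analogue of the 1D cost, replacing Δ² with the squared bin area (Δx·Δy)², and it skips shift-averaging. `ss_cost_2d()` is a hypothetical helper; the package's exact 2D cost may differ, so the grid this sketch selects need not match the 9 x 20 result above.

```r
# Hypothetical helper (not part of sshist): assumed 2D analogue of the
# Shimazaki-Shinomoto cost, using the bin area dx * dy in place of delta.
ss_cost_2d <- function(x, y, nx, ny) {
  dx <- diff(range(x)) / nx
  dy <- diff(range(y)) / ny
  # Bin index along each axis; pmin() keeps the maxima in the last bin
  ix <- pmin(floor((x - min(x)) / dx) + 1, nx)
  iy <- pmin(floor((y - min(y)) / dy) + 1, ny)
  # Counts for all nx * ny cells, including empty ones
  counts <- tabulate((iy - 1) * nx + ix, nbins = nx * ny)
  k_bar <- mean(counts)
  v <- mean((counts - k_bar)^2)  # biased variance of the cell counts
  (2 * k_bar - v) / (dx * dy)^2
}

# Exhaustive search over a grid of candidate bin counts
grid <- expand.grid(nx = 2:30, ny = 2:30)
grid$cost <- mapply(function(nx, ny) {
  ss_cost_2d(faithful$waiting, faithful$eruptions, nx, ny)
}, grid$nx, grid$ny)
best <- grid[which.min(grid$cost), ]
best
```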

Example 4: 2D Optimization with ggplot2

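You can use the optimized bin counts from sshist_2d in ggplot2 by passing them to the bins argument of geom_bin2d. Only the bin counts carry over; the edge positions ggplot2 chooses may differ slightly from those computed by sshist_2d.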

# Recompute 'res2d' as in Example 3 so this chunk also runs on its own
# (optimal bins for the Old Faithful waiting/eruptions data)

res2d <- sshist_2d(faithful$waiting, faithful$eruptions)

ggplot(faithful, aes(waiting, eruptions)) +
  geom_bin2d(bins = c(res2d$opt_nx, res2d$opt_ny)) +
  scale_fill_distiller(palette = "Spectral") +
  labs(
    title = "Optimal 2D Binning (Old Faithful)",
    subtitle = paste0("Shimazaki-Shinomoto Method: ", 
                      res2d$opt_nx, " x ", res2d$opt_ny, " bins"),
    x = "Waiting Time (min)",
    y = "Eruption Duration (min)"
  ) +
  theme(axis.text = element_text(size = 12),
        title = element_text(size = 12, face = "bold"),
        panel.border = element_rect(linewidth = 2, color = "black", fill = NA))


Metadata

Version

0.1.3

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows