Description

Histogram-Valued Data Analysis.

In the framework of Symbolic Data Analysis, a relatively new approach to the statistical analysis of multi-valued data, we consider histogram-valued data, i.e., data described by univariate histograms. The methods and the basic statistics for histogram-valued data are mainly based on the L2 Wasserstein metric between distributions, i.e., the Euclidean metric between quantile functions. The package contains unsupervised classification techniques, least squares regression, and tools for histogram-valued data and for histogram time series. An introductory paper is Irpino A., Verde R. (2015) <doi:10.1007/s11634-014-0176-4>.

HistDAWass (Histogram-valued Data analysis using Wasserstein metric)

This document describes the main features of the HistDAWass package. The name is an acronym for Histogram-valued Data analysis using Wasserstein metric. The implemented classes and functions support the analysis of data tables in which each cell contains a histogram instead of a classical numeric value.

What is the L2 Wasserstein metric?

Given two probability density functions f and g, let F and G be their cumulative distribution functions and Qf and Qg the corresponding quantile functions (the inverses of F and G). The L2 Wasserstein distance between f and g is

$$d_W(f, g) = \sqrt{\int_0^1 \left( Q_f(t) - Q_g(t) \right)^2 \, dt}$$
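Because the L2 Wasserstein distance is the Euclidean distance between quantile functions, it can be approximated numerically once the quantile functions are available. The following is a minimal base R sketch, not the package's own implementation: the helper names `hist_quantile` and `wass_l2` are hypothetical, and the histogram is assumed to be given by bin bounds `x` and cumulative probabilities `p`, with mass spread uniformly inside each bin.

```r
# Quantile function of a histogram described by bin bounds `x` and
# cumulative probabilities `p` (uniform mass inside each bin, so the
# quantile function is piecewise linear). Hypothetical helper.
hist_quantile <- function(x, p) {
  function(t) approx(x = p, y = x, xout = t)$y
}

# L2 Wasserstein distance, approximated with a midpoint rule on (0, 1).
wass_l2 <- function(qf, qg, n = 10000) {
  t <- (seq_len(n) - 0.5) / n
  sqrt(mean((qf(t) - qg(t))^2))
}

qf <- hist_quantile(x = c(0, 1, 2), p = c(0, 0.3, 1))
qg <- hist_quantile(x = c(1, 2, 3), p = c(0, 0.3, 1))  # same shape, shifted by 1
d <- wass_l2(qf, qg)
```

Since the second histogram is the first one shifted by 1, the two quantile functions differ by 1 everywhere, so `d` should equal 1 up to floating-point error.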

The implemented classes are described in the following table.

Class | Wrapper function for initializing | Description
distributionH | distributionH(x, p) | A class describing a histogram distribution
MatH | MatH(x, nrows, ncols, rownames, varnames, by.row) | A class describing a matrix of distributions
TdistributionH | TdistributionH() | A class derived from distributionH equipped with a timestamp or a time window
HTS | HTS() | A class describing a histogram-valued time series
library(HistDAWass)
# A histogram with bin bounds 0, 1, 2 and cumulative probabilities 0, 0.3, 1
mydist <- distributionH(x = c(0, 1, 2), p = c(0, 0.3, 1))

From raw data to histograms

data2hist functions
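To illustrate what turning raw data into a histogram involves, here is a minimal base R sketch. It does not call the package's data2hist functions; it only shows how raw observations can be reduced to bin bounds and cumulative probabilities. The data values and the choice of breaks are hypothetical.

```r
# Hypothetical raw observations for one cell of a histogram-valued table.
raw <- c(1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 3.6, 4.2, 4.9, 5.5)

# Bin the data with base R, without drawing the plot.
h <- hist(raw, breaks = c(1, 2, 3, 4, 5, 6), plot = FALSE)

# Bin bounds and cumulative probabilities at those bounds.
x <- h$breaks
p <- c(0, cumsum(h$counts) / sum(h$counts))
```

The resulting pair (`x`, `p`) has the same shape as the arguments of `distributionH(x, p)` listed in the class table.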

Basic statistics for a distributionH (a histogram)

  • mean

    • the mean of a histogram
  • standard deviation

    • the standard deviation of a histogram
  • skewness

    • the third standardized moment of a histogram
  • kurtosis

    • the fourth standardized moment of a histogram
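As a worked illustration of the first two statistics, the following base R sketch computes the mean and standard deviation of a histogram directly from its bin bounds and cumulative probabilities. It assumes mass is uniform inside each bin and does not use the package's own methods; all variable names are illustrative.

```r
x <- c(0, 1, 2)      # bin bounds
p <- c(0, 0.3, 1)    # cumulative probabilities at the bounds

w <- diff(p)                                  # probability mass of each bin
centers <- (head(x, -1) + tail(x, -1)) / 2    # bin midpoints
radii <- diff(x) / 2                          # bin half-widths

# Mean: mass-weighted average of the bin midpoints.
hist_mean <- sum(w * centers)

# E[X^2] of a uniform bin with center c and half-width r is c^2 + r^2 / 3.
hist_var <- sum(w * (centers^2 + radii^2 / 3)) - hist_mean^2
hist_sd <- sqrt(hist_var)
```

For this histogram the mean is 0.3 * 0.5 + 0.7 * 1.5 = 1.2.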

Basic statistics for a MatH (a matrix of histogram-valued data)

  • The average histogram of a column

    • It is an average histogram that minimizes the sum of squared Wasserstein distances.
  • The standard deviation of a variable

    • It is a number that measures the dispersion of a set of histograms.
  • The covariance matrix of a MatH

    • It is a matrix that measures the covariances among a set of histogram variables.
  • The correlation matrix of a MatH

    • It is a matrix that measures the correlations among a set of histogram variables.
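The average histogram and the standard deviation of a variable can be sketched in base R by working on quantile functions, since the minimizer of the sum of squared L2 Wasserstein distances is the distribution whose quantile function is the pointwise average of the individual quantile functions. This is an illustrative approximation on a grid, not the package's implementation; `hist_quantile` is a hypothetical helper, and dividing by n (rather than n - 1) in the standard deviation is an assumption.

```r
# Quantile function of a histogram with bin bounds `x` and cumulative
# probabilities `p` (hypothetical helper, uniform mass inside each bin).
hist_quantile <- function(x, p) {
  function(t) approx(x = p, y = x, xout = t)$y
}

t <- (seq_len(1000) - 0.5) / 1000   # midpoint grid on (0, 1)

# One row per histogram: its quantile function evaluated on the grid.
Q <- rbind(
  hist_quantile(c(0, 1, 2), c(0, 0.3, 1))(t),
  hist_quantile(c(1, 2, 3), c(0, 0.3, 1))(t)
)

# Quantile function of the average histogram.
mean_q <- colMeans(Q)

# Squared L2 Wasserstein distance of each histogram from the average.
sq_dist <- rowMeans(sweep(Q, 2, mean_q)^2)

# Dispersion of the set of histograms (dividing by n is an assumption).
wass_sd <- sqrt(mean(sq_dist))
```

Here the two histograms differ only by a shift of 1, so each lies at distance 0.5 from the average and `wass_sd` is 0.5 up to floating-point error.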

Visualization

plot of a distributionH

plot of a MatH

plot of a HTS

Data Analysis methods

Clustering

  • Kmeans

  • Adaptive distance based Kmeans

  • Fuzzy cmeans

  • Fuzzy cmeans based on adaptive Wasserstein distances

  • Kohonen batch self organizing maps

  • Kohonen batch self organizing maps with Wasserstein adaptive distances

  • Hierarchical clustering
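The simplest of the methods listed above, plain k-means, can be sketched in base R. Because the L2 Wasserstein distance is the Euclidean distance between quantile functions, running ordinary k-means on quantile functions sampled on a common grid approximates k-means under that metric. This is only an illustration under that assumption: it uses `stats::kmeans`, not the package's clustering functions, it does not cover the adaptive-distance, fuzzy, or self-organizing map variants, and the six distributions are hypothetical.

```r
set.seed(1)
t <- (seq_len(50) - 0.5) / 50   # midpoint grid on (0, 1)

# Six hypothetical distributions given by their quantile functions on the grid:
# three centered near 0 and three centered near 10.
centers <- c(0, 0.5, -0.5, 10, 10.5, 9.5)
Q <- t(sapply(centers, function(m) qnorm(t, mean = m, sd = 1)))

# Ordinary k-means on the rows of Q; several random starts for stability.
fit <- kmeans(Q, centers = 2, nstart = 10)
fit$cluster
```

Each row of `fit$centers` is itself a nondecreasing vector, so a cluster prototype can be read as the quantile function of an average distribution.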

Dimension reduction techniques

  • Principal components analysis of a single histogram variable

  • Principal components analysis of a set of histogram variables (using Multiple Factor Analysis)

Methods for Histogram time series

Smoothing

  • Moving averages

  • Exponential smoothing
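One way to picture smoothing of a histogram time series is to apply the usual recursion to quantile functions sampled on a common grid: a convex combination of nondecreasing quantile vectors is again nondecreasing, so every smoothed value is still a valid distribution. The sketch below shows exponential smoothing in base R under that assumption. The function name `exp_smooth_quantiles` and the input layout (one row per time point) are hypothetical, and this is not the package's own routine.

```r
# Exponential smoothing applied to quantile vectors (one row per time point):
# S[1, ] = Q[1, ] and S[i, ] = alpha * Q[i, ] + (1 - alpha) * S[i - 1, ].
exp_smooth_quantiles <- function(Q, alpha) {
  S <- Q
  for (i in seq_len(nrow(Q))[-1]) {
    S[i, ] <- alpha * Q[i, ] + (1 - alpha) * S[i - 1, ]
  }
  S
}

# Three time points, each described by three quantiles (hypothetical values).
Q <- rbind(c(0, 1, 2), c(2, 3, 4), c(2, 3, 4))
S <- exp_smooth_quantiles(Q, alpha = 0.5)
```

With `alpha = 0.5`, the second smoothed row is the midpoint of the first two observations, (1, 2, 3), and the third moves halfway again toward (2, 3, 4).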

Forecasting

  • KNN prediction of histogram time series

Linear regression

A two-component model for linear regression, estimated by the least squares method.
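As a hedged sketch of what "two-component" may mean here (an assumption based on a common formulation of Wasserstein-based regression for histogram data, not something stated on this page): each quantile function is split into its mean and a centered part, $Q_{x_{ij}}(t) = \bar{x}_{ij} + Q^c_{x_{ij}}(t)$, and the response is modeled on both parts separately. The symbols $\beta_j$, $\gamma_j$, and $e_i(t)$ are introduced only for illustration.

$$Q_{y_i}(t) = \beta_0 + \sum_{j=1}^{p} \beta_j \, \bar{x}_{ij} + \sum_{j=1}^{p} \gamma_j \, Q^c_{x_{ij}}(t) + e_i(t), \qquad \gamma_j \ge 0$$

Under this reading, the $\beta_j$ act on the positions (means) of the histograms and the $\gamma_j$ on their shapes. The constraint $\gamma_j \ge 0$ keeps the fitted function nondecreasing, so it remains a valid quantile function.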

Metadata

Version

1.0.8

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows