MyNixOS website logo
Description

Visualizations of High-Dimensional Data.

Gives access to data visualisation methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. The MD-plot outperforms the box-and-whisker diagram (box plot), violin plot and bean plot and geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. In the case of exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables as well as the Shepard density plot and the Bland-Altman plot are presented here. Pertaining to classified high-dimensional data, a number of visualizations are described, such as f.ex. the heat map and silhouette plot. A political map of the world or Germany can be visualized with the additional information defined by a classification of countries or regions. By extending the political map further, an uncomplicated function for a Choropleth map can be used which is useful for measurements across a geographic area. For categorical features, the Pie charts, slope charts and fan plots, improved by the ABC analysis, become usable. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.

CRAN_Status_Badge CRAN RStudio mirror downloads CRAN RStudio mirror downloads

DataVisualizations

Table of Contents

1. Introduction
2. Installation
3. Additional Ressources
4. References

1. Introduction

“Exploratory data analysis is detective work” [Tukey, 1977, p.2]. This package enables the user to use graphical tools to find ‘quantitative indications’ enabling a better understanding of the data at hand. “As all detective stories remind us, many of the circumstances surrounding a crime are accidental or misleading. Equally, many of the indications to be discerned in bodies of data are accidental or misleading [Tukey, 1977, p.3].” The solution is to compare many different graphical tools with the goal to find an agreement or to generate an hypothesis and then to confirm it with statistical methods. This package serves as a starting point.

The DataVisualizations package offers various visualization methods and graphical tools for data analysis, including:

  • Synoptic visualizations of data: Synoptic visualization methods such as Pixelmatrices.
  • Distribution analysis and visualization: Visual distribution analysis for one- or higher dimensional data, including MD Plots and PDE (Pareto Density Estimation).
  • Spatial visualizations: Spatial visualizations such as choropleth maps.
  • Visual analysis of Clusters, Correlation, Distances and Projections: Visual analysis of clusters such as Silhouette plots, or visual projection analysis with the Shepard diagrams.
  • Other visualizations: For example ABC-Barplots, Errorplots and more.

Examples of synoptic visualizations:

Get synoptic view of the data, with a pixelmatrix

data("Lsun3D")
Pixelmatrix(Lsun3D$Data)

The Pixelmatrix can be used as a shortcut in visualizing correlations between many variables

n=nrow(Lsun3D$Data)
Data=cbind(Lsun3D$Data,runif(n),rnorm(n),rt(n,2),rlnorm(n),rchisq(100,2))
Header=c('x','y','z','uniform','gauss','t','log-normal','chi')
cc=cor(Data,method='spearman')
diag(cc)=0
Pixelmatrix(cc,YNames = Header,XNames = Header,main = 'Spearman Coeffs')

Examples of distribution analysis:

                

InspectVariables provides a summary of the most important plots for one dimensional distribution analysis such as histogram, continuous data density estimation, QQ-Plot, and Boxplot:

data(ITS)
InspectVariable(ITS)

The MD Plot can be used for visualizing the densities of several variables, the MD Plot combines the syntax of ggplot2 with the Pareto density estimation and additional functionality useful from the Data Scientist’s point of view:

data(MTY)
Data=cbind(ITS,MTY)
MDplot(Data)+ylim(0,6000)+ggtitle('Two Features with MTY Capped')

Create density scatter plots in 2D:

DensityScatter(ITS, MTY, xlab = 'ITS in EUR', ylab ='MTY in EUR', xlim = c(0,1200), ylim = c(0,15000), main='Pareto Density Estimation indicates Bimodality')

                  

Examples of visual cluster analysis:

The heatmap of the distances, ordered by clusters allows to get a synoptic view over the intra- and intercluster distances. Examples and interpretations of Heatmaps and Silhouette plots are presented in [Thrun 2018A, 2018B].

data("Lsun3D")
Heatmap(Lsun3D$Data,Lsun3D$Cls,method = 'euclidean')

Plot Silhuoette plot of clustering:

Silhouetteplot(Lsun3D$Data,Lsun3D$Cls,PlotIt = T)

InputDistances shows the most important plots of the distribution of distances of the data. The distance distribution in the input space can be bimodal, indicating a distinction between the inter- versus intracluster distances. This can serve as an indication of distance-based cluster structures (see [Thrun, 2018A, 2018B]).

InspectDistances(Lsun3D$Data,method="euclidean")

2. Installation

Installation using CRAN

Install automatically with all dependencies via

install.packages("DataVisualizations",dependencies = T)

Installation using Github

Please note, that dependecies have to be installed manually.

remotes::install_github("Mthrun/DataVisualizations")

Installation using R Studio

Please note, that dependecies have to be installed manually.

Tools -> Install Packages -> Repository (CRAN) -> DataVisualizations

Tutorial Examples

The tutorial with several examples can be found on in the vignette on CRAN.

4. References

[Thrun, 2018A] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.

[Thrun, 2018B] Thrun, M. C.: Cluster Analysis of Per Capita Gross Domestic Products, Entrepreneurial Business and Economics Review (EBER), Vol. 7(1), pp. 217-231, https://doi.org/10.15678/EBER.2019.070113, 2019.

[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech,, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.

[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, DOI 10.1371/journal.pone.0238835, 2020.

[Tukey, 1977] Tukey, J. W.: Exploratory data analysis, United States Addison-Wesley Publishing Company, ISBN: 0-201-07616-0, 1977.

Metadata

Version

1.3.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows