MyNixOS website logo
Description

Obtaining Star Databases from Flat Tables.

Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a ROLAP (Relational On-Line Analytical Processing) star database. The main objective of the package is to allow the definition of these transformations easily. The implementation of the multidimensional database obtained can be exported to work with multidimensional analysis tools on spreadsheets or relational databases.

rolap rolap website

CRANstatus R-CMD-check Codecov testcoverage Downloads

The aim of the multidimensional data model is organize data for supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (Extract, Transform and Load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a set of tables with data that comes from operational systems into a ROLAP (Relational On-Line Analytical Processing) star database, made up of fact and dimension tables, which implements a multidimensional system. With the tools mentioned above, this transformation can be carried out, but it requires a lot of work. We are not aware of any tools with operations designed to specifically support this transformation process.

The goal of rolap is to define transformations that allow us to easily obtain ROLAP star databases, composed by fact and dimension tables, from operational tables, to be able to export them in various formats to be used by OLAP query tools and also be able to exploit them from R.

The rolap package builds on experience with the starschemar package on which it is based. It incorporates the main functionalities for which starschemar was initially intended. In particular, the data model and the way of treating role-playing and role dimensions have been changed, so that it is easier to add future extensions. It has been designed in such a way that migration from starschemar is practically immediate.

Installation

You can install the released version of rolap from CRAN with:

install.packages("rolap")

And the development version from GitHub with:

devtools::install_github("josesamos/rolap")

Example

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set in the form of a flat table, available in the package in the ft_num variable, shown below.

YearWEEKWeek Ending DateREGIONStateCityPneumonia and Influenza DeathsAll Deaths<1 year (all cause deaths)1-24 years (all cause deaths)25-44 years45-64 years (all cause deaths)65+ years (all cause deaths)
1962201/13/19621MABoston112701481170167
1962401/27/19621MABoston12285227873175
1963401/26/19631MABoston1027611141767167
1964301/18/19641MABoston133251772490187
1964602/08/19641MABoston92441391461147
1962301/20/19621CTBridgeport2405131021
1962502/03/19621CTBridgeport5466031522
1962802/24/19621CTBridgeport2452021625
1963401/26/19631CTBridgeport2464031029
1964502/01/19641CTBridgeport8453121128
1962903/03/19621MACambridge439102729
1964201/11/19641MACambridge731102919
1964502/01/19641MACambridge627200817
1964902/29/19641MACambridge026002816
1962401/27/19621CTHartford1477101425
1962702/17/19621CTHartford4573132129
1963301/19/19631CTHartford2667231836
1963702/16/19631CTHartford4776171944
1963802/23/19631CTHartford6493231427
1964201/11/19641CTHartford3537021628

Star database definition

The transformation to obtain a star database from the table using rolap package is as follows:

library(rolap)

where <- dimension_schema(name = "Where",
                          attributes = c("REGION",
                                         "State",
                                         "City"))

s <- star_schema() |>
  define_facts(name = "MRS Cause",
               measures = c("Pneumonia and Influenza Deaths",
                            "All Deaths")) |>
  define_dimension(name = "When",
                   attributes = c("Year")) |>
  define_dimension(where)

db <- star_database(s, ft_num) |>
  snake_case()

The dimension and fact schemas can be defined as variables (where) to be reused or directly in the star schema definition. To make it easier to work in a database environment we transform the table field names to snake case.

Geographic attributes can be associated with vector layers of geographic information.

db <- db |>
  define_geoattribute(
    dimension = "where",
    attribute = "state",
    from_layer = us_layer_state,
    by = "STUSPS"
  )

Result

To better appreciate the result, let’s export it as a tibble list. The tables of dimensions and facts of the obtained star database are shown below.

ls <- db |>
  as_tibble_list()

for (i in 1:length(ls)) {
  pander::pandoc.table(ls[[i]], split.table = Inf)
}
when_keyyear
11962
21963
31964
where_keyregionstatecity
11CTBridgeport
21CTHartford
31MABoston
41MACambridge
when_keywhere_keypneumonia_and_influenza_deathsall_deathsnrow_agg
1191313
1251042
13235552
144391
212461
22121923
23102761
318451
323531
33225692
3413843

The result can be exported in various formats such as csv and xslx files or to a relational database, as shown below.

con <- DBI::dbConnect(RSQLite::SQLite())

db |>
  as_rdb(con)

DBI::dbListTables(con)
#> [1] "mrs_cause" "when"      "where"

DBI::dbDisconnect(con)

In addition to exporting it, we can formulate multidimensional queries from R. Below is an example.

sq <- star_query(db) |>
  select_dimension(name = "where",
                   attributes = "state") |>
  select_dimension(name = "when",
                   attributes = "year") |>
  select_fact(name = "mrs_cause",
              measures = "all_deaths") |>
  filter_dimension(name = "when", year >= "1963") |>
  filter_dimension(name = "where", city == "Bridgeport" | city == "Boston")

db_2 <- db |>
  run_query(sq)

The result can be displayed using the pivottabler package.

ft <- db_2 |>
  as_single_tibble_list()
ft_cause <- ft[["mrs_cause"]]

pt <- pivottabler::qpvt(
  ft_cause,
  c("=", "state"),
  c("year"),
  c("Number of Deaths" = "sum(all_deaths)")
)

pt$renderPivot()

We can obtain a geographic information layer that includes it, to use it in R as an object of class sf.

gl <- db_2 |>
  as_geolayer()

l1 <- gl |>
  get_layer()
class(l1)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

title <- gl |>
  get_variable_description("var_1")

plot(sf::st_geometry(l1[, c("var_1")]), axes = TRUE, main = title)
text(
  sf::st_coordinates(sf::st_centroid(sf::st_geometry(l1))),
  labels = paste0(l1$state, ": ", l1$var_1),
  pos = 3,
  cex = 1.5
)

We can also include all geographic instances originally present in the layer.

l2 <- gl |>
  get_layer(keep_all_variables_na = TRUE)

plot(sf::st_shift_longitude(l2[, "var_1"]), axes = TRUE, main = title)

Or export it in GeoPackage format.

f <- gl |>
  as_GeoPackage(dir = tempdir())

sf::st_layers(f)
#> Driver: GPKG 
#> Available layers:
#>   layer_name geometry_type features fields crs_name
#> 1   geolayer       Polygon        2      3   WGS 84
#> 2  variables            NA        2      3     <NA>

We can work with several star databases to form a constellation. It supports the definition of role-playing and role dimensions, as well as incremental refresh operations and automatic deployment on any RDBMS (Relational Database Management System). Examples and detailed information of these functionalities can be found in the documentation and vignettes of the package.

Metadata

Version

2.5.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows