Description

Testbench for Univariate Time Series Cleaning.

A reliable and efficient tool for cleaning univariate time series data. It implements procedures that automate the cleaning process and integrates with existing tools for missing value imputation and outlier detection. It also provides a way of visualizing large time series at different resolutions.

cleanTS

CRAN status, R-CMD-check, Lifecycle: stable

The cleanTS package aims to make cleaning large datasets simple and efficient. It currently focuses solely on univariate time series data. The package integrates with existing tools for missing value imputation. It also provides a way to visualize data at different resolutions, allowing micro-scale visualization. The ultimate goal is a handy software tool that handles the problems, processing, analysis and visualization of big time series data with minimal human intervention.

  • cleanTS() checks the data for missing and duplicate timestamps, performs missing value imputation and removes anomalies/outliers from the data.

  • animate_interval() splits the data and generates an animated plot.

  • interact_plot() is similar to animate_interval() but creates an interactive plot, which gives more control over the animation.
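
The checks that cleanTS() is described as performing can be illustrated in base R. The following is a minimal sketch on made-up data, not the package's implementation: all variable names are hypothetical, and linear interpolation via approx() stands in for whichever imputation method the package actually selects.

```r
# Toy monthly series (made-up data) with one duplicated timestamp (February),
# one absent timestamp (March) and one recorded NA (April).
ts_data <- data.frame(
  time  = as.Date(c("2020-01-01", "2020-02-01", "2020-02-01",
                    "2020-04-01", "2020-05-01")),
  value = c(10, 12, 12, NA, 16)
)

# Duplicate timestamps: any time value that appears more than once.
duplicate_ts <- unique(ts_data$time[duplicated(ts_data$time)])

# Missing timestamps: points of the full regular grid absent from the data.
full_grid  <- seq(min(ts_data$time), max(ts_data$time), by = "month")
missing_ts <- full_grid[!(full_grid %in% ts_data$time)]

# Drop duplicate rows and expand onto the regular grid; added rows get NA.
deduped <- ts_data[!duplicated(ts_data$time), ]
regular <- merge(data.frame(time = full_grid), deduped,
                 by = "time", all.x = TRUE)

# Fill the gaps by linear interpolation over time.
# ASSUMPTION: this is a stand-in technique, not the method cleanTS uses.
regular$imputed <- approx(
  x    = as.numeric(regular$time),
  y    = regular$value,
  xout = as.numeric(regular$time)
)$y

print(duplicate_ts)
print(missing_ts)
print(regular)
```

Separating the timestamp checks from the imputation step keeps it clear which values were observed and which were filled in.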

The package can also be used through a Shiny application, available at https://mayur1009.shinyapps.io/cleanTS/.

Package documentation is available at https://mayur1009.github.io/cleanTS/.

This project was a part of Google Summer of Code 2021.

Installation

# Install release version from CRAN
install.packages("cleanTS")

# Install development version from GitHub
devtools::install_github("Mayur1009/cleanTS")

Example

library(cleanTS)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo

# Read sunspot.month dataset
data <- timetk::tk_tbl(sunspot.month)
print(data)
#> # A tibble: 3,177 × 2
#>    index     value
#>    <yearmon> <dbl>
#>  1 Jan 1749   58  
#>  2 Feb 1749   62.6
#>  3 Mar 1749   70  
#>  4 Apr 1749   55.7
#>  5 May 1749   85  
#>  6 Jun 1749   83.5
#>  7 Jul 1749   94.8
#>  8 Aug 1749   66.3
#>  9 Sep 1749   75.9
#> 10 Oct 1749   75.5
#> # ℹ 3,167 more rows

# Randomly insert missing values to simulate missing value imputation
set.seed(10)
ind <- sample(nrow(data), 100)
data$value[ind] <- NA

# Create `cleanTS` object
cts <- cleanTS(data, date_format = c("my"))
summary(cts)
#>                  Length Class      Mode     
#> clean_data       5      data.table list     
#> missing_ts       0      POSIXct    numeric  
#> duplicate_ts     0      POSIXct    numeric  
#> imp_methods      4      -none-     character
#> mcar_err         4      data.frame list     
#> mar_err          4      data.frame list     
#> outliers         4      data.table list     
#> outlier_mcar_err 4      data.frame list     
#> outlier_mar_err  4      data.frame list

# Cleaned Data
head(cts$clean_data)
#>          time value missing_type method_used is_outlier
#> 1: 1749-01-01  58.0         <NA>        <NA>      FALSE
#> 2: 1749-02-01  62.6         <NA>        <NA>      FALSE
#> 3: 1749-03-01  70.0         <NA>        <NA>      FALSE
#> 4: 1749-04-01  55.7         <NA>        <NA>      FALSE
#> 5: 1749-05-01  85.0         <NA>        <NA>      FALSE
#> 6: 1749-06-01  83.5         <NA>        <NA>      FALSE

# Generate animated plot
a <- animate_interval(cts, interval = "10 year")
gen.animation(a, height = 700, width = 900)

# Generate interactive plot
interact_plot(cts, interval = "10 year")
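
To give a rough idea of what splitting a series with interval = "10 year" could look like, here is a base R sketch. It assumes the split is into consecutive calendar windows; the package's actual windowing may differ, and time, window and chunks are illustrative names that are not part of cleanTS.

```r
# Rebuild monthly timestamps for the built-in sunspot.month series,
# which starts in January 1749 (as the printed tibble shows).
n     <- length(sunspot.month)
time  <- seq(as.Date("1749-01-01"), by = "month", length.out = n)
value <- as.numeric(sunspot.month)

# Assign every observation to a 10-year calendar window.
# ASSUMPTION: consecutive, non-overlapping windows; not taken from cleanTS.
window <- cut(time, breaks = "10 years")

# One data frame per window; drop = TRUE discards windows with no rows.
chunks <- split(data.frame(time, value), window, drop = TRUE)

# Number of windows and the size of the first one.
print(length(chunks))
print(nrow(chunks[[1]]))
```

Each element of chunks could then be drawn as one frame of an animation or one panel of a plot.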

Metadata

Version

0.1.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows