Description

Set Operations on Time Intervals.

Implements the phinterval vector class for representing time spans that may contain gaps (disjoint intervals) or be empty. This class generalizes the 'lubridate' package's interval class to support vectorized set operations (intersection, union, difference, complement) that always return a valid time span, even when disjoint or empty intervals are created.

phinterval

phinterval is a package for representing and manipulating time spans that may contain gaps. It implements the <phinterval> (think “potentially-holey-interval”) vector class, designed as an extension of the {lubridate} <Interval>, to represent spans of time that are contiguous, disjoint, empty, or missing.

Functionality for manipulating these spans includes:

Merging overlapping or adjacent intervals into non-overlapping sets of time spans.
Performing set operations: union, intersection, difference, and complement.
Testing whether time spans, dates, or times fall within one another or overlap.

Installation

Install the released version from CRAN with:

install.packages("phinterval")

You can install the development version of phinterval from GitHub with:

# install.packages("pak")
pak::pak("EthanSansom/phinterval")

Usage

Each element of a <phinterval> vector is a set of non-overlapping and non-adjacent intervals. For scalar intervals (one span per element), phinterval() works like lubridate::interval():

library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Create scalar phintervals (equivalent to interval())
phinterval(
  start = ymd(c("2000-01-01", "2000-01-03", "2000-01-04")),
  end = ymd(c("2000-01-02", "2000-01-05", "2000-01-09"))
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-03--2000-01-05} {2000-01-04--2000-01-09}

To create phintervals with multiple disjoint spans per element, use the by argument to group intervals. Overlapping or adjacent spans within each group are automatically merged:

# Create a phinterval with disjoint spans using the by argument
phint <- phinterval(
  start = ymd(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = ymd(c("2000-01-05", "2000-01-02", "2000-01-09")),
  by = c(1, 2, 2)
)
phint
#> <phinterval<UTC>[2]>
#> [1] {2000-01-03--2000-01-05}                        
#> [2] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

(Figure: the elements of phint drawn as spans on a timeline.)

In most cases, a <phinterval> vector will appear as the result of manipulating <Interval> vectors. For example, phint_squash() flattens a vector of time spans into a scalar <phinterval>.

jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))

ints <- c(jan_1_to_2, jan_3_to_5, jan_4_to_9)
phint_squash(ints)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-03--2000-01-09}

The squashed <phinterval> covers every point in time that falls within at least one of the input intervals, and time shared by several inputs is counted only once.
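
The merging idea can be illustrated without the package. The following is a minimal base-R sketch, assuming Date inputs with no missing values; merge_spans() is a hypothetical helper written for illustration only. It is not a phinterval function and does not reflect how phint_squash() is implemented.

```r
# Hypothetical helper (not part of phinterval): merge overlapping or
# adjacent spans given as parallel Date vectors of starts and ends.
merge_spans <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  if (length(start) == 0) {
    return(data.frame(start = start, end = end))
  }

  # Process spans in order of their start
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  # Latest end seen so far, as days since 1970-01-01
  running_end <- cummax(as.numeric(end))

  # A new merged span begins when a start falls strictly after every earlier end
  new_group <- c(TRUE, as.numeric(start[-1]) > running_end[-length(running_end)])

  # The last input span of each group carries that group's latest end
  last_in_group <- c(new_group[-1], TRUE)

  data.frame(
    start = start[new_group],
    end = as.Date(running_end[last_in_group], origin = "1970-01-01")
  )
}

merged <- merge_spans(
  start = as.Date(c("2000-01-01", "2000-01-03", "2000-01-04")),
  end = as.Date(c("2000-01-02", "2000-01-05", "2000-01-09"))
)
merged
#>        start        end
#> 1 2000-01-01 2000-01-02
#> 2 2000-01-03 2000-01-09
```

The two rows correspond to the two spans in the phint_squash() output above. Because the comparison is strict, a span that starts exactly where an earlier one ends is merged into it, which mirrors the treatment of adjacent spans described earlier.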

Example: Employment History

The phinterval package is most useful when working with tabular data, such as a longitudinal employment panel.

library(dplyr, warn.conflicts = FALSE)

jobs <- tribble(
  ~name,   ~job_title,             ~start,        ~end,
  "Greg",  "Mascot",               "2018-01-01",  "2018-06-03",
  "Greg",  "Executive Assistant",  "2018-06-10",  "2020-04-01",
  "Shiv",  "Political Consultant", "2017-01-01",  "2019-04-01"
)

employment <- jobs |>
  # Squash overlapping/adjacent intervals into a single phinterval
  group_by(name) |>
  summarize(employed = datetime_squash(ymd(start), ymd(end))) |>
  # Invert the employment timeline to find gaps
  mutate(unemployed = phint_invert(employed))

employment
#> # A tibble: 2 × 3
#>   name  employed                    unemployed              
#>   <chr> <phint<UTC>>                <phint<UTC>>            
#> 1 Greg  {2018-01-01-[2]-2020-04-01} {2018-06-03--2018-06-10}
#> 2 Shiv  {2017-01-01--2019-04-01}    <hole>

<phinterval> column formatting adapts to the available console width. The "[2]" in Greg’s employment interval "{2018-01-01-[2]-2020-04-01}" indicates that his employment history is made up of two disjoint spans, with the first span beginning on 2018-01-01 and the second ending on 2020-04-01. When more space is available, every span is shown explicitly.

employment |> select(name, employed)
#> # A tibble: 2 × 2
#>   name  employed                                        
#>   <chr> <phint<UTC>>                                    
#> 1 Greg  {2018-01-01--2018-06-03, 2018-06-10--2020-04-01}
#> 2 Shiv  {2017-01-01--2019-04-01}

Arithmetic on <phinterval> vectors works as it does for <Interval> vectors. Dividing by ddays(1) gives the length of each timeline in days, summed over all of its spans, which shows a 7-day gap in Greg’s employment history:

employment |>
  mutate(
    days_employed = employed / ddays(1),
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, days_employed, days_unemployed)
#> # A tibble: 2 × 3
#>   name  days_employed days_unemployed
#>   <chr>         <dbl>           <dbl>
#> 1 Greg            814               7
#> 2 Shiv            820               0
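
The figures for Greg can be cross-checked by hand. This base-R sketch does not use phinterval; it re-enters Greg’s two employment spans from the jobs table and sums their lengths. The variable names are illustrative only.

```r
# Greg's two employment spans, copied from the jobs table
greg_start <- as.Date(c("2018-01-01", "2018-06-10"))
greg_end <- as.Date(c("2018-06-03", "2020-04-01"))

# Length of each span in days, summed over both spans
days_employed <- sum(as.numeric(difftime(greg_end, greg_start, units = "days")))

# The gap runs from the end of the first job to the start of the second
days_unemployed <- as.numeric(difftime(greg_start[2], greg_end[1], units = "days"))

c(days_employed = days_employed, days_unemployed = days_unemployed)
```

The two spans last 153 and 661 days, giving the 814 days employed and the 7-day gap reported in the table.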

phinterval <-> lubridate

The <phinterval> class is a generalization of the <Interval> class, meaning any <Interval> can be converted into an equivalent <phinterval> and all phinterval functions accept either <Interval> or <phinterval> inputs. The table below shows the lubridate functions that have drop-in phinterval replacements.

| phinterval | lubridate | Returns |
|---|---|---|
| `phinterval(start, end)` | `interval(start, end)` | Spans bounded by `start`/`end` |
| `phint_intersect(x, y)` | `intersect(x, y)` | Times in `x` and `y` |
| `phint_setdiff(x, y)` | `setdiff(x, y)` | Times in `x`, but not in `y` |
| `phint_union(x, y)` | `union(x, y)` | Times in `x` or `y` |
| `phint_start(x)` | `int_start(x)` | The start time of `x` |
| `phint_end(x)` | `int_end(x)` | The end time of `x` |
| `phint_length(x)` | `int_length(x)` | The number of seconds in `x` |
| `phint_overlaps(x, y)` | `int_overlaps(x, y)` | Whether `x` and `y` intersect |
| `phint_within(x, y)` | `x %within% y` | Whether `y` contains `x` |
| `x / duration(...)` | `x / duration(...)` | How many durations fit in `x` |

All phinterval set operations return a valid time span for any pair of inputs, including results that lubridate cannot represent. For example, the intersection of two non-overlapping intervals is an empty time span, called a <hole>.

lubridate::intersect(jan_1_to_2, jan_4_to_9)
#> [1] NA--NA
phint_intersect(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] <hole>

The set-difference of a time span and itself is also a <hole>.

lubridate::setdiff(jan_1_to_2, jan_1_to_2)
#> [1] 2000-01-01 UTC--2000-01-02 UTC
phint_setdiff(jan_1_to_2, jan_1_to_2)
#> <phinterval<UTC>[1]>
#> [1] <hole>

Performing a set-difference may “punch a hole” in a time span, creating a discontinuous interval.

try(lubridate::setdiff(jan_1_to_9, jan_3_to_5))
#> Error in setdiff.Interval(jan_1_to_9, jan_3_to_5) : 
#>   Cases 1 result in discontinuous intervals.
phint_setdiff(jan_1_to_9, jan_3_to_5)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-03, 2000-01-05--2000-01-09}
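
The three outcomes of a set-difference (an empty result, one span, or two spans) can be sketched in base R. This is an illustration only: span_setdiff() is a hypothetical helper, not a phinterval function, and it assumes each span is a length-2 Date vector c(start, end) with no missing values.

```r
# Hypothetical helper (not part of phinterval): the part of span x that is
# not covered by span y. Returns a list of zero, one, or two spans.
span_setdiff <- function(x, y) {
  left_end <- min(x[2], y[1])    # where the piece of x before y stops
  right_start <- max(x[1], y[2]) # where the piece of x after y begins

  pieces <- list()
  if (x[1] < left_end) {
    pieces <- c(pieces, list(c(x[1], left_end)))
  }
  if (right_start < x[2]) {
    pieces <- c(pieces, list(c(right_start, x[2])))
  }
  pieces
}

jan <- function(day) as.Date(sprintf("2000-01-%02d", day))

# Removing Jan 3-5 from Jan 1-9 leaves two pieces: Jan 1-3 and Jan 5-9
punched <- span_setdiff(c(jan(1), jan(9)), c(jan(3), jan(5)))
length(punched)
#> [1] 2

# Removing a span from itself leaves nothing
length(span_setdiff(c(jan(1), jan(2)), c(jan(1), jan(2))))
#> [1] 0
```

An empty list plays the role of a <hole> here, and a list of two spans plays the role of a <phinterval> element with a gap.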

The union of two disjoint intervals is a single <phinterval> containing two spans, whereas lubridate::union() returns one interval that also covers the gap between them.

lubridate::union(jan_1_to_2, jan_4_to_9)
#> [1] 2000-01-01 UTC--2000-01-09 UTC
phint_union(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

As with the lubridate equivalents, all phinterval set operations are vectorized.

phint_intersect(
  c(jan_1_to_2, jan_3_to_5, jan_1_to_2),
  c(jan_1_to_9, jan_4_to_9, jan_4_to_9)
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-04--2000-01-05} <hole>
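
Element-wise intersection of single spans reduces to taking the later start and the earlier end of each pair. The base-R sketch below illustrates that idea under stated assumptions: span_intersect() is a hypothetical helper, not a phinterval function, it handles only one span per element, and it marks empty results with a separate flag because base R has no <hole> value.

```r
# Hypothetical helper (not part of phinterval): pairwise intersection of
# spans given as parallel Date vectors.
span_intersect <- function(x_start, x_end, y_start, y_end) {
  start <- pmax(x_start, y_start) # later of the two starts
  end <- pmin(x_end, y_end)       # earlier of the two ends

  # The spans share no time when the later start falls after the earlier end
  empty <- start > end
  start[empty] <- NA
  end[empty] <- NA

  data.frame(start = start, end = end, empty = empty)
}

jan <- function(day) as.Date(sprintf("2000-01-%02d", day))

# The same three pairs as in the phint_intersect() call above
result <- span_intersect(
  x_start = jan(c(1, 3, 1)), x_end = jan(c(2, 5, 2)),
  y_start = jan(c(1, 4, 4)), y_end = jan(c(9, 9, 9))
)
result$empty
#> [1] FALSE FALSE  TRUE
```

The first two rows hold 2000-01-01 to 2000-01-02 and 2000-01-04 to 2000-01-05, and the third is flagged as empty, in line with the phint_intersect() output. How phinterval treats two spans that meet at a single instant is not covered by this sketch.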

Inspiration

This package builds on {lubridate}’s <Interval> class for representing contiguous time spans. The prototype <phinterval> data structure (a list of matrices) and the C++ implementation of phint_squash() were inspired by the {intervals} package by Richard Bourgon and Edzer Pebesma. The figures used in this README were inspired by Davis Vaughan’s {ivs} package documentation.
