Description

Set Operations on Time Intervals.

Implements the phinterval vector class for representing time spans that may contain gaps (disjoint intervals) or be empty. This class generalizes the 'lubridate' package's interval class to support vectorized set operations (intersection, union, difference, complement) that always return a valid time span, even when disjoint or empty intervals are created.

phinterval

phinterval is a package for representing and manipulating time spans that may contain gaps. It implements the <phinterval> (think “potentially holey interval”) vector class, designed as an extension of the {lubridate} <Interval> class, to represent spans of time that are contiguous, disjoint, empty, or missing.

Functionality for manipulating these spans includes:

  • Merging overlapping or adjacent intervals into non-overlapping sets of time spans.
  • Performing set operations: union, intersection, difference, and complement.
  • Testing whether time spans, dates, or times fall within one another or overlap.

Installation

Install the released version from CRAN with:

install.packages("phinterval")

You can install the development version of phinterval from GitHub with:

# install.packages("pak")
pak::pak("EthanSansom/phinterval")

Usage

Each element of a <phinterval> vector is a set of non-overlapping and non-adjacent intervals. For scalar intervals (one span per element), phinterval() works like lubridate::interval():

library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Create scalar phintervals (equivalent to interval())
phinterval(
  start = ymd(c("2000-01-01", "2000-01-03", "2000-01-04")),
  end = ymd(c("2000-01-02", "2000-01-05", "2000-01-09"))
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-03--2000-01-05} {2000-01-04--2000-01-09}

To create phintervals with multiple disjoint spans per element, use the by argument to group intervals. Overlapping or adjacent spans within each group are automatically merged:

# Create a phinterval with disjoint spans using the by argument
phint <- phinterval(
  start = ymd(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = ymd(c("2000-01-05", "2000-01-02", "2000-01-09")),
  by = c(1, 2, 2)
)
phint
#> <phinterval<UTC>[2]>
#> [1] {2000-01-03--2000-01-05}                        
#> [2] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

(Figure: graphical representation of the elements of phint.)

In most cases, a <phinterval> vector will appear as the result of manipulating <Interval> vectors. For example, phint_squash() flattens a vector of time spans into a scalar <phinterval>.

jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))

ints <- c(jan_1_to_2, jan_3_to_5, jan_4_to_9)
phint_squash(ints)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-03--2000-01-09}

The squashed intervals contain the set of time spans within any of the input intervals, without duplication.
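The merging step can be illustrated with a small base R sketch. This is not the package's C++ implementation; `merge_spans()` is a hypothetical helper that assumes Date inputs and treats adjacent spans (one ends exactly where the next starts) as mergeable, mirroring the behaviour described above.

```r
# Hypothetical helper (not part of phinterval): merge overlapping or adjacent
# [start, end] Date spans into a minimal set of disjoint spans.
merge_spans <- function(start, end) {
  stopifnot(length(start) == length(end), length(start) > 0, all(start <= end))

  # Sort spans by their start date.
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  # Running maximum of the end points seen so far (as day counts).
  reach <- cummax(as.numeric(end))

  # A new merged span begins wherever a start lies strictly after
  # everything covered so far; equality means "adjacent", so it merges.
  new_span <- c(TRUE, as.numeric(start[-1]) > reach[-length(reach)])

  # Index of the last input span belonging to each merged span.
  last <- c(which(new_span)[-1] - 1L, length(start))

  data.frame(
    start = start[new_span],
    end = as.Date(reach[last], origin = "1970-01-01")
  )
}

merged <- merge_spans(
  start = as.Date(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = as.Date(c("2000-01-05", "2000-01-02", "2000-01-09"))
)
merged
```

For these inputs the sketch yields two rows, 2000-01-01 to 2000-01-02 and 2000-01-03 to 2000-01-09, the same spans that phint_squash(ints) reports.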

Example: Employment History

The phinterval package is most useful when working with tabular data, such as a longitudinal employment panel.

library(dplyr, warn.conflicts = FALSE)

jobs <- tribble(
  ~name,   ~job_title,             ~start,        ~end,
  "Greg",  "Mascot",               "2018-01-01",  "2018-06-03",
  "Greg",  "Executive Assistant",  "2018-06-10",  "2020-04-01",
  "Shiv",  "Political Consultant", "2017-01-01",  "2019-04-01"
)

employment <- jobs |>
  # Squash overlapping/adjacent intervals into a single phinterval
  group_by(name) |>
  summarize(employed = datetime_squash(ymd(start), ymd(end))) |>
  # Invert the employment timeline to find gaps
  mutate(unemployed = phint_invert(employed))

employment
#> # A tibble: 2 × 3
#>   name  employed                    unemployed              
#>   <chr> <phint<UTC>>                <phint<UTC>>            
#> 1 Greg  {2018-01-01-[2]-2020-04-01} {2018-06-03--2018-06-10}
#> 2 Shiv  {2017-01-01--2019-04-01}    <hole>

<phinterval> column formatting adapts to the available console width. The "[2]" in Greg’s employment interval "{2018-01-01-[2]-2020-04-01}" indicates that his employment history is made up of two disjoint spans, with the first span beginning on 2018-01-01 and the second ending on 2020-04-01. When more space is available, every span is shown explicitly.

employment |> select(name, employed)
#> # A tibble: 2 × 2
#>   name  employed                                        
#>   <chr> <phint<UTC>>                                    
#> 1 Greg  {2018-01-01--2018-06-03, 2018-06-10--2020-04-01}
#> 2 Shiv  {2017-01-01--2019-04-01}

Operations on <phinterval> vectors behave like those on standard intervals. Here, we can see that there was a 7-day gap in Greg’s employment history:

employment |>
  mutate(
    days_employed = employed / ddays(1),
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, days_employed, days_unemployed)
#> # A tibble: 2 × 3
#>   name  days_employed days_unemployed
#>   <chr>         <dbl>           <dbl>
#> 1 Greg            814               7
#> 2 Shiv            820               0
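phint_invert() only reports gaps between spans, which is why Shiv's unemployed column is a <hole>. To measure time not employed within a fixed observation window, one option is phint_setdiff(). The sketch below is illustrative only: the window dates are made up, the employment spans are rebuilt with phinterval() rather than taken from the tibble, and the window is repeated explicitly so that no assumption is made about how phint_setdiff() recycles its arguments.

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Rebuild the employment spans from the example (group 1 = Greg, 2 = Shiv).
employed <- phinterval(
  start = ymd(c("2018-01-01", "2018-06-10", "2017-01-01")),
  end = ymd(c("2018-06-03", "2020-04-01", "2019-04-01")),
  by = c(1, 1, 2)
)

# Hypothetical observation window, repeated once per person.
window <- interval(
  start = rep(ymd("2017-01-01"), 2),
  end = rep(ymd("2020-12-31"), 2)
)

# Time inside the window that is not covered by any job.
not_employed <- phint_setdiff(window, employed)
days_not_employed <- not_employed / ddays(1)
days_not_employed
```

Unlike the phint_invert() result, this also counts time before the first job and after the last one, as long as it falls inside the chosen window.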

phinterval <-> lubridate

The <phinterval> class is a generalization of the <Interval> class, meaning any <Interval> can be converted into an equivalent <phinterval> and all phinterval functions accept either <Interval> or <phinterval> inputs. The table below shows the lubridate functions that have drop-in phinterval replacements.

phinterval                lubridate               Returns
phinterval(start, end)    interval(start, end)    Spans bounded by start/end
phint_intersect(x, y)     intersect(x, y)         Times in x and y
phint_setdiff(x, y)       setdiff(x, y)           Times in x, but not in y
phint_union(x, y)         union(x, y)             Times in x or y
phint_start(x)            int_start(x)            The start time of x
phint_end(x)              int_end(x)              The end time of x
phint_length(x)           int_length(x)           The number of seconds in x
phint_overlaps(x, y)      int_overlaps(x, y)      Whether x and y intersect
phint_within(x, y)        x %within% y            Whether y contains x
x / duration(...)         x / duration(...)       How many durations fit in x
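As a quick illustration of the predicates and accessors listed in the table, the sketch below applies a few of them to plain <Interval> inputs. It redefines the intervals so that it can run on its own; the comments describe what each call asks rather than quoting printed output.

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))

# Do the two spans share any time? These two are separated by a gap.
disjoint_overlap <- phint_overlaps(jan_1_to_2, jan_4_to_9)

# These two share Jan 4 to Jan 5.
shared_overlap <- phint_overlaps(jan_3_to_5, jan_4_to_9)

# Is the first span entirely contained in the second?
contained <- phint_within(jan_3_to_5, jan_1_to_9)

# Length in seconds of a one-day span.
one_day_seconds <- phint_length(jan_1_to_2)

c(disjoint_overlap, shared_overlap, contained)
one_day_seconds
```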

Every phinterval set operation returns a valid <phinterval> for any pair of time spans, including results that lubridate cannot represent. For example, the intersection of two non-overlapping intervals is an empty time span, called a <hole>.

lubridate::intersect(jan_1_to_2, jan_4_to_9)
#> [1] NA--NA
phint_intersect(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] <hole>

The set-difference of a time span and itself is also a <hole>.

lubridate::setdiff(jan_1_to_2, jan_1_to_2)
#> [1] 2000-01-01 UTC--2000-01-02 UTC
phint_setdiff(jan_1_to_2, jan_1_to_2)
#> <phinterval<UTC>[1]>
#> [1] <hole>

Performing a set-difference may “punch a hole” in a time span, creating a discontinuous interval.

try(lubridate::setdiff(jan_1_to_9, jan_3_to_5))
#> Error in setdiff.Interval(jan_1_to_9, jan_3_to_5) : 
#>   Cases 1 result in discontinuous intervals.
phint_setdiff(jan_1_to_9, jan_3_to_5)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-03, 2000-01-05--2000-01-09}

The union of two disjoint intervals is a single <phinterval> element containing two spans, whereas lubridate::union() returns one interval that also covers the gap between them.

lubridate::union(jan_1_to_2, jan_4_to_9)
#> [1] 2000-01-01 UTC--2000-01-09 UTC
phint_union(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

As with the lubridate equivalents, all phinterval set operations are vectorized.

phint_intersect(
  c(jan_1_to_2, jan_3_to_5, jan_1_to_2),
  c(jan_1_to_9, jan_4_to_9, jan_4_to_9)
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-04--2000-01-05} <hole>
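Because the operations are vectorized, set identities can be checked element-wise. The sketch below tests inclusion-exclusion, length(x ∪ y) = length(x) + length(y) - length(x ∩ y), on the same inputs. It assumes that phint_length() reports 0 seconds for a <hole> and sums the spans of a disjoint element, which is consistent with the day counts in the employment example but is not stated explicitly above.

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))

x <- c(jan_1_to_2, jan_3_to_5, jan_1_to_2)
y <- c(jan_1_to_9, jan_4_to_9, jan_4_to_9)

# Seconds covered by the union of each pair.
union_seconds <- phint_length(phint_union(x, y))

# Seconds obtained by adding both lengths and removing the double-counted overlap.
incl_excl_seconds <- phint_length(x) + phint_length(y) -
  phint_length(phint_intersect(x, y))

identity_holds <- all(union_seconds == incl_excl_seconds)
identity_holds
```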

Inspiration

This package builds on {lubridate}’s <Interval> class for representing contiguous time spans. The prototype <phinterval> data structure (a list of matrices) and the C++ implementation of phint_squash() were inspired by the {intervals} package by Richard Bourgon and Edzer Pebesma. The figures used in this README were inspired by Davis Vaughan’s {ivs} package documentation.

Metadata

Version

1.0.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows