Description

Staggered Difference-in-Differences with Nonlinear Outcomes.

Supports staggered difference-in-differences designs with nonlinear outcomes for both panel and repeated cross-section data. Implements estimators for staggered treatment adoption with binary, count, and other nonlinear outcomes, extending Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001> to settings with nonlinear outcome models such as logit, probit, and Poisson. For panel data, units are followed over time and 'idname' identifies repeated observations. For repeated cross-section data, observations are independent within each time period; 'idname' is optional and may identify survey records or households, but the estimator does not require the same units to appear across periods. Repeated cross-section estimation includes pooled quasi-maximum likelihood approaches motivated by Wooldridge (2023) <doi:10.1093/ectj/utad016>, with optional weighting and clustered inference. Methods also draw on Roth and Sant'Anna (2023) <doi:10.3982/ECTA19402> and Sant'Anna and Zhao (2020) <doi:10.1016/j.jeconom.2020.06.003>.

NonlinearDiD

Staggered Difference-in-Differences with Nonlinear Outcomes — panel and repeated cross-section data


The Problem

The Callaway & Sant'Anna (2021) framework for staggered DiD is hugely influential, but its identifying assumption is parallel trends in the mean of the outcome itself, which suits continuous outcomes far better than binary or count outcomes.

For binary outcomes (employed/not, hospitalized/not, defaulted/not), this creates fundamental problems:

| Problem | Why it matters |
| --- | --- |
| Scale sensitivity | Parallel trends in P(Y=1) ≠ parallel trends in log-odds. Pre-trends can appear flat or steep depending on the scale. |
| Jensen's inequality | Treatment effects on the probability scale mix the "real" effect with curvature of the CDF. |
| Heterogeneous baseline rates | Units with different baseline probabilities will show "spurious" violations of parallel trends even under no treatment effect. |
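
The scale-sensitivity and baseline-rate problems can be seen with a few lines of base R. The sketch below does not use NonlinearDiD; the baseline rates and the common log-odds shift are made-up numbers chosen only for illustration. Two untreated groups follow exactly parallel paths in log-odds, yet a placebo DiD on the probability scale is clearly nonzero.

```r
# Two groups with different baseline rates. Both move by the same amount on
# the log-odds scale between period 0 and period 1, and nobody is treated.
baseline_prob <- c(control = 0.10, treated = 0.50)  # assumed period-0 rates
common_shift  <- 0.8                                # assumed log-odds trend

logit_t0 <- qlogis(baseline_prob)
logit_t1 <- logit_t0 + common_shift

prob_t0 <- plogis(logit_t0)
prob_t1 <- plogis(logit_t1)

trend_logit <- logit_t1 - logit_t0  # identical across groups by construction
trend_prob  <- prob_t1 - prob_t0    # differs because plogis() is nonlinear

# Placebo DiD on each scale: "treated" trend minus "control" trend
did_logit <- unname(trend_logit["treated"] - trend_logit["control"])
did_prob  <- unname(trend_prob["treated"]  - trend_prob["control"])

print(round(c(did_logit = did_logit, did_prob = did_prob), 3))
```

The log-odds DiD is zero while the probability-scale DiD is positive, purely because the two groups sit on different parts of the logistic curve.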

NonlinearDiD extends the CS2021 framework to properly handle logit, probit, Poisson, and negative binomial outcome models — for both panel and repeated cross-section data.

What's New in 0.2.0

  • Repeated cross-section support — different individuals each period (e.g. BRFSS, NHIS, CPS supplements). Set data_type = "repeated_cross_section"; idname becomes optional.
  • Sampling weights — pass weightsname = "wt" and the weight is threaded through the outcome regression, propensity score, and pooled QMLE.
  • Clustered inference — pass cluster_var = "state" for sandwich::vcovCL() analytical SEs and cluster-resampling bootstrap.
  • No more compilation — the Rcpp helpers from 0.1.0 are now pure R, so installation is one step on every platform.
  • All v0.1.0 functions and arguments preserved — existing scripts using named arguments continue to work unchanged.

Installation

# CRAN
install.packages("NonlinearDiD")

# GitHub (development version)
remotes::install_github("causalfragility-lab/NonlinearDiD")

Quick Start: Panel Data

library(NonlinearDiD)

# 1. Simulate staggered binary panel data
dat <- sim_binary_panel(n = 500, nperiods = 8, n_cohorts = 3,
                        prop_treated = 0.5, true_att = 0.25, seed = 42)

# 2. Estimate ATT(g,t) with logistic outcome model
res <- nonlinear_attgt(
  data          = dat,
  yname         = "y",
  tname         = "period",
  idname        = "id",
  gname         = "g",
  xformla       = ~ x1 + x2,
  outcome_model = "logit",
  estimand      = "att",
  control_group = "nevertreated",
  doubly_robust = TRUE
)

# 3. Aggregate into event-study
agg <- nonlinear_aggte(res, type = "dynamic")
plot(agg)

# 4. Pre-treatment parallel trends test
nonlinear_pretest(res)
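
For intuition about step 4, here is a minimal base-R sketch of a joint and individual test on pre-treatment event-study estimates. It is not the code behind nonlinear_pretest(), whose internals are not documented here; the estimates, standard errors, and the assumption of uncorrelated estimates are all hypothetical.

```r
# Hypothetical pre-treatment event-study estimates (event times -3, -2, -1)
pre_est  <- c(e_m3 = 0.012, e_m2 = -0.008, e_m1 = 0.015)
# Hypothetical covariance matrix; a diagonal matrix assumes the estimates
# are uncorrelated, which real event-study estimates usually are not.
pre_vcov <- diag(c(0.010, 0.012, 0.011)^2)

# Individual z-tests, one per pre-treatment period
z_stat  <- pre_est / sqrt(diag(pre_vcov))
p_indiv <- 2 * pnorm(-abs(z_stat))

# Joint Wald test of H0: all pre-treatment effects are zero
wald_stat <- drop(t(pre_est) %*% solve(pre_vcov) %*% pre_est)
p_joint   <- pchisq(wald_stat, df = length(pre_est), lower.tail = FALSE)

print(round(p_indiv, 3))
print(round(c(wald = wald_stat, p_joint = p_joint), 3))
```

With a full covariance matrix in place of the diagonal one, the same Wald formula accounts for correlation between periods.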

Quick Start: Repeated Cross-Section Data

library(NonlinearDiD)

# 1. Simulate repeated cross-section binary data
rcs <- sim_binary_rcs(n_per_period = 500, nperiods = 8,
                      prop_treated = 0.5, true_att = 0.3, seed = 7)

# 2. Estimate ATT(g,t) — note the new data_type argument
res <- nonlinear_attgt(
  data          = rcs,
  yname         = "y",
  tname         = "period",
  gname         = "g",
  outcome_model = "logit",
  estimand      = "ape",
  data_type     = "repeated_cross_section",
  control_group = "notyetreated"
)

plot(nonlinear_aggte(res, type = "dynamic"))

Survey-Weighted Real-World Example

Repeated cross-section survey data (e.g. CPS Food Security Supplement) with sampling weights and state-level clustering:

res <- nonlinear_attgt(
  data          = snap_data,
  yname         = "food_insecure",
  tname         = "year",
  gname         = "policy_end_year",
  idname        = "household_id",    # optional — used only as a record ID
  data_type     = "repeated_cross_section",
  outcome_model = "logit",
  estimand      = "ape",
  weightsname   = "survey_weight",
  cluster_var   = "state",
  control_group = "notyetreated"
)

summary(res)
nonlinear_aggte(res, type = "dynamic")

Key Functions

| Function | Description |
| --- | --- |
| nonlinear_attgt() | Main engine: estimates ATT(g,t) for all cohort × time cells. Panel and repeated cross-section. |
| nonlinear_aggte() | Aggregates ATT(g,t) into event-study, group, calendar, or overall ATT |
| nonlinear_pretest() | Tests pre-treatment parallel trends (joint + individual + HonestDiD) |
| binary_did_logit() | Simple 2×2 DiD with logistic outcome |
| binary_did_probit() | Simple 2×2 DiD with probit outcome |
| binary_did_dr() | Doubly-robust binary DiD |
| count_did_poisson() | Poisson QMLE DiD for count outcomes (Wooldridge 2023) |
| odds_ratio_did() | Odds-ratio DiD estimator |
| nonlinear_bounds() | Nonparametric Manski / PT bounds |
| sim_binary_panel() | Simulate binary panel data for testing |
| sim_binary_rcs() | Simulate binary repeated cross-section data (new in 0.2.0) |
| sim_count_panel() | Simulate count panel data for testing |
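
To make the event-study aggregation concrete, the following base-R sketch averages cohort-by-period effects within each event time, weighting by cohort size as in the dynamic aggregation of Callaway & Sant'Anna (2021). The data frame, its column names, and all numbers are hypothetical, and whether nonlinear_aggte() uses exactly these weights is an assumption rather than something stated above.

```r
# Hypothetical ATT(g,t) estimates: one row per cohort g and period t
attgt <- data.frame(
  g   = c(3, 3, 3, 5, 5, 5),
  t   = c(3, 4, 5, 5, 6, 7),
  att = c(0.10, 0.14, 0.18, 0.20, 0.26, 0.30),
  n_g = c(100, 100, 100, 300, 300, 300)  # cohort sizes (made up)
)

# Event time = periods elapsed since the cohort was first treated
attgt$event_time <- attgt$t - attgt$g

# Within each event time, average ATT(g,t) with weights equal to cohort size
by_e <- split(attgt, attgt$event_time)
dynamic <- data.frame(
  event_time = as.numeric(names(by_e)),
  att        = vapply(by_e, function(d) weighted.mean(d$att, d$n_g), numeric(1))
)
print(dynamic)
```

The larger cohort dominates each event-time average, which is why cohort composition matters when reading an event-study plot.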

Estimands

| Estimand | Scale | When to use |
| --- | --- | --- |
| "att" | Link scale (log-odds / probit index / log-count) | Compare with linear DiD coefficient |
| "ape" | Probability scale | What practitioners usually report |
| "odds_ratio" | Multiplicative | Scale-free; natural for 2×2 tables |

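The three scales in the Estimands table describe the same effect in different units. As a rough base-R illustration for a logit model and a single cell, with a made-up counterfactual probability and a made-up link-scale effect, the conversion looks like this. The package presumably averages such quantities over units and covariates; that part is not reproduced here.

```r
# Assumed inputs for one post-treatment cell (illustrative numbers only)
p0_counterfactual <- 0.30  # untreated probability for the treated group
effect_link       <- 0.50  # treatment effect on the log-odds scale

# Treated probability implied by shifting the log-odds by effect_link
p1_treated <- plogis(qlogis(p0_counterfactual) + effect_link)

estimands <- c(
  link_scale = effect_link,                     # log-odds difference
  prob_scale = p1_treated - p0_counterfactual,  # probability difference
  odds_ratio = (p1_treated / (1 - p1_treated)) /
               (p0_counterfactual / (1 - p0_counterfactual))
)
print(round(estimands, 4))
```

The odds ratio equals exp(effect_link) whatever the baseline, whereas the probability-scale effect changes with p0_counterfactual.
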
Panel vs Repeated Cross-Section

NonlinearDiD supports staggered difference-in-differences designs with nonlinear outcomes for both panel and repeated cross-section data.

  • Panel data (data_type = "panel", default): units are followed over time and idname identifies repeated observations. Estimation uses within-unit outcome changes following Callaway & Sant'Anna (2021).

  • Repeated cross-section data (data_type = "repeated_cross_section"): observations are independent within each time period. idname is optional and may identify survey records or households, but the estimator does not require the same units to appear across periods. Estimation uses pooled quasi-maximum likelihood approaches motivated by Wooldridge (2023), with an optional IPW-augmented doubly-robust variant.
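
As a rough picture of the pooled approach for repeated cross-sections, the sketch below fits a pooled logit to a simulated 2×2 design using only base R. It is written in the spirit of Wooldridge (2023) and is not the package's implementation: it has no covariates, no staggering, no weights, and no IPW augmentation, and every variable name and parameter value is an assumption made for the example.

```r
set.seed(1)
n <- 20000
# Repeated cross-section: a fresh sample each period, no unit identifiers
rcs2x2 <- data.frame(
  treated = rbinom(n, 1, 0.5),  # member of the eventually-treated group
  post    = rbinom(n, 1, 0.5)   # sampled in the post-treatment period
)
# Simulated truth: parallel trends in log-odds, effect of 0.7 on that scale
index <- -1 + 0.5 * rcs2x2$treated + 0.3 * rcs2x2$post +
  0.7 * rcs2x2$treated * rcs2x2$post
rcs2x2$y <- rbinom(n, 1, plogis(index))

# Pooled logit on all observations; the interaction is the link-scale effect
fit <- glm(y ~ treated * post, family = binomial(), data = rcs2x2)
b <- coef(fit)
att_link <- unname(b["treated:post"])

# Probability-scale effect for the treated group in the post period:
# fitted probability minus the counterfactual with the interaction switched off
p_observed       <- plogis(b["(Intercept)"] + b["treated"] + b["post"] +
                             b["treated:post"])
p_counterfactual <- plogis(b["(Intercept)"] + b["treated"] + b["post"])
att_prob <- unname(p_observed - p_counterfactual)

print(round(c(att_link = att_link, att_prob = att_prob), 3))
```

Because no unit is tracked over time, identification comes entirely from group-by-period cell means, which is why idname is not needed in this setting.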

Outcome Models

| outcome_model | Parallel Trends Assumption | Outcome Type |
| --- | --- | --- |
| "logit" | Parallel in log-odds | Binary (0/1) |
| "probit" | Parallel in probit index | Binary (0/1) |
| "poisson" | Parallel in log-count | Count (≥ 0) |
| "negbin" | Parallel in log-count | Overdispersed count |
| "linear" | Parallel in mean (LPM) | Continuous / binary |

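To see what "parallel in log-count" implies compared with "parallel in mean", consider a 2×2 table of made-up cell means. In a saturated Poisson regression the interaction coefficient is the log of the ratio-of-ratios, and the implied counterfactual differs from the one a linear DiD would use. This base-R sketch is illustrative only and does not call NonlinearDiD.

```r
# Made-up cell means for a 2x2 count design (group x period)
cells <- data.frame(
  treated = c(0, 0, 1, 1),
  post    = c(0, 1, 0, 1),
  mean_y  = c(2, 3, 4, 9)
)

# A saturated Poisson fit reproduces the four cell means, so the
# exponentiated interaction is the ratio-of-ratios of those means.
fit <- glm(mean_y ~ treated * post, family = poisson(), data = cells)
ratio_of_ratios <- unname(exp(coef(fit)["treated:post"]))

# Helper to look up one cell mean
m <- function(tr, po) cells$mean_y[cells$treated == tr & cells$post == po]

cf_log_scale   <- m(1, 0) * m(0, 1) / m(0, 0)  # parallel trends in log-count
cf_level_scale <- m(1, 0) + m(0, 1) - m(0, 0)  # parallel trends in levels

effects <- c(
  ratio_of_ratios = ratio_of_ratios,
  effect_log_pt   = m(1, 1) - cf_log_scale,
  effect_level_pt = m(1, 1) - cf_level_scale
)
print(round(effects, 3))
```

The two assumptions give different counterfactuals for the same data, so the choice of outcome_model is a substantive identifying assumption and not only a fitting convenience.
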
References

  • Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001
  • Roth, J., & Sant'Anna, P. H. C. (2023). When is parallel trends sensitive to functional form? Econometrica, 91(2), 737–747. https://doi.org/10.3982/ECTA19402
  • Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66. https://doi.org/10.1093/ectj/utad016
  • Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101–122. https://doi.org/10.1016/j.jeconom.2020.06.003
  • Manski, C. F. (1990). Nonparametric bounds on treatment effects. American Economic Review, 80(2), 319–323.

Contributing

This package addresses an active research frontier. Contributions, bug reports, and methodological suggestions are welcome — please open an issue or pull request on GitHub.

License

MIT © 2026 Subir Hait.

Metadata

Version

0.2.0

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows