Aggregation Trees
R package to implement aggregation trees, a nonparametric approach to discovering heterogeneous subgroups in a selection-on-observables framework.
The approach consists of three steps:
- Estimate the conditional average treatment effects (CATEs);
- Approximate the CATEs by a decision tree;
- Prune the tree.
This way, we generate a sequence of groupings, one for each level of granularity. The sequence is nested, in the sense that subgroups formed at a given granularity level are never broken at coarser levels. This guarantees consistency of the results across granularity levels, generally considered a basic requirement of any classification system. Moreover, each grouping is optimal in that it minimizes the loss in explained heterogeneity resulting from aggregation.
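The three steps and the nested pruning sequence can be sketched on simulated data. The snippet below is illustrative rather than the package's own API: it uses a simple two-regression (T-learner) CATE estimator with lm() (in practice, flexible learners such as causal forests are preferable) and rpart for the tree; all variable names are made up for the example.

```r
# Illustrative sketch of the three steps, assuming only base R and rpart.
library(rpart)
set.seed(1)

n <- 2000
X <- data.frame(x1 = runif(n), x2 = runif(n))
D <- rbinom(n, 1, 0.5)                # randomized binary treatment
tau <- ifelse(X$x1 > 0.5, 2, 0)       # true heterogeneous effect
Y <- X$x2 + tau * D + rnorm(n)

## Step 1: estimate the CATEs (here, a simple T-learner with lm).
fit1 <- lm(Y ~ x1 + x2, data = X, subset = D == 1)
fit0 <- lm(Y ~ x1 + x2, data = X, subset = D == 0)
df <- data.frame(cate = predict(fit1, X) - predict(fit0, X), X)

## Step 2: approximate the CATEs with a decision tree.
tree <- rpart(cate ~ x1 + x2, data = df)

## Step 3: prune. Each cp value on the cost-complexity path yields one
## grouping; larger cp values collapse leaves without breaking groups
## formed at finer levels, so the groupings are nested.
cps <- tree$cptable[, "CP"]
n_groups <- sapply(cps, function(cp) {
  sum(prune(tree, cp = cp)$frame$var == "<leaf>")
})
n_groups  # number of subgroups at each granularity level
```

Each element of `n_groups` corresponds to one granularity level; coarser levels (larger cp) always have weakly fewer subgroups.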
Given the sequence of groupings, we can estimate the group average treatment effects (GATEs) in several ways. The package supports two estimators: one based on differences in mean outcomes between treated and control units (unbiased in randomized experiments), and one based on sample averages of doubly-robust scores (unbiased also in observational studies). The package also allows users to obtain standard errors for the GATEs by estimating appropriate linear models via OLS. Under an "honesty" condition, the estimated standard errors can be used to conduct valid inference about the GATEs as usual, e.g., by constructing conventional confidence intervals.
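The difference-in-means GATE estimator and its OLS standard errors can be sketched as follows. This is a hedged, self-contained illustration (not the package's own functions): the tree is grown on one half of the data and the GATEs are estimated on the other half, mimicking an honest sample split; all names are illustrative.

```r
# Sketch of GATE estimation via OLS with leaf dummies, assuming a
# randomized experiment and an honest sample split. Illustrative only.
library(rpart)
set.seed(2)

n <- 2000
x1 <- runif(n); x2 <- runif(n)
D <- rbinom(n, 1, 0.5)
Y <- x2 + ifelse(x1 > 0.5, 2, 0) * D + rnorm(n)
df <- data.frame(Y, D, x1, x2)

train <- sample(n, n / 2)             # honest split
tr <- df[train, ]; est <- df[-train, ]

## Grow a shallow tree approximating the CATEs on the training half
## (simple T-learner CATEs with lm, for brevity).
fit1 <- lm(Y ~ x1 + x2, data = tr, subset = D == 1)
fit0 <- lm(Y ~ x1 + x2, data = tr, subset = D == 0)
tr$cate <- predict(fit1, tr) - predict(fit0, tr)
tree <- rpart(cate ~ x1 + x2, data = tr, maxdepth = 2)

## Assign estimation-half units to leaves (distinct fitted values
## identify leaves for rpart regression trees).
est$leaf <- factor(predict(tree, est))

## GATEs: regress Y on leaf dummies plus leaf-by-treatment interactions.
## The coefficients on leaf:D are the within-leaf difference-in-means
## GATEs, with conventional OLS standard errors.
ols <- lm(Y ~ 0 + leaf + leaf:D, data = est)
summary(ols)$coefficients
```

Because the leaves are fixed before touching the estimation half, the usual OLS standard errors on the leaf:D coefficients support conventional confidence intervals for the GATEs.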
To get started, please check the online vignette for a short tutorial.
Installation
The package can be downloaded from CRAN:
install.packages("aggTrees")
Alternatively, the current development version of the package can be installed using the devtools package:
devtools::install_github("riccardo-df/aggTrees") # run install.packages("devtools") if needed.
References
Athey, S., & Imbens, G. W. (2016). Recursive Partitioning for Heterogeneous Causal Effects. Proceedings of the National Academy of Sciences, 113(27). [paper]
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2). [paper]
Chernozhukov, V., Demirer, M., Duflo, E., & Fernandez-Val, I. (2017). Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments. National Bureau of Economic Research. [paper]
Cotterman, R., & Peracchi, F. (1992). Classification and Aggregation: An Application to Industrial Classification in CPS Data. Journal of Applied Econometrics, 7(1). [paper]
Di Francesco, R. (2022). Aggregation Trees. CEIS Research Paper, 546. [paper]
Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics, 6(2). [paper]
Semenova, V., & Chernozhukov, V. (2021). Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions. The Econometrics Journal, 24(2). [paper]