This paper introduces aggregate Bayesian Causal Forests (aBCF), a new Bayesian model for causal inference using aggregated data. Aggregated data are common in policy evaluations where we observe individuals, such as students, but participation in an intervention is determined at a higher level of aggregation, such as schools implementing a curriculum. Interventions often involve millions of individuals but far fewer higher-level units, which makes aggregation computationally attractive. To analyze aggregated data, a model must account for heteroskedasticity and intraclass correlation (ICC). Like Bayesian Causal Forests (BCF), aBCF estimates heterogeneous treatment effects with minimal parametric assumptions, but it also accounts for these features of aggregated data, improving estimation of both average effects and aggregate unit-specific effects. After introducing the aBCF model, we demonstrate via simulation that aBCF outperforms BCF on aggregated data. We anchor our simulation on an evaluation of a large-scale Medicare primary care model. We demonstrate that aBCF produces treatment effect estimates with lower root mean squared error and narrower uncertainty intervals while achieving the same level of coverage. We show that aBCF is not sensitive to the choice of prior distribution and that its estimation improvements relative to BCF decline as the ICC approaches one. Code is available at https://github.com/mathematica-mpr/bcf-1.
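Why aggregated data are heteroskedastic and intraclass-correlated can be illustrated with a textbook random-intercept model. The sketch below is not the aBCF likelihood. It assumes a hypothetical Gaussian model y_ij = mu + u_j + e_ij, with unit effects u_j ~ N(0, tau2) and individual noise e_ij ~ N(0, sigma2). Under that assumption the mean of unit j, computed from n_j individuals, has variance tau2 + sigma2 / n_j. That variance changes with unit size (heteroskedasticity), and rho = tau2 / (tau2 + sigma2) is the ICC. All function names are hypothetical and are not part of the released code.

```python
import math
import random
import statistics


def aggregated_variance(sigma2: float, tau2: float, n: int) -> float:
    """Variance of a unit mean under the assumed random-intercept model:
    y_ij = mu + u_j + e_ij, u_j ~ N(0, tau2), e_ij ~ N(0, sigma2)."""
    return tau2 + sigma2 / n


def icc(sigma2: float, tau2: float) -> float:
    """Intraclass correlation: share of total variance that lies between units."""
    return tau2 / (tau2 + sigma2)


def variance_from_icc(total_var: float, rho: float, n: int) -> float:
    """The same unit-mean variance rewritten in terms of total variance and ICC:
    total_var * (rho + (1 - rho) / n). When rho -> 1 it no longer depends on n."""
    return total_var * (rho + (1.0 - rho) / n)


def simulate_unit_means(n_units: int, n_per_unit: int, sigma2: float,
                        tau2: float, seed: int = 0) -> list[float]:
    """Draw individual outcomes for each unit and return the per-unit means
    (i.e., the aggregated data a unit-level analysis would see)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_units):
        u = rng.gauss(0.0, math.sqrt(tau2))  # shared unit-level effect
        ys = [u + rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n_per_unit)]
        means.append(statistics.fmean(ys))
    return means


if __name__ == "__main__":
    # Small units have noisier means than large units (heteroskedasticity).
    print(aggregated_variance(1.0, 0.25, 4), aggregated_variance(1.0, 0.25, 100))
    # Empirical variance of simulated unit means should be near tau2 + sigma2 / n.
    means = simulate_unit_means(2000, 10, sigma2=1.0, tau2=0.25)
    print(statistics.variance(means), aggregated_variance(1.0, 0.25, 10))
```

Under this toy model, as rho approaches one the variance of a unit mean stops depending on n_j, so unequal unit sizes no longer induce heteroskedasticity.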