A Tale of Two Panel Data Regressions

A central goal in social science is to evaluate the causal effect of a policy. In this pursuit, researchers often organize their observations in a panel data format, where a subset of units is exposed to a policy (treatment) for some time periods while the remaining units are unaffected (control). The spread of information across time and space motivates two general approaches to estimating and inferring causal effects: (i) unconfoundedness, which exploits time series patterns, and (ii) synthetic controls, which exploit cross-sectional patterns. Although conventional wisdom holds that the two approaches are fundamentally different, we show that they yield numerically identical estimates under several popular settings, which we term the symmetric class. We study the two approaches for this class under a generalized regression framework and argue that valid inference relies on both correlation patterns. Accordingly, we construct a mixed confidence interval that captures the uncertainty across both time and space. Using data-inspired simulations and empirical applications, we illustrate its advantages over inference procedures that account for only one dimension. Building on these insights, we advocate for panel data agnostic (PANDA) regression, which is rooted in model checking and based on symmetric estimators and mixed confidence intervals, when the data generating process is unknown.
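The claim that the time-series and cross-sectional approaches can give numerically identical estimates can be checked in a small case. The sketch below assumes NumPy, a single treated unit, N0 control units observed for T0 pre-treatment periods (matrix Y0), the controls' outcomes at one post-treatment period T (vector y_T), and the treated unit's pre-treatment outcomes (vector y_N); all of these names and the helper functions are hypothetical, chosen for illustration. The horizontal regression (unconfoundedness style) treats units as samples and periods as features; the vertical regression (synthetic control style) treats periods as samples and units as features. With minimum-norm least squares, or its rank-truncated principal component regression variant, both predictions reduce to the same bilinear form y_N' pinv(Y0) y_T, because pinv(Y0.T) equals pinv(Y0).T. The vertical fit here is unconstrained, dropping the nonnegativity and sum-to-one weights of the classic synthetic control estimator, and it is an assumption of this sketch, not a claim of the abstract, that these two estimators illustrate the symmetric class.

```python
import numpy as np


def pinv_truncated(A, rank=None):
    """Moore-Penrose pseudoinverse of A, optionally keeping only the top `rank`
    singular values (hypothetical helper; rank=None gives min-norm OLS)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if rank is None:
        # Keep every singular value above a standard numerical tolerance.
        tol = s.max() * max(A.shape) * np.finfo(float).eps
        rank = int((s > tol).sum())
    inv_s = 1.0 / s[:rank]
    # V_r diag(1/s_r) U_r^T, which has the shape of A.T
    return (Vt[:rank].T * inv_s) @ U[:, :rank].T


def horizontal_regression(Y0, y_T, y_N, rank=None):
    """Time-series (unconfoundedness-style) fit: regress y_T on Y0 across
    control units, then apply the coefficients to the treated unit's history."""
    alpha = pinv_truncated(Y0, rank) @ y_T  # shape (T0,)
    return y_N @ alpha


def vertical_regression(Y0, y_T, y_N, rank=None):
    """Cross-sectional (unconstrained synthetic-control-style) fit: regress y_N
    on Y0.T across periods, then weight the controls' outcomes at time T."""
    beta = pinv_truncated(Y0.T, rank) @ y_N  # shape (N0,)
    return y_T @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N0, T0 = 20, 8  # simulated sizes, not taken from any real dataset
    Y0 = rng.normal(size=(N0, T0))
    y_T = rng.normal(size=N0)
    y_N = rng.normal(size=T0)
    for rank in (None, 3):
        h = horizontal_regression(Y0, y_T, y_N, rank)
        v = vertical_regression(Y0, y_T, y_N, rank)
        print(rank, h, v, np.isclose(h, v))
```

The two fits estimate different coefficient vectors (alpha has T0 entries, beta has N0), yet their counterfactual predictions agree up to floating-point error whether N0 exceeds T0 or not. Adding constraints or penalties to only one side would generally break this symmetry.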
