Consider a panel data setting with observations of $N$ units over $T$ time steps. Each of the $N$ units undergoes exactly one of $D$ interventions at time step $T_0$, with $1 \le T_0 < T$, prior to which all units experience no intervention, i.e., control. We present a causal framework, synthetic interventions (SI), to estimate the counterfactual outcome of each unit under each of the $D$ interventions, averaged over the post-intervention time period. We prove identification of the causal parameter of interest under a latent factor model across time, units, and interventions. We furnish an algorithm to estimate the causal parameter, which utilizes principal component regression (PCR) as a key subroutine. We argue that PCR implicitly de-noises the observations, which are corrupted by idiosyncratic measurement error, and thus advocate for its usage in panel data settings. Formally, we establish consistency and asymptotic normality of the estimated causal parameter. We then compare our assumptions and results with those in the synthetic control (SC) literature. In doing so, we establish identification and inference results for SC as well. We further introduce a novel hypothesis test, with provable guarantees, to validate when to use SI (and thereby SC). Empirically, we showcase the efficacy of the SI framework on synthetic and real-world data. Finally, we discuss connections between the SI causal framework and tensor estimation.
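To make the PCR subroutine concrete, here is a minimal sketch of how a synthetic-interventions-style estimate could be computed for one target unit and one intervention $d$. It is not the authors' algorithm. It assumes the following:

- The "donors" are the units that actually received intervention $d$.
- Weights are learned by regressing the target unit's pre-intervention (control) outcomes on the donors' pre-intervention outcomes, using principal component regression with a user-chosen rank `k`.
- Those weights are then applied to the donors' post-intervention outcomes under $d$ and averaged over the post-intervention period.

The function names `pcr_weights` and `synthetic_intervention` are hypothetical, and only NumPy's SVD is used.

```python
import numpy as np


def pcr_weights(X, y, k):
    """Principal component regression (illustrative sketch).

    Regress y on the rank-k truncated SVD of X, i.e. apply the pseudo-inverse
    of the best rank-k approximation of X. Hard-thresholding the spectrum is
    what implicitly de-noises X. Assumes 1 <= k <= rank(X).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    return Vt_k.T @ ((U_k.T @ y) / s_k)


def synthetic_intervention(y_target_pre, Y_donor_pre, Y_donor_post, k):
    """Hypothetical SI-style estimate of the target's time-averaged outcome under d.

    y_target_pre : (T0,)              target unit's pre-intervention outcomes
    Y_donor_pre  : (n_donors, T0)     donors' pre-intervention outcomes
    Y_donor_post : (n_donors, T - T0) donors' post-intervention outcomes under d
    """
    # Columns of X are the donors' pre-intervention trajectories.
    X = Y_donor_pre.T
    w = pcr_weights(X, y_target_pre, k)
    # Weighted combination of donors' post-intervention outcomes, averaged over time.
    return float(np.mean(Y_donor_post.T @ w))


# Usage on noiseless rank-1 data: the target's latent factor is 2, so its
# counterfactual post-intervention outcomes under d are 2 * b = [10, 12].
u = np.array([1.0, 2.0, 3.0])        # donor latent factors
a = np.array([1.0, 2.0, 3.0, 4.0])   # pre-intervention time factors
b = np.array([5.0, 6.0])             # post-intervention time factors under d
estimate = synthetic_intervention(2 * a, np.outer(u, a), np.outer(u, b), k=1)
```

Choosing `k` equal to the (assumed known) latent rank keeps only the dominant singular directions. In this toy example that recovers the exact answer, 11.0.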