The synthetic control (SC) method is a popular approach for estimating treatment effects from observational panel data. It rests on a crucial assumption that we can write the treated unit as a linear combination of the untreated units. This linearity assumption, however, can be unlikely to hold in practice and, when violated, the resulting SC estimates are incorrect. In this paper we examine two questions: (1) How large can the misspecification error be? (2) How can we limit it? First, we provide theoretical bounds to quantify the misspecification error. The bounds are comforting: small misspecifications induce small errors. With these bounds in hand, we then develop new SC estimators that are specially designed to minimize misspecification error. The estimators are based on additional data about each unit, which is used to produce the SC weights. (For example, if the units are countries then the additional data might be demographic information about each.) We study our estimators on synthetic data; we find they produce more accurate causal estimates than standard synthetic controls. We then re-analyze the California tobacco-program data of the original SC paper, now including additional data from the US census about per-state demographics. Our estimators show that the observations in the pre-treatment period lie within the bounds of misspecification error, and that the observations post-treatment lie outside of those bounds. This is evidence that our SC methods have uncovered a true effect.
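To make the linearity assumption concrete, the following is a minimal sketch of the idea that the treated unit can be written as a linear combination of the untreated units. It is not the estimator proposed in this paper: it uses plain unconstrained least squares on toy data, whereas standard SC typically also constrains the weights to be non-negative and to sum to one. The function names, array shapes, and simulated data are all illustrative assumptions.

```python
import numpy as np


def fit_sc_weights(Y0_pre, y1_pre):
    """Least-squares weights w such that Y0_pre @ w approximates y1_pre.

    Y0_pre: (T_pre, J) pre-treatment outcomes of the J untreated units.
    y1_pre: (T_pre,)   pre-treatment outcomes of the treated unit.
    Assumption: unconstrained least squares; standard SC usually adds
    non-negativity and sum-to-one constraints on w.
    """
    w, *_ = np.linalg.lstsq(Y0_pre, y1_pre, rcond=None)
    return w


def estimate_effect(Y0_post, y1_post, w):
    """Gap between the observed treated outcomes and the synthetic control."""
    return y1_post - Y0_post @ w


# Hypothetical toy data in which the linearity assumption holds exactly.
rng = np.random.default_rng(0)
T_pre, T_post, J = 20, 5, 3
Y0 = rng.normal(size=(T_pre + T_post, J))   # untreated units
true_w = np.array([0.5, 0.3, 0.2])
true_effect = 2.0

y1 = Y0 @ true_w                            # treated unit, no misspecification
y1[T_pre:] += true_effect                   # treatment shifts post-period outcomes

w_hat = fit_sc_weights(Y0[:T_pre], y1[:T_pre])
effect_hat = estimate_effect(Y0[T_pre:], y1[T_pre:], w_hat)
```

In this noiseless toy setting the fitted weights recover the generating weights and the post-treatment gap equals the simulated effect. If the treated unit were not exactly linear in the untreated units, the gap would also absorb misspecification error, which is the quantity the bounds described above are meant to control.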