Randomized controlled trials generate experimental variation that can credibly identify causal effects, but they often suffer from limited scale; observational datasets are large, but they often violate the assumptions required for identification. To improve estimation efficiency, I propose a method that combines experimental and observational datasets when 1) units from the two datasets are similar and 2) some characteristics of these units are observed. I show that if these characteristics partially explain treatment assignment in the observational data, they can be used to derive moment restrictions that, in combination with the experimental data, improve estimation efficiency. I outline three estimators (weighting, shrinkage, and generalized method of moments (GMM)) for implementing this strategy, and show that my methods can reduce variance by up to 50% in typical experimental designs; in that case, only half of the experimental sample is required to attain the same statistical precision. If researchers are allowed to design experiments differently, I show that they can further improve precision by directly leveraging this correlation between characteristics and assignment. I apply my method to a search listing dataset from Expedia to study the causal effect of search rankings, and show that the method can substantially improve precision.
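
The link between a 50% variance reduction and a halved sample can be made explicit with a short back-of-the-envelope calculation. This is only a sketch under an assumption not stated above: that the variance of both estimators scales inversely with the experimental sample size $n$. The symbols $\hat{\tau}_{\text{exp}}$ (experiment-only estimator), $\hat{\tau}_{\text{comb}}$ (combined estimator), $\sigma^2$, and $r$ (fractional variance reduction) are introduced here purely for illustration.

```latex
% Assumed scaling: both variances are proportional to 1/n.
\[
  \operatorname{Var}\bigl(\hat{\tau}_{\text{exp}}\bigr) = \frac{\sigma^2}{n},
  \qquad
  \operatorname{Var}\bigl(\hat{\tau}_{\text{comb}}\bigr) = (1 - r)\,\frac{\sigma^2}{n},
  \qquad 0 \le r < 1 .
\]
% Sample size n' at which the combined estimator matches the
% precision of the experiment-only estimator with n units:
\[
  (1 - r)\,\frac{\sigma^2}{n'} = \frac{\sigma^2}{n}
  \quad\Longrightarrow\quad
  n' = (1 - r)\,n ,
  \qquad\text{so } r = 0.5 \text{ gives } n' = \tfrac{n}{2}.
\]
```

Under this scaling, any smaller reduction $r$ translates into a proportionally smaller saving in experimental sample size.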