Scientists often want to generalize causal effect estimates from an experiment to a target population. However, researchers are usually constrained by the covariate information available. Analysts can often collect far fewer variables from population samples than from experimental samples, which limits the applicability of existing approaches that assume rich covariate data in both. In this article, we examine how to select the covariates necessary for generalizing experimental results under such data constraints. In our concrete context of a large-scale development program in Uganda, more than 40 pre-treatment covariates are available in the experiment, but only 8 of them were also measured in the target population. We propose a method to estimate a separating set -- a set of variables affecting both the sampling mechanism and treatment effect heterogeneity -- and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. Our algorithm requires a rich set of covariates only in the experimental data, not in the target population, and it incorporates researcher-specific constraints on which variables are measured in the population data. Analyzing the development experiment in Uganda, we show that the proposed algorithm enables PATE estimation in situations where conventional methods fail because their data requirements are not met.
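To make "adjusting for a separating set" concrete, the following is a minimal sketch, not the algorithm proposed in the article. It assumes the separating set has already been chosen and collapses to a single discrete stratum label measured in both samples, and it uses a plain post-stratification estimator: stratum-specific differences in means from the experiment, reweighted by the stratum shares in the population sample. The function name `poststratified_pate` and the toy data are hypothetical.

```python
from collections import defaultdict


def poststratified_pate(experiment, population_strata):
    """Post-stratification estimate of the PATE (illustrative assumption).

    experiment: iterable of (stratum, treated, outcome) tuples.
    population_strata: iterable of stratum labels from the population sample.
    """
    # Per stratum: [sum treated, n treated, sum control, n control].
    cells = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for stratum, treated, outcome in experiment:
        cell = cells[stratum]
        if treated:
            cell[0] += outcome
            cell[1] += 1
        else:
            cell[2] += outcome
            cell[3] += 1

    # Stratum counts in the target population sample.
    population_strata = list(population_strata)
    counts = defaultdict(int)
    for stratum in population_strata:
        counts[stratum] += 1
    n_pop = len(population_strata)

    pate = 0.0
    for stratum, count in counts.items():
        if stratum not in cells or cells[stratum][1] == 0 or cells[stratum][3] == 0:
            # No treated or no control units here: the effect is not estimable.
            raise ValueError(f"stratum {stratum!r} lacks treated or control units")
        sum_t, n_t, sum_c, n_c = cells[stratum]
        effect = sum_t / n_t - sum_c / n_c
        pate += (count / n_pop) * effect
    return pate


# Toy data: stratum "a" is over-represented in the experiment.
experiment = [
    ("a", True, 3.0), ("a", True, 5.0), ("a", False, 1.0), ("a", False, 1.0),
    ("b", True, 2.0), ("b", False, 2.0),
]
population = ["a", "b", "b", "b"]
estimate = poststratified_pate(experiment, population)
```

In this toy data the effect is 3 in stratum "a" and 0 in stratum "b", so reweighting to the population shares (0.25 and 0.75) gives 0.75, whereas the unadjusted difference in means within the experiment is 2. The sketch only needs the stratum label in the population sample, which mirrors the setting where few covariates are measured there.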