Randomized controlled trials (RCTs) can provide unbiased estimates of sample average treatment effects. However, estimates of population treatment effects can be biased when the experimental sample and the target population differ. In this case, the population average treatment effect can be identified by combining experimental and observational data. A good experimental design trumps all the analyses that come after. While most of the existing literature centers on improving analyses after RCTs, we instead focus on the design stage, fundamentally improving the efficiency of the combined causal estimator through the selection of experimental samples. We explore how the covariate distribution of RCT samples influences estimation efficiency and derive the optimal covariate allocation that leads to the lowest variance. Our results show that the optimal allocation does not necessarily follow the exact distribution of the target cohort, but is instead adjusted for the conditional variability of potential outcomes. We formulate a metric to compare and choose among candidate RCT sample compositions. We also develop variations of our main results to accommodate practical scenarios with various cost constraints and precision requirements. The ultimate goal of this paper is to provide practitioners with a clear and actionable strategy for selecting RCT samples that will lead to efficient causal inference.
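To illustrate why an allocation adjusted for conditional variability can beat one that simply mirrors the target cohort, here is a minimal sketch in a simplified, discrete-strata setting. It uses classical Neyman allocation as a stand-in and is not the paper's own derivation. The function names, the three strata, and all numeric values are hypothetical. It assumes the population effect is estimated as a target-weighted sum of stratum-level estimates, so that its variance is approximately the sum over strata of p_k^2 * s_k^2 / n_k, where p_k is the target share, s_k the conditional standard deviation of the stratum-level effect estimate per unit, and n_k the number of RCT units in stratum k.

```python
import numpy as np


def neyman_allocation(target_props, cond_sds):
    """Share of RCT units per stratum that minimizes the variance of a
    target-weighted estimator under a fixed total sample size.

    Assumption (not from the paper): strata are discrete and the
    conditional standard deviations are known in advance.
    """
    p = np.asarray(target_props, dtype=float)
    s = np.asarray(cond_sds, dtype=float)
    weights = p * s  # allocation is proportional to p_k * s_k
    return weights / weights.sum()


def weighted_estimator_variance(target_props, cond_sds, alloc, n_total):
    """Approximate variance sum_k p_k^2 * s_k^2 / n_k with n_k = alloc_k * n_total."""
    p = np.asarray(target_props, dtype=float)
    s = np.asarray(cond_sds, dtype=float)
    a = np.asarray(alloc, dtype=float)
    return float(np.sum(p**2 * s**2 / (a * n_total)))


# Hypothetical inputs: three strata with increasing outcome variability.
target_props = np.array([0.5, 0.3, 0.2])
cond_sds = np.array([1.0, 2.0, 4.0])
n_total = 1000

optimal = neyman_allocation(target_props, cond_sds)

# Compare an RCT that mirrors the target cohort with the adjusted allocation.
var_proportional = weighted_estimator_variance(
    target_props, cond_sds, target_props, n_total
)
var_optimal = weighted_estimator_variance(target_props, cond_sds, optimal, n_total)

print("optimal allocation:", np.round(optimal, 3))
print("variance, proportional allocation:", var_proportional)
print("variance, adjusted allocation:", var_optimal)
```

In this toy setup the adjusted allocation oversamples the high-variability stratum relative to its target share and yields a lower variance than proportional sampling, which matches the qualitative claim in the abstract. With equal conditional standard deviations across strata, the two allocations coincide.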