We consider estimating the conditional average treatment effect for the entire clinical population, which requires eliminating both confounding and selection bias. Randomized clinical trials (RCTs) eliminate confounding but impose strict exclusion criteria that prevent sampling of the entire clinical population. Observational datasets are more inclusive but suffer from confounding. We therefore analyze RCT and observational data simultaneously in order to combine the strengths of each. Our solution builds upon Difference in Differences (DD), an algorithm that eliminates confounding from observational data by comparing outcomes before and after treatment administration. DD requires a parallel slopes assumption that may not hold in practice when confounding shifts across time. We instead propose Synthesized Difference in Differences (SDD), which infers the correct (possibly non-parallel) slopes by linearly adjusting a conditional version of DD using additional RCT data. The algorithm achieves state-of-the-art performance across multiple synthetic and real datasets, even when the RCT excludes the majority of patients.
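To make the before-and-after comparison behind DD concrete, the following is a minimal sketch of the textbook two-group, two-period DD estimate. It is not the SDD algorithm proposed here. The function name `dd_estimate`, the use of an untreated comparison group, and the toy outcome values are all assumptions made for illustration. Under the parallel slopes assumption, the change observed in the comparison group stands in for the change the treated group would have experienced without treatment.

```python
from statistics import mean


def dd_estimate(pre_treated, post_treated, pre_control, post_control):
    """Textbook two-group, two-period difference-in-differences estimate.

    Hypothetical helper for illustration only; it assumes parallel slopes,
    i.e. both groups would have changed by the same amount absent treatment.
    """
    # Average change in outcome for the treated group (post minus pre).
    change_treated = mean(post_treated) - mean(pre_treated)
    # Average change in outcome for the untreated comparison group.
    change_control = mean(post_control) - mean(pre_control)
    # The difference of the two changes is the DD effect estimate.
    return change_treated - change_control


# Made-up outcome measurements, not taken from any dataset in the paper.
pre_treated = [10.0, 12.0, 11.0]
post_treated = [15.0, 17.0, 16.0]
pre_control = [8.0, 9.0, 10.0]
post_control = [10.0, 11.0, 12.0]

effect = dd_estimate(pre_treated, post_treated, pre_control, post_control)
print(effect)
```

If confounding shifts across time, the two groups' slopes differ and this simple difference becomes biased, which is the situation SDD is designed to address using additional RCT data.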