Negative control variables are increasingly used to adjust for unmeasured confounding bias in causal inference from observational data. They are typically identified through subject matter knowledge, and there is currently a severe lack of data-driven methods for finding them. In this paper, we present a statistical test for discovering negative controls of a special type, disconnected negative controls, that can serve as surrogates of the unmeasured confounder, and we incorporate that test into the Data-driven Automated Negative Control Estimation (DANCE) algorithm. DANCE first uses the new validation test to identify subsets of a set of candidate negative control variables that satisfy the assumptions of disconnected negative controls. It then applies a negative control method to each pair of these validated negative control variables and aggregates the output to produce an unbiased point estimate and confidence interval for a causal effect in the presence of unmeasured confounding. We (1) prove the correctness of this validation test, and thus of DANCE; (2) demonstrate via simulation experiments that DANCE outperforms both a naive analysis that ignores unmeasured confounding and a negative control method applied to randomly selected candidate negative controls; and (3) demonstrate the effectiveness of DANCE on a challenging real-world problem.
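The validate, estimate, and aggregate flow described above can be sketched as follows. This is only an illustrative outline under stated assumptions, not the paper's implementation: the names `dance_sketch`, `is_valid_subset`, and `estimate_pair` are hypothetical, the validation test and the negative control estimator are passed in as placeholders, the subset size of 3 is an arbitrary choice, and the median is a stand-in aggregator because the abstract does not specify the aggregation rule. The sketch does not compute a confidence interval.

```python
from itertools import combinations
from statistics import median
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def dance_sketch(
    candidates: Sequence[str],
    is_valid_subset: Callable[[Tuple[str, ...]], bool],  # hypothetical validation test
    estimate_pair: Callable[[str, str], float],          # hypothetical negative control estimator
    subset_size: int = 3,                                # assumption, not from the paper
) -> Tuple[float, List[Pair]]:
    """Validate candidate subsets, estimate per validated pair, then aggregate."""
    # Step 1: keep only the candidate subsets that pass the validation test.
    validated = [
        subset
        for subset in combinations(candidates, subset_size)
        if is_valid_subset(subset)
    ]

    # Step 2: collect every distinct pair that appears in a validated subset.
    pairs = sorted({pair for subset in validated for pair in combinations(subset, 2)})
    if not pairs:
        raise ValueError("no candidate subset passed the validation test")

    # Step 3: apply the estimator to each pair and aggregate.
    # The median is an assumed aggregator chosen for illustration only.
    estimates = [estimate_pair(a, b) for a, b in pairs]
    return median(estimates), pairs


# Toy usage with made-up inputs: "bad" stands for an invalid candidate.
toy_estimates = {("n1", "n2"): 1.0, ("n1", "n3"): 2.0, ("n2", "n3"): 6.0}
point_estimate, used_pairs = dance_sketch(
    candidates=["n1", "n2", "n3", "bad"],
    is_valid_subset=lambda subset: "bad" not in subset,
    estimate_pair=lambda a, b: toy_estimates[(a, b)],
)
```

Passing the test and the estimator as callables keeps the sketch independent of any particular validation statistic or negative control method, both of which are defined in the paper rather than here.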