Inferring causal relationships or related associations from observational data can be invalidated by hidden confounding. We focus on a high-dimensional linear regression setting in which the measured covariates are affected by hidden confounding, and we propose the {\em Doubly Debiased Lasso} estimator for individual components of the regression coefficient vector. The method simultaneously corrects both the bias due to estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and prove that it is efficient in the Gauss-Markov sense. The validity of the methodology relies on a dense confounding assumption, i.e., that every confounding variable affects many covariates. The finite-sample performance is illustrated with an extensive simulation study and a genomic application.
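As a sketch of the setting described above, one way to write the hidden confounding model is given below. The notation ($Y$, $X$, $H$, $\beta$, $\phi$, $\Psi$, $e$, $E$, $n$, $p$, $q$) is an assumption introduced here for illustration and does not appear in the abstract.

\[
Y = X\beta + H\phi + e, \qquad X = H\Psi + E,
\]

where $Y \in \mathbb{R}^{n}$ is the response, $X \in \mathbb{R}^{n \times p}$ contains the measured covariates with $p$ possibly much larger than $n$, $H \in \mathbb{R}^{n \times q}$ contains the unobserved confounders, and $e$ and $E$ are noise terms. The inferential target is a single coordinate $\beta_j$ of the coefficient vector. In this notation, the dense confounding assumption says, roughly, that each row of $\Psi \in \mathbb{R}^{q \times p}$ has many nonzero entries, so that each hidden variable loads on many columns of $X$.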