This paper proposes a method for constructing confidence intervals and computing $p$-values in statistical inference for high-dimensional linear models. The method breaks the high-dimensional inference problem into a series of low-dimensional inference problems: for each regression coefficient $\beta_i$, the confidence interval and $p$-value are computed by regressing on a subset of variables selected according to the conditional independence relations between the corresponding variable $X_i$ and the other variables. Since this subset forms a Markov neighborhood of $X_i$ in the Markov network formed by all the variables $X_1,X_2,\ldots,X_p$, the method is called Markov neighborhood regression. The method is tested on high-dimensional linear, logistic and Cox regression, and the numerical results indicate that it significantly outperforms existing methods. Based on Markov neighborhood regression, a method for learning causal structures in high-dimensional linear models is proposed and applied to the identification of drug-sensitive genes and cancer driver genes. The idea of using conditional independence relations for dimension reduction is general and can potentially be extended to other high-dimensional or big data problems.
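The per-coefficient reduction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the Markov neighborhood of each $X_i$ is taken as known (a chain graph is simulated, so the neighbors of $X_i$ are $X_{i-1}$ and $X_{i+1}$), the regression subset is the neighborhood together with a set of variables returned by some variable-selection step (here the true support is used as a stand-in), and the $p$-value uses a normal approximation. The names `ols_inference`, `mnr_coefficient` and `chain_neighbors` are hypothetical.

```python
import math
import numpy as np


def ols_inference(Z, y, j):
    """Fit OLS of y on the columns of Z; return the estimate, an approximate
    95% confidence interval, and a two-sided p-value for column j."""
    n, k = Z.shape
    ztz_inv = np.linalg.inv(Z.T @ Z)
    coef = ztz_inv @ Z.T @ y
    resid = y - Z @ coef
    sigma2 = resid @ resid / (n - k)           # unbiased noise variance estimate
    se = math.sqrt(sigma2 * ztz_inv[j, j])
    z = coef[j] / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation (assumption)
    return coef[j], (coef[j] - 1.96 * se, coef[j] + 1.96 * se), p_value


def mnr_coefficient(X, y, i, neighborhood, selected):
    """Low-dimensional regression for coefficient i: regress y on X_i, its
    assumed Markov neighborhood, the assumed selected set, and an intercept."""
    others = sorted((set(neighborhood) | set(selected)) - {i})
    cols = [i] + others
    Z = np.column_stack([X[:, cols], np.ones(X.shape[0])])
    return ols_inference(Z, y, 0)               # column 0 of Z is X_i


def chain_neighbors(i, p):
    """Neighbors of node i in a chain graph with p nodes (known by construction)."""
    return [k for k in (i - 1, i + 1) if 0 <= k < p]


# Simulated example: n < p, covariates follow an AR(1) chain.
rng = np.random.default_rng(0)
n, p, rho = 100, 200, 0.5
E = rng.standard_normal((n, p))
X = np.empty((n, p))
X[:, 0] = E[:, 0]
for k in range(1, p):
    X[:, k] = rho * X[:, k - 1] + math.sqrt(1 - rho ** 2) * E[:, k]

beta = np.zeros(p)
beta[0], beta[5] = 2.0, -1.5
y = X @ beta + rng.standard_normal(n)

support = [0, 5]  # stand-in for the output of a variable-selection step
est0, ci0, p0 = mnr_coefficient(X, y, 0, chain_neighbors(0, p), support)
est10, ci10, p10 = mnr_coefficient(X, y, 10, chain_neighbors(10, p), support)
print(f"beta_0:  est={est0:.3f}, CI=({ci0[0]:.3f}, {ci0[1]:.3f}), p={p0:.3g}")
print(f"beta_10: est={est10:.3f}, CI=({ci10[0]:.3f}, {ci10[1]:.3f}), p={p10:.3g}")
```

Each call fits a regression with at most five columns even though $p = 200$ exceeds $n = 100$, which is the sense in which the high-dimensional problem is replaced by low-dimensional ones. In practice both the neighborhood and the selected set would have to be estimated from data, which this sketch does not attempt.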