We propose a novel resampling-based method to construct an asymptotically exact test for any subset of hypotheses on coefficients in high-dimensional linear regression. It can be embedded into any multiple testing procedure to make confidence statements about relevant predictor variables. The method constructs permutation test statistics for each individual hypothesis by means of repeated splits of the data and a variable selection technique; it then defines a test for any subset by suitably aggregating the test statistics of the variables in that subset. The resulting procedure is highly flexible, as it allows different selection techniques and several combining functions. We present it in two versions: an exact method, and an approximate one that requires less memory and computation time and scales to higher dimensions. We illustrate the performance of the method with simulations and an analysis of real gene expression data.
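The pipeline described above (repeated data splits, variable selection, resampled per-variable statistics, aggregation over a subset) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: marginal-correlation screening stands in for the selection technique, random sign flips of residualized score contributions stand in for the resampling scheme, and the function names `multisplit_flip_stats` and `subset_pvalue` and all parameter names are hypothetical.

```python
import numpy as np

def multisplit_flip_stats(X, y, n_splits=10, n_flips=200, n_keep=5, seed=0):
    """Return an (n_flips + 1, p) array of per-variable statistics.

    Row 0 holds the observed statistics; every other row uses one random
    sign-flip vector (an assumed resampling scheme for this sketch).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # One set of sign flips shared by all variables and splits; row 0 is the identity.
    flips = rng.choice([-1.0, 1.0], size=(n_flips + 1, n))
    flips[0] = 1.0
    stats = np.zeros((n_flips + 1, p))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, inf_idx = perm[: n // 2], perm[n // 2:]
        # Selection half: keep the n_keep variables with the largest absolute
        # marginal covariance with y (a stand-in for any selection technique).
        Xs = X[sel_idx] - X[sel_idx].mean(axis=0)
        ys = y[sel_idx] - y[sel_idx].mean()
        selected = np.argsort(np.abs(Xs.T @ ys))[-n_keep:]
        # Inference half: residualize x_j and y on the other selected variables.
        Xi, yi = X[inf_idx], y[inf_idx]
        for j in selected:
            others = [k for k in selected if k != j]
            Z = np.column_stack([np.ones(len(inf_idx)), Xi[:, others]])
            H = Z @ np.linalg.pinv(Z)  # projection onto the column space of Z
            rx = Xi[:, j] - H @ Xi[:, j]
            ry = yi - H @ yi
            contrib = rx * ry  # per-observation score contributions
            # Unselected variables contribute 0 for this split.
            stats[:, j] += flips[:, inf_idx] @ contrib
    return stats

def subset_pvalue(stats, subset, combine="sum_sq"):
    """Combine the statistics of the variables in `subset` and return a p-value."""
    T = stats[:, subset]
    if combine == "sum_sq":
        combined = (T ** 2).sum(axis=1)
    elif combine == "max_abs":
        combined = np.abs(T).max(axis=1)
    else:
        raise ValueError("unknown combining function")
    # Share of resampled statistics at least as large as the observed one
    # (row 0 is included, so the p-value is never exactly 0).
    return float(np.mean(combined >= combined[0]))

# Simulated data: only the first two of 20 coefficients are nonzero.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(100)
stats = multisplit_flip_stats(X, y)
print("p-value, subset {0, 1}:", subset_pvalue(stats, [0, 1]))
print("p-value, subset {10, ..., 14}:", subset_pvalue(stats, [10, 11, 12, 13, 14], "max_abs"))
```

Because the matrix of resampled statistics is computed once, any subset and any combining function can be tested afterwards without refitting, which is what lets a test of this kind be plugged into a multiple testing procedure. Storing the full matrix costs memory proportional to the number of resamples times the number of variables.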