Multiple Testing and Variable Selection along Least Angle Regression's Path

From arXiv, 62 pages; new: FDR control and power comparison between Knockoff, FCD, Slope and our proposed method; new: the introduction has been revised and now gives a synthetic presentation of the main results. We believe that this introduction brings new insights compared to previous versions.

In this article, we investigate multiple testing and variable selection using the Least Angle Regression (LARS) algorithm in high dimensions under the Gaussian noise assumption. LARS is known to produce a piecewise affine solution path whose change points are referred to as the knots of the LARS path. The cornerstone of the present work is a closed-form expression of the exact joint law of K-tuples of knots conditional on the variables selected by LARS, namely the so-called post-selection joint law of the LARS knots. Numerical experiments demonstrate the perfect fit of our findings. Our main contributions are threefold. First, we build testing procedures on variables entering the model along the LARS path in the general design case when the noise level can be unknown. These testing procedures are referred to as the Generalized t-Spacing tests (GtSt), and we prove that they have exact non-asymptotic level (i.e., the Type I error is exactly controlled). In this way, we extend the work of Taylor et al. (2014), where the Spacing test applies to consecutive knots and known variance. Second, we introduce a new exact multiple false negatives test after model selection in the general design case when the noise level can be unknown. We prove that this testing procedure has exact non-asymptotic level for general design and unknown noise level. Last, we give an exact control of the false discovery rate (FDR) under an orthogonal design assumption. Monte-Carlo simulations and a real data experiment are provided to illustrate our results in this case. Of independent interest, we introduce an equivalent formulation of the LARS algorithm based on a recursive function.
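To make the notions of LARS knots and the Spacing test concrete, here is a minimal sketch of the Spacing test of Taylor et al. (2014) in its simplest setting only. It assumes an orthonormal design (X^T X = I_p), a known noise level sigma, and the global null (no true signal). In that case the LARS knots are the values |X_j^T y| sorted in decreasing order. Conditional on its neighbours lambda_{k-1} and lambda_{k+1}, the k-th knot is a |N(0, sigma^2)| variable truncated to [lambda_{k+1}, lambda_{k-1}], with lambda_0 = +inf. The resulting p-value is therefore exactly uniform. This is not the GtSt or the authors' code: it does not handle general designs, non-consecutive knots or unknown variance, and all function names are hypothetical.

```python
import math
import random


def normal_sf(x: float) -> float:
    """Standard normal survival function 1 - Phi(x); equals 0.0 at x = +inf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lars_knots_orthonormal(z):
    """Hypothetical helper: LARS knots when X has orthonormal columns.

    With X^T X = I and z = X^T y, variable j enters the path at
    lambda = |z_j|, so the knots are the |z_j| in decreasing order.
    """
    return sorted((abs(v) for v in z), reverse=True)


def spacing_pvalue(knots, k, sigma):
    """Spacing-test p-value for the k-th knot (1-indexed), known sigma.

    Orthonormal-design form of the statistic:
    (Phi(l_{k-1}) - Phi(l_k)) / (Phi(l_{k-1}) - Phi(l_{k+1})),
    written with survival functions for numerical stability.
    Conventions: lambda_0 = +inf, and lambda_{p+1} = 0 when k = p.
    """
    upper = math.inf if k == 1 else knots[k - 2]
    lam_k = knots[k - 1]
    lower = knots[k] if k < len(knots) else 0.0
    s_upper = normal_sf(upper / sigma)
    num = normal_sf(lam_k / sigma) - s_upper
    den = normal_sf(lower / sigma) - s_upper
    return num / den


def simulate_null_pvalues(p=20, k=1, sigma=1.0, n_rep=20000, seed=0):
    """Monte-Carlo p-values under the global null with orthonormal X.

    In that case z = X^T y ~ N(0, sigma^2 I_p), so z is drawn directly.
    """
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_rep):
        z = [rng.gauss(0.0, sigma) for _ in range(p)]
        pvals.append(spacing_pvalue(lars_knots_orthonormal(z), k, sigma))
    return pvals


if __name__ == "__main__":
    pv = simulate_null_pvalues()
    print("mean p-value:", sum(pv) / len(pv))  # should be close to 0.5
    print("rejections at 5%:", sum(v < 0.05 for v in pv) / len(pv))  # close to 0.05
```

Working with survival functions avoids computing 1 - Phi(lambda) by subtraction, which loses precision for large knots. The simulation only checks the exact-level property in this restricted setting. The paper's GtSt and FDR results cover cases this sketch does not.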
