Multiple Testing and Variable Selection along the Path of the Least Angle Regression

From arXiv, 58 pages. New: FDR control and power comparison between Knockoff, FCD, Slope and our proposed method. New: the introduction has been revised and now gives a synthetic presentation of the main results. We believe that this introduction brings new insights compared to previous versions.

We investigate multiple testing and variable selection using the Least Angle Regression (LARS) algorithm in high dimensions under the assumption of Gaussian noise. LARS is known to produce a piecewise affine solution path whose change points are referred to as the knots of the LARS path. The key to our results is a closed-form expression of the exact joint law of a $K$-tuple of knots conditional on the variables selected by LARS, namely the so-called post-selection joint law of the LARS knots. Numerical experiments demonstrate the perfect fit of our findings. This paper makes three main contributions. First, we build testing procedures on variables entering the model along the LARS path in the general design case when the noise level may be unknown. These testing procedures are referred to as the Generalized $t$-Spacing tests (GtSt), and we prove that they have an exact non-asymptotic level (i.e., the Type I error is exactly controlled). This extends the work of Taylor et al. (2014), where the spacing test applies to consecutive knots and known variance. Second, we introduce a new exact multiple false negatives test after model selection in the general design case when the noise level may be unknown. We prove that this testing procedure has an exact non-asymptotic level for general design and unknown noise level. Third, we give an exact control of the false discovery rate under an orthogonal design assumption. Monte Carlo simulations and a real data experiment are provided to illustrate our results in this case. Of independent interest, we introduce an equivalent formulation of the LARS algorithm based on a recursive function.
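
To make the notions of knots and spacing test concrete, here is a minimal Python sketch of the simplest special case only: an orthonormal design with known noise level. It is not the GtSt procedure of the paper, which handles general designs and unknown variance. Under orthonormal columns, variables enter the LARS path by decreasing $|X_j^\top y|$ and the knots are those absolute values; conditional on the second knot, the first knot follows a truncated half-normal law under the global null, which gives the p-value below. The function names and the example vector `xty` are hypothetical and chosen for illustration.

```python
import math


def normal_sf(x):
    """Survival function of the standard normal, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def orthonormal_lars_knots(correlations):
    """Return (knots, entry_order) assuming an orthonormal design.

    `correlations` holds the entries of X^T y. With orthonormal columns,
    variables enter by decreasing |X_j^T y| and the knots are those values.
    """
    order = sorted(range(len(correlations)),
                   key=lambda j: -abs(correlations[j]))
    knots = [abs(correlations[j]) for j in order]
    return knots, order


def first_knot_spacing_pvalue(knots, sigma):
    """Spacing-test p-value for the first knot with known sigma.

    Under the global null and orthonormal design this is
    sf(lambda_1 / sigma) / sf(lambda_2 / sigma), which is uniform on (0, 1).
    """
    lam1, lam2 = knots[0], knots[1]
    return normal_sf(lam1 / sigma) / normal_sf(lam2 / sigma)


# Illustrative (made-up) values of X^T y for five variables.
xty = [0.3, -4.1, 1.2, -0.7, 2.5]
knots, order = orthonormal_lars_knots(xty)
p_value = first_knot_spacing_pvalue(knots, sigma=1.0)
print("entry order:", order)
print("knots:", knots)
print("first-knot p-value:", p_value)
```

A small p-value indicates that the first knot is unusually far above the second one, which is evidence that the first variable entering the path carries signal. For a general design the knots no longer reduce to sorted correlations and the weights in the test statistic change, so this sketch should not be used outside the orthonormal, known-variance setting.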
