In high-dimensional regression, where the number of predictors is much larger than the sample size, an important question is how to select the variables that are relevant to the response of interest. Variable selection and multiple testing are both tools for addressing this question, yet the connection between the two areas has received little discussion. When the signals are strong enough that selection consistency is achievable, controlling the false discovery rate seems unnecessary. In this paper, we consider the regime where the signals are both rare and weak, so that selection consistency is not achievable, and propose a method that controls the false discovery rate asymptotically. We show theoretically that the false non-discovery rate of the proposed method converges to zero at the optimal rate. Numerical results demonstrate the advantage of the proposed method.
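For readers less familiar with the two error rates named in the abstract, the following sketch gives their commonly used definitions. The notation is an assumption introduced here for illustration and does not come from the abstract: $p$ is the number of predictors, $R$ the number of selected variables, $V$ the number of selected variables that are in fact irrelevant, and $T$ the number of relevant variables that were not selected. The paper's own definitions may differ in detail, for example in how the denominator of the false non-discovery rate is normalized.

```latex
% Standard textbook definitions (assumed notation, not taken from the paper):
%   p = number of predictors
%   R = number of selected variables (discoveries)
%   V = number of false discoveries among the R selected
%   T = number of relevant variables among the p - R not selected
\[
  \mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
  \qquad
  \mathrm{FNR} = \mathbb{E}\!\left[\frac{T}{\max(p - R,\,1)}\right].
\]
```

Under these definitions, selection consistency corresponds to $V = 0$ and $T = 0$ with probability tending to one, which is why the two error rates become the natural targets once consistency is out of reach.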