The variable selection properties of procedures that use penalized-likelihood estimates are a central topic in the study of high-dimensional linear regression problems. Existing literature emphasizes the quality of the ranking of variables produced by such procedures, as reflected in the receiver operating characteristic curve or in prediction performance. Specifically, recent works have harnessed the modern theory of approximate message-passing (AMP) to obtain, in a particular setting, exact asymptotic predictions of the Type I/Type II error tradeoff for selection procedures that rely on $\ell_{p}$-regularized estimators. In practice, effective ranking by itself is often not sufficient, because some calibration for Type I error is required. In this work we study theoretically the power of selection procedures that similarly rank the features by the size of an $\ell_{p}$-regularized estimator, but further use Model-X knockoffs to control the false discovery rate in the realistic situation where no prior information about the signal is available. In analyzing the power of the resulting procedure, we extend existing results in AMP theory to handle the pairing between original variables and their knockoffs. We use this extension to derive exact asymptotic predictions for power. We apply the general results to compare the power of the knockoff versions of Lasso and thresholded-Lasso selection, and demonstrate that, in the i.i.d. covariate setting under consideration, tuning by cross-validation on the augmented design matrix is nearly optimal. We further demonstrate how these techniques can also be used to analyze the Type S error, and a corresponding notion of power, when selections are supplemented with a decision on the sign of the coefficient.
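To make the selection procedure concrete, the following is a minimal sketch of a knockoff version of Lasso selection under illustrative assumptions that are not taken from the paper: the covariates are i.i.d. $N(0, 1/n)$ (so an independent draw from the same distribution serves as a valid Model-X knockoff copy), the penalty `lam` is fixed by hand rather than tuned by cross-validation, the Lasso is solved by plain proximal gradient descent instead of AMP, and the knockoff+ threshold is used to calibrate the false discovery rate at level `q`. All function names are hypothetical.

```python
import numpy as np


def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = b - step * (A.T @ (A @ b - y))                        # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b


def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, else inf."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_lasso_select(X, y, lam, q, rng):
    """Rank features by Lasso on the augmented design, then calibrate with knockoffs."""
    n, p = X.shape
    # Assumption: entries of X are i.i.d. N(0, 1/n), so a fresh independent
    # draw from the same distribution is a valid knockoff copy.
    X_tilde = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    b = lasso_ista(np.hstack([X, X_tilde]), y, lam)
    # Pair each original variable with its knockoff: a large positive W_j
    # means the original entered the model more strongly than its knockoff.
    W = np.abs(b[:p]) - np.abs(b[p:])
    T = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= T), W
```

A call such as `knockoff_lasso_select(X, y, lam=1.0, q=0.2, rng=np.random.default_rng(0))` returns the indices of the selected variables together with the statistics `W`. A thresholded-Lasso variant would differ only in how the augmented coefficient vector is turned into `W`.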