Simultaneously identifying multiple influential variables while controlling the false discovery rate (FDR) is a fundamental problem for linear regression models. We propose the Gaussian Mirror (GM) method, which creates a pair of mirror variables for each predictor variable by adding and subtracting a randomly generated Gaussian perturbation, and then fits a regression method such as ordinary least squares or the Lasso (the mirror variables can also be created after selection). The mirror variables naturally lead to test statistics that are effective for controlling the FDR. Under a mild assumption on the dependence among the covariates, we show that the FDR can be controlled asymptotically at any designated level. We also demonstrate through extensive numerical studies that the GM method is more powerful than many existing methods for selecting relevant variables subject to FDR control, especially when the covariates are highly correlated and the influential variables are not overly sparse.
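The mirror construction described above can be illustrated with a minimal sketch using ordinary least squares. The abstract does not state the form of the test statistic, the scale of the perturbation, or the selection threshold, so the choices below are assumptions made for illustration: the mirror statistic is taken as M_j = |b_j^+ + b_j^-| - |b_j^+ - b_j^-|, the perturbation scale c_j is chosen so that the two mirror coefficients are uncorrelated under OLS, and the threshold is the smallest t at which the estimated false discovery proportion falls below q. The function names `gaussian_mirror_statistics` and `gm_select` are hypothetical, and the sketch assumes n > p + 1 and p >= 2.

```python
import numpy as np


def gaussian_mirror_statistics(X, y, rng):
    """Compute one mirror statistic per column of X using OLS (illustrative)."""
    n, p = X.shape
    M = np.empty(p)
    for j in range(p):
        z = rng.standard_normal(n)            # Gaussian perturbation for x_j
        X_rest = np.delete(X, j, axis=1)      # all other predictors
        # Residuals of x_j and z after projecting out the other predictors.
        Q, _ = np.linalg.qr(X_rest)
        rx = X[:, j] - Q @ (Q.T @ X[:, j])
        rz = z - Q @ (Q.T @ z)
        # Assumed scaling: makes the two mirror columns orthogonal after
        # projection, so their OLS coefficients are uncorrelated.
        c = np.linalg.norm(rx) / np.linalg.norm(rz)
        design = np.column_stack([X[:, j] + c * z, X[:, j] - c * z, X_rest])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        b_plus, b_minus = beta[0], beta[1]
        # Assumed mirror statistic: large and positive for a true signal,
        # roughly symmetric about zero for a null variable.
        M[j] = abs(b_plus + b_minus) - abs(b_plus - b_minus)
    return M


def gm_select(M, q):
    """Select variables using a data-driven threshold (assumed rule)."""
    for t in np.sort(np.abs(M[M != 0])):
        # Negative statistics estimate the number of false positives above t.
        fdp_hat = np.sum(M <= -t) / max(np.sum(M >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(M >= t)
    return np.array([], dtype=int)


# Synthetic example: 20 predictors, the first 5 are influential.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.standard_normal(n)

M = gaussian_mirror_statistics(X, y, rng)
selected = gm_select(M, q=0.1)
print("mirror statistics:", np.round(M, 3))
print("selected indices:", selected)
```

In this synthetic setting the statistics of the five influential predictors should sit well above zero while those of the null predictors scatter around zero, which is the asymmetry the threshold rule relies on. A Lasso-based variant would replace the `np.linalg.lstsq` call with a penalized fit; that variant is not shown here.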