In high-dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the false discovery rate (FDR) while discovering as many relevant variables as possible. Model-X methods, such as Knockoffs and conditional randomization tests, achieve the first goal, finite-sample FDR control, under the assumption that the covariate distribution is known. However, it is not clear whether these methods can also achieve the second goal of maximizing the number of discoveries. In fact, designing procedures that discover more relevant variables while retaining finite-sample FDR control is largely an open question, even in linear models, arguably the simplest setting. In this paper, we derive near-optimal testing procedures in high-dimensional Bayesian linear models with isotropic covariates. We propose a Model-X multiple testing procedure, PoEdCe, which provably controls the frequentist FDR in finite samples, even under model misspecification. It is also conjectured to achieve near-optimal power when the data follow the Bayesian linear model with a known prior. PoEdCe has three key ingredients: the Posterior Expectation, the distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values (eBH). The optimality conjecture for PoEdCe rests on a heuristic calculation of its asymptotic true positive proportion (TPP) and false discovery proportion (FDP), supported by methods from statistical physics as well as extensive numerical simulations. Furthermore, when the prior is unknown, we show that an empirical Bayes variant of PoEdCe still has finite-sample FDR control and achieves near-optimal power.
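The last ingredient, eBH, is the e-value analogue of Benjamini-Hochberg. Given m hypotheses with e-values e_1, ..., e_m and a target level α, it rejects the k* hypotheses with the largest e-values. Here k* = max{k : e_(k) ≥ m/(αk)}, and e_(k) denotes the k-th largest e-value. eBH is known to control the FDR at level α under arbitrary dependence among the e-values. The Python below is a minimal sketch of this final step only. It assumes the e-values are already given, because the abstract does not specify how PoEdCe constructs them from the posterior expectation and dCRT statistics. The function name `ebh` is hypothetical.

```python
from typing import List, Sequence


def ebh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Hypothetical helper: indices rejected by the e-BH procedure at level alpha.

    Sketch only; the e-values are assumed to be supplied by an upstream step.
    """
    m = len(e_values)
    if m == 0:
        return []
    # Order hypotheses by decreasing e-value.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        # The k-th largest e-value must reach the threshold m / (alpha * k);
        # k_star keeps the largest k that passes, not the first failure.
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k
    # Reject the k_star hypotheses with the largest e-values.
    return sorted(order[:k_star])
```

For example, `ebh([40, 1, 30, 0.5, 12], alpha=0.1)` returns `[0, 2]`. The largest e-value (40) misses its threshold of 50, but the second largest (30) clears its threshold of 25. Because eBH takes the largest passing k, both hypotheses are rejected.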