Simultaneously identifying contributory variables and controlling the false discovery rate (FDR) in high-dimensional data is an important statistical problem. In this paper, we propose a novel model-free variable selection procedure in the framework of sufficient dimension reduction, based on a data-splitting technique. The variable selection problem is first connected to a least-squares procedure with several response transformations. We construct a series of statistics with a global symmetry property and then use this symmetry to derive a data-driven threshold that achieves error rate control. The method can achieve both finite-sample and asymptotic FDR control under mild conditions. Numerical experiments indicate that our procedure controls the FDR satisfactorily and has higher power than existing methods.
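The abstract does not spell out the exact statistics, so the Python sketch below only illustrates the general recipe under stated assumptions; it is not the authors' procedure. Assumed choices: slice indicators of Y serve as the response transformations (in the spirit of sliced inverse regression); each half of the split sample yields least-squares coefficients for every transformed response; the statistic M_j is the inner product of variable j's coefficient vectors from the two halves, assumed roughly symmetric about zero for null variables; and the threshold uses a knockoff+-style estimate of the false discovery proportion. All function names (`slice_transforms`, `ls_coefficients`, `mirror_statistics`, `symmetric_threshold`, `select_variables`) are hypothetical. Plain least squares also needs each half to have more observations than predictors, so a genuinely high-dimensional version would need screening or regularization first.

```python
import numpy as np


def slice_transforms(y, n_slices=5):
    """Hypothetical response transformations: one indicator column per quantile slice of y."""
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    labels = np.searchsorted(edges, y, side="right")  # slice index in 0..n_slices-1
    return np.eye(n_slices)[labels]


def ls_coefficients(X, H):
    """Least-squares coefficients of every transformed response (columns of H) on X."""
    Xc = X - X.mean(axis=0)
    Hc = H - H.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Hc, rcond=None)
    return B  # shape (p, n_slices)


def mirror_statistics(X, y, rng, n_slices=5):
    """Split the sample in half, fit each half separately, and combine the two fits.

    M_j = <B1[j], B2[j]> is large and positive when both halves agree that X_j
    matters; for a null variable it is *assumed* to be roughly symmetric about 0.
    """
    perm = rng.permutation(X.shape[0])
    half1, half2 = np.array_split(perm, 2)
    B1 = ls_coefficients(X[half1], slice_transforms(y[half1], n_slices))
    B2 = ls_coefficients(X[half2], slice_transforms(y[half2], n_slices))
    return np.sum(B1 * B2, axis=1)


def symmetric_threshold(M, q=0.1):
    """Smallest t > 0 with estimated FDP (1 + #{M_j <= -t}) / #{M_j >= t} at most q."""
    for t in np.sort(np.abs(M[M != 0])):
        fdp_hat = (1 + np.sum(M <= -t)) / max(np.sum(M >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # nothing can be selected at level q


def select_variables(X, y, q=0.1, seed=0):
    """Indices j with M_j >= tau, where tau is the data-driven threshold."""
    M = mirror_statistics(X, y, np.random.default_rng(seed))
    tau = symmetric_threshold(M, q)
    return np.flatnonzero(M >= tau)


# Usage on simulated data: only the first 10 of 30 variables enter a nonlinear model.
rng = np.random.default_rng(1)
n, p = 1000, 30
X = rng.standard_normal((n, p))
y = X[:, :10].sum(axis=1) ** 3 + rng.standard_normal(n)
print(select_variables(X, y, q=0.2))
```

The threshold relies on symmetry: for null variables, the number of statistics at or below -t estimates how many nulls lie at or above t, so the ratio estimates the false discovery proportion among the selected variables. The +1 in the numerator is a conservative choice borrowed from knockoff+; one consequence is that at least 1/q variables must pass the threshold before anything is reported.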