Two major research tasks lie at the heart of high-dimensional data analysis: accurate parameter estimation and correct support recovery. The existing literature mostly aims for either the best parameter estimation or the best model selection result; however, little has been done to understand the potential interaction between estimation precision and selection behavior. In this work, our minimax result shows that an estimator's type I error control critically depends on its $L_2$ estimation error rate, and reveals a trade-off between the rate of convergence and false discovery control: better estimation accuracy leads to more false discoveries. In particular, we characterize the false discovery control behavior of rate-optimal and rate-suboptimal estimators under different sparsity regimes, and discover a rigid dichotomy between these two types of estimators under near-linear and linear sparsity settings. In addition, this work provides a rigorous explanation of the incompatibility between selection consistency and rate-minimaxity that has been frequently observed in the high-dimensional literature.