Two major research tasks lie at the heart of high-dimensional data analysis: accurate parameter estimation and correct support recovery. The existing literature mostly aims for either the best parameter estimation or the best model selection result; little has been done to understand the potential interaction between estimation precision and selection behavior. In this work, our minimax result shows that an estimator's type I error control is directly linked to its $L_2$ estimation error rate, and it reveals a trade-off between the rate of convergence and false discovery control: to achieve better accuracy, one risks yielding more false discoveries. In particular, we characterize the false discovery control behavior of rate-optimal and rate-suboptimal estimators under different sparsity regimes, and discover a rigid dichotomy between these two types of estimators under near-linear and linear sparsity settings. In addition, this work provides a rigorous explanation of the incompatibility between selection consistency and rate minimaxity that has been frequently observed in the high-dimensional literature.