Eigenvector perturbation analysis plays a vital role in various data science applications. A large body of prior work, however, has focused on establishing $\ell_{2}$ eigenvector perturbation bounds, which are often inadequate for tasks that rely on the fine-grained behavior of an eigenvector. This paper makes progress on this front by studying the perturbation of linear functions of an unknown eigenvector. Focusing on two fundamental problems -- matrix denoising and principal component analysis -- in the presence of Gaussian noise, we develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector. To mitigate a non-negligible bias inherent in the natural ``plug-in'' estimator, we develop de-biased estimators that (1) achieve minimax lower bounds for a family of scenarios (modulo some logarithmic factor), and (2) can be computed in a data-driven manner without sample splitting. Notably, the proposed estimators are nearly minimax optimal even when the associated eigen-gap is {\em substantially smaller} than what is required in prior statistical theory.
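To make the bias of the plug-in estimator concrete, the following is a minimal simulation sketch, not the de-biased estimator proposed in the paper. It assumes a rank-one matrix denoising model $M = \lambda u u^{\top} + H$ with symmetric Gaussian noise $H$ of entrywise standard deviation $\sigma$; the function name `plugin_linear_form`, the parameter values, and the choice $a = u$ are all hypothetical and chosen purely for illustration. In this setting, standard spiked-matrix asymptotics suggest that the plug-in estimate $a^{\top}\hat{u}$ is shrunk toward zero by a factor of roughly $\sqrt{1 - n\sigma^{2}/\lambda^{2}}$ when $\lambda > \sigma\sqrt{n}$.

```python
import numpy as np


def plugin_linear_form(n=400, lam=60.0, sigma=1.0, trials=20, seed=0):
    """Average the plug-in estimate a^T u_hat over Monte Carlo trials.

    Assumed model (illustrative only): M = lam * u u^T + H, where H is a
    symmetric Gaussian matrix whose off-diagonal entries have variance sigma^2.
    Returns (mean plug-in estimate, true value a^T u, asymptotic prediction).
    """
    rng = np.random.default_rng(seed)

    # Ground-truth unit eigenvector; take a = u so that a^T u = 1.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    a = u.copy()

    estimates = []
    for _ in range(trials):
        g = rng.standard_normal((n, n))
        h = sigma * (g + g.T) / np.sqrt(2.0)  # symmetric Gaussian noise
        m = lam * np.outer(u, u) + h

        # eigh returns eigenvalues in ascending order; take the top eigenvector.
        _, vecs = np.linalg.eigh(m)
        u_hat = vecs[:, -1]

        # Resolve the global sign ambiguity using the truth (oracle step,
        # available only in simulation).
        if u_hat @ u < 0:
            u_hat = -u_hat

        estimates.append(a @ u_hat)

    truth = float(a @ u)
    # Asymptotic shrinkage factor for this model, valid when lam > sigma * sqrt(n).
    predicted = float(np.sqrt(1.0 - n * sigma**2 / lam**2) * truth)
    return float(np.mean(estimates)), truth, predicted


if __name__ == "__main__":
    mean_plugin, truth, predicted = plugin_linear_form()
    print(f"plug-in mean: {mean_plugin:.4f}")
    print(f"truth:        {truth:.4f}")
    print(f"predicted:    {predicted:.4f}")
```

Under these assumptions the averaged plug-in estimate falls systematically below the true value $a^{\top}u$, which is the kind of non-negligible bias that motivates a de-biasing step.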