Eigenvector perturbation analysis plays a vital role in various statistical data science applications. A large body of prior work, however, has focused on establishing $\ell_{2}$ eigenvector perturbation bounds, which are often highly inadequate for tasks that rely on the fine-grained behavior of an eigenvector. This paper makes progress on this front by studying the perturbation of linear functions of an unknown eigenvector. Focusing on two fundamental problems -- matrix denoising and principal component analysis -- in the presence of Gaussian noise, we develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector. To mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators that (1) achieve minimax lower bounds for a family of scenarios (modulo some logarithmic factor), and (2) can be computed in a data-driven manner without sample splitting. Notably, the proposed estimators are nearly minimax optimal even when the associated eigen-gap is substantially smaller than what is required in prior theory.
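To make the bias of the plug-in estimator concrete, the following is a minimal simulation sketch, not the estimator proposed in this paper. It assumes a rank-one matrix denoising model $M = \lambda u u^{\top} + H$ with symmetric Gaussian noise $H$, and estimates the linear function $a^{\top} u$ by $a^{\top} \hat{u}$, where $\hat{u}$ is the leading eigenvector of $M$. The function name `plug_in_estimate`, the rank-one setup, the choice $a = u$, and all parameter values are illustrative assumptions. The sign of $\hat{u}$ is aligned using the true $u$, which is possible only in simulation.

```python
import numpy as np


def plug_in_estimate(n=500, snr=3.0, sigma=1.0, seed=0):
    """Return (a^T u, a^T u_hat) for a rank-one matrix denoising instance.

    Hypothetical setup for illustration: M = lam * u u^T + H, where H is a
    symmetric Gaussian noise matrix with off-diagonal variance sigma^2 and
    lam = snr * sigma * sqrt(n).
    """
    rng = np.random.default_rng(seed)

    # Unknown unit-norm eigenvector of the signal matrix.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)

    # Signal strength, expressed relative to the noise level sigma * sqrt(n).
    lam = snr * sigma * np.sqrt(n)

    # Symmetric Gaussian noise (off-diagonal entries have variance sigma^2).
    G = rng.standard_normal((n, n))
    H = sigma * (G + G.T) / np.sqrt(2.0)

    M = lam * np.outer(u, u) + H

    # eigh returns eigenvalues in ascending order; the last column is the
    # eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(M)
    u_hat = eigvecs[:, -1]

    # Resolve the global sign ambiguity using the truth (simulation only).
    if u_hat @ u < 0:
        u_hat = -u_hat

    # Linear function of interest; a = u is an illustrative choice that
    # makes the true value exactly 1.
    a = u.copy()
    return float(a @ u), float(a @ u_hat)


if __name__ == "__main__":
    truth, plug_in = plug_in_estimate()
    print(f"true a^T u = {truth:.4f}, plug-in a^T u_hat = {plug_in:.4f}")
```

Under these assumptions the plug-in value is systematically shrunk toward zero relative to the true value, and the shrinkage does not vanish as $n$ grows with the signal-to-noise ratio held fixed. This is the kind of non-negligible bias that a de-biasing step is meant to correct.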