We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even where the noise covariance structure is not known a priori and cannot be independently estimated. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague Scree Plot heuristic from 1966. ScreeNOT has a surprising oracle property: it typically achieves exactly, in large finite samples, the lowest possible MSE for matrix recovery on each given problem instance. That is, the specific threshold it selects gives exactly the smallest achievable MSE loss among all possible threshold choices for that noisy dataset and that unknown underlying true low-rank model. The method is computationally efficient and robust against perturbations of the underlying covariance structure. Our results depend on the assumption that the singular values of the noise have a limiting empirical distribution of compact support; this model, which is standard in random matrix theory, is satisfied by many models exhibiting either cross-row or cross-column correlation structure, and also by many situations with inter-element correlation structure. Simulations demonstrate the effectiveness of the method even at moderate matrix sizes. The paper is supplemented by ready-to-use software packages implementing the proposed algorithm.
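To make the terms "hard thresholding" and "oracle" concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the ScreeNOT algorithm: it finds the loss-minimizing threshold by brute force using the true low-rank matrix, which is only available in simulation. The function names `hard_threshold` and `oracle_threshold`, the AR(1) column correlation of the noise, and all sizes and signal strengths are hypothetical choices made for illustration.

```python
import numpy as np

def hard_threshold(Y, theta):
    """Keep only the singular components of Y whose singular value exceeds theta."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > theta
    return (U[:, keep] * s[keep]) @ Vt[keep, :]

def oracle_threshold(Y, X):
    """Brute-force the threshold minimizing squared Frobenius error against the true X.

    Requires X, so this is a simulation-only benchmark, not a usable estimator.
    """
    s = np.linalg.svd(Y, compute_uv=False)  # sorted in descending order
    # The estimate changes only when theta crosses a singular value, so one
    # candidate per attainable rank is enough: theta = s[k] keeps the k largest
    # components, and theta = 0 keeps all of them.
    candidates = np.append(s, 0.0)
    losses = [np.linalg.norm(hard_threshold(Y, t) - X, "fro") ** 2 for t in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]

# Illustrative simulation (all values below are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, p, r = 60, 40, 3

# Rank-r signal with orthonormal singular vectors.
U0, _ = np.linalg.qr(rng.standard_normal((n, r)))
V0, _ = np.linalg.qr(rng.standard_normal((p, r)))
X = (U0 * np.array([8.0, 6.0, 4.0])) @ V0.T

# Noise with cross-column correlation: rows have AR(1) covariance rho^|i-j|.
rho = 0.5
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Sigma)  # Sigma = L @ L.T
Z = rng.standard_normal((n, p)) @ L.T / np.sqrt(n)

Y = X + Z
theta_star, loss_star = oracle_threshold(Y, X)
rank_star = int(np.sum(np.linalg.svd(Y, compute_uv=False) > theta_star))
print(f"oracle threshold={theta_star:.3f}, rank kept={rank_star}, loss={loss_star:.3f}")
```

The oracle property described above says that ScreeNOT's data-driven threshold typically attains the same loss as this brute-force benchmark, without access to `X` or to the noise covariance.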