We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise. Although the formula nominally involves unobservables, we show how to apply it even where the noise covariance structure is not known a priori or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague Scree Plot heuristic from 1966. ScreeNOT has a surprising oracle property: it typically achieves exactly, in large finite samples, the lowest possible MSE for matrix recovery on each given problem instance. That is, the specific threshold it selects gives exactly the smallest achievable MSE loss among all possible threshold choices for that noisy dataset and that unknown underlying true low-rank model. The method is computationally efficient and robust against perturbations of the underlying covariance structure. Our results depend on the assumption that the singular values of the noise have a limiting empirical distribution of compact support. This model, which is standard in random matrix theory, is satisfied by many models exhibiting either cross-row or cross-column correlation structure, and also by many situations with inter-element correlation structure. Simulations demonstrate the effectiveness of the method even at moderate matrix sizes. The paper is supplemented by ready-to-use software packages implementing the proposed algorithm: package ScreeNOT in Python (via PyPI) and R (via CRAN).
Title: ScreeNOT: Exact MSE-Optimal Singular Value Thresholding in Correlated Noise
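To make the terms "hard thresholding" and "oracle property" concrete, the setting described in the abstract can be written out as follows. This is only a sketch: the abstract fixes no notation, so every symbol here (the data matrix Y, the low-rank signal X, the noise Z, the threshold \theta, and the selected threshold \hat{\theta}) is an assumption introduced for illustration, and the loss is taken to be squared Frobenius error.

```latex
% Illustrative notation only; none of these symbols appear in the abstract.
\begin{align*}
  % Observed data: unknown low-rank signal plus correlated additive noise
  Y &= X + Z,
  \qquad
  Y = \sum_{i=1}^{\min(n,p)} y_i \, u_i v_i^{\top},
  \quad y_1 \ge y_2 \ge \dots \ge 0, \\
  % Hard thresholding: keep only the singular components above the threshold
  \hat{X}_{\theta} &= \sum_{i \,:\, y_i > \theta} y_i \, u_i v_i^{\top}, \\
  % Loss of a given threshold on this particular problem instance
  \mathrm{SE}(\theta) &= \bigl\lVert \hat{X}_{\theta} - X \bigr\rVert_F^{2}, \\
  % Oracle property, as stated informally in the abstract
  \mathrm{SE}(\hat{\theta}) &= \min_{\theta \ge 0} \mathrm{SE}(\theta)
  \quad \text{(typically, for large matrices)}.
\end{align*}
```

Under this reading, the minimum on the right-hand side depends on the unobserved X, so it is computable only in simulation; the claim in the abstract is that the threshold selected from Y alone typically attains it.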