An increasing number of data science and machine learning problems rely on computation with tensors, which better capture the multi-way relationships and interactions of data than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robust to corruptions and ill-conditioning. This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from its observations contaminated by sparse corruptions, under the Tucker decomposition. To minimize the computation and memory footprints, we propose to directly recover the low-dimensional tensor factors -- starting from a tailored spectral initialization -- via scaled gradient descent (ScaledGD), coupled with an iteration-varying thresholding operation to adaptively remove the impact of corruptions. Theoretically, we establish that the proposed algorithm converges linearly to the true low-rank tensor at a constant rate that is independent of its condition number, as long as the level of corruptions is not too large. Empirically, we demonstrate that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms through synthetic experiments and real-world applications.
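For concreteness, below is a minimal NumPy sketch of the recipe outlined above: a spectral initialization by thresholding followed by truncated HOSVD, then ScaledGD updates of the Tucker factors interleaved with an iteration-varying hard-thresholding step on the sparse component. The function and parameter names (`tensor_rpca_scaledgd`, `eta`, `zeta0`, `rho`) and the default schedule constants are illustrative assumptions, not the paper's exact specification or guarantees.

```python
# Minimal sketch of ScaledGD-style tensor RPCA under the Tucker decomposition.
# Hyperparameters (eta, zeta0, rho, n_iter) are placeholder choices for illustration.
import numpy as np


def unfold(T, mode):
    """Mode-k unfolding: move `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_multiply(T, M, mode):
    """Multiply tensor T by matrix M (shape: new_dim x T.shape[mode]) along `mode`."""
    moved = np.moveaxis(T, mode, 0)
    out = np.tensordot(M, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def tucker_to_full(G, factors):
    """Assemble the full tensor G x_1 U_1 x_2 U_2 ... from the core and factors."""
    X = G
    for k, U in enumerate(factors):
        X = mode_multiply(X, U, k)
    return X


def hard_threshold(R, zeta):
    """Keep entries whose magnitude exceeds zeta; zero out the rest."""
    return np.where(np.abs(R) > zeta, R, 0.0)


def tensor_rpca_scaledgd(Y, ranks, eta=0.5, zeta0=None, rho=0.95, n_iter=100):
    """Recover (low-Tucker-rank X, sparse S) from Y = X + S. Sketch only."""
    ndim = Y.ndim
    zeta = np.max(np.abs(Y)) if zeta0 is None else zeta0

    # Spectral initialization: remove large entries, then truncated HOSVD of Y - S.
    S = hard_threshold(Y, zeta)
    L0 = Y - S
    factors = []
    for k in range(ndim):
        Uk, _, _ = np.linalg.svd(unfold(L0, k), full_matrices=False)
        factors.append(Uk[:, :ranks[k]])
    G = L0
    for k in range(ndim):
        G = mode_multiply(G, factors[k].T, k)

    for _ in range(n_iter):
        X = tucker_to_full(G, factors)

        # Iteration-varying thresholding adaptively removes the corruptions.
        zeta *= rho
        S = hard_threshold(Y - X, zeta)
        R = X + S - Y  # residual driving the gradient

        # ScaledGD factor updates: precondition each gradient by (V_k V_k^T)^{-1},
        # where V_k is the mode-k unfolding of the core contracted with the other factors.
        new_factors = []
        for k in range(ndim):
            W = G
            for j in range(ndim):
                if j != k:
                    W = mode_multiply(W, factors[j], j)
            Vk = unfold(W, k)                    # r_k x (product of other dims)
            grad_k = unfold(R, k) @ Vk.T         # n_k x r_k
            precond = np.linalg.inv(Vk @ Vk.T)   # r_k x r_k scaling
            new_factors.append(factors[k] - eta * grad_k @ precond)

        # Preconditioned core update: contract the residual with (U_k^T U_k)^{-1} U_k^T.
        grad_G = R
        for k in range(ndim):
            Uk = factors[k]
            pinv_k = np.linalg.inv(Uk.T @ Uk) @ Uk.T
            grad_G = mode_multiply(grad_G, pinv_k, k)
        G = G - eta * grad_G
        factors = new_factors

    return tucker_to_full(G, factors), S
```

The scaling matrices applied to the factor and core gradients play the role of the preconditioner in ScaledGD; per the abstract, this is what allows a constant step size and a convergence rate that does not degrade with the condition number of the underlying low-rank tensor.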