We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning which arises in various applications, including data compression, latent semantic indexing, collaborative filtering, and medical imaging. We introduce a novel formulation for SLR that directly models its underlying discreteness. For this formulation, we develop an alternating minimization heuristic that computes high-quality solutions and a novel semidefinite relaxation that provides meaningful bounds for the solutions returned by our heuristic. We also develop a custom branch-and-bound algorithm that leverages our heuristic and convex relaxations to solve small instances of SLR to certifiable (near) optimality. Given an input $n$-by-$n$ matrix, our heuristic scales to solve instances where $n=10000$ in minutes, our relaxation scales to instances where $n=200$ in hours, and our branch-and-bound algorithm scales to instances where $n=25$ in minutes. Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of rank, sparsity, and mean-square error while maintaining a comparable runtime.
翻译:本文研究稀疏加低秩分解问题(SLR),即将一组损坏的数据矩阵分解成包含真实信息的低秩矩阵和包含扰动的稀疏矩阵。SLR是运筹学和机器学习中的基本问题,适用于各种应用领域,包括数据压缩、潜在语义索引、协同过滤和医学成像。我们提出了一种新颖的SLR形式,直接建模其离散性。针对此形式,我们开发了一种交替最小化算法,用于计算高质量解,并提供了一种新颖的半定松弛方法,用于对我们的算法的解提供有意义的界限。我们还开发了一种定制化的分支定界算法,利用我们的交替最小化算法和凸松弛求解小规模SLR问题,以近乎最优的方式解决问题。对于输入的nxn矩阵,我们的启发式算法可以在几分钟内解决n=10000的情况,我们的松弛算法可以在数小时内解决n=200的情况,而我们的分支定界算法可以在数分钟内解决n=25的情况。我们的数值结果表明,相比其他最先进的方法,我们的方法在秩、稀疏性和均方误差方面表现更好,同时保持了相当的运行时间。