We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix $\mathbf{D}$ into a sparse matrix $\mathbf{Y}$ containing the perturbations plus a low-rank matrix $\mathbf{X}$. SLR is a fundamental problem in Operations Research and Machine Learning that arises in many applications, such as data compression, latent semantic indexing, collaborative filtering, and medical imaging. We introduce a novel formulation for SLR that directly models the underlying discreteness of the problem. For this formulation, we develop an alternating minimization heuristic that computes high-quality solutions and a novel semidefinite relaxation that provides meaningful bounds for the solutions returned by our heuristic. We further develop a custom branch-and-bound routine that leverages our heuristic and convex relaxation to solve small instances of SLR to certifiable near-optimality. Our heuristic scales to $n=10000$ in hours, our relaxation scales to $n=200$ in hours, and our branch-and-bound algorithm scales to $n=25$ in minutes. Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of the mean squared error (MSE) of both the low-rank matrix and the sparse matrix.
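To make the idea of alternating minimization concrete, the following is a minimal sketch of a generic scheme for the problem $\min \|\mathbf{D}-\mathbf{X}-\mathbf{Y}\|_F^2$ subject to $\mathrm{rank}(\mathbf{X}) \le k$ and $\|\mathbf{Y}\|_0 \le s$. This objective, the parameters `k` and `s`, and all function names are assumptions made for illustration. The abstract does not spell out the exact formulation, so this should not be read as the heuristic developed in this work. Each step solves one block exactly: a truncated SVD for $\mathbf{X}$ with $\mathbf{Y}$ fixed, and hard thresholding for $\mathbf{Y}$ with $\mathbf{X}$ fixed.

```python
import numpy as np


def truncated_svd(M, k):
    """Best rank-k approximation of M in Frobenius norm (Eckart-Young)."""
    U, sing, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * sing[:k]) @ Vt[:k, :]


def keep_largest(M, s):
    """Keep the s largest-magnitude entries of M and zero out the rest."""
    Y = np.zeros(M.shape)
    if s <= 0:
        return Y
    flat_abs = np.abs(M).ravel()
    # Indices (in row-major order) of the s largest absolute values.
    idx = np.argpartition(flat_abs, -s)[-s:]
    Y.flat[idx] = M.flat[idx]
    return Y


def alternating_minimization(D, k, s, max_iters=100, tol=1e-9):
    """Illustrative heuristic (assumed formulation, see lead-in).

    Alternates exact minimization over X (rank <= k) and Y (at most s
    nonzeros), so the objective ||D - X - Y||_F^2 never increases.
    Returns a local solution with no optimality guarantee.
    """
    D = np.asarray(D, dtype=float)
    X = np.zeros(D.shape)
    Y = np.zeros(D.shape)
    prev_obj = np.inf
    for _ in range(max_iters):
        X = truncated_svd(D - Y, k)   # low-rank step, Y fixed
        Y = keep_largest(D - X, s)    # sparse step, X fixed
        obj = np.linalg.norm(D - X - Y) ** 2
        if prev_obj - obj <= tol * max(1.0, prev_obj):
            break
        prev_obj = obj
    return X, Y
```

A call such as `X, Y = alternating_minimization(D, k=2, s=10)` returns a feasible pair, but the result depends on the starting point and may be only locally optimal, which is why a bound from a convex relaxation is useful for judging its quality.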