Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and then optimize these factors directly via simple iterative methods such as gradient descent and alternating minimization. Despite nonconvexity, recent literature has shown that these simple heuristics in fact achieve linear convergence, when properly initialized, for a growing number of problems of interest. However, upon closer examination, existing approaches can still be computationally expensive, especially for ill-conditioned matrices: the convergence rate of gradient descent depends linearly on the condition number of the low-rank matrix, while the per-iteration cost of alternating minimization is often prohibitive for large matrices. The goal of this paper is to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent (ScaledGD), which can be viewed as pre-conditioned or diagonally-scaled gradient descent, where the pre-conditioners are adaptive and iteration-varying with minimal computational overhead. With tailored variants for low-rank matrix sensing, robust principal component analysis and matrix completion, we theoretically show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix, as with alternating minimization, while maintaining the low per-iteration cost of gradient descent. Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties over a wide range of low-rank matrix estimation tasks.
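To make the pre-conditioning idea concrete, the following is a minimal sketch of a ScaledGD-style iteration for the simplest instance, a fully observed low-rank matrix with loss f(X, Y) = (1/2)||XY^T - M||_F^2; the function name, step size, and iteration count are illustrative choices, not prescriptions from the paper, and the tailored variants for sensing, robust PCA and completion differ in how the observations enter the gradient.

```python
import numpy as np

def scaled_gd(M, r, eta=0.5, n_iters=200):
    """Sketch: recover an n1 x n2 rank-r matrix M by scaled gradient
    descent on the factors X (n1 x r) and Y (n2 x r)."""
    # Spectral initialization: top-r SVD of the observed matrix.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r, :].T * np.sqrt(s[:r])
    for _ in range(n_iters):
        R = X @ Y.T - M  # residual of the current factorization
        # The pre-conditioners are the r x r Gram matrices of the
        # factors, recomputed each iteration (adaptive and
        # iteration-varying), so the overhead is only O(r^3).
        X_new = X - eta * R @ Y @ np.linalg.inv(Y.T @ Y)
        Y_new = Y - eta * R.T @ X @ np.linalg.inv(X.T @ X)
        X, Y = X_new, Y_new
    return X @ Y.T
```

The right-multiplication by (Y^T Y)^{-1} and (X^T X)^{-1} rescales the plain gradient steps so that the contraction rate no longer degrades with the condition number of the low-rank matrix, while each iteration remains nearly as cheap as vanilla gradient descent.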