We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method for the low-rank matrix sensing problem when the true rank is unknown and the matrix is possibly ill-conditioned. Using an overparameterized factor representation, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization and proceeds by gradient descent with a specific form of damped preconditioning to combat the bad curvature induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by the preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$), even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependency on the condition number. Our work provides evidence of the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
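To make the damped preconditioning concrete, one natural instantiation, sketched here under the assumption of the symmetric matrix sensing formulation with an overparameterized factor $X \in \mathbb{R}^{n \times k}$ (the abstract does not spell out the update, so the exact form in the paper may differ), is

$$f(X) = \frac{1}{4}\bigl\| \mathcal{A}(XX^\top) - y \bigr\|_2^2, \qquad X_{t+1} = X_t - \eta\, \nabla f(X_t)\bigl( X_t^\top X_t + \lambda I_k \bigr)^{-1},$$

where $\mathcal{A}$ denotes the Gaussian measurement operator, $y$ the observed measurements, $\eta$ the step size, and $\lambda > 0$ the damping parameter. Setting $\lambda = 0$ would recover an undamped $\textsf{ScaledGD}$-style preconditioner; the damping term keeps the preconditioner well-defined and well-conditioned when $X_t^\top X_t$ is (nearly) rank-deficient, as happens under overparameterization and small random initialization.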