We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method for the low-rank matrix sensing problem when the true rank is unknown and the matrix is possibly ill-conditioned. Using an overparameterized factor representation, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization and proceeds by gradient descent with a specific form of damped preconditioning that combats the bad curvature induced by overparameterization and ill-conditioning. At the expense of the light computational overhead incurred by the preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$), even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependency on the condition number. Our work provides evidence of the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
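For concreteness, one natural form of the damped preconditioned update, sketched here in the symmetric positive semidefinite setting (the overparameterized factor $X_t \in \mathbb{R}^{n \times k}$ with $k \ge r$, the least-squares sensing loss $f$, the step size $\eta > 0$, and the damping parameter $\lambda > 0$ are illustrative notation, not fixed by the abstract above), is
$$X_{t+1} \;=\; X_t \;-\; \eta\, \nabla f(X_t)\,\bigl(X_t^\top X_t + \lambda I_k\bigr)^{-1},$$
where the damping term $\lambda I_k$ keeps the preconditioner invertible even when $X_t^\top X_t$ is nearly rank-deficient, as it is at a small random initialization.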