We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.