We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
Paper title: Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification
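To make the idea concrete, the preconditioner acts by rescaling the gradient on the right by an $r\times r$ matrix. Below is a minimal NumPy sketch of one such preconditioned step, using the scaling $(X^{T}X+\eta I)^{-1}$ on a toy least-squares cost $\phi(M)=\tfrac{1}{2}\|M-M^{\star}\|_F^{2}$. The function names, step size, iteration count, and the damping schedule $\eta=\sqrt{f(X)}$ are all illustrative assumptions for this sketch, not the paper's stated algorithm or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_star, r = 50, 2, 5                  # search rank r > r_star: overparameterized
X_star = rng.standard_normal((n, r_star))
M_star = X_star @ X_star.T               # rank-deficient global minimizer X* X*^T

def f(X):
    # f(X) = phi(X X^T) with the toy cost phi(M) = 0.5 * ||M - M_star||_F^2
    return 0.5 * np.linalg.norm(X @ X.T - M_star, "fro") ** 2

def grad_f(X):
    # gradient of f at X for this phi: 2 (X X^T - M_star) X, shape (n, r)
    return 2.0 * (X @ X.T - M_star) @ X

def precond_gd_step(X, eta, alpha=0.1):
    # right-multiply the gradient by (X^T X + eta I)^{-1}: a damped r x r
    # Gram matrix, so the overhead over plain gradient descent is tiny
    G = grad_f(X)
    P = X.T @ X + eta * np.eye(X.shape[1])
    return X - alpha * np.linalg.solve(P, G.T).T   # equals X - alpha * G @ inv(P)

X = rng.standard_normal((n, r))
for _ in range(500):
    eta = np.sqrt(f(X))                  # damping that decays along with the loss
    X = precond_gd_step(X, eta)
print(f"final loss: {f(X):.3e}")
```

Note that the preconditioner requires only an $r\times r$ linear solve per iteration, an $O(nr^{2}+r^{3})$ overhead that is negligible next to forming the gradient itself, which is why such a preconditioner can be called inexpensive.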