A large body of empirical evidence has corroborated the importance of noise in nonconvex optimization problems. The theory behind these empirical observations, however, remains largely unknown. This paper studies this fundamental problem by investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Consequently, gradient descent (GD) can converge to any of these optima, depending on the initialization. In contrast, we show that a perturbed form of GD with arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes an implicit bias towards certain optima. Numerical experiments are provided to support our theory.
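To make the setup concrete, the following minimal Python/NumPy sketch contrasts plain GD with a perturbed variant on the rectangular factorization objective $\min_{U,V} \frac{1}{2}\|UV^\top - M\|_F^2$. The step size, noise scale, dimensions, and noise schedule here are illustrative assumptions for exposition, not the parameters or the specific perturbation scheme analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth low-rank target: M is m x n with rank r.
m, n, r = 20, 15, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

def grad(U, V):
    """Gradients of f(U, V) = 0.5 * ||U V^T - M||_F^2."""
    R = U @ V.T - M          # residual
    return R @ V, R.T @ U    # dU, dV

def factorize(perturb=False, steps=2000, lr=0.01, sigma=1e-3):
    # Arbitrary initialization: with plain GD, which of the infinitely
    # many optima (related by rotation/scaling of the factors) is
    # reached depends entirely on this starting point.
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(steps):
        dU, dV = grad(U, V)
        if perturb:
            # Perturbed GD: inject isotropic Gaussian noise at each
            # step (sigma is an illustrative choice, not the paper's).
            dU = dU + sigma * rng.standard_normal(dU.shape)
            dV = dV + sigma * rng.standard_normal(dV.shape)
        U -= lr * dU
        V -= lr * dV
    return U, V

U, V = factorize(perturb=True)
print("reconstruction error:", np.linalg.norm(U @ V.T - M))
```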